Statistics Matter
EBM & AI - challenges to confront
Statistics emerged as a distinct discipline around the beginning of the 20th century. During this time, fundamental concepts were developed, including the use of randomization in clinical trials, hypothesis testing, p-values, likelihood ratios, Bayesian analysis, and decision theory. Statistics rapidly became an essential element of the applied sciences, and with the advent of the “evidence-based medicine” movement of the 1990s and 2000s it ushered in a plethora of “clinical practice guidelines” written by various specialties. The application of statistics to medicine was proclaimed as one of the top 11 developments in medicine of the past 1000 years! Statistics and evidence-based medicine were primed to take a medicine biased by both the physician performing it and the patient receiving it, and wash it clean of subjective contamination. The randomized controlled trial (RCT) that was double-blinded, placebo-controlled, multicentered, and large in N was placed at the top of that hierarchy of evidence.
The problem with the EBM movement was at least two-fold: 1) RCTs of the highest caliber were fairly infrequent, very expensive, and slow to perform and publish, and they were so highly controlled in order to prove efficacy that it was unclear whether they could deliver effectiveness in “real world” medicine; and 2) the only entities that could afford such high-level studies were Big Pharma, which immediately raised conflict-of-interest concerns and produced studies with statistical tricks (e.g., combined endpoints of non-equivalent outcomes) that could “bamboozle” the average physician reading the journal articles. The EBM response to this dilemma was the “meta-analysis,” which in its initial debut was going to fill in the gaps of knowledge by pooling many small-N studies and giving them the statistical power of a large N. The failure of the meta-analysis, which has mostly taken it off the map as the new guide to statistical truth, is that it was usually not clear, certainly not to the average physician reading a published meta-analysis, what quality and variability lay behind its conclusion. Experts in statistics could tell us “garbage in; garbage out,” and we understood, but this would rarely have been apparent to the practicing physician. By increasing the statistical complexity of the method by which the data were gathered and included, the physician lost the ability to carefully critique the integrity of the data.
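To make the pooling idea concrete, a basic fixed-effect (inverse-variance) meta-analysis, sketched here in simplified form (real meta-analyses layer random-effects models and heterogeneity assessments on top of this), combines the individual study estimates as

$$\hat{\theta}_{\text{pooled}} \;=\; \frac{\sum_{i=1}^{k} w_i\,\hat{\theta}_i}{\sum_{i=1}^{k} w_i}, \qquad w_i = \frac{1}{\mathrm{SE}_i^{2}}.$$

Each of the k studies is weighted only by its statistical precision; nothing in the weights reflects the design quality or bias of the underlying study, which is why a pooled estimate can look authoritative while quietly inheriting every flaw of its inputs.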
With regard to “real world” effectiveness, not to mention safety concerning less common adverse effects, the RCT and the meta-analysis left a gaping hole. Once a drug or procedure was approved by the FDA, often on the basis of a large RCT or a meta-analysis of mixed trial types, clinicians could only “monitor closely” to learn about rare adverse effects, or about how the drug might behave in unique populations or complex circumstances (e.g., a woman having periods while taking multiple medications at the same time). Many began to look for ways to introduce a drug into a “real world” scenario more quickly and accurately. This call for quicker, larger real-world data paralleled the national requirements for an electronic health record and a national database.
Enter AI in the form of “Big Data” and large language models, along with the iPhone (2007) and the explosion of “open access journals” during and after COVID-19, and now we had “real world” medicine almost instantly on our wrists, and so did patients. This massive access to all variations and shades of data created a type of “meta-analysis” of “real world data.”
The statistical downside to this is that the data is so large and so complex that there is quite literally no one who can sort it out and edit out what is “bad” or “too variable.” There is a good reason why RCTs had to be “controlled” and went through a “peer-review” process. Now we get immediate answers, but do we really know where they came from, or how to evaluate the scientific integrity of the large language model’s conclusion? Is this simply a larger version of the “garbage in; garbage out” meta-analysis? And won’t the large language models that create their own conclusions then become part of their own dataset? Imagine a snake of data eating its own tail: the cannibalization of data.
And even if we use AI to try to help sort out and clean up the statistical “bugs” we are encountering with large language models in medicine, large language model usage is a commercial application. There is no ethical grid or professional organization that can control this cat, which has already been let out of the bag. Will we see the commercialization of opinion similar to what we experienced during the pandemic, when medical hucksters could make a buck off of bad medical information? Will health data just become a subset of the marketing data that your watch and the ghost in the machine lead you to?
REFERENCES
1. Hunter D, et al. Where Medical Statistics Meets Artificial Intelligence (review). N Engl J Med. 2023;389(13):1211-1219.
2. Concato J, et al. Real-World Evidence—Where Are We Now? N Engl J Med. 2022;386(18):1680-1682.