Health Care AI Limited by Reproducibility Challenges

While ML has grown more powerful, accuracy remains a concern

Helen Hwang, Contributor

February 16, 2023


At a Glance

  • The effectiveness of AI in health care can be hampered by its inability to reproduce the same results.
  • Reproducibility errors were found across 17 fields, including medicine.
  • Data leakage is another problem.

The reliability of health care AI is being undermined by the difficulty of reproducing its results. According to a recent Nature paper, it's an issue that data scientists are grappling with as they try to bring AI into a sector that sorely needs its capabilities.

For example, lung cancer can often be detected with low-dose computed tomography (CT) scans. However, there's a shortage of radiologists to evaluate the millions of images those scans produce.

Researchers sponsored a Data Science Bowl, an online competition to test whether lung cancer diagnosis could be automated using chest CT scans from 1,397 patients. Five of the winning models exceeded 90% accuracy in detecting lung nodules.

When the same algorithms were applied to a subset of the original dataset, the models demonstrated only a 60% to 70% accuracy rate. “Almost all of these award-winning models failed miserably,” said Dr. Kun-Hsing Yu, assistant professor at Harvard Medical School.

The AI community faces a reproducibility crisis, said Sayash Kapoor, a doctoral candidate at Princeton University. He found reproducibility errors in 329 studies across 17 fields, including medicine.

Growth in digital data, advances in computing power, and algorithmic improvements have helped ML aid treatment planning and speed up diagnosis. The caveat is that a health care AI model should be reproducible using readily available data and algorithms. Regulatory challenges and privacy concerns have hamstrung that process, said Michael Roberts, assistant professor in ML at the University of Cambridge.

Roberts reviewed 62 studies that used AI to diagnose COVID-19 and concluded that none of the models were ready for clinical deployment. The models suffered from methodological flaws, data biases, and reproducibility problems. Health care ML models also performed below the average of ML models in other fields.

Data "leakage" is another concern for the AI community. The data sets used to train a model and to test it should not overlap; if they do, test scores overstate how well the model will perform on new data. To reduce data leakage, Kapoor created a checklist for ML, similar to the checklists used in health care.
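One common leakage check in medical imaging can be sketched as follows. This is a minimal illustration, not Kapoor's checklist. It assumes that each sample is tagged with a patient identifier and that a single patient may contribute several scans, so a random split at the scan level could place the same patient in both the training and test sets. The function names, patient IDs, and scan names are hypothetical.

```python
import random
from typing import Hashable, Iterable


def leaked_ids(train_ids: Iterable[Hashable], test_ids: Iterable[Hashable]) -> set:
    """Return the identifiers that appear in both the training and test sets."""
    return set(train_ids) & set(test_ids)


def split_by_patient(records, test_fraction=0.2, seed=0):
    """Split (patient_id, sample) records so each patient lands in exactly one set.

    Hypothetical helper: splitting by patient instead of by sample keeps
    different scans of the same person from leaking across the split.
    """
    patients = sorted({pid for pid, _ in records})  # sort for a deterministic order
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])
    train = [r for r in records if r[0] not in test_patients]
    test = [r for r in records if r[0] in test_patients]
    return train, test


# Hypothetical records: patient p1 contributed two scans.
records = [
    ("p1", "scan_a"), ("p1", "scan_b"),
    ("p2", "scan_c"), ("p3", "scan_d"), ("p4", "scan_e"),
]
train, test = split_by_patient(records, test_fraction=0.25)
overlap = leaked_ids((pid for pid, _ in train), (pid for pid, _ in test))
print("patients in both sets:", overlap)  # empty by construction
```

Splitting on the patient rather than the individual scan is one design choice among several; the same overlap check could also be run on other identifiers, such as hospital or scanner, when those are suspected sources of leakage.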

Proprietary models also make it difficult to find errors in algorithms, said Casey Greene, professor at the University of Colorado School of Medicine in Aurora. “Given the exploding nature and how widely these things are being used, I think we need to get better more quickly than we are,” he added.

About the Author(s)

Helen Hwang

Contributor, AI Business

Helen Hwang is an award-winning journalist, author, and mechanical engineer. She writes about technology, health care, travel, and food. She's based in California.
