Faulty methods and ‘Frankenstein datasets’ are undermining the use of AI in COVID research
Report criticizes bad workmen, exonerates AI tools
The power of Artificial Intelligence (AI) is being wasted in the war on COVID, according to a review published in Nature Machine Intelligence, and it’s the fault of the work methods, not the tools.
The study, by researchers from the University of Cambridge and University of Manchester, took in a sample of 2,200 papers placed in open research repositories.
It analyzed the use of machine learning techniques by AI researchers and specialists in fields like infectious disease, radiology and oncology. From this sample, 62 papers were identified for a systematic review.
Further analysis revealed a number of common procedural faults that have undermined the effectiveness of attempts to speed up COVID diagnosis with AI.
Among the faults found were a lack of external validation, failures to assess model sensitivity or robustness, and inattention to demographic detail. All of these factors can undermine the integrity of the data used for machine learning and the reliability of the resulting models.
Must do better
Of the 62 papers included in the analysis, roughly half made no attempt to perform external validation, did not assess model sensitivity or robustness, and did not report the demographics of the people represented in their training data. Many models were trained on medical imaging datasets with too narrow a range of images and low-quality samples, and with scant assessment of balance; only six papers were considered at low risk of bias.
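As a rough illustration of what assessing balance can involve, the sketch below counts how labels are distributed in a dataset and flags any class that falls below a chosen share. The function name `class_balance_report` and the 20% threshold are hypothetical choices made for illustration; neither comes from the review.

```python
from collections import Counter


def class_balance_report(labels, min_share=0.2):
    """Return each label's share of the dataset and the labels below min_share.

    min_share is an illustrative threshold, not a value taken from the review.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        # An empty dataset has no distribution to report.
        return {}, []
    shares = {label: count / total for label, count in counts.items()}
    underrepresented = sorted(
        label for label, share in shares.items() if share < min_share
    )
    return shares, underrepresented


if __name__ == "__main__":
    # Hypothetical example: 90 COVID-positive scans and 10 normal scans.
    labels = ["covid"] * 90 + ["normal"] * 10
    shares, flagged = class_balance_report(labels)
    print(shares)   # {'covid': 0.9, 'normal': 0.1}
    print(flagged)  # ['normal']
```

A check like this only describes the label mix; a demographic breakdown of the kind the review found missing would need the same counting applied to patient attributes such as age or sex.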
Machine learning models for novel coronavirus diagnosis were often trained on ‘Frankenstein datasets’: collections spliced together from several overlapping sources, so that the same images could appear more than once. Another reported failing was that only one in five COVID-19 diagnosis or prognosis models shared their code, a practice that allows third parties to reproduce or verify the results described in the literature.
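One simple safeguard against Frankenstein datasets is to look for repeated images after merging sources, and to confirm that no evaluation image also appears in the training data. The sketch below is a minimal illustration that assumes images are held as raw bytes keyed by a hypothetical "source/filename" ID. It uses SHA-256 content hashes, so it only catches byte-identical copies; images that were resized or re-encoded when copied into another dataset would need a perceptual-hashing approach instead.

```python
import hashlib
from collections import defaultdict


def _digest(data: bytes) -> str:
    # SHA-256 of the raw bytes: identical files give identical digests.
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(images):
    """Group image IDs whose raw bytes are identical.

    images: dict mapping a hypothetical "source/filename" ID to raw bytes.
    Returns a sorted list of groups (each a sorted list of IDs) with 2+ members.
    """
    by_digest = defaultdict(list)
    for image_id, data in images.items():
        by_digest[_digest(data)].append(image_id)
    return sorted(sorted(ids) for ids in by_digest.values() if len(ids) > 1)


def train_test_overlap(train, test):
    """Return the sorted test IDs whose content also appears in the training set."""
    train_digests = {_digest(data) for data in train.values()}
    return sorted(i for i, data in test.items() if _digest(data) in train_digests)


if __name__ == "__main__":
    # Placeholder byte strings stand in for real image files.
    merged = {
        "source_a/img1.png": b"scan-1",
        "source_b/copy_of_img1.png": b"scan-1",
        "source_b/img2.png": b"scan-2",
    }
    print(find_exact_duplicates(merged))
    # [['source_a/img1.png', 'source_b/copy_of_img1.png']]
```

Running `train_test_overlap` before evaluation is the more important of the two checks, because a duplicate that lands on both sides of a train/test split inflates reported accuracy.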
The research paper acknowledged “the unique challenges researchers face when developing classical machine learning and deep learning models using imaging data.” It highlighted methodological flaws and concluded with detailed recommendations in five domains: collation of COVID-19 imaging datasets; methodological considerations for algorithm developers; the reproducibility of results in the literature; accurate descriptions of methods; and openness to peer review.
The report also criticized overly optimistic expectations of AI: “Despite the huge efforts of researchers to develop machine learning models for COVID-19 diagnosis and prognosis, we found methodological flaws and many biases throughout the literature, leading to highly optimistic reported performance.
“In their current reported form, none of the machine learning models included in this review are likely candidates for clinical translation for the diagnosis/prognosis of COVID-19.”