AI is matching doctors at image-based diagnosis

A study looking at comparisons carried out since 2012 has produced surprising results

by Max Smolaks 26 September 2019

Research suggests deep learning algorithms have become as good at image-based medical diagnosis as human doctors – and maybe even better.

A research paper published in the medical journal The Lancet Digital Health analyzed the results of third-party studies into the effectiveness of AI systems, as compared to human doctors.

Tens of thousands of such studies have been carried out, but the team from the University Hospitals Birmingham NHS Foundation Trust in the UK focused on 14 published since 2012, with the best quality data and most transparent testing methods.

It found that in aggregate, deep learning systems correctly classified a disease 87 percent of the time (sensitivity) – compared with 86.4 percent for healthcare professionals – and correctly gave the all-clear 92.5 percent of the time (specificity), as opposed to 90.5 percent for human doctors.
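In diagnostic-accuracy terms, those two figures are a sensitivity and a specificity, both derived from a confusion matrix of test outcomes. A minimal sketch of the arithmetic, using hypothetical counts chosen only so the rates line up with the pooled results (not the study's actual data):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity: share of diseased cases correctly flagged.
    Specificity: share of healthy cases correctly cleared."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity

# Hypothetical counts: 870 of 1,000 diseased scans flagged,
# 925 of 1,000 healthy scans cleared.
sens, spec = sensitivity_specificity(tp=870, fn=130, tn=925, fp=75)
print(f"sensitivity={sens:.1%} specificity={spec:.1%}")
# → sensitivity=87.0% specificity=92.5%
```

The two numbers trade off against each other, which is why the review reports both rather than a single accuracy figure.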

“There are a lot of headlines about AI outperforming humans, but our message is that it can at best be equivalent,” Dr Xiaoxuan Liu, the lead author of the study, told The Guardian.

The authors noted that in this limited comparison, the healthcare professionals were not given access to additional patient information that they would have in the real world to help steer their diagnosis.

Another major finding of the review was that despite their huge number, few studies presented externally validated results or compared the performance of deep learning models and healthcare professionals using the same sample.

“New reporting standards that address specific challenges of deep learning could improve future studies, enabling greater confidence in the results of future evaluations of this promising technology,” the authors said.

Training algorithms to interpret medical images has long been one of the more immediate, tangible goals for AI developers. Such diagnosis works using the same principles used in facial recognition, or image classification apps like Google Lens or Samsung’s Bixby Vision.
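The shared recipe is convolutional feature extraction followed by a classification head that turns feature scores into class probabilities. A toy NumPy forward pass illustrating the principle — the filters and weights here are random placeholders standing in for a trained model, and the two-class setup is purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(image, kernel):
    """Valid 2-D cross-correlation of a single-channel image."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def classify(image, kernels, weights):
    """Extract feature maps, pool them, score classes, softmax."""
    feats = np.array([conv2d(image, k).mean() for k in kernels])  # global average pool
    scores = weights @ np.maximum(feats, 0)                       # ReLU + linear head
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()                                        # class probabilities

image = rng.random((28, 28))              # stand-in for a scan or a photo
kernels = rng.standard_normal((4, 3, 3))  # 4 filters (random, untrained)
weights = rng.standard_normal((2, 4))     # 2 classes, e.g. "disease" / "all clear"
probs = classify(image, kernels, weights)
print(probs)                              # two probabilities summing to 1
```

Whether the classes are faces, landmarks or pathologies changes only the training data and labels; the pipeline itself is the same.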

Image recognition, along with radar, lidar and infrared sensors, is also indispensable for driverless cars, so it’s not surprising that this particular application of deep learning technologies has matured quickly.

“Despite the varying quality of AI research noted in the report, we can’t overlook the great progress that’s been made in deep learning in healthcare and the potential it has to augment human skills,” commented Björn Brinne, chief AI officer at Peltarion, a specialist in neural networks.

“Real world use cases are emerging which will help with everything from diagnosis to treatment, such as this recent study that used deep learning to diagnose Celiac disease in children with 93.4 percent accuracy.

“However, the report is correct that there are a number of urgent challenges that need to be addressed. Many deep learning projects to date have been focused on small pockets of research, which presents issues in relation to repeatability, auditability and scalability which are needed to make a global impact.

“Also, lack of skills, cost and complexity remain as barriers. For the NHS, this is a major challenge as budgets and talent are already limited.”

“The feasibility of this approach must be considered hand-in-hand with the issue of trust,” added James Duez, CEO and co-founder of automation software vendor Rainbird.

“Any AI-powered decision must have an interpretable rationale if the technology is to be trusted and scale, a limitation to many machine-learnt approaches. That’s why in most cases, a human domain expert (in this case a radiologist) must review the AI opinion and put their name to it.”

Earlier this year, the UK government announced it would spend £250 million ($303m) on artificial intelligence initiatives within the National Health Service (NHS) – including a taxpayer-funded AI Lab.