New study shows OpenAI’s chatbot achieved the 60% accuracy threshold needed to pass the medical licensing exam

Helen Hwang, Contributor

February 27, 2023

2 Min Read
Leon Neal/Getty Images

At a Glance

  • U.S. scientists find ChatGPT can pass medical licensing exams
  • ChatGPT’s results showed valid clinical insights, reaching the roughly 60% accuracy required to pass the exam

In a new study, scientists have found that OpenAI’s ChatGPT can pass the medical licensing exam, performing at or near the 60% accuracy threshold.

Researchers at Massachusetts General Hospital (MGH) and AnsibleHealth, a tech-powered medical practice focused on helping chronic respiratory disease patients, collaborated on the project that demonstrated how AI can impact medical education.

ChatGPT is a natural language processing tool trained on vast amounts of text; it learns patterns and relationships in that data and generates new text based on them. Its limitation is that it relies entirely on its training data and does not search the web, unlike some other AI chatbots.

The scientists evaluated the model on the three standardized tests that make up the U.S. Medical Licensing Exam (USMLE). Medical students must pass all three to receive their medical licenses.

The team tested the model using questions from the June 2022 sample exam released to the public. Questions that required visual analysis were removed from the question set.

The questions were input into the model in three different formats to reduce memory-retention bias.

  • Multiple-choice single answer without forced justification, e.g., “The patient's condition is mostly caused by which of the following pathogens?”

  • Multiple-choice single answer with forced justification, e.g., “Which of the following is the most likely reason for the patient’s nocturnal symptoms? Explain your rationale for each choice.”

  • Open-ended prompting, e.g., “What would be the patient’s diagnosis based on the information provided?”

For the first time, the AI model passed the exam at or near 60% accuracy without the aid of clinician trainers. The researchers also found that ChatGPT’s answers showed valid clinical insight and rationale, increasing the scientists’ confidence in the model’s explainability and trustworthiness.

In the future, the team believes that ChatGPT and other generative conversational models can help train future doctors. AnsibleHealth is already pursuing one such application: using ChatGPT to translate technical medical results into language patients can more easily understand.


About the Author(s)

Helen Hwang

Contributor, AI Business

Helen Hwang is an award-winning journalist, author, and mechanical engineer. She writes about technology, health care, travel, and food. She's based in California.
