ChatGPT Basically Sucks at Diagnosing Patients

ChatGPT may be good at advising you on your workouts, but it has a long way to go before it replaces a doctor. A recent experiment found that the popular artificial intelligence chatbot makes the wrong medical call more often than not.

“ChatGPT in its current form is not accurate as a diagnostic tool,” the researchers behind the study, published today in the journal PLOS ONE, wrote. “ChatGPT does not necessarily give factual correctness, despite the vast amount of information it was trained on.”

In February 2023, ChatGPT was barely able to pass the United States Medical Licensing Exam without any extra specialized input from human trainers. Despite the program not coming close to acing the test, the researchers behind that experiment hailed the result as a "notable milestone" for AI.

However, the scientists behind the new study noted that, although passing the licensing exam demonstrated ChatGPT’s ability to answer concise medical questions, “the quality of its responses to complex medical cases remains unclear.”

To determine how well ChatGPT 3.5 performs in those more complicated cases, the researchers presented the program with 150 cases designed to challenge healthcare professionals' diagnostic abilities. The information provided to ChatGPT included patient history, physical exam findings, and some lab or imaging results. ChatGPT was then asked to make a diagnosis or devise an appropriate treatment plan. The researchers rated the bot's answers on whether it gave the correct response. They also graded ChatGPT on how well it showed its work, scoring the clarity of the rationale behind a diagnosis or prescribed treatment and the relevance of the medical information it cited.

Although ChatGPT has been trained on hundreds of terabytes of data from across the Internet, it got the answer right only 49% of the time. It scored a bit better on the quality of its explanations, offering complete and relevant ones 52% of the time. The researchers observed that, while the AI was fairly good at eliminating wrong answers, that's not the same as making the right call in a clinical setting. "Precision and sensitivity are crucial for a diagnostic tool because missed diagnoses can lead to significant consequences for patients, such as the lack of necessary treatments or further diagnostic testing, resulting in worse health outcomes," they wrote.

Overall, the chatbot was described as having “moderate discriminative ability between correct and incorrect diagnoses” and having a “mediocre” overall performance on the test. While ChatGPT shouldn’t be counted on to accurately diagnose patients, the researchers said it may still have relevant uses for aspiring physicians thanks to its access to huge amounts of medical data. 

“In conjunction with traditional teaching methods, ChatGPT can help students bridge gaps in knowledge and simplify complex concepts by delivering instantaneous and personalized answers to clinical questions,” they wrote.

All that said, the AI might surpass human doctors in one area: a study from April 2023 found that ChatGPT wrote more empathetic replies to patients than real physicians did.


