Dear Editor,
I read with great interest the recent article titled “Evaluating the Performance of ChatGPT, Gemini, and Bing Compared with Resident Surgeons in the Otorhinolaryngology In-service Training Examination” published in your journal (1). The study offers valuable insights into the evolving role of large language models (LLMs) in healthcare.
The study’s comparative evaluation of artificial intelligence (AI)-driven language models against resident surgeons is both timely and significant. It highlights that while LLMs, such as ChatGPT and Gemini, exhibit impressive capabilities in answering factual and guideline-based questions, they remain far from replacing human expertise (2), especially in highly specialized fields like otorhinolaryngology. The complexity of medical decision-making requires not only the recall of information but also the ability to apply it in context, an area where general-purpose LLMs like ChatGPT remain limited because their output depends heavily on the input they receive (3).
While these tools excel at providing broad, evidence-based responses, they often struggle with the subtleties of case-specific clinical reasoning (4). A summary of potential differences between LLMs and humans across various aspects of healthcare is shown in Table 1. LLMs should therefore be used as adjunct tools rather than replacements in healthcare education and clinical practice (5). By supporting residents in understanding core concepts, reviewing evidence-based guidelines, or simulating basic diagnostic scenarios, LLMs can serve as a valuable supplementary resource in training environments. However, the integration of AI into medical education and diagnostics must be approached with caution. AI is still some way from being able to reliably make critical healthcare decisions independently. Hence, the use of AI tools should be geared toward enhancing human decision-making rather than substituting for it.