Repositorio Institucional de la Universidad de Murcia

Browsing by Subject "EUNACOM"

    Publication
    Open Access
    Evaluating the Performance of DeepSeek 3, Claude Sonnet 4, and Gemini 2.5 in the Chilean Medical Licensing Examination: Observational Study.
    (Servicio de Publicaciones. Universidad de Murcia, 2025) Jerez Yañez, Oscar; Edgardo, Vicente Alberto; Silva Arroyo, Jesús; Vera Cartes, Marcos Jeremías Giovanny; Herrera Alcaíno, Alvaro Andrés; Lancellotti Guajardo, Anaís Aracelly; No associated department
    Introduction: Artificial intelligence and its continuous improvement have revolutionized medical education, but its performance in specific evaluative contexts still requires further exploration.
    Methods: This study qualitatively evaluated and compared the performance of three state-of-the-art language models (Claude Sonnet 4, Gemini 2.5, and DeepSeek 3) in simulations of the National Medical Knowledge Examination (EUNACOM) in Chile. Three mock exams of 180 questions each were used, covering various medical areas and question types, including questions based on clinical cases.
    Results: All three models consistently passed the exams, with Claude Sonnet 4 achieving the highest overall performance (89% accuracy) and the greatest consistency across attempts. Clinical case-based questions were answered more accurately than theoretical knowledge questions, highlighting the models' strength in contextual clinical reasoning. Claude excelled in Internal Medicine and Psychiatry, DeepSeek in Surgery, and Gemini demonstrated balanced performance. However, specific gaps were identified in areas such as Public Health and clinical follow-up, suggesting the need for model-specific adjustments.
    Conclusion: The findings support the educational potential of these tools but also emphasize the importance of their ethical, supervised, and complementary use alongside traditional medical training. This study contributes to understanding the emerging role of artificial intelligence in professional assessments, as well as its limitations and opportunities within the Chilean medical context.
