Wed Dec 28 2022

Large Language Models Encode Clinical Knowledge

Medicine
Natural Language Processing, Machine Learning
Medical question answering, clinical knowledge and reasoning, AI in medicine

The paper presents MultiMedQA, a benchmark combining six existing open question answering datasets spanning professional medical exams, research, and consumer queries, together with HealthSearchQA, a new dataset of commonly searched health questions, to evaluate the clinical knowledge of large language models (LLMs). It proposes a framework for human evaluation of model answers along multiple axes, and evaluates PaLM and its instruction-tuned variant, Flan-PaLM, on MultiMedQA. Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset, but human evaluation reveals key gaps in its long-form answers. To address these, the paper introduces instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars, and shows that comprehension, recall of knowledge, and medical reasoning all improve with model scale and instruction prompt tuning.
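The core idea behind prompt tuning can be illustrated with a minimal sketch: the pretrained model is kept frozen, and only a small matrix of "soft prompt" vectors, prepended to every input, is trained. All names, shapes, and values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 8      # embedding width (toy size)
prompt_len = 4   # number of learnable soft-prompt vectors
vocab = 100      # toy vocabulary size

# Frozen pretrained embedding table (stands in for the whole frozen LLM).
embed_table = rng.normal(size=(vocab, d_model))

# The only trainable parameters: the soft prompt.
soft_prompt = rng.normal(scale=0.02, size=(prompt_len, d_model))

def build_input(token_ids):
    """Prepend the learned soft prompt to the embedded input tokens."""
    token_embs = embed_table[token_ids]               # (seq, d_model)
    return np.concatenate([soft_prompt, token_embs])  # (prompt_len + seq, d_model)

x = build_input(np.array([5, 17, 42]))
print(x.shape)  # 4 prompt vectors + 3 token embeddings -> (7, 8)
```

In actual prompt tuning, gradients flow only into `soft_prompt`, so the number of trained parameters is tiny (here 4 × 8 = 32) relative to the frozen model; the paper's variant learns such prompts from a handful of clinician-curated exemplars.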

LLMs have potential utility in medicine, but current models have limitations in clinical applications. The framework and approaches presented in this paper can help evaluate and improve LLMs' clinical knowledge and reasoning, and facilitate the responsible integration of AI into medical processes and workflows.
