Medical Knowledge Corpus Dataset (for fine-tuning Medical LLMs)

May 1, 2024

05/2024 – 12/2024 | Hyper-scale AI Ecosystem Expansion Project, NIA

  • Constructed and released a high-quality bilingual (Korean-English) medical corpus of 200 million tokens, focusing on professional medical knowledge and essential medical domains (obstetrics & gynecology, pediatrics, emergency medicine)
  • Developed two lightweight, domain-specific medical AI models based on large language models using Q&A data authored and reviewed by medical experts
  • The curated dataset is intended for fine-tuning medical AI models and aims to improve natural language processing performance in the medical domain
  • The dataset is applicable to use cases such as clinical decision support, education, and research, and lays the groundwork for strengthening Korea’s competitiveness in medical AI

Collaborating organizations