Fetal ultrasound interpretation is fundamentally constrained by multi-view heterogeneity, complex fetal anatomy, and a wide spectrum of developmental abnormalities, which together hinder the development of robust vision–language models for downstream diagnostic and report-generation tasks. Moreover, labeling fetal ultrasound data requires specialized obstetric expertise and careful cross-view correlation, making large-scale annotation both time-consuming and resource-intensive. An important research objective is therefore to leverage existing multi-center clinical data effectively by uncovering latent associations among views, diseases, and textual findings, thereby improving learning efficiency and generalization across gestational stages.
The generalized version of our obstetric ultrasound report template, established with reference to multiple international clinical guidelines. It provides a consistent and clinically grounded format for training and evaluating deep learning systems.
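For illustration, below is a minimal sketch of how such a standardized report template could be represented as a structured object for training and evaluation pipelines. The section and field names (e.g., BiometrySection, FetalUltrasoundReport) are assumptions made for demonstration and do not reproduce the exact schema released with FetalMind.

```python
# Hypothetical schema for a structured fetal ultrasound report.
# Field names are illustrative only, not the paper's exact template.
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class BiometrySection:
    biparietal_diameter_mm: Optional[float] = None
    head_circumference_mm: Optional[float] = None
    abdominal_circumference_mm: Optional[float] = None
    femur_length_mm: Optional[float] = None

@dataclass
class AnatomySection:
    # Free-text findings per anatomical view, keyed by view name.
    findings: Dict[str, str] = field(default_factory=dict)

@dataclass
class FetalUltrasoundReport:
    gestational_age_weeks: Optional[float] = None
    biometry: BiometrySection = field(default_factory=BiometrySection)
    anatomy: AnatomySection = field(default_factory=AnatomySection)
    impression: str = ""  # Overall diagnostic impression.

# Example: a report skeleton that a generation model could fill in.
report = FetalUltrasoundReport(
    gestational_age_weeks=22.0,
    impression="No structural abnormality detected.",
)
report.anatomy.findings["four-chamber cardiac view"] = "Normal appearance."
print(report)
```

A fixed schema like this makes it straightforward to compare model outputs field by field across centers and gestational stages, rather than matching free-form text.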
Illustration of a FetalMind vs. GPT-5 case study (Case 127858). The correct answer is skeletal dysplasia. GPT-5 misclassified the case as normal, whereas FetalMind correctly identified skeletal dysplasia by integrating multi-view structures and blood-flow features.
@misc{he2025fetalmind,
title={Epistemic-aware Vision-Language Foundation Model for Fetal Ultrasound Interpretation},
author={Xiao He and Huangxuan Zhao and Guojia Wan and Wei Zhou and Yanxing Liu and Juhua Liu and Yongchao Xu and Yong Luo and Dacheng Tao and Bo Du},
year={2025},
eprint={2510.12953},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2510.12953},
}