Title: Enhancing Thalassemia Gene Carrier Identification in Non-anemic Populations Using AI Erythrocyte Morphology Analysis and Machine Learning
Abstract:
Background:
Non-anemic thalassemia trait (TT) accounts for over 60% of TT cases in South China, yet it remains difficult to identify: carriers show no anemia, and gold-standard genetic testing is costly, especially in resource-limited regions. Traditional erythrocyte morphology analysis is subjective and time-consuming, which limits its use for screening.
Methods:
We conducted the first study to use AI-based quantitative analysis of abnormal erythrocytes to identify non-anemic TT carriers. Digital morphological data from 76 non-anemic TT carriers (69.7% α-TT) and 97 healthy controls were collected using the AI-powered Mindray MC-100i analyzer. Machine learning (ML) models were trained and validated on these data, and then externally validated in 54 non-anemic TT carriers and 97 controls.
Results:
A Random Forest-based ML model (TT@Normal) was developed, with target cells, microcytes, and teardrop cells as the top three predictive features. TT@Normal performed strongly: AUC, sensitivity, and specificity all exceeded 94% in the training and validation sets, and external validation yielded an AUC of 97.65%, a sensitivity of 92.59%, and a specificity of 93.81%. It outperformed four conventional indices (MI, EFI, GKI, RDW) in discriminative power. TT@Normal is freely accessible as an online tool (URL provided), enabling rapid screening even from rough estimates of abnormal erythrocyte percentages when automatic analyzers are unavailable.
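As a reading aid for the external validation figures, the short sketch below back-calculates which whole-number counts are consistent with the reported sensitivity and specificity, given the stated cohort sizes (54 carriers, 97 controls). The function name `consistent_counts` is hypothetical, and the resulting counts are an inference from the rounded percentages, not values reported in the study.

```python
def consistent_counts(reported_pct: float, total: int) -> list[int]:
    """Return every count k (0..total) whose percentage k/total
    rounds to the reported two-decimal percentage."""
    return [
        k for k in range(total + 1)
        if abs(100.0 * k / total - reported_pct) < 0.005
    ]

# Cohort sizes and percentages as stated for the external validation set.
N_CARRIERS = 54
N_CONTROLS = 97
REPORTED_SENSITIVITY = 92.59  # percent of carriers correctly flagged
REPORTED_SPECIFICITY = 93.81  # percent of controls correctly cleared

# ASSUMPTION: sensitivity = TP / carriers and specificity = TN / controls,
# i.e. the standard definitions; the abstract does not give raw counts.
tp_candidates = consistent_counts(REPORTED_SENSITIVITY, N_CARRIERS)
tn_candidates = consistent_counts(REPORTED_SPECIFICITY, N_CONTROLS)

print("Carriers correctly identified:", tp_candidates, "of", N_CARRIERS)
print("Controls correctly identified:", tn_candidates, "of", N_CONTROLS)
```

Under the standard definitions, only one count fits each percentage, which suggests roughly 4 missed carriers and 6 false positives in the external cohort.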
Conclusion:
TT@Normal is the first AI/ML-driven tool for identifying non-anemic TT carriers, addressing an unmet clinical need. Its high accuracy, reliance on routine erythrocyte morphology, and user-friendly online access make it a practical screening solution, particularly for underdeveloped regions. Elevated percentages of target cells, microcytes, and teardrop cells should raise suspicion of TT. This work advances thalassemia prevention by enabling efficient, low-cost carrier screening.
Keywords:
Thalassemia; machine learning; artificial intelligence; erythrocyte; morphology

