EHR-Based Machine Learning Predicts Alzheimer Risk Years Before Diagnosis in US Veterans
Key Takeaways:
- Analysis of longitudinal Veterans Health Administration (VHA) electronic health records (EHRs) shows Alzheimer disease (AD)–related symptoms appear in clinical notes up to 15 years before diagnosis.
- Keyword-based machine learning models using unstructured clinical notes outperformed models relying only on structured EHR data, predicting AD risk up to 10 years in advance.
- Early signals included neuropsychiatric symptoms, physiological changes, and functional decline, with patterns consistent across age, sex, and race/ethnicity subgroups.
A large US study using VHA EHRs suggests Alzheimer disease risk can be predicted years before diagnosis by analyzing routine clinical notes. Using machine learning and keyword-based analysis, investigators found that early cognitive and noncognitive symptoms documented during standard care visits preceded formal Alzheimer diagnoses by more than a decade.
Study Findings
Investigators analyzed longitudinal EHR data from 61 537 veterans diagnosed with Alzheimer disease and 234 105 matched controls without dementia between 2000 and 2022. Participants ranged from 45 to 103 years of age, and 98.4% were male. The study focused on unstructured clinical notes from primary care, emergency, mental health, geriatrics, neurology, and other settings.
Researchers curated 122 expert-defined keywords reflecting subjective cognitive decline and Alzheimer-related signs, including memory, language, mood, neuropsychiatric symptoms, and functional changes. Over the 15 years preceding diagnosis, keyword mentions among Alzheimer cases rose exponentially, from 9.4 to 57.7 per patient per year, while mentions among controls showed a slower, linear increase from 8.2 to 20.3 per patient per year.
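To make the "mentions per patient per year" metric concrete, the following is a minimal Python sketch of how such a rate could be computed from note text. It is not the investigators' pipeline: the four keywords, the `mentions_per_patient_year` function, the `(patient_id, note_year, text)` input layout, and the choice to divide by all patients in the cohort are illustrative assumptions.

```python
import re
from collections import defaultdict

# Hypothetical keywords for illustration only; the study's 122
# expert-curated keywords are not reproduced here.
KEYWORDS = ["memory loss", "forgetful", "confusion", "word-finding"]

# One case-insensitive pattern that matches any keyword as a whole word or phrase.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def mentions_per_patient_year(notes, index_year):
    """Average keyword mentions per patient, grouped by years before the index date.

    notes: iterable of (patient_id, note_year, text) tuples.
    index_year: dict mapping patient_id to the diagnosis (or matched index) year.
    Assumption: patients with no notes in a given year count as zero mentions.
    """
    counts = defaultdict(lambda: defaultdict(int))  # years_before -> patient -> mentions
    patients = set()
    for patient_id, note_year, text in notes:
        patients.add(patient_id)
        years_before = index_year[patient_id] - note_year
        counts[years_before][patient_id] += len(PATTERN.findall(text))
    return {
        years_before: sum(per_patient.values()) / len(patients)
        for years_before, per_patient in sorted(counts.items())
    }
```

Computing this rate separately for cases and controls, then plotting it against years before the index date, would produce trajectories of the kind described above.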
Using random forest models, keyword-based predictors achieved an area under the receiver operating characteristic curve (AUROC) of 0.577 at 10 years before diagnosis and 0.861 at 1 day before diagnosis. In comparison, models using structured EHR data alone achieved AUROCs of 0.497 and 0.682 at the same time points; an AUROC of 0.5 corresponds to chance-level discrimination, so the structured-data model was uninformative at the 10-year horizon. Combining structured and keyword-derived features provided modest additional gains.
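For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with 0.5 meaning no better than chance. The sketch below computes it directly from that definition. The `auroc` function and the toy labels and scores are illustrative assumptions and are unrelated to the study's data or code.

```python
def auroc(labels, scores):
    """AUROC from its pairwise definition.

    labels: 1 for cases, 0 for controls.
    scores: model risk scores, in the same order as labels.
    Returns the fraction of case-control pairs in which the case
    has the higher score, counting ties as half.
    """
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy example: 2 cases, 2 controls; 3 of the 4 case-control pairs are ranked correctly.
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic in cohort size and is meant only to show the definition; for real cohorts, a rank-based routine such as `roc_auc_score` in scikit-learn would be the practical choice.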
Predictive performance remained stable across hold-out VHA medical centers and demographic subgroups. Mental health and geriatric care notes showed the steepest increases in symptom documentation closer to diagnosis, while primary care notes captured early signals beginning more than a decade earlier.
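Evaluating on hold-out medical centers means that no site contributes patients to both the training and the test data, so performance reflects how the model transfers to unseen facilities. The article does not describe how the investigators built their split; the sketch below shows one generic way to do it, with the `split_by_site` function, the `site` field, and the 20% hold-out fraction all being assumptions.

```python
import random

def split_by_site(records, holdout_fraction=0.2, seed=0):
    """Split records so that every site falls entirely in train or in test.

    records: list of dicts, each with a 'site' key (hypothetical field name).
    Returns (train, test) lists of records.
    """
    sites = sorted({r["site"] for r in records})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(sites)

    # Hold out at least one site.
    n_holdout = max(1, round(len(sites) * holdout_fraction))
    holdout_sites = set(sites[:n_holdout])

    train = [r for r in records if r["site"] not in holdout_sites]
    test = [r for r in records if r["site"] in holdout_sites]
    return train, test
```

Splitting by site rather than by patient guards against a model learning documentation habits that are specific to one facility.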
Clinical Implications
For clinicians, the findings highlight the potential value of routine clinical documentation as an early warning system for Alzheimer disease. Subjective cognitive complaints, mood changes, functional decline, and neuropsychiatric symptoms—often noted during general medical visits—may signal increased risk long before formal cognitive testing or diagnosis.
The approach could support scalable, population-level risk stratification without reliance on invasive or costly biomarkers such as neuroimaging or cerebrospinal fluid testing. In primary care settings, where early symptoms are often first observed, automated screening tools embedded in EHR systems could prompt earlier referral, monitoring, and care planning for high-risk patients.
The study also underscores that nonmemory symptoms may emerge earlier than classic cognitive complaints, reinforcing the importance of holistic symptom assessment in aging patients.
According to the study authors, “signs and symptoms of early Alzheimer’s disease are reported in clinical notes many years before a clinical diagnosis is made,” and their frequency “increases the closer one is to the diagnosis.” The researchers note that a “simple keyword-based approach can capture these signals and help identify individuals at high risk of future Alzheimer’s disease,” supporting earlier detection in real-world health care settings.
Conclusion
Analysis of unstructured EHR notes from US veterans shows that Alzheimer-related symptoms are documented years before diagnosis and can be leveraged by machine learning to predict disease risk. The findings suggest routine clinical notes may offer a practical, scalable tool for earlier Alzheimer detection and intervention planning.
Reference
Li R, Berlowitz D, Mez J, et al. Early prediction of Alzheimer’s disease using longitudinal electronic health records of US military veterans. Commun Med. 2026;6(23). doi:10.1038/s43856-025-01206-w


