Using Natural Language Processing for Identification of Negative Symptoms in Schizophrenia from Electronic Health Records Data in the United States: A Retrospective Cohort Study
Background: Negative symptoms (NS) in schizophrenia (blunted affect, alogia, avolition, asociality, and anhedonia) significantly contribute to long-term disability and diminished functional capacity in affected individuals.
Objectives: To use natural language processing (NLP) to identify NS in unstructured electronic health records (EHR) of patients with schizophrenia and to describe the characteristics of these patients.
Methods: This retrospective cohort study used 2016–2023 patient records from the Veradigm Network EHR and included adults (≥18 years) with ≥2 schizophrenia diagnoses and 12 months of EHR activity prior to the first schizophrenia diagnosis (index date). A list of NS key terms was created in collaboration with medical informatics specialists and clinical experts. For the NLP component, Python scripts extracted NS-related keywords, which were then screened for false positives using regular expressions and rule-based techniques.
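The keyword extraction and false-positive screening described in the Methods can be illustrated with a minimal sketch. The term list, the negation cues, and the function name `find_negative_symptoms` below are hypothetical and chosen only for illustration; the abstract does not give the study's actual key terms or rules.

```python
import re

# Hypothetical key terms per NS domain (not the study's actual list).
NS_TERMS = {
    "blunted affect": ["blunted affect", "flat affect"],
    "alogia": ["alogia", "poverty of speech"],
    "avolition": ["avolition", "lack of motivation"],
    "asociality": ["asociality", "social withdrawal"],
    "anhedonia": ["anhedonia"],
}

# Assumed negation cues used to screen out false positives.
NEGATION_CUE = re.compile(
    r"\b(no|not|denies|denied|without|negative for)\b", re.IGNORECASE
)


def find_negative_symptoms(note):
    """Return the set of NS domains mentioned, and not negated, in a note."""
    found = set()
    for domain, terms in NS_TERMS.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, note, re.IGNORECASE):
                # Rule: inspect the text between the start of the current
                # sentence or clause and the keyword for a negation cue.
                prefix = note[: match.start()]
                clause_start = max(prefix.rfind("."), prefix.rfind(";")) + 1
                if NEGATION_CUE.search(prefix[clause_start:]):
                    continue  # treat as a false positive
                found.add(domain)
    return found


note = "Patient presents with flat affect and poverty of speech. Denies anhedonia."
print(sorted(find_negative_symptoms(note)))
```

In this sketch a keyword counts only when no negation cue precedes it in the same sentence or clause, so "Denies anhedonia" is excluded while "flat affect" is retained. A real pipeline would likely need further rules, for example for family history or templated text.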
Results: This study included 79,326 patients with schizophrenia, 18.9% of whom had documented evidence of NS. Among patients with 50+ notes, 41.5% had documented NS, versus 2.3% of those with 1–5 notes. Avolition was identified most often (44.0%), followed by blunted affect (41.9%), alogia (25.0%), anhedonia (22.0%), and asociality (5.4%). Patients with documented NS were slightly younger than those without (49.3 vs 51.0 years, p < .01) and had higher rates of anxiety (18.7% vs 15.7%, p < .01), depression (19.0% vs 17.3%, p < .01), and substance use disorder (17.7% vs 15.4%, p < .01).
Conclusions: This study successfully employed NLP to identify NS from EHR data. The findings highlight the importance of advanced data analytics in psychiatric research and the need for future healthcare strategies to improve NS management for patients with schizophrenia.
Funding: Boehringer Ingelheim Pharmaceuticals, Inc.


