This study was designed to validate use of electronic health records for diagnosing bipolar disorder and classifying control subjects. Electronic health record data were obtained from a health care system of more than 4.6 million patients spanning more than 20 years. Experienced clinicians reviewed charts to identify text features and coded data consistent or inconsistent with a diagnosis of bipolar disorder. Natural language processing was used to train a diagnostic algorithm with 95% specificity for classifying bipolar disorder.
The positive predictive value of bipolar disorder defined by natural language processing was 0.85. Coded classification based on strict filtering achieved a positive predictive value of 0.79, but classifications based on less stringent criteria performed less well. No electronic health record-classified control subject received a diagnosis of bipolar disorder on the basis of direct interview. For most subphenotypes, positive predictive values exceeded 0.80.
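As a reminder of how the two reported metrics are defined, the following is a minimal sketch that computes positive predictive value and specificity from confusion-matrix counts. The function names and the counts are hypothetical, chosen only so the arithmetic reproduces values of the same magnitude as those reported; they are not the study's data, and this is not the authors' code.

```python
def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Share of algorithm-classified cases confirmed by the reference diagnosis."""
    flagged = true_pos + false_pos
    if flagged == 0:
        raise ValueError("no subjects were classified as cases")
    return true_pos / flagged


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of true non-cases that the algorithm correctly left unflagged."""
    non_cases = true_neg + false_pos
    if non_cases == 0:
        raise ValueError("no true non-cases in the sample")
    return true_neg / non_cases


# Hypothetical counts for illustration only (NOT taken from the study).
true_pos = 17   # flagged by the algorithm, confirmed at interview
false_pos = 3   # flagged by the algorithm, not confirmed at interview
true_neg = 57   # not flagged, and not a case at interview

ppv = positive_predictive_value(true_pos, false_pos)
spec = specificity(true_neg, false_pos)

print(f"PPV = {ppv:.2f}")
print(f"Specificity = {spec:.2f}")
```

Positive predictive value depends on how common the condition is in the sample being classified, whereas specificity does not, so the two figures answer different questions about the same classifier.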
The authors concluded that semiautomated mining of electronic health records can be used to ascertain bipolar disorder patients and control subjects with high specificity and predictive value compared with diagnostic interviews.
Castro VM, Minnier J, Murphy SN, Kohane I, Churchill SE, Gainer V, Cai T, Hoffnagle AG, Dai Y, Block S, Weill R, Nadal-Vicens M, Pollastri AR, Rosenquist JN, Goryachev S, Ongur D, Sklar P, Perlis RH, Smoller JW; International Cohort Collection for Bipolar Disorder Consortium: Validation of electronic health record phenotyping of bipolar disorder cases and controls. Am. J. Psychiatry 172(4): 363-372 (2015).