HCM-AF-Risk Model to Identify Cases and Predictors of Atrial Fibrillation in Hypertrophic Cardiomyopathy
  • Moumita Bhattacharya,
  • Dai-Yin Lu,
  • Gabriela Greenland,
  • Hulya Yalcin,
  • Joseph Marine,
  • Jeffrey Olgin,
  • Stefan Zimmerman,
  • Theodore Abraham,
  • Hagit Shatkay,
  • M. Roselle Abraham
Moumita Bhattacharya
University of Delaware
Dai-Yin Lu
Johns Hopkins University
Gabriela Greenland
University of California San Francisco
Hulya Yalcin
Johns Hopkins University
Joseph Marine
Johns Hopkins Hospital
Jeffrey Olgin
UCSF
Stefan Zimmerman
Johns Hopkins University
Theodore Abraham
University of California San Francisco
Hagit Shatkay
University of Delaware
M. Roselle Abraham
University of California San Francisco

Abstract

Background: Atrial fibrillation (AF) in hypertrophic cardiomyopathy (HCM) is associated with high stroke risk despite low CHA2DS2-VASc scores. Hence, there is a need to understand AF pathophysiology and to predict AF in HCM. We develop and apply a data-driven, machine learning-based method to identify AF cases and clinical features associated with AF in HCM, using electronic health record (EHR) data.

Methods: Patients with documented paroxysmal, persistent, or permanent AF (n=191) were considered AF cases, and the remaining patients in sinus rhythm (n=640) were tagged as No-AF. We evaluated 93 clinical variables; the variables most informative for distinguishing AF from No-AF cases were selected based on the 2-sample t-test and the information gain criterion.

Results: We identified 18 highly informative clinical variables: 11 are positively associated (e.g. LA-diameter, LV-diastolic dysfunction, LV-LGE) and 7 are negatively associated (e.g. several exercise parameters) with AF in HCM. Next, patient records were represented via these 18 variables. Data imbalance resulting from the relatively low number of AF cases was addressed via a combination of over- and under-sampling strategies. We trained and tested multiple classifiers under this sampling approach, showing effective classification. Specifically, an ensemble of logistic regression and naïve Bayes classifiers, trained on the 18 variables and corrected for data imbalance, proved most effective for separating AF from No-AF cases (sensitivity=0.74, specificity=0.72, C-index=0.80).

Conclusions: Our model (HCM-AF-Risk Model), the first machine learning-based method for identification of AF cases in HCM, demonstrates good performance and suggests that AF is associated with a more severe cardiac HCM phenotype.
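
To make the imbalance correction and the reported metrics concrete, the following is a minimal Python sketch, not the authors' implementation. The abstract does not specify the sampling scheme or how the two classifiers are combined, so several pieces here are assumptions: random over-sampling with replacement, random under-sampling without replacement, a hypothetical common class size of 400, and soft voting (averaging the two classifiers' predicted AF probabilities). The function names are illustrative only.

```python
import random


def rebalance(af, no_af, target, seed=0):
    """Bring both classes to `target` records (assumed scheme):
    over-sample AF with replacement, under-sample No-AF without replacement."""
    if not len(af) <= target <= len(no_af):
        raise ValueError("target must lie between the two class sizes")
    rng = random.Random(seed)
    # Keep every original AF record, then add random duplicates.
    extra = [rng.choice(af) for _ in range(target - len(af))]
    af_resampled = list(af) + extra
    # Draw a random subset of the No-AF records, no duplicates.
    no_af_resampled = rng.sample(list(no_af), target)
    return af_resampled, no_af_resampled


def soft_vote(prob_a, prob_b, threshold=0.5):
    """Average two classifiers' AF probabilities and threshold the mean
    (soft voting is an assumption about how the ensemble is formed)."""
    return [int((a + b) / 2 >= threshold) for a, b in zip(prob_a, prob_b)]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Assumes both classes occur in y_true."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)


# Class sizes from the abstract: 191 AF cases, 640 No-AF patients.
# Integer IDs stand in for patient records.
af_ids = list(range(191))
no_af_ids = list(range(191, 831))
af_bal, no_af_bal = rebalance(af_ids, no_af_ids, target=400)
```

In practice, resampling of this kind would be applied to the training portion only, so that duplicated AF records do not leak into the data used to estimate sensitivity and specificity.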