Development and validation of parsimonious algorithms to classify acute respiratory distress syndrome phenotypes: a secondary analysis of randomised controlled trials

Pratik Sinha, Kevin L. Delucchi, Danny McAuley, Cecilia O'Kane, Michael Matthay, Carolyn S Calfee

Research output: Contribution to journal (Article, peer-review)



Background
Latent class analysis (LCA) of five randomised controlled trial (RCT) cohorts has identified two distinct phenotypes of acute respiratory distress syndrome (ARDS): hypoinflammatory and hyperinflammatory. The phenotypes are associated with differential outcomes and responses to treatment. The objective of this study was to develop parsimonious models for phenotype identification that would be accurate and feasible to use in the clinical setting.

Methods
In this retrospective study, three RCT cohorts from the National Heart, Lung, and Blood Institute ARDS Network (ARMA, ALVEOLI, and FACTT) formed the derivation dataset (n=2022), from which the machine-learning and logistic regression classifier models were derived. A fourth cohort from the same network (SAILS; n=715) was used as the validation test set. LCA-derived phenotypes in all of these cohorts served as the reference standard. Machine-learning algorithms (random forest, bootstrap aggregating, and least absolute shrinkage and selection operator) were used to select up to six important classifier variables, which were then used to develop nested logistic regression models. Only cases with complete biomarker data in the derivation dataset were used for variable selection. The logistic regression models that best balanced parsimony and predictive accuracy were then evaluated in the validation test set. Finally, the models’ prognostic validity was tested in two external ARDS clinical trial datasets (START and HARP-2) by assessing mortality at days 28, 60, and 90 and ventilator-free days to day 28.

Findings
The six most important classifier variables were interleukin (IL)-8, IL-6, protein C, soluble tumour necrosis factor receptor 1, bicarbonate, and vasopressor use. Of the nested models, the three-variable (IL-8, bicarbonate, and protein C) and four-variable (three-variable plus vasopressor use) models were adjudicated to be the best performing. In the validation test set, both models showed good accuracy against the LCA classifications (AUC 0·94 [95% CI 0·92–0·95] for the three-variable model and 0·95 [0·93–0·96] for the four-variable model). As with the LCA-derived phenotypes, the hyperinflammatory phenotype identified by the classifier model was associated with higher mortality at day 90 (87 [39%] of 223 patients vs 112 [23%] of 492 patients; p<0·0001) and fewer ventilator-free days (median 14 days [IQR 0–22] vs 22 days [0–25]; p<0·0001). In the external validation datasets, three-variable models developed in the derivation dataset identified two phenotypes whose clinical features and outcomes were distinct and consistent with previous findings, including differential survival with simvastatin versus placebo in HARP-2 (p=0·023 for survival at 28 days).

Interpretation
ARDS phenotypes can be accurately identified with parsimonious classifier models using three or four variables. Pending the development of real-time testing for key biomarkers and prospective validation, these models could facilitate identification of ARDS phenotypes, enabling their application in clinical trials and practice.
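The abstract describes the final three-variable (IL-8, bicarbonate, and protein C) and four-variable (plus vasopressor use) classifiers as logistic regression models, but it does not report their coefficients, any transformation of the raw biomarker values, or the probability cut-off used. The sketch below only illustrates how a logistic model of this kind turns patient values into a phenotype probability. The function names, feature keys, toy coefficients, and the 0·5 threshold are all hypothetical placeholders, not the published model.

```python
import math
from typing import Mapping, Tuple


def phenotype_probability(
    features: Mapping[str, float],
    coefficients: Mapping[str, float],
    intercept: float,
) -> float:
    """Probability of the hyperinflammatory phenotype under a logistic model.

    Coefficients, intercept and any preprocessing of raw values are
    HYPOTHETICAL inputs; the fitted values are not given in the abstract.
    """
    # Linear predictor: intercept + sum(coefficient * value) over model variables
    eta = intercept + sum(coef * features[name] for name, coef in coefficients.items())
    # Numerically stable inverse-logit, avoiding overflow in math.exp
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def classify(
    features: Mapping[str, float],
    coefficients: Mapping[str, float],
    intercept: float,
    threshold: float = 0.5,  # ASSUMED cut-off, not taken from the paper
) -> Tuple[str, float]:
    """Return (phenotype label, probability) for one patient."""
    p = phenotype_probability(features, coefficients, intercept)
    label = "hyperinflammatory" if p >= threshold else "hypoinflammatory"
    return label, p


# Toy three-variable example: names and numbers are illustrative only
toy_coefficients = {"il8": 1.0, "bicarbonate": -0.5, "protein_c": -1.0}
patient = {"il8": 2.0, "bicarbonate": 1.0, "protein_c": 0.5}
label, prob = classify(patient, toy_coefficients, intercept=0.0)
print(label, round(prob, 3))  # hyperinflammatory 0.731
```

With these toy numbers the linear predictor is 1.0, giving a probability of about 0.73. A four-variable version would add a binary vasopressor-use entry (1 or 0) to both mappings.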
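The reported AUCs (0·94 and 0·95) measure how well each classifier's predicted probabilities rank patients relative to the LCA reference labels. A minimal sketch of the point estimate, using the Mann-Whitney formulation: the probability that a randomly chosen LCA-hyperinflammatory patient receives a higher score than a randomly chosen LCA-hypoinflammatory patient, with ties counted as one half. The abstract does not state how its 95% CIs were obtained, so no interval is computed here, and the example scores are invented.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    labels: 1 = reference-positive (e.g. LCA hyperinflammatory), 0 = negative.
    O(n_pos * n_neg) time, which is adequate for an illustration.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Invented classifier probabilities and LCA labels, for illustration only
example_scores = [0.9, 0.4, 0.6, 0.2]
example_labels = [1, 1, 0, 0]
print(auc(example_scores, example_labels))  # 0.75
```

Three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75; an AUC of 0·94 means about 94% of such pairs are ranked correctly.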
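The day-90 mortality comparison in the validation test set can be checked from the counts reported in the Findings. The abstract does not name the statistical test, so this sketch assumes an uncorrected Pearson chi-square test on the 2×2 table of deaths and survivors by classifier-assigned phenotype. That is one plausible choice, not necessarily the authors'.

```python
import math
from typing import Tuple


def two_by_two_chi_square(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the 2x2 table
        [[a, b],
         [c, d]]
    with 1 degree of freedom. Returns (statistic, p-value).
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Day-90 deaths reported in the abstract for the SAILS validation set
hyper_dead, hyper_total = 87, 223
hypo_dead, hypo_total = 112, 492

print(f"hyperinflammatory mortality: {hyper_dead / hyper_total:.0%}")  # 39%
print(f"hypoinflammatory mortality: {hypo_dead / hypo_total:.0%}")  # 23%

# Rows: phenotype; columns: died, survived
chi2, p = two_by_two_chi_square(
    hyper_dead, hyper_total - hyper_dead,
    hypo_dead, hypo_total - hypo_dead,
)
print(f"chi-square = {chi2:.1f}, p = {p:.1e}")
```

With these counts the statistic is about 20 and p falls well below 0·0001, in line with the reported p<0·0001; a continuity-corrected or exact test would give a somewhat different p-value.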
Original language: English
Number of pages: 11
Journal: The Lancet Respiratory Medicine
Early online date: 13 Jan 2020
Publication status: Early online, 13 Jan 2020


