Many different methods are currently used for pre-processing, statistical analysis and validation of data obtained by electronic nose (eNose) technology from exhaled air, but these methods have never been thoroughly compared. We aimed to empirically evaluate and compare how the different dimension reduction, classification and validation methods found in published studies influence diagnostic performance in several datasets. Our objective was to facilitate the selection of appropriate statistical methods and to support reviewers in this research area. We searched PubMed up to the end of 2014 for all human studies using an electronic nose, and assessed methodological quality using the QUADAS-2 tool tailored to our review. Forty-six studies were evaluated regarding the range of approaches to dimension reduction, classification and validation. Of these forty-six articles, only seven applied external validation in an independent dataset, mostly with a case-control design. We asked their authors to share the original datasets with us; four of the seven datasets were available for re-analysis. The published statistical methods for eNose signal analysis identified in the literature review were applied to the training set of each dataset. Performance, measured as the area under the receiver operating characteristic curve (ROC-AUC), was calculated for the training cohort (in-set) and after internal validation (leave-one-out cross-validation). The methods were also applied to the external validation set to assess the external validity of the performance. Risk of bias was high in most studies due to non-random selection of patients. Internal validation resulted in a decrease in ROC-AUC compared to in-set performance: -0.15, -0.14, -0.1 and -0.11 in datasets 1 through 4, respectively. External validation resulted in a lower ROC-AUC compared to internal validation in datasets 1 (-0.23) and 3 (-0.09), whereas ROC-AUCs did not decrease in datasets 2 (+0.07) and 4 (+0.04). No single combination of dimension reduction and classification methods gave consistent results between internal and external validation sets in this sample of four datasets. This empirical evaluation showed that it is not meaningful to estimate the diagnostic performance on a training set alone, even after internal validation. Therefore, we recommend the inclusion of an external validation set in all future eNose projects in medicine.
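The three performance estimates compared above (in-set, leave-one-out cross-validation, and external validation) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the data are synthetic random numbers, the nearest-centroid scorer is a simple stand-in rather than any of the published methods evaluated in the study, and the helper names (`roc_auc`, `fit`, `score`, `make_cohort`) are hypothetical. The leave-one-out scores are pooled into a single ROC-AUC, which is one common convention and not necessarily the one used by the authors.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit(X, y):
    """Stand-in classifier (assumption): one centroid per class (controls, cases)."""
    controls = [x for x, t in zip(X, y) if t == 0]
    cases = [x for x, t in zip(X, y) if t == 1]
    return centroid(controls), centroid(cases)


def score(model, x):
    """Higher score means x lies closer to the case centroid than to the control centroid."""
    c0, c1 = model
    d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
    d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
    return d0 - d1


def in_set_auc(X, y):
    """Fit and evaluate on the same training cohort (optimistic estimate)."""
    model = fit(X, y)
    return roc_auc(y, [score(model, x) for x in X])


def loocv_auc(X, y):
    """Internal validation: each sample is scored by a model fitted without it."""
    scores = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(score(model, X[i]))
    return roc_auc(y, scores)


def external_auc(X_train, y_train, X_ext, y_ext):
    """External validation: fit on the training cohort, evaluate on an independent cohort."""
    model = fit(X_train, y_train)
    return roc_auc(y_ext, [score(model, x) for x in X_ext])


def make_cohort(rng, n_per_class, n_sensors, shift):
    """Synthetic cohort (assumption): Gaussian sensor values, cases shifted by `shift`."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label, 1.0) for _ in range(n_sensors)])
            y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_cohort(rng, 15, 8, 0.3)
X_ext, y_ext = make_cohort(rng, 15, 8, 0.3)

auc_in = in_set_auc(X_train, y_train)
auc_cv = loocv_auc(X_train, y_train)
auc_ext = external_auc(X_train, y_train, X_ext, y_ext)

# Differences are reported the same way as in the text: internal minus in-set,
# and external minus internal.
print(f"in-set {auc_in:.2f}")
print(f"LOOCV {auc_cv:.2f} ({auc_cv - auc_in:+.2f} vs in-set)")
print(f"external {auc_ext:.2f} ({auc_ext - auc_cv:+.2f} vs LOOCV)")
```

The signed differences printed at the end correspond to the kind of change in ROC-AUC quoted in the text; the specific numbers produced here depend entirely on the synthetic data and carry no evidential weight.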
ASJC Scopus subject areas
- Pulmonary and Respiratory Medicine