Computer-aided detection of tuberculosis from chest radiographs in a tuberculosis prevalence survey in South Africa: external validation and modelled impacts of commercially available artificial intelligence software

2024

Journal/Publication: The Lancet Digital Health

Read the full version: https://doi.org/10.1016/S2589-7500(24)00118-3

Abstract 

Background: Computer-aided detection (CAD) can help identify people with active tuberculosis who would otherwise remain undetected. However, few studies have compared the performance of commercially available CAD products for screening in settings with high burdens of both tuberculosis and HIV, and threshold selection across products in different populations is poorly understood. We aimed to compare the performance of CAD products, with further analyses of subgroup performance and threshold selection.
Methods: We evaluated 12 CAD products on a case–control sample of participants from a South African tuberculosis prevalence survey. Only those with microbiological test results were eligible. The primary outcome was the accuracy of each product, measured by the area under the receiver operating characteristic curve (AUC) against microbiological evidence. Threshold analyses were performed based on pre-defined criteria and across all thresholds. We conducted subgroup analyses by age, gender, HIV status, previous tuberculosis history, presence of symptoms, and current smoking status.
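As an illustration of the primary outcome measure, the sketch below computes an AUC as the probability that a randomly chosen bacteriologically positive participant receives a higher CAD score than a randomly chosen negative one (ties counted as half). The function name `auc` and the scores are hypothetical and for illustration only; they are not the study's code or data.

```python
def auc(scores_pos, scores_neg):
    """Rank-based (Mann-Whitney) AUC: fraction of positive/negative pairs
    in which the positive case has the higher score; ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical CAD abnormality scores (not taken from the study).
pos = [0.9, 0.8, 0.4]        # bacteriologically positive
neg = [0.1, 0.3, 0.5, 0.2]   # bacteriologically negative
example_auc = auc(pos, neg)  # 11 of 12 pairs are correctly ordered
```

The pairwise loop is quadratic in sample size, which is acceptable for a sample of a few hundred radiographs; a sort-based implementation would be preferable for larger datasets.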
Findings: Of the 774 people included, 516 were bacteriologically negative and 258 were bacteriologically positive. Accuracy varied across products: Lunit and Nexus had AUCs near 0·9, followed by qXR, JF CXR-2, InferRead, Xvision, and ChestEye (AUCs 0·8–0·9). XrayAME, RADIFY, and TiSepX-TB had AUCs below 0·8. Thresholds varied notably across these products and across different versions of the same product. Certain products (Lunit, Nexus, JF CXR-2, and qXR) maintained high sensitivity (>90%) across a wide threshold range while reducing the number of individuals requiring confirmatory diagnostic testing. All products generally performed worst in older individuals, people with previous tuberculosis, and people with HIV. Thresholds, sensitivity, and specificity varied across groups and settings.
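The threshold findings above can be made concrete with a minimal sketch, assuming a CAD result counts as positive when its score is at or above the threshold. It finds the highest threshold that still meets a target sensitivity and reports the specificity obtained there. The helper names, the 90% target as coded here, and the scores are illustrative assumptions, not the study's pre-defined criteria or data.

```python
def sens_spec(scores_pos, scores_neg, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    true_pos = sum(s >= threshold for s in scores_pos)
    true_neg = sum(s < threshold for s in scores_neg)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)


def highest_threshold_meeting(scores_pos, scores_neg, min_sensitivity):
    """Scan observed scores from high to low and return the first
    (threshold, sensitivity, specificity) meeting the sensitivity target."""
    for t in sorted(set(scores_pos) | set(scores_neg), reverse=True):
        sens, spec = sens_spec(scores_pos, scores_neg, t)
        if sens >= min_sensitivity:
            return t, sens, spec
    return None  # only reached if there are no scores to scan


# Hypothetical scores for 10 positive and 10 negative participants.
pos = [0.95, 0.9, 0.85, 0.8, 0.7, 0.65, 0.6, 0.55, 0.5, 0.2]
neg = [0.1, 0.15, 0.3, 0.45, 0.6, 0.05, 0.25, 0.35, 0.4, 0.75]
threshold, sensitivity, specificity = highest_threshold_meeting(pos, neg, 0.9)
```

A higher threshold sends fewer people to confirmatory testing, so choosing the highest threshold that still meets the sensitivity target is one simple way to trade off the two; because score distributions differ between products and subgroups, the same numeric threshold need not give the same sensitivity elsewhere.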
Interpretation: Several previously unevaluated products performed similarly to those evaluated by WHO. Thresholds differed across products and demographic subgroups. The rapid emergence of products and versions necessitates a global strategy to validate new versions and software to support CAD product and threshold selections.