As artificial intelligence (AI) solutions continue to enter the healthcare space, decision-makers and clinicians face a critical challenge: how to assess, implement, and trust these emerging technologies in real-world screening programs. In a recent Delft Imaging webinar, Edwin Klinkenberg, Clinical Application Specialist at Delft Imaging, addressed this issue head-on, drawing on both his clinical expertise and Delft Imaging’s journey with CAD4TB.

“The integration of AI technologies in healthcare has been rapidly evolving over the last couple of years,” he began. “I think it’s really crucial for all healthcare professionals to understand how to evaluate the new AI that’s coming to market.”

Three questions before introducing a new AI

Klinkenberg outlined three fundamental questions that healthcare teams must ask when considering any AI tool:

  1. Is the modality appropriate for the condition?
  2. Where does the AI fit in the patient journey?
  3. What happens after a positive AI result?

“For example, if you had an AI that would actually try to make a diagnosis of asthma, you would have to think: is X-ray actually the tool that you use to make the diagnosis of asthma? So you have to take this into consideration,” he said.

He noted that the position of the AI tool in the care pathway determines how it shapes decision-making. “For CAD4TB, it becomes quite clear that we put the AI before the confirmatory tests, in which we send people with a high CAD4TB score for GeneXpert testing, for example, and let the others go back home.”

The third question, how to manage positive cases, is just as crucial. “It includes determining the next steps, whether there are additional diagnostic tests, referrals to specialists, and treatment options. So there’s all kinds of logistics around it that you might have to organize if you have a positive case.”
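To make the pathway placement concrete, here is a minimal sketch of what an AI-first triage step can look like in code. The threshold value, function name, and return labels are illustrative assumptions for this example, not CAD4TB’s actual interface.

```python
# Hypothetical sketch of an AI-first triage step: score the X-ray, then
# route the patient. The threshold and all names are illustrative; real
# deployments calibrate the cut-off per site and population.

TRIAGE_THRESHOLD = 60  # assumed cut-off on a 0-100 triage score

def route_patient(cad_score: float) -> str:
    """Decide the next step in the pathway from an AI triage score."""
    if cad_score >= TRIAGE_THRESHOLD:
        # Positive triage result: send for confirmatory bacteriological
        # testing (e.g. GeneXpert) and organize the follow-up logistics.
        return "refer_for_confirmatory_test"
    # Below the threshold: no confirmatory testing at this visit.
    return "send_home"

print(route_patient(82.5))  # -> refer_for_confirmatory_test
print(route_patient(14.0))  # -> send_home
```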

Simplicity and clarity in screening tools

Klinkenberg emphasized that AI in screening settings should be simple, high-performing, and actionable. “We want to keep it simple. We actually want increased performance from our AI. And we also know that radiological features are not really diseases.”

He explained how CAD4TB now combines three distinct scores to support this goal:

  • The TB Score, trained on a bacteriological reference.
  • The Abnormality Score, trained on a radiological reference and detecting features such as effusion, atelectasis, pneumothorax, and nodules.
  • The Cardiothoracic Ratio, used to detect cardiomegaly.

“We really combine all these three scores because we know that in a screening setting, overloading patients or users with information also affects and complicates the whole patient journey.”
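As an illustration of how three such outputs might be folded into one actionable result without overloading the user, here is a hedged sketch. The field names, score ranges, and thresholds are assumptions made for the example; the cardiothoracic ratio cut-off of 0.5 reflects a common radiological convention for cardiomegaly, and none of this represents CAD4TB’s published decision logic.

```python
from dataclasses import dataclass

# Illustrative combination of three per-image outputs into one actionable
# summary. Names, ranges, and thresholds are assumptions for this example.

@dataclass
class XrayResult:
    tb_score: float           # trained on a bacteriological reference
    abnormality_score: float  # trained on a radiological reference
    ctr: float                # cardiothoracic ratio (cardiac / thoracic width)

def summarize(result: XrayResult) -> list[str]:
    """Fold three per-image outputs into a short, actionable summary."""
    actions = []
    if result.tb_score >= 60:            # assumed triage threshold
        actions.append("refer for bacteriological TB confirmation")
    if result.abnormality_score >= 50:   # assumed threshold
        actions.append("review non-TB radiological findings")
    if result.ctr > 0.5:                 # common convention for cardiomegaly
        actions.append("flag possible cardiomegaly")
    return actions or ["no further action"]

print(summarize(XrayResult(tb_score=72, abnormality_score=30, ctr=0.46)))
```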

Recognizing the limits of radiological features

While AI can assist in detecting radiological signs, it cannot replace diagnostic reasoning. Klinkenberg cautioned: “Radiological features can help the diagnosis of a disease, but are very often non-specific.”

He offered the example of consolidation on a chest X-ray: “There are a lot of diseases that actually show consolidation in a chest X-ray. However, having consolidation does not mean that you have to treat the patient for this specific thing.”

Avoiding false positives in resource-limited settings

In screening programs, false positives come with serious consequences—especially where resources are stretched. “Every false positive that we do send through actually wastes resources. And we know that we’re in a resource-limited setting.”

“This also affects the amount of people that you can reach. So we want to focus really on outcomes in AI that impact the patient treatment.”
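To put rough numbers on that cost, a back-of-the-envelope calculation helps. The prevalence, specificity, and per-test cost below are illustrative assumptions, not figures from the webinar:

```python
# Back-of-the-envelope cost of false positives in a screening program.
# All numbers are illustrative assumptions, not data from the webinar.

screened = 10_000
prevalence = 0.01        # assumed: 1% of screened people actually have TB
specificity = 0.95       # assumed: 95% of healthy people screen negative
cost_per_confirmatory_test = 15.0  # assumed cost per confirmatory test (USD)

healthy = screened * (1 - prevalence)
false_positives = healthy * (1 - specificity)
wasted = false_positives * cost_per_confirmatory_test

print(f"false positives: {false_positives:.0f}")   # 495
print(f"wasted on confirmation: ${wasted:,.0f}")   # $7,425
```

Under these assumptions, roughly 495 of every 10,000 people screened would be sent for confirmatory testing unnecessarily, consuming tests that could otherwise extend the program’s reach.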

Reviewing AI evidence responsibly

As new AI solutions emerge, Klinkenberg encouraged healthcare professionals to critically evaluate available research. “Who are the authors? Who wrote it? Can we see from what institutions they came from? And are there any conflicts of interest which should often be declared in a paper?”

Understanding study populations is equally important. A study conducted in a low-TB-burden country, for example, may not apply to a high-burden setting. Klinkenberg urged implementers to ask: Does the study reflect our clinical environment? Does the data translate to the reality we’re working in?
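One way to see why study populations matter is to work through Bayes’ rule: the same sensitivity and specificity yield very different positive predictive values at different background prevalences. The operating point and prevalence figures below are illustrative assumptions:

```python
# Positive predictive value (PPV) at the same sensitivity/specificity but
# different TB prevalences, via Bayes' rule. All figures are illustrative.

def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

sens, spec = 0.90, 0.95   # assumed operating point of the AI
for prev in (0.001, 0.02):  # low-burden vs high-burden setting
    print(f"prevalence {prev:.1%}: PPV = {ppv(sens, spec, prev):.1%}")
```

With these assumed numbers, fewer than 2% of positives would be true cases in the low-burden setting, versus roughly 27% in the high-burden one, which is why evidence from one setting may not transfer to the other.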

The CAD4TB example: Trust built over time

Klinkenberg closed by reflecting on the CAD4TB journey as a case study in trust-building. “CAD4TB really started in 2007 with its first prototype, almost 20 years ago, with its first implementation following in 2011. CAD4TB was CE certified in 2015, but still, it took six more years before the WHO endorsed CAD4TB screening.”

He noted that this timeline highlights the importance of long-term validation, careful implementation, and clinical credibility across diverse populations.

The role of transparency in adoption

For Klinkenberg, trust doesn’t just come from evidence; it comes from openness. “We promote external validations and think that research is essential before adopting a new and emerging AI. And, in general, we think that transparency with clinical practice should also be maintained by manufacturers.”