A Theoretical Approach for Applying Risk-Based Clinical Data Management to the Use of AI in Clinical Research


Jon Cresswell
ProPharma Group

AI is becoming more than just a buzzword in clinical research—it’s starting to show up in real workflows, from query generation to site risk prediction. As Clinical Data Managers, we’re being asked to engage with these tools in a meaningful way. But with that comes a responsibility to ensure that AI is used in a way that supports—not undermines—data quality, compliance, and patient safety.

This article outlines a practical, theory-informed approach to applying Risk-Based Clinical Data Management (RBCDM) principles to the use of AI. It’s not about prescribing a rigid framework, but about offering a way to think through the risks and responsibilities that come with integrating AI into our data management strategies.

Risk Assessment: Thinking Beyond the Algorithm

When we talk about risk in the context of AI, we’re not just talking about system failures or bugs. We’re talking about how the model behaves, what data it’s trained on, and how its outputs influence decisions. AI isn’t static—it learns, adapts, and sometimes surprises us.

So, when evaluating an AI tool, we need to ask: What’s the model doing? What decisions is it supporting? What happens if it gets something wrong? And just as importantly—can we explain how it got there?

For instance, consider a model designed to flag protocol deviations. It might perform well across most sites but struggle where data is sparse. In such a case, a reasonable mitigation could be to introduce a manual review step for lower-enrolling sites or apply a confidence threshold to trigger human oversight. This kind of layered approach helps balance automation with control.
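
To make that concrete, here is a minimal sketch of what such a gate could look like. The enrolment cut-off, confidence threshold, and triage function are hypothetical assumptions for illustration, not drawn from any specific tool.

    # Illustrative sketch only: route AI deviation flags to manual review when
    # site enrolment or model confidence falls below assumed cut-offs.
    # All names, thresholds, and the triage logic are hypothetical.

    MIN_ENROLMENT_FOR_AUTO = 25     # below this, every flag goes to a human reviewer
    MIN_CONFIDENCE_FOR_AUTO = 0.85  # below this, the flag is queued for oversight

    def triage_deviation_flag(site_enrolment: int, model_confidence: float) -> str:
        """Decide whether an AI-generated deviation flag can be actioned
        automatically or needs manual review first."""
        if site_enrolment < MIN_ENROLMENT_FOR_AUTO:
            return "manual_review"   # sparse data: keep a human in the loop
        if model_confidence < MIN_CONFIDENCE_FOR_AUTO:
            return "manual_review"   # low confidence: trigger human oversight
        return "auto_action"         # enough data and confidence to proceed

    # A flag from a small site is still reviewed, even if the model is confident
    print(triage_deviation_flag(site_enrolment=12, model_confidence=0.93))

The specific numbers matter less than the fact that the cut-offs are explicit, documented, and open to review.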

Critical Data Identification: Knowing What the Model Depends On

AI models are only as good as the data they’re built on. That means we need to be clear about which data points are feeding into the model and which ones are being influenced by it. These are your AI-sensitive fields—and they need to be treated accordingly.

This isn’t just about flagging a few variables. It’s about understanding the data ecosystem around the model: how data flows, where it’s transformed, and how it’s used downstream.

As an example, in a model predicting patient dropout, visit adherence and demographic data might be key inputs. If those fields are inaccurate or inconsistently captured, the model’s outputs could be misleading. Identifying these dependencies early allows for targeted validation and monitoring strategies.
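
One way to make those dependencies visible is to profile the AI-sensitive input fields before the model’s outputs are relied on. The short sketch below uses pandas with invented field names, values, and tolerances; it simply reports any key input whose missingness exceeds an agreed limit.

    # Illustrative sketch: profile assumed AI-sensitive input fields so that
    # incomplete data is surfaced before the model's predictions are trusted.
    # Field names, values, and the tolerance are invented for illustration.
    import pandas as pd

    # Hypothetical extract of inputs to a dropout-prediction model
    inputs = pd.DataFrame({
        "subject_id":      ["001", "002", "003", "004"],
        "visit_adherence": [0.95, None, 0.40, 0.88],  # proportion of visits attended
        "age":             [54, 61, None, 47],
    })

    AI_SENSITIVE_FIELDS = ["visit_adherence", "age"]  # assumed key model inputs
    MAX_MISSING_RATE = 0.10                           # assumed tolerance

    for field in AI_SENSITIVE_FIELDS:
        missing_rate = inputs[field].isna().mean()
        if missing_rate > MAX_MISSING_RATE:
            # Flag the field for targeted validation before relying on the model
            print(f"{field}: {missing_rate:.0%} missing exceeds the agreed tolerance")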

Targeted Oversight: Keeping an Eye on the Outputs

AI doesn’t always behave the way we expect. That’s why oversight can’t stop at deployment. We need to monitor how the model is performing, whether its outputs are still relevant, and whether it’s starting to drift.

This means setting up review cycles, defining what “good” looks like, and being ready to intervene if things go off track.

Hypothetically, if a model used for query generation begins producing a high volume of irrelevant queries, it could indicate that the training data no longer reflects current trial conditions. In such a case, pausing the tool, reviewing the training inputs, and introducing a validation step before future updates would be a sensible course of action.
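
One lightweight way to give that review cycle teeth is to track how many AI-generated queries data managers actually accept and compare it with the rate observed during validation. The baseline and pause rule below are assumed purely for illustration, not recommendations.

    # Illustrative sketch: compare the acceptance rate of AI-generated queries
    # against a validation-period baseline and suggest pausing on a sharp drop.
    # Baseline and threshold are assumed values, not guidance.

    BASELINE_ACCEPTANCE = 0.80  # acceptance rate observed during validation
    MAX_RELATIVE_DROP = 0.25    # pause if acceptance falls >25% below the baseline

    def review_query_tool(queries_raised: int, queries_accepted: int) -> str:
        """Return a suggested action for this review cycle."""
        if queries_raised == 0:
            return "no_data_this_cycle"
        acceptance = queries_accepted / queries_raised
        if acceptance < BASELINE_ACCEPTANCE * (1 - MAX_RELATIVE_DROP):
            return "pause_tool_and_review_training_inputs"
        return "continue_routine_monitoring"

    # Example review cycle: 300 queries raised, only 150 accepted by data managers
    print(review_query_tool(queries_raised=300, queries_accepted=150))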

Centralised Monitoring: Making AI Part of the Bigger Picture

AI insights are valuable—but only if they’re integrated into the broader monitoring framework. That means putting AI-generated metrics alongside your traditional key risk indicators (KRIs) and quality tolerance limits (QTLs), and making sure they’re interpreted in context.

It also means making sure those insights are traceable. If a model flags a site as high-risk, we need to know why—and be able to explain it if asked.

To illustrate, imagine a site’s AI-generated risk score increases, but traditional indicators remain stable. This discrepancy could prompt a deeper review, potentially revealing subtle operational changes—such as staff turnover or workflow shifts—that haven’t yet impacted standard metrics. AI can offer early signals, but only if we’re prepared to act on them thoughtfully.
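
As a sketch of how that comparison might be surfaced, the snippet below places a hypothetical AI risk score next to two traditional KRIs and lists the sites where only the AI signal is elevated. The column names, scales, and thresholds are all assumptions for illustration.

    # Illustrative sketch: put an AI-generated site risk score alongside traditional
    # KRIs and flag sites where only the AI signal has moved. All values are invented.
    import pandas as pd

    sites = pd.DataFrame({
        "site_id":            ["101", "102", "103"],
        "ai_risk_score":      [0.82, 0.35, 0.74],  # hypothetical 0-1 model output
        "query_rate_kri":     [0.04, 0.03, 0.12],  # queries per data point
        "overdue_visits_kri": [1, 0, 6],           # count of overdue visits
    })

    AI_ALERT = 0.70         # assumed alert threshold for the AI score
    QUERY_KRI_ALERT = 0.08  # assumed threshold for the query-rate KRI
    VISIT_KRI_ALERT = 4     # assumed threshold for overdue visits

    ai_high = sites["ai_risk_score"] >= AI_ALERT
    kri_high = (sites["query_rate_kri"] >= QUERY_KRI_ALERT) | \
               (sites["overdue_visits_kri"] >= VISIT_KRI_ALERT)

    # Sites where the AI signal is elevated but traditional indicators are quiet:
    # these are the candidates for a deeper, documented review.
    print(sites.loc[ai_high & ~kri_high, ["site_id", "ai_risk_score"]])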

Adaptive Oversight: Planning for Change

AI tools evolve. They get retrained, updated, and sometimes re-scoped. Each of those changes introduces new risks—and we need to be ready to manage them.

That means having a change control process, knowing when to revalidate, and making sure everyone involved understands what’s changed and why.

For example, if an AI model used for eligibility screening is updated mid-study to reflect protocol amendments, it’s important to assess how that change might affect screening patterns across regions. Adjusting monitoring plans and ensuring consistent communication with sites would help maintain alignment and avoid unintended consequences.
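
The sketch below shows one minimal way a change-control record for such an update could be captured, with deployment blocked until the impact assessment, revalidation, and site communication are all documented. The fields and workflow are assumptions, not a reference to any particular system.

    # Illustrative sketch: a minimal change-control record for an AI model update,
    # blocking deployment until each oversight step has been completed and recorded.
    # The fields and workflow are assumed for illustration only.
    from dataclasses import dataclass

    @dataclass
    class ModelChangeRecord:
        model_name: str
        old_version: str
        new_version: str
        reason: str
        impact_assessed: bool = False  # e.g. effect on screening patterns by region
        revalidated: bool = False      # performance re-checked against agreed criteria
        sites_notified: bool = False   # communication to affected sites completed

        def ready_to_deploy(self) -> bool:
            """Only deploy once every change-control step is documented."""
            return self.impact_assessed and self.revalidated and self.sites_notified

    change = ModelChangeRecord(
        model_name="eligibility_screening",
        old_version="1.2",
        new_version="1.3",
        reason="Protocol amendment updated inclusion criteria",
    )
    print(change.ready_to_deploy())  # False until every step is signed off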

Regulatory Alignment: Staying Grounded in GCP

The ICH E6(R3) guideline doesn’t mention AI directly, but its principles apply just the same. It’s about identifying what’s critical, applying proportionate controls, and maintaining oversight throughout the trial.

If we treat AI like any other system—with the same level of scrutiny, documentation, and governance—we’re already on the right track.

“The sponsor should implement a system to manage quality throughout all stages of the trial process… implementing proportionate risk-based approaches to manage them.”
— ICH E6(R3), Section 5.0

Final Thoughts

AI is here, and it’s not going away. As Clinical Data Managers, we don’t need to become data scientists—but we do need to understand how these tools work, what they depend on, and how to manage the risks they introduce.

By applying RBCDM principles to AI, we can make sure we’re not just using these tools—but using them well. That means being thoughtful, being critical, and staying focused on what matters most: data quality, patient safety, and trial integrity.
