Interview

Frameworks for Real-World Data Quality: Implementing Best Practices in Oncology Research

Zhaohui Su


In this interview, Zhaohui Su, a real-world data expert at McKesson’s Ontada division, discusses the implementation of data quality frameworks to enhance the accuracy, transparency, and utility of real-world data in oncology, emphasizing the importance of measurement, collaboration, and validation—especially as data become increasingly multimodal and shaped by emerging technologies like NLP and AI.


Please introduce yourself by stating your name, title, and any relevant experience you’d like to share.

Zhaohui Su: Thank you for this opportunity. My name is Zhaohui Su. I work for the Ontada subdivision at McKesson. I have been working on real-world data (RWD) and real-world evidence for 25 years. It's a pleasure to present our recent work on real-world data quality initiatives and frameworks.

Can you provide an overview of the abstract you are presenting at ISPOR 2025?

Su: We all know how important it is to measure and improve real-world data quality. Many frameworks and initiatives have been published [about it] in recent years.

We did a literature review. We summarized the most recent work and, more importantly, we implemented it, so that we can show people how we implement a framework and how this helps to improve real-world data quality.

Our lessons learned are, number one, we have to understand these frameworks. Number two, we have to understand our data. Number three, we have to be transparent about what we did and the impact of this framework implementation on the real studies that we have.

How did the implementation of new data quality frameworks impact the overall reliability and external validity of the On.Genuity RWD platform?

Su: People often say, “We manage what we can measure.” The first important step is to measure real-world data quality. It's not easy. There are multiple frameworks available. The most important thing is to start working on it and start tracking the improvement. We set up a baseline measure and, over time, we have improved on it.

Based on the most recent US Food and Drug Administration (FDA) initiative, we implemented the framework and now understand the baseline accuracy and completeness of our data, so that we can track the improvement over the years. We understand the areas where improvements are needed, so we can implement interventions and put more effort into cleaning the data so that data quality can improve over the years.
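To make the idea of a baseline concrete, here is a minimal sketch of how completeness and accuracy could be computed for a single field. It is an illustration under stated assumptions, not the method Su's team used: the function names, the cancer stage field, and the sample values are all hypothetical, and accuracy is defined here simply as agreement with a verified reference value.

```python
from typing import Optional, Sequence


def is_missing(value: Optional[str]) -> bool:
    """Treat None and blank strings as missing."""
    return value is None or value.strip() == ""


def completeness(values: Sequence[Optional[str]]) -> float:
    """Fraction of records in which the field is populated."""
    if not values:
        return 0.0
    filled = sum(1 for v in values if not is_missing(v))
    return filled / len(values)


def accuracy(values: Sequence[Optional[str]],
             reference: Sequence[Optional[str]]) -> float:
    """Fraction of populated values that agree with a verified reference.

    Records missing on either side are excluded from the comparison.
    """
    if len(values) != len(reference):
        raise ValueError("values and reference must have the same length")
    compared = [(v, r) for v, r in zip(values, reference)
                if not is_missing(v) and not is_missing(r)]
    if not compared:
        return 0.0
    return sum(1 for v, r in compared if v == r) / len(compared)


# Hypothetical data: recorded cancer stage vs. a manually verified stage.
recorded_stage = ["II", None, "III", "", "IV"]
verified_stage = ["II", "I", "IV", "II", "IV"]

baseline = {
    "completeness": completeness(recorded_stage),
    "accuracy": accuracy(recorded_stage, verified_stage),
}
print(baseline)
```

Recomputing the same two numbers after each data-cleaning intervention gives the kind of trend over time that the answer describes.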

How do you foresee these data quality frameworks influencing clinical decision-making and policy development in oncology research? Are there any emerging areas where these frameworks could be particularly impactful?

Su: Good question. We have to be transparent. This quality framework would highlight the areas where the data quality is good enough for decision-making, as well as the areas where more work needs to be done in order to get the data ready for decision-making. That's the key message I want to share with the broad audience: we need to measure. Without a measure, people don't know.

One impactful emerging trend is that real-world data have become multimodal instead of [relying on] only one data source. They now integrate multiple data sources, including electronic medical records (EMRs), claims, wearable device data, and unstructured data, which we turn into a structured format through natural language processing (NLP) or machine learning.

All these emerging data sources and their integration cause new challenges. That's why having a framework to start tracking the improvement is so important for decision-making.

One emerging area is NLP and artificial intelligence (AI). Many companies are now implementing AI technology and solutions to get the data from an unstructured format into a structured format. It's an emerging field, it's scalable, it's the industry trend. However, NLP data quality is a big area we need to measure and work on.

Machine learning today does not provide data quality as reliable as human chart abstraction. If we blindly let these data flow into the data deliverables, it could have a negative impact. This calls for our attention and action: we need to actively measure and validate the NLP data from AI and machine learning before we use it.
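One common way to run such a check is to compare NLP output with human chart abstraction on the same set of records and report precision and recall. The sketch below assumes a binary finding per record (for example, whether a biomarker is documented); the function names, the sample labels, and the 0.9 threshold are hypothetical and are not taken from the interview.

```python
from typing import Sequence, Tuple


def precision_recall(nlp_labels: Sequence[bool],
                     human_labels: Sequence[bool]) -> Tuple[float, float]:
    """Compare NLP-extracted labels with human-abstracted labels.

    Human chart abstraction is treated as the reference standard.
    """
    if len(nlp_labels) != len(human_labels):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(nlp_labels, human_labels))
    tp = sum(1 for n, h in pairs if n and h)        # both say present
    fp = sum(1 for n, h in pairs if n and not h)    # NLP only
    fn = sum(1 for n, h in pairs if not n and h)    # human only
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def ready_for_use(precision: float, recall: float,
                  minimum: float = 0.9) -> bool:
    """Hypothetical release gate: both metrics must reach the minimum."""
    return precision >= minimum and recall >= minimum


# Hypothetical validation sample of five charts.
nlp = [True, True, False, True, False]
human = [True, False, False, True, True]

p, r = precision_recall(nlp, human)
print(f"precision={p:.2f} recall={r:.2f} ready={ready_for_use(p, r)}")
```

A gate like `ready_for_use` keeps NLP-derived fields out of data deliverables until they have been measured against the human reference, which is the behavior the answer argues for.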

Is there anything else you’d like audiences to take away from this?

Su: Start working on it. This is important. Execution and implementation are key. The real-world data quality frameworks are available, initiatives are available, and the data are available. We need to start working on it to understand our data, and we need to work hard collaboratively to improve the data quality.

© 2025 HMP Global. All Rights Reserved.
Any views and opinions expressed are those of the author(s) and/or participants and do not necessarily reflect the views, policy, or position of the Journal of Clinical Pathways or HMP Global, their employees, and affiliates.