Finding hard-to-find patients: Integrating real-world data and AI

Identifying patients ‘outside the clinic’ can provide significant benefits for researchers and population health managers. Better understanding of patient cohorts can shed light on patient journeys, help optimize decisions around treatment choice and timing, and inform the development of new therapies and intervention programs. Even without ‘in-clinic’ insight - the ability to interact directly with the patient and confirm their membership in a cohort - we have plenty of population-level tools we can use to try to identify patients.

Even so, patient identification can in many cases become a challenging, even impossible, task. For example, we may want to find a cohort with a certain health condition: in the simplest circumstances, if the condition is well-documented in structured medical records, we can query large real-world datasets to isolate the cohort using standardized coding. The same approach could apply to patients who have been treated with a certain medication, or those who have undergone a particular procedure. But what if these cohorts cannot easily be captured with this kind of querying - because, for example, standardized coding does not exist? We may be able to proxy for some missing information, but in many real-world cases researchers run into obstacles that appear insurmountable and are forced to abandon the problem entirely.
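To make the ‘simplest circumstances’ concrete, here is a minimal sketch of a code-based cohort query. The `Claim` record, the `find_coded_cohort` helper and the sample rows are hypothetical illustrations, not a real dataset schema; the ICD-10-CM prefixes F32 and F33 are the standard categories for major depressive disorder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One coded diagnosis row (hypothetical, simplified schema)."""
    patient_id: str
    code_system: str  # e.g. "ICD-10-CM"
    code: str


def find_coded_cohort(claims, code_system, code_prefixes):
    """Return IDs of patients with at least one code matching any prefix."""
    prefixes = tuple(code_prefixes)
    return {
        c.patient_id
        for c in claims
        if c.code_system == code_system and c.code.startswith(prefixes)
    }


# Illustrative rows only
claims = [
    Claim("p1", "ICD-10-CM", "F33.1"),  # MDD, recurrent
    Claim("p2", "ICD-10-CM", "E11.9"),  # type 2 diabetes
    Claim("p3", "ICD-10-CM", "F32.9"),  # MDD, single episode
    Claim("p3", "ICD-10-CM", "I10"),    # hypertension
]

mdd_cohort = find_coded_cohort(claims, "ICD-10-CM", ["F32", "F33"])
print(sorted(mdd_cohort))  # ['p1', 'p3']
```

A query like this only works when a code for the cohort exists, which is exactly the limitation described above.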

Iterative RWD-AI methods for challenging cohorts

In these more complicated cases, a combination of real-world data (RWD) analysis and artificial intelligence (AI) modeling can help identify patients beyond the reach of other methods. At the heart of this approach is AI’s ability to isolate patterns associated with a distinct patient cohort and then use those patterns to look for others who display the same signals. If we can give a well-designed AI tool a place to start - a patient cohort with the characteristics we want to find - the tool can, in many cases, find these patterns in datasets we can access and analyze more easily. In essence, we’re ‘translating’ one way of defining a cohort into a machine-learned definition anchored in the same starting cohort, and then using this new definition to look more broadly for similar patients.

Of course, the ‘starting point’ for the AI needs to be established, and richer real-world datasets can provide exactly what’s needed. For example, if we can leverage unstructured data - like clinical narratives - we can use clinician attestation to identify a core group of ‘known positive’ patients with the condition or treatment characteristic we want to find, and use that group as the AI’s starting point. Unstructured data are generally more challenging to work with than structured records, and findings based on them alone are harder to generalize because such data are less widely available and less portable. If our AI tool can learn from this cohort, though, it can then be applied to much more commonly accessible data.
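The ‘translation’ step can be illustrated with a deliberately simple stand-in model. The sketch below learns a smoothed log-odds weight for each structured-data feature from a known-positive cohort versus other patients, then ranks new patients by their summed weights. The function names, feature names and data are all hypothetical, and a production tool would presumably use a far richer model; this only shows the shape of the idea.

```python
import math
from collections import Counter


def fit_signal_weights(positives, others, smoothing=1.0):
    """Learn one log-odds weight per feature.

    positives / others: lists of sets of feature names, one set per patient.
    """
    pos_counts = Counter(f for patient in positives for f in patient)
    oth_counts = Counter(f for patient in others for f in patient)
    weights = {}
    for feature in set(pos_counts) | set(oth_counts):
        # Smoothed share of patients in each group who carry the feature
        p_pos = (pos_counts[feature] + smoothing) / (len(positives) + 2 * smoothing)
        p_oth = (oth_counts[feature] + smoothing) / (len(others) + 2 * smoothing)
        weights[feature] = math.log(p_pos / p_oth)
    return weights


def score(patient_features, weights):
    """Sum the weights of the features a patient carries (unknown features count 0)."""
    return sum(weights.get(f, 0.0) for f in patient_features)


# Hypothetical starting cohort (known positives) and comparison patients
positives = [
    {"multiple_antidepressant_switches", "augmentation_therapy"},
    {"multiple_antidepressant_switches", "psychiatric_hospitalization"},
    {"multiple_antidepressant_switches", "augmentation_therapy"},
]
others = [
    {"single_antidepressant"},
    {"single_antidepressant", "augmentation_therapy"},
    {"single_antidepressant"},
]
weights = fit_signal_weights(positives, others)

# Apply the learned definition to a broader, structured-only population
candidates = {
    "a": {"multiple_antidepressant_switches", "augmentation_therapy"},
    "b": {"single_antidepressant"},
    "c": {"psychiatric_hospitalization", "single_antidepressant"},
}
ranked = sorted(candidates, key=lambda pid: score(candidates[pid], weights), reverse=True)
print(ranked)  # ['a', 'c', 'b']
```

The key design point is that the labels come from one data source (here, the hand-built `positives` list standing in for clinician attestation) while the features come from structured data that is available for a much larger population.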

An example: finding patients with treatment-resistant depression

As an example of this approach, consider the problem of identifying patients with treatment-resistant depression (TRD). These patients comprise a significant minority (10-30%) of the major depressive disorder population¹, and resistance to treatment is certainly observable in-clinic in many cases: both the patient and their doctor know their symptoms of depression persist, despite treatment. At the same time, identifying these patients at the population level from the researcher’s perspective can be quite challenging. First, there is no simple, defined coding for TRD that could be used to isolate a cohort. Additionally, while definitions based on medication-use patterns exist, these can be difficult to implement in practice with sufficient confidence across large populations.
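For illustration, a medication-use-pattern definition might be sketched as follows. The rule used here (at least three distinct antidepressants filled within a 365-day window) and the function and field names are assumptions chosen for the example, not a validated or published TRD definition.

```python
from datetime import date, timedelta


def flag_by_medication_pattern(fills, min_distinct=3, window_days=365):
    """Flag patients with >= min_distinct distinct drugs in any window.

    fills: iterable of (patient_id, drug_name, fill_date) tuples.
    Thresholds are illustrative assumptions only.
    """
    by_patient = {}
    for patient_id, drug, fill_date in fills:
        by_patient.setdefault(patient_id, []).append((fill_date, drug))

    window = timedelta(days=window_days)
    flagged = set()
    for patient_id, events in by_patient.items():
        events.sort()  # chronological order
        for i, (start, _) in enumerate(events):
            # Distinct drugs filled within the window opening at this fill
            drugs = {d for when, d in events[i:] if when - start <= window}
            if len(drugs) >= min_distinct:
                flagged.add(patient_id)
                break
    return flagged


# Hypothetical pharmacy fills
fills = [
    ("p1", "sertraline", date(2022, 1, 10)),
    ("p1", "bupropion", date(2022, 4, 2)),
    ("p1", "venlafaxine", date(2022, 9, 15)),
    ("p2", "sertraline", date(2022, 1, 10)),
    ("p2", "sertraline", date(2022, 2, 10)),
    ("p3", "fluoxetine", date(2020, 1, 1)),
    ("p3", "sertraline", date(2021, 6, 1)),
    ("p3", "bupropion", date(2023, 1, 1)),
]
print(flag_by_medication_pattern(fills))  # {'p1'}
```

Note what this rule cannot see: whether each trial was of adequate dose and duration, or why a switch occurred. Gaps like these are part of why such definitions are hard to apply with confidence across large populations.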

OM1 has used the approach outlined above to address these challenges. To establish a ‘starting point’ cohort of patients with known TRD, clinical narratives from OM1’s PremiOM™ Major Depressive Disorder Dataset were interrogated for clinician attestation of treatment resistance. Then, OM1 Patient Finder™, an AI tool built for patient identification, was calibrated using this cohort. The calibration process established distinct patterns in cohort members’ structured health history data and merged these into a consistent set of signals distinguishing these patients from others. With this set of structured data signals, OM1 Patient Finder™ can then be applied to much broader datasets to identify similar patients who are much more likely to have TRD and who may have been out of reach of other identification methods.
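As a rough sketch of what searching narratives for clinician attestation can involve, the example below looks for a mention of treatment-resistant depression and skips sentences where a simple negation cue precedes it. This is not OM1’s method; the patterns, the `has_attestation` helper and the sample notes are hypothetical, and real clinical text processing needs far more careful handling of negation, history and context.

```python
import re

# Hypothetical patterns: a mention of TRD, and a negation cue shortly before it
ATTESTATION = re.compile(r"\btreatment[- ]resistant depression\b|\bTRD\b", re.IGNORECASE)
NEGATION = re.compile(r"\b(no|not|denies|without|rule out|r/o)\b[^.]{0,40}$", re.IGNORECASE)


def has_attestation(note):
    """Return True if any sentence mentions TRD without a preceding negation cue."""
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        match = ATTESTATION.search(sentence)
        if match and not NEGATION.search(sentence[:match.start()]):
            return True
    return False


# Illustrative notes only
notes = {
    "p1": "Patient has treatment-resistant depression after three failed trials.",
    "p2": "Does not meet criteria for TRD. Continue current regimen.",
    "p3": "MDD, recurrent. Responding well to sertraline.",
}
known_positives = {pid for pid, text in notes.items() if has_attestation(text)}
print(known_positives)  # {'p1'}
```

Patients identified this way would form the ‘known positive’ starting cohort; the structured-data signals are then learned from their coded histories rather than from the notes themselves.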

Challenging cases can be hard, but don’t need to be impossible

Identifying patients accurately and reliably can be tremendously beneficial, but it can also be very difficult without the right tools and approach. Cases where standardized coding does not exist, or where operational definitions are difficult to implement broadly, can prevent the isolation of cohorts of interest and the insights that follow. Working strategically with the right kinds of real-world data and AI tools, and iterating for improvement, can answer questions that were not feasible to address before these tools were available. A patient identification challenge that once seemed impossible may actually be solvable now - and if the insights to be gained can help improve outcomes, it’s worth trying.

Resources

¹ Al-Harbi KS. Treatment-resistant depression: therapeutic trends, challenges, and future directions. Patient Prefer Adherence. 2012;6:369-88. doi: 10.2147/PPA.S29716. Epub 2012 May 1. PMID: 22654508; PMCID: PMC3363299.