But tweaking the models could help overcome biased data sets
Black patients are less likely than white patients to receive certain medical tests that doctors use to diagnose severe diseases such as sepsis, researchers at the University of Michigan have shown.
Because of the differences in testing rates, some sick Black patients are assumed to be healthy in data used to train AI, and the resulting models likely underestimate illness in Black patients. But that doesn’t mean the data is unusable—the same group developed a way to correct for this bias in data sets used to train AI.
These new insights are reported in a pair of studies: one published today in PLOS Global Public Health and the other presented at the International Conference on Machine Learning in Vienna, Austria, in July 2024.
In the PLOS study, the researchers found that medical testing rates for white patients are up to 4.5% higher than for Black patients with the same age, sex, medical complaints and emergency department triage score, a measure of the urgency of a patient’s medical needs.
The difference is partially explained by hospital admission rates, as white patients were more likely to be assessed as ill and admitted to the hospital than Black patients.
“If there are subgroups of patients who are systematically undertested, then you are baking this bias into your model,” said Jenna Wiens, U-M associate professor of computer science and engineering and corresponding author of the study.
“Adjusting for such confounding factors is a standard statistical technique, but it’s typically not done prior to training AI models. When training AI, it’s really important to acknowledge flaws in the available data and think about their downstream implications.”
The researchers found this bias in medical testing records from two locations: Michigan Medicine in Ann Arbor, Michigan, and one of the most widely used clinical datasets for training AI, the Medical Information Mart for Intensive Care.
The dataset contains the records of patients who visited the emergency department at Beth Israel Deaconess Medical Center in Boston.
“This research highlights the risks of using health data to train AI models without a comprehensive understanding of the data,” said Michael Sjoding, M.D., associate professor of pulmonary and critical care medicine at Michigan Medicine.
“Because of these apparent testing differences, an AI model might infer that Black patients are less sick than white patients and make predictions that are potentially biased.”
Computer scientists need to account for these biases so that AI can make accurate and equitable predictions of patient illness.
One option is to train the AI model with a less biased dataset, such as one that only includes records for patients who have received diagnostic medical tests.
A model trained on such data might be inaccurate for less ill patients, however.
To correct the bias without omitting patient records, the researchers developed a computer algorithm that identifies whether untested patients were likely ill based on their race and vital signs, such as blood pressure.
The algorithm accounts for race because the recorded health statuses of patients identified as Black are more likely to be affected by the testing bias.
The researchers tested the algorithm with simulated data, in which they introduced a known bias by relabeling patients identified as ill as “untested and healthy.”
The researchers then used this dataset to train a machine learning model, the results of which were presented at the International Conference on Machine Learning.
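The relabeling experiment described above can be illustrated with a toy simulation. The sketch below introduces a testing bias like the one the researchers imposed (some ill patients in an undertested group are recorded as "untested and healthy"), then fills in expected labels for untested patients from a vital-sign signal. The illness model, the testing rates, and the simple binned imputation are illustrative assumptions for this sketch, not the authors' published algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

group = rng.integers(0, 2, n)             # 1 = undertested subgroup (assumed)
vitals = rng.normal(0, 1, n)              # one standardized vital sign
# Illness depends on vitals only, so both groups are equally sick in truth.
true_ill = rng.random(n) < 1 / (1 + np.exp(-(vitals - 1.5)))

# Testing bias: the undertested group is tested far less often (rates assumed).
tested = rng.random(n) < np.where(group == 1, 0.4, 0.9)
observed = np.where(tested, true_ill, 0)  # untested -> recorded as healthy

# Naive illness rates understate illness in the undertested group.
naive = np.array([observed[group == g].mean() for g in (0, 1)])

# Correction sketch: estimate P(ill | vitals bin) from tested patients, whose
# labels are reliable, then impute that probability for untested patients.
bins = np.digitize(vitals, np.linspace(-3, 3, 13))
p_ill = np.array([
    observed[tested & (bins == b)].mean() if (tested & (bins == b)).any() else 0.0
    for b in range(14)
])
imputed = np.where(tested, observed, p_ill[bins])
corrected = np.array([imputed[group == g].mean() for g in (0, 1)])
```

With the raw labels, the undertested group looks far healthier than it is; after imputation, the estimated illness rates in the two groups come back into agreement, mirroring the kind of correction the study evaluates.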
When the researcher-imposed bias was corrected with the algorithm, a textbook machine-learning model could accurately differentiate between patients with and without sepsis around 60% of the time.
Without the algorithm, the biased data made the model’s performance worse than random.
The improved accuracy was on par with a textbook model that was trained on unbiased, simulated data in which everyone was equitably tested.
Such unbiased datasets are unlikely to exist in the real world, but the researchers’ approach allowed the AI to work about as accurately as in the idealized scenario despite being stuck with biased data.
“Approaches that account for systematic bias in data are an important step towards correcting some inequities in healthcare delivery, especially as more clinics turn toward AI-based solutions,” said Trenton Chang, a doctoral student in computer science and engineering and the first author of both studies.
Additional authors: Mark Nuppnau, Ying He, Keith E. Kocher and Thomas S. Valley.
Funding/disclosures: This work was supported by the National Heart, Lung, and Blood Institute NHLBI R01 HL158626.
Paper cited: "Racial differences in laboratory testing as a potential mechanism for bias in AI: A matched cohort analysis in emergency department visits," PLOS Glob Public Health. DOI: 10.1371/journal.pgph.0003555