Responsible AI Testing for Healthcare
AI is transforming healthcare by aiding in disease diagnosis, treatment, and drug development. From accelerating the drug discovery pipeline to identifying new treatments for chronic non-communicable diseases, AI is revolutionizing the way we approach medical challenges. Ensuring responsible AI testing is crucial to maintaining accuracy, fairness, and patient safety in these transformative healthcare solutions.
David Talby, CTO of John Snow Labs, spoke with AIWire to discuss the tools developed by John Snow Labs that can automate responsible AI testing and address biases in large language models (LLMs) used in healthcare and life sciences.
John Snow Labs, a leader in AI for healthcare, offers cutting-edge software, language models, and data to assist healthcare and life science organizations in rapidly developing, deploying, and managing AI, LLM, and NLP projects.
Talby clarified that while John Snow Labs' tool for AI testing can serve as a mechanism for automating the controls needed to address fairness and bias issues, it does not solve those issues on its own. AI biases have emerged as a persistent challenge, affecting decision-making across sectors and undermining trust in automated systems.
According to Talby, biases in healthcare AI often manifest in ways that are subtle yet highly consequential. For example, an AI system may recommend a certain test based on a patient's name or perceived racial or ethnic identity.
“The deep-rooted biases in the AI system can also lead to various issues based on gender, religion, profession, or even socioeconomic status,” Talby said. “A biased AI system in healthcare poses risks not only to patients but also to healthcare providers, who may face regulatory and legal challenges as a result.”
Talby stated that if a hospital's AI system discriminates based on race or socioeconomic status, it becomes easier to prove in court that the hospital violated anti-discrimination laws. Unlike subtle real-world discrimination, AI bias can be clearly shown by altering variables like a patient's name and observing different outcomes, creating legal risks for organizations using biased AI.
To address such issues, John Snow Labs created LangTest, an open-source library that includes more than 100 test types covering different aspects of responsible AI, from bias and security to toxicity and political leaning.
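As a rough illustration of how LangTest is used (a minimal sketch following the project's quickstart; the model name, test selection, and pass-rate thresholds here are placeholders, and defaults may differ by library version):

```python
# Minimal LangTest sketch: run robustness and bias tests against a
# Hugging Face NER model. Model name and thresholds are placeholders.
from langtest import Harness

harness = Harness(
    task="ner",
    model={"model": "dslim/bert-base-NER", "hub": "huggingface"},
)

# Select which of the 100+ test types to run and how strict to be.
harness.configure({
    "tests": {
        "defaults": {"min_pass_rate": 0.75},
        "robustness": {"add_typo": {"min_pass_rate": 0.80}},
        "bias": {"replace_to_female_pronouns": {"min_pass_rate": 0.90}},
    }
})

harness.generate()       # generate test cases from seed examples
harness.run()            # run the model on every generated case
print(harness.report())  # pass/fail summary per test type
```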
"There was a paper just two weeks ago that showed that on the US medical licensing exam, changing a brand name to a generic version of the medications dropped scores by 1-10%, even if the question wasn’t about the medication,” Talby said. “It’s essentially just replacing synonyms at this point. Currently, we have more than 100 types of tests to address such issues”.
Talby highlighted that John Snow Labs' testing system generates extensive test cases from a small set of core examples and covers a range of AI models and platforms, including Hugging Face, OpenAI, and other popular models. The system also supports best practices through versioned test suites that allow for regular testing, updates, and exporting test data.
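Versioning a suite might look like the following sketch, assuming the save/load interface described in LangTest's documentation (exact signatures may vary by version):

```python
# Persist the configured suite so the identical tests can be re-run
# against future model versions (the saved directory can live in git).
harness.save("medical_bias_suite")

# Later, load the same suite and point it at an updated model.
from langtest import Harness

harness_v2 = Harness.load(
    save_dir="medical_bias_suite",
    task="ner",
    model={"model": "updated-model-name", "hub": "huggingface"},  # placeholder
)
harness_v2.run().report()
```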
Many professionals currently do not engage in AI testing due to its complexity, explained Talby. However, with increasing legal requirements, it is becoming essential for these experts to understand and trust the AI systems they use.
According to Talby, domain experts need to be involved in reviewing AI outputs to ensure the outputs are accurate and reliable. While AI systems should be free from bias, they should still be capable of adjusting their outputs based on factors that genuinely warrant a change.
"If we change this patient from male to female, we need to check whether the treatment remains the same,” Talby said. “Similarly, changing the patient’s age might require adjustments. For example, with lower abdominal pain, different tests are appropriate for men and women."
We asked Talby for his opinion on the strict regulations governing the healthcare industry, including those related to AI. Talby supports these regulations, including Section 1557 of the Affordable Care Act (ACA), viewing them as a good starting point. However, he emphasizes that these regulations need to be refined to address the complexities associated with AI in healthcare.
“I think it's a start, but it needs to evolve much more,” Talby said. “The current state is bad. To illustrate, think about the late 19th century and the beginning of cars. The regulation back then was like saying cars should have no safety issues; there should be no rules of the road.”
Looking ahead, Talby explained that John Snow Labs aims to keep healthcare and life sciences models state-of-the-art by continually updating their models based on the latest research and benchmarks. Not only will John Snow Labs continue to offer high-quality models, but it will also offer third-party validation services to ensure reliability and compliance.