Covering Scientific & Technical AI | Saturday, November 2, 2024

Spotting AI Hallucinations: MIT’s SymGen Speeds Up LLM Output Validation 


Imagine if your LLM could not only provide answers but also show you exactly where those answers came from—like a scholar meticulously citing sources. A new validation tool created at MIT aims to do just that, giving human validators the ability to trace every piece of information back to its origin in a dataset, which could lead to greater transparency and trust in the AI's responses.

The new tool is called SymGen, developed by MIT researchers to help human validators quickly verify an LLM's responses, according to reporting from MIT News. SymGen enables an LLM to generate responses with citations pointing directly to a specific source document, down to the individual cell in the data.

The system allows a validator to hover over highlighted portions of a text response to see the data the AI model used to generate a specific word or phrase, MIT News said. Unhighlighted portions, by contrast, indicate phrases that are not linked to specific data and therefore need closer scrutiny.
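The core idea of tying each generated span back to a source cell can be illustrated with a toy sketch. SymGen's actual implementation is not detailed here; this hypothetical example simply shows the general pattern of a model emitting symbolic placeholders (e.g., `{patient.age}`) that are resolved deterministically against a data table, so every substituted span carries a verifiable link to its origin:

```python
import re

def resolve_symbolic_refs(template: str, table: dict) -> tuple[str, list]:
    """Replace placeholders like {patient.age} with values copied verbatim
    from the source table, recording each substitution so a validator can
    trace every resolved span back to its origin cell."""
    provenance = []

    def substitute(match):
        key = match.group(1)
        value = str(table[key])  # a KeyError surfaces any unknown reference
        provenance.append((key, value))
        return value

    text = re.sub(r"\{([\w.]+)\}", substitute, template)
    return text, provenance

# Hypothetical table and model output containing symbolic references
table = {"patient.age": 67, "patient.bp": "142/90"}
template = "The patient, aged {patient.age}, presented with a blood pressure of {patient.bp}."
text, links = resolve_symbolic_refs(template, table)
```

Because substitution is deterministic, the resolved spans are guaranteed to match the table, and anything the model wrote outside a placeholder remains unlinked, which is exactly the text a validator would want to inspect more closely.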

“We give people the ability to selectively focus on parts of the text they need to be more worried about. In the end, SymGen can give people higher confidence in a model’s responses because they can easily take a closer look to ensure that the information is verified,” says Shannon Shen, an electrical engineering and computer science graduate student and co-lead author of a paper on SymGen, as quoted in MIT News.

Using generative AI models to interpret complex data can be a high-consequence endeavor, especially in fields like healthcare and finance, or in scientific applications where accuracy is essential. While LLMs can process vast amounts of data and generate responses quickly, they also frequently hallucinate, giving information that can sound plausible but is erroneous, biased, or imprecise.

Human validation is a key factor in improving LLM accuracy because it provides a critical layer of oversight that AI models often lack. Human validators help ensure the quality of the output by cross-referencing facts, identifying inconsistencies, and correcting errors that the model may overlook. This iterative process not only refines the LLM's performance but also helps address issues like hallucinations and misinformation, making the model more reliable and trustworthy over time.

Generating citations is nothing new for LLMs, but those citations often point to external documents, and sorting through them can be time-consuming. The researchers said they approached the time problem from the perspective of the humans doing this tedious validation work: "Generative AI is intended to reduce the user's time to complete a task. If you need to spend hours reading through all these documents to verify the model is saying something reasonable, then it's less helpful to have the generations in practice," Shen said.

It appears SymGen could help validators work more quickly: in a user study, Shen and his team found that SymGen sped up verification by about 20 percent compared to manual procedures.

Data quality continues to be a vital factor in validating LLM output, even with tools like SymGen. As always, an AI model's reliability hinges on the quality and credibility of the data it’s trained on. One caveat is that SymGen’s current iteration requires structured data in a tabular format. The researchers are exploring ways to augment SymGen’s capabilities to include unstructured data and other formats. MIT News also noted that researchers are planning to test SymGen with physicians to study how it could identify errors in AI-generated clinical summaries.

SymGen is another promising tool in the fight against hallucinations. Another example is Google’s recently launched DataGemma, a system designed to connect LLMs with extensive real-world data drawn from Google's Data Commons, a large repository of public data. DataGemma integrates Data Commons within Google’s Gemma family of lightweight open models and uses two techniques, retrieval-interleaved generation and retrieval-augmented generation, to enhance LLM accuracy and reasoning.

With promising new tools like SymGen and DataGemma leading the charge, we may soon see a future where AI hallucinations are nothing but a distant memory.

Read more about the technical features of SymGen in the original MIT News report by Adam Zewe. The researchers' accompanying academic paper is also available.

AIwire