Amazon’s New AI Assistant Acts as an Editor to Prevent Hallucinations
Large language models regularly spit out off-the-rails answers, so companies are introducing editorial layers and guardrails to keep AI responses on point.
Amazon this week announced the general availability of Amazon Q, an intermediary gatekeeper that reduces hallucinated answers and helps customers get the most out of large language models.
The technology has many uses, including automation of coding and business activities. Q helps companies make business decisions and upgrade IT systems faster, but it is also the kind of technology that could take away human jobs.
A video explains how Amazon Q automates coding: a simple prompt containing typical DevOps information, such as the programming language and the code-release stage in a CI/CD cycle, generates code in a few minutes.
The tool also provides analytical insights into company data and automates business processes such as recruitment and sales pitches.
For decades, companies have struggled to make sense of unstructured data, partly because of the inability to connect the data points. Amazon Q is one step toward solving that problem.
Nvidia CEO Jensen Huang has said in recent months that, with AI, people will not need to learn how to program, and Amazon’s AI assistant brings that idea into the spotlight. Amazon, however, takes a different view, saying the assistant will help employees become more productive.
“Early indications signal Amazon Q could help our customers’ employees become more than 80% more productive at their jobs,” Amazon said.
Amazon said Q runs on its Bedrock backend, which gives customers access to many hosted AI models. The company has embraced open models such as Llama and Mistral alongside Anthropic’s proprietary Claude, and Q can route requests to any of these models to generate the best answer.
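For developers who want to call Bedrock-hosted models directly, the sketch below shows the general pattern using the AWS boto3 SDK. The model ID and prompt are illustrative, and the exact request body varies by model family; this is a minimal sketch, not a description of how Q itself routes requests.

```python
import json
import boto3

# Bedrock runtime client; assumes AWS credentials and region are already configured.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative model ID; Bedrock hosts Anthropic, Meta, Mistral, and other
# model families, each with its own request format.
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [
            {"role": "user", "content": "Summarize last quarter's sales data."}
        ],
    }),
)

# The response body is a stream containing the model's JSON reply.
result = json.loads(response["body"].read())
print(result["content"][0]["text"])
```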
Companies that use Amazon Web Services’ cloud and microservices will benefit from Q immediately. Q plugs directly into database, streaming, and analytics systems, which typically process large amounts of static and real-time data.
This open approach strips away layers of AI complexity that customers deal with today. For one, customers get the best results without having to worry about which middleware and models are used to generate responses.
The approach has drawbacks, however, and it is not for customers who want full control over their choice of large language models, a customized AI stack, or programming models. It could also introduce inefficiencies, as each model delivers a different level of consistency in its responses depending on how well it understands the company’s data. There is additional complexity in each model’s AI stack, including its algorithms and information processing, development tools, and attached applications.
Companies are already building AI assistants and guardrails on top of the prominent proprietary LLMs, including OpenAI’s GPT-3.5 and GPT-4 and Google’s Gemini. They do so mainly through APIs or customized programming layers plugged into the AI stack.
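As a simple illustration of the API route, the sketch below calls GPT-4 through OpenAI’s Python SDK, with a system prompt acting as a rudimentary, home-grown guardrail; the prompts are purely illustrative.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A system prompt acting as a simple, home-grown guardrail layer (illustrative).
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Answer only questions about our product catalog."},
        {"role": "user", "content": "What sizes does the trail jacket come in?"},
    ],
)
print(response.choices[0].message.content)
```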
Nvidia’s NeMo Guardrails is a guardrail placed between the prompt and the LLM. A query first goes to the guardrail, which checks that the request is relevant to the application before passing it on to the LLM. The LLM’s response also passes back through the guardrail, which checks that the answer is relevant and not hallucinatory, before it is returned to the customer. If the answer is hallucinatory, the guardrail sends it back to the LLM to get a more on-topic answer.
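That round trip can be sketched in a few lines of Python. This is a generic illustration of the flow described above, not NeMo Guardrails’ actual API; the llm, is_on_topic, and is_grounded callables are hypothetical placeholders.

```python
def guarded_query(user_query, llm, is_on_topic, is_grounded, max_retries=2):
    """Generic guardrail loop: screen the query, call the LLM, screen the answer.

    `llm`, `is_on_topic`, and `is_grounded` are hypothetical callables standing in
    for the model endpoint and the relevance/hallucination checks a guardrail runs.
    """
    # Inbound rail: reject queries that are not relevant to the application.
    if not is_on_topic(user_query):
        return "Sorry, that request is outside the scope of this assistant."

    answer = llm(user_query)

    # Outbound rail: if the answer looks hallucinatory, ask the LLM to try again.
    for _ in range(max_retries):
        if is_grounded(user_query, answer):
            return answer
        answer = llm(f"{user_query}\n\nStay on topic and cite only known facts.")

    return "Sorry, I could not produce a reliable answer to that question."
```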
Service providers such as Glean give customers customized integrations of their internal business data through connectors to prominent large language models. The LLMs do not ingest the company data into their systems; instead, the provider builds a customized AI system for each customer.
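That connector pattern typically resembles retrieval-augmented generation: company documents stay in an index the provider controls, and only the passages relevant to a query are handed to the LLM. The sketch below is a hypothetical illustration of that flow; the search_index and llm functions are placeholders, not Glean’s API.

```python
def answer_with_company_data(question, search_index, llm, top_k=3):
    """Hypothetical connector flow: retrieve relevant internal passages, then
    let the LLM answer using only that context. The company data is never
    ingested into the model itself."""
    passages = search_index(question, limit=top_k)   # placeholder enterprise search
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)  # placeholder call to any hosted LLM
```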