Affinda's Lead AI Engineer, Tarik Dzekman, discusses the risks of AI hallucinations and shares insights on grounding techniques to ensure reliable AI performance for businesses.

In May 2024, Google's AI search tool captured global attention almost immediately after its public launch. Not for its utility, but for telling people to put glue on their pizza to stop the cheese sliding off and confidently asserting that dogs own hotels.

Suggestions that people should eat rocks in moderation and cook chicken to only 38 degrees Celsius prompted public alarm, created a huge PR headache and likely damaged business confidence in generative AI.

 

It would not be surprising if business leaders labelled the technology overhyped or undercooked and scaled back their ambitions to use generative AI in their own operations. If this can happen to one of the largest technology companies in the world, how can businesses employ AI with confidence?

 

To understand what happened here, and how to prevent it, you need to know a little about how AI models are trained. Training an AI model to replicate human language requires enormous volumes of data and computing capacity.

 

The object of this training is not factual accuracy but the generation of increasingly plausible, sophisticated answers that reflect the nuance of a user’s input.

 

After vast amounts of training data and billions of machine learning iterations, we get an AI whose linguistic outputs sound convincingly human. But sounding convincing has no bearing on whether the generated content is true. When these models make up facts, we call that a 'hallucination.'

 

This carries self-evident risks for business users. Imagine an AI reads an invoice and hallucinates that $100,000 is due instead of $1,000. Businesses can use traditional processes, like comparing amounts to purchase orders, to guard against these risks. But those processes are designed to catch occasional mistakes made by humans, not to act as a bulwark against flawed use of AI.


The good news is that there are established, reliable processes that guard against 'hallucinations' by requiring models to cite verified source data. This incredibly effective approach is referred to as grounding.

So how does it work?

 

Generative AI models can follow detailed instructions and give responses in structured formats. Given this ability, an AI can be instructed to provide its outputs in a way that can be verified against reliable data.

At Affinda we use large language models to parse, process, and transform documents. Rather than simply giving us a completed answer, these models are forced to tell us where each answer can be found, and wherever possible we verify the correctness of those answers.
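As a concrete illustration, the Python sketch below shows the general shape of this approach: ask the model for the answer together with the exact source text it came from, then check that the quote really appears in the document. The call_llm helper, the prompt wording and the JSON format are placeholder assumptions standing in for whatever LLM API and schema you actually use, not Affinda's production pipeline.

```python
import json


def call_llm(prompt: str) -> str:
    """Placeholder for whichever LLM API you use (to be wired up)."""
    raise NotImplementedError


def extract_invoice_total(document_text: str) -> dict:
    # Ask for a structured answer plus the exact source text it came from,
    # so the response can be checked programmatically rather than trusted blindly.
    prompt = (
        "Extract the total amount due from the invoice below.\n"
        "Respond with JSON only, in this format:\n"
        '{"total_due": "<value>", "source_quote": "<exact text copied from the invoice>"}\n\n'
        f"Invoice:\n{document_text}"
    )
    answer = json.loads(call_llm(prompt))

    # Grounding check: the quoted evidence must actually appear in the document.
    if answer["source_quote"] not in document_text:
        raise ValueError("Response is not grounded in the source document")
    return answer
```

Forcing the model to return evidence alongside the answer is what turns an unverifiable claim into something a simple program can check.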

 

Even within corporate settings, different tasks require different levels of accuracy, and it’s essential to apply the appropriate technique to achieve the precision each task demands.

 

'Strong' grounding prohibits the AI from doing anything except quoting verbatim from the grounding documents. Whenever we want information extracted, we prefer strong grounding. Making up a job title in a candidate’s resume might not be as bad as making up an account number to pay an invoice, but we would rather avoid both when the answer is right there in the document.
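One way to take strong grounding literally (an illustrative sketch, not necessarily how Affinda implements it) is to ask the model only for the location of the answer and read the value back out of the document itself, so the extracted text cannot contain anything invented:

```python
def read_span(document_text: str, start: int, end: int) -> str:
    """Return the answer by slicing the source document at the offsets the
    model reported, so the value is verbatim by construction."""
    if not (0 <= start < end <= len(document_text)):
        raise ValueError("Reported span falls outside the document")
    return document_text[start:end]


# Suppose the model returns {"field": "account_number", "start": 15, "end": 26}
invoice = "Pay to account 123-456-789 by 30 June."
print(read_span(invoice, 15, 26))  # -> 123-456-789
```

If the reported span is out of range, or the sliced text fails a sanity check (say, an account number that doesn't match the expected pattern), the extraction is rejected rather than risk acting on a hallucinated value.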

 

Weaker grounding is necessary when we want to give an AI room to synthesise. This was a challenge on a recent project where we used AI to process clinical records. Some medical conditions were implied but not expressly stated. We judged the risk profile to be low because the data was used in aggregate rather than for individual patient outcomes. We required the AI to justify its claims (leading to higher accuracy) without needing to be perfect (as even human data entry has noise).
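A check for this kind of weaker grounding might look something like the sketch below. The fuzzy matching and the 0.6 threshold are illustrative assumptions rather than Affinda's implementation: the model is allowed to paraphrase or infer, but the justification it gives still has to resemble something that is actually in the record.

```python
from difflib import SequenceMatcher


def evidence_supports(justification: str, record_text: str, threshold: float = 0.6) -> bool:
    """Weak grounding check: accept a paraphrased justification as long as it
    closely resembles some passage of the source record."""
    just = justification.lower()
    record = record_text.lower()
    if just in record:
        return True  # exact quote: trivially supported
    # Otherwise compare the justification against sliding windows of the record
    # and keep the best similarity score.
    window = max(len(just), 1)
    step = max(window // 2, 1)
    best = 0.0
    for i in range(0, max(len(record) - window, 0) + 1, step):
        best = max(best, SequenceMatcher(None, just, record[i:i + window]).ratio())
    return best >= threshold


record = "Patient reports frequent wheezing and uses a salbutamol inhaler daily."
print(evidence_supports("uses a salbutamol inhaler", record))  # True: near-verbatim
print(evidence_supports("history of heart failure", record))   # False: nothing similar in the record
```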

 

Weaker grounding also suits customer service chatbots. A chatbot restricted to copying slabs of corporate policy would be officious, cold and not particularly useful, simply reproducing text the customer could easily search for themselves. The challenge is to allow the model to respond to the nuance of a customer’s questions without allowing it to completely fabricate an answer.

 

Weak grounding could, however, let a chatbot reference a policy while completely ignoring what that policy says. These risks can be mitigated with a structured approach to synthesis, AI self-reflection, and the use of multiple models to verify responses. A response that lacks sufficient grounding can be a signal that there should be a 'human in the loop.'
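Here is a rough sketch of what that verification loop could look like. The call_llm helper is again a placeholder for whatever LLM API is in use, and in practice the verification pass might use a different model from the one that drafted the answer:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for whichever LLM API you use (to be wired up)."""
    raise NotImplementedError


def answer_with_verification(question: str, policy_text: str) -> dict:
    # First pass: draft an answer that must rely only on the supplied policy.
    draft = call_llm(
        "Answer the customer's question using only the policy below.\n"
        f"Policy:\n{policy_text}\n\nQuestion: {question}"
    )
    # Second pass: a separate check judges whether the draft is actually
    # supported by the policy text (self-reflection, or a second model).
    verdict = call_llm(
        "Does the ANSWER below make any claim that is not supported by the POLICY? "
        "Reply with exactly SUPPORTED or UNSUPPORTED.\n"
        f"POLICY:\n{policy_text}\n\nANSWER:\n{draft}"
    ).strip().upper()

    if verdict != "SUPPORTED":
        # Insufficient grounding is the signal to bring a human into the loop.
        return {"answer": None, "needs_human_review": True, "draft": draft}
    return {"answer": draft, "needs_human_review": False}
```

Anything flagged as unsupported becomes exactly the kind of low-grounding response that should be escalated to a person rather than sent to the customer.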

 

Organisations that harness the potential of AI stand to achieve substantial productivity gains, provided they thoughtfully address the associated risks. AI 'hallucinations' are manageable with the right approach, and by applying rigorous grounding techniques companies can ensure that AI systems deliver reliable, accurate results.

 

Success with AI hinges not just on its ability to generate content, but on how effectively it can be controlled, verified, and trusted. With careful implementation, AI can become a dependable asset that enhances business operations.

 

Want to learn more? Discover how you can automate document processing with Affinda’s AI platform.