Usually, it starts with a screenshot.
If you work in this space full-time, you know the dread. A stakeholder forwards you a chat log where your agent confidently offered a loan amount the bank doesn't actually provide, or promised a refund policy that doesn't exist. Everyone panics. The engineers check the system. The prompt "looked fine." You literally wrote "do not make things up" in all caps. And yet, here we are.
We don't even have to use hypotheticals. We have real, very public data on this. In early 2024, Air Canada had to learn this lesson in front of a civil tribunal. Their customer service chatbot hallucinated a bereavement fare policy, promising a retroactive discount to a grieving passenger. When the passenger tried to claim it, the airline's defense was essentially: it's the bot's fault, we aren't responsible for what it says.
The tribunal disagreed. In their eyes, the chatbot is the company. You are fully liable for your system's output.
Or look at McDonald's pulling the plug on their AI drive-thru tests with IBM a few months later. Voice AI is even less forgiving than text. When a system hallucinates an order or breaks down in a voice channel, there's no UI to hide behind. It's frustrating, it's immediate, and it directly damages the brand.
The immediate business reaction to these incidents is almost always the same: "Fix the prompt." Add more instructions. Tell the model to really, seriously only use the provided context.
But that's a fundamental misunderstanding of what we are building. The problem isn't that the model "lied." The problem is that the architecture didn't differentiate between "brand-approved knowledge" and "knowledge the model has from its training data." To an LLM, it's all just one continuous probability distribution over tokens.
But to a compliance officer? It's the difference between a successful customer interaction and a lawsuit.
If you want to understand why this happens, you have to look at the actual research. We have years of academic literature documenting these failure modes, and it usually boils down to an uncomfortable truth:
LLMs are fundamentally engineered to be helpful, not factual.
Why "prompting harder" fails
When you just hand an LLM some data and tell it to answer questions, you are fighting against the model's core training. Here is what the research actually tells us:
The model is a pathological people-pleaser
Look at the research on sycophancy in language models.
The very mechanism that makes models conversational and pleasant to use actively pushes them to align with a user's beliefs or requests, often at the expense of the truth. If a user aggressively asks for a discount, and the model predicts that agreeing to it will score high on "helpfulness," it will happily invent a policy to make the user happy. This is a known phenomenon called reward hacking.
You cannot prompt away a behavior the model was literally optimized to perform.
The context window illusion
The most common lazy fix is what I call the "dump and pray" method: just throw your entire 100-page policy manual into the context window and trust the model to figure it out. The famous "Lost in the Middle" paper effectively killed this idea. It showed that LLMs do not utilize long contexts evenly. They heavily weight the beginning and the end of a prompt, and their performance degrades severely when the relevant information is buried in the middle.
If your crucial compliance disclaimer is on page 42 of the retrieved context, the model will likely ignore it and guess anyway. Throwing massive amounts of data at a model isn't information architecture; it's just hoping for the best.
The inability to just shut up
In banking, healthcare, or any regulated sector, the most valuable thing a conversational AI can say is, "I don't know." Yet, benchmark after benchmark — like AbstentionBench — shows that models consistently fail at proper abstention. They struggle to identify unanswerable questions. Unless you architecturally force a fallback, the probability distribution will eventually drag the model into making a confident guess.
Even standard RAG is not a silver bullet
A lot of people think RAG (Retrieval-Augmented Generation) is the final answer to grounding. It isn't. An OpenAI whitepaper from 2025 on why models hallucinate spells this out clearly: RAG helps, but it is not a panacea. If your retrieval system pulls the wrong chunk of text, your model will just synthesize a beautifully coherent, completely wrong answer based on bad data.
As frameworks like Corrective RAG demonstrate, RAG without strict quality control on the retrieved context can actually introduce errors into the conversation.
The takeaway here isn't that AI is broken. The takeaway is that relying on the model's internal "judgment" to strictly follow your business rules is a losing game.
The problem is structural
Therefore, the solution has to be structural.
People in the industry keep using the word "hallucination" like the model is having some sort of glitch or a psychotic break. It's not. When an LLM invents a brand-new return policy for your e-commerce store, it is functioning exactly as designed.
To understand why, we need to clearly define the problem. And to do that, we have to stop treating language models like databases or reasoning engines, and start treating them like what they are: probability distributions over tokens.
An LLM does not have a mental filing cabinet where it separates "Real Corporate Facts" from "Things I Read on Reddit in 2023." To the model, it is all just one massive, flat continuum of statistical weights. Your carefully vetted, legally approved, 50-page PDF on mortgage rates lives in the exact same neural soup as a sci-fi novel.
So, when a user asks a question, the model doesn't query a truth table. It calculates the most statistically plausible next word. The problem isn't that the model "lied." A model cannot lie because it has no fundamental concept of the truth.
The actual problem is a failure of Information Architecture.
Specifically, it's a failure to draw a hard architectural boundary between what the model can say and what the model actually knows. When we build enterprise conversational AI, we have to separate the generative engine (the part that formats the text, maintains the state of the conversation, and makes it sound human) from the informational engine (the actual, hard facts the business operates on).
When you don't build that separation, you run into two massive conceptual roadblocks:
1. The Knowledge Blending Problem. If you don't physically constrain the model's access to information, it will blend its parametric memory (the stuff it was trained on) with your brand's data. If a customer asks about a specific insurance premium, the model might start with your retrieved document, hit a gap in the information, and seamlessly fill that gap with a highly plausible — but entirely fictitious — number it pulled from its latent space. From a user experience perspective, it looks like a confident, factual answer. From a legal perspective, it's a disaster.
2. The Out-of-Scope (OOS) Failure. In a regulated sector, the phrase "I don't know, let me transfer you" is not a system failure. It is a highly successful, legally necessary outcome. We call this Abstention. The issue is that generative models are structurally allergic to abstention. They are designed to generate text. If a user asks a banking bot, "What's the best stock to buy right now?", the bot shouldn't just politely decline using a generative response that might accidentally include a caveat that sounds like financial advice. It shouldn't be generating an answer at all.
This brings us to the core thesis of designing for regulated AI: Compliance is not a generative output. It is a routing decision.
If you want a system that doesn't hallucinate your company's policies, you have to build an architecture where the system simply cannot reach outside of your approved knowledge base. Not won't. Can't.
The "fix the prompt" trap
So how does the industry usually solve this? With the cheapest, fastest, and most dangerously lazy fix available: We try to prompt our way out of it.
When the stakeholder sends that dreaded screenshot of a hallucinated loan offer, the immediate reflex is to open the system prompt and add a sternly worded rule. It usually looks something like this:
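A representative example, with wording invented for illustration:

```
STRICT RULES:
- ONLY use the information provided in the context below.
- DO NOT make up, guess, or infer any facts.
- If the answer is not in the context, say "I don't know."
- NEVER mention loan amounts, rates, or policies not explicitly listed.
```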
You deploy the update. You run a few test queries. The bot politely declines to answer questions about the weather, it sticks to the pricing sheet, and it says "I don't know" when asked about unreleased products. It works. The ticket is closed. You high-five the team.
And then week three in production hits.
A user comes in with a messy, multi-intent question, complains about their current interest rate, and suddenly your bot is back to inventing promotional tier upgrades that don't exist.
The twist here is that the prompt didn't break. It just fell apart under the reality of how language models actually work. You tried to solve a structural engineering problem with a strongly worded Post-it note.
You cannot ask a model to simply promise not to lie. You have to build a system where lying isn't on the menu.
The solution: architectural grounding
If we accept that we cannot prompt a model to be factual, it forces a complete mental shift in how we build these systems.
You have to stop thinking of the LLM as the "brain" of your operation. In a regulated environment, the LLM should be treated more like the vocal cords. It is there to format, to synthesize, and to maintain the conversational state. It is not there to store or retrieve facts.
The right approach requires architectural grounding. This means physically separating the sources of knowledge from the model's generative capabilities. You don't ask the agent to "keep the approved documents in mind." You build a pipeline where the model is functionally blind to anything outside of the exact evidence package you hand it for that specific conversational turn.
From "Generating" to "Reporting"
In a poorly designed system, a user asks: "Can I get a $50,000 business loan?" The system retrieves a chunk of text about business loans, feeds it to the LLM, and the LLM synthesizes a response, leaning on its parametric memory to fill in the blanks. It acts like a consultant guessing the answer based on a brochure.
In a grounded architecture, the process is entirely different.
First, the system classifies the intent. It realizes the user is asking about loan limits. The retrieval system queries the approved database. The database returns a strict, deterministic fact: [LOAN_LIMIT: MAX $30,000].
Now, the system passes a highly constrained task to the LLM. It doesn't ask the LLM to answer the user's question. It asks the LLM to report the retrieved fact in a conversational tone. The instruction isn't "be helpful." The instruction is: "State the maximum loan amount is $30,000. Do not add additional information."
If the model tries to say $50,000, a secondary evaluator (another smaller model or a deterministic script) can instantly flag that the output does not match the retrieved fact, block the response, and trigger a safe fallback.
Abstention by design, not by choice
The most powerful feature of a grounded system is how it handles the unknown.
Let's say the user asks: "Do you offer crypto-backed loans?" In a prompt-only setup, the model searches its context, finds nothing about crypto, panics because it wants to be helpful, and writes a three-paragraph essay about the volatility of Bitcoin before saying "probably not."
In an architecturally grounded system, the retrieval mechanism searches the approved knowledge base for "crypto." It returns a null result.
Here is the crucial insight: At this point, the LLM is cut out of the loop.
The system does not send the null result to the generator and hope it says "I don't know" politely. Instead, the system intercepts the flow. Because there is no approved knowledge, the system triggers a hardcoded, deterministically routed action. It serves a pre-written, legally approved string: "We do not offer cryptocurrency services. We only provide traditional fiat business loans."
No tokens are generated. No risk is taken. The lack of knowledge isn't a generative accident; it is an explicit, system-level routing decision.
When you build like this, you trade a little bit of conversational fluidity for a massive amount of control. You are building deterministic boundaries inside a probabilistic system.
It solves the auditability problem instantly. If a user receives a wrong answer, you don't have to guess what the LLM was "thinking." You just look at the logs. Did the retrieval system pull the wrong document? Did the router misclassify the intent? Did the generator ignore the constraint evaluator?
You have a trace. You have evidence. You have an architecture that treats brand safety as a hard engineering requirement rather than a polite suggestion.
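What such a per-turn trace might capture, with field names invented for this sketch:

```python
# Illustrative shape of one auditable turn record. Each question from the
# incident review ("wrong document? misrouted intent? evaluator miss?")
# maps to a field you can inspect in the logs.
trace = {
    "turn_id": 42,
    "stage": "Pricing",
    "intent": "loan_limit",
    "retrieved": ["Tier 1 Pricing"],          # which approved docs were pulled
    "oos_triggered": False,                   # did the OOS policy fire?
    "evaluator": {"passed": True, "expected": "$30,000"},
    "final_source": "generator",              # or "prescribed_fallback"
}
```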
The Approved Knowledge Architecture
So how do you actually build this? You don't just glue a vector database to an LLM, write a system prompt, and call it a day. That is a prototype, not a production system.
To survive in a regulated environment, you need a pipeline that enforces these boundaries at every step of the conversation. I use a structure I call the Approved Knowledge Architecture.
It removes the model's ability to improvise by breaking the system down into three distinct, controllable layers: the Knowledge Base, Stage-Level Grounding, and the Out-of-Scope Policy.
Layer 1: The Approved Knowledge Base
When most people say "Knowledge Base" in the context of AI, they mean a messy vector store filled with scraped websites and massive, unedited PDFs.
That is your first point of failure. If you feed the system garbage, you will get highly articulate, confidently hallucinated garbage out.
An Approved Knowledge Base is not a data dump. It is a strictly governed taxonomy of facts. Every piece of information in this layer has been vetted, categorized, and mathematically separated. Instead of a 50-page pricing PDF, you have specific "Items" (e.g., Tier 1 Pricing, Enterprise SLA) organized into "Categories" (e.g., Offer, Support, Compliance).
More importantly, every item is tagged with a scope. This dictates exactly who can access it and when. If a piece of knowledge isn't explicitly approved and tagged in this layer, the system physically cannot retrieve it. It doesn't exist to the agent.
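A minimal sketch of such a governed item, assuming invented field names and scopes:

```python
# Every approved fact is a vetted, categorized, scope-tagged item.
from dataclasses import dataclass

@dataclass(frozen=True)
class KnowledgeItem:
    name: str          # e.g. "Tier 1 Pricing"
    category: str      # e.g. "Offer", "Support", "Compliance"
    scope: frozenset   # who may be served this item, and when
    text: str          # the vetted, approved wording

ITEMS = [
    KnowledgeItem("Tier 1 Pricing", "Offer",
                  frozenset({"enterprise"}), "Tier 1 starts at ..."),
    KnowledgeItem("Enterprise SLA", "Support",
                  frozenset({"enterprise"}), "99.9% uptime commitment."),
]

def retrievable(item: KnowledgeItem, caller_scope: str) -> bool:
    # If an item isn't tagged for this scope, it does not exist to the agent.
    return caller_scope in item.scope
```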
Layer 2: Stage-Level Grounding
This is where we fix the "context illusion."
In a standard setup, every time the user asks a question, the system searches the entire knowledge base. If a user says, "What's the cost?", the system might retrieve the price of a consumer loan, the price of a corporate credit card, and a marketing blog post about "the cost of doing nothing." The LLM gets confused and mashes them together.
Stage-Level Grounding solves this by making the retrieval context-aware. You define the stages of your conversational flow (e.g., Greeting, Qualification, Pricing, Closing). The system only grounds the model in the knowledge approved for that specific stage.
If the conversation is in the Pricing stage, the retriever is locked into the Offer category. It is completely blind to the Support or Marketing categories. By narrowing the aperture of what the model can see at any given turn, you drastically reduce the cognitive load on the LLM and eliminate the chance of it cross-contaminating facts. You aren't just controlling what the model reads; you are controlling when it reads it.
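A sketch of stage-locked retrieval, with stage and category names invented for illustration:

```python
# The retriever only searches categories approved for the current stage.
STAGE_CATEGORIES = {
    "Greeting":      set(),
    "Qualification": {"Qualification Process"},
    "Pricing":       {"Offer"},
    "Closing":       {"Offer", "Compliance"},
}

def stage_retrieve(query: str, stage: str, index: list[dict]) -> list[dict]:
    allowed = STAGE_CATEGORIES.get(stage, set())
    # The model is blind to every document outside the allowed categories.
    candidates = [doc for doc in index if doc["category"] in allowed]
    return [doc for doc in candidates if query.lower() in doc["text"].lower()]

index = [
    {"category": "Offer",     "text": "Consumer loan cost: 5.9% APR"},
    {"category": "Marketing", "text": "The cost of doing nothing"},
]
hits = stage_retrieve("cost", "Pricing", index)
# only the Offer document survives; the marketing post is invisible
```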
Layer 3: The Out-of-Scope (OOS) Policy
This is the most critical layer for brand safety, and it is the one most companies completely ignore.
What happens when the user asks a question that has no answer in the Approved Knowledge Base? Or what happens if they ask something malicious, off-topic, or highly regulated — like asking for specific investment advice?
You do not let the model handle it.
The Out-of-Scope Policy is an explicit routing mechanism that bypasses the generative engine entirely. When the retrieval system hits a wall, or an intent classifier flags the topic as restricted, the system triggers a hardcoded action.
You define exactly what that action is:
- The Prescribed Response: The system outputs a pre-approved legal string. "I am a qualification assistant and cannot provide financial advice. Please consult your advisor."
- The Redirect: The system forcefully steers the conversation back to the active stage. "I don't have information on our future product roadmap, but I can help you set up your current account. Would you like to proceed?"
- The Escalation: The system silently creates a ticket and hands the thread over to a human agent, ending the AI's involvement.
This layer is the ultimate fail-safe. It guarantees that when the system doesn't know the answer, it fails predictably, safely, and quietly.
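The three actions above can be sketched as an explicit dispatch table (trigger names and messages are illustrative):

```python
# The OOS policy bypasses the generative engine entirely: every trigger
# maps to a deterministic action, never to free-form generation.
from enum import Enum

class OOSAction(Enum):
    PRESCRIBED = "prescribed"
    REDIRECT = "redirect"
    ESCALATE = "escalate"

OOS_POLICY = {
    "restricted_topic": (OOSAction.PRESCRIBED,
        "I am a qualification assistant and cannot provide financial "
        "advice. Please consult your advisor."),
    "off_topic": (OOSAction.REDIRECT,
        "I don't have information on that, but I can help you set up "
        "your current account. Would you like to proceed?"),
    "needs_human": (OOSAction.ESCALATE, None),
}

def handle_oos(trigger: str) -> str:
    action, message = OOS_POLICY[trigger]
    if action is OOSAction.ESCALATE:
        # create_ticket_and_handoff()  # hypothetical helper: human takes over
        return "escalated"
    return message  # deterministic string, no tokens generated
```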
If there is one thing you take away from this architecture, let it be this: In a properly designed conversational AI, a lack of knowledge is a strict architectural decision. It is never a generative accident.
How it looks inside Bonsai
If you are building this by hand, gluing together Python scripts and LangChain components, it gets messy fast. In my daily work, I use Bonsai. We designed specific modules in the platform precisely to enforce the Approved Knowledge Architecture, stripping away the LLM's ability to "guess" and replacing it with deterministic injections.
Here is exactly how you map the architecture into Bonsai using its core modules.
1. The Knowledge Base: Routed RAG, not blind RAG
Most people treat RAG like a search engine: the user asks a question, the system runs a vector search across a massive pile of documents, and whatever comes back gets dumped into the prompt. That is how you get hallucinations based on bad context.
Bonsai's Knowledge Base module is built differently. It relies on a hybrid search engine (Semantic Routing + Embeddings) tied to explicit Keys. Instead of overloading your System Prompt or relying on blind vector proximity, you categorize your static knowledge, long documents, and structured data under these Keys. When a user asks a question, the system evaluates the intent and semantic routing triggers the specific Key.
Only the exact, verified fragment of knowledge required for that specific turn is injected dynamically into the {{knowledge}} tag in your prompt.
Why does this matter? Because of the boundary. The LLM cannot "accidentally" read the consumer pricing tier when talking to an enterprise client, because the enterprise Key was triggered and the consumer data simply wasn't passed to the {{knowledge}} tag. It drastically saves your context window, and more importantly, it physically isolates the model from irrelevant data.
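To make the idea concrete without reproducing Bonsai's actual API, here is a generic sketch of key-routed injection into a {{knowledge}} tag (key names and content are invented):

```python
# Only the routed fragment is ever spliced into the prompt; everything
# else in the knowledge store stays physically out of reach.
KEYED_KNOWLEDGE = {
    "enterprise_pricing": "Enterprise plans start at custom annual contracts.",
    "consumer_pricing":   "Consumer plans start at $19/month.",
}

PROMPT_TEMPLATE = (
    "Answer using ONLY the knowledge below.\n"
    "Knowledge: {{knowledge}}\n"
    "User: {{user}}"
)

def build_prompt(triggered_key: str, user_msg: str) -> str:
    fragment = KEYED_KNOWLEDGE[triggered_key]  # routing picked exactly one key
    return (PROMPT_TEMPLATE
            .replace("{{knowledge}}", fragment)
            .replace("{{user}}", user_msg))

prompt = build_prompt("enterprise_pricing", "What does it cost?")
# consumer pricing never enters the prompt, so the model cannot leak it
```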
2. Context Modules: Dictionary vs. Glossary
To stop a model from lying, you have to separate hard facts from linguistic understanding. Bonsai splits the Context module into two distinct sub-modules to enforce this:
The Dictionary (Hard Constants). Do not ever let an LLM generate a number from its own memory. Interest rates, maximum loan amounts, product names, and penalty fees do not belong in a prompt. They belong in the Dictionary. This is a repository of project constants with strictly defined data types. If the current promotional APR is 5.9%, it lives here. It is injected directly into the prompt as a variable. If the compliance team changes the rate to 6.1% on a Tuesday morning, you update it in the Dictionary (or via API), and the agent is instantly grounded in the new reality. Zero prompt engineering required.
The Glossary (Dynamic NLP Definitions). LLMs have their own generalized understanding of words, which often conflicts with your highly specific corporate jargon. To the LLM, a "grace period" might mean a month. To your bank, it means exactly 15 days. The Glossary is an intelligent NLP dictionary. Based on keyword matching or AI text analysis of the user's input, it dynamically appends your approved corporate definitions to the end of the prompt under the {{glossary}} tag. The model doesn't just get the user's question; it gets a forced, real-time education on exactly what the terms in that question mean in your business context.
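A rough sketch of both injections working together (the data, tag names, and matching logic are invented stand-ins for the modules described above):

```python
# Dictionary constants are injected as variables; glossary definitions are
# appended only when the user's message actually touches those terms.
DICTIONARY = {"promo_apr": "5.9%", "max_loan": "$30,000"}  # hard constants

GLOSSARY = {
    "grace period": "Grace period: exactly 15 calendar days after the "
                    "due date, per the approved corporate definition.",
}

def inject_context(prompt: str, user_msg: str) -> str:
    # Constants come from the repository, never from the model's memory.
    for name, value in DICTIONARY.items():
        prompt = prompt.replace("{{" + name + "}}", value)
    # Matching definitions are appended under a glossary section.
    defs = [d for term, d in GLOSSARY.items() if term in user_msg.lower()]
    return prompt + ("\nGlossary:\n" + "\n".join(defs) if defs else "")

out = inject_context("Current promotional APR is {{promo_apr}}.",
                     "What is the grace period?")
# out contains "5.9%" plus the approved 15-day definition
```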
3. Prompting for architecture: Limitations & Golden Rules
Once you have the data structurally isolated via the Knowledge Base and Context modules, your System Prompt completely changes. It stops being a database of facts and becomes a set of routing instructions.
In Bonsai, we structure these instructions using explicit Limitations and Golden Rules. You aren't asking the model to "be a nice assistant." You are giving it a behavioral contract.
Limitations are hard negative constraints tied directly to the injected data tags. You aren't asking the model to be careful; you are telling it exactly what it is forbidden to do unless specific data is present:
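For example, with wording invented for illustration:

```
LIMITATIONS:
- You may state prices, rates, or amounts ONLY if they appear in
  {{knowledge}} or as Dictionary variables. Otherwise, name no number.
- You may describe a product ONLY if it appears in {{knowledge}} for
  this turn. Products not present do not exist.
```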
Golden Rules dictate the structural flow, particularly for your Out-of-Scope Policy. This is how you force abstention at the prompt level — after the routing layer has already done its job:
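For example, in invented wording:

```
GOLDEN RULES:
1. If {{knowledge}} is empty for the user's question, respond with the
   prescribed out-of-scope message verbatim. Do not improvise.
2. Never answer outside the active conversation stage; redirect the user
   back to it instead.
```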
When you combine these techniques, the system becomes incredibly rigid — in a good way. The prompt sets the rules, the routing engine fetches the exact facts, the tags inject them safely, and the LLM acts merely as a linguistic synthesizer.
You aren't hoping the model tells the truth. You are architecting a system where it simply has no other materials to work with.
Real world: from speculation to qualification
Let's look at what this looks like in the real world. A B2B SaaS client had a standard goal: use an AI agent to qualify incoming leads via chat. It sounded like a simple "marketing project" until they realized their agent was effectively serving as a rogue sales engineer.
The Problem: The "Know-it-all" Agent. Before we stepped in, the agent was performing well on generic small talk but failing on high-stakes business questions. Because it had been trained on the entire internet's worth of SaaS documentation, it felt perfectly comfortable "filling in the gaps."
When a lead asked, "How long will it take to migrate our 5TB database?" or "What are the hidden costs of your API integration?", the agent didn't have access to the client's internal migration protocols. Instead, it pulled "general knowledge" from its training data. It invented migration timelines. It speculated on integration costs. It was being "helpful," but it was also providing false consulting advice that the sales team then had to spend hours cleaning up.
We rebuilt the agent's brain using the architecture we've been discussing. We didn't tell the agent to "be more careful" when talking about pricing. We locked it down.
- Strict Taxonomy. We set up a Knowledge Base with exactly three categories: Offer, Qualification Process, and FAQ. We didn't allow the agent to look at anything else.
- The Pricing Guardrail. We implemented a hard routing rule in the Out-of-Scope Policy. Any query related to custom integration costs or specific implementation timelines was automatically tagged as OOS_PRICING.
- Deterministic Injection. We moved all valid data — like standard feature sets and qualification criteria — into the Dictionary. When the AI needed to schedule a call, the agent didn't "think" about availability. It used a webhook with office hours and timezone from the Dictionary to pull real available slots directly from the external calendar.
The Result: From Speculation to Qualification. The transformation was immediate. When the agent hit a question about migration timelines — something it didn't have specific, authorized data for — it no longer hallucinated a "three-week window." Instead of guessing, it followed the OOS_PRICING route and returned to qualifying the lead.
We didn't "fix the model." We simply stopped treating the model like a human consultant who needed to be smarter. We started treating it like a system that only speaks when it has the right data in front of it.
That is the difference between a cool AI toy and a piece of enterprise software that you can actually trust with your brand.
