Building a Vocabulary for AI Assurance
Part II: Establishing Ground Truth via Verifiability and Accuracy
Part I in our series discussed the “why” behind model behavior, delving into the notions of explainability and interpretability. In some ways, however, beginning by investigating the reasons that a model produced a specific output is putting the cart before the horse:
1. It only makes sense to analyze the relationship between a model and its output if one is certain that the output in question was authentically produced by that model in response to a given input.
2. The usability of an output is a function of its correctness; a model that routinely produces factual errors or logical inconsistencies is of lower value and less worthy of investigation.
This post dives into these two prerequisites: (1) Verifiability and (2) Accuracy.
Verifiability
Defining Verifiability
Verifiability is the ability to prove the authenticity of model processes such as inference or training. In the case of inference, the party running inference provides a proof that a given output did indeed come from the model it is claimed to have come from. The model user can then cryptographically confirm this proof, gaining trust in the legitimacy of the output and the confidence to base important decisions on it.
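Real verifiable-inference systems rely on zero-knowledge proofs or hardware attestation (discussed below). As a minimal sketch of just the prove-and-verify flow, the toy example below substitutes an ordinary digital signature: the provider signs a (model hash, input, output) record, and the user checks the signature and the claimed model identity. This only shows the binding and still requires trusting the signer; all function names are illustrative, not from any real verification library.

```python
# Toy sketch of signature-based inference attestation. Real systems use
# zero-knowledge proofs or TEE attestation; this version only illustrates
# the prove/verify flow and still requires trusting the signing party.
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey


def sign_inference(private_key, model_weights: bytes, prompt: str, output: str) -> dict:
    """Provider side: bind the output to a specific model and input."""
    record = {
        "model_hash": hashlib.sha256(model_weights).hexdigest(),
        "prompt": prompt,
        "output": output,
    }
    message = json.dumps(record, sort_keys=True).encode()
    return {"record": record, "signature": private_key.sign(message).hex()}


def verify_inference(public_key, attestation: dict, expected_model_hash: str) -> bool:
    """User side: check the claimed model identity, then the signature."""
    if attestation["record"]["model_hash"] != expected_model_hash:
        return False
    message = json.dumps(attestation["record"], sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(attestation["signature"]), message)
        return True
    except InvalidSignature:
        return False


# Usage: the provider publishes its public key and the audited model's hash.
key = Ed25519PrivateKey.generate()
weights = b"...model weights..."
att = sign_inference(key, weights, "Approve this loan?", "No: debt ratio too high.")
assert verify_inference(key.public_key(), att, hashlib.sha256(weights).hexdigest())
```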
Verifiability Use Cases
One application of verifiability concerns model training data. There are several reasons why proving properties of the training process, including which datasets were used, is valuable.
First, while LLMs perform well on topics covered in their training data, they may respond with the same level of certainty on topics outside it. Validating whether a topic is represented in the training dataset can increase confidence in responses and signal to the user when calls to external tools or resources are needed. Moreover, for data rights holders, the ability to discern whether copyrighted material was used to train the model that produced a given output enables airtight auditability. This is particularly important for high-value multimodal assets such as audio and video.
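One building block for such auditing is a cryptographic commitment to the training corpus, for example a Merkle tree over document hashes: the trainer publishes the root, and an auditor can later check a compact proof that a particular document was among the committed leaves. The sketch below illustrates only the commitment and membership check; tying the commitment to the actual training run requires heavier machinery (e.g., proofs of training) and is out of scope here.

```python
# Minimal Merkle-tree commitment over a training corpus (illustrative only;
# production schemes also need canonicalization, deduplication, and
# non-membership proofs).
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves: list[bytes], index: int):
    """Return the root and the sibling path proving leaves[index] is included."""
    layer = [h(leaf) for leaf in leaves]
    proof = []
    while len(layer) > 1:
        if len(layer) % 2:  # duplicate the last node on odd-sized layers
            layer.append(layer[-1])
        # Record the sibling hash and whether our node is the right child.
        proof.append((layer[index ^ 1], index % 2))
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return layer[0], proof


def verify_membership(root: bytes, leaf: bytes, proof) -> bool:
    node = h(leaf)
    for sibling, is_right_child in proof:
        node = h(sibling + node) if is_right_child else h(node + sibling)
    return node == root


corpus = [b"doc-a", b"doc-b", b"doc-c", b"doc-d"]
root, proof = merkle_root_and_proof(corpus, 2)   # trainer publishes `root`
assert verify_membership(root, b"doc-c", proof)  # auditor checks inclusion
```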
In the case of inference verification, an application powered by a specific model generates a proof that its output originated from the model the application claims to be running. The application user then checks the validity of that proof to gain assurance that the hosted model is as claimed. For high-consequence use cases such as loan approvals or the deployment of military resources, it is critical that decisions come from the same models that were rigorously tested for those purposes.
Verifiability Technology
These proofs can be generated and validated via cryptographic techniques such as zero-knowledge proofs, or via hardware-based guarantees such as trusted execution environments. The surge in AI over the past few years has motivated advances in both, enabling them to be deployed in an increasing number of situations. Because inference is far less computationally intensive than training, inference verification tools are more readily usable today.
Why Verifiability Matters
In a nutshell: Given the opacity of third-party model providers and the differences in model performance across providers, confirmation of a model’s identity is critical when the model’s output is being used for high-consequence decisions.
Accuracy (and Hallucinations)
Defining Accuracy
Accuracy means giving correct answers to objective questions and being truthful when making assertions.
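In practice, accuracy is estimated against a labeled benchmark: pose objective questions, compare the model's answers to known references, and report the fraction correct. The sketch below assumes a hypothetical ask_model inference call (stubbed with canned answers so the example runs end to end) and a deliberately crude string normalization.

```python
# Minimal accuracy evaluation over objective QA pairs. `ask_model` is a
# hypothetical stand-in for a real inference call; here it is stubbed so
# the sketch is self-contained.
def ask_model(question: str) -> str:
    canned = {
        "What is the capital of France?": "Paris.",
        "What is 17 * 3?": "51",
    }
    return canned.get(question, "I don't know")


def normalize(text: str) -> str:
    # Crude normalization; real evals need fuzzier matching or graded scoring.
    return " ".join(text.lower().strip().rstrip(".").split())


def accuracy(benchmark: list[tuple[str, str]]) -> float:
    """Fraction of answers matching the reference after normalization."""
    correct = sum(normalize(ask_model(q)) == normalize(ref) for q, ref in benchmark)
    return correct / len(benchmark)


benchmark = [
    ("What is the capital of France?", "Paris"),
    ("What is 17 * 3?", "51"),
]
print(f"accuracy = {accuracy(benchmark):.0%}")  # 100% with the stub above
```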
Why Accuracy Matters
Models need to give correct information if we are to rely on them for decisions. Moreover, LLMs tend not to proactively share their level of certainty in a response: they sound equally confident about a simple fact looked up from a reputable source (e.g., via tool use) and about a result hallucinated out of thin air. Because gauging an LLM's confidence is challenging, it is doubly important that its declarative assertions be highly accurate.
Failure Mode Examples
While some model inaccuracies are straightforward to spot, others are subtle enough to evade casual inspection and can persist undetected in production systems.
Models may give objectively incorrect answers to factual or logical questions, particularly when the correct answer is rare, counterintuitive, or underrepresented in training data. In high-stakes settings, even small factual errors can cascade into materially wrong decisions.
A more insidious failure type is fabricated supporting evidence. Models may cite books or other resources that do not exist, or confidently attribute claims to real sources that never made them. This problematic behavior has already manifested in academic journals.
Even when legitimate sources are referenced, models may misrepresent their contents, selectively or inaccurately summarizing them in ways that support the model's conclusion rather than reflecting the source material. Because the citations look plausible, these errors are often discovered only through careful manual verification, if at all.
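Fabricated references are at least partly machine-checkable. As a sketch, assuming the cited works carry DOIs, the snippet below queries Crossref's public REST API (api.crossref.org/works/{DOI}) to confirm that a DOI resolves to a real record and that its registered title roughly matches the model's claim. Note that it cannot catch the subtler failure just described, the misrepresentation of a real source; the example DOIs are chosen purely for illustration.

```python
# Sketch: screen model-cited DOIs against Crossref's public REST API.
# This catches nonexistent references, not misrepresented real ones.
import requests


def check_doi(doi: str, claimed_title: str) -> str:
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if resp.status_code == 404:
        return "FABRICATED: DOI not found"
    resp.raise_for_status()
    real_title = resp.json()["message"]["title"][0]
    # Crude substring match; real pipelines would compare authors/venue too.
    if claimed_title.lower() not in real_title.lower():
        return f"SUSPECT: DOI resolves to a different work ({real_title!r})"
    return "FOUND: DOI and title match"


print(check_doi("10.1038/nature14539", "Deep learning"))       # a real paper
print(check_doi("10.9999/does-not-exist", "Imaginary Study"))  # fabricated
```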