Verification Debt: Why AI’s Speed Creates Technical Risk
I’ve been writing code, managing software teams, and running software companies for almost 50 years. The scope of my efforts has ranged from bestselling entertainment products to test tools that validate many of the consumer and business products you use every day. The most impactful technical development in my long career is AI code generation, which is rapidly reaching the point where I doubt I will ever need to write another line of code.
This is good news and bad news. The good news is obvious — anyone with the ability to think logically can prompt an AI to generate an application to satisfy whatever need they have. The bad news is that AI-generated programs don’t always live up to our quality expectations. The paradox is that the slow, tedious act of writing code is one of the very things that helps ensure software lives up to those expectations. This is one of the great unsolved problems with AI code generation, and what I’d like to explore with you here.
Verification Debt: A Simple Example
Let’s start with a concrete example. I decide it would be interesting to get AI to help me with my stock trading. My brokerage firm has a trading API, so I put together a simple prompt: “I want to make money in the stock market. Write an application that monitors market conditions and makes buy and sell decisions using my brokerage company’s API.”
A few hours later I have a shiny new application all ready to make me rich. The problem is that the AI had to make hundreds of assumptions about my intent. Technical or fundamental analysis for market conditions? Stocks only, or options and short sales as well? What are the buy and sell thresholds? Should holdings be diversified, or can everything go into a single position? And on and on.
I try the application at small scale and it seems to work fine, but I worry. I look at the code, but it’s hard to follow. I write some basic test cases, but the best they can do is confirm the application behaves consistently — not that it’s behaving correctly. I start increasing the size of the investments I allow it to make. You can see where this is going: at some point, an incorrect assumption baked into the AI’s interpretation of my vague prompt gets triggered — with financial consequences.
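To make that distinction concrete, here is a minimal sketch of the problem. The function, the 5% threshold, and the assertion values are all hypothetical stand-ins for logic buried somewhere in the generated application:

```python
# Hypothetical stand-in for one decision buried in the AI-generated bot.
def should_buy(price: float, moving_avg: float) -> bool:
    # The AI silently assumed "buy on a 5% dip below the moving average."
    # I never specified that threshold; it was baked in for me.
    return price < moving_avg * 0.95

def test_buy_signal_is_consistent():
    # This passes today and will pass on every future run. It proves the
    # behavior is repeatable, not that a 5% dip is the right trigger,
    # or that dips should trigger buys at all.
    assert should_buy(94.0, 100.0)
    assert not should_buy(96.0, 100.0)
```

The test pins down what the code does, not what it should do. Verifying the latter requires knowing which assumptions the AI made, and I don't.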
What I just described is “verification debt” — the accumulated cost of not verifying that the assumptions the AI made while interpreting a loosely worded prompt are actually correct. The bad news is that no matter how much effort you invest in paying down that debt after the fact, some level of ambiguity will persist.
The Specification-Driven Solution — and Its Problems
Today’s conventional wisdom is “specification-driven” AI development. The idea is that you create a machine-readable definition of exactly what you want your application to do — not user interface choices, but business rules, inputs, outputs, and required behaviors. The specification becomes a contract between you and the AI, defining unambiguously what you want, and it doubles as the basis for verifying that what the AI produces actually matches what you asked for.
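To make “machine-readable” concrete, here is a hedged sketch of what such a contract might look like for the trading example. The field names, values, and checking function are all illustrative, one possible way of turning silent assumptions into explicit, verifiable rules:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TradingSpec:
    analysis: str = "technical"       # vs. "fundamental"
    instruments: tuple = ("stocks",)  # no options, no short sales
    buy_dip_pct: float = 0.05         # buy 5% under the 20-day average
    sell_gain_pct: float = 0.10       # sell after a 10% gain
    max_position_pct: float = 0.20    # no holding above 20% of portfolio
    min_positions: int = 5            # stay diversified

def check_portfolio(weights: dict[str, float], spec: TradingSpec) -> list[str]:
    """The same contract that constrains generation drives verification:
    return every way a portfolio of position weights violates the spec."""
    violations = []
    if len(weights) < spec.min_positions:
        violations.append(f"{len(weights)} positions; spec requires {spec.min_positions}")
    for symbol, weight in weights.items():
        if weight > spec.max_position_pct:
            violations.append(f"{symbol} is {weight:.0%}; cap is {spec.max_position_pct:.0%}")
    return violations

print(check_portfolio({"ACME": 0.60, "GLOBEX": 0.40}, TradingSpec()))
```

Every field answers one of the questions the AI would otherwise have answered for me.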
Sounds great in theory, but there are two significant problems. First, defining a specification with this level of precision requires skills that are genuinely rare in the software industry. It is hard. Second, to fill in all those details, you need deep domain knowledge upfront — which is rarely available when you’re building something new. What most teams produce is a “specification” that reads more like a verbose market requirements document than a true machine-readable specification. An improvement, certainly, but it doesn’t move the needle much in terms of getting deterministically high-quality output from AI.
How Things Worked Before AI
I’ve done countless projects where I started with zero domain expertise — mesh networking, 3D imaging models, non-trivial technical spaces. Speaking from personal experience: the programmer is not just a code producer. They’re a knowledge acquisition system.
The act of implementing a feature forces direct confrontation with domain reality: an unanticipated edge case, a data structure that turns out to be wrong when you try to use it, two requirements that appear independent until you try to satisfy both simultaneously. Each of these is a feedback signal that updates the developer’s mental model of the domain. The code that emerges is not just an implementation — it’s crystallized learning. In this model, the specification is never really complete upfront; it’s continuously refined by the act of implementation.
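Here is a small illustration of that last kind of friction: two rules that look independent on paper until the implementation forces them together. The rules and the numbers are invented for the trading example:

```python
MIN_POSITIONS = 5             # requirement 1: stay diversified
MIN_POSITION_VALUE = 1_000.0  # requirement 2: no positions too small to matter

def allocate(account_value: float) -> float:
    """Split an account evenly across the required number of positions."""
    per_position = account_value / MIN_POSITIONS
    if per_position < MIN_POSITION_VALUE:
        # A $4,000 account cannot satisfy both rules at once. Writing this
        # branch is the moment the mental model, and the specification,
        # get updated: which rule yields, and below what account size?
        raise ValueError(
            f"can't hold {MIN_POSITIONS} positions of at least "
            f"${MIN_POSITION_VALUE:,.0f} with ${account_value:,.0f}"
        )
    return per_position
```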
The Paradox with AI-Generated Code
With AI-generated code, this feedback loop is disrupted. The AI produces a plausible implementation faster than you would have encountered the domain friction that would have taught you something. The learning that should have happened during implementation either doesn’t happen at all, or surfaces later — expensively — in QA or production.
There is no direct AI analog for the organic domain knowledge accumulation that traditional development forces. Two practical approaches are emerging. One is to let the specification and code co-evolve through deliberate iteration: use AI to implement thin vertical slices — one complete workflow end to end — and let the friction those slices generate inform the next iteration of the specification. The other is to use AI as a domain interrogator: rather than asking it to generate code, ask it to challenge your current specification, identify ambiguities, propose edge cases, and surface constraint conflicts. The AI’s broad domain familiarity makes it surprisingly useful in this role.
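As a sketch of the first approach, here is what one thin vertical slice of the trading example might look like: a single “evaluate a symbol, maybe place an order” workflow, end to end, with the brokerage stubbed out. The names and the 5% rule are assumptions, not anyone’s real API:

```python
def get_quote(symbol: str) -> float:
    return {"ACME": 94.50}.get(symbol, 100.0)  # stub: canned quote

def get_moving_average(symbol: str) -> float:
    return 100.0                               # stub: canned 20-day average

def place_order(symbol: str, qty: int) -> None:
    print(f"BUY {qty} {symbol}")               # stub: log instead of trading

def run_slice(symbol: str, cash: float) -> None:
    price = get_quote(symbol)
    if price < get_moving_average(symbol) * 0.95:
        qty = int(cash // price)
        if qty == 0:
            # Friction the slice surfaced: the prompt never said what to
            # do when cash can't cover a single share. That question goes
            # back into the next iteration of the specification.
            return
        place_order(symbol, qty)

run_slice("ACME", cash=500.0)  # -> BUY 5 ACME
```

The slice is deliberately shallow everywhere except the one path it exercises; its job is to generate questions, not coverage.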
Neither approach fully resolves the paradox. If the human is not doing the implementation work, domain understanding may never fully form. This points to a discipline that needs to run alongside AI-assisted development: deliberate knowledge capture — not just writing specifications, but writing annotated specifications that record why constraints exist, what alternatives were considered, and what domain realities forced specific design decisions. This is the artifact that preserves hard-won understanding and makes future iteration tractable.
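As one possible shape for that kind of annotated specification (the field names and the example entry are invented for the trading scenario):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Constraint:
    rule: str          # the machine-checkable requirement
    rationale: str     # why the constraint exists
    alternatives: str  # what was considered and rejected
    source: str        # the domain reality that forced the decision

MAX_POSITION = Constraint(
    rule="no single holding may exceed 20% of portfolio value",
    rationale="caps single-name blowup risk at a recoverable loss",
    alternatives="a 10% cap was rejected: too many positions to monitor",
    source="small-scale trial: one earnings miss erased a week of gains",
)
```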
Tools and Methodologies
There are numerous methodologies and tools evolving to help define machine-readable specifications. Frameworks as accessible as structured “who, what, where, and when” statements — or Gherkin scripts if you work in the testing domain — provide a practical starting point. Tools such as OpenAPI, Arazzo, and Prism can help structure and maintain these specifications. All of these are useful, but none resolves the fundamental paradox: how do you write a detailed machine-readable specification when you don’t yet have a full grasp of the domain?
Here is the point I want to leave you with: you need a specification, and a very good one. The specification is more valuable than the code AI generates from it. It is the source of truth, the basis upon which you will develop test cases that determine release readiness, and the foundation upon which future changes will be defined. The code can be regenerated. A well-crafted specification cannot be replaced.
Four Actions to Navigate the Paradox
- Find a trusted partner who has done this before. Deep specification work is not part of most companies’ DNA. It is a learned skill, and the faster you acquire it, the more successful your AI-assisted development will be.
- Identify who in your organization can actually write a specification. The test is simple: hand them a two-paragraph market requirements document and ask them to transform it into a set of unambiguous requirements. Most people will stare at it with no idea where to start. That tells you what you’re working with. (There is a sketch of that transformation after this list.)
- Treat specification development as iterative, not a one-time event. Structure your early AI coding tasks specifically to generate domain friction — the kind that forces you to refine the specification.
- Get your hands dirty early. Perfection is not the goal. The goal is flushing out ambiguity. Every addition to your specification that eliminates an implicit assumption is a move in the right direction.
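As an illustration of the transformation test from the second action above, here is one invented market-requirements sentence pulled apart into unambiguous, testable statements. The sentence and every number are made up for the example:

```python
VAGUE = "The system should react quickly to market moves."

UNAMBIGUOUS = [
    "Quotes for watched symbols are polled at least every 5 seconds.",
    "A buy/sell decision is computed within 1 second of a new quote.",
    "Orders are submitted within 2 seconds of a decision.",
    "If the brokerage API is unreachable for 30 seconds, trading halts "
    "and the operator is alerted.",
]

# Each statement above can become a test case; the vague original cannot.
for requirement in UNAMBIGUOUS:
    print("-", requirement)
```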
This is a topic we’ll keep returning to as the tools, practices, and pitfalls continue to evolve. If you want our take as things unfold, subscribe to the Quality Trail newsletter.
Let’s Compare Notes
Tell us what you’re working on. We’d be happy to connect.