Two years ago, the debate was whether AI-generated code was reliable enough to ship. A year ago, it was about which copilot to use. Today, the question is different: how do you restructure an entire product development lifecycle around AI agents?
AI works best when you go back to waterfall
The return to structured, linear thinking wasn't planned. But it makes sense once you see it.
AI agents produce better output when they have clear context and clear boundaries. Vague prompts produce generic code. Detailed specs produce code that fits your codebase, matches your design system, and follows your conventions.
Our Applied Product Discovery cycle is like a compressed waterfall. Two weeks of discovery, two weeks of design, eight weeks of delivery. Linear, structured, with clear phases. But compressed into 12 weeks instead of 12 months (or longer).
We frontload all the context and decisions before delivery starts, so when the agents get to work they have everything they need. There is no guessing, improvising, or back-and-forth.
You’re probably spending 60 to 80 percent of your time on admin
When we mapped every step in our process, we found that the majority of the work wasn't the work. It was writing up outcomes, creating client documents, structuring actions, and transferring context between people. Emails, Excel files, handoff decks.
Necessary, but not where value gets created. For us, the value happens in the conversations with clients. Digging into their domain, their priorities, and their constraints. That's where product decisions get made.
So we automated the admin layer and reinvested the time. Our client sessions went from two hours to four. More time on thinking, less on paperwork.
A context layer that prevents handoff decay
The biggest problem in any product development process is context decay. Every handoff loses information. Between discovery and design. Between design and engineering. Between one meeting and the next.
We built a central context layer using AI agents and MCP servers. The agents connect to Miro, where discovery sessions happen, pull the unstructured data, and structure it automatically. Product briefs, feature boards, priority matrices. That structured context flows to Figma, where the design agent scaffolds real components from the existing design system. From there, everything moves to Linear as engineering tickets with acceptance criteria and design references.
No human copying information between tools. The context flows. It doesn't decay.
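To make the idea concrete, here is a minimal sketch of context flowing between stages as derived data rather than retyped notes. It is an illustration under stated assumptions, not our production setup: the `Feature` and `Ticket` types, the note fields, and the `design://` reference are all hypothetical, and no real Miro, Figma, or Linear API is called.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Feature:
    name: str
    priority: int                 # 1 = highest priority
    acceptance_criteria: tuple    # carried verbatim from discovery
    design_ref: str = ""          # filled in at the design stage


@dataclass(frozen=True)
class Ticket:
    title: str
    acceptance_criteria: tuple
    design_ref: str


def structure_notes(notes):
    """Discovery stage: group raw notes into prioritised features."""
    grouped = {}
    for note in notes:
        entry = grouped.setdefault(
            note["feature"], {"priority": note["priority"], "criteria": []}
        )
        # A feature takes the most urgent priority of any of its notes.
        entry["priority"] = min(entry["priority"], note["priority"])
        entry["criteria"].append(note["text"])
    features = [
        Feature(name, e["priority"], tuple(e["criteria"]))
        for name, e in grouped.items()
    ]
    return sorted(features, key=lambda f: f.priority)


def attach_design(feature, design_ref):
    """Design stage: add a reference without touching discovery data."""
    return replace(feature, design_ref=design_ref)


def to_ticket(feature):
    """Engineering stage: the ticket is derived, never retyped."""
    if not feature.design_ref:
        raise ValueError(f"{feature.name}: no design reference attached")
    return Ticket(feature.name, feature.acceptance_criteria, feature.design_ref)


# Hypothetical discovery notes, as they might come off a workshop board.
notes = [
    {"feature": "export", "priority": 2, "text": "User can export a report as CSV"},
    {"feature": "login", "priority": 1, "text": "User can sign in with SSO"},
    {"feature": "export", "priority": 3, "text": "Export handles empty reports"},
]
features = structure_notes(notes)
ticket = to_ticket(attach_design(features[0], "design://login-form"))
```

Each stage only adds to the record. Because the types are immutable, nothing a later stage does can silently rewrite what discovery captured, and a ticket cannot be created until a design reference exists.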
This also unlocked something we couldn't do before: exploring five solution directions simultaneously. Each one gets prototyped against the real design system and discovery data. Then we review all five with the client and pick the strongest one. Building five prototypes manually would have taken weeks. Now the agents handle the scaffolding and the humans handle the judgment.
Spec-driven instead of task-driven
The way development work gets defined has changed too. Traditional tickets define work through acceptance criteria. When is this done? When it can do X, Y, and Z.
We've moved to specs. Instead of listing what a feature should do, we describe the intended outcome as a complete workflow, including edge cases. The agent works through that spec and validates its own output against it. This only works because the context from discovery and design is already loaded. The agent isn't guessing. It's executing against a clearly defined target.
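One way to picture a spec that an agent can validate against is a list of workflow cases, edge cases included, that any candidate implementation must satisfy. The sketch below is an assumption-laden illustration: the `Case` type, the discount workflow, and the `SAVE10` code are hypothetical examples, not a format we prescribe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    description: str
    given: dict        # keyword arguments passed to the implementation
    expected: object   # the outcome the spec demands


def validate(implementation, cases):
    """Return a message for every case the implementation fails."""
    failures = []
    for case in cases:
        try:
            actual = implementation(**case.given)
        except Exception as exc:
            failures.append(f"{case.description}: raised {type(exc).__name__}")
            continue
        if actual != case.expected:
            failures.append(
                f"{case.description}: expected {case.expected!r}, got {actual!r}"
            )
    return failures


# Hypothetical spec: the whole workflow, not just the happy path.
DISCOUNT_SPEC = [
    Case("valid code takes 10% off", {"total_cents": 10000, "code": "SAVE10"}, 9000),
    Case("unknown code leaves total unchanged", {"total_cents": 10000, "code": "NOPE"}, 10000),
    Case("empty basket stays at zero", {"total_cents": 0, "code": "SAVE10"}, 0),
]


def apply_discount(total_cents, code):
    """Candidate implementation, as an agent might produce it."""
    return total_cents * 90 // 100 if code == "SAVE10" else total_cents


def careless_discount(total_cents, code):
    """A plausible but wrong candidate: it ignores the code entirely."""
    return total_cents * 90 // 100


passing = validate(apply_discount, DISCOUNT_SPEC)
failing = validate(careless_discount, DISCOUNT_SPEC)
```

The careless version handles the happy path and the empty basket, and only the edge case for an unknown code exposes it. That is the point of writing the spec as a full workflow.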
What actually goes wrong
The failure mode isn't that AI writes bad code. It's that AI writes plausible code.
Code that runs. Code that passes a casual review. But underneath, it's making assumptions that don't hold, referencing APIs that don't exist in your version, or solving one problem in a way that breaks three others.
We've seen agents confidently use documentation from two years ago. We've seen them invent database columns. We've seen them build elegant solutions that completely ignore the existing architecture.
The real danger: as AI output quality improves, review discipline drops. Teams start trusting the agent. They skim instead of reading. They approve instead of questioning.
We call this plausible-but-wrong output AI slop. Preventing it is now as much a part of the engineering job as building features. Three layers keep it in check. The agent writes. The automated reviewer filters. The human engineer makes the final call.
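As a rough sketch of how the three layers can fit together, the example below runs one automated check (flagging columns that do not exist in the schema) before a human is asked to approve anything. The schema, the `Review` type, and the regex-based check are simplifying assumptions for illustration. A real reviewer would parse the SQL properly and run many more checks.

```python
import re
from dataclasses import dataclass

# Hypothetical schema: in practice this would be read from the database.
KNOWN_COLUMNS = {"users": {"id", "email", "created_at"}}


@dataclass
class Review:
    findings: list
    human_approved: bool = False

    @property
    def mergeable(self):
        # Both gates must pass: a clean automated review and a human sign-off.
        return not self.findings and self.human_approved


def unknown_columns(sql, schema):
    """Flag table.column references that the schema does not define."""
    findings = []
    for table, column in re.findall(r"\b(\w+)\.(\w+)\b", sql):
        if table in schema and column not in schema[table]:
            findings.append(f"{table}.{column} is not in the schema")
    return findings


def review(agent_sql, schema, human_approves):
    """Layer 2 filters the agent's output; layer 3 makes the final call."""
    findings = unknown_columns(agent_sql, schema)
    # The human is only asked once the automated filter is clean.
    approved = human_approves(agent_sql) if not findings else False
    return Review(findings, approved)


invented = review(
    "SELECT users.email, users.last_login FROM users",
    KNOWN_COLUMNS,
    human_approves=lambda sql: True,
)
clean = review(
    "SELECT users.email FROM users",
    KNOWN_COLUMNS,
    human_approves=lambda sql: True,
)
```

Here `invented` is blocked by the automated layer because `users.last_login` does not exist, while `clean` still needs the human callback to return `True` before it counts as mergeable. Neither layer can wave code through on its own.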
The best engineers in an agentic workflow aren't the best coders. They're the best thinkers.
Your organisation probably isn't ready
Everything above only works if your foundations are in order.
No design system means agents produce inconsistent designs. No modular codebase means unmaintainable output. No structured documentation means the context layer has nothing to work with.
One of our clients wanted to adopt agentic development, but their engineering team was three years behind on refactoring a legacy codebase. Before they could build new features with AI, they needed to get the existing code into a workable state.
The approach: write a comprehensive test suite against the old codebase, then let agents refactor to a modern stack while validating against those tests. The tests become the source of truth. If the refactored code passes, you know it behaves like the original, at least for everything the tests cover.
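A minimal sketch of that idea, often called characterization testing: record what the legacy code does today, then compare every refactor against that record. The shipping-fee functions and the inputs below are hypothetical stand-ins for a real legacy module.

```python
def record_baseline(legacy_fn, inputs):
    """Capture what the legacy code does today, quirks included."""
    return {args: legacy_fn(*args) for args in inputs}


def regressions(refactored_fn, baseline):
    """Return the inputs where the refactor disagrees with the baseline."""
    return [
        args for args, expected in baseline.items()
        if refactored_fn(*args) != expected
    ]


def legacy_shipping_fee(weight_kg, express):
    """Stand-in for old code nobody wants to touch."""
    fee = 5
    if weight_kg > 10:
        fee = fee + 7
    if express == True:
        fee = fee * 2
    return fee


def refactored_shipping_fee(weight_kg, express):
    """A faithful refactor: same behaviour, clearer structure."""
    base = 12 if weight_kg > 10 else 5
    return base * 2 if express else base


def sloppy_shipping_fee(weight_kg, express):
    """A plausible refactor with an off-by-one at the 10 kg boundary."""
    base = 12 if weight_kg >= 10 else 5
    return base * 2 if express else base


# Inputs deliberately include the boundary at exactly 10 kg.
INPUTS = [(1, False), (10, False), (11, False), (11, True)]
baseline = record_baseline(legacy_shipping_fee, INPUTS)

faithful = regressions(refactored_shipping_fee, baseline)
sloppy = regressions(sloppy_shipping_fee, baseline)
```

The sloppy refactor is only caught because the baseline includes the 10 kg boundary. The suite is only as trustworthy as the inputs it records, which is why writing it comes before any refactoring.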
The companies getting the most out of AI-enabled development invested in the boring stuff first. Clean architecture, documented conventions, modular code, a design system that agents can extend.
The tool is only as good as the system you build around it.