The 6-Month Lifecycle of an LLM Wrapper
They demo well. They deploy quickly. And then, around month three, something quietly stops working.
The pattern is consistent enough now that we can describe it before it happens. A company selects an LLM-based tool — chatbot, document processor, internal assistant, customer-facing agent. The demo is impressive. Response times are fast. The vendor's team sets it up in a few days. Month one feels like progress.
Month two is when the edge cases start arriving. The inputs the vendor didn't demonstrate. The query that produces a confident but wrong answer. The multi-step workflow the tool was supposed to handle, routed to the wrong place. In most cases, the organisation absorbs these quietly: a human review step added here, a manual workaround there. Nobody wants to admit the thing they bought isn't working.
By month three, the maintenance load is visible. Someone on the team is spending meaningful time correcting outputs, managing prompt changes, and reporting issues to the vendor. That person's time cost has now erased the efficiency gain the tool was supposed to provide. By month six, one of two things happens: either the tool is silently abandoned, or it becomes a fixture — used in a narrow, controlled way that bears no resemblance to what was originally proposed.
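To make that break-even concrete, here is a minimal sketch of the arithmetic. Every figure in it is an illustrative assumption, not a measurement from any real deployment:

```python
# Hypothetical break-even arithmetic for an LLM tool purchase.
# All figures are illustrative assumptions, not measured data.

hours_saved_per_month = 40          # assumed: time saved on the happy path
review_hours_per_month = 25         # assumed: human review of uncertain outputs
correction_hours_per_month = 10     # assumed: fixing wrong outputs downstream
vendor_liaison_hours_per_month = 8  # assumed: prompt changes, issue reports

maintenance = (review_hours_per_month
               + correction_hours_per_month
               + vendor_liaison_hours_per_month)
net_hours = hours_saved_per_month - maintenance

print(f"Gross saving: {hours_saved_per_month}h, "
      f"maintenance: {maintenance}h, net: {net_hours}h")
# With these assumptions the net gain is -3 hours per month:
# the maintenance load has erased the efficiency gain.
```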
The vendor's position, when challenged, is almost always the same: the inputs need to be cleaner, or the prompts need tuning, or a new model version will fix it. These are rarely lies — they're just not the real problem. The real problem is that the tool was evaluated against idealised inputs, and the organisation's actual operations don't produce idealised inputs. They never do.
The due diligence question that almost never gets asked before purchase: 'Can we see it handle the twenty most problematic scenarios we deal with every month?' If the vendor can't answer that in a demo, you're buying the idealised version, not the deployed version.
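What that due diligence could look like in practice is sketched below. This is a minimal evaluation harness under stated assumptions: `call_vendor_tool` is a hypothetical placeholder for whatever interface the vendor actually exposes, and the JSONL case format is an assumption, not a standard.

```python
"""Minimal pre-purchase evaluation sketch: run a vendor's LLM tool
against your own hardest cases instead of the vendor's demo inputs.
`call_vendor_tool` is a placeholder; the case file format is assumed."""
import json


def call_vendor_tool(query: str) -> str:
    # Placeholder: replace with the vendor's actual API or SDK call.
    return "stub response"


def evaluate(cases_path: str) -> None:
    # cases file: one JSON object per line, e.g.
    # {"input": "...", "must_contain": "refund policy clause 4.2"}
    with open(cases_path) as f:
        cases = [json.loads(line) for line in f]

    failures = []
    for case in cases:
        output = call_vendor_tool(case["input"])
        # Crude check: does the output contain the fact a correct
        # answer to this case has to include?
        if case["must_contain"].lower() not in output.lower():
            failures.append((case["input"], output))

    print(f"{len(failures)}/{len(cases)} problem cases failed")
    for query, output in failures:
        print(f"- input: {query[:80]!r}\n  got:   {output[:80]!r}")


if __name__ == "__main__":
    # Hypothetical filename: your twenty hardest monthly scenarios.
    evaluate("twenty_hardest_cases.jsonl")
```

A containment check this crude is deliberately conservative; the point is not the scoring method but the input set. If the vendor won't run against your cases, that refusal is itself the evaluation result.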
Key observations
- LLM wrappers perform well on structured, predictable inputs — and poorly on the actual variety of real operations
- Human review as a workaround is a cost that rarely appears in ROI calculations
- The vendor's implementation team understands the tool; they rarely understand the client's operations
- The failure is usually quiet, not a crash — which is why it's commonly absorbed rather than addressed
- Month-three maintenance cost is the number that changes the economics of most AI tool purchases
The question to ask before any LLM deployment: what happens when the input is not what the demo assumed?
This piece is based on patterns observed while working inside operations, not on research reports or industry surveys. We write from what we see.
If this resonates, there's a structured next step.
No deployment starts without passing our Execution Readiness Assessment.