Don’t Call It a Failure: A Business-Agility Reading of the “95% of AI Pilots” Story
If you have ever worked in PR or social, you know the feeling. We spent years debating ROI for activities that clearly mattered but did not fit neatly into last-click spreadsheets. Now we have a once-in-a-generation capability, and some are ready to declare defeat because the P&L did not move in six months. That is not how transformation is measured. It is how halftime is misread.
To be clear: the State of AI in Business 2025 report from MIT’s Project NANDA is worth your time. It’s made waves with a striking figure—95% of enterprise GenAI pilots haven’t delivered measurable P&L impact. The authors also label this work as an early snapshot (January–June 2025), which is important context. Early data, transparent limitations, and a conversation worth having. From a business-agility perspective, the conversation is not “Is AI failing?” The conversation is “Are we running the work in a way that creates measurable flow, safe learning, and compounding value?”
Below is the same storyline, retold with business-agility lenses.
What an Agile Organization Asks First
- Who is the customer of this pilot, and what problem are we solving for them today? Define the user, the job to be done, and the pain you are trying to remove this quarter.
- What is our hypothesis and what would disprove it? Write it down. Choose the smallest slice that can test it in production-like conditions.
- What evidence will we accept before P&L shows up? Flow and quality are the leading indicators. Finance is the lagging proof.
When those three questions are explicit, pilots stop being demos and start being experiments.
Six Months Is Not a Verdict, It Is a Cadence
Six months equals a handful of sprints with room for two or three inspect-and-adapt cycles. That is enough time to learn about permissions, routing, data quality, latency, handoffs, exception paths, and human-in-the-loop. It is not enough time to rewire multiple core workflows, retrain large teams, harden guardrails, and push improvements all the way to audited P&L. In agility we timebox to learn, then decide to scale or stop based on evidence, not on optimism.
Measure Flow First, Finance Next
Direct profit is the destination. Flow tells you whether you are moving toward that destination. Treat these as agility leading indicators that should move in months 1 to 6:
- Lead time from request to result
- Throughput per week for the target workflow
- Rework rate and exception rate
- Escaped error rate and defect containment
- Adoption: assisted tasks per user per day, active minutes in the workflow
- Risk posture: flagged issues reduced, review time reduced
- Customer outcomes: response time, first-contact resolution, CSAT or NPS deltas
If these signals improve and remain stable, the P&L generally moves between months 9 and 18, once setup costs taper off and scale begins. A simple way to track the before-and-after deltas is sketched below.
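As a minimal sketch of that tracking habit, here is one way a pilot team could compare baseline and month-six flow metrics. The metric names and values are hypothetical placeholders for illustration, not figures from the MIT report.

```python
# Hypothetical flow scorecard for one pilot workflow.
# Metric names and values are illustrative placeholders, not data from the report.
baseline = {"lead_time_days": 4.0, "throughput_per_week": 120, "rework_rate": 0.18}
month_six = {"lead_time_days": 2.5, "throughput_per_week": 165, "rework_rate": 0.11}

def delta_report(before: dict, after: dict) -> None:
    """Print the before-and-after delta for each flow metric."""
    for metric, old in before.items():
        new = after[metric]
        change = (new - old) / old * 100
        print(f"{metric}: {old} -> {new} ({change:+.1f}%)")

delta_report(baseline, month_six)
```

The point is not the tooling. It is that the team reviews the same deltas every week, so the month-six conversation is about evidence, not anecdotes.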
From Demo Theater to Workflow Reality
Agility favors working solutions in real paths over polished demos. Three practical shifts turn pilots into value delivery:
- Value slicing: release a narrow, end-to-end slice that touches the system of record and the approval path.
- Definition of Ready and Definition of Done: no work enters a sprint unless data access, privacy constraints, and success metrics are clear; no work is done until telemetry, audit trails, and rollback are live.
- Guardrails, not gates: security, risk, legal, and compliance sit in weekly reviews with product and operations. The objective is to design safe defaults that enable flow, not to pause work until the quarter ends.
Organize for Learning, Not Heroics
- One owner, one workflow, one data source for the first slice. Reduce coordination drag.
- Cross-functional team: product, operations, data, engineering, risk, and finance see the same board and the same metrics.
- Limit WIP: stop starting and start finishing. Too many pilots create false positives and thin learning.
- Weekly retros: surface blockers early, adjust scope, and rotate one small improvement per week into the Definition of Done.
The Right Scoreboard for Month Six
Executives should expect a two-line scorecard at the six-month mark:
- Flow and quality: the leading metrics listed above with before-and-after deltas and stability bands.
- Finance translation: hours avoided, error costs avoided, cycle time value released, revenue capture unlocked, risk reduction quantified. These are not GAAP numbers yet. They are the evidence trail that justifies scale (see the sketch below).
If the flow line is up and stable, and the finance translation is credible, scale. If not, stop or rescope. Either outcome is success because you learned at low cost.
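To make the finance translation concrete, here is a hypothetical back-of-the-envelope calculation. Every input is an assumption chosen for illustration, not a figure from the report.

```python
# Hypothetical finance translation: hours avoided -> cost avoided.
# Every input below is an assumption for illustration, not a figure from the report.
tasks_per_month = 1_000          # assisted tasks flowing through the pilot workflow
minutes_saved_per_task = 12      # average handling-time reduction versus baseline
loaded_cost_per_hour = 55.0      # fully loaded labor cost in dollars

hours_avoided = tasks_per_month * minutes_saved_per_task / 60
cost_avoided = hours_avoided * loaded_cost_per_hour

print(f"Hours avoided per month: {hours_avoided:.0f}")
print(f"Estimated cost avoided per month: ${cost_avoided:,.0f}")
```

This is not P&L. It is the translation layer that Finance can inspect and stress-test before anyone commits to scaling.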
Why the “95%” Headline Can Be True and Misleading
It can be true that most pilots did not show direct P&L in six months. It can also be misleading if those pilots were not designed as agile experiments with explicit leading indicators, working slices, and weekly inspection. Agility does not promise instant profit. It promises faster truth. That is exactly what leaders need.
A Friendly Challenge to Colleagues
Before we declare the technology a failure, let us adopt an agility scoreboard and cadence. Write the hypothesis. Slice the value. Measure the flow. Invite Finance and Risk into the retro. Decide based on evidence. Then repeat.
Your turn: What is one flow metric you trust and one cadence habit that kept your pilot honest? Please comment on our LinkedIn article!
#StateofAI2025 #BusinessAgility #ContinuousImprovement #AIROI #ChangeManagement