A Realistic Roadmap for Agentic AI in Claims
Authors: Chris Brown - The Build Paradox, Mike Daly - Insurtech World
Published: 08/04/2026
Over this series, I have examined the pressures driving change, the claims spectrum, what "agentic" really means, the production gap, the data fitness question, measurement challenges, and the automation trap. This final article addresses the practical question: given all of that, what should you actually do?
In this article:
- Lessons That Shape Everything Else: the four things any practitioner should get right before deploying AI in claims.
- Match Approach to Claims Complexity: mapping your deployment model to the complexity spectrum.
- The Regulatory Reality: SM&CR accountability, Consumer Duty, and why the FCA already has the frameworks it needs.
- Why This Takes Longer Than You Think: the structural reasons AI deployment timelines exceed traditional software projects.
- The Honest Conversation: what to tell your teams, and why pretending change has no human cost is dishonest.
- The Capability Gap: why claims expertise and AI expertise are not the same thing, and why accountability without capability is just liability.
1. Lessons That Shape Everything Else
These lessons come from building enterprise systems across multiple regulated industries and from watching AI projects stall for reasons unrelated to the AI itself.
Invest in baseline measurement before anything else. This is the single most important preparatory step. Before any AI deployment, you need robust data on how your human processes perform. Not what your SLAs say. What actually happens. Decision consistency. Error rates by claim type. Handling times. Complaint rates. FOS referral patterns.
Without this baseline, you cannot measure whether AI improves, maintains, or degrades outcomes. You cannot answer the question that matters: "Is this better than what we had?" The FCA's December 2025 claims handling review found firms with MI that "lacked sufficient detail to prove meaningful discussion, challenge or decision-making" [1]. Do not be that firm. Build the measurement capability before you need to prove anything.
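As a concrete illustration, a baseline can start as a grouped summary over a historical claims extract. The sketch below is a minimal Python version, assuming hypothetical column names (claim_type, decision, reopened, handling_days, complaint, fos_referral); your extract will look different, but the shape of the exercise is the same.

```python
# A minimal baseline sketch. Column names are hypothetical, and the
# reopen rate is only a proxy for error rate; substitute whatever your
# QA process actually records.
import pandas as pd

def baseline_by_claim_type(claims: pd.DataFrame) -> pd.DataFrame:
    """Summarise current human performance per claim type."""
    return claims.groupby("claim_type").agg(
        volume=("decision", "size"),
        approval_rate=("decision", lambda d: (d == "approved").mean()),
        reopen_rate=("reopened", "mean"),            # 0/1 flag per claim
        median_handling_days=("handling_days", "median"),
        complaint_rate=("complaint", "mean"),        # 0/1 flag per claim
        fos_referral_rate=("fos_referral", "mean"),  # 0/1 flag per claim
    )

claims = pd.read_csv("claims_extract.csv")  # hypothetical extract
print(baseline_by_claim_type(claims))
```

The output is the "before" picture every later comparison depends on: if the AI's reopen rate on windscreen claims is higher than this table's, you know it, and you know by how much.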
Build governance infrastructure before you need it. Comprehensive logging that captures inputs, outputs, and decision pathways is necessary but not sufficient. You also need defined thresholds for when to investigate, pause, or roll back. You need clear accountability for who reviews alerts and what actions they take. You need this operational before you go live, not planned for later phases.
The temptation is to treat governance as a checkbox exercise, something you can address after demonstrating value. That is backwards. Governance infrastructure is what enables you to demonstrate value safely. Without it, you are operating without visibility, and the FCA will eventually ask questions you cannot answer.
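To make that concrete, here is a minimal sketch of the logging and threshold plumbing, assuming an append-only decision log and a rolling-window check. The threshold values, window size, and pause mechanism are placeholders; the real ones come from your risk appetite and your platform.

```python
# A sketch, not a production audit system: every AI decision is logged
# with its inputs and outputs, and rolling metrics are checked against
# predefined thresholds that trigger named actions for a named reviewer.
import json
import time
from collections import deque

THRESHOLDS = {"override_rate": 0.15, "low_confidence_rate": 0.20}  # illustrative
recent = deque(maxlen=500)  # rolling window of recent decisions

def log_decision(claim_id, inputs, output, confidence, overridden):
    record = {
        "ts": time.time(), "claim_id": claim_id,
        "inputs": inputs, "output": output,
        "confidence": confidence, "overridden": overridden,
    }
    with open("ai_decision_log.jsonl", "a") as log:  # append-only audit trail
        log.write(json.dumps(record) + "\n")
    recent.append(record)
    return check_thresholds()

def check_thresholds():
    """Return actions for the accountable reviewer, not just metrics."""
    if len(recent) < 100:  # wait for a meaningful sample
        return []
    actions = []
    override_rate = sum(r["overridden"] for r in recent) / len(recent)
    low_conf_rate = sum(r["confidence"] < 0.7 for r in recent) / len(recent)
    if override_rate > THRESHOLDS["override_rate"]:
        actions.append("PAUSE: handler override rate above threshold")
    if low_conf_rate > THRESHOLDS["low_confidence_rate"]:
        actions.append("INVESTIGATE: low-confidence decisions rising")
    return actions
```

The point of the sketch is the shape: thresholds are defined before go-live, and every threshold maps to a specific action and a specific owner.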
Start smaller than feels necessary. The temptation is to demonstrate value quickly by deploying across a meaningful volume of claims. Resist it. Start with a genuinely narrow scope and involve the claims team in choosing the use case: a single claim type, a single product line, a single geographic region. Learn how AI behaves in your specific operational context before expanding. A successful proof of value builds confidence across the whole organisation.
The governance burden of monitoring a narrow deployment is manageable. The governance burden of monitoring a broad deployment with inadequate infrastructure is overwhelming. Scaling is easier than remediation. Projects that expand too fast lose control of their own deployment, and recovering from that position costs more than starting conservatively.
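Enforcing that narrowness can be mechanical. A sketch of a scope gate, with example values a claims team might agree on (the field names and scope are illustrative, not recommendations):

```python
# A scope gate for a deliberately narrow pilot: anything outside the
# agreed claim type, product line, and region routes to a human queue.
PILOT_SCOPE = {
    "claim_type": {"windscreen"},
    "product_line": {"personal_motor"},
    "region": {"uk_south"},
}

def in_pilot_scope(claim: dict) -> bool:
    return all(claim.get(field) in allowed
               for field, allowed in PILOT_SCOPE.items())

def route(claim: dict) -> str:
    return "ai_pipeline" if in_pilot_scope(claim) else "human_queue"
```

Widening the pilot then becomes an explicit, reviewable change to PILOT_SCOPE rather than an accident of routing.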
Plan for continuous refinement, not project completion. Traditional projects have end dates. AI deployments do not. Model behaviour changes. Claim patterns evolve. Fraud techniques adapt. The system you validated six months ago is not necessarily the system you are running today.
Budget and resources for ongoing monitoring, calibration, and refinement. Not as "maintenance" but as a core operational activity. If your business case assumes you can go live and move on, your business case is wrong.
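One recurring piece of that ongoing work is calibration monitoring: comparing the AI's current agreement with human QA decisions against the rate recorded at sign-off. A minimal sketch, with illustrative numbers:

```python
# Ongoing calibration check: has agreement with human QA drifted below
# the level recorded at initial validation? The agreement rate and
# tolerance here are illustrative.
VALIDATED_AGREEMENT = 0.94  # agreement rate at initial sign-off

def calibration_drift(qa_sample: list[dict], tolerance: float = 0.03) -> dict:
    """qa_sample: recent claims decided by the AI and re-marked by human QA."""
    agreement = sum(
        c["ai_decision"] == c["human_decision"] for c in qa_sample
    ) / len(qa_sample)
    return {
        "agreement": agreement,
        "recalibrate": agreement < VALIDATED_AGREEMENT - tolerance,
    }
```

Run on a schedule, this is the difference between discovering drift in a dashboard and discovering it in a complaints queue.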
2. Match Approach to Claims Complexity
The starting point for any sensible AI strategy is matching your deployment approach to where your claims sit on the complexity spectrum outlined in Article 2.
High-volume, low-complexity lines like gadget replacement, routine pet claims, windscreen, and basic travel are where near-full automation is appropriate and arguably overdue. These claims have clear data inputs, limited judgment requirements, minimal litigation exposure, and straightforward customer outcomes. The customer experience benefit is clear: claims are resolved in hours rather than days. The cost-benefit is equally clear: handlers freed for work that actually needs them. The governance requirements are manageable: you still need quality assurance through sampling and monitoring for divergence, but you are not making decisions that will end up in court.
Medium-complexity lines, such as standard motor without injury, routine home, and simple commercial, are where AI should augment experienced handlers rather than replace them. The AI handles intake, data extraction, and initial assessment. It identifies relevant policy terms and suggests coverage positions. It flags potential issues. But the handler reviews AI output and makes the decision. This is augmentation at scale: the AI makes handlers more productive while handlers provide judgment that the AI cannot reliably replicate.
High-complexity lines like injury claims, subsidence, large commercial, and professional indemnity are where AI can add value in bounded ways without attempting to automate decisions. Document gathering and summarisation. Policy wording retrieval. Similar claims identification for benchmarking. Medical terminology explanation. Draft correspondence for specialist review. What AI does not do in these lines: coverage determination, liability assessment, quantum decisions, or anything that affects claim outcome.
Extreme complexity and emerging lines such as cyber, terrorism, marine, and catastrophe are where the focus should be on data capture and analysis rather than on any form of automation. The training data for reliable automation simply does not exist. Novel situations are too common. The consequences of errors are too severe.
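Expressed as a routing rule, the spectrum might look like the sketch below. The tier labels and modes are shorthand for the four categories above; where any given line sits is a judgment for your own book, not something the code decides.

```python
# The spectrum as a routing table. Mappings are illustrative shorthand
# for the four categories described above.
from enum import Enum

class Mode(Enum):
    AUTOMATE = "near-full automation with sampling QA"
    AUGMENT = "AI drafts and flags, handler decides"
    ASSIST = "bounded support tasks only, no outcome decisions"
    OBSERVE = "data capture and analysis, no automation"

SPECTRUM = {
    "gadget": Mode.AUTOMATE, "windscreen": Mode.AUTOMATE,
    "motor_no_injury": Mode.AUGMENT, "routine_home": Mode.AUGMENT,
    "injury": Mode.ASSIST, "subsidence": Mode.ASSIST,
    "cyber": Mode.OBSERVE, "marine": Mode.OBSERVE,
}

def deployment_mode(line_of_business: str) -> Mode:
    # Default to the most conservative mode for anything unmapped.
    return SPECTRUM.get(line_of_business, Mode.OBSERVE)
```

The default matters as much as the table: an unmapped or novel line should fall to the most conservative mode, not the most convenient one.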
This spectrum-based approach is not just operational common sense. It aligns with how regulators expect you to demonstrate proportionate risk management.
3. The Regulatory Reality
The FCA does not have specific AI rules. What it has is an expectation that existing frameworks apply to however you deploy technology, and those frameworks have teeth.
Under SM&CR, someone is accountable. In dual-regulated insurers (those authorised by both the PRA and FCA, which includes all major UK insurers), technology systems typically fall under SMF24 (Chief Operations), while risk controls fall under SMF4 (Chief Risk). The "duty of responsibility" means these individuals can be held personally accountable if they fail to take reasonable steps to prevent breaches in their areas [2]. Deploying AI that makes coverage decisions without appropriate governance is not a technology risk. It is a senior manager's accountability risk.
Consumer Duty adds another layer. Firms must regularly assess, test, understand, and document the outcomes their customers receive. The FCA's June 2024 multi-firm review found firms "taking assurance that completion of, or lack of material findings from product reviews or fair value assessments, automatically indicated good outcomes were being achieved" [3]. The regulator was clear: completing a review does not demonstrate good outcomes. Evidence does.
The FCA has been explicit that it expects firms to embed AI within existing governance, accountability, and consumer protection obligations [4]. The technology is new. The accountability is not.
4. Why This Takes Longer Than You Think
This takes longer than traditional software implementation, and the reasons are structural rather than technical.
Traditional software projects are deterministic. You define requirements, build to specification, run acceptance tests, and deploy. The acceptance criteria are fixed. If the system produces the expected output for the expected input, it passes.
AI systems are different. The same input can produce different outputs depending on model state, context, and factors you may not fully understand. Acceptance testing becomes iterative: build, test, refine, repeat until you meet the benchmark you have set. And the benchmark itself requires careful thought. What "good" looks like for AI-assisted claims decisions is not always obvious, and getting organisational agreement on that definition takes time.
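In practice this turns acceptance testing into a scoring loop against a fixed benchmark. A sketch, assuming a benchmark set of historical claims with agreed "good" outcomes and an agreed pass threshold (both of which take real organisational effort to produce):

```python
# Acceptance testing as a loop rather than a gate. The threshold is
# illustrative; agreeing the real number is part of the work.
PASS_THRESHOLD = 0.95

def acceptance_score(benchmark: list[dict], run_model) -> float:
    """Score the system against historical claims with agreed outcomes.

    Each benchmark case pairs model inputs with the outcome the
    business has signed off as "good". Re-run after every refinement.
    """
    correct = sum(run_model(case["inputs"]) == case["expected"]
                  for case in benchmark)
    return correct / len(benchmark)

# Build -> test -> refine -> repeat until the score clears the threshold.
# Unlike deterministic acceptance tests, one passing run is not enough:
# track the score across repeated runs before sign-off.
```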
Then there is the organisational reality. Insurance does not move fast from a governance perspective. You need buy-in across claims operations, IT, risk, compliance, and the board. You need project governance that ensures all key parties are involved. The build may be fast (the 80/20 rule applies), but the remaining 20% of testing, refinement, and sign-off accounts for the majority of the elapsed time.
Proof of concept adds time, but it is time well spent. A POC lets you test, interact with stakeholders, gather feedback, and reduce investment risk. This is additional time before production, not a shortcut to it. When you move to production, you still face all the governance, testing, integration, and sign-off requirements. But you are doing so with evidence rather than assumptions.
Third-party solutions reduce your testing burden but do not eliminate it. If you are implementing a tried-and-tested vendor solution with existing benchmarks, you reduce the validation work. But you do not eliminate it. The vendor's benchmarks were run on their data, in their context. You still need to validate it works with your data, your edge cases, and your customer base.
Stakeholder alignment takes longer in regulated industries. SM&CR accountability concentrates minds but also slows decisions. Nobody wants their name on a sign-off they are not confident in.
Vendors often underestimate these factors because their timeline typically starts after you have achieved organisational readiness. That is where most of the elapsed time sits.
5. The Honest Conversation
Staff know what is happening. Claims handlers watching AI demonstrations can see the trajectory. However carefully you frame it, people are not naive: they will suspect they may be replaced, especially when they read about large layoffs at companies like Oracle. The best course is to be transparent and honest.
Some roles will change fundamentally. The claims handler of 2030 will not do the same job as the claims handler of 2020. That is not a threat. It is reality.
We do not fully know what the new roles look like. Anyone claiming certainty about how AI will reshape insurance operations is selling something. The hybrid roles that emerge (part domain expert, part AI supervisor, part exception handler) do not exist in current job descriptions.
Not everyone will make the transition. Some will thrive in AI-augmented roles. Some will find fulfilling work in parts of claims that remain human-centred. Some will need different paths. That is a genuine loss, and treating it as a footnote is dishonest.
The alternative is not the status quo. Competitors are moving. Customer expectations are shifting. The FCA is closely monitoring claims handling. "Do not change" is not an option, but pretending change will not have a human cost is not honest either.
6. The Capability Gap
Claims managers and handlers understand customer journeys, policy nuances, and regulatory requirements. They are essential domain experts. They are not, by training or experience, AI engineers.
Agentic systems require different skills: understanding model behaviour, recognising divergence indicators, calibrating trust in AI outputs, and knowing when to override. These are not intuitive extensions of claims expertise. The FCA's AI survey found 84% of firms have an individual accountable for their AI approach [5], but accountability without capability is just liability.
Retraining existing staff works for some. Hiring AI specialists addresses the technical gap but introduces domain-knowledge risk. Hybrid teams combining domain expertise with technical capability are the pragmatic answer, and the hardest to implement.
The Bottom Line
Agentic AI offers a genuine opportunity for insurance claims. The pressures driving adoption are real. The technology capabilities are real. The benefits for customers, operations, and handlers freed from administrative burden are achievable.
But capturing this opportunity requires clear thinking about where on the claims spectrum you are operating, what production deployment actually requires, how to measure success against realistic standards, and why full automation is a trap rather than a destination.
The insurers who succeed will not be the ones moving fastest. They will be the ones who invested in measurement before they needed it, built governance infrastructure before regulators asked about it, and accepted that sustainable deployment takes longer than demos suggest.
The uncomfortable truth underlying this entire series is this: the technology question (can AI handle claims?) is already answered. Yes, for many claim types, it can.
The business question is harder: can your governance, compliance, and organisational capability support AI handling claims?
If you cannot answer that question with evidence, you are not ready. And if you deploy anyway, you are not building a competitive advantage. You are accumulating regulatory debt that will eventually come due.
This is the final article in the series "Agentic AI in Insurance Claims: A Practitioner's Reality Check." The full series:
- The Perfect Storm: why insurance claims must change
- The Claims Spectrum: why not all claims are equal
- What "Agentic" Actually Means: beyond document extraction
- The Production Gap: why demos do not deploy
- The Data Beneath the Model: why your training data is not fit for purpose
- The Measurement Problem: humans vs AI accuracy standards
- The Automation Trap: why "set and forget" fails
- A Realistic Roadmap: this article
References
[1] Financial Conduct Authority (2025). Home and travel claims handling arrangements: good practice and areas for improvement.
[2] Financial Conduct Authority (2024). AI update.
[3] Financial Conduct Authority (2024). Insurance multi-firm review of outcomes monitoring under the Consumer Duty.
[4] Financial Conduct Authority (2025). AI approach.
[5] Bank of England and Financial Conduct Authority (2024). Third survey of machine learning and artificial intelligence in UK financial services.