The AI ownership question

As AI becomes embedded in professional workflows, firms face a new set of decisions: What should you build? What should you buy? What expertise, data and processes should remain uniquely yours? From open-source blueprints for domain-specific AI to tax systems that learn from accountant corrections, this month's stories point to the same reality: Competitive advantage will increasingly belong to firms that are deliberate about what they own.

CPA.com’s new Build vs. Buy framework is designed to help leaders navigate those decisions with confidence.

 
 

What's in focus

The narrow expert

What's new:

Anthropic released Claude for Legal — a fully open-sourced reference implementation showing exactly how a general-purpose AI becomes a domain-specific expert. The repository doesn’t just demonstrate legal AI, it exposes the architecture: curated system prompts, profession-specific instructions, procedural guardrails, and persistent memory working in concert to produce a system that thinks, writes and reasons like a legal professional.

How it works:

The implementation layers a precisely engineered system prompt — encoding legal reasoning conventions, document drafting standards and jurisdictional awareness — on top of Claude’s base model, then adds memory to maintain matter context across sessions. No fine-tuning. No proprietary model. Just disciplined prompt architecture and structured context. The model doesn’t become a lawyer, it becomes a legal-workflow-aware tool that operates within the profession’s norms.

Behind the news:

This matters because Anthropic published the blueprints. The repository isn’t a product pitch, it’s a pattern — a replicable methodology for collapsing a general AI into a narrow professional expert. Every vertical with codified workflows, established best practices, and document-heavy processes is now looking at an open-source proof of concept for how to build its own version.

Why it matters:

Accounting (particularly advisory work) has similar structural prerequisites as legal does: defined professional standards, repeatable document types, judgment calls bound by regulatory frameworks, and a client-facing communication style with its own conventions. The question is no longer whether AI can be shaped into an accounting-aware expert. Anthropic just demonstrated that it can, in another profession with equivalent complexity.

We're thinking:

Anthropic open-sourced the methodology, not just the product. Any firm willing to do the hard work of encoding its own professional standards into a system prompt now has a working blueprint. The hard work isn’t technical — it’s the institutional self-knowledge required to articulate how your profession actually reasons, writes and makes judgment calls. Accounting firms that can do that will build genuinely useful internal tools. Those that can’t will end up buying generic AI products and wondering why they feel shallow.

The vibe-coding trap

What's new:

James Shore, a consultant who spent his career resuscitating late-stage startups buckling under their own codebases, published a sharp model on May 10, 2026, showing that AI coding agents go net-negative on productivity within roughly 40 months unless they cut maintenance costs in inverse proportion to the code they generate. Double output while holding per-unit maintenance steady and you’ve still doubled the upkeep tail; triple the output and you’ve tripled it. Shore’s blunt read of the current state: agents are increasing maintenance costs, not reducing them.

How it works:

Shore models every line of code as a perpetual liability. A wisdom-of-crowd estimate puts maintenance at roughly 10 days in year one and five days every year after, for each month of code shipped — forever, as long as that code exists. Run those numbers and a normal team crosses the 50% threshold — half its time spent on bug fixes, dependency upgrades and design cleanup — at month 31. Drop in an agent that doubles output but leaves maintenance-per-unit unchanged: an 85% productivity spike that erases at month 19 and goes net-negative at month 40, with the maintenance penalty persisting even after the agent is removed.

Behind the news:

This lands in the middle of accounting’s quiet build-vs-buy realignment. Mid-size firms are vibe-coding their own importers, reconciliation scripts, K-1 parsers and audit sampling tools — work that two years ago would have gone to a large integrator or a custom dev shop, while the major SaaS companies bolt AI onto their suites from above. Shore’s contribution is to put a number on the cost the profession hasn’t been pricing: the maintenance tail that sits on the balance sheet for a decade and doesn’t go away when the agent does.

Why it matters:

Accounting firms are uniquely exposed to this curve because they have no software organization underneath the tooling. There’s no on-call rotation when a bank changes its CSV export in March 2027, no regression suite when the IRS revises a form schema mid-season, no code owner when the associate who shipped the script moves to the profession. The tools also sit on sensitive workflows: a silent breakage in a depreciation script doesn’t crash — it produces a wrong number on a return. Multiply that across fifteen partner-pet utilities and the firm has built a shadow IT estate it cannot evaluate, audit or unwind.

We're thinking:

Firms aggressively vibe-coding right now are accumulating technical debt without recognizing it as debt — and the bill almost always arrives during busy season, three to five years out, as a quiet error in a deliverable rather than a dramatic outage. The move isn’t to stop building; the speed gains are real and the alternative is lock-in to vendor roadmaps that don’t fit the firm. The move is to refuse to ship anything internal without three things attached: a named owner, a kill date and a buy-the-vendor-version trigger if either gets breached. Treat each agent-built tool the way you’d treat an audit workpaper — dated, owned, reviewed, disposable. The firms that do this will out-leverage the buy-everything competitors and out-survive the build-everything ones; the rest will discover that “we made it ourselves” was a balance-sheet decision they didn’t know they were making. Access the recently published Buy vs Build Framework that dives into this topic more.

The correction trap

What's new:

OpenAI and Thrive Holdings built a tax-preparation system that gets measurably better every time an accountant corrects it. In a pilot run through Current (formerly Crete) Professional Alliance, a network of 30-plus firms, the system drafted roughly 7,000 returns (mostly 1040s and 1041s), hit up to 97% field-level accuracy, cut preparation time by about a third, and raised throughput by roughly 50%. The accuracy figure matters less than the trajectory behind it. The software improved through a feedback loop that incorporated accountant corrections: returns hitting 75% correct field completion climbed from 25% at launch to 86% six weeks later.

How it works:

Every time a human preparer fixes something the AI got wrong, the system records the full story behind that fix, from the source document to the value it pulled, where that value ended up on the return and the correction the accountant made. When the same kind of mistake keeps showing up, that pattern goes to OpenAI’s coding agent, Codex, which proposes an actual fix to the underlying software and tests it before anything ships. Ambiguous cases route back to human engineers. So the corrections your staff already make during review are what rewrite the product.

Behind the news:

Most “AI for accounting” tools improve on the vendor’s schedule. An engineer notices a problem, writes a patch and pushes an update months later. This system collapses that cycle, automating much of the slow manual loop where humans translate real-world failures into fixes, so it learns from production use in weeks rather than quarters. Reading W-2s and 1099s was never the hard part — those are clean and predictable. The difficulty was the messy material: K-1s, rental schedules, prior-year carryovers and values that have to be reconciled across a dozen documents. That’s the judgment-heavy work firms have long assumed was safe. OpenAI took an equity stake in Thrive in Dec. 2025, which makes this as much a commercial bet as a research demo.

Why it matters:

The review work your seniors and managers do all day, catching the AI’s errors and reconciling the numbers, becomes a training signal. A firm running this kind of system prepares returns faster and builds a proprietary asset that compounds every busy season. That changes the economics of a practice: Value moves from how many returns you can prepare to how good your correction data is and who owns it. The flipside: Firms on a shared platform improve a product they don’t control while the vendor captures the compounding gains.

We're thinking:

The firms that benefit most will treat their review process as intellectual property and negotiate accordingly: who owns the corrections, who owns the improvements they produce, and whether your staff’s expertise builds your moat or someone else’s. A 50% throughput gain sounds like leverage until you notice it also halves the billable hours that anchor the standard pricing model. The firms that move first will reprice around advisory and judgment before that efficiency gain turns from a premium into a client expectation. And the work being automated here is exactly what the profession pointed to as proof that AI couldn’t replace a CPA — the messy, multi-document returns. That argument might now operate on a six-week improvement curve. Decide what your firm sells when accurate return preparation becomes a commodity, because this pilot just put a date on that question.

The annual reckoning

What's new:

Stanford HAI released the 2026 AI Index Report — its most comprehensive annual data pull on where AI actually stands across research, economics, policy and society. The headline: AI capability is not plateauing. Organizational adoption hit 88%, AI agents went from 12% to 66% task success on real computer benchmarks in a single year and generative AI reached 53% population adoption faster than the PC or the internet did. The report is 400-plus pages of data and the accounting profession is scattered across nearly every chapter.

How it works:

The AI Index aggregates data from the profession, academia and government to track AI progress across nine domains: R&D, technical performance, responsible AI, economy, science, medicine, education, policy and public opinion. It measures things other sources don't — not just what models can do, but what they cost, who's building them, where the talent is moving and whether public trust is keeping pace. This year's report introduced a dedicated science chapter for the first time, reflecting how quickly AI is moving from tool to co-investigator in technical domains.

Behind the news:

A few numbers that don't get enough attention: U.S. private AI investment hit $285.9 billion in 2025 — 23 times China's $12.4 billion. The number of AI researchers moving to the U.S. dropped 89% since 2017, with an 80% decline in the last year alone. Documented AI incidents rose to 362, up from 233 the year before. And the "jagged frontier" problem is real: The same systems that earned a gold medal at the International Mathematical Olympiad read analog clocks correctly just 50% of the time. Capability is accelerating and unreliable in the same breath.

Why it matters:

Accounting firms are making capital allocation decisions against a backdrop they don't have clean data on. The AI Index is the closest thing to an audited set of financials for the field. Three things should land for any firm reading it: First, 88% organizational adoption means your clients are already operating differently, whether or not your firm is. Second, responsible AI is not keeping pace with capability — incidents are up sharply, safety benchmarks lag and improving one dimension (say, accuracy) can degrade another (say, safety). Third, the estimated consumer value of generative AI tools reached $172 billion annually by early 2026, with the median value per user tripling in a single year. That's not a trend, that's a repricing of what the tools are worth to the people using them.

We're thinking:

The jagged frontier is the concept accounting firms should sit with. AI systems that can outperform PhDs on science questions and fail basic clock-reading aren't a curiosity — they're the tools firms are already deploying on client work. The failure modes aren't random, they're structural. A system that crushes standardized benchmarks and quietly misreads an ambiguous input document is exactly the configuration that produces a confident, wrong output. That's a review problem, not a technology problem and review costs money. The firms that will extract durable value from AI are the ones that build review discipline proportional to the jaggedness of the tools they're using — not the ones racing to eliminate review in the name of efficiency.

Subscribe to the AI in Focus newsletter