It's over between me and Opus 4.8

Fable 5 audited the relationship and found 3 issues

Jun 10, 2026

Opus 4.8 has been my default model for almost all finance work for weeks. If you’d asked me last week, I’d have said it’s the best model for finance work, full stop.

Then on Tuesday, Anthropic released Claude Fable. Within an hour, our team chat at Fuel turned into a room full of CFOs asking the same question: who’s tested it, and on what?

So I tested it the same night. Both models, side by side. By 1am I had my answer, and the title of this issue.

Fable found 3 issues. One of them is worth about $30M. One of them wasn’t Opus’s mistake. It was mine. Let me show you everything.

First, what Fable actually is

Back in April, Anthropic built a model called Mythos. And then they refused to release it. Too powerful, they said. Too dangerous. They locked it inside a restricted program called Project Glasswing, open only to partners like AWS, Microsoft, Apple, and CrowdStrike, plus a few biology researchers. The rest of us got to read rumors about it from the outside.

This week’s Claude Fable is that same Mythos model. What changed is the supervision. Fable ships with safeguards: push it into genuinely high-risk territory (serious cybersecurity work, biology, chemistry, attempts to clone the model itself) and it won’t answer. It reroutes you to Opus 4.8. Anthropic says this triggers in under 5% of sessions. Karpathy says the safeguards are “a little too trigger happy” at launch. Both can be true.

The practical part, because the practical part is the job:

→ The full-power Mythos also exists. Businesses that touch it accept mandatory 30-day data retention for safety monitoring.

→ For finance, the guardrails are a non-issue. The reroutes target cyber, bio, and chemistry. FP&A work essentially never trips them. Your forecast is not a bioweapon, no matter what your CEO says about your downside case.

→ Pricing: $10/$50 per million tokens. Double Opus. Ouch.
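To see what "double" means per run, here is a rough sketch of the arithmetic. Fable's $10/$50 comes from the pricing above; the Opus figures of $5/$25 are my inference from "double Opus", and the token counts are a made-up workload, not a measurement.

```python
# Prices in USD per million tokens (input, output).
# Fable's prices are from the text; Opus's are ASSUMED from "double Opus".
PRICES = {
    "fable": {"input": 10.0, "output": 50.0},
    "opus": {"input": 5.0, "output": 25.0},
}


def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one run at per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: a 200k-token model file in, a 20k-token analysis out.
fable = run_cost("fable", 200_000, 20_000)
opus = run_cost("opus", 200_000, 20_000)
print(f"Fable ${fable:.2f} vs Opus ${opus:.2f}")  # Fable $3.00 vs Opus $1.50
```

At that size the gap is a dollar or two per analysis. It only starts to matter at high volume, which is where the cheaper models come in.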

So that’s who walked in on Tuesday: the dangerous one, on supervised release, wearing a friendlier name. Karpathy called it a major-version-bump step change. I’ve seen one founder have it build a working product live during a customer call.

I was charmed too. But I’ve been a CFO long enough to know you don’t believe charm. You check the references. So I ran an audit.

The audit setup: same company, same prompts, both models

You know Numbr, my imaginary $93M SaaS from Issue 002. Officially our lab rat now.

I took the exact 3-statement model we built last issue and gave Opus 4.8 and Claude Fable two identical tasks. Same files. Same prompts, word for word.

Task one: build the interactive forecast dashboard (the exact prompt from Issue 002).

Task two: “make an analysis of this business using this forecast.”

Here’s what the audit found.

Finding 1: Opus misstated a number and never noticed

Both models built a working dashboard. Both looked good.

Then I checked the numbers. Opus displayed the wrong total revenue. The correct number was in the file I gave it. It showed a different one and never flagged it.

Fable showed the correct revenue. It also added a save-scenario feature and surfaced the metrics that matter for SaaS: NRR, revenue per FTE, Rule of 40, headcount efficiency. Opus gave me the generic revenue-costs-margins set.
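For anyone who wants those SaaS metrics spelled out, here is a minimal sketch using the standard definitions. The function names and every input number are illustrative assumptions, not values from the Numbr model, and the margin used in the Rule of 40 varies by convention (EBITDA or free cash flow margin are both common).

```python
def net_revenue_retention(start_arr: float, expansion: float,
                          contraction: float, churn: float) -> float:
    """NRR: ARR retained from the starting customer base, including
    expansion, divided by the starting ARR."""
    return (start_arr + expansion - contraction - churn) / start_arr


def rule_of_40(growth_pct: float, margin_pct: float) -> float:
    """Rule of 40 score: revenue growth % plus profit margin %."""
    return growth_pct + margin_pct


def revenue_per_fte(revenue: float, headcount: float) -> float:
    """Revenue divided by full-time-equivalent headcount."""
    return revenue / headcount


# Hypothetical inputs in $M (not Numbr's figures).
nrr = net_revenue_retention(start_arr=80.0, expansion=16.0,
                            contraction=2.0, churn=6.0)
score = rule_of_40(growth_pct=25.0, margin_pct=20.0)
per_fte = revenue_per_fte(revenue=100_000_000, headcount=400)

print(f"NRR {nrr:.0%}, Rule of 40 score {score:.0f}, revenue per FTE ${per_fte:,.0f}")
```

A score of 40 or above is the usual bar for the Rule of 40, and NRR above 100% means the existing customer base grows on its own.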

Finding 2: a $30M omission

Same analysis prompt to both. They agreed on the thesis: expansion-driven growth, heroic operating leverage, no real bear case. Skim both and you’d call it a tie. It’s not a tie.

Fable caught the tax problem. Numbr’s model runs an 8% effective tax rate flat for five years while pretax income passes $150M. Fable flagged it, quantified $25-35M of overstated cumulative net income and cash, and gave the fix: NOL shields run out at this profitability, model toward 21% plus state by FY28-29.

Opus didn’t mention tax once. Neither has any Claude model I’ve tested.
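The mechanics of the tax catch are simple enough to sketch. This is not Fable's working and these are not Numbr's numbers: the pretax income series and the rate ramp below are hypothetical, and the ramp leaves out state tax for simplicity. It only shows how a flat 8% rate overstates net income once the NOL shield runs out.

```python
def net_income_overstatement(pretax_income, modeled_rates, realistic_rates):
    """Cumulative net income overstated by taxing at the modeled rates
    when the realistic rates should have applied."""
    return sum(
        income * (realistic - modeled)
        for income, modeled, realistic
        in zip(pretax_income, modeled_rates, realistic_rates)
    )


# Hypothetical five-year pretax income in $M (NOT Numbr's actual figures).
pretax = [40, 60, 90, 120, 150]

# The model's assumption: 8% effective tax rate, flat for five years.
flat = [0.08] * 5

# Hypothetical ramp: the NOL shield holds for three years, then the rate
# steps toward the 21% federal rate (state tax left out of this sketch).
ramp = [0.08, 0.08, 0.08, 0.15, 0.21]

overstated = net_income_overstatement(pretax, flat, ramp)
print(f"Cumulative net income overstated by about ${overstated:.1f}M")
```

Because cash taxes follow the same rates, the cash balance is overstated by a similar amount, which is why the fix matters for the balance sheet and not just the P&L.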

Two more catches came with it. The SBC add-back: “Adjusted EBITDA” flatters the picture by 3-7 points; net income (44% margin) is the more honest line.

And capital allocation: $744M of cash piling up with no deployment plan. M&A, buybacks, reinvestment: why is this money sleeping? Opus called the cash pile good news and moved on. Board members do not move on.

Uncomfortable part: the tax assumption wasn’t Opus’s error. It was in MY model. Fable didn’t just audit Opus. It audited me. That’s the moment I knew this release was different.

Finding 3: communication issues

My favorite finding, and it’s not a number.

I can spot a Claude-written document from across the room.

Fable writes straight to business. It called the marketplace line “essentially dead.” It said the downside case “would get laughed at” by a board. It ended with marching orders: build an NRR stress case, fix the tax ramp, decouple headcount, kill or fund the marketplace.

Opus ended with a summary. For documents that go in front of CEOs and boards, that’s the difference between one you forward and one you rewrite at 11pm with your name on it.

Fable 5 vs Opus 4.8

So, to wrap up, here’s the full Opus error log vs Fable:

→ Wrong total revenue on the dashboard (the correct number was in the file)

→ A growth multiple anchored to the wrong baseline year

→ Missed the tax problem entirely (~$25-35M of overstated net income and cash)

→ Missed the SBC distortion (Adjusted EBITDA flattered by 3-7 points)

→ Missed the capital allocation question ($744M of cash, no deployment plan)

To be fair to Opus

Opus did do some things better. Here’s what:

→ It produced a proper board-ready scenario table: ARR, revenue, margins, cash, Rule of 40 across all three scenarios. Fable’s document didn’t have one. For a board deck, I’d want that table.

→ It made one genuinely sharp catch: the headcount slider exists in the playground, but none of the preset scenarios ever flex it. That’s a more precise diagnosis than Fable’s flatter “headcount is held identical.”

→ It noticed that every cost line is modeled as a fixed % of revenue, meaning no infra-scaling or support-load risk anywhere in the model. True and useful.

But the pattern of the evening was consistent: Opus produced cleaner-looking structure with more mistakes inside it (it also misquoted a growth multiple, its second numerical slip of the night, stated with full confidence). Fable produced blunter documents with numbers that held.

So, are we really over?

Mostly.

Fable costs double. The high-volume plumbing (categorization, tagging, short summaries) stays with cheaper models. Opus and I are still friends. It keeps the routine tasks. We’re being very adult about it.

But the judgment work (forecast analysis, anomaly investigation, anything where a model has to hold a long chain of reasoning across an entire P&L) is Fable’s now.

It’s the first model that performed at a level I’d call senior on my tests. It caught what a good controller catches, including my own mistake.

One evening isn’t a verdict though. It’s a strong signal, the kind you take seriously and verify anyway. This week I’m moving my heavy analysis to Fable and putting it through more experiments. More results soon.

Tell me what you think about Fable in the subscriber chat. I read every message.

And as is tradition in good marketing around here: this issue was brought to you by our founding partners, Hampton, FIF Collective, Light, and Spendesk. Grateful to them for backing this newsletter before model releases started writing my content calendar.

— Alyona
