The missing piece in Anthropic's finance agents

I tested Anthropic's finance agents all weekend. They don't work out of the box.

Jun 06, 2026

∙ Paid

I love good marketing. And nobody in AI does it better than Anthropic right now.

Look at what they pulled off. On May 5th they dropped ten finance agent templates and the whole internet lost its mind on cue. Every AI influencer reposting within the hour. Half my LinkedIn feed, honestly.

And Anthropic’s marketing deserves its own newsletter issue. Their creative team (who, I promise, are not agents, and not Claude) keep coming up with the most human way to sell AI. Go look at their Instagram. It’s art, illustration, in-person Code with Claude meetups, real people making real things.

They recently ran a roundtable in Tokyo where they put painters, bakers, and engineers in the same room, with one of Japan’s last three living sento mural painters working on the wall behind them. Look at that. Crazy that this is what AI marketing looks like now.

Anyway. Away from the marketing for a second: financial services is now Anthropic’s second-biggest source of enterprise revenue, right behind tech.

But as a CFO-turned-founder, I know the single most costly habit in finance is believing the marketing.

So I closed Slack. Opened a fresh Claude window. And spent the rest of the weekend testing Anthropic’s agents on an imaginary company called Numbr. (Sorry, Numbr, if there’s a real one out there. I didn’t mean you.)

And since we’re on the subject of good marketing, here’s some of my own: this issue was brought to you by our founding partners, Hampton, FIF Collective, Light, Spendesk and Fuel.

First, let’s agree on what the word “agent” means

Before we go further, we need to talk about a word.

I truly hate how everyone is calling everything an “agent” right now. Email automation? Agent. Macro in Excel? Agent. N8N flow with a ChatGPT call bolted on the end? Bro, that’s still just an N8N flow.

We’re all smart finance people here. We do critical thinking for a living. So let’s actually agree on what each word means before someone sells us a product that doesn’t do what we think.

→ Automation. A rule-based workflow. You write the rules, it follows them, forever. A lot of the “AI-native” stuff you see marketed in finance is simply rule-based workflows.

→ Skill. A Claude feature. A reusable instruction set you hand the model: how to do a thing, the steps, the format, what to check, what to refuse. Find them under Customize → Skills (toggle any on or off). Build your own with the built-in Skill Creator, which interviews you and writes the file. You’ll see a lot of my skills on the pages of this newsletter.

→ Plugin. A bundle of skills, connectors, and helpers packaged together. Basically an app inside Claude. Add one under Customize → Plugins → “+” → Add marketplace, then paste a GitHub URL, which is exactly how we’ll install the finance ones in a minute. You can build your own too.

→ Routine. A scheduled workflow you set once and let run. Smarter than automation, not as independent as an agent. In Cowork these are Scheduled Tasks: describe the recurring job, it fires on a schedule (as long as your laptop’s awake and Claude’s open).

→ Memory. The LLM remembers your context, preferences, and files across sessions, so you’re not re-explaining your business every morning. Turn it on in settings. Tip: you can also move your memory from ChatGPT to Claude. Open Claude, go to Settings → Capabilities → Memory, and click Start Import. Copy the extraction prompt provided and follow the instructions.

→ Cowork. A Claude feature. The desktop workspace where Claude works with your files, tools, and skills in one place instead of you copy-pasting between tabs. The finance version of the setup engineers have always had.

→ Agent. A model that decides what to do next on its own. It plans, picks its tools, checks its own work, tries again when it’s wrong. Calls you never scripted.

Most things being marketed as “agents” right now are actually plugins, skills, or automations. That’s not bad! It’s just not what the word means.

So the ten “finance agents” everyone lost their mind over?

They’re plugins. Packaged skills with a defined workflow, called from inside Cowork, with mandatory human review on every output. That’s not an agent in the strict sense. That’s a really well-engineered plugin.

Cool. Now we can talk about them honestly.

They’re not even inside Claude. They’re on GitHub.

This was the first thing that surprised me. The whole thing lives on GitHub. As code.

At github.com/anthropics/financial-services.

For anyone not from the engineering side, GitHub is where code lives on the internet. It’s the version-controlled filing cabinet of the entire software industry. Anthropic publishing finance agents there means: here’s the open-source recipe, you go install it.

And once you’re on GitHub, you start noticing things.

There are other finance agents up there. From other people. From other companies. Some of them, honestly, better than what Anthropic shipped. The marketing made it sound like Anthropic invented finance-flavored AI on Tuesday. The repo tells a different story.

My first impression: the marketing was louder than what’s inside.

That’s not me being a hater. Anthropic is one of the companies I bet on hardest. I’d call myself a Claude psycho on a public post, and I have. But part of building your finance engineering skillset is being able to read marketing carefully - not just receive it.

Okay. Let’s install them.

How to add them (5 minutes, no engineering required)

The install is the easy part.

You need three things: Claude desktop (the app, not the browser), a paid plan, and Cowork enabled.

Then:

1. Open Claude desktop

2. Settings → Plugins → Add plugin → Add marketplace

3. Paste: https://github.com/anthropics/financial-services

4. Hit Sync

Done. The marketplace shows up in your plugin directory.

When you open it, you’ll see the full list:

Built for corporate finance (us):

Model builder - DCF, LBO, 3-statement, comps, live in Excel
Financial analysis - same core, broader scope
GL reconciler - finds breaks, traces root cause, routes for sign-off
Month end closer - accruals, roll-forwards, variance commentary
Earnings reviewer - earnings call + filings → draft update to note

Built for investment banking / PE / wealth management (not us):

Equity research
Investment banking
Fund admin
KYC screener

Other:

Finance (general), Marketing, Small business

A lot of these are calibrated for investment banking, private equity, and wealth management workflows. Not for the FP&A manager, controller, or head of finance at a SaaS company.

For corporate finance (the people this newsletter is actually for) three of them are worth your time today:

→ Model builder (with caveats, see below)

→ GL reconciler

→ Month end closer

I’m going to walk through Model Builder in detail right now.

Inside the Model Builder

When you open Model Builder, it offers four model types:

→ DCF - discounted cash flow

→ LBO - leveraged buyout

→ 3-statement - IS / BS / CF, integrated

→ Comps - trading multiples

One thing I noticed immediately, and I think it tells you exactly who Anthropic built this for:

There’s no model for recurring revenue forecasting. No unit economics. No MRR waterfall. No cohort retention. No usage-based pricing model.

If you run finance at a SaaS company - that list is your whole job. And it’s missing.
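To make that list concrete, here's a rough sketch of what an MRR waterfall does, in Python. Every number is made up for illustration (none of it comes from the Numbr files), and `roll_mrr` is a hypothetical helper, not anything in Anthropic's repo. NRR is computed month by month here to keep the example short; in practice it's usually measured on a trailing-twelve-month cohort.

```python
def roll_mrr(opening_mrr, months):
    """Roll MRR forward month by month and report net revenue retention (NRR).

    months: list of dicts with keys new, expansion, contraction, churn
            (all positive amounts in the same unit as opening_mrr).
    """
    rows = []
    opening = opening_mrr
    for m in months:
        # MRR kept from customers who were already on the books at month start.
        retained = opening + m["expansion"] - m["contraction"] - m["churn"]
        closing = retained + m["new"]
        rows.append({
            "opening": opening,
            "closing": closing,
            "nrr": retained / opening,  # new-logo MRR is excluded from NRR
        })
        opening = closing  # this month's closing is next month's opening
    return rows


# Made-up figures, in $K of MRR.
months = [
    {"new": 80, "expansion": 60, "contraction": 20, "churn": 30},
    {"new": 100, "expansion": 40, "contraction": 10, "churn": 50},
]
waterfall = roll_mrr(1000, months)
for row in waterfall:
    print(row["opening"], row["closing"], round(row["nrr"], 4))
```

The useful property is that every closing balance is fully explained by four labeled movements, which is exactly what a single "revenue growth %" line hides.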

DCF and LBO are pretty standard banker-side instruments. Most corporate finance teams don’t live in those models day-to-day. The only one of the four that genuinely maps onto operational corporate finance is the 3-statement.

So that’s where we’ll go deep.

What’s good inside the 3-statement skill

I went into the repo and read the 3-statement skill file line by line. (Reading skill files is now part of my actual job. I did not expect to ever type that sentence, and yet here we are.)

Three things Anthropic got right.

1. Formulas over hardcodes. Every projected cell is a formula. Every assumption is labeled, cited, and traceable to an input. No typed numbers in calculation cells. This is the right discipline — every cell can be audited, every driver flexed, and the model behaves like a model instead of a printout.
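A minimal sketch of how the "no typed numbers in calculation cells" rule could be checked mechanically, assuming the sheet has already been flattened into a dict of cell address to raw cell content. The function name `find_hardcodes`, the sample cells, and the idea of passing an explicit set of input cells are illustrative assumptions, not something taken from Anthropic's skill file.

```python
def find_hardcodes(cells, input_cells):
    """Return addresses of calculation cells that hold a typed number.

    cells: dict mapping a cell address (e.g. "C3") to its raw content as a string.
    input_cells: set of addresses that are allowed to hold typed assumptions.
    """
    offenders = []
    for address, content in cells.items():
        if address in input_cells:
            continue  # labeled assumptions are allowed to be typed values
        text = content.strip()
        if text.startswith("="):
            continue  # formulas are what we want in calculation cells
        try:
            float(text.replace(",", ""))
        except ValueError:
            continue  # plain text labels are fine
        offenders.append(address)
    return sorted(offenders)


# Hypothetical mini-sheet: B1 is a growth assumption, row 3 is projected revenue.
sheet = {
    "A1": "Revenue growth",
    "B1": "0.12",
    "A3": "Revenue",
    "B3": "=B2*(1+$B$1)",
    "C3": "104,500",  # a typed number sitting in a projection cell
}
hardcodes = find_hardcodes(sheet, input_cells={"B1"})
print(hardcodes)
```

In this toy sheet only `C3` gets flagged: `B1` is a declared input, `B3` is a formula, and the `A` column holds labels.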

2. Verify step-by-step with the user. This part I love. The skill is explicitly told NOT to build the entire model end-to-end and present it complete. It’s told to stop at each statement. Show the work. Wait for confirmation. Catch errors early.

The exact language from the skill:

→ After mapping the template → show the user the sections, confirm before touching any cells

→ After populating historicals → show the historical block, confirm values/periods match source

→ After building IS projections → run subtotal checks, confirm before moving to BS

→ After building BS → show the balance check (Assets = L+E) for every period

→ After building CF → show the cash tie-out (CF ending cash = BS cash)

→ Do NOT populate the entire model end-to-end and present it complete. Break at each statement. Show the work. Catch errors early.

This is human-in-the-loop done right. After every wrong number I’ve watched Claude confidently produce, this is exactly the discipline I want.

Steal this pattern. Paste it into every skill you ever build. I have.
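The checkpoint pattern above can be sketched as a loop that refuses to move on until the reviewer says yes. This is a toy illustration under my own assumptions: `build_in_stages`, the stage names, and the stand-in `reviewer` function are hypothetical, and the real skill expresses this as plain-language instructions to Claude, not as code.

```python
def build_in_stages(stages, confirm):
    """Run build stages in order, pausing for confirmation after each one.

    stages: list of (name, build_fn) pairs; build_fn returns a summary to show.
    confirm: callable(name, summary) -> bool, standing in for the human reviewer.
    Returns the names of the stages that were built and confirmed.
    """
    completed = []
    for name, build_fn in stages:
        summary = build_fn()
        if not confirm(name, summary):
            break  # stop here so the error is fixed before anything builds on it
        completed.append(name)
    return completed


# Hypothetical stages mirroring the checkpoints listed above.
stages = [
    ("map template", lambda: "sections found: IS, BS, CF"),
    ("historicals", lambda: "historical block matches source"),
    ("income statement", lambda: "subtotals tie"),
    ("balance sheet", lambda: "balance gap: 250"),
    ("cash flow", lambda: "CF ending cash = BS cash"),
]


def reviewer(name, summary):
    """Stand-in for a human: rejects any stage that reports a balance gap."""
    return "balance gap" not in summary


done = build_in_stages(stages, reviewer)
print(done)
```

Because the balance sheet stage is rejected, the cash flow stage never runs. That is the whole point of the pattern: a bad number gets caught one statement in, not after the model is "finished".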

3. Real integrity checks. Master Status reads PASS across all periods. Every reconciliation gets its own tab. Every guardrail shows up visibly on the Cover and Checks tabs. That’s good engineering, full stop.
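For readers who want to see what those two checks boil down to, here is a small sketch of a balance check (Assets = L+E) and a cash tie-out (CF ending cash = BS cash) rolled up into one master status. The field names, the tolerance, and the figures are assumptions made for the example; the skill's actual Checks tab lives in Excel.

```python
def run_checks(periods, tolerance=0.01):
    """Return a per-period PASS/FAIL map plus an overall master status.

    periods: dict of period label -> dict with keys
             assets, liabilities, equity, cf_ending_cash, bs_cash.
    """
    results = {}
    for label, p in periods.items():
        # Balance check: Assets = Liabilities + Equity
        balances = abs(p["assets"] - (p["liabilities"] + p["equity"])) <= tolerance
        # Cash tie-out: cash flow ending cash = balance sheet cash
        cash_ties = abs(p["cf_ending_cash"] - p["bs_cash"]) <= tolerance
        results[label] = "PASS" if (balances and cash_ties) else "FAIL"
    master = "PASS" if all(v == "PASS" for v in results.values()) else "FAIL"
    return results, master


# Made-up figures; the second period is off by 1.0 on the balance sheet.
periods = {
    "Year 1": {"assets": 120.0, "liabilities": 70.0, "equity": 50.0,
               "cf_ending_cash": 18.5, "bs_cash": 18.5},
    "Year 2": {"assets": 140.0, "liabilities": 80.0, "equity": 59.0,
               "cf_ending_cash": 22.0, "bs_cash": 22.0},
}
results, master = run_checks(periods)
print(results, master)
```

One failing period is enough to flip the master status to FAIL, which is the behavior you want from a guardrail that sits on the cover page.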

What to keep in mind before we start

The Model Builder agent ships as a general template. No business context. No industry assumptions. No idea what your revenue model is.

To get a forecast that actually reflects your company, you have to customize the underlying skill file first. With business-specific context, drivers, assumptions, metrics.

You can do this inside Claude Chat. No engineering required. But you do need to do it. Out of the box, the model is generic.

Let me show you what “out of the box” actually looks like, and what happens after you customize.

Meet Numbr, my imaginary company

To test this honestly, I needed real-looking data. So I generated a fictional company with Claude and tested on that.

Meet Numbr, Inc. – a $93M ARR fintech SaaS selling financial intelligence tools to banks, hedge funds, and asset managers.

→ Three legal entities (US, Netherlands, Singapore)

→ Two revenue streams (recurring subscriptions + usage-based API billing)

→ Freshly cash-flow positive in 2025

→ Complex enough to be interesting, clean enough to follow along

The inputs are three Excel files exported straight from their systems:

→ QBO export — Cash-basis P&L with $3M intercompany, Balance Sheet that doesn’t tie

→ NetSuite export — proper accrual books, $89.5M revenue, $12.6M operating income

→ HubSpot export — pipeline, customer data, deal stages

This is what real corporate finance actually looks like.

I’m dropping the Numbr files here. Took me ~3 hours to generate and clean. You can run the same test yourself.

→ Numbr Hubspot Export (XLSX, 78.9KB)

→ Numbr Netsuite Export (XLSX, 45.3KB)

→ Numbr Qbo Export (XLSX, 42.4KB)

Let’s see what we got out of the box

I switched on Model Builder, pointed it at the three files, and asked for a 3-statement model.

It worked. It ran the step-by-step verification exactly like the skill promised. Stopped at each statement. Showed the balance check. Tied out cash. As pure engineering, it did what it said on the tin.

But what it built was a generic 3-statement model.

And it had no idea Numbr was a SaaS company. It skipped the recurring revenue build. It ignored churn. It never split subscriptions from usage-based API billing, even though that split is sitting right there in the HubSpot and NetSuite files.

ARR, net revenue retention, cohorts, all gone. It forecast revenue as one line growing at one rate, the way you’d model a company that sells a single widget to a single kind of customer.
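Here is a rough sketch of why that matters, comparing a single blended growth rate with a forecast that treats subscriptions and usage separately. All inputs are invented for illustration and are not taken from the Numbr files; the function names and the simple churn-plus-new-bookings logic are my assumptions, not the plugin's method.

```python
def forecast_single_rate(revenue, growth, years):
    """One revenue line compounding at one rate."""
    return [revenue * (1 + growth) ** y for y in range(1, years + 1)]


def forecast_two_streams(subscription, usage, new_bookings, churn_rate,
                         usage_growth, years):
    """Forecast subscriptions and usage separately, then add them up.

    Subscription: last year's base less churn, plus new bookings.
    Usage: compounds at its own growth rate.
    """
    out = []
    for _ in range(years):
        subscription = subscription * (1 - churn_rate) + new_bookings
        usage = usage * (1 + usage_growth)
        out.append({
            "subscription": subscription,
            "usage": usage,
            "total": subscription + usage,
        })
    return out


# Made-up starting point: 100 of revenue, split 60 subscription / 40 usage.
blended = forecast_single_rate(100.0, 0.15, 3)
split = forecast_two_streams(60.0, 40.0, new_bookings=12.0,
                             churn_rate=0.10, usage_growth=0.25, years=3)
for year, (b, s) in enumerate(zip(blended, split), start=1):
    usage_share = s["usage"] / s["total"]
    print(year, round(b, 1), round(s["total"], 1), round(usage_share, 2))
```

The totals land close to each other, but the split version shows usage growing as a share of revenue every year. A single growth rate cannot show you that mix shift, or how sensitive the forecast is to churn.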

I gave the Financial Analysis plugin a shot too. It was a little better. A few more ratios, a bit more commentary. But still average.

Lesson learned: Anthropic’s plugins are good, technically, but they don’t work out of the box.

So I went further and customized the skill (which really is great) to fit my imaginary $93M SaaS.

Let’s see what happens when we customize the skill

I did it in Claude Chat.

I described the business model, the challenges, the revenue streams, the markets, and the exact financial model I wanted out the other end.

I kept Anthropic’s skill as the base, because most of the rules in it are good. No reason to start from scratch when the bones are right.

You can download my custom SaaS 3-statement skill file below, plus the prompt I use to turn the agent’s model into an interactive dashboard you can play with.

This post is for paid subscribers