Apple's Foundation Models Framework Now Supports Claude. Here's Why That Actually Matters.
WWDC26 session 339 quietly made Anthropic a first-class citizen in Apple's AI framework. It changes how anyone building local automations on Mac should think about the Claude vs. on-device tradeoff.
I’ve spent the better part of this year building an automation stack on top of my home Mac. Nothing fancy from the outside. It’s just a collection of Python scripts and shell jobs that sync my notes, pull in meeting summaries, classify what goes where, and generally try to keep my personal knowledge management from becoming a pile of unprocessed garbage. But I’ve been slowly replacing the cloud AI calls at the center of it with something local, on-device, private, and free to run. That’s been the project. Get the cheap, fast, repetitive work off the paid API and onto hardware I already own.
WWDC26 was supposed to be the roadmap for finishing that. And it was. But it also gave me something I wasn’t expecting, buried inside a developer session that almost nobody in the mainstream coverage even mentioned.
Apple handed me the cleanest architecture I’ve ever had for this kind of work. And in the same breath, they told me exactly where Claude fits into it.
The session is number 339: “Bring an LLM provider to the Foundation Models framework.” The short version is this: Apple built a new Swift protocol called LanguageModel, and anything that conforms to it runs through the exact same LanguageModelSession API. That’s the on-device Apple model. That’s Private Cloud Compute. That’s MLX running a local open model. And now, explicitly announced by name as first-party Swift packages, that’s Claude and Gemini too. Swapping the model underneath is a one-line change. Everything downstream stays the same.
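To make the "model as a swappable component" idea concrete, here is a minimal sketch in Python, the language my own scripts are written in. It is not Apple's API. `TextModel`, `Session`, `OnDeviceStub`, `CloudStub`, and `title_for` are all hypothetical names I made up for illustration, and the stubs return canned text rather than calling any real model.

```python
from typing import List, Protocol, Tuple


class TextModel(Protocol):
    """Hypothetical interface: anything that turns a prompt into text."""

    name: str

    def respond(self, prompt: str) -> str: ...


class OnDeviceStub:
    """Stand-in for a small local model. Returns canned text, calls nothing."""

    name = "on-device"

    def respond(self, prompt: str) -> str:
        return f"[{self.name}] {prompt[:40]}"


class CloudStub:
    """Stand-in for a hosted frontier model. Also canned, no network."""

    name = "cloud"

    def respond(self, prompt: str) -> str:
        return f"[{self.name}] {prompt[:40]}"


class Session:
    """One session type for every backend; the model is just an argument."""

    def __init__(self, model: TextModel) -> None:
        self.model = model
        self.transcript: List[Tuple[str, str]] = []

    def respond(self, prompt: str) -> str:
        reply = self.model.respond(prompt)
        self.transcript.append((prompt, reply))
        return reply


def title_for(note: str, session: Session) -> str:
    """Downstream code only sees the session, never the concrete model."""
    return session.respond(f"Title this note: {note}")
```

Swapping backends is then the one-line change: `Session(OnDeviceStub())` becomes `Session(CloudStub())`, and `title_for` does not change at all.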
I’ve been running a version of this philosophy manually for months, designing the automation suite so the model was always a swappable component rather than something baked permanently into the system’s logic. Apple just made that the official framework design. It’s validating in the best way, the kind where you realize you were already doing the right thing before you knew it was the right thing.
Here’s what that actually means in practice.
I ran a proxy test this spring against eighteen real captures from my notes, the kind of messy, mid-thought stuff that actually ends up in a capture inbox. Untitled voice notes, half-finished ideas, links with no context. I wanted to know before the new OS ships in September whether the architecture I’d designed would hold up, or whether I’d been building toward something that sounded good in theory and fell apart on contact with my actual life.
The on-device Apple Foundation Model is a small model, something in the 3-billion-parameter range. The results were honest: excellent at generating titles for untitled notes, mediocre at routing things into the right folder, and genuinely unreliable at tagging with any nuance. That’s fine. Those are the right jobs for a small, fast, free, private model. Run it ten thousand times a month and the cost is zero and nothing leaves the machine.
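For anyone who wants to run a similar per-task comparison, a scoring pass can be as small as the sketch below. This is not my actual harness. The function name `score_by_task` and the `(task, predicted, expected)` tuple shape are assumptions for illustration, and exact-match comparison is a deliberately crude metric that works for folder routing but undersells title quality, which really needs a human read.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def score_by_task(results: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Return the fraction of exact matches per task.

    Each result is (task, predicted, expected). Comparison ignores
    case and surrounding whitespace, nothing more.
    """
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for task, predicted, expected in results:
        totals[task] += 1
        if predicted.strip().lower() == expected.strip().lower():
            hits[task] += 1
    return {task: hits[task] / totals[task] for task in totals}
```

Feeding it one tuple per capture per task gives a separate accuracy figure for titling, routing, and tagging, which is the breakdown that matters when deciding which jobs stay on the small model.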
But then there are the other jobs. The meeting that ran sideways and needs a real summary, not three bullet points. The draft that needs a second opinion, not autocomplete. The thing that’s broken in a way I can’t quite articulate to anyone without spending twenty minutes building context first. Those jobs need judgment, and a 3-billion-parameter on-device model doesn’t have judgment. It has pattern-matching. Claude has judgment. That distinction matters, and WWDC26 is the first time Apple has given me a clean, framework-level way to express it.
Before this, mixing local and cloud models in the same system was awkward. You’d hit the local model one way and the API another, and some hand-rolled glue lived in the middle hoping nothing changed on either end. What session 339 describes is a single session interface where the model is just an argument. On-device for the cheap pass, Claude for the heavy lift, and the rest of the system never has to know the difference.
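The split described above, cheap pass on-device and heavy lift in the cloud, comes down to a small routing table. The sketch below is one way to express it in Python. The task names, the `CHEAP_TASKS` and `HEAVY_TASKS` sets, and the two stand-in model functions are all hypothetical, and nothing here calls a real model or any Apple or Anthropic API.

```python
from typing import Callable

# A "model" here is any function from prompt text to reply text.
Model = Callable[[str], str]

# Hypothetical task names; adjust to whatever the pipeline actually does.
CHEAP_TASKS = {"title", "route", "tag"}
HEAVY_TASKS = {"summarize", "review"}


def local_model(prompt: str) -> str:
    """Stand-in for the small on-device model."""
    return "local: " + prompt


def cloud_model(prompt: str) -> str:
    """Stand-in for a hosted frontier model."""
    return "cloud: " + prompt


def pick_model(task: str, local: Model, cloud: Model) -> Model:
    """Choose a backend by task type; fail loudly on unknown tasks."""
    if task in CHEAP_TASKS:
        return local
    if task in HEAVY_TASKS:
        return cloud
    raise ValueError(f"no routing rule for task {task!r}")


def run(task: str, text: str,
        local: Model = local_model, cloud: Model = cloud_model) -> str:
    """The rest of the system calls run() and never sees which model answered."""
    model = pick_model(task, local, cloud)
    return model(f"{task}: {text}")
```

Raising on an unknown task is a deliberate choice in this sketch: a silent default to the paid model would hide routing mistakes until the bill arrives.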
Don’t get me wrong. This is all still theoretical until macOS 27 Golden Gate ships in September and the fm CLI is actually running against real data. Right now I’m validating the prompt architecture against a proxy model to make sure it holds up before wiring in the real thing. But the signal from this WWDC is clearer than anything I’ve seen in years of watching Apple’s AI strategy from the outside: they’re not trying to replace every model with their own. They’re trying to be the platform that all the models run through, cleanly, with consistent privacy guarantees and a unified API surface.
The one thing worth saying plainly: OpenAI isn’t there. GPT-5 and ChatGPT are not Foundation Models framework providers. ChatGPT lives in the Shortcuts ‘Use Model’ action, in the system extension, and in the OpenAI API directly. That’s fine for what it is, but it’s not the same relationship, and the gap is hard to miss. Apple called out Anthropic and Google by name as framework partners, and the absence of OpenAI from that list is either a negotiating gap or a deliberate choice. I don’t know which. But it does mean the cleanest, most native path for mixing Apple’s on-device model with a frontier cloud model is Claude, not GPT. That’s not nothing.
I’ve been saying in previous pieces that the iPad’s hardware was never the problem. The hardware argument applies here too, in a different direction. My Mac has always been capable of running this kind of automation. What it lacked was a coherent framework for doing it that didn’t require rebuilding the integration layer every time a model API changed. That’s what’s actually different about this WWDC. The framework landed.
The September GA is the real test. I’ll be upgrading my MacBook and iPad Pro carefully, verifying every launchd job one at a time before I trust the suite to the new OS, and then slowly swapping the MLX prototype for the real fm CLI. If the validation holds, if the prompt architecture that worked against a proxy model works the same way against Apple’s actual on-device model, then the stack I’ve been building toward is basically done. Local model for the repetitive work, Claude for the judgment calls, Shortcuts for the device layer, and a framework underneath all of it that Apple is actively maintaining.
That’s the first time in years I’ve been able to say all four of those things in the same sentence without caveating half of them.
I’ll write about how the upgrade actually goes. Either it works and that’ll be a satisfying piece to write, or something breaks in an interesting way and that’ll be an even more satisfying piece to write. Either way, September is going to be loud.

