
Cloud AI Coding Costs Keep Climbing — How to Pay $0 and Still Use Claude Code

Update — April 22, 2026. XDA Developers reported yesterday that Anthropic is A/B-testing a version of the Pro plan that doesn’t include Claude Code — affecting about 2% of new signups. Current Pro users keep it; the pricing page has been updated to show Claude Code unchecked for the test variant. If the test rolls out fully, the cheapest way to use Claude Code jumps from the $20 Pro tier to the $100+ Max tier. Either way, the direction is clear: the cost of cloud AI coding is going up, not down. What’s below is how to keep using Claude Code regardless.

Every few weeks there’s another headline about an AI company raising prices, tightening rate limits, or putting a favorite tool behind a higher tier. If you’re a developer who uses Claude Code or a similar agent every day, the math starts getting uncomfortable. What used to be a $20/month habit can quietly become $100/month — per seat — before you notice. Multiply by a small team and it’s a real line item.

I got tired of watching that number climb, so I built a way around it. Claude Code still works great. You just run it against a local AI on your own Mac instead of the cloud.

The project is open source and free: github.com/nicedreamzapp/claude-code-local. Run one setup script, double-click a launcher, and you’re coding against a local model that costs nothing to run beyond electricity.

Why the cloud bill keeps growing

Cloud AI pricing isn’t stable and shouldn’t be treated like it is. The models keep getting more capable, and each jump in capability gets repriced into a higher tier. Context windows expand and the per-token cost goes up to match. “Included in your plan” features get quietly moved up a tier. None of this is wrong — the cost to serve a frontier model is real — but the net effect on a developer’s monthly AI spend is a steady upward drift you can’t plan for.

Meanwhile, the machine on your desk has gotten radically more capable at running those same kinds of models. An M-series MacBook with 32+ GB of RAM can now comfortably run a coding model at 15–65 tokens per second. That’s real, useful inference speed. Five years ago this wasn’t possible. Today it’s ordinary.
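To make those speeds concrete, here's the back-of-envelope math for how long a reply takes at each end of that range. The 500-token completion size is an illustrative assumption, not a benchmark:

```python
# Rough time to generate a typical completion at local inference speeds.
# The 500-token reply length is an illustrative assumption.
def generation_seconds(tokens: int, tok_per_sec: float) -> float:
    return tokens / tok_per_sec

for speed in (15, 65):  # low and high end of the M-series range above
    secs = generation_seconds(500, speed)
    print(f"{speed} tok/s -> {secs:.1f} s for a 500-token reply")
```

Even the slow end is fast enough for interactive use; the fast end feels close to cloud latency.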

The question stops being “can my Mac run this?” and becomes “why am I paying a subscription for something my Mac can do by itself?”

How the local setup works

Claude Code — Anthropic’s CLI coding agent — talks to api.anthropic.com by default. It sends your prompts and code to the cloud, gets responses back, runs tool calls. Fast, reliable, costs money.

The workaround is a tiny server that runs on your Mac and pretends to be the Anthropic API. Claude Code thinks it’s talking to the cloud. It’s actually talking to localhost:4000, which hands the request to an MLX-powered local model running on your Apple Silicon GPU. The response comes back formatted exactly like the real API would format it. Claude Code doesn’t know the difference.
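The redirection itself is just environment configuration. Claude Code honors a base-URL override, so pointing it at the local server looks something like this (the port and the dummy key value are this setup's conventions, not universal defaults):

```shell
# Point Claude Code at the local server instead of api.anthropic.com.
# ANTHROPIC_BASE_URL is Claude Code's base-URL override; the key is a
# placeholder the local server ignores -- it is never sent anywhere real.
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_API_KEY="local-dummy-key"
claude   # launches Claude Code against localhost
```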

I went through three generations of this before it actually worked well:

  • Gen 1 — Ollama + a translation proxy. 30 tok/s, 133 seconds per task. Worked, but slow.
  • Gen 2 — llama.cpp with KV cache compression + same proxy. 41 tok/s. Still 133 seconds per task because the proxy was the bottleneck, not the model.
  • Gen 3 — Killed the proxy entirely. Wrote a native MLX server that speaks Anthropic’s API directly. 65 tok/s and 17.6 seconds per task. 7.5× faster than anything I had before, and it came from deleting code rather than adding it.

The key insight: when you’re running everything locally, every layer of translation or RPC between Claude Code and the model is overhead that buys you nothing. Direct is always faster than proxied.
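A minimal sketch of the idea — not the project's actual server — using only the Python standard library. The response dict mirrors the field names of Anthropic's Messages API, and the `run_model` stub stands in for MLX inference:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_model(prompt: str) -> str:
    # Stand-in for local MLX inference; a real server calls the model here.
    return f"(local reply to: {prompt[:40]})"

def build_response(body: dict) -> dict:
    # Mirror the field names of Anthropic's Messages API response shape
    # so the client can parse it without knowing it's local.
    prompt = str(body["messages"][-1]["content"])
    return {
        "id": "msg_local",
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": run_model(prompt)}],
        "model": body.get("model", "local"),
        "stop_reason": "end_turn",
        "usage": {"input_tokens": 0, "output_tokens": 0},
    }

class MessagesHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/messages":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        data = json.dumps(build_response(body)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

# To serve: HTTPServer(("localhost", 4000), MessagesHandler).serve_forever()
```

The real server also handles streaming, tool-use blocks, and token accounting, but the core trick is exactly this: speak the client's wire format natively, with no translation layer in between.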

What it costs

Setup — once, ~20 minutes:

  1. git clone https://github.com/nicedreamzapp/claude-code-local
  2. cd claude-code-local && bash setup.sh
  3. Double-click the launcher that lands on your Desktop.

Ongoing — $0/month. Electricity maybe adds a dollar or two to your power bill if you’re running inference constantly. That’s it. No API key. No usage tier. No rate limits. No dashboard to check.
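The electricity claim is easy to sanity-check. Assuming an M-series Mac draws roughly 40 W extra under sustained inference and power costs about $0.15/kWh — both assumptions, so adjust for your hardware and utility:

```python
# Rough monthly electricity cost for sustained local inference.
# 40 W extra draw and $0.15/kWh are assumptions, not measurements.
watts = 40
usd_per_kwh = 0.15
hours_per_day = 8          # a full workday of inference, every day

kwh_per_month = watts / 1000 * hours_per_day * 30
cost = kwh_per_month * usd_per_kwh
print(f"~{kwh_per_month:.1f} kWh/month -> ${cost:.2f}/month")
```

That lands in the "a dollar or two" range even at constant daily use.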

Running Claude Code against a local model also means:

  • Your code never leaves your Mac. For NDA work, client files, or anything under compliance rules, this isn’t a “nice to have” — it’s the difference between being able to use AI and having to turn it off.
  • Works without internet. Plane, train, coffee shop with bad wifi, client office with firewall restrictions — if the machine is running, the AI is running.
  • No surprise bill. You won’t wake up to a $400 month because you left a loop running overnight.

The tradeoff

To be fair: local models aren’t cloud Claude. Cloud Sonnet and Opus are still more capable on really hard reasoning tasks. For 80-90% of day-to-day coding — writing functions, refactoring, explaining code, running tool calls, doing multi-step edits — the local models are good enough that I’ve stopped reaching for cloud. For the hardest 10-20% of problems, cloud still wins.

The honest recommendation: run local as your default, and keep a cloud subscription only if you’re reaching for it weekly. Most developers will find they rarely need to.

Where the ceiling is going

Local models keep getting better faster than cloud pricing falls. The 31B and 70B open-weight models available today are what people paid cloud premiums for 18 months ago. In another 18 months, what’s running on your Mac will be indistinguishable in capability from what the cloud was charging $200/month for this year.

That’s the real story. Not “cloud is bad.” Cloud is fine. It’s that local has caught up, and the cost math has flipped. If you’re a developer who cares about the bill, or a firm whose files can’t legally leave the premises, or just someone who doesn’t want to be surprised by next quarter’s pricing page — this is the version that makes sense now.

Try it: github.com/nicedreamzapp/claude-code-local. MIT license. Zero dependencies you can’t audit. Run it, or don’t, but either way you’ll have one less subscription to worry about.

— Matt


Part of the Nice Dreamz lineup. If you’re a firm (law, accounting, medical, therapy) that can’t legally put client files through cloud AI, AirGap AI is the private on-device setup I do for firms — book a 15-min call there.
