Thread by @svpino | Bookmarks

Santiago @svpino 2026-04-08

If you ask me, Gemma 4 is one of the best models out there for a single reason:

You can run it locally and it’s really, really good (probably Sonnet level?)

And, of course, you can now use it to power OpenClaw and show a middle finger to the company that doesn’t like it.

2026-04-08

Run OpenClaw with Gemma 4 and Atomic Chat

MacBook Air M4 · 16 GB RAM · 25 tok/s

No cloud! No subscription fees! Open-source local model. Runs on your regular device

GooGZ AI @PaulGugAI 2026-04-08

Which flavor of Gemma 4 can run on a 16GB mini, and would it be 'clever enough' for most things, though? I've been hearing mixed reviews/reports so far about the efficacy of the smaller E2B/E4B variants. I think the jury is still out.

Matthias Kampmann @M_Kampmann 2026-04-09

I wish this were true.

Tool calling is not reliable though.

Also, quite often it plays back the instructions for repeating jobs instead of executing them.

GLM 5.1 though…

Jatin Garg @jatingargiitk 2026-04-09

running it locally is the real moat here, but “sonnet level” depends a lot on the task. for coding and long-horizon reasoning, that’s still a stretch. the interesting part is getting near-frontier quality without the API leash.

Nishan @nishancodes 2026-04-09

Not Sonnet level, not even near.

But it's an insanely good model compared to its oss peers

Balvinder Kalon @BalvinderKalon 2026-04-09

25 tok/s on a MacBook Air is genuinely usable. the game changer with local models isn't matching frontier quality, it's having a capable model running with zero latency and zero cost for the 80% of tasks that don't need opus-level reasoning. gemma 4 hits that sweet spot.

Sam Selvanathan @samselvanathan_ 2026-04-09

'Sonnet level' is the wrong frame. 26B MoE with ~4B active parameters per forward pass isn't competing with Sonnet quality-wise. It's frontier-adjacent reasoning at local-inference economics. Different deployment category. The teams that get this aren't asking 'is it as good as

Konstantin Gladych @gladkos 2026-04-08

Google TurboQuant makes large context windows possible for agentic tasks, even on mid-range devices.

Bongquisitive @bongquisitive 2026-04-09

I don't think Gemma-4 is even at the level of Qwen3.5-27B...but that's just my use cases...

Erich Cervantez @erichcervantez 2026-04-09

Wonder how the A.I. tech bros feel about this after blowing tens of thousands on mac studio ultra machines that burn tokens like it's the fourth of july 🎇

NColonJr @NelsonColonJr 2026-04-09

It's insanely efficient. I thought it was just as good as Claude at first; it's not even close. Still buggy, gives me cracked code, crashes after ~80k-100k tokens spent...still sufficient, just not as good.

Petru | Steam Vibe @SteamVibeLtd 2026-04-09

Agreed. The fact it runs so well on consumer hardware is the real win. OpenClaw integration just proves the point.

Marcus Motill @marcusmotill 2026-04-09

"sonnet level" the Claude bros are so annoying

Paul ADW @PaulADW 2026-04-09

Really, you can’t. Feed it 40k tokens and watch it eat like 50GB of RAM.

Pia Red Dragon @PiaRedDragon 2026-04-09

Agreed, but if you are not going to use the full-sized model, use these guys' version. The 30GB version gave me BETTER results on MMLU.

I am not sure what their proprietary quantization method is, but it is insane!