<h1>Google's Gemma Already Acts Like Gemini—Someone Made It Think Like Claude Opus Too</h1>
<p><em>April 15, 2026</em></p>
<div>
<p>If you've been following the local AI scene, you probably know <a href="https://decrypt.co/364047/want-claude-opus-ai-potato-pc-next-best-bet" target="_blank">Qwopus</a>—the open-source model that tried to distill Claude Opus 4.6's reasoning into Alibaba's Qwen, so you could run something resembling Opus on your own hardware for free. It worked surprisingly well. The obvious catch: Qwen is a Chinese model, and not everyone is comfortable with that.</p>
<p>Jackrong, the pseudonymous developer behind that project, heard the feedback. His answer is <a href="https://huggingface.co/collections/Jackrong/gemopus-4" target="_blank" rel="nofollow external noopener">Gemopus</a>—a new family of Claude Opus-style fine-tunes built entirely on Google's open-source Gemma 4. All-American DNA, same idea: frontier-level reasoning, running locally on hardware you already own.</p>
<p>The family comes in two flavors.
<a href="https://huggingface.co/Jackrong/Gemopus-4-26B-A4B-it-GGUF" target="_blank" rel="nofollow external noopener">Gemopus-4-26B-A4B</a> is the heavier option—a mixture-of-experts (MoE) model with 26 billion total parameters that activates only around 4 billion during inference, which means it punches well above its weight on constrained hardware.</p>
<p>Parameters determine an AI model's capacity to learn, reason, and store information. Having 26 billion total parameters gives the model a huge breadth of knowledge, but by "waking up" only the roughly 4 billion parameters relevant to a given prompt, it delivers results closer to those of a massive AI while remaining lightweight enough to run smoothly on everyday hardware.</p>
<p>The other is <a href="https://huggingface.co/Jackrong/Gemopus-4-E4B-it" target="_blank" rel="nofollow external noopener">Gemopus-4-E4B</a>, a 4-billion-parameter edge model engineered to run comfortably on a modern iPhone or a thin-and-light MacBook—no GPU required.</p>
<p>The base model choice matters here. Google's Gemma 4, released on April 2, is built directly from the same research and technology as Gemini 3—the company said so explicitly at launch. That means Gemopus carries something no Qwen-based fine-tune can claim: the DNA of Google's own state-of-the-art closed model under the hood, wrapped in Anthropic's thinking style on top. The best of both worlds, more or less.</p>
<p>What makes Gemopus different from the wave of other Gemma fine-tunes flooding Hugging Face right now is the philosophy behind it. Jackrong deliberately chose not to force Claude's chain-of-thought reasoning traces into Gemma's weights—a shortcut most competing releases take.</p>
<p>His argument, backed by recent research, is that stuffing a student model with a teacher's surface-level reasoning text doesn't actually transfer real reasoning ability.
It teaches imitation, not logic. "There is no need for excessive imagination or superstitious replication of the Claude-style chain of thought," the model card reads. Instead, he focused on answer quality, structural clarity, and conversational naturalness—fixing Gemma's stiff Wikipedia tone and its tendency to lecture you about things you didn't ask about.</p>
<p>AI infrastructure engineer Kyle Hessling ran independent benchmarks and published the results directly on the model card. His verdict on the 26B variant was pretty favorable. "Happy to have benched this one pretty hard and it is an excellent finetune of an already exceptional model," he wrote on X. "It rocks at one-shot requests over long contexts, and runs incredibly fast thanks to the MoE (mixture of experts) architecture."</p>
<div>
<blockquote>
<p lang="en" dir="ltr">Gemopus-4-26B-A4B from Jackrong is LIVE!</p>
<p>Happy to have benched this one pretty hard (see my benches in the model card) and it is an excellent finetune of an already exceptional model! My friend Jackrong is always cooking the greatest!</p>
<p>It rocks at one-shot requests over long…</p>
<p>— Kyle Hessling (@KyleHessling1) <a href="https://twitter.com/KyleHessling1/status/2042637857194676321?ref_src=twsrc%5Etfw" data-wpel-link="internal">April 10, 2026</a></p>
</blockquote>
</div>
<p>The smaller E4B variant passed all 14 core competence tests—instruction following, coding, math, multi-step reasoning, translation, safety, caching—and cleared all 12 long-context tests at 30K and 60K tokens.
On needle-in-a-haystack retrieval, it passed 13 out of 13 probes, including a stretch test at one million tokens with YaRN 8× RoPE scaling.</p>
<div>
<figure><img loading="lazy" alt="Screenshot of Gemopus benchmark results" width="1372" height="1194" decoding="async" src="https://img.decrypt.co/insecure/rs:fit:3840:0:0:0/plain/https://cdn.decrypt.co/wp-content/uploads/2026/04/Captura-de-pantalla-2026-04-14-a-las-12.26.49.png@webp"></figure>
</div>
<p>The 26B extends natively to 131K of context and all the way out to 524K with YaRN, which Hessling also stress-tested: "It also crushed my simple needle-in-the-haystack tests all the way out to an extended context of 524k!"</p>
<p>On edge hardware, the E4B is genuinely fast. Jackrong reports 45–60 tokens per second on an iPhone 17 Pro Max, and 90–120 tokens per second on a MacBook Air M3/M4 via MLX. The 26B's MoE architecture means it offloads gracefully on unified-memory systems or on GPUs with under 10GB of VRAM. Hessling called it his daily-driver recommendation for VRAM-starved setups.</p>
<p>Both models are available in GGUF format, which means you can drop them straight into LM Studio or llama.cpp without extra configuration. The full training code and a step-by-step fine-tuning guide are on Jackrong's <a href="https://github.com/Jackrong-llm-finetuning-guide" target="_blank">GitHub</a>—the same pipeline he used for Qwopus, with the same Unsloth and LoRA setup, reproducible on Colab.</p>
<p>Gemopus is not without its rough edges. Tool calling remains broken across the entire Gemma 4 series in llama.cpp and LM Studio—call failures, format mismatches, loops—so if your workflow depends on agents using external tools, this is not your model yet.
Jackrong himself calls it "an engineering exploration reference rather than a fully production-ready solution," and recommends his own Qwopus 3.5 series for anyone who needs something more stable for real workloads.</p>
<p>And because Jackrong deliberately avoided aggressive Claude-style chain-of-thought distillation, don't expect it to feel as deeply Opus-brained as Qwopus—that was a conscious tradeoff for stability, not an oversight.</p>
<div>
<blockquote>
<p lang="en" dir="ltr">Yeah the philosophy on this one was stability first, it is my understanding that the Gemma models tend to become unstable if you force a bunch of Claude thinking traces into them, you can see this when testing many other Opus gemma fine tunes on hugging face.</p>
<p>Jackrong tried a…</p>
<p>— Kyle Hessling (@KyleHessling1) <a href="https://twitter.com/KyleHessling1/status/2042661626604851632?ref_src=twsrc%5Etfw" data-wpel-link="internal">April 10, 2026</a></p>
</blockquote>
</div>
<p>For those who want to go deeper into Gemma fine-tuning for reasoning specifically, there is also a separate community project worth watching: <a href="https://huggingface.co/DJLougen/Ornstein-26B-A4B-it-GGUF" target="_blank" rel="nofollow external noopener">Ornstein</a> by pseudonymous developer DJLougen, which takes the same 26B Gemma 4 base and focuses on improving its reasoning chains without relying on the logic or style of any specific third-party model.</p>
<p>One honest caveat: Gemma's training dynamics are messier than Qwen's for fine-tuners—wider loss fluctuations, more hyperparameter sensitivity. Jackrong says so himself. If you need a more battle-tested local model for production workflows, his Qwopus 3.5 series remains more robustly validated. But if you want an American model with Opus-style polish, Gemopus is currently your best available option.
A denser 31B Gemopus variant is also in the pipeline, with Hessling teasing it as "a banger for sure."</p>
<p>If you want to try running local models on your own hardware, check our guide on <a href="https://decrypt.co/348129/running-your-own-local-open-source-ai-model-easy-heres-how" target="_blank">how to get started with local AI</a>.</p>
</div>