
GLM 5.2 vs Opus 4.8 vs Fable 5: Which One Should You Actually Run?

June 22, 2026 · 10 min read

Bar chart of GLM 5.2 vs Opus 4.8 vs Fable 5 availability, with GLM 5.2 and Opus 4.8 online and Fable 5 offline

You read the launch coverage. Fable 5 tops almost every benchmark. GLM 5.2 does frontier coding for roughly a sixth of the price of GPT-class models. Opus 4.8 is the safe default that just ships. So you go to wire one of them into your stack, and one of them returns a 404 before you write a single line of real code.

That is the gap between the leaderboard and production. The model that wins the chart is not always the model you can run, and the model you can run today might not be the one you can run next week.

This is a straight comparison of the three models everyone is asking about right now, on the only axes that decide a real build: what they cost, how they actually perform, and whether you can put your name on a service that depends on them.

TL;DR

Opus 4.8 is the production default you can run today across every major platform. GLM 5.2 is the open-weight option nobody can take away from you, with frontier-level coding at a fraction of the cost. Fable 5 leads the benchmarks but has been suspended worldwide since June 12, 2026 under a US export control directive. Its API currently returns a not-found error that points to Opus 4.8, and there is no restoration date.

If you only remember one thing: do not hardcode any single model into a service you have to keep online.

The comparison at a glance

| | GLM 5.2 | Opus 4.8 | Fable 5 |
| --- | --- | --- | --- |
| Maker | Z.ai (Zhipu) | Anthropic | Anthropic |
| Released | June 16, 2026 | May 28, 2026 | June 9, 2026 |
| License | MIT, open weights | Proprietary | Proprietary |
| Context window | 1M tokens | 1M tokens | 1M tokens (third party reported) |
| Max output | ~128K tokens | 128K tokens | 128K tokens |
| Input price (per 1M) | ~$1.40 | $5.00 | $10.00 |
| Output price (per 1M) | ~$4.40 | $25.00 | $50.00 |
| Vision input | No, text only | Yes | Yes |
| SWE-bench Pro | 62.1% | 69.2% | 80.3% |
| Available today | Yes, hosted or self-hosted | Yes, everywhere | No, suspended worldwide |
| Best for | Cost-sensitive agentic coding | Production default, daily driver | Currently not runnable |

Benchmark numbers are pulled from each lab’s published material and independent aggregators. Cross-lab benchmark comparison is directional, not exact, because evaluation harnesses differ. Always test on your own workload before you commit.

Now the part that actually matters: why each row reads the way it does.

GLM 5.2: the open-weight frontier coder

GLM 5.2 is a 753 billion parameter mixture-of-experts model from Z.ai, shipped under a pure MIT license with weights on Hugging Face. That license is the whole story. You can download it, fine-tune it, and run it on your own infrastructure, and no vendor or government can flip a switch and take it away from you.

On coding it is not a budget compromise. It is the first open-weights model to cross 80% on Terminal-Bench 2.1, posting 81.0, ahead of Opus 4.8’s 74.6 on the same test. It hits 62.1% on SWE-bench Pro and 77.0 on MCP-Atlas for tool use. It runs a stable 1 million token context window built for long, messy agentic coding trajectories, with High and Max thinking effort levels to trade compute against latency.

The cost gap is the headline. At roughly $1.40 per million input tokens and $4.40 per million output through hosted providers, GLM 5.2 runs about six times cheaper than GPT-5.5 for comparable or better long-horizon coding. List price varies a little by provider, so check the rate on whatever endpoint you route to.
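To make the gap concrete, here is a minimal sketch that prices a single task at the list prices from the comparison table. The token counts are made-up assumptions for illustration, not measurements, and the GLM 5.2 figures are approximate because hosted rates vary by provider.

```python
# List prices in USD per 1M tokens, copied from the comparison table.
# GLM 5.2 prices are approximate and vary by hosted provider.
PRICES = {
    "GLM 5.2": (1.40, 4.40),
    "Opus 4.8": (5.00, 25.00),
    "Fable 5": (10.00, 50.00),
}


def task_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one task at list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical agentic coding task: 200K tokens in, 20K tokens out.
    for name in PRICES:
        print(f"{name}: ${task_cost(name, 200_000, 20_000):.3f} per task")
```

With those assumed token counts, the task comes to about $0.37 on GLM 5.2, $1.50 on Opus 4.8, and $3.00 on Fable 5. Swap in your own traffic profile, since the ratio shifts with the input-to-output mix.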

Where it gives ground: no vision support, so it is out for multimodal document or screenshot workflows, and the ecosystem around it is younger than Anthropic’s. For text-based coding and agentic pipelines, that is a narrow set of tradeoffs against a very large saving.

Run GLM 5.2 when cost per task matters, the work is text and code, and you want the option to self-host so nobody can cut you off.

Opus 4.8: the production default

Opus 4.8 is Anthropic’s most capable generally available model, and it is the one most teams should actually be building on right now. It is multimodal, it carries a 1 million token context window with 128K output by default, and it is available everywhere that matters: the Claude API, Bedrock, Vertex, Microsoft Foundry, Cursor, and more.

It posts strong results across coding and agentic benchmarks: 88.6% on SWE-bench Verified, 69.2% on SWE-bench Pro, 82.2% on MCP-Atlas, and 84% on Online-Mind2Web for browser agents. Pricing has held at $5 per million input and $25 per million output since the 4.5 generation. An optional Fast Mode runs about 2.5 times faster at $10 per million input and $50 per million output, now three times cheaper than the previous Fast Mode tier.

The less obvious win is reliability of behavior. Anthropic reports Opus 4.8 is roughly four times less likely than 4.7 to let flaws in its own code pass unremarked. In an autonomous agent, a missed bug that clears review costs far more than the tokens that produced it, so that honesty improvement shows up directly in your real cost of errors.

Run Opus 4.8 when you are shipping every day, you need multimodal input, and you want one model that is available on every platform with predictable pricing and behavior.

Fable 5: the model you can’t run right now

Fable 5 is the one that breaks the comparison, and it is worth understanding exactly why.

It launched June 9, 2026 as Anthropic’s first public Mythos-class model, a tier positioned above Opus. On paper it is a step change. It posts 80.3% on SWE-bench Pro, roughly eleven points clear of the next-best model, and 29.3% on Cognition’s FrontierCode Diamond against Opus 4.8’s 13.4%. Stripe reported compressing more than two months of engineering work into a single day on a 50-million-line Ruby codebase. The price reflected the tier, at $10 per million input and $50 per million output, double Opus 4.8.

Then, on the evening of June 12, 2026, the US government issued an export control directive ordering Anthropic to suspend all access to Fable 5 and Mythos 5 by any foreign national, whether inside or outside the United States, including Anthropic’s own foreign-national employees. Because Anthropic could not enforce that selectively, it disabled both models for every customer worldwide. The API now returns a not-found error pointing you to Opus 4.8. Anthropic has called it a misunderstanding and says it is working to restore access, but it has committed to no date.

For an international or crypto-native team, this is not an edge case. The directive targets foreign nationals specifically. Even if access comes back, Fable 5 is a model whose availability sits behind a policy lever you do not control and cannot predict.

Run Fable 5 when it is restored and your workload is a hard, long-horizon task where its autonomy saves days of human effort. Until then, it is not an option, and you should architect as if it might disappear again.

So which should you actually run?

| Your situation | Run this |
| --- | --- |
| High-volume agentic coding, watching the bill | GLM 5.2 |
| Need to self-host or guarantee you cannot be cut off | GLM 5.2 |
| Daily production driver, multimodal, broad platform support | Opus 4.8 |
| Latency-sensitive, real-time coding agent | Opus 4.8 Fast Mode |
| A genuinely hard, multi-day autonomous task | Fable 5, once restored |
| You need this to stay online no matter what | Never one model alone |

The honest answer for most teams is Opus 4.8 for the default and GLM 5.2 for the cost-sensitive bulk, with a plan for Fable 5 if and when it returns. But the more important answer is in that last row.
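The table above can be read as a small decision function. This is only a sketch. The function name, the flags, and the order in which they are checked are assumptions made for illustration. It also assumes Fast Mode accepts the same inputs as Opus 4.8, and it never returns Fable 5 because the model is currently suspended.

```python
def pick_model(
    needs_vision: bool = False,
    must_self_host: bool = False,
    latency_sensitive: bool = False,
    cost_sensitive: bool = False,
) -> str:
    """Map workload traits to a model name, following the table above."""
    if must_self_host and needs_vision:
        # GLM 5.2 is the only self-hostable option here, and it is text only.
        raise ValueError("no model in this comparison is both self-hostable and multimodal")
    if must_self_host:
        return "GLM 5.2"
    if latency_sensitive:
        return "Opus 4.8 Fast Mode"
    if needs_vision:
        return "Opus 4.8"
    if cost_sensitive:
        return "GLM 5.2"
    # Default for everything else: the production daily driver.
    return "Opus 4.8"
```

Whatever the function returns, the last row of the table still applies: treat the result as a first choice, not the only one.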

The real lesson: availability you don’t control is a risk

A single directive took a generally available, fully launched product offline for its entire global user base inside a few hours. Teams that had built on Fable 5 woke up to 404s. That is not a Fable 5 problem. That is a single-point-of-failure problem, and it applies to every closed model you depend on.

There are two durable defenses.

The first is open weights. A model like GLM 5.2 under an MIT license cannot be remotely disabled. You hold the weights. That is the strongest version of vendor independence there is.

The second is a routing layer that makes any model a swappable component instead of a hard dependency. If your code targets a gateway rather than a single provider, then a model going dark, getting rate-limited, or doubling in price becomes a config change instead of an incident.

This is exactly the job MixRoute does. One OpenAI-compatible API routes across 50+ models, including Claude, GPT, Gemini, DeepSeek, and open-weight models like GLM. You point your base URL at MixRoute, pick a model by string, and switch models without rewriting anything. When a provider fails, requests fall over automatically to the next best option. And because MixRoute accepts USDT with no KYC, it does not gate access the way a model tied to one jurisdiction’s export rules does, which is the precise failure that took Fable 5 down worldwide.

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_MIXROUTE_KEY",
    base_url="https://api.mixroute.ai/v1",
)

# swap the model string, keep everything else
response = client.chat.completions.create(
    model="glm-5.2",  # or an Opus 4.8 model id, or any of 50+ others
    messages=[{"role": "user", "content": "Refactor this module."}],
)
```

Migrating off a model you can no longer use should cost you one line, not a sprint.
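Gateway-side failover aside, the same idea can be sketched on the client in a few lines. The helper below is a hypothetical illustration, not part of any SDK. `send` is whatever function you already use to call a model, and the model ids other than `glm-5.2` are placeholders.

```python
from typing import Callable, Sequence


class AllModelsFailed(Exception):
    """Raised when every model in the chain has errored."""


def complete_with_fallback(
    send: Callable[[str, str], str],
    models: Sequence[str],
    prompt: str,
) -> tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first success."""
    errors: list[str] = []
    for model in models:
        try:
            return model, send(model, prompt)
        except Exception as exc:  # broad on purpose: 404, rate limit, timeout
            errors.append(f"{model}: {exc}")
    # Nothing in the chain answered; surface every failure for debugging.
    raise AllModelsFailed("; ".join(errors))
```

With the OpenAI-style client, `send` could wrap `client.chat.completions.create(...)` and return `response.choices[0].message.content`. The order of `models` then lives in config, so dropping a suspended model is an edit to a list instead of a code change.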

FAQ

Can I still use Claude Fable 5? Not right now. Anthropic suspended Fable 5 and Mythos 5 for all customers worldwide on June 12, 2026 to comply with a US export control directive. The API returns a not-found error and reroutes new sessions to Opus 4.8. Anthropic says it is working to restore access but has given no date.

Is GLM 5.2 actually as good as Opus 4.8 for coding? On some coding benchmarks it is ahead. GLM 5.2 scores 81.0 on Terminal-Bench 2.1 versus Opus 4.8’s 74.6. Opus 4.8 leads on SWE-bench Pro at 69.2% versus 62.1%. They trade blows, GLM 5.2 costs far less, and Opus 4.8 adds vision and broader platform support. Test both on your own tasks.

How much cheaper is GLM 5.2 than Opus 4.8? At roughly $1.40 input and $4.40 output per million tokens against Opus 4.8’s $5 and $25, GLM 5.2 is about 3.6 times cheaper on input and about 5.7 times cheaper on output. For high-volume agentic workloads, that gap compounds fast.

What should I use instead of Fable 5 while it is down? Opus 4.8 is the closest generally available model and is where the API already reroutes you. For coding where cost matters more than multimodal input, GLM 5.2 is a strong, cheaper alternative you can also self-host.

Does Fable 5 have a 1M context window like Opus 4.8? Anthropic did not formally state a context window in the launch announcement. Third-party trackers report 1 million tokens with up to 128K output, matching Opus 4.8. Treat that as reported, not confirmed.

Is GLM 5.2 safe to run in production? The weights are MIT licensed, which permits commercial use and self-hosting. That makes it the most independence-friendly of the three. As with any model, layer your own evaluations and guardrails on top for regulated workloads.

What happens to my app if I hardcode a single model? You inherit that model’s availability. Fable 5 just showed how fast that can go to zero. Routing through a gateway with automatic failover turns a model outage into a config change instead of downtime.

The bottom line

Run Opus 4.8 as your default and GLM 5.2 for the cost-sensitive bulk of agentic and coding work. Keep Fable 5 in mind for hard, long-horizon tasks if access returns. But build so that no one model can take your service down, because last week proved it can happen in hours.

MixRoute does exactly that. One OpenAI-compatible API, 50+ models, automatic failover, USDT with no KYC. Swap any model with a one-line change and stop betting your uptime on a switch someone else controls.

Start building on MixRoute


Figures current as of June 21, 2026, drawn from official model cards, provider pricing pages, Anthropic’s launch and suspension statements, and independent benchmark aggregators. Models, prices, and availability change. Verify before you ship.
