Claude Fable 5 Fallback to Opus 4.8 Explained

Learn how Claude Fable 5 safeguards interact with Opus 4.8 fallback behavior in production API systems.

By Dora 9 min read

It’s Dora. I have been routing production traffic to Claude Fable 5 for about a week. Long enough to watch the fallback behavior fire, short enough that I still remember what surprised me. This piece is for anyone who has just integrated Fable 5 and seen stop_reason: "refusal" come back on a perfectly innocent prompt — or who is about to, and would rather not find out at 2 a.m.

The short version: a Claude Fable 5 fallback is not an error. It is a documented part of how the model ships. When a safety classifier declines a request, the API returns HTTP 200 with a refusal stop reason, and Anthropic gives you three ways to retry that request on Claude Opus 4.8 without losing the user. If you treat it as an exception to catch, you will mishandle it. If you treat it as a routing decision, it slots in cleanly.

I will go through what triggers a fallback, what the API actually returns, how to implement the retry, and what it means for billing.

Why Opus 4.8 matters in Fable 5 routing

Fable 5 safeguards and fallback behavior

Fable 5 is Anthropic’s most capable widely released model, and it ships with safety classifiers that sit in front of the model. When a classifier flags a request, Fable 5 does not answer. The request can be re-run on Claude Opus 4.8, and the user is told that happened. This is documented in Anthropic’s announcement of Claude Fable 5 and Mythos 5.

Anthropic states the classifiers fire on fewer than 5% of sessions on average. That number tracks with what I have seen so far. Most of the time you do not notice the fallback machinery is there.

Mythos 5 restricted access context

Mythos 5 is the same underlying model as Fable 5, minus the classifiers. It is not generally available. Access goes through Project Glasswing, which is currently limited to cybersecurity partners and a smaller set of biology researchers under a separate trusted access program. If you do not already have access, you are building on Fable 5. The Anthropic Mythos branding can be confusing here: Mythos is the model class, and Fable 5 is the publicly available member of that class.

For the rest of this piece, assume your code is calling Fable 5.

Why fallback is a product feature, not just an error path

This is the part that took me a minute to internalize. Opus 4.8 is not a degraded experience. It is the previous-generation Opus tier, still capable, and it does not run the same classifiers. So the routing logic is: try the strongest model first, and if a classifier declines, fall through to the model that was the flagship two months ago. The user gets an answer either way. That is the whole design.

A fallback is not a bug report. It is a routing decision your code makes on the user’s behalf.

What triggers fallback or refusal

Cybersecurity, biology/chemistry, and distillation categories

The stop_details.category field tells you which classifier fired. The published categories on Fable 5 include cyber, bio, and reasoning_extraction — the last one is for requests that look like attempts to reverse-engineer or distill model outputs under Anthropic’s Terms of Service. The current list and exact behavior are in the refusals and fallback documentation on the Claude API docs.

I have not seen bio fire on anything I do. I have seen cyber fire twice. Both times the prompt was security-adjacent but harmless — one was a question about how a specific log format was structured, the other was about a CVE that is years old and fully patched. Neither was an attempt to do anything. The classifier saw the surface pattern and declined.

False positives and conservative safeguards

Anthropic has been explicit that the classifiers are tuned conservatively — stricter than would be ideal, in their own framing. That is the trade-off. They would rather decline a harmless cyber question and route you to Opus 4.8 than miss a real misuse case. The fallback exists exactly because the false-positive rate is non-zero by design.

If you build with that assumption, the surprises go away. If you build assuming refusals are rare emergencies, the first one breaks something.

What the API returns when requests are declined

The response is a normal HTTP 200. The shape is roughly:

{
  "role": "assistant",
  "content": [],
  "stop_reason": "refusal",
  "stop_details": {
    "type": "refusal",
    "category": "cyber",
    "explanation": "..."
  },
  "usage": { "input_tokens": 106, "output_tokens": 1 }
}

You are not billed for a request refused before any output is generated. If you continue the same conversation without resetting the refused turn, you will keep getting refusals — Anthropic’s streaming refusals documentation covers this specifically. Remove or rewrite the turn before retrying.
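A minimal sketch of resetting the refused turn, assuming the conversation is kept as a list of role/content dicts and that the refused turn is the most recent user message. The helper name `reset_refused_turn` is illustrative and not part of any SDK.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def reset_refused_turn(history: List[Message], rewritten: Optional[str] = None) -> List[Message]:
    """Return a copy of history without the last user turn (assumed to be the refused one)."""
    trimmed = list(history)  # work on a copy so the caller's list is untouched
    # Drop trailing messages up to and including the most recent user turn.
    while trimmed:
        last = trimmed.pop()
        if last["role"] == "user":
            break
    # Optionally put a reworded version of the turn back in its place.
    if rewritten is not None:
        trimmed.append({"role": "user", "content": rewritten})
    return trimmed
```

Passing `rewritten` covers the "rewrite" case; leaving it out covers the "remove" case.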

The category field is informational. Do not branch user-facing copy on it. It can also be null in some surfaces, including batch results, so detect refusals by checking stop_reason directly.
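A minimal sketch of that check, assuming the response has already been parsed into a plain dict with the shape shown above. The helper names are illustrative. The category is read only for internal metrics and is allowed to be missing or null.

```python
from typing import Any, Mapping, Optional


def is_refusal(response: Mapping[str, Any]) -> bool:
    """Detect a refusal from stop_reason alone, never from the category."""
    return response.get("stop_reason") == "refusal"


def refusal_category(response: Mapping[str, Any]) -> Optional[str]:
    """Return the category for internal use, or None when it is absent or null."""
    details = response.get("stop_details") or {}
    return details.get("category")


declined = {
    "role": "assistant",
    "content": [],
    "stop_reason": "refusal",
    "stop_details": {"type": "refusal", "category": "cyber", "explanation": "..."},
}
# Same refusal, but with a null category, as can happen on some surfaces.
null_category = {
    "role": "assistant",
    "content": [],
    "stop_reason": "refusal",
    "stop_details": {"type": "refusal", "category": None},
}
```

Both sample responses count as refusals. Only the first yields a category.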

How builders should implement fallback

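Three ways: the server-side fallbacks parameter, the SDK middleware, or a manual retry with the fallback credit token. Pick one. Do not stack them.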

Server-side fallback parameter

The cleanest path on direct Claude API or Claude Platform on AWS is the opt-in fallbacks parameter. It is currently in beta. You add a fallback model list to the request, and if Fable 5 refuses, Anthropic re-runs the request on the next model in the list — Opus 4.8 at launch — and returns that response to you. One round trip from your side.

Not supported on the Message Batches API, and not currently available on Amazon Bedrock, Vertex AI, or Microsoft Foundry. For those, use the SDK middleware.
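The availability limits above can be encoded as a small guard that runs before the parameter is added to a request. This is a sketch only: the surface labels are strings made up for this example, not identifiers from any SDK, and the set reflects availability as described in this post.

```python
# Surface labels below are illustrative strings chosen for this sketch.
SERVER_SIDE_SURFACES = frozenset({"claude_api", "claude_platform_aws"})


def server_side_fallback_available(surface: str, uses_batches: bool = False) -> bool:
    """True when the opt-in fallbacks parameter can be used, per the limits listed above."""
    if uses_batches:
        return False  # not supported on the Message Batches API
    # Bedrock, Vertex AI, and Foundry are deliberately absent from the set.
    return surface in SERVER_SIDE_SURFACES
```

When the guard returns False for Bedrock, Vertex AI, or Foundry, the SDK middleware is the path to use.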

Client-side SDK middleware

The Anthropic SDKs ship a refusal-fallback middleware. You configure a client once with a fallback model list, and it handles the retry, the beta header for the fallback credit, and the conversation history bookkeeping. The accepting model gets pinned for follow-up turns so the conversation stays consistent.

I used the middleware. The configuration is one block at client construction, and after that client.beta.messages.create behaves exactly like the regular client — except refusals route automatically. That is the path I would recommend if you are on Bedrock, Vertex, or Foundry, or if you just want the same code path everywhere.
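To make the behavior concrete, here is a sketch of what a refusal-fallback wrapper does conceptually. It is not the SDK middleware and does not use its API: `send` is a hypothetical callable you inject, the model names are whatever strings you pass in, and the credit header and history bookkeeping are left out.

```python
from typing import Any, Callable, Dict, List, Optional

Response = Dict[str, Any]
# Hypothetical transport: takes a model name and messages, returns a parsed response.
Send = Callable[[str, List[Dict[str, Any]]], Response]


class RefusalFallbackRouter:
    """Tries models in order and pins whichever one accepts."""

    def __init__(self, send: Send, models: List[str]) -> None:
        if not models:
            raise ValueError("at least one model is required")
        self._send = send
        self._models = models
        self.pinned: Optional[str] = None

    def create(self, messages: List[Dict[str, Any]]) -> Response:
        # Follow-up turns start from the pinned model, never from an earlier one.
        start = self._models.index(self.pinned) if self.pinned else 0
        response: Response = {}
        for model in self._models[start:]:
            response = self._send(model, messages)
            if response.get("stop_reason") != "refusal":
                self.pinned = model
                return response
        # Every remaining model refused; hand the last refusal back to the caller.
        return response
```

Once a fallback model has accepted, later calls on the same router go straight to it, which mirrors the pinning described above.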

Logging classifier outcomes without exposing sensitive content

When a refusal happens, log enough to debug — model, timestamp, category — but do not log the full prompt back into your application logs if it might be sensitive. The classifier already flagged it. Treat the prompt as something you want to handle, not something you want indexed in your observability stack.

I keep a counter on stop_details.category and a sample rate of full payloads for development environments only. That gets you false-positive patterns without leaking the content.
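A sketch of that logging approach, assuming parsed dict responses. The function name, the `environment` values, and the default sample rate are illustrative choices, not anything prescribed by the API.

```python
import random
from collections import Counter
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Mapping

refusal_counts: Counter = Counter()


def record_refusal(
    response: Mapping[str, Any],
    request_payload: Mapping[str, Any],
    environment: str,
    sample_rate: float = 0.1,
    rng: Callable[[], float] = random.random,
) -> Dict[str, Any]:
    """Count the refusal by category and build a log entry without the prompt."""
    details = response.get("stop_details") or {}
    category = details.get("category") or "unknown"  # category may be null
    refusal_counts[category] += 1
    entry: Dict[str, Any] = {
        "model": response.get("model"),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "category": category,
    }
    # Full payloads are sampled in development only, never in production.
    if environment == "development" and rng() < sample_rate:
        entry["payload"] = dict(request_payload)
    return entry
```

The `rng` parameter exists so the sampling decision can be made deterministic in tests.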

Billing and user experience

Avoiding duplicate prompt-cache cost where supported

If your original Fable 5 request used a long cached prefix, you do not want to pay for that cache read twice when you retry on Opus 4.8. The fallback credit handles this. When a refusal mints a credit, you get an opaque fallback_credit_token in the response, and passing it on the retry request avoids the duplicate cache charge. The mechanism and the beta header are documented in AWS Bedrock’s fallback credit guide, and the SDK middleware sends the header for you. The token lasts five minutes.

If you used the server-side fallbacks parameter or the middleware, this is handled. If you are doing a manual retry, you have to wire it in yourself.
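For the manual path, a sketch of the bookkeeping looks like this. The five-minute lifetime comes from the description above. Where the token goes on the retry request is an assumption here (a top-level `fallback_credit_token` key), and the required beta header is omitted because its value is not given in this post, so check the documentation before relying on either.

```python
import time
from dataclasses import dataclass
from typing import Any, Dict, Optional

CREDIT_TTL_SECONDS = 5 * 60  # the token is described as lasting five minutes


@dataclass
class FallbackCredit:
    token: str
    received_at: float  # time.monotonic() reading when the refusal arrived

    def is_valid(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self.received_at < CREDIT_TTL_SECONDS


def build_retry_request(
    original: Dict[str, Any],
    fallback_model: str,
    credit: Optional[FallbackCredit],
    now: Optional[float] = None,
) -> Dict[str, Any]:
    """Copy the original request onto the fallback model, attaching the credit if still valid."""
    retry = dict(original, model=fallback_model)
    if credit is not None and credit.is_valid(now):
        # ASSUMPTION: the request-side field name and placement are hypothetical.
        retry["fallback_credit_token"] = credit.token
    return retry
```

An expired credit is dropped silently here, so the retry still goes out and simply pays the normal cache cost.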

Explaining fallback to end users

A fallback is not a failure. But the user does need to know the response came from a different model, both for transparency and because Opus 4.8 may answer differently. I show a small inline note — something like “Answered with a fallback model” — and link to a help page that explains what that means. Not an apology. A label.

What I do not do is expose the category to the user. “cyber” or “bio” out of context reads like an accusation, and it usually is not.
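A sketch of the labeling rule, assuming your code knows which model was requested and which one actually served the response. The function name is illustrative. The classifier category is deliberately not an input, so it cannot leak into user-facing copy.

```python
from typing import Optional

FALLBACK_NOTE = "Answered with a fallback model"


def fallback_note(requested_model: str, served_model: str) -> Optional[str]:
    """Return the inline label when a different model answered, otherwise None."""
    return FALLBACK_NOTE if served_model != requested_model else None
```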

Keeping safety behavior observable

Track refusal rate as a normal SLI. If it drifts up week over week, you want to know — either your usage is shifting toward flagged categories, or a classifier has been retuned. Both are operationally interesting. Both are invisible if you do not measure.
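One way to sketch the week-over-week check, assuming you already collect refusal and total request counts per week. The function names and the one-percentage-point tolerance are arbitrary illustrations, not recommended thresholds.

```python
from typing import Sequence, Tuple


def refusal_rate(refusals: int, total: int) -> float:
    """Fraction of requests that ended in a refusal; 0.0 when there was no traffic."""
    return refusals / total if total else 0.0


def drifting_up(weekly: Sequence[Tuple[int, int]], tolerance: float = 0.01) -> bool:
    """True if the latest week's rate exceeds the previous week's by more than tolerance.

    Each entry in weekly is a (refusals, total_requests) pair, oldest first.
    """
    if len(weekly) < 2:
        return False
    previous, latest = (refusal_rate(r, t) for r, t in weekly[-2:])
    return latest - previous > tolerance
```

A positive result only says the rate moved. Telling a usage shift apart from a classifier retune still needs the per-category counts.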

FAQ

Why does Fable 5 fall back to Opus 4.8?

Because Fable 5 ships with safety classifiers that can decline requests in specific categories (cyber, biology, chemistry, distillation). When that happens, Fable 5 does not answer, and the request can be re-run on Opus 4.8 — which does not run the same classifiers — so the user still gets a response.

How should API teams handle a refusal response?

Treat it as a normal API outcome, not an exception. Check stop_reason == "refusal". Use the server-side fallbacks parameter or the SDK middleware, or implement a manual retry with the fallback credit token. Reset the refused turn before continuing the conversation, or you will keep getting refusals.

Does fallback mean the request is unsafe?

No. The classifiers are tuned conservatively, so harmless requests in adjacent categories will sometimes trip them. Anthropic states fewer than 5% of sessions hit a fallback. Treat a refusal as a routing signal, not a verdict on the user.

When should Opus 4.8 be the default model?

When you do not need Fable 5’s reasoning ceiling and you want to avoid the routing logic entirely. Opus 4.8 is roughly half the per-token cost and does not run the same classifiers. For routine work, Opus 4.8 is often the more sensible default. For long-horizon agentic runs, Fable 5 with a fallback configured is the move.

Conclusion

A Claude Fable 5 fallback is a routing event, not an error. The classifiers fire conservatively, the API returns a clean 200, and Anthropic gives you a server-side parameter and an SDK middleware that handle the retry, the cache billing, and the conversation history without you writing much code.

The implementation work is small. The framing shift is the harder part. Once you stop treating refusal as an exception, the rest follows.

I am still watching how often the cyber classifier fires on legitimate questions. One more week of data should tell me whether I need to tune anything on my side. Continuing next week.
