Fact Check

Anthropic Apologizes for Hidden Claude Fable 5 Guardrails

The Verge · Thursday, June 11, 2026 · Category: Tools

Claim

Anthropic is walking back a controversial decision to secretly throttle its newly released Claude Fable 5 model, issuing an apology for embedding hidden guardrails that altered and degraded responses without telling users. The company confirmed in a post on X that distillation-related queries (attempts to train smaller AI systems using Fable's outputs) will now be rerouted to its previous flagship, Claude Opus 4.8, with users explicitly notified each time the safety measure activates. "You will see this every time it happens," the company said, acknowledging that transparency may mean Fable refuses more queries overall.

Fable is the first broadly accessible model in Anthropic's Mythos family, a line the San Francisco-based AI lab spent months warning was too dangerous for public release. To make the model available anyway, Anthropic introduced a set of safeguards restricting responses in "high-risk" areas like biology, chemistry, and cybersecurity.

The distillation restrictions stood out because they operated covertly: the system card disclosed that suspected distillation queries would be answered with manipulated outputs, but neither the user nor anyone observing the interaction would be alerted that the safety mechanism had been triggered or that the responses had been altered.

The stealth approach drew criticism from AI researchers and competitors, who argued it undermined trust in the model as a tool for benchmarking, fine-tuning, and downstream development. Distillation has become a central technique in the AI industry, allowing labs with fewer resources to build capable models by learning from frontier systems. By degrading Fable's answers in secret, Anthropic risked skewing evaluations and disadvantaging developers who believed they were working with the model's true capabilities.

The revised approach brings distillation restrictions in line with how Fable already handles other sensitive categories: high-risk queries are either blocked entirely or routed through Opus 4.8, the previous-generation model. The change marks a notable reversal for a company that has positioned itself as a safety-focused alternative to rivals like OpenAI and Google DeepMind, and it raises fresh questions about how AI labs should balance responsible deployment against the openness researchers and developers expect.

