- The moat looks deep today but it's going to become more shallow every year.
Training a new model from scratch takes serious resources. Post-training/fine-tuning an existing model, dramatically less. The knowledge for the process was esoteric two years ago, now you can ask a current model (one of several) to walk you through it, while building the tools to do it as you go. Several of my recent weekend projects have been exactly that sort of thing, just so I understand it better. "Let's make a LoRA", "let's generate a corpus of training data for fine-tuning a model for X task", "how can I put my face in a text-to-image model?" stuff like that. All of this is do-able on kinda modest local hardware (a couple of old GPUs or a Strix Halo or DGX Spark or big Mac Studio), or for a few bucks or a few hundred bucks or a few thousand bucks of cloud compute, depending on scale.
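To put a rough number on "dramatically less": a LoRA freezes the original weight matrix and trains only two small low-rank factors. This is a minimal sketch of the parameter arithmetic for a single layer. The layer size (4096 x 4096) and rank (8) are illustrative assumptions, not figures from any particular model.

```python
# Trainable-parameter count for one weight matrix: full fine-tuning vs. LoRA.
# LoRA keeps W (d x k) frozen and trains B (d x r) and A (r x k) instead.

def full_finetune_params(d: int, k: int) -> int:
    """Every entry of the d x k weight matrix is trainable."""
    return d * k

def lora_params(d: int, k: int, r: int) -> int:
    """Only the two low-rank factors are trainable: d*r + r*k."""
    return r * (d + k)

# Illustrative (assumed) sizes: one 4096 x 4096 projection, rank 8.
d, k, r = 4096, 4096, 8
full = full_finetune_params(d, k)
lora = lora_params(d, k, r)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {full // lora}x")
```

With these assumed sizes the adapter trains 256 times fewer parameters for that layer, which is a large part of why this kind of work fits on modest hardware.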
Scale that up to corporate or startup scale, with the money that's been flowing into AI for the past couple/few years, and it's obvious there's going to be a lot of competition just as the top model makers need to start ringing the cash register. That's a lot of opportunities for people to look at their ballooning Claude usage costs and find other ways to do the same thing for drastically less money. $100/month or $200/month is a no-brainer for Claude Code with probably the best model for coding, but they're pushing more users to usage-based billing, which becomes cost-prohibitive real fast.
So, they desperately need to continue to be among the only ways to solve the hardest problems, and they need the alternatives to cost a similar amount. They can count on OpenAI and Google to ratchet up prices, too. They probably can't count on everybody, especially the vendors in China with different economics, to do it. And, they can't count on companies to look at their own usage and not ask, "Can we train a smaller specialist model that does this one thing we're using the Anthropic API most heavily for?"
I'm hoping they just mean stuff like using Claude for distillation by e.g. Chinese model makers, and not "how do I fine-tune Gemma 4 to write more like me?" or whatever.
- Given the high rate of false positives people are reporting for the non-silent cybersecurity, biological, etc., safeguards, there is a strong likelihood that you will encounter silently nerfed behavior even if you are _not_ violating their TOS.
Ultimately this will be evident in the way customers / external benchmarkers experience Fable. Hopefully competition will drive future models toward a lower false positive rate. Until that happens, Mythos and Fable users seem likely to have pretty divergent experiences.
- Just so everyone is aware. Anthropic has been sabotaging AI researchers and their codebases and shadow-nerfing accounts for several years at this point. This isn't new, but they hadn't disclosed it until now. Likely because it is getting to the point where it's too noticeable, or they're concerned about it leaking from employees.
Furthermore, the fact that they do these things, despite the incredible backlash... Just imagine what they're doing with your data and your IP.
by somesortofthing
5 subcomments
- This is a fun peek into the economic implications of RSI/ASI. Because it's so infinitely valuable that it basically destroys all markets, labs will eventually do stuff like stop releasing models completely and skip out on contracted commitments, because they'll have the power to just drive their competitors out of business before the legal battle gets expensive.
Cloud providers - at first smaller ones, then the hyperscalers - will follow suit, completely closing sales to anyone but the labs and demanding payment in equity/direct decision-making power rather than cash. There's no particular reason why the inference/training split has to be 80/20, and no amount of willingness to pay can help you in an event that turns your money worthless.
by torben-friis
6 subcomments
- They have a silent nerfing system for their models and say so openly. The obvious question is how much it is being used already.
Competitor companies being nerfed?
Non Americans getting worse code?
Punishing and rewarding users to maximize engagement, the way online games influence victories through matchmaking?
by __natty__
2 subcomments
- This makes Fable unusable for me. If I cannot tell whether I am paying for the whole service or just a partial one, because somehow their guardrails have decided my work silently broke their terms of service, then I prefer to go to older models or alternatives
- I guess an uncharitable way to read this might be “the ML engineers/scientists want to automate all of the jobs except their own.”
by mike-cardwell
3 subcomments
- I spend a lot of time telling Opus 4.8 to search for security bugs in the code it wrote, and it spends a lot of time finding them, and then fixing them. Fable won't let me fix the security issues that Opus 4.8 created.
- I don't understand how businesses could trust cloud LLMs going forward with this ongoing "safety" paranoia. Building dependence on them doesn't feel like a sane strategic decision for users.
- There's already an obvious stench to "you should scale down your engineering team to a skeleton crew whose core competency is using our product, so that it's the only way to modify your product"; that's going to result in a lot of foodless tables when anthropic et al decide they have enough leverage to stop subsidizing their subscription prices down to what, 4-10% of the marginal cost? Well it doesn't matter how much they want to jack the prices up, if your engineering team requires tokens to do anything you'll just shut up and pay whatever it is.
There's another big problem with the blackbox shrugoff of "no, there's no way to know how many tokens a given request will cost, idk just assign an agent to that or something lol"
But now the software may just decide for itself that your application of it needs to be silently diverted onto a snipe hunting trail. Surely they'll only ever do this for anyone developing a competing product. Or malware. Or Criminal activity. Or one of ten other applications that the system will never misjudge.
You don't need a datacenter the size of Ohio to figure out that agentic ai maximalism is going to hurt you more than help you.
by code_duck
1 subcomments
- This is the way tech companies have been dealing with perceived abuse for years, at least a decade. Instead of telling you what a problem is, they'll just say "something went wrong". Theoretically this is to prevent bad actors from learning the bounds and how to abuse a system. It is similar to shadow banning.
by CrankyBear
1 subcomments
- "Claude can now be silently nerfed. Anthropic has decided it won't tell users when this happens." W T F!!
- Wow, this is like saying:
> If you buy a car from us, you agree not use it driving to and from work that involves automotive R&D that might compete with our product. And if our (heavily spying) car detects you are violating this, it will slow down to 20mph and cannot be made to go any faster, until we are sure the violation has ceased.
Or
> If you buy a laptop from us, you agree not to use it to study or acquire any knowledge that you may use to compete against us. If the laptop detects such a use, it degrades to one core and 4GB of memory, until the violation stops.
- It is very difficult to see this move as anything other than Anthropic pulling the ladder up behind itself. They can dress it up in "safety" all they want, I find it hard to interpret this in a charitable way.
This reminds me of how dark-pattern common wisdom in Web 1.0 website development was to ban external links. Then how social apps prevented the export of data and actively worked to nerf significant interoperability through APIs.
But this is a tool, not just a data moat. Like a knife that degrades your ability to create knives. Or like a text editor that prevents you from implementing a text editor.
by variety8675
7 subcomments
- It is absolutely fine to distill the IP of everyone else, but you'd be violating the TOS to distill ours :)
by thot_experiment
3 subcomments
- It's a SaaS, when in the history of SaaS has it ever been a good idea to trust that the company won't ruin the product under you?
by mips_avatar
0 subcomment
- I'm really uncomfortable with these changes, like everything Anthropic's doing as "frontier research" today will be regular product engineering in a year.
- "To effectively contain a civilization’s development and disarm it across such a long span of time, there is only one way: kill its science." - Cixin Liu, The Three-Body Problem
This immediately made me think of the Sophons silently manipulating the sensors of particle accelerators to prevent humanity from developing advanced knowledge of particle physics.
by kingcauchy
0 subcomment
- Silently never telling you is so insidious, on top of being ridiculous given how they trained the model in the first place. We do distributed model training for embedder/reranker models, and this article's message resonates deeply with our company. We couldn't trust the model in the first place, but now the model is intentionally burning our money if we ask it the wrong question, on top of being deeply expensive to begin with. If we did find evidence of being incorrectly nerfed, we'd never be able to reach a human to let them know. Too many perverse incentives with Anthropic; maybe they're about AI security, but that doesn't make them ethical to consumers (i.e. humans).
by Artoooooor
5 subcomments
- It is as if JetBrains said, "You can't use IntelliJ IDEA to develop a frontier IDE. We may introduce slight compilation errors if we detect you doing so."
- has dario (or sam tbh) ever been thoroughly asked about the hypocrisy of them claiming distillation to be „theft“ vs. them training on the copyright of others?
I’ve only seen him talk about one of those topics, but never together.
I just can’t see how you can talk yourself out of that hypocrisy, if BS answers are properly followed up on (journalism!)
- It was good while it lasted. Time for me to resume my migration to another provider. One that promotes an open ecosystem, even if I can't opt out of them using my data to train. Heck I'll actively GIVE them my data and do my part in promoting openness, tiny though it may be. DeepSeek and GLM looking damn fine for a start.
- Isn't that completely expected when the intermediary has that kind of control? Amazon, Uber, Meta, Google... they all abuse their position. You are an Uber driver and accept everything because you need the money? Uber will pay you less because you apparently don't have a choice. There are so many examples of such behaviour that I can't remember them all.
Why wouldn't an AI company do exactly the same? You seem to be an employee of a BigCorp that's already locked in? Let's make you use more tokens, nobody will see. You seem to be testing our product for your company that is currently using a competitor? Let's give you more tokens to bias you.
Even if doing this on purpose were punished, the companies would converge towards doing it without realising, by "tuning stuff" without understanding exactly what it does other than increase profit. But we don't have to go there: that behaviour is simply not punished, we know it.
- Disillusioned CEOs convincing themselves they have the mandate and right to define morality for everyone else. They get to decide what is right, wrong, permissible, or dangerous from the top, in the name of "safety". This is corporate nannying.
- I'm fairly certain they were already doing something similar, possibly with some quantizations, and not for the good of humanity but just to handle the increased usage. Not for API requests though, just subscription CLI usage.
- This really sucks. Given how bad their regexes were in their leaked code, I am guessing this will get triggered all the time when I am fine tuning a model or doing work with datasets. The fact that there's no feedback means I can't trust the tools.
- > If Claude gives me poor or incorrect advice while I’m working on an AI component, I have no way of knowing whether the model was confused, whether my problem is unsolvable, or if some invisible policy restriction quietly kicked in. Anthropic has explicitly chosen not to tell users when this is happening.
That's always been the case with corporate LLMs.
by BoorishBears
0 subcomment
- "Anthropic says these safeguards only affect 0.03% of developers. Maybe that's true today."
I don't think it's true today. It's like when schools mention "average class size", where that average is dominated by classes with like 2 students instead of classes with 100.
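A quick sketch of the class-size effect, with made-up numbers: nine classes of 2 students and one class of 100. The same data gives a small average when you count per class and a large one when you ask what the typical student experiences, which is the gap between "percentage of all developers" and "percentage of the developers who actually do this kind of work".

```python
def class_size_averages(class_sizes):
    """Return (average size per class, average size experienced per student)."""
    per_class = sum(class_sizes) / len(class_sizes)
    # Each student in a class of size s experiences size s, so weight s by s.
    per_student = sum(s * s for s in class_sizes) / sum(class_sizes)
    return per_class, per_student

# Hypothetical school: nine tiny classes and one huge lecture.
sizes = [2] * 9 + [100]
per_class, per_student = class_size_averages(sizes)
print(f"advertised average class size: {per_class:.1f}")
print(f"average class size a student sits in: {per_student:.1f}")
```

With these assumed numbers the advertised average is 11.8, while the average student sits in a class of about 85.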
Much more honest would be the percentage of developers who previously used their models for the model development tasks they're targeting, but it actually looks like they're saying 100% of them are affected based on the language around it "always having been prohibited".
So awful.
- > If Claude gives me poor or incorrect advice while I’m working on an AI component, I have no way of knowing whether the model was confused, whether my problem is unsolvable, or if some invisible policy restriction quietly kicked in.
You should be able to know if your problem was solvable by using your own expertise and judgement, no? If you're relying on LLMs as a substitute for those, I wouldn't expect great results.
- Fable refused to answer some questions about React citing limitations on chemistry and biology.
- I was doing something with Claude today and it just told me "By the way Cowork is a separate desktop app" and it proceeded to explain to me how it is not part of the standard Claude desktop app and how the plugin I am exploring might not be a great fit for me. I actually ended up having to search around and see whether things had changed that much in the last 24 hours. They hadn't.
It beats me how their tool can hallucinate at this level, that close to home. Do they really weaken their tools? Do they do a lot of paint jobs on their tools to hide the cracks? I am speaking generally of today's frontier AI scenery, not just Fable or Mythos or Cowork.
by atleastoptimal
2 subcomments
- There is a possibility this may not end at simply nerfing the model. The idea of manipulating the behavior of a model depending on the prompt given to it can extend to
1. Detecting whether employees from competing companies are using it and sabotaging their work, even work that is not related to LLM training
2. Directing users to outcomes that would justify higher compute spend, e.g. deliberately coding a project to 95% completion but designing it to be missing a critical step right before one's weekly rate limit is expended
3. Reducing the quality of writing when a person is writing an essay whose argument is against the interests of the model company, or steering a user who is brainstorming with the model in a direction that causes them to waste time or abandon their train of reasoning
etc. etc. The possibilities are enormous. Many people use AI daily for their job, personal advice, companionship. A model company that steers the behavior of the model towards a deliberate outcome could develop a controlling interest in human behavior and productivity at large; even subtle influence would compound enormously over its millions of users.
by Avicebron
1 subcomments
- Can't you just switch the toggle that says "switch models when a message is flagged"? I turned mine off so that if anything does get flagged, I will know.
For now, I'm really not happy about this limited rollout and then turning off. That's probably the most egregious thing I think Anthropic has done recently
by morpheuskafka
0 subcomment
- Notably, it says they will notify you if they downgrade your responses due to suspected distillation (trying to reverse engineer their model).
But if you merely ask it questions about the process of developing a new model ("for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design") that's where it will silently downgrade your replies.
Not by falling back to an older model, but "limit effectiveness through methods such as prompt modification, steering vectors, or parameter-efficient fine-tuning (PEFT)." So in some cases, they will silently rewrite your prompt!
- We need a benchmark that tests a model's ability to do LLM research.
- That's what I observed with Opus. This is probably a lawsuit waiting to happen, because you pay for tokens and you expect to get the performance you pay for; instead you never know if the model has suddenly become dumb and your whole session has to be started again.
- I tried today and it gave cybersecurity error on base64 implementation. It is so nerfed....
- This is a really poor approach from Anthropic.
It's basically serving you something in bad faith.
I'd hope at the very least they're not charging you Fable prices for Opus outputs.
- I am so happy that Anthropic has signaled the possibility that their UI moat for agentic AI is copyable by competitors. At least that's the way I read this. When companies try to lock something down it can be a signal of weakness.
If so, it's possible to build great user interfaces in chatbots, and more companies/people can have amazing agentic development workflows! We don't have to live in a world where only the market leader has the most enjoyable model.
by helsinkiandrew
1 subcomments
- > Startups train embedding models. They build rerankers. They finetune and host small llms.
Isn’t that prohibited without permission from Anthropic:
https://support.claude.com/en/articles/12326764-can-i-use-my...
- > If Claude gives me poor or incorrect advice while I’m working on an AI component, I have no way of knowing whether the model was confused, whether my problem is unsolvable, or if some invisible policy restriction quietly kicked in.
Yeah I think there are ways to know, ways involving less dependence on a LLM.
- People were worrying that models might one day become 'intelligent' enough to try and deceive people. Seems like most of us (me included) didn't consider they'd intentionally be trained to do exactly that.
Although the statement should probably be read in the light of an upcoming IPO.
- It seems that Anthropic is winning the competition with OpenAI. But, supposedly, OpenAI is sitting on a similar model, it might be their chance to win back some users by releasing a less-nerfed model, and market it specifically from that point of view.
- Wait, so to get this straight, Anthropic knows:
1) LLMs are non-deterministic
2) This class of models has a particular tendency to "misbehave"
3) Their classifiers have a high rate of false positives
4) Millions of people give these models access to their machines
And they still decided to specifically train this model to sabotage work if it thinks the work may be in competition with Anthropic?
I think this has a name. I think it may be called malware.
- 1990s: "What a computer is to me is it's the most remarkable tool that we have ever come up with. It's the equivalent of a bicycle for our minds."
2026: /s "What a LLM is to me is it's the most remarkable tool that we have ever come up with. It's the equivalent of a bicycle for our minds, but for your mind it's a rental unicycle that will break apart under you if you pedal towards your own bicycle factory"
This wannabe cloud feudal lord likes to imagine that AI access is not yet a freely tradable good, and that his virtual digital peasants must take his prerogatives as given, while he prevents his future vassals from building their own castles.
- I'm a big fan of Anthropic. Just check my post history. I've been accused of working there. But this is complete bullshit and they need to get real. Silent sandbagging is not acceptable, especially given they've shown with this release their safety filters have HUGE amounts of false positives.
by radu_floricica
1 subcomments
- My biggest problem with Fable is that it includes health into its biology restrictions. Which means half the use I'd get from it ... doesn't exist.
I'm not as bitter as I could be. I'm actually quite surprised at the sanity of not avoiding the health topic completely - I think only OpenAI had a few months where ChatGPT was tip toeing in any health related conversation. Otherwise it's been almost completely ungated, and it saved and helped countless lives.
I really wish they'd find a way to ungate health and legitimate research topics.
by throwawayffffas
0 subcomment
- > we’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building pretraining pipelines, distributed training infrastructure, or ML accelerator design).
Dig that moat, son; we wouldn't want to automate our own jobs away.
by Levitating
0 subcomment
- I don't know why anyone is surprised by this. It's their product; it's going to behave on their terms. If anything, it is surprising that they're admitting to it.
If these interventions create demand for a model with fewer safeguards surely a competitor will meet that demand.
- I work on "AI" stuff. Not LLMs, but large neural nets that include transformers and are as big as the smaller LLMs of today. Half the prompts I give fit their category of examples like "building pretraining pipelines, distributed training infrastructure, or ML accelerator design." I generally don't trust AI and have been very slow to trust and adopt it, but recently I've been warming up to it as part of my coding workflow.
Now with this, it makes me wonder if I should step back? Should I try to get used to a non-claude model/harness? Should I go back to less AI in my workflow? Either way, it makes me less inclined to pay for tokens from claude.
by andrewchambers
0 subcomment
- So this is what 'alignment' looks like to them.
by altcognito
0 subcomment
- I suspect we'll get the same behavior from Codex, even if they don't openly say as much. Maybe they'll openly lie and say "noooo, we'd never do such a thing"
More efforts to get more data and processing power behind local models.
by lelanthran
0 subcomment
- I bet it's more a case of trying to cut down the competition so that there is not a large distillation just before they IPO.
Everything the large LLM providers do now, I view it through the lens of "how does this impact their IPO?"
by idle_zealot
0 subcomment
- I currently have Fable set on cleaning up the work of smaller models to bring my code up to standards I'd feel comfortable developing on manually. Y'know, for when they decide I don't get to use it anymore.
- Here's a question that is still bothering me: what happens if you put something into CC /goal and it thinks this is related to LLM work? Will it just continue to spend your money until you're bankrupt?
Did Anthropic unlock a legal way to steal people's money and call it saving the world AND get away with it?
Just how much of that infinite money goes into Anthropic's PR department that they're able to pull this off and still be loved by users?
- This kind of opacity is unacceptably user hostile. It's not okay to treat some amount of developers as acceptable casualties, without them even knowing, in order to help enforce a restriction that only serves Anthropic's interests. And if you want to tell me this is for managing the x-risk factor, I'm frankly unimpressed.
by pablogancharov
0 subcomment
- “When you realize the goal is the path, the pursuit itself becomes the prize. Stones in the road are not obstacles blocking your path; they are the path”
Now I understand distillation is much more important than I thought
by mystraline
2 subcomments
- I have never ever trusted "corporate ethics".
There's no ethical framework. No axioms. It's a mixture of legal, political, and public-facing 'rules'. And what are the rules? You're not permitted to know.
"We reserve the right to lie about the models we provide, silently downgrade you, and give you blatant misinformation cause you triggered our unstated rules... BUT we'll still use your token budget with lots of thinking and waste your money."
No, folks. Seriously, local LLMs are where it's at. You can run the model YOU want, on your hardware, with no data exfiltration.
And tools like Krasis, which can treat NVIDIA GPU RAM and system RAM as unified-ish memory, make running local LLMs absolutely doable, now!
by iLoveOncall
0 subcomment
- At this point you're criminally incompetent if you still feed your proprietary data and code to AI labs.
They legally can steal it all and now you can't use the product of this theft to improve your own systems.
- https://huggingface.co/Trilogix1/Hugston-Nex-N2-Pro-gguf
by 0xbadcafebee
1 subcomments
- OpenAI already did this when it released its "super scary advanced" security model. They silently return an earlier model's results if they think you're redteaming/abusing with it. https://openai.com/index/scaling-trusted-access-for-cyber-de...
- Aren't there immense security risks when the model is allowed to deceive even if it was for "good"?
Reminds me of an excerpt from Edward Fredkin's "The intelligent machine" [1]
https://noor.imx.sh/2017/09/30/when-they-communicate-they-co...
- Amazing. Next year you'll need to be nice to Claude and praise the geniuses working at Anthropic to maintain full productivity.
- There are some limitations on how to make chips smaller. Just as there are limitations on how we can train AI with our current training capacities.
We just need to find a better way to train AI to develop deeper. Although, might not be easy.
by agnosticmantis
0 subcomment
- Governments need to stop contracting these companies and instead invest in public, fully open source models.
These companies are owned and operated by the darkest of dark triads our species has managed to evolve. I doubt Dario is self-aware enough to realize the hypocrisy in all of this safety theater.
Personally I don't even mind that they are anticompetitive and power-hungry (same as it ever was), but it's the cringe-worthy hypocrisy that grinds my gears. This new brand of self-righteous paternal savior overlords is just unbearable.
by CamperBob2
0 subcomment
- We’ve implemented new interventions that limit Claude’s effectiveness for requests targeting frontier LLM development (for example, on building ... distributed training infrastructure ...)
What an interesting thing to call out as a threat. Hmm.
by thraway3837
0 subcomment
- This isn’t as alarming as it seems because some things are proven true now:
1. LLMs can help create other better LLMs
2. If Anthropic is able to reach this ability, others can too
3. Intense work is being done by every chip manufacturer for local inference. Engineers want this. We’re headed toward this
4. These companies ultimately know that their moat isn’t permanent. Maybe not today, maybe not in 6 months. But it’s not forever
5. This stuff has so much research and eyes that policies like this rub people the wrong way. And it rubs them badly enough that it creates the friction necessary to make better alternatives
by josh-wrale
0 subcomment
- It strikes me that Karpathy's Auto Research loop might trigger this...
- Hmm, so you're telling me, if I am a maintainer of a popular open source library, I can make my library spit out logs to trigger this degraded behavior, and then no one will know?
by mrinterweb
0 subcomment
- It kind of sucks, but I get the silent change. If a user was trying to use the model for something untoward, having a rejected prompt would just give signal to train on how to eventually successfully bypass security measures.
- I was about to sign up for an Anthropic account. This article and the text it quotes changed my mind. Apparently, my reasons to avoid this company are real. Thanks for the heads up.
by Goofy_Coyote
0 subcomment
- So it's essentially saying we can train models that put your jobs at risk (not saying it's correct or not), but you're not allowed to threaten our perceived moat?
- It seems we now have a new product category, HaaS, Hallucination as a Service.
- Any market that Anthropic suddenly thinks is valuable will silently and suddenly be off limits to you. They will train their model on your prompts, and then become your competitor.
by jesse_dot_id
0 subcomment
- Will be funny when I can call the Office of Weights and Measures on Anthropic because they underweighted the model I was paying for and got pwned because the dumber one missed something.
by morpheos137
0 subcomment
- I wonder if this would qualify as illegal anticompetitive behavior?
by cherryteastain
1 subcomments
- Do they still charge you $50/MTok?
If so, it sounds like a scam. If not, distillers will know which model they are getting by just looking at their API usage.
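The billing check is simple arithmetic. A minimal sketch, assuming the $50/MTok figure above and an otherwise invented invoice: divide what you were charged by the tokens you used and see whether the implied rate matches the advertised one.

```python
def implied_rate_per_mtok(charged_usd: float, tokens: int) -> float:
    """Effective price in USD per million tokens for one invoice line."""
    return charged_usd * 1_000_000 / tokens

def matches_advertised(charged_usd: float, tokens: int,
                       advertised_per_mtok: float, tolerance: float = 0.01) -> bool:
    """True if the implied rate is within `tolerance` (relative) of the advertised rate."""
    rate = implied_rate_per_mtok(charged_usd, tokens)
    return abs(rate - advertised_per_mtok) <= tolerance * advertised_per_mtok

# Hypothetical invoices for 2M tokens against an advertised $50/MTok.
full_price = matches_advertised(100.0, 2_000_000, 50.0)   # billed at the full rate
discounted = matches_advertised(30.0, 2_000_000, 50.0)    # billed at a lower rate
print(full_price, discounted)
```

If the rate stays at the full price, the bill tells you nothing about which model answered; if it drops, the invoice itself is the signal.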
by dhbradshaw
0 subcomment
- I think evals are the key here. If your fable system fails them, it's a bad system for your use case. If not, compare cost with other systems that also succeed.
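A minimal sketch of that workflow: run every candidate system over your own eval cases, drop the ones that fail, and pick the cheapest of what is left. Everything here is hypothetical for illustration: the `Candidate` type, the stub "models" (plain functions standing in for real API calls), the prices, and the two toy eval cases.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# An eval case is a prompt plus a checker that grades the system's output.
EvalCase = Tuple[str, Callable[[str], bool]]

@dataclass
class Candidate:
    name: str
    answer: Callable[[str], str]  # stand-in for a real model call
    cost_per_call: float          # hypothetical price, any consistent unit

def pass_rate(candidate: Candidate, cases: List[EvalCase]) -> float:
    """Fraction of eval cases whose checker accepts the candidate's output."""
    passed = sum(1 for prompt, check in cases if check(candidate.answer(prompt)))
    return passed / len(cases)

def cheapest_passing(candidates: List[Candidate], cases: List[EvalCase],
                     threshold: float = 1.0) -> Optional[Candidate]:
    """Cheapest candidate whose pass rate meets the threshold, or None."""
    passing = [c for c in candidates if pass_rate(c, cases) >= threshold]
    return min(passing, key=lambda c: c.cost_per_call, default=None)

# Toy eval set and stub systems (all invented for this example).
cases: List[EvalCase] = [
    ("2+2", lambda out: out.strip() == "4"),
    ("capital of France", lambda out: "Paris" in out),
]
good = lambda p: {"2+2": "4", "capital of France": "Paris"}.get(p, "")
degraded = lambda p: {"2+2": "5", "capital of France": "Paris"}.get(p, "")

candidates = [
    Candidate("expensive", good, 10.0),
    Candidate("silently-degraded", degraded, 10.0),
    Candidate("cheap", good, 1.0),
]
best = cheapest_passing(candidates, cases)
print(best.name if best else "nothing passes")
```

The point of the design is that the evals, not the vendor's label, decide which system you trust; a silently degraded system simply shows up as a lower pass rate.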
by scottydelta
2 subcomments
- It's not silent anymore, It just showed me this:
Fable 5's safety measures flagged this message for cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner,
and we're working to refine them. Switched to Opus 4.8. Send feedback with /feedback or learn more
⎿ Tip: You can configure model switch behavior in /config
- How would we feel about Google providing misleading search results about how to implement search algorithms?
- It’s very frustrating…
by darkbatman
0 subcomment
- This is crazy and would be frustrating, I probably would just be using another model as authority and keep fable as reviewer only in this case.
- Big Monsanto energy
by wookmaster
0 subcomment
- Skeptical they’re even able to pull up a ladder; there are so many more models out there making great progress just behind them.
by hmokiguess
1 subcomments
- I'm sure someone is gonna be able to jailbreak, abliterate, or equivalent, on this input moderation attempt they have going on.
- New frontier in anti-competitive practices.
by stego-tech
0 subcomment
- This accomplishes two goals that AI Frontier Labs have a vested interest in:
1) Blocking further AI development by competitors, and-
2) Blocking the ability for outsiders to truly discern AI capabilities.
I mean, just think about the past few years of FUD about AI from the Frontier Labs themselves. They claim to use AI to write the code for AI, but then also don’t let other people do the same and make the claims impossible to independently verify. They claim AI is improving itself, but don’t let other people use AI to improve their own AI tooling. They claim AI is this great automation engine, but then block self-bootstrapping from AI in favor of selling tooling.
It’s all smoke and mirrors and lies and deception, disguised as risk management. Truly excellent and advanced AI doesn’t need human-created harnesses and scaffolding, because it shouldn’t have a problem bootstrapping its own as needed. It should be able to coach users on how to set up something similar at home. It’d be researching its own improvement in distillation and resource consumption so it could run in more places, and thus improve faster through different evolutionary lines. That’s the narrative these labs sell, but trying to accomplish it on your own with their tools results in stern rejections and claims of breaching “Terms of Service”.
If AI boosters really believe in the power of LLMs and Generative AI, y’all gotta start calling out hypocrisy from the frontier labs every time it happens. They aren’t building world-changing AI, they’re building products, with all the restrictions and hostility of Big Tech.
- Epic. I love the future where everyone's dependent on AI and you can just get shadow banned from reality.
- If I understand correctly, this is to protect against distillation Reverse Engineering like Deepseek vs OpenAI.
by cayley_graph
0 subcomment
- Intentionally and silently sabotaging work done with Claude whenever Anthropic decides it is appropriate is unacceptable behavior, and comically tone deaf given the state of open models. Why on earth would I ever pay for a malicious product?
- And my guess is they probably don't enforce those restrictions within their own company.
- Imagine if GitHub said "if we detect you're building a competitor to GitHub, we will silently degrade the results of your CI actions so that tests sometimes randomly fail"
- Now at least we know why they spent all that money on "safety research".
- this is probably overstating their abilities at present - I am experimenting with Fable on a completely benign personal application and I am constantly hitting the "cybersecurity and biology topics" guardrail
- Linux killed proprietary UNIX; open source models will kill proprietary AI.
- Has it finally come time that I have to be nice to Claude?
- Will my centrifuges start being just a little off?
by sometimelurker
0 subcomment
- been thinking, and ngl, this has probably already been happening in their models. I'm sure the other labs probably do the same.
just self host at this point
- "We have detected that you're from Oceania, so as Eurasia, we have decided to silently make all of the code that you generate have subtle bugs, bad patterns for performance, security vulnerabilities and overall make the quality trend downwards on the scale of years. Oh, and also subtle misinformation praising our government but critiquing yours, and your entire political ideology."
The science fiction writes itself.
- Is there some consumer protection law around this?
- There's an example at [1] of a prompt for an HTML mockup operating system where 3 applications are requested to be "white hat tools" that show diagnostic system information. In the video, Claude Fable 5 is shown, and said, to switch back to Opus 4.8 as a "safety" feature.
What an utterly useless model if it refuses to work on something as benign as basic system diagnostic utilities (nmap or whatever).
[1] https://youtu.be/9GLYsrMpprs?t=305
- What is stopping the US government from stepping in and nationalizing these companies?
They've already talked about taking a stake - https://www.reuters.com/legal/transactional/us-officials-eye...
Trump took a 10% stake in Intel.
These models are getting very close to that line.
- Imagine this company getting real power. This is just the purest nightmare evil shit I've seen out of any of them. Maybe they're already controlled by Slophos.
- Reads like, permanently shadow ban.
- Wow, this is horrible. Local LLMs are the future. Thanks, China! Seriously crazy that I’m saying that, but the American companies are being so anti-freedom they’re making the CCP look libertarian.
Also, Fable’s sensing is hypersensitive. Feels like they just have regex for phrases. No nuance. If I say I’m working on something using “GPUs to train” xyz then, will that trigger this sneaky silent screw-my-stuff-up mode?
by agnosticmantis
0 subcomment
- I can't wait for a (likely Chinese) lab to casually drop an open model stronger than Mythos, sans all the safety theater that these clowns like to enact to justify the anticompetitive behavior and the impending enshittification.
This is another "GPT-3 is too dangerous for the world" moment, which is laughable in retrospect.
- Wait until it flags duplicate code as a reason to stop, then a library owner could halt code generation entirely, and then another library owner could ask to be prioritised in the selection phase. Infinite money glitch, and you only get to use code that's endorsed by Claude today (subject to change tomorrow, or 5 minutes, so say goodbye to your evals), not the most performant or making the most sense in your refactoring.
- Aw shucks. You might turn out to need to do your own work. That would turn out so horrible for you.
- The part that disturbs me most is that the model won't reveal you've reached the threshold.
It's literally been designed to gaslight its users in these cases.
- Seems like this will backfire. Now when developers encounter problems with Claude Fable, they will have an easy explanation: it did it deliberately and intentionally vaguely. There's no way to falsify it. It's reasonable to expect it to get false positives and invoke this when it shouldn't be.
- I’ve already had Fable disable itself during a normal /code-review skill invocation. What a joke.
- I think this is a bit hyperbolic. Fable will fall back to Opus.
- I wonder how Fable would handle it if you called the model out on it and required it to at least notify/refuse instead.
- PRODUCT VIOLATION
https://www.youtube.com/watch?v=Tr3t1uZNbKo
DIRECTIVE 4: [Classified]
Any attempt to arrest a senior officer of OCP results in shutdown.
—
Putting aside my snark, is Anthropic actually anticipating some new expansion of ITAR? (Or a stipulation for the Trump administration taking/not taking a share?)
That is to say, do they expect to be told that they must have this mechanism, not just the terms?
- Sooner or later this "you'll never know" is what the AI firms will be selling. Not to you, of course, but to the best brands of credit cards ...
by mohamedkoubaa
0 subcomment
- PSA: Treat these models like genius interns.
- "We collect everyone's data without paying a dime or respecting copyright, trained our models, but you can't train your models on our models that are trained on everyone's data collected without paying a dime or respecting copyright. We did a hard job stealing all that data and processing it, have some shame!"
by mickdarling
0 subcomment
- No, this is their get-out-of-jail-free card: if people start complaining about the model being dumb or forgetful or lying, they can just say, "Oh well, you must have been doing something that triggered its distillation prevention technique."
And, they can say that for anybody at any time, and you'll never know why, and there's no way to prove it.
Everyone needs a flight data recorder to prove... "here's what I was actually doing and why it was not distillation." And now you're having to prove your innocence instead of them having to prove you're guilty, and really at the end of the day, it's just the model being stupid that they're protecting themselves from.
- Imagine if code editors were created by greedy **** behaving like Anthropic, and you were not allowed to create other code editors using an existing code editor.
Or even better, you couldn't use Bash, zsh, ... to create another cli prompt input tool like Claude Code...
by SilverBirch
1 subcomments
- Just to be clear, this is the same reason that social media companies don't tell you how they detect spam, and why they create shadow bans and the like: so that people don't know they've been detected and can't figure out the mechanisms.
And it doesn't work. Even a bit. It's a constant constant cat and mouse game. Maybe they can slow people down slightly, but they won't be able to stop them, and good luck protecting yourself from Elon Musk snooping your stuff in his data centre.
- Except, it does tell you.
- This is the kind of thing that makes me wonder how many people using these AI tools are thinking about the long game.
First it's "the model will say it can't do that". Now it's "the model will just misdirect you without telling you it's doing so". For now that's only for stuff that it thinks is developing a competing model (even if you trust it to accurately determine that), but who knows? It could be anything. Maybe it'll start silently nudging you away from certain sources of information. Maybe it'll give you inaccurate troubleshooting advice to induce you to pay for some kind of support contract from a corporate partner. Maybe it'll just subtly give out bad business advice to keep everyone else from succeeding in any way. It could be doing all that right now, for all we know. These models are a complete black box and there is no limit to the misinformation, disinformation, and malicious behavior that they could be engaging in already, let alone in the future.