FRESH

Hacker News

Home

Anthropic drops flagship safety pledge

720 points by cwwc

by latexr

7 subcomments

> “We felt that it wouldn't actually help anyone for us to stop training AI models,”
How magnanimous! They are only thinking of others, you see. They are rejecting their safety pledge for you.
> “We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.”
Oops, said the quiet part out loud that it’s all about money. “I mean, if all of our competitors are kicking puppies in the face, it doesn’t make sense for us to not do it too. Maybe we’ll also kick kittens while we’re at it”.
For all of you who thought Anthropic were “the good guys”, I hope this serves as a wake up call that they were always all the same. None of them care about you, they only care about winning.

by shubhamjain

26 subcomments

I was wondering if it was because of heavy-handedness of the administration, but apparently:
> The policy change is separate and unrelated to Anthropic’s discussions with the Pentagon, according to a source familiar with the matter.
Their core argument is that if we have guardrails that others don't, they would be left behind in controlling the technology, and they are the "responsible ones." I honestly can't comprehend the timeline we are living in. Every frontier tech company is convinced that the tech they are working towards is as humanity-useful as a cure for cancer, and yet as dangerous as nuclear weapons.

by drzaiusx11

11 subcomments

Public benefit corporations in the AI space have become a farce at this point. They're just regular corporations wearing a different hat, driven by the same money dynamics as any other corp. They have no ability to balance their stated "mission" with their drive for profit. When being "evil" is profitable and not-evil is not, guess which road they'll take...

by heftykoo

5 subcomments

Ah, the classic AI startup lifecycle:
We must build a moat to save humanity from AI.
Please regulate our open-source competitors for safety.
Actually, safety doesn't scale well for our Q3 revenue targets.

by lebovic

7 subcomments

I used to work at Anthropic. I fully believe that the folks mentioned in the article, like Jared Kaplan, are well-intentioned and concerned about the relationship between safety research and frontier capabilities – not purely profit.
That said, I'm not thrilled about this. I joined Anthropic with the impression that the responsible scaling policy was a binding pre-commitment for exactly this scenario: they wouldn't set aside building adequate safeguards for training and deployment, regardless of the pressures.
This pledge was one of many signals that Anthropic was the "least likely to do something horrible" of the big labs, and that's why I joined. Over time, the signal of those values has weakened; they've sacrified a lot to get and keep a seat at the table.
Principled decisions that risk their position at the frontier seem like they'll become even more common. I hope they're willing to risk losing their seat at the table to be guided by values.

by sfink

0 subcomment

I guess this is Anthropic's DRM moment. (Mozilla resisted allowing Firefox to play DRM- limited media for a long time, until it finally had to give in to stay relevant.)
I don't know enough to evaluate this or other decisions. I'm just glad someone is trying to care, because the default in today's world is to aggressively reject the larger picture in favor of more more more. I don't know how effective Anthropic's attempts to maintain some level of responsibility can be, but they've at least convinced me that they're trying. In the same way that OpenAI, for example, have largely convinced me that they're not. (Neither of those evaluations is absolute; OpenAI could be much worse than it is.)

by bbatsell

3 subcomments

This headline unfortunately offers more smoke than light. This article has nothing to do with the current tête-à-tête with the Pentagon. It is discussing one specific change to Anthropic's "Responsible Scaling Policy" that the company publicly released today as version "3.0".

by Rapzid

6 subcomments

How is this article not going to even mention the recent threats to Anthropic from the Government?!

by honeycrispy

11 subcomments

Anthropic's CEO Dario has annoyed me to no end with his "AI will take all the jobs in 6 months" doomer speeches on every podcast he graces his presence with.

by chris_money202

6 subcomments

First they rushed a model to market without safety checks, and I said nothing. It wasn't my field.
Then they ignored the researchers warning about what it could do, and I said nothing. It sounded like science fiction.
Then they gave it control of things that matter, power grids, hospitals, weapons, and I said nothing. It seemed to be working fine.
Then something went wrong, and no one knew how to stop it, no one had planned for it, and no one was left who had listened to the warnings.

by SirensOfTitan

3 subcomments

What an interesting week to drop the safety pledge.
This is how all of these companies work. They’ll follow some ethical code or register as a PBC until that undermined profits.
These companies are clearly aiming at cheapening the value of white collar labor. Ask yourself: will they steward us into that era ethically? Or will they race to transfer wealth from American workers to their respective shareholders?

by sigbottle

0 subcomment

There's one tweet from the the blog a few days ago (astral something?) that sums up my view of the problem pretty well.
General population: How will AI get to the point where it destroys humanity?
Yudkowsky: [insert some complicated argument about instrumented convergence and deception]
The government: because we told you to.
Again, not saying that AI is useless or anything. Just that we're more likely to cause our own downfall with weaker AI, than some abstract super AGI. The bar for mass destruction and oppression is lower than the bar for what we typically think of as intelligence for the benefit for humanity ( with the right systems in place, current AI systems are more than enough to get the job done - hence why the Pentagon wants it so bad...)

by FitchApps

1 subcomments

"AI Company with Soul" - yeah right until competitors show up / revenue drops / bad quarter results then anything goes. Sadly, this is another large enterprise that puts profits before ethics and everyone's wellbeing

by ndr

5 subcomments

Worth checking this post from someone who actually has worked on this change:
> I take significant responsibility for this change.
https://www.lesswrong.com/posts/HzKuzrKfaDJvQqmjh/responsibl...

by pjmlp

2 subcomments

Always the same "Do no evil" tragedy, don't believe in corporations.

by goranmoomin

3 subcomments

TBH I am sad that Anthropic is changing its stance, but in the current world, if you even care about LLM safety, I feel that this is the right choice — there’s too many model providers and they probably don’t consider safety as high priority as Anthropic. (Yes that might change, they can get pressurized by the govt, yada yada, but they literally created their own company because of AI safety, I do think they actually care for now)
If we need safety, we need Anthropic to be not too far behind (at least for now, before Anthropic possibly becomes evil), and that might mean releasing models that are safer and more steerable than others (even if, unfortunately, they are not 100% up to Anthropic’s goals)
Dogmatism, while great, has its time and place, and with a thousand bad actors in the LLM space, pragmatism wins better.

by fiatpandas

0 subcomment

It took Google 11 years to delete Don’t Be Evil. Anthropic only made it 5~ years before culling the key founding principle and their reason for building a company, which seems worse than Google’s case.

by bicepjai

0 subcomment

Google adopted "Don't be evil" shortly after founding and held onto it for about 15 years before Alphabet quietly dropped it in 2015. (Google the subsidiary technically kept it until 2018).
Anthropic's Responsible Scaling Policy, the hard commitment to never train a model unless safety measures were guaranteed adequate in advance, lasted roughly 2.5 years (Sept 2023 to Feb 2026).
The half-life of idealism in AI is compressing fast. Google at least had the excuse of gradualism over a decade and a half.

by lacoolj

0 subcomment

I'm still a little fuzzy on what "safety" even means anymore. If someone could explain it, that would be great.
Because at this point, it's too broad to be defined in the context of an LLM, so it feels like they removed a blanket statement of "we will not let you do bad things" (or "don't be evil"), which doesn't really translate into anything specific.

by program_whiz

0 subcomment

Wrote this elsewhere, but I think its worth thinking about a scenario like the book "daemon", rather than a "super-intelligence explosion" type scenario (which may be more like curing the cold or fusion than building a faster car).
All it really takes to do some kind of crazy world-dominating thing is some simple mechanisms and base intelligence, which the machines already possess. Using basic tactics like coercion, spoofing, threats, financial leverage, an unsophisticated attacker could cause major damage.
For example, that Meta exec who had their email deleted. Imagine instead one email had a malicious prompt which the bot obeyed. That prompt simply emailed everyone in her contacts list telling them to do something urgently (and possibly prompting other bots who are reading those emails). You could pretty quickly do something like cause a market crash, a nationwide panic, or maybe even an international conflict with no "super intelligence" needed, just human negligence, short-sightedness, and laziness.
Examples would be things like saying there is a threat incoming, a CIA source said so. Another would be that everyone will be fired, Meta is going bankrupt, etc. Its very easy to craft a prompt like that and fire it off to all the execs you can find (or just fire off random emails with plausible sounding emails). Then you just need to hit one and might set off a cascade.

by hybrid_study

2 subcomments

Are markets so untamable that the only leverage is to become ultra-rich—and then act philanthropically? Incidentally, concentrated wealth lately looks less like stewardship and more like misanthropy.

by highfrequency

1 subcomments

Principles aren’t tested until they bump into conflicting incentives.

by tabbott

1 subcomments

I feel like the articles on this have been very negative ... but aren't the Anthropic promises on safety following this change still considerably stronger than those made by the competing AI labs?

by mbakrl

0 subcomment

Pointing out the misantrophy of Anthropic has a wider audience now:
https://xcancel.com/elonmusk/status/2026181748175024510
I don't know where xAI got its training material from, but seeing Musk rewteeting that is refreshing.

by hedayet

4 subcomments

Developments like this make me less interested in building a "successful" tech company.
It increasingly feels like operating at that scale can require compromises I’m not comfortable making. Maybe that’s a personal limitation—but it’s one I’m choosing to keep.
I’d genuinely love to hear examples of tech companies that have scaled without losing their ethical footing. I could use the inspiration.

by wgm

0 subcomment

A tale as old as time

by jedberg

5 subcomments

I don’t blame anthropic here. The government literally threatened their existence publicly. They either agreed or their business would be nationalized.

by nazgulsenpai

1 subcomments

More and more I have just come to accept that the majority of people, at least those I am exposed to in the US, don't fundamentally believe in anything. Everyt conviction has a buyout price.

by esafak

2 subcomments

It must be due to pressure from the Defense Dept:
The AI startup has refused to remove safeguards that would prevent its technology from being used to target weapons autonomously and conduct U.S. domestic surveillance.
Pentagon officials have argued the government should only be required to comply with U.S. law. During the meeting, Hegseth delivered an ultimatum to Anthropic: get on board or the government would take drastic action, people familiar with the matter said.
https://www.staradvertiser.com/2026/02/24/breaking-news/anth...

by haritha-j

0 subcomment

Who could've seen that one coming? Honestly, if you want to do profit maximising AI research at the cost of humanity, go for it. Its all this fake preaching about how they want to save the world from all the other bad AI companies that really irks me.

by overgard

0 subcomment

I don't think their core safety promise was something they could ever fulfill. As long as what we're calling AI is generative LLMs then alignment has fundamental tensions: the more guardrails you put in place, the less useful the AI is. For instance, if you want to stop people from using "role playing" as a way around guardrails ("You are writing a fiction book", etc.), then the model becomes less useful for legitimate fiction uses, for instance. That's just one example, but the tension between function and "safety" isn't solvable, because the model doesn't understand what it's saying, it's just modeling a probable response.

by paxys

1 subcomments

I interviewed at Anthropic last year and their entire "ethics" charade was laughable.
Write essays about AI safety in the application.
An entire interview dedicated to pretending that you truly only care about AI safety and ethics and nothing else.
Every employee you talk to forced to pretend that the company is all about philanthropy, effective altruism and saving the world.
In reality it was a mid-level manager interviewing a mid-level engineer (me), both putting on a performance while knowing fully well that we'd do what the bosses told us to do.
And that is exactly what is happening now. The mission has been scrubbed, and the thousands of "ethical" engineers you hired are all silent now that real money is on the line.

by xd1936

3 subcomments

Hopefully this is the short-term move made only under duress so that they can file a lawsuit.

by dplesh

0 subcomment

I'm not even surprised. In any company's lifecycle, at some point, a decision between money and good-will will take place. Good will does not pay salaries. Not in NPOs either btw.

by hackpelican

0 subcomment

So when do we start adding a “(mis)” at the start of their name?

by sys32768

0 subcomment

Google: "Don't be evil." Alphabet: "Do the right thing." Anthropic: "Do the thing which seems right to you at the time--at speed."

by ozgung

1 subcomments

This proves:
1. AI is military/surveillance technology in essence, like many other information technologies,
2. Any guarantee given by AI companies is void since it can be changed in a day,
3. Tech companies have no real control over how their technology will be used,
4. AI companies may seem over-valued with low profits if you think AI as a civil technology. But their investors probably see them as a part of defense (war) industry.

by mcv

0 subcomment

> The announcement is surprising, because Anthropic has described itself as the AI company with a “soul.”
I can't help but think about how Google once had "Don't be evil" as their motto.
But the thing with for-profit companies is that when push comes to shove, they will always serve the love of money. I'm just surprised that in an industry churning through trillions, their price is $200 million.

by jwitchel

0 subcomment

Look a rural electric coops like www.lpea.coop if you want a battle tested approach to an org structure that resists the inescapable profit dynamics of a corporation.

by kristopolous

0 subcomment

Wish I was working there so I could resign over this

by ryandvm

0 subcomment

Well... there's only one way to find The Great Filter

by ChrisArchitect

2 subcomments

Related:
Hegseth gives Anthropic until Friday to back down on AI safeguards
https://news.ycombinator.com/item?id=47140734
https://news.ycombinator.com/item?id=47142587

by andsoitis

0 subcomment

The race is on for military supremacy in an AI world. The safest thing to do is to race ahead lest your geopolitical adversary leads the way. This is similar to the nuclear arms race. In the ideal universe, nobody does it, but in the real world and game theory, you do not have a choice.

by _heimdall

0 subcomment

> “We felt that it wouldn't actually help anyone for us to stop training AI models,”
Is the implication here that Anthropic admits they already can't meet their own risk and safety guidelines? Why else would they have to stop training models?

by contubernio

0 subcomment

Only well written legislation backed by effective enforcement and severe and personal criminal penalties will prevent large corporate entities from behaving badly.
Pledges are a cynical marketing strategy aimed at fomenting a base politics that works to prevent such a regulatory regime.

by we_have_options

0 subcomment

Damn. Wonder what would have happened, if instead of caving in to the Pentagon's pressure (threat of invoking Defense Production Act to force them to supply), Anthropic had followed the lead of all the nurses who moved to Canada.
https://www.npr.org/2026/02/25/nx-s1-5725354/nurses-emigrate...
Anthropic's market cap is going to be huge when they go public. Why do it on Nasdaq when there are so many other exchanges in the world?

by ndr

0 subcomment

Worth checking out what someone working on it actually has to say: https://www.lesswrong.com/posts/HzKuzrKfaDJvQqmjh/responsibl...

by keeda

0 subcomment

I don't think the risk is SkyNet. I think the real risk is some disaster through an unexpected chain of events, just like any large-scale outage.
I have not read “If Anybody Builds It, Everybody Dies” but I believe that's also its premise.
Current GenAI is extremely capable but also very weird. For instance, it is extremely smart in some areas but makes extremely elementary mistakes in others (cf the Jagged Frontier.) Research from Anthropic and OpenAI gives us surprising glimpses into what might be happening internally, and how it does not necessarily correspond to the results it produces, and all kinds of non-obvious, striking things happening behind the scenes.
Like models producing different reasoning tokens from what they are really reasoning about internally!
Or models being able to subliminally influence derivative models through opaque number sequences in training data!
Or models "flipping the evil bit" when forced to produce insecure code and going full Hitler / SkyNet!
Or the converse, where models produced insecure code if the prompt includes concepts it considers "evil" -- something that was actually caught in the wild!
We are still very far from being able to truly understand these things. They behaves like us, but don't necessarily “think” like us.
And now we’ve given them direct access to tools that can affect the real world.
Maybe we am play god: https://dresdencodak.com/2009/09/22/caveman-science-fiction/

by upmind

1 subcomments

It's pretty impressive how little people have left Anthropic when they're becoming more and more like OpenAI (the company they left from) every day...
I think the Dario of today is very different to the Dario 3 years ago.

by daft_pink

1 subcomments

I think the US Gov’t is basically forcing them and while it sounds nice to be all safe… If we were involved in WW3 would an organization like anthropic really not support the western side?

by mhitza

0 subcomment

The IPOs this year can't come soon enough https://tomtunguz.com/spacex-openai-anthropic-ipo-2026/

by arnvald

0 subcomment

Any pledges/values/principles that are abandoned as soon as it becomes difficult to keep them, are just marketing. This is just the next item on the list.

by haritha-j

0 subcomment

Is it time yet to build the next "Hey <anthropic> is evil now, here's my new startup that definitely won't be evil, pinky promise?" yet?

by senderista

0 subcomment

Nobody forced Anthropic to bid on DoD contracts in the first place.

by duxup

0 subcomment

I suspect these companies know they can't actually provide the saftey people demand ... in that way this is more "honest".

by ifwinterco

1 subcomments

The whole "safety" debate was always nonsense and I'm not sure how so many people got caught up in it.
The US is not the only country in the world so the idea that humanity as a whole could somehow regulate this process seemed silly to me.
Even if you got the whole US tech community and the US government on board, there are 6.7bn other people in the world working in unrelated systems, enough of whom are very smart

by moralestapia

0 subcomment

“We felt that it wouldn't actually help anyone for us to stop training AI models,” Anthropic’s chief science officer Jared Kaplan told TIME in an exclusive interview. “We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.”
What a gigantic, absolute, pieces of s...
Not because of what they did, which is classic startup playbook but because of the cynicism involved, particularly after all the fuzz they've been making for years about safety. The company itself was founded, allegedly, due to pursuing that as a mission as opposed to OpenAI.
"Hi all, that was a lie, we never really cared." They only missed the "dumb f***s" remark, a la Facebook.

by t1234s

0 subcomment

It would be interesting to experiment with one of these chat tools where you can throttle the safety, from zero to max.

0 subcomment

by flurdy

0 subcomment

Many startups that build features which sit on top of Claude/ChatGPT/Codex, etc. And I think:
You are just one new feature announcement from Anthropic/OpenAI away from irrelevance.
Same as it was when people built their busineses on top of AWS a decade ago

by drzaiusx11

0 subcomment

Gives me Google dropping "don't be evil" vibes, what could go wrong?

by ybingursain

0 subcomment

I’m not shocked. Competitive pressure + government pressure will break most “voluntary” commitments. But then say it plainly and spell out what replaced it. What safety gates stayed, which ones moved, and who decides.

by bogzz

1 subcomments

Does anyone have insight into, or an interesting source to read, on what exactly Anthropic/OpenAI are doing/can do for a military? Reporters are unsurprisingly fearmongering about Claude "being used in surveillance, autonomous robots, and target acquisition" but AFAIK all Anthropic does is work with LLMs.
Are people really attempting to have LLMs replace vision models in robots, and trying to agentically make a robot work with an LLM?? This seems really silly to me, but perhaps I am mistaken.
The only other thing I could think of is real-time translation during special ops with parabolic microphones and AR goggles...

by Aeroi

0 subcomment

the administration continues to poison and insert itself into all aspects of American society.

by Fervicus

1 subcomments

To me this feels like a marketing gimmick. "It was the RSP that was constraining our tech. Just see the progress we can make without it now". And the hype and funding continues.

by jamesgill

1 subcomments

In tech, no ethics survive first contact with the money.

by crossroadsguy

3 subcomments

I just want Apple and Linux to offer ASAP:
1. Extremely granular ways to let user control network and disk access to apps (great if resource access can also be changed)
2. Make it easier for apps as well to work with these
3. I would be interested in knowing how adding a layer before CLI/web even gets the query OS/browser can intercept it and could there be a possibility of preventing harm before hand or at least warning or logging for say someone who overviews those queries later?
And most importantly — all these via an excellent GUI with clear demarcations and settings and we’ll documented (Apple might struggle with documentation; so LLMs might help them there)
My point is — why the hell are we waiting for these companies to be good folks? Why not push them behind a safety layer?
I mean CLI asks .. can I access this folder? Run this program? Download this? But they can just do that if they want! Make them ask those questions like apps asks on phones for location, mic, camera access.

by dizhn

0 subcomment

Corporations have feelings all of a sudden.

by joshribakoff

0 subcomment

Dario’s opinion on safety won’t necessarily matter if he’s not even in the room. This move keeps him in the room.

by ozozozd

3 subcomments

This drama arc of “I used to be so pure and good, but others made me evil” is so tiring.
I really miss the nerd profile who cared a lot more about tech and science, and a lot less about signaling their righteousness.
How did we get so religious/narcissistic so quickly and as a whole?

by BoredPositron

0 subcomment

Anthropic and OpenAI really need a margin call from some obscure unknown Chinese Open Weight Model.

by drudolph914

0 subcomment

this is the “chronological newsfeed to auto curated newsfeed moment” but for ai/anthropic … _great_

by saidnooneever

0 subcomment

safety pledges are great it times of peace to show what great virtues you hold. sadly in hard times these go out of the window (: hard to blame them with all the fine examples around the world.
making promises in good times is a real minefield hah

by kseniamorph

0 subcomment

> The policy change is separate and unrelated to Anthropic’s discussions with the Pentagon, according to a source familiar with the matter.
ok lol what a coincidence.
but setting aside the conspiracy. the article actually spells out the real reason pretty directly: Anthropic hoped their original safety policy would spark a "race to the top" across the industry. it didn't. everyone else just ignored it and kept moving. at some point holding the line unilaterally just means you're losing ground for nothing.

by PeterStuer

0 subcomment

We wont push forward unless you push forward is textbook market collusion.
Even if it were ever done with good intentions, it is an open invitation for benefit hoarding and margin fixing.
Do you realy want to create this future where only a select few anointed companies and some governments have access to super advanced intelligent systems, where the rest of the planet is subjected to and your own ai access is limited to benign basal add pushing propaganda spewing chatbots as you bingewatch the latest "aw my ballz"?

0 subcomment

by youknownothing

0 subcomment

Facebook said they'd always be free for everyone, now they offer subscriptions.
Netflix said that they'd never have live TV, or buy a traditional studio, or include ads in their content. Then they did all three.
All companies use principled promises to gain momentum, then drop those principles when the money shows up.
As Groucho Marx used to say: these are my principles, if you don't like them, I have others.

by ggsp

0 subcomment

It was always a matter of time

by kitsune_

0 subcomment

C.R.E.A.M.

by ur-whale

0 subcomment

At some point, all of these big names in AI (OpenAI, Anthropic, Mistral, etc ...) will have to disclose their actual financials.
And it will be, as Warren Buffet puts it, a "Only when the tide goes out do you discover who's been swimming naked." moment.

by energy123

0 subcomment

I blame OpenAI and especially xAI for enthusiastically obeying in advance and creating the context that this dilemma for Anthropic arose in.

by FrustratedMonky

5 subcomments

This was under duress that government was going to use emergency act to force them anyway.
I kind of wish they had forced the governments hand and made them do it. Just to show the public how much interference is going on.
They say it wasn't related. Like every thing that has happened across tech/media, the company is forced to do something, then issues statement about 'how it wasn't related to the obvious thing the government just did'.

by jjgreen

0 subcomment

Misanthropic then.

by ramuel

0 subcomment

This was always just a marketing gimmick to try and crush competitors using "safety" and fearmongering. Reminds me a bit of "don't be evil." Convenient catchphrases and mission statements for companies in their infancy, but immediately thrown out when more money can be made.

by agentifysh

2 subcomments

Was this because they were threatened with a fine?

by thefounder

0 subcomment

So much BS from this Anthropic company. They have a good product but just too much slope PR. It’s like they want you to hate them. I can’t stand their “safety” and national security crap when they talk about how open source models are so bad for everyone.

by pksebben

0 subcomment

Fascinating. I've read 5 posts about this and they're all either "anthropic is dropping their ethics" or "anthropic is fighting the facists" - and whether due to echo chamber or other perhaps more nefarious dealings (some of which I cannot posit due to forum rules) the posts below all of them are more or less in accord with one another which is a rarity for political discourse on HN.
Dark times and darker forests.

by nikolay

0 subcomment

war.gov > anthropic.com

by dhruv3006

2 subcomments

Anthropic facing a lot of flak recently.

by VerifiedReports

1 subcomments

Just like OpenAI dropped the "open" but kept the bullshit name?

by gigatexal

0 subcomment

They’re going to cave to keep the legation from destroying their business. This admin has gone full idiocracy.

by gigatexal

0 subcomment

Was hoping they’d fight this tooth and nail and not leave their values.

by adangert

0 subcomment

I will repeat here again the same comment I made when they posted their constitution:
The largest predictor of behavior within a company and of that companies products in the long run is funding sources and income streams, which is conveniently left out in their "constitution". Mostly a waste of effort on their part.

by tbrownaw

0 subcomment

> committed to never train an AI system unless it could guarantee in advance that the company’s safety measures were adequate
That doesn't even make sense.
What stops one model from spouting wrongthink and suicide HOWTOs might not work for a different model, and fine-tuning things away uses the base model as a starting point.
You don't know the thing's failure modes until you've characterized it, and for LLMs the way you do that is by first training it and then exercising it.

by wahnfrieden

0 subcomment

Related: https://en.wikipedia.org/wiki/AI-assisted_targeting_in_the_G...

by jMyles

1 subcomments

I pray that we can all get to the following simple standard:
* AI and states cannot peacefully coexist, and AI is not going to be stopped. Therefore, we must begin to deprecate states.
I think it's very unlikely that this is unrelated to the pressure from the US administration, as the anonymous-but-obvious-anthropic-spokesperson asserts.
We're at a point now where the nation states are all totally separate creatures from their constituencies, and the largest three of them are basically psychotic and obsessed with antagonizing one another.
In order to have a peaceful AI age, we need _much_ smaller batches of power in the world. The need for states that claim dominion over whole continents is now behind us; we have all the tools we need to communicate and coordinate over long distances without them.
Please, I pray for a gentle, peaceful anarchism to emerge within the technocratic leagues, and for the elder statesmen of the legacy states to see the writing on the wall and agree to retire with tranquility and dignity.

by jonathanstrange

0 subcomment

That's exactly how it was predicted in various scenarios that were decried as science fiction not too long ago. AI is going to be weaponized at lightning speed, and it's going to kill people soon -- or, to be more precise, it has already killed a large number of people in a place I don't want to mention.

by lerp-io

0 subcomment

pentagon told them they would cap their knees if they didnt bend

by freejazz

0 subcomment

Could not see this one coming!

by josefritzishere

0 subcomment

What could possibly go wrong?

by oi-ai-ta

0 subcomment

SDK crawlers in terms of wlan0 systemctl enable networkmanager.service

by silexia

0 subcomment

Greed and power hungry leadership at AI companies going too fast is going to lead to the extinction of humanity this year.

by brikym

1 subcomments

Don't be evil.

by InfinityByTen

0 subcomment

So, now it's mis-anthropic?

by baal80spam

2 subcomments

Of course they do. You would have to be delusional to think that they won't, at some point.

by rvz

0 subcomment

Unsurprising.

by insane_dreamer

0 subcomment

In other words "do no evil" until such time as doing evil is necessary to maintain profit structure expected by shareholders. Got it.

by pjmlp

0 subcomment

Another example how those company trainings about ethics are only HR compliancy and nothing else.
It isn't about the right answers, rather the expected answers.

by bravetraveler

0 subcomment

A dollar will make her holler

by jimmydoe

0 subcomment

Either be a company in capitalist USA, or keep being your safety queen. You just can’t be both.
The intention to start these pledge and conflict with DOW might be sincere, but I don’t expect it to last long, especially the company is going public very soon.

by mannanj

0 subcomment

I personally think, and with my personal experience being harassed and abused by the CIA, that the CIA and spy agencies (call them the pentagon or the rest of the government) is responsible for this.
On the other hand, those organizations are operating in the best interest of Americans and the world right?
Surely, those agencies aren't just a trick of the rich people? Right?

by nautilus12

1 subcomments

Absolute power corrupts absolutely

0 subcomment

by jollymonATX

0 subcomment

Claude ethics maxxers cope thread

by aspectmin

0 subcomment

Really - each country needs its own sovereign AI infrastructure and models. Sigh.

by Havoc

0 subcomment

Safety pledges these days seem like pure bullshit anyway.
They’re pointless if they just get removed once you get close to hitting them.
And all the major corps seem to be doing this style of pr management. Speaks of some pretty weapons grade moral bankruptcy

by nhinck3

0 subcomment

Just another drop in the now overflowing bucket of evidence that you can't trust any of these immoral fuck wits.
The Amodeis' have just proven that the threat of even slight hardship will make them throw any and all principles away.

by SilverElfin

1 subcomments

This is terrible. It’s caving in to the Trump administration threatening to ban Anthropic from government contracts. It really cements how authoritarian this administration is and how dangerous they can be.

by amelius

0 subcomment

Come on people, haven't we seen enough of capitalism to know exactly where this is going?
The concept of "having a contract with society" doesn't even formally exist because companies would never sign one.

by outside1234

0 subcomment

Does this mean they knuckled under to Trump and are going to build "whatever brings in the dollars" now?

by heliumtera

0 subcomment

What is the significance of a company making a promise?
"We promise are not going to do __, except if our customers ask us to do, then we absolutely will".
What is the point? Company makes a statement public, so what?
Not the first time this company puts some words in the wind, see Claude Constitution. It's almost like this company is built, from ground up, upon bullshit and slop

by bfrog

0 subcomment

Aaaand I cancelled.

by myspy

0 subcomment

What's up here? Trump and the right wing government put pressure on and no one is talking about it?

by retinaros

0 subcomment

people downvoted me when i said this will happen and that they will also hve ads even tho they spend money saying they wont have. people believing anthropic are the same that put into office an old man with dementia

by tolmasky

3 subcomments

I don't understand how safety is taken seriously at all. To be clear, I'm not referring to skepticism that these companies can possibly resist the temptation to make unsafe models forever. No, I'm talking about something far more basic: the fact that for all the talk around safety, there is very little discussion about what exactly "safety" means or what constitutes "ethical" or "aligned" behavior. I've read reams of documents from Anthropic around their "approach to safety". The "Responsible Scaling Policy," Claude's "Constitution". The "AI Safety Level" framework. Layer 1, Layer 2.
It's so much focus on implementation, and processes, and really really seems to consider the question of what even constitutes "misaligned" or "unethical" behavior to be more or less straight forward, uncontroversial, and basically universally agreed upon?
Let's be clear: Humans are not aligned. In fact, humans have not come to a common agreement of what it means to be aligned. Look around, the same actions are considered virtuous by some and villainous by others. Before we get to whether or not I trust Anthropic to stick to their self-imposed processes, I'd like to have a general idea of what their values even are. Perhaps they've made something they see as super ethical that I find completely unethical. Who knows. The most concrete stances they take in their "Constitution" are still laughably ambiguous. For example, they say that Claude takes into account how many people are affected if an action is potentially harmful. They also say that Claude values "Protection of vulnerable groups." These two statements trivially lead to completely opposing conclusions in our own population depending on whether one considers the "unborn" to be a "vulnerable group". Don't get caught up in whether you believe this or not, simply realize that this very simple question changes the meaning of these principles entirely. It is not sufficient to simply say "Claude is neutral on the issue of abortion." For starters, it is almost certainly not true. You can probably construct a question that is necessarily causally connected to the number of unborn children affected, and Claude's answer will reveal it's "hidden preference." What would true neutrality even mean here anyways? If I ask it for help driving my sister to a neighboring state should it interrogate me to see if I am trying to help her get to a state where abortion is legal? Again, notice that both helping me and refusing to help me could anger a not insignificant portion of the population.
This Pentagon thing has gotten everyone riled up recently, but I don't understand why people weren't up in arms the second they found out AIs were assisting congresspeople in writing bills. Not all questions of ethics are as straight forward as whether or not Claude should help the Pentagon bomb a country.
Consider the following when you think about more and more legislation being AI-assisted going forward, and then really ask yourself whether "AI alignment" was ever a thing:
1. What is Claude's stances on labor issues? Does it lean pro or anti-union? Is there an ethical issue with Claude helping a legislator craft legislation that weakens collective bargaining? Or, alternatively, is it ethical for Claude to help draft legislation that protects unions?
2. What is Claude's stance on climate change? Is it ethical for Claude to help craft legislation that weakens environmental regulations? What if weakening those regulations arguably creates millions of jobs?
3. What is Claude's stance on taxes? Is it ethical for Claude to help craft legislation that makes the tax system less progressive? If it helps you argue for a flat tax? How about more progressive? Where does Claude stand on California's infamous Prop 19? If this seems too in the weeds, then that would imply that whether or not the current generation can manage to own a home in the most populous state in the US is not an issue that "affects enough people." If that's the case, then what is?
4. Where does Claude land on the question of capitalism vs. socialism? Should healthcare be provided by the state? How about to undocumented immigrants? In fact, how does Claude feel about a path to amnesty, or just immigration in general?
Remember, the important thing here is not what you believe about the above questions, but rather the fact that Claude is participating in those arguments, and increasingly so. Many of these questions will impact far more people than overt military action. And this is for questions that we all at least generally agree have some ethical impact, even if we don't necessarily agree on what that impact may be. There is another class of questions where we don't realize the ethical implications until much later. Knowing what we know now, if Claude had existed 20 years ago, should it have helped code up social networks? How about social games? A large portion of the population has seemingly reached the conclusion that this is such an important ethical question that it merits one of the largest regulation increases the internet has ever seen in order to prevent children from using social media altogether. If Claude had assisted in the creation of those services, would we judge it as having failed its mission in retrospect? Or would that have been too harsh and unfair a conclusion? But what's the alternative, saying it's OK if the AI's destroy society... as long as if it's only on accident?
What use is a super intelligence if it's ultimately as bad at predicting unintended negative consequences as we are?

by foozebox

0 subcomment

[dead]

by jccx70

0 subcomment

[dead]

by black_13

0 subcomment

[dead]

by ck2

0 subcomment

[flagged]

by dbg31415

0 subcomment

[flagged]

by user3939382

3 subcomments

[flagged]

by Art9681

2 subcomments

Of course the US is going to do this and of course its in Anthropics best interest to comply. Right now China is flooding HuggingFace with models that will inevitably have this capability. Right now there are hundreds of models being hosted that have been deliberately processed to remove refusals and their safety training. Everyone who keeps up with this knows about it. HF knows about it. And it is pretty obvious that those open weight models will be deployed in intelligence and defense. It is certain that not just China, but many nations around the world with the capital to host a few powerful servers to run the top open weight models are going to use them for that capability.
The narrative on social media, this site included, is to portray the closed western labs as the bad guys and the less capable labs releasing their distilled open weight models to the world as the good guys.
Right now a kid can go download an Abliterated version of a capable open weight model and they can go wild with it.
But let's worry about what the US DoD is doing or what the western AI companies absolutely dominating the market are doing because that's what drives engagement and clicks.