Anthropic, the wildly successful AI company that has cast itself as the most safety-conscious of the top research labs, is dropping the central pledge of its flagship safety policy, company officials tell TIME.
In 2023, Anthropic committed to never train an AI system unless it could guarantee in advance that the company’s safety measures were adequate. For years, its leaders touted that promise—the central pillar of their Responsible Scaling Policy (RSP)—as evidence that they are a responsible company that would withstand market incentives to rush to develop a potentially dangerous technology.
[OP] unkz | 7 hours ago
Shades of Google dropping “don’t be evil” (yeah, yeah, now it’s do the right thing in a different document).
Aerrol | 5 hours ago
Cool, everything is accelerated in this era, even the speed at which we lose 'don't be evil'. I hate this timeline.
moocow1452 | 7 hours ago
This wouldn't have anything to do with Anthropic now working on AI models for the Pentagon that by necessity can not be non-harmful, and therefore infringe on some element of safety for somebody, no?
PelagiusSeptim | 5 hours ago
Since they were just given an ultimatum by the Pentagon, I can't see how this wouldn't be connected.
[OP] unkz | 5 hours ago
I don’t think it’s that kind of safety they are talking about. This is more like superintelligence risk.
moocow1452 | 5 hours ago
I don't think it matters that much. A gun is an inherently dangerous object, and a military is similarly so, so if your AI is working on plans with a military, somebody's risk is already negotiable.
More to your point about superintelligence, XKCD made a comic where the author is more worried about what certain people would be empowered to do with an autonomous fleet of kill drones that will follow their orders than if the drones decided to wipe out humanity or maximize paperclips without orders. There's a much richer history of malice on one side of that equation than the other.
[OP] unkz | 5 hours ago
Yeah, I just don’t think they ever had a policy of not training bots with the raw intelligence to commit atrocities with human assistance.
TonesTones | 5 hours ago
Pete Hegseth recently threatened to cut Anthropic from current and future DOD contracts unless they drop some of their safety measures:

Defense officials warned they could designate Anthropic a supply chain risk or use the Defense Production Act to essentially give the military more authority to use its products even if it doesn’t approve of how they are used.

This is likely part of their response to that pressure.
Money talks.
Eric_the_Cerise | 4 hours ago
The point I'm surprised ... no, not surprised, I guess, just--somehow--even more disappointed over...
Snowden was less than 15 years ago.
Today, the Pentagon is threatening to blacklist Anthropic, explicitly, for not giving them full use of their AI for 1) fully autonomous, AI-powered targeting & strike capabilities, and 2) unrestricted, fully autonomous, AI-powered mass surveillance of US citizens.
This is not a whistleblower thing, it's not a reporter "scoop", nothing.
The US Pentagon is flat-out openly stating that it will destroy an AI company if it can't use the AI for mass spying on the US public.
(Oh yeah ... and killing people w/o human oversight)
[OP] unkz | 5 hours ago
Good catch, that probably is a major factor.
cdb | 2 hours ago
Based on the part you quoted, I'm not sure how the conclusion is "money talks." Isn't this more like "do what we say or we'll ignore your policies, regardless of previously negotiated contracts?"
It seems that based on the Axios article, there were two possible courses of action by the DoD: cut off contracts, or force Anthropic to contract with the DoD on their terms. Even if it's about that contract money, they couldn't avoid the second threat unless they shut down operations.
TonesTones | 2 hours ago
I did not even consider that interpretation. I know that a $200M contract with the DOD is active, and I assumed that my quoted portion was in the context of those contracts.
I assumed that Anthropic pulling out of the contracts was an option and one they would not take. It seems bizarre that the DOD could force Anthropic to contract; that type of adversarial relationship with a contractor would be a national security threat in my eyes. I think both sides need to operate in good faith for mutual work to be beneficial.
It’s certainly possible that there was an implied threat of seizing or imposing strong national controls on Anthropic’s business if they did not meet the terms. Frankly, that would be insane, but this administration has given me plenty of reason to believe something like that.
cdb | an hour ago
From the Axios article linked elsewhere in this thread:

How it works: The Defense Production Act gives the president the authority to compel private companies to accept and prioritize particular contracts as required for national defense.

The law is rarely used in such a blatantly adversarial way. The idea, the senior Defense official said, would be to force Anthropic to adapt its model to the Pentagon's needs, without any safeguards.

The Pentagon is also considering severing its contract with Anthropic and declaring the company a supply chain risk, which would require a plethora of other companies that work with the Pentagon to certify that Claude isn't used in their workflows.
I'm not a big fan of the government making threats like this either. I don't know if it's implied or not, but the article makes it seem like it is two clear threats, which are both very serious. Either be compelled to do the DoD's bidding, or get cut off not only from DoD contracts, but from contracts with any company that works with the DoD.
SloMoMonday | 3 hours ago
Underlying reasons aside, I really don't trust any of the AI companies as far as I can throw a data center, and I was already sceptical of their ideas of AI safety. I don't consider much of their "research" to be anything more than AI fan fiction, and I've already had my rants on their papers about AI introspection and AI blackmail.
I keep seeing pieces about how these companies "can't turn off their AIs" or how "they don't even understand how it works" and even how their LLMs are in the top percentiles of Maths Olympiads and Coders. I literally typed "What is 784×413 698×225 786×2÷15" into Google and the first AI generated answer was:
My crappy desk calculator shows 9 764 167 711 513,6. Do I trust the cheap legacy hardware that gives me the same answer over and over and goes into error if I use it incorrectly? Or the multi-billion-dollar AI that gives me 5,329,240,721,280 the second time, and then this whole mess, even though the correct answer shows up in the "thinking" as:
You could argue that I should prompt better or that it's up to me to verify the outputs. But this is an all-powerful, genius-level everything machine that has already cost tens of thousands of people their incomes. It should be able to do basic maths.
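For reference, reading the spaces in the expression as digit grouping (the calculator's rendering), the intended calculation is easy to pin down with exact integer arithmetic; a quick sketch in Python:

```python
# The thread's expression, with spaces treated as digit grouping:
#   784 × 413 698 × 225 786 × 2 ÷ 15
numerator = 784 * 413_698 * 225_786 * 2  # exact integer product
print(numerator)        # 146462515672704
print(numerator / 15)   # 9764167711513.6
```

This matches the desk calculator's 9 764 167 711 513,6 (comma decimal), which is the value the "thinking" trace also reached.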
What does this have to do with AI Safety?
Everything really.
Because I don't think the AI apocalypse will happen because of some major military operation where they turn off the AI safeguards and the machines go crazy. I think it'll happen because the machines are going to misinterpret some critical semantic point and give the wrong person the wrong information at the worst possible time and they make a bad decision with it.
If you can't interrupt a system while it's on bad rationalization pathways, then you have a bad system. If you don't understand how the system is reasoning or reaching outcomes, then you probably have no handle on the inputs and training data. If your system needs audits to identify errors in extended outputs, then maybe it should not be in live environments.
stu2b50 | 3 hours ago
The math question is mainly one of syntax. It interpreted “413 698” as 413 * 698, which is fair enough. Using spaces instead of commas in arithmetic syntax is highly irregular. Generally, when you have two numerals next to each other, multiplication is assumed. E.g., you parse 5x as 5 * x, not “fifty x”.
SloMoMonday | 26 minutes ago
It is not a matter of syntax. It's semantics.
If I get the syntax wrong with a programmatic system, it throws an error. More likely, a system will have methods to normalize ambiguous data points. There's a reason why we don't have to group our numbers like this in data entry.
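That fail-fast behavior is easy to demonstrate with any strict parser; as a sketch, using Python's own expression parser as the "programmatic system":

```python
# A conventional evaluator refuses the ambiguous input instead of guessing.
expr = "784*413 698*225 786*2/15"  # spaces used as digit grouping
try:
    value = eval(expr)  # Python's parser is the strict system here
    print("parsed:", value)
except SyntaxError:
    print("rejected: invalid syntax")  # this branch runs: no silent guess
```

The point is the contrast: the strict system halts on ambiguity, while an LLM picks one plausible reading and carries on.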
If I make a similar error with a Language Model, it still throws an answer. And I had to reveal the reasoning layer to audit how we got to that answer.
I don't care if a system CAN correctly answer questions. It MUST correctly answer questions. A syntax or user input error has to throw an error, and if it can't do that, why is this tech on the market?
The equation I asked was directly copied from the Samsung calculator app, and this was its formatting of the numbers. There are instances where digits are grouped with commas and a dot at the decimal, and instances with space groups and commas for decimals. One of my old clients was on the older side and used pipe symbols for groupings and kept the decimal in the next cell.
All of those differences are semantic because it's just the "rendering" of the data. In the background, the data is universally represented. Regardless of the viewer's preferred syntax, the meaning should be consistent.
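A minimal sketch of the kind of normalization layer being described, assuming space or comma grouping and either comma or dot decimal marks (a lone "1,234" remains genuinely ambiguous without locale metadata, which is rather the point):

```python
def normalize_number(s: str) -> float:
    """Collapse different renderings (grouping/decimal marks) to one value."""
    s = s.strip().replace(" ", "")  # drop space grouping
    if "," in s and "." in s:
        # whichever separator appears last is the decimal mark
        if s.rfind(",") > s.rfind("."):
            s = s.replace(".", "").replace(",", ".")
        else:
            s = s.replace(",", "")
    elif "," in s:
        # heuristic: a single comma not followed by exactly 3 digits
        # reads as a decimal mark; otherwise treat commas as grouping
        head, _, tail = s.partition(",")
        if "," not in tail and len(tail) != 3:
            s = head + "." + tail
        else:
            s = s.replace(",", "")
    return float(s)

# The two renderings from this thread agree once normalized:
print(normalize_number("9 764 167 711 513,6"))   # space groups, comma decimal
print(normalize_number("9,764,167,711,513.6"))   # comma groups, dot decimal
```

Both calls return 9764167711513.6, the same underlying value behind two surface syntaxes.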
LLMs do not have semantic reasoning, and at the same time, they do not normalize data points to a standard format. If I'm having an LLM process massive data sets and there are semantic differences between comparable data points, what happens?
This is not simply a user-error situation. It is a strategic vulnerability. Natural language is high-noise and high-loss, and there is massive token overlap across domains. At the same time, models do not have logical or reasoning systems. It's just a large-scale rationalization engine that can be subverted with malicious prompt chains and biased training.
[OP] unkz | 2 hours ago
Kind of apples and oranges isn’t it? That’s Google’s quick and dirty LLM designed to skim web results. If you ask a frontier model a question you will have a radically different experience. Also, you gave it a pretty ambiguous question.
GPT 5.2 Thinking said:

Interpreting your expression as:

784 × 413698 × 225786 × 2 ÷ 15

Result (exact): 48,820,838,557,568 / 5

Decimal: 9,764,167,711,513.6
cdb | 2 hours ago
I think the issue with this type of thinking is trying to apply a human model of cognition to a machine. If we're talking about humans, it makes sense to say that if a human can't even do tasks using elementary-school-level thinking, they are probably not good at tasks requiring more complex thinking. AI models are different, though. They might have some trouble with arithmetic or counting the number of R's in the word "strawberry," but they can do better than most humans at some more complex tasks requiring either wide-ranging or esoteric knowledge.
For instance, this week I asked Copilot to put together a study guide covering the technical and contextual aspects of a specific meeting discussing research, based on the accumulated slides and meeting minutes from the past two years. This was to aid onboarding and to help experienced people address gaps in knowledge. It would be an impossible (or very, very long) task for someone not knowledgeable about this type of research, and a labor-intensive task for an experienced person, but it spit out a really good outline in about a minute. I don't know if it's perfect, but it's really good (I'm the experienced one trying to help a new hire). I haven't used the guide much yet, but asking it to explain the first point also produced a good and useful answer. It's possible that it missed a few items here or there, but if you understand 80% of everything happening in this meeting, that's probably higher than average. Point is, there are a lot of questions that humans aren't great at answering that AI models can answer as well or better.
EgoEimi | an hour ago
To add, adult innumeracy is quite prevalent. In the US, according to the NCES, 28% of adults possess level 1 numeracy or less, meaning they can't do multi-step arithmetic like the earlier mentioned calculation. If performing arithmetic were necessary for sound moral reasoning, then we should simply incarcerate that section of the adult population for the health and safety of everyone else. But I'm sure the vast majority of them are morally functional.
Indeed, cognition is multifaceted, and while humans are superior to machines in certain facets, it's very clear that machines are vastly superior in other facets.