A GitHub Issue Title Compromised 4,000 Developer Machines

42 points by xvello a day ago on lobsters | 16 comments

duncan_bayne | a day ago

It's simply not possible to secure agentic AI against this sort of thing. I wonder if any of the AI boosters have read Gödel, Escher, Bach.

valenterry | 22 hours ago

AI agents simply need to be treated like very naive junior developers with dementia - at most.

I think this is actually great. It will finally make the use of sandboxing and clear boundaries popular and widespread. Just as apps on Android/iOS are separated and constrained, the same must now happen with programs, and even with libraries inside programs.

Deno is a good example of how an application can be controlled and sandboxed. Now we need to go even further and constrain even libraries, or even individual functions, inside an (already sandboxed) app.

We should have had that 10 years ago, but now we'll finally be forced to develop these concepts and implementations rather quickly.
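For readers who haven't used it: Deno's permission flags already give this deny-by-default model at the process level. A minimal sketch (the hostname, config path, and entry file are placeholders):

```shell
# Nothing is allowed by default. This process may only reach one host
# and read one directory; --no-prompt makes any other access fail
# outright instead of asking interactively.
deno run \
  --allow-net=api.example.com \
  --allow-read=./config \
  --no-prompt \
  main.ts
```

Without `--no-prompt`, Deno instead asks interactively for each missing permission.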

duncan_bayne | 19 hours ago

I can't imagine how that would work, in practice. No snark, I mean that literally.

What would the UX look like, here? What's the granularity? When is the user prompted to allow / deny an action? How does this play with things like pipes and shells? What about a "God app" like Emacs that reads my mail, runs my shells, plans my day, ...?

To be clear this isn't me saying "it'd never work", this is me saying "I hate living in iOS and Android kindergarten-land and am afraid this might break my preferred ways of interacting with computers".

Edited to add: my current approach, worked out with colleagues, is to put Claude in a Docker container, and the services it is working on (DB, etc.) are all in separate containers. Only nonprod, read-only tokens are available in the Claude container. The source lives on the host machine but is mounted into the container.

Thus Claude can't even push commits on its own, and that is exactly how I like it. It's a coding assistant (and a useful one too, when thoughtfully used).
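A minimal sketch of that layout (image, network, and variable names are placeholders, not the actual setup):

```shell
# One network shared by the agent and its nonprod services.
docker network create agent-net

# Nonprod service in its own container.
docker run -d --name dev-db --network agent-net postgres:16

# Agent container: source mounted in from the host, only a nonprod
# read-only token in the environment, and no git credentials, so the
# agent cannot push commits on its own.
docker run --rm -it \
  --network agent-net \
  -v "$PWD/src":/workspace \
  -e API_TOKEN="$NONPROD_READONLY_TOKEN" \
  claude-sandbox
```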

valenterry | 13 hours ago

I think there are two different contexts here. One is the user executing something. The problem is "almost" solved for this on iOS/Android. Even Deno already asks for permission if, e.g., the software tries to access a non-whitelisted website.

The other thing is the developer view. When I add a new library to an application, I usually know ahead of time what that library will do. For example:

  • My database client will make connections to the database and send and receive data. It will neither read from nor write to the filesystem.
  • My library for analyzing image content will read from the filesystem, but it won't write to it and won't need network access.
  • The library I use to parse HTML won't need any permissions whatsoever. It receives some data as input and spits some other data out.

And so on. Most libraries don't need permissions. So why should my HTML-parsing library suddenly be able to read my environment variables and exfiltrate info? Or write things into my filesystem?

As a dev I want the libraries I use to be unable to do that unless I explicitly allow it (in whatever dependency manager I use). Or even further: fine-grained, at the function level.
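No mainstream dependency manager enforces this per library today; the closest widely available thing is a process-level permission model. For example, Node's experimental permission flags (paths are illustrative):

```shell
# Filesystem access is deny-by-default: the process may only read
# ./data and write ./tmp; anything else fails with ERR_ACCESS_DENIED.
# Note this is still process-level rather than per-library, and
# network access is not covered by this model.
node --experimental-permission \
  --allow-fs-read=./data \
  --allow-fs-write=./tmp \
  app.js
```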

What about a "God app" like Emacs that reads my mail, runs my shells, plans my day, ...?

I would say it can do some very limited things: 1.) use IMAP against a predefined domain, 2.) execute predefined shell scripts, 3.) write into a specific part of your filesystem (different from where the scripts live).

Seems reasonable to me. But even if you want to grant more permissions: Emacs is a battle-tested tool, used by many. Not all tools are like that, though, and I still want to be able to use them without worry.
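For what it's worth, an OS-level sandbox can already approximate points 2 and 3 today. A sketch with firejail (the directories are placeholders; restricting IMAP to one predefined domain, point 1, would still need something extra, like a filtering proxy):

```shell
# Emacs sees only the whitelisted slices of $HOME; the rest of the
# home directory is hidden from it.
firejail \
  --whitelist=~/org \
  --whitelist=~/.emacs.d \
  emacs
```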

k749gtnc9l3w | 21 hours ago

It would still be interesting, though, to see how far securing would go if the large labs at least tried to care, that is, if there were LLMs with fully separate control and data streams.

But that would probably mean a single-digit constant factor increase in both training and inference cost for the same capabilities, so, apparently not happening.

Corbin | 11 hours ago

Please crack open your copy of GEB; the relevant chapter is Chapter 6, The Location of Meaning. The central difficulty is that a sufficiently-intelligent model must decode patterns in the data. To see this with a local model, play around with code generation starting from prompts with various nested levels of quotation and note how the model generates different sorts of strings for different quotation contexts. The issue isn't that the data can contain embedded control messages; if it were that simple then we could merely attach a parser and grammar to our harness so that the model never generates non-data tokens. Rather, the issue is that the data can contain (bogus) help-I'm-in-a-factory pleas or pseudo-logical appeals to some (bogus) superior context like a simulation hypothesis.

The flip side of this is that a general-purpose model, with a parser and grammar in its harness, can perform fairly well when not allowed to generate control tokens. The problem is that people keep designing tools which insist that the model be allowed to make control decisions!

k749gtnc9l3w | 9 hours ago

Invocations of a superior context (as well as I-am-inside-the-factory pleas) are yet another example of mixing meta-levels; a single stream, and models being literally trained on mixed-meta-level instruction-following examples, do not help.

Handling nested quotations and deciding based on their level is probably more good than bad; the problem is exactly that nested-structure tracking is not done enough. But it is hard to do enough! Actually, human-level performance is not enough; it helps somewhat that humans get annoyed and uncooperative relatively easily.

In terms of local experiments, I think a pretty accessible example of structure-tracking breakdown comes from models like Z-Image Turbo — which follows descriptions of relative placement of objects in a collage quite well, until overwhelmed by complexity.

My estimation is that one could advance noticeably, but not impactfully, by explicitly fine-tuning models to treat only things in tool-related tags as tool-related (and by other similar meta-level consistency checks); and impactfully, but not to a solved-problem level (and with new limits on the claimable abilities), with a project to design and train an out-of-band-instructions model.

duncan_bayne | 19 hours ago

there were LLMs with fully separate control and data streams

Is that even possible? My understanding is that it's not, with LLMs.

k749gtnc9l3w | 18 hours ago

For what value of possible?

On the one hand, it depends on how "agent" is defined. Of course, control/data separation means that an agent can't figure out a workflow by experimenting and following the tool warnings if there is not enough trusted documentation available. Which is a feature, in many contexts, like the one in the article!

On the other hand, it depends on how close you want to stay to the current architecture. Reasonably close should be possible.

The exact current architectures seem to be single-stream-to-single-stream; and if we only have API access, it is of course impossible for us to have stream separation.

Adapting some mixture-of-experts architecture so that some experts only see «trusted» input and some only see «untrusted data» input, and then fine-tuning so that untrusted instructions are not followed: this probably takes some months of work for a moderately sized team to generally work out. Then one probably needs to actually train that model, with all the corresponding optimisations to figure out, possibly from scratch, probably using existing instruction-following models to generate synthetic data. It would be strange if it weren't somewhat more resilient to these attacks.

Figuring out a proper adaptation of a mostly-transformers architecture to stream separation should be more work, but maybe comparable to the initial figuring-out of mixture-of-experts? So I believe it to be feasible, but with from-scratch, more expensive training for the same performance quality under non-adversarial conditions.

As for reliability, adversarial data getting misclassified will certainly stay an issue, but hopefully straight command injection will become significantly harder than the very low bar we see now.

mdaniel | a day ago

Wowzers what a comedy of errors

And, in case anyone was wondering, yes, of course it was npm

Tenzer | 16 hours ago

pyfisch | 16 hours ago

or with the story for the original post https://lobste.rs/s/xbe859/clinejection_compromising_cline_s

pushcx | 13 hours ago

We merge stories on the same topic within a week, so on big stories like this there tends to be one per week.

cborg | 12 hours ago

History doesn't repeat itself, but it often rhymes.

That's a fun observation about 2600 Hz; I hadn't thought of it. I would have thought it was more akin to SQL injection, until I checked the "fix": https://github.com/cline/cline/pull/9211

saturnyx | 18 hours ago

Seems similar to a blog on legitsecurity I posted on lobste.rs a while back. Blog Post: https://www.legitsecurity.com/blog/camoleak-critical-github-copilot-vulnerability-leaks-private-source-code Lobster Comments: https://lobste.rs/s/jr6zfo/critical_github_copilot_vulnerability

It is not shocking that someone came up with a new way to use GitHub Copilot (and other AI tools) to hack and infiltrate repositories. The issue is that AI tools have way too much access to the repo. An AI triage bot being able to install from an attacker's typosquatted fork?

Setting limits and bounds on an LLM would help, in my opinion. Tell it what it can and cannot do. And of course we should make sure that security/access tokens have just the right permissions. The companies that make these AIs need to make sure that they process the right info (it should be able to filter out malicious prompts). And of course, we definitely need someone to fact-check and verify everything the AI does once in a while. I believe these precautions would help to reduce issues like this.

thesnarky1 | 17 hours ago

it should be able to filter out malicious prompts

This is like telling a telephone switch in 1980 that it should be able to tell whether the user dropped a quarter in or just made it sound like the user dropped a quarter in. We are back to in-band signaling, and until there is a technological leap, users MUST be aware of the risks. This bot was set up against the advice of the Claude GitHub Actions security guide. No one should allow their LLM near untrusted input if they need to trust its output or give it access to trusted actions.

This company got to go a little faster by turning the safeties off and just found out why those safeties exist. Hopefully others will learn, as well.

The issue is that AI tools have way too much access to the repo.

Exactly. Give it the right permissions and your outcome might be different.