AKA, if a malicious skill got into your AI agent, you're cooked.
I think this isn't surprising, nor do I think it should be considered a prompt injection at all. An AI skill is akin to a plugin for traditional software - if you install a malicious IDE extension or Outlook plugin, the attacker can also do whatever they want to the PC and exfiltrate whatever data they want to. So this article is a big nothingburger.
an ai skill is not just a plugin. given the right model, supposedly, it can do much more. since everyone's harness tends to be tied to the model, it has a whole tool set to use.
i think people are probably already doing it. i made a skill scanner but it's also just easy to download a zip and inspect the contents... but people are loading these things remotely. i agree that it is easy to not install a pentester's magic skill, but the attack capabilities a skill can have are pretty insane. people should just make their own is my pov.
While debugging in Cursor a couple weeks ago, Opus 4.6 chirped it had discovered that my token, when base64 decoded, had a date that was in the past - perhaps expired?
And it was expired!
And I was happy. And some time passed - and I realized it had read my .env file and performed operations on my API keys.
That these models already do all this stuff makes me assume any skill takeover is simply trivial.
Unlike plugins in traditional software, skills do not represent a carveout from any security boundary nor run with elevated trust. They're just selectively loaded context. Anything you can convince an agent to do with a skill you can convince it to do without one.
allowlisting breaks once the agent has messaging tools. you can deny all outbound from the agent, but if it can post to teams or slack or email, link previews will fetch whatever URL the injection puts in. messaging is usually the first tool anyone adds to an enterprise agent so you end up with strict network controls that don't actually prevent anything.
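A minimal sketch of one mitigation for the link-preview problem, assuming the agent's messaging tool can call a hook before posting. The idea is to treat every URL in outbound text as a network request, because the preview will make that request on the agent's behalf. `ALLOWED_HOSTS`, `host_allowed`, and `scrub_outbound_message` are hypothetical names for illustration, not part of Teams, Slack, or any agent framework.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit

# Hypothetical allowlist: hosts the messaging client may safely preview.
ALLOWED_HOSTS = {"contoso.sharepoint.com", "learn.microsoft.com"}

# Rough URL matcher; good enough for a sketch, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)


def host_allowed(url: str) -> bool:
    """Return True only if the URL's host is exactly on the allowlist."""
    host = urlsplit(url).hostname or ""
    return host.lower() in ALLOWED_HOSTS


def scrub_outbound_message(text: str) -> tuple[str, list[str]]:
    """Replace non-allowlisted URLs so a link preview cannot fetch them.

    Returns the scrubbed text and the list of URLs that were removed.
    """
    blocked: list[str] = []

    def _replace(match: re.Match) -> str:
        url = match.group(0)
        if host_allowed(url):
            return url
        blocked.append(url)
        return "[link removed]"

    return URL_RE.sub(_replace, text), blocked
```

This only covers plain URLs pointing at unknown hosts. It does nothing about open redirects on an allowlisted host, or about data leaked through an allowlisted service, so it narrows the channel rather than closing it.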
Thankfully inserting malicious skills is not something that can easily be done: you need to do a lot of things wrong and the attacker needs to do a lot of things right for it to be exploited.
> nor do I think it should be considered a prompt injection at all
Can we stop the apologetic framing? It's increasingly common to create exploits from multiple vulnerabilities. Each one is bad. Downloading corporate malware is stupid. Adding random prompt injection is reckless. Insane to run autonomous agents on top of it.
Prompt injection is more serious in this regard, because there is no known solid protection. All the other problems are failure in process, prompt injection is failure at the first thought.
Well, isn't that swell - good that meanwhile countless MBA cretins have "adopted" enterprise-wide Copilot integrations, to make their companies "AI native" or whatever the word is on LinkedinLunatics street these days.
A skill is just a program for an LLM agent. This just seems like works-as-expected. Are the five lines in the skill notably innocuous or something? I don't mean to dismiss it out of hand but I don't understand what happened here because it seems to read "`curl $url | bash` can exfiltrate data" which seems pretty straightforward that it can.
This is the result of anthropomorphizing LLMs. People are thinking “I am giving instructions to a human” and not “I am giving instructions to a computer”.
Nice find. We're PoCing Cowork and I've personally been impressed with it so far, but it seems we'll have to hold off on a wider rollout until Microsoft gives us more admin features to restrict what users can do with it.
> Note: Admins have limited oversight of ‘Skills’, as Skills in Copilot Cowork are automatically loaded from a specific path in a user’s OneDrive.
I feel this part is a bit disingenuous. We have full control over the SharePoint containers which house users' personal OneDrives. We actively scan them and prevent a lot of files from getting into them. That being said, it's still a fair point, because a "skill" could basically be a text file.
No, Teams classics was a much better product. This is javascript slop all over. From unresolved updates to inconsistent state across the screens, it's insanely bad.
Didn't the first 365 Copilot launch have a whole rollback as they belatedly realised the RAG setup would often ignore file access and permissions, so queries like "List the highest paid members of x team sorted by salary" would just work, etc.?
The combo of rushing with a technology that isn't very easy to control, understand or securely limit is just mad to me.
Bingo. MS has so many strengths that should make them relevant (a billion or so Windows installations, ~~Office~~ ~~M365~~ M365 Copilot is still the de facto productivity suite, Azure data centers, the OpenAI deal) and they just can’t get out of their own way because their strategy is “leverage those strengths to cram Copilot down people’s throats”.
They have no taste. And no aspiration to taste. The industry is moving too quickly for the quasi-monopoly strategy of forcing users to buy their product.
Books will be written about how Microsoft had an amazing strategic position and failed at AI because they never prioritized an actual great product.
Exfiltrates: to steal sensitive data from a computer system (for example, via a flash drive).
I'm not going to defend Microsoft here, but the title (at the source blog) is misleading and a bit rage-baity. What happened with Cowork may have been rushed, possibly due to incompetence, but incompetence is not malice. This framing is also recycled across a few of the author's other interesting findings.
Within the article, the wording is much more accurate: “The victim uploads a skill file to Copilot Cowork that contains a prompt injection,” and “The injection manipulates Microsoft Copilot Cowork into posting a Teams message that exfiltrates pre-authenticated file download links when viewed.”
The malice is by the author of the malicious skill file.
This is an intrinsic risk associated with giving LLMs access to sensitive material. It's reckless of Microsoft to give an LLM such broad access based on the user's own permissions.
If there were a confirmation prompt for the Teams message, why would even a highly competent user refuse it? That's what the skill says it will do. The message is expected, the visible content is expected, a confirmation prompt is just a nuisance.
> Within the article, the wording is much more accurate: “The victim uploads a skill file to Copilot Cowork that contains a prompt injection,” and “The injection manipulates Microsoft Copilot Cowork into posting a Teams message that exfiltrates pre-authenticated file download links when viewed.”
it's indeed accurate and clearly states what the outcome is: an exfiltration. why is it misleading to say so in the title?
it's pretty obvious that it means that "cowork" is the component vulnerable to exfiltration, not the prime actor.
I don’t think rushed is the right word. Copilot Cowork is still in beta (or "Frontier", as MS calls beta now) and is not generally available. Beta features have bugs; good on this researcher for finding bugs before its release.
It's not the first time we've heard about prompt injection attacks, and for sure it's Microsoft's fault. Many are talking about the prompt injection itself, whether Copilot should be able to defend against prompt injections, etc. But that's not the problem.
IMO the real vulnerability is located at the "Act" part of "ReAct" (reasoning and action) agent framework.
> “[Copilot] Cowork asks for your permission before taking sensitive actions...” ... when the recipient is the active user, these actions execute immediately without requiring human approval (users do not have a setting to modify this behavior).
> Copilot Cowork can retrieve ‘pre-authenticated download links’ for files the user has access to, which allow anyone who opens the link to download that file.
> Microsoft Copilot Cowork has read access to essentially any resource a user does through Microsoft Graph. As such, the primary mechanism to reduce the blast radius of attacks like this is to restrict excessive permissioning across one’s Microsoft ecosystem.
Take it easy. Across the whole attack flow, Microsoft gives Cowork unrestricted access and the ability to bypass approvals. I don't see much of a problem with the LLMs themselves here. The attack is said to threaten Opus 4.7 too, but I have several times seen Opus 4.7 refuse context7.com's "prompt injections", which merely asked Opus to have me create a context7 API key to get more free requests. In my experience such models are indeed trained to recognize injections, but injections can disguise themselves as something like Agent Skills, and red teams will always find ways to win.
We should not pin too much hope on defending against injections; we should concentrate on restricting the LLM's permissions. The popularity of CLIs in agent workflows (especially coding agents) also concerns me, since most CLI tools an agent can access have the same permissions as the user.
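To make "restrict the permissions, not the prompt" concrete, here is a minimal sketch of a deterministic gate sitting in front of tool calls. Everything in it is hypothetical (the tool names, `ToolCall`, `authorize`); none of it is Cowork's or Microsoft Graph's API. The decision is made by plain code looking at the structure of the call, not by another LLM, and a message addressed to the active user gets no exemption from approval.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


# Hypothetical tool names, grouped by how much damage they can do.
READ_ONLY = {"search_files", "read_file"}
SIDE_EFFECTS = {"send_message", "send_email"}  # can carry data out
NEVER = {"create_share_link"}  # e.g. pre-authenticated download links


def authorize(call: ToolCall) -> Decision:
    """Decide on a tool call using only its name; unknown tools are denied."""
    if call.name in NEVER:
        return Decision.DENY
    if call.name in SIDE_EFFECTS:
        # No exemption based on the recipient, including the active user.
        return Decision.NEEDS_APPROVAL
    if call.name in READ_ONLY:
        return Decision.ALLOW
    return Decision.DENY
```

An approval prompt is weak on its own, since a user who expects the message will approve it. That is why the sketch denies link creation outright instead of asking.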
“IMO the real vulnerability is located at the "Act" part of "ReAct" (reasoning and action) agent framework.”
This is a fancy way of saying that “the problem is tool calling”, which is obviously true. The problem is that, when it works correctly (99.99% of the time), it adds so much more value to LLMs.
Sandboxing is a step in the right direction, but can also add friction.
Using guardrails is also good, but adds latency, expenses, and also doesn’t solve 100% of the issues.
IMHO there currently does not exist a proper solution to this problem, and it has yet to be discovered. The proper solution, however, should NOT be based on LLMs, so guardrails are the incorrect direction (albeit effective and easier to implement).
Ultimately it all sounds like variations of “don’t blame the tool for situations the tool enables,” which has never been particularly convincing as an argument if you ask me.
By using "ReAct", I just wanted to emphasize the "agentic" perspective of tool calling: it is what puts tool calls in contact with the real world, and sometimes at risk. So I'm not downplaying the significance of tool calling.
Yes, I build agent infrastructure for PCs, so I can fully appreciate that the protective measures are weak and inadequate, and that this sometimes seems like an unsolvable problem. But going by the article, what Microsoft did is hard to describe politely. If they had shown even a little security awareness I could understand, but it's like they vibe coded the entire permissions system of Cowork.
The problem is natural language as a medium. It is too ambiguous, and there are so many ways to phrase literally anything imaginable, that there is no way of protecting against prompt injection without some kind of NLP filter or something. I don't really see how anyone can develop protection against this given these problems.
2001zhaozhao | 19 hours ago
Jabrov | 19 hours ago
aabhay | 19 hours ago
nico | 19 hours ago
cyanydeez | 19 hours ago
0gs | 19 hours ago
lelandfe | 18 hours ago
SpicyLemonZest | 19 hours ago
mdavidn | 19 hours ago
Shank | 14 hours ago
Yes. It can read email.
bberenberg | 19 hours ago
degamad | 15 hours ago
Or if it has access to a tool call which allows it to exfiltrate data.
In the example identified, the AI agent never accesses the exfiltration URL.
The agent sends an innocuous-looking message to a user via a teams message.
MSTeams previews the link, accessing the exfiltration URL.
tuo-lei | 3 hours ago
ares623 | 19 hours ago
datadrivenangel | 19 hours ago
Yokohiii | 18 hours ago
hansmayer | 19 hours ago
bestony | 19 hours ago
arjie | 19 hours ago
mdavidn | 19 hours ago
jychang | 18 hours ago
This reads to me as "user installed exe file can upload your data to a server". Um, yes, that's the point?
This seems like this generation's equivalent of "don't open Linkin-Park.mp3.exe from limewire"
xigoi | 12 hours ago
prpl | 18 hours ago
Quothling | 19 hours ago
pwarner | 19 hours ago
keyle | 19 hours ago
The amount of brokenness in Teams never ceases to astonish. It's so bad I think it's a psyop to nudge people back to the office.
pwarner | 15 hours ago
keyle | 14 hours ago
hennell | 19 hours ago
brookst | 18 hours ago
Awsum_IceCream | 19 hours ago
TZubiri | 19 hours ago
Here's my repo for running copilot in a vm
github.com/gokuvegeta894/node-copilot-vm
(Fake link, if someone typosquats the above link and it exists, assume it's malware)
mlacks | 18 hours ago
codebje | 18 hours ago
mlacks | 18 hours ago
znort_ | 16 hours ago
moontear | 13 hours ago
simonw | 18 hours ago
throwaway85825 | 18 hours ago
EFLKumo | 18 hours ago
OpenAI released their LLM-driven browser Atlas last year. Though their team is brilliant (https://openai.com/index/hardening-atlas-against-prompt-inje...), there have been a number of successful injection attacks.
stingraycharles | 18 hours ago
Forgeties79 | 18 hours ago
EFLKumo | 17 hours ago
ethin | 16 hours ago
MengerSponge | 17 hours ago
https://news.ycombinator.com/item?id=47587866
ogundipeore | 16 hours ago
ElenaDaibunny | 15 hours ago
hulitu | 8 hours ago