Are any companies doing this sharing the code being produced or some example Pull Requests? I am wondering if a lot of the human review is substantive or rubber stamping, as we see with long Pull Requests from humans. I know I would half-ass a review of a PR containing lots of robot code. I assume Stripe has higher standards than me, but it would be nice to see some real-world examples.
One thing that troubles me is that code reviews are also an educational moment for seniors teaching juniors, as well as an opportunity for people who know a system to point out otherwise undocumented constraints of the system. If people slack on reviews with the agent, it means these other externalities suffer.
Are people handling this at all? Is it no longer needed because it gets rolled into AGENTS.md?
I find working with AI a lot like working with a junior employee... with the junior employee they learn and get better (skill level and at dealing with me), but with AI the mentoring lessons reset once you type /clear
Skills are a positive development for task preferences, agents.md for high level context, but a lot of the time it's just easier to do things the way your AI wants.
You see, this is no longer necessary - companies are firing all the non-seniors, are not hiring any juniors, and delegating everything to AI. This is the future apparently!
The glass-half-full here is it’s an incredible signal that one of the largest financial gateways in the world is _able_ to do this with current capabilities.
They still need someone to review and hopefully QA every PR. I doubt it’s saving much time except maybe the initial debug pass of building human context of the problem. The real benefit here is the ability for the human swe to quickly context switch between problems and domains.
But again: the agent can only move as fast as we can review code.
They are enforcing rigor, on agents, the same way they would on humans. Do people think Stripe's engineering team would have been able to progress if each individual (human | machine) employee was not under harness and guardrail, and just wrote code willy nilly according to their whims? Vibe coding is whimsical, agentic engineering is re-applying what brought and brings rigor to software engineering in general, just to LLM outputs. Of course, it's not only that and there are novel problem spaces.
Where is the detail? Examples? Something concrete? I don't think it is, but it does read like LLM generated content marketing. Lots of generic statements everyone knows. Yes, dev environments are helpful. Have been for 20 years. Yes, context and rules are important for agents. Surprise.
TLDR "look we use AI at Stripe too, come work here"
Why would you go and work for a company that markets itself in the blog post that they just managed to automate a substantial part of the software engineering work?
I'm sure there are lots of Stripe engineers that cruise the comments here. Anyone care to provide some color on how this is actually working? It's not a secret that agents can produce tons and tons of code on their own. But is this code being shipped? Maintained? Reviewed?
The few guys who they haven't laid off are too busy reviewing and being overworked, doing the work of 10 to scroll HN. Gotta get their boss another boat, AI is so awesome!
Stripe hasn't had a layoff in a good while. Stripe is hiring like mad and is planning on growing engineering significantly. Your comment isn't grounded in reality
successful how? the only metric i see is # of pull requests which means nothing. hell, $dayJob has hundreds of PRs generated weekly from renovate, i18n integrations, etc. with no LLM in the mix!
Part 1 is linked in this article and explains a bit: “Minions are Stripe’s homegrown coding agents. They’re fully unattended and built to one-shot tasks. Over a thousand pull requests merged each week at Stripe are completely minion-produced, and while they’re human-reviewed, they contain no human-written code.”
I could be wrong, but my educated guess is that, like many companies, they have many low-hanging-fruit tasks that would never make it into a sprint, or even somewhat larger tasks that are straightforward to define and implement in isolation.
How is this already #1 on the front page with 12 upvotes and 9 comments…
The article doesn’t reveal much. It feels like a fluff piece, and I can’t comprehend what the goal of sharing “we use AI agents” means for the dev community, with little to no examples to share. For a “dev” micro blog, this feels very lackluster. Maybe the Minion could have helped with the technical docs?
EDIT: (slightly adjusts tinfoil hat) minutes later it’s at #6
Or the simpler explanation (which is probably closer to the truth): Stripe is a very popular company on HN as many people use them, their founders sometimes comment here and if they share their opinion on something people pay attention and upvote it.
I was an early MCP hater, but one thing I will say about it is that it's useful as a common interface for secure centralization. I can control auth and policy centrally via a MCP gateway in a way that would be much harder if I had to stitch together API proxies, CLIs, etc to provide capabilities.
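To make the centralization point concrete, here is a minimal sketch of the idea: one gateway object owns the allowlist and the audit trail, so individual tools never implement auth themselves. Everything here is hypothetical (the `Gateway` class, the role names, the tool names). It does not use the MCP SDK or any real gateway product, and a real deployment would sit behind actual authentication.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


class PolicyError(Exception):
    """Raised when a role is not allowed to call a tool."""


@dataclass
class Gateway:
    # Hypothetical registry: tool name -> implementation being proxied.
    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)
    # Central policy: role -> set of tool names that role may call.
    policy: Dict[str, Set[str]] = field(default_factory=dict)
    # Every attempt is recorded in one place as (role, tool, allowed).
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self.tools[name] = fn

    def allow(self, role: str, name: str) -> None:
        self.policy.setdefault(role, set()).add(name)

    def call(self, role: str, name: str, **kwargs: object) -> object:
        allowed = name in self.policy.get(role, set())
        self.audit_log.append((role, name, allowed))
        if not allowed:
            raise PolicyError(f"{role} may not call {name}")
        if name not in self.tools:
            raise KeyError(f"unknown tool: {name}")
        return self.tools[name](**kwargs)
```

The point of the design is that adding a tool or tightening a rule is a change in one place, rather than in every proxy or CLI wrapper.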
However, it is also light on material. I would also like to hear more technical details, they're probably intentionally secretive about it.
But I do, however, understand that building an agent that is highly optimized for your own codebase/process is possible. In fact, I am pretty sure many companies do that but it's not yet in the ether.
Otherwise, one of the most interesting bits from the article was
> Over 1,300 Stripe pull requests (up from 1,000 as of Part 1) merged each week are completely minion-produced, human-reviewed, but containing no human-written code.
I feel like code review is already hard and underdone; the 'velocity' here is only going to make that worse.
I am also curious how this works when the new crop of junior devs do not have enough experience to review code but are not getting the experience from writing it.
Agents can already do the review by themselves. I'd be surprised if they review all of the code by hand. They probably can't mention it because the field is so heavily regulated. But from what I have seen, agentic review tools are already between the 80th and 90th percentile: out of 10 randomly picked engineers, the tool will provide more useful comments than most of them.
the problem with LLM code review is that it's good at checking local consistency and minor bugs, but it generally can't tell you if you are solving the wrong problem or if your approach is a bad one for non-technical reasons.
This is an enormous drawback and makes LLM code review more akin to a linter at the moment.
I mean if the model can reason about making the changes on the large-scale repository then this implies it can also reason about the change somebody else did, no? I kinda agree and disagree with you at the same time, which is why I said most of the engineers but I believe we are heading towards the model being able to completely autonomously write and review its own changes.
There's a good chance that in the long run LLMs can become good at this, but this would require them e.g. being plugged into the meetings and so on that led to a particular feature request. To be a good software engineer, you need all the inputs that software engineers get.
If you read thoroughly through the Stripe blog, you will see that they already feed their model with this or a similar type of information. Being plugged into the meetings might just mean feeding the model the meeting minutes, or letting the model listen to and transcribe the meeting. It seems to me that both are possible even today.
well, it's very important, now you know the financial code is handled by a bunch of barely supervised AI tools and can make decisions on whether to use the product or not based on that
That's for the Stripe customer to configure. Stripe itself has supported 3DS since ages ago.
Edit: also you'll find a pretty common sentiment among US website owners is that the new API that supports 3DS is overcomplicated and they want their 7 lines of code create-a-charge-with-a-token back. Screw the Europeans because they only care about US buyers anyway.
This is a devops post. They just brag about the plumbing.
The dark secret of the dark factory is high-quality human input, which takes time and focus to draft; otherwise the human ends up multi-shotting it and reading through the transcript to tune the input.
Stripe had invested a lot in dev experience over the years precisely because of how "unique" some of the technology choices were: Mongo and originally normal Ruby for a system that mainly deals with money? Without a massive test suite, letting a normal dev make changes without a lot of rails is asking for a sea of incidents. If I recall correctly, the parallelization needed to run the unit tests for developers used to make the cost of continuous integration higher than the cost of the rest of the ec2 instances. Add the dev boxes, as trying to put a useful test environment in a laptop became unreasonable, and they already start with a pile of guardrails tooling that other companies never even needed. A hassle for years, but now a boon, as the guardrails help the LLMs.
It'd be nice to get an old school, stripey blog post, the kind that has a bit less fluff, and is mostly the data you'd all have put in the footnotes of the shipped email. Something that actually talks about the difficulties, instead of non-replicable generalities. After all, if one looks at the stock price, it's not as if competitors are being all that competitive lately, and I don't think it's mainly the details of the AI that make a difference. It'd also be nice to hear what goes on when not just babysitting minions, if there's actually anything else a dev is doing nowadays. AI adoption has changed the day to day experience within the industry, as most managers don't seem to know which way is up. So just explaining what days look like today might even sell as a recruiting initiative.
Seems like a compliance thing? I too run my LLMs inside some sort of containment and do "manual" development inside the same environment, but it wouldn't make sense for me to have that containment be remote, so I'm guessing that's because they need to have some sort of strict control over it?
While there are compliance/security benefits it is not the primary motivation.
If you have fairly complicated infrastructure it can be way more efficient to have a pool of ready to go beefy EC2 instances on a recent commit of your multi-GB git repo instead of having to run everything on a laptop.
Amazon developers use similar devboxes. I think it is mostly so that developers can use a production-like Linux environment with integrated Amazon dev tooling. You're not required to use a devbox, but it can be easier and more convenient than running stuff on your laptop.
Is there a way to visualize what your agents are doing? I'm adding a bunch of debug code to the Claude Agent SDK, but the output gets a bit overwhelming to read. I just want to see visually how it does all the tool calling, what files it reads, etc…
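One low-tech option, sketched under assumptions: if you can dump the agent's events as JSON lines, a few lines of Python can turn them into a numbered timeline plus per-tool counts. The event shape below (`type`, `tool`, `input.path`) is made up for illustration and is not the Claude Agent SDK's actual message format, so the field names would need adapting to whatever your debug code emits.

```python
import json
from collections import Counter
from typing import Iterable, List, Tuple


def summarize(lines: Iterable[str]) -> Tuple[List[str], Counter]:
    """Build a numbered timeline and per-tool counts from JSONL events.

    Assumes a hypothetical event shape such as:
    {"type": "tool_call", "tool": "Read", "input": {"path": "a.py"}}
    """
    counts: Counter = Counter()
    timeline: List[str] = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines in the log
        event = json.loads(raw)
        if event.get("type") != "tool_call":
            continue  # ignore text output and other event types
        tool = event.get("tool", "?")
        # Only file paths are shown; other inputs are left out of the summary.
        target = event.get("input", {}).get("path", "")
        counts[tool] += 1
        timeline.append(f"{len(timeline) + 1:>3}. {tool:<8} {target}".rstrip())
    return timeline, counts
```

Printing the timeline line by line and then `counts.most_common()` gives a quick picture of which files were read and which tools dominate a run, without wading through the raw debug output.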
> Since MCP is a common language for all agents at Stripe, not just minions, we built a central internal MCP server called Toolshed, which hosts more than 400 MCP tools spanning internal systems and SaaS platforms we use at Stripe.
Are there existing open source solutions for such a toolshed?
When the norm on HF daily papers is a PDF, a website, and a git repo, corp thinkpieces like this are going to have to work a lot harder to get any meaningful engagement and onward corp word-of-mouth. Like, I now think Stripe has fallen a bit, where before they were tier...
this is the risk you run putting out subpar content in a world where the norms and appetites shift daily. slightly unfair? yes, as stripe has never been one to shine in the "corp infra open source" category tho; Which is crazy because it doesn't reflect the ridiculous talent they have internally.
3rodents | a day ago
fnord123 | a day ago
blitzar | a day ago
yunohn | a day ago
vbs_redlof | a day ago
Next up: let's vibe code a pacemaker.
trevorhinesley | a day ago
Personally, this is exciting.
kypro | a day ago
Hard to do an exact ROI, but they're probably saving something like $20,000,000+ / year from not having to hire engineers to do this work.
qudat | a day ago
echelon | a day ago
Financial capital at scale will begin to run circles around labor capital.
handfuloflight | a day ago
handfuloflight | a day ago
ndr | a day ago
> Cardiologist wins 3rd place at Anthropic's hackathon.
https://x.com/trajektoriePL/status/2024774752116658539
jobs_throwaway | a day ago
gas9S9zw3P9c | a day ago
menaerus | 15 hours ago
rco8786 | a day ago
dakolli | a day ago
malfist | a day ago
co_king_5 | a day ago
snayan | a day ago
rileymichael | a day ago
etothet | a day ago
testfrequency | a day ago
BiteCode_dev | a day ago
Marketing is a major goal of HN after all.
handfuloflight | a day ago
dewey | a day ago
nullstyle | a day ago
BiteCode_dev | a day ago
steveklabnik | a day ago
nylonstrung | a day ago
Reinventing the wheel without explaining why existing tools didn't work
Creating buzzwords ("blueprints" "devboxes") for concepts that are not novel and already have common terms
Yet they embrace MCP of all things as a transport layer, the one part of the common "agentic" stack that genuinely sucks and needs to be reinvented
throwaway-aws9 | a day ago
CuriouslyC | a day ago
__float | a day ago
menaerus | a day ago
tempest_ | a day ago
"LGTM..."
Time will tell I guess.
menaerus | a day ago
tibbar | a day ago
menaerus | a day ago
tibbar | a day ago
menaerus | a day ago
croes | a day ago
Won't that be the new normal with all those AI agents?
No frameworks, no libraries, just let AI create everything from scratch again
netule | a day ago
PunchyHamster | a day ago
nottorp | a day ago
lmz | a day ago
nottorp | 6 hours ago
Keeping my subscriptions to Asimov's and Ars Technica is becoming a pain though because ... Stripe I guess. Ars staff even confirmed it.
A Revolut card works fine, local banks' cards deny the charge by default and if you're lucky they call you and ask if they should allow it.
jimmydoe | a day ago
_ache_ | a day ago
hibikir | a day ago
wooptoo | a day ago
embedding-shape | a day ago
kkl | 18 hours ago
steveklabnik | a day ago
chrchr | a day ago
eric_khun | a day ago
_ink_ | a day ago
hrgahb | a day ago
tuhgdetzhh | a day ago
rmac | a day ago