stapelberg | 9 hours ago
Thanks for sharing your experience. I want to share two observations / tips:
It seems to me that you have pretty high expectations, and I am not surprised that you got a minimum-effort solution from Claude when the goal you defined was:
“I want to fix a build failure in my Nix configuration.”
I expect that Claude’s solutions will get “better” the more you spell out what “better” means in your situation. For example, I might try a prompt like “I want to contribute a clean and minimal fix to Nixpkgs that fixes the following build problem:”
As an illustration, I recently asked Claude to help change a program from load-everything-into-memory to windowed processing, and it produced a working solution that was a little verbose for my taste. I asked it to produce a solution with a minimal diff instead, and it rewrote it into something easier for me to follow.
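To make the "load-everything-into-memory to windowed processing" change concrete, here is a minimal sketch of the general shape of such a refactoring. The actual program is not shown in this thread, so every name below is hypothetical, and it assumes "windowed" means fixed-size chunks of the input.

```python
from typing import Iterable, Iterator, List


def windows(items: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` items from `items`."""
    if size <= 0:
        raise ValueError("size must be positive")
    window: List[int] = []
    for item in items:
        window.append(item)
        if len(window) == size:
            yield window
            window = []
    if window:
        # Emit the final, possibly shorter, window.
        yield window


def total_load_everything(items: Iterable[int]) -> int:
    # Before: the whole input is materialized in memory at once.
    everything = list(items)
    return sum(everything)


def total_windowed(items: Iterable[int], size: int = 1024) -> int:
    # After: only one window of at most `size` items is held at a time.
    return sum(sum(window) for window in windows(items, size))
```

Both functions return the same result; the second keeps peak memory bounded by the window size, which is the kind of change a "minimal diff" request can keep small by leaving the per-item logic untouched.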
With regard to hitting the token limit: I have heard similar things from plenty of people who tried the Pro subscription; it seems to me that subscription just isn’t sufficiently sized. I have used the Max subscription for months and only ever ran into the limits once.
I realize “you just need to pay more money!” isn’t exactly a popular thing to say :)
But, as before, it’s important to consider expectations. Keep in mind that Claude subscriptions are heavily subsidized; if you paid for the actual token usage, you’d pay a lot more.
With the Max subscription, and by “aiming higher” and letting Claude run unsupervised (in safe MicroVMs, see https://michael.stapelberg.ch/posts/2026-02-01-coding-agent-microvm-nix/), I have achieved progressively more impressive results.
Hope some of this is useful to someone!
samcat116 | 7 hours ago
This is a great, concrete example of where these agents fail in their correctness. I think "Haskell + Nix" would be near the top of the list of topics these things are not great at yet.
deevus | an hour ago
I'm not surprised that you got low-quality output given your initial prompt. Forgive me for finding that slightly amusing given the prompt engineering tool you referenced.
@stapelberg put it perfectly in his comment. All I would add is that in my case when I want Claude to one-shot something, I start with a brainstorming session.
The simplest way to do this is to use plan mode (shift+tab in Claude Code until it says "plan mode on" in the status bar).
Depending on the complexity of the task, I may use the brainstorming skill from the superpowers repository (https://github.com/obra/superpowers/). It uses the Socratic method to unpack what needs to be done, then creates a plan and writes it to a file. You can then get Claude to execute this plan to a T.
Maybe that's "too much effort" for a one-shot. I would compare that to how much back-and-forth you describe in your blog post. A bit of architecture, design, and planning can go a long way. Even with an agent.
st3fan | an hour ago
This is not an example of vibe coding.
JustinAzoff | 8 hours ago
I have some Nix experience and a little Claude experience, and I would have gone about this in a much different way. At this stage:
"The build succeeded on one machine but failed on another."
I would have asked Claude to help me run the build on both machines with more debugging information and compare the build logs to determine why it was failing, before diving right into fixing it.
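As a rough sketch of the "compare the build logs" step, something like the following could be used once the two logs have been saved to files. This is only an illustration using Python's standard `difflib`; the function names and log labels are made up, and it assumes store paths have the usual `/nix/store/<32-character hash>-<name>` shape. Masking the hashes is optional, because differing hashes can themselves be the clue you are looking for.

```python
import difflib
import re
from typing import List

# Assumption: store paths look like /nix/store/<32 chars>-<name>.
STORE_HASH = re.compile(r"/nix/store/[0-9a-z]{32}-")


def normalize(line: str, mask_hashes: bool) -> str:
    """Strip the trailing newline and optionally mask store path hashes."""
    line = line.rstrip("\n")
    if mask_hashes:
        line = STORE_HASH.sub("/nix/store/<hash>-", line)
    return line


def diff_build_logs(good: List[str], bad: List[str],
                    mask_hashes: bool = True) -> List[str]:
    """Return a unified diff between a passing and a failing build log."""
    return list(
        difflib.unified_diff(
            [normalize(line, mask_hashes) for line in good],
            [normalize(line, mask_hashes) for line in bad],
            fromfile="machine-a.log",  # hypothetical label for the passing build
            tofile="machine-b.log",    # hypothetical label for the failing build
            lineterm="",
        )
    )
```

Reading each log with `open(path).readlines()` and printing the returned lines gives a short list of where the two builds diverge, which is a much better starting point for Claude (or a human) than "this is broken, fix it".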
For many things you can indeed dump a compiler error or traceback into Claude and say "this is broken, fix it", but I'm not surprised that it would go wrong on a complicated build issue like this.
Separately, you should always start in plan mode. This might not have made a difference in this particular scenario, but you will get better results in general if you iterate on a detailed plan before making changes. If you find yourself interrupting Claude and changing course, it probably could have been avoided by spending more time planning.