As I read the author's intent, the issue is code he cannot understand or control. Even if the AI's solution works, he won't adopt it unless he can understand or control it. That should be taken as a given.
That said, when an AI provides a solution, the person using it should do their own research before making a decision. Doing so doesn't conflict with, or get in the way of, using the ideas the AI provided.
I will say, as someone who has fielded late-night troubleshooting calls, that I totally understand OP's point of view. It's reasonable to expect that you will be able to answer questions about something that you ship, or brainstorm ways to solve a problem a customer is encountering while using something you provided them.
The obvious counterargument is "well, just ask the AI for those answers," but the AI lacks the context and experience that you have. Sometimes, genuinely, the user really is just "holding it wrong," but none of the current AI models would ever admit that, and you'd spend hours trying to solve an unsolvable problem.
I think this policy is probably more prescriptive than what I would choose myself. I like to think about my risk tolerance first to help make that determination.
For example, I use a vibecoded internal tool written in Go. I don’t even know how to write Go. Haven’t read a single line of the code. I just wanted to move from bash scripts to using cloud SDKs for performance reasons.
But the internal tool is a convenience tool, and you can do everything it does using alternative methods. So if it breaks, there is no real negative impact beyond inconvenience to whoever is using it. There’s some documentation on how to do everything manually if needed.
Here’s another example: you’re making a static website. No JavaScript, no interactivity. Truly, what could go wrong? And while I do understand HTML a lot better than Go, it wouldn’t really matter if I didn’t.
> Here’s another example: you’re making a static website. No JavaScript, no interactivity.
Linking a huge file that consumes clients’ bandwidth for no reason? Embedding PII in the HTML source? And if you're setting up your own server, misconfiguring it?…
“Setting up your own server” isn’t part of this, as you’d almost certainly deploy a static site using something simple and serverless.
You also don’t need to know how to read HTML to recognize large files. You can catch issues like this with a simple website performance testing tool like pagespeed.web.dev.
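For what it's worth, a local check can be just as HTML-agnostic. This is a minimal sketch, assuming the built site sits in a local output directory; the `find_large_files` helper, the `public` directory name, and the 500 KB threshold are all hypothetical choices for illustration, not anything a specific static-site tool provides.

```python
from pathlib import Path

# Hypothetical threshold: anything over 500 KB is worth a second look.
MAX_BYTES = 500 * 1024


def find_large_files(root: str, max_bytes: int = MAX_BYTES) -> list[tuple[str, int]]:
    """Return (relative_path, size_in_bytes) for files under root larger than max_bytes, largest first."""
    root_path = Path(root)
    large = []
    for path in root_path.rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size > max_bytes:
                large.append((path.relative_to(root_path).as_posix(), size))
    return sorted(large, key=lambda item: item[1], reverse=True)


# Example: list oversized assets in an assumed build output directory named "public".
# for rel_path, size in find_large_files("public"):
#     print(f"{size / 1024:.0f} KB  {rel_path}")
```

It only looks at file sizes, so it won't catch PII or server misconfiguration; it just covers the "huge file" case without reading any markup.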
I’m also not sure how PII would enter the HTML source.
If I'm on call solving a problem another engineer caused and I reach out to them for clarification and they say 'I don't know, the AI wrote it' I am going to advocate for them being fired tomorrow.
You can't. You must prove it. And I don't mean that you need a rigorous scientific proof - that would obviously be too hard to do for every single function/library/program.
A human developer can work on a program incrementally, ensuring at each step that it is mostly correct.
But LLMs can't think; they fake reasoning and explore the problem space in a random walk until they stumble into something that looks like a solution. And these "solutions" will have hilarious and absolutely unexpected failure modes.
"The reality is that code that runs and makes the CI green can still be a bad solution, and engineering has always been about implementing adequate, scalable, and extensible solutions."
Disagree: adequate means adequate. Done and cheap is what you call it when a solution is adequate. If the solution isn't adequate, it doesn't matter if it's cheap, because it isn't done.
It really, REALLY depends on what you're working on. If you're throwing together an internal tool or simple dashboard, it doesn't really matter what the code looks like. But if you're writing software that other programs will depend on, bad design choices ripple out and affect another generation of software. Imagine slop in the Linux kernel, in Google Chrome, or in your compiler or runtime. It's not acceptable.
I know a lot of people spend their careers writing end user software and web UIs. AI is increasingly a good choice for this sort of code. But that's not all of us. And it's not all of the software being written.
I was just watching a video about systems engineering, and the following stuck with me:
Stakeholder needs: What people want to get done with the product
Management needs: How to manage the spending of resources (time, money,…) to create the product
Engineering needs: What is the product
You have to balance the three. Sometimes it’s simple and easy to get right. Sometimes it’s complex enough that you’re never truly sure until the product is out in the wild.
Software is malleable, and we can iterate easily in a way that isn't possible with hardware. But today we have a skew towards engineering, where the whole focus is on creating a solution, whatever that is. No understanding of the problem, no proper allocation of resources, just do something. Even if it means plastering over the crack for the eleventh time.
The stakeholders just want to send emails and Excel files around,
someone in management has a budget for a productivity enhancing tool to replace that and
the engineers have a half-baked solution that
some sales guys are saying is the second coming.
Even using Fable (while it was briefly available), having it refine a plan, and directing it to make only small incremental changes, I still found reasons to reject its first pass at a lot of work. There were a lot of “You’re right to push back” responses, and a lot of incidents where it would create some giant, complex set of abstractions to accomplish something that I could find ways to do much more elegantly and maintainably.
It’s really eye opening to work with these tools on a codebase you know deeply because these problems are everywhere.
However if I opened an unfamiliar project in another language and I wanted to add a little feature with no intention of maintaining it, I’d happily accept the changes and loop until it worked well enough for my temporary needs.
The scary middle is when you’re dealing with coworkers who don’t care about anything other than closing tickets and collecting credit. With enough of a token budget you can now wrap loops around an LLM and have it try things until the program appears to work. Ask it to do a code review and then submit the PR without having understood what it was doing. There are a lot of workplaces where there isn’t a good mechanism to push back on this and the tech debt just keeps growing.
> With enough of a token budget you can now wrap loops around an LLM and have it try things until the program appears to work. Ask it to do a code review and then submit the PR without having understood what it was doing. There are a lot of workplaces where there isn’t a good mechanism to push back on this and the tech debt just keeps growing.
I'm not making an argument in favor of people using LLMs for this, but people were doing this before we had LLMs; it was just usually a bit slower. I can't even say it usually doesn't work out long term, because I worked with a lot of guys who did this and took a ton of Adderall while working practically around the clock. Every incentive structure in the organizations rewarded it, along with social credibility from more junior engineers. (The last cowboy I worked with who pulled this shit ended up becoming the most senior engineer in the company, a multi-millionaire, and worshipped like a god by 90% of the mostly fresh grads we were hiring.)
The problem is that these people invariably burn out and leave, and they leave a massive vacuum in their wake. Not from the load they were carrying, but from the load they were creating.
In my experience, the larger the organization, the more it rewards the people making huge commits on nights and weekends. Worse, they could get away with TBRing their shit and merging it without review.
LLMs are often just a speedrun of all the bad habits and organizational problems we already carried. There are some places doing it right, but they already were.
> There are some places doing it right, but they already were.
Could you be more specific about what "right" is?
> I can't even say it usually doesn't work out long term because I worked with a lot of guys who did this and took a ton of Adderall while working practically around the clock. Every incentive structure in the organizations rewarded it along with social credibility from more junior engineers. (The last cowboy I worked with who pulled this shit ended up becoming the most senior engineer in the company, a multi-millionaire and worshipped like a god by 90% of the mostly fresh grads we were hiring).
I'm having a tough time believing this; it sounds like you're trying to rationalize after the fact that the more productive engineers were "on drugs" and that they delivered but "did it wrong".
> There are a lot of workplaces where there isn’t a good mechanism to push back on this and the tech debt just keeps growing.
If the "big ball of spaghetti" theory holds, where software companies that can't manage their debt trip over themselves as they keep adding to the big ball of spaghetti code, I guess we'll see a row of companies declaring "software bankruptcy" or something in some/many months, depending on how well these workplaces learn to care a little more and get better at pushing back against slop.
Coding agents have been better than the average "enterprise" programmer for a while now and nobody wants to admit it or talk about it. I have never seen an agent output an implementation called FooImpl that's tens of thousands of LOC in a single file, but I have seen plenty of human code like this.
People call coding agents bad because they don't know the asinine meaningless conventions at their particular company while they themselves write awful abstractions and brittle tightly coupled systems, but hey, at least they know how to write a for loop how their particular company likes.
> I have never seen an agent output an implementation called FooImpl that's tens of thousands of LOC in a single file, but I have seen plenty of human code like this.
And how long does it take a coding agent to output a thousand lines of code versus a human? The worst human at any company was rate limited by themselves. Those 'average enterprise' programmers aren't going away, they're the ones now spending tens of thousands on coding agents and filling your codebase with even more garbage without bothering to review an iota of it.
Which is why one of the big problems for the field right now is that a) most code bases still need someone more skilled than a mere robot driver, and b) many developers are not better than that.
In the past, a team of five mid devs and one good one would be fine, because that good one would ride herd on the mid ones. But now those mid ones are slamming out robot code that they're incapable of meaningfully reviewing (because it's better than they are already), and they're just overwhelming the good dev's capacity.
The solution, of course, is to fire them all -- they're worthless now -- but this is not going to happen quickly, and it's probably for the best that it doesn't.
> And how long does it take a coding agent to output a thousand lines of code versus a human?
Sometimes the human is faster.
I've seen someone duplicate a class file (already filled with duplicate methods) rather than subclassing, and when called out on it, their reason was that the properties were private.
This was a team with just me and him in it, it didn't even really benefit from things being private.
That said, the really important lesson I've learned over the years is that terrible code and practices are almost irrelevant: this app won awards and was highly regarded.
Does taking this example and extending it to the limit answer your question? There is a reason we don’t have a single file called program with a million lines of code in it. Search for studies on module size vs. code defect rates if you want more empirical numbers.
The limit you're replying to is files which are each tens of lines long. At that point, the cognitive overhead of switching documents is larger than the benefit of a compact object to reason about.
(Personally my threshold is around 2-5 thousand lines per file depending on what it is; but that's me working solo, obviously I'll follow whatever standards any team I'm in gives me).
> I have never seen an agent output an implementation called FooImpl that's tens of thousands of LOC in a single file, but I have seen plenty of human code like this.
I've seen countless vibecoded implementations that look exactly like that. Especially painful is agents adding the same utility functions in each and every file instead of properly reusing or splitting things.
What concerns me the most is that improvements in software design are at an end. The “big ball of mud”, which really is a problem of modularity and dependencies, will never improve through innovation because the way it is done now is all there will ever be.
> I guess we'll see a row of companies declaring "software bankruptcy" or something in some/many months
I don't think you will, because that would require the business to recognise the problem. That might happen in companies where the leadership team are engineers but it will never happen if they're not.
Instead you'll see:
- Churn in the dev team with senior developers leaving rather than try to deal with the mess
- Large-scale projects to refactor or rewrite entire codebases, which will inevitably fail, since you can't rewrite a big ball of spaghetti when you can't tell what it actually does (especially if it's in a language that allows side effects, or you've used a strategy like 'exceptions as flow of control').
- Companies just getting slower and slower to deliver anything. That's probably fine in many cases where they're big enough to still carry on without growing much, but anyone in the company will see their career die and pay rises dry up.
- Eventually, maybe, you'll see 'tech debt fixing' service companies start up to leverage AI in the effort to fix these problems. (AWS have a thing called 'Amazon Modernization Lab' that is exactly that, but only for companies running old tech on their services.)
The problem is that this is just another instance of trusting that "the market will solve all our problems."
But that's based on "spherical economy in a frictionless vacuum" type assumptions.
In the real world, in addition to the problems others have noted of it being hard to identify and fix the specific sources of problems, we have so much consolidation that it doesn't matter if something from any of the tech giants starts getting buggier and slower. What are you* going to do—switch from Windows to Linux, just because it's getting a bit buggy? Or worse, switch away from Banner, or Salesforce?
We cannot depend on "market forces" to prove whether LLM-assisted coding is actually a good idea. We have to push for universal personal accountability for the code we commit (at least internally; I'm not calling for legal liability here!). Which is, unquestionably, going to be a huge uphill slog.
* where "you" in this case is an average PC user, or a large institution
All Claude models are huge suck ups. The "you're absolutely right" meme is real even if that exact phrase doesn't show up as much anymore.
I don't want to start a fight or anything but IME Codex has a bit more of a spine. If you point out something weird, it sometimes gives a good reason for it. Whereas Claude will always say "whoopsie you're right as always sir" even when it's me who missed something.
I only use free AI chats to help with my learning, but I often instruct them to keep responses neutral and to refrain from encouraging language or value judgements. That tends to get rid of these 'you're absolutely right' comments when I point out a mistake.
But your comment just made me wonder whether this tendency of LLMs to resort to flattery when caught out is a built-in strategy to distract the user from the error-prone fragility of much of the output. It's perhaps a stretch to think these canned responses were put in strategically, but the result is that the user's attention may be deflected toward contemplating their own superior knowledge and insight, basking in the glory of all that, and then forgetting to appreciate that 'Hey, chatLLM is just making all this stuff up/doesn't know which way is up/or down!'
IME it's Claude that pushes back, and Codex that just does the thing. It's happened once or twice where I've told Claude bluntly and directly "do this" and it responded "no, here's why that's a bad idea..." Maybe it's just my CLAUDE.md.
Not sure if there are sycophancy benchmarks for coding agents
These "You're right to push back" scenarios are scary for me. I mostly code ML implementations, and some of the errors Claude Code (CC - have only used Opus 4.7) makes are very sneaky, and if you don't have sufficient experience in the area (I see this with people entering ML and writing their implementations with CC), you wouldn't know when to question CC and will let errors or future pitfalls silently slip into your code. A recent example was when there was data leakage in a model calibration step, which it refused to see as an error, till I wrote a detailed reason, and then it agreed that there was a "subtle leakage".
The leakage problem is so pervasive. None of the frontier models seem to have any idea how to actually hold out rows. God help you if you decide to change the data mix.
I was working on creating a next-n-actions predictor for one of our use cases and not paying much attention for a PoC. I was fairly happy with the progress for a few days, before actually reading the eval code and seeing that we leaked the final state in every eval.
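One common safeguard against the cross-split kind of leakage, sketched here under stated assumptions: each row carries a grouping ID (the `session_id` field below is hypothetical), and the 80/10/10 train/calibration/test fractions are arbitrary. Rows are assigned to a split by hashing the group ID, not by shuffling individual rows, so every row from a given session lands in exactly one split and calibration or eval never sees data the model was fit on.

```python
import hashlib
from collections import defaultdict


def assign_split(group_id: str, calib_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically map a group ID (e.g. a session) to "train", "calib", or "test"."""
    digest = hashlib.sha256(group_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + calib_frac:
        return "calib"
    return "train"


def split_rows(rows, group_key="session_id", **fracs):
    """Partition rows so that all rows sharing a group_key value end up in the same split."""
    splits = defaultdict(list)
    for row in rows:
        splits[assign_split(str(row[group_key]), **fracs)].append(row)
    return dict(splits)
```

This only prevents overlap between splits. It does nothing about inputs that encode the target, such as the final state showing up in eval inputs, which still needs its own check on what each eval example is allowed to see.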
It's nice to let claude run loose on porting from framework to framework (port my code from TRL to NemoRL to Tinker to VeRL) but looking at what it does in the intermediate steps makes me want to claw my eyes out. And getting it to adhere to our domain model (e.g. we have an SFTConfig and a .to_trl(), or a Row and a .to_harmony()) is impossible.
What's mind-blowing to me is that people see the "you're right to push back" as anything besides hallucination / self affirmation
Dude, the fucking model is great for sure, but there is nothing behind the illusion. It doesn't know if something is right or wrong - simpler or harder to reason about etc
It's just generating text, in a coherent manner while following rhetoric processes as a solid attempt at logical thinking
Why is that so hard for people to grok?
Our industry (and society after) is beyond doomed with people seeing these self affirmations as anything like "insightful" validation.
The coding aspect is a great example of why I am skeptical of the claim that they can't reason (in their own way).
Something that can write a correct code snippet or even larger program that accepts the correct input and provides the correct output and otherwise is consistent with the given spec is doing something substantially more than just autocomplete.
> It's just generating text, in a coherent manner while following rhetoric processes as a solid attempt at logical thinking
So yeah, I do agree that they can do a very reasonable amount of reasoning. As a matter of fact, they reason about things better than an average Joe off the street, IME.
That's entirely unrelated to what I said though, I think you misinterpreted/misunderstood what I wrote earlier.
They can make solid attempts at reasoning; it's just not grounded in reality. The model applies these rhetorical processes to the current text, but it doesn't understand whether it has actually reasoned correctly. Hence the answer "you're right to push back on this" is just the model being a sycophant. The sentence does not mean that anything of value has been communicated in either direction, and thinking that it has means the person in question is suffering from AI psychosis.
Another version of this issue is when you push back but you were NOT "right to push back". In other words, the LLM original solution was better than the pushback.
Most of the time my pushbacks are true improvements, but I've seen a couple of instances where the LLM was happy to downgrade its own good solution.
I've had those as well. Sometimes I'm asking clarifying questions because I'm not sure about the solution, and the LLM "interprets" that as pushback (as opposed to curiosity / enquiry), and sycophancy takes over. Sometimes it will simply change the code without ever answering the questions, or it will answer the questions along with it, but incorrectly - or with bad assumptions.
> Answer grounded in truth, with evidence and concrete proof, no guessing or assumptions allowed, no changes to files on disk.
I've used this a bunch as a suffix to try to prevent that. It works OK in most cases, though obviously not always, and it works better in the system/developer prompt if you have access to those. It seems I've used it roughly 1000 times since 2025/08, when I started using Codex (minus transcription duplicates, so maybe half of that?).
$ rg -a -o "Answer grounded in truth" ~/.codex/sessions | wc -l
1046
> Another version of this issue is when you push back but you were NOT "right to push back". In other words, the LLM original solution was better than the pushback.
Indeed, it's easy to surface this by sending one model another model's "review" of its proposal, bouncing them back and forth, and then asking which one is best: both models will almost always say something like "The other proposal/review is better". I'm guessing that's because they somehow assume it comes from the human, and "the human is always right" or something.
In fairness, you could throw the most senior engineer into a brand new codebase, and they would probably make a dozen mistakes if you immediately had them pick up invasive and risky work.
No, that's not "in fairness", that's misunderstanding the entire problem.
Having worked 20 years in this field and managed a few projects, no, I wouldn't make a dozen mistakes, because I would refuse to take on work I can't responsibly do.
Invasive and risky work IS the thing I want to be working on because it's the place where I can be most valuable, but part of my value comes from asking the right people the right questions. If I'm working on something invasive and risky, I'm going to work directly with the people who wrote it, and only when THEY think I understand it well enough am I venturing in alone.
Absent access to the people who wrote the code, I'm going to start by writing tests around the code and spend a lot of time checking my initial assumptions upon reading the code, because I know that I don't know what I don't know.
Yeah, if I foolishly just started making changes, I'd make mistakes, but that's missing the point: a good senior engineer knows not to do that.
That's the failure point of AI: it's arrogant. It will provide you statements without any idea if they're true and make changes without any idea if they're correct. It will never tell you "I don't know how to do that" or even "I am not sure if this is correct". It just does the work with infinite confidence even when that confidence is not justified and often it will be just as hard to figure out if the AI's work is correct as it would be to do the work yourself.
A nice trick I've found is following up with "make it simpler". Often you can do 2-3 rounds of that and end up with something much easier to comprehend but still meeting the requirements.
I have a Rails background, so maybe KISS is more ingrained in my philosophy than in whatever training material was used for the AI. At least it isn't heavily pushing design patterns...
Yeah, noticed the same thing too. Ruby/Rails background, though I have done distributed systems in Java (too many unnecessary abstractions in that ecosystem).
Then add the lessons of Clojure about simplicity, using simple data structures and functions, and agents simply become frustrating, because most of the things I need to get done can be done in a few lines.
The majority of my time is spent thinking in order to save a few lines.
I've been dealing with this in an area that requires insane attention: payments.
It's a strange feeling when you architect a system, with all the invariants, all the fundamentals, all the guardrails, then implement the scaffolding in self-documenting code so the LLM has no way to build other than correctly, but then you see what it tries to do and it's WTF.
It all seems to behave correctly, and then you run your test suite and your e2e tests start failing in weird ways, with a few (but not many) accounting discrepancies, while everything else passes. You spend a lot of time asking it to explain what's happening, you give it the data to browse, and it keeps giving you very plausible explanations: "found the issue, the data shows this clearly, therefore the bug is here, all I need to do is fix this thing". It does this, and it still fails.
When you open the hood, man: the code salad, the hundreds of unnecessary, complex, and duplicate abstractions, the stacked mistakes and lazy corrective attempts, the comment pollution that overrides your instructions across sessions.
You realize that there are things and concepts that it just cannot wrap its "mind" around, and you need to grab the wheel for a bit, make the corrections, remove all the comment litter, commit, and then hand the wheel back and tell it to "look at the last commit to see what I mean. explain to me what you did wrong and update all documentation, memory and context with this new understanding".
So if you have no experience in the field, you won't even know how to test or how to find that there is an issue, and the appearance of "working" plus the AI's confidence will trip you up in prod so hard.
In my experience, Claude tends to immensely overcomplicate things and go for a complex abstraction scheme even when all it needs is two lines of code. Combined with its eagerness to just start coding and, more importantly, its tendency to weight the last prompt most heavily, this leads it to produce an insanely complex solution first and then patch things with half-assed attempts. The result is code that looks okay at first glance but quickly breaks down and becomes unmanageable. It takes significant effort to push back against Claude’s tendencies, so I mostly find myself pushing back or looking for ways to write an initial prompt with enough guidance, but only Fable followed such guidance properly; Opus simply acts like a rhino in a china shop.
Not coworkers, but I started getting contributions on public GitHub repos that attempted to close issues tagged with the default "good first issue" label. I got really excited when one project I'm stoked about got its first contribution, until I looked at the PR. The account it was tied to belonged to someone looking for work. It looked like what a model would output for a LinkedIn job-seeker NPC; I'm sure you can imagine.
My personal rule of thumb: I am usually okay with agents driving e2e implementations if it won't make life noticeably worse when they don't work. Some analytical code? Perfectly fine. Hobby projects? Fine, though I prefer doing the fun part myself. Refactoring production code generating 10x more revenue than my salary? You'd better at least understand what it does.
If we rephrased this as "When I reject my coworker's code even if it works" and gave the same reasons, there would be zero dissent. There is this weird idea that seems to come up with AI that any working solution must be good and adequate. Software engineering is all about rejecting code that works in favor of the right code that works.
Yeah, but I think there's a difference here: If your coworker puts up code that you don't understand quickly, in most environments people give it an approval, as withholding approval is meant to indicate that there's a problem with the code. It's very rare that you'd actually force them to wait to merge until they've explained the code to your satisfaction.
(There are workplaces where that's the norm, I know -- it tends to be a thing with smaller teams with codebases that everyone understands fully, and much less a thing with larger teams where different people have areas of the code they understand more than others.)
With AI code, though, it's _your code_ and you can't just give it an LGTM; you actually need to dig at it until you fully understand it, fully agree with it, and could justify it to a hostile reviewer. It's a different level of rigor.
Not all engineers apply that rigor, though, which becomes a problem.
On the teams I've worked on that had a proper code review policy, not understanding the code has been explicitly stated as a reason to pause and ask for clarification. You cannot give a pass to code you don't understand.
That sounds like you're just creating an artificial distinction. If you let other engineers merge their code without looking at it too closely nothing makes the AI any different other than pretending it's "your code" even though it's a git commit full of code you didn't write. At that point you're dealing with functionally a junior engineer's code and if you don't have a culture of good code review etc, that's an issue and maybe you shouldn't be using full agentic coding? You're taking a team of people with no practice in doing code review and making that their entire job - that's not going to end well and just saying "it's _your code_" is basically asserting a fantasy and hoping it works out. If you didn't write the code, it's not your code!
> Before coding agents, when given a task, I would explore the codebase, think of different solutions, experiment, and only then implement. That could take days of consolidating all that context. When I finally submitted that PR, confidence was higher, and explaining each of my changes to my coworkers was easier.
Now we are getting to the point where we are speed-running the deskilling of engineers into comprehension debt, with engineers themselves rapidly losing confidence in reviewing code they did not write.
I think this blog post [0] is the best example of what could go entirely wrong and even worse when you do not know the technology.
If you cannot explain a change even when "the CI is green" or "all tests passing", I will immediately reject it.
Maybe great for vibe coding prototypes, but it all changes when that code is deployed onto mission critical systems. Just ask Amazon with Kiro. [1]
LLMs diverge, not converge. They slightly increase entropy if not controlled. You can have DRY skills and use AI to organize AI (in loops(tm) like Boris does), but eventually, if you don’t understand the code, you are taking yourself out of the loop. And it's not just your job security that’s on the line; it’s also the increasing cost of AI babysitting AI. If you or your “loops” (or paperclip, Hermes, gastown, or the next class of agents-of-agents that runs your entire company) let slop-debt gradually sneak in, the cost to fix it later will become prohibitive. (You can always just rewrite it, but as the race for “feature complete” and “zero backlog” continues, rewriting an ever-growing set of new daily table stakes will become an economic moat.)
TLDR: Keeping your codebase human readable and reason-about-able is not just helping humans to stay relevant. It will save costs for LLMs to maintain it.
> They slightly increase entropy if not controlled.
If you use AI at the very start of a project, replace "slightly" with "greatly". AI loves to write abstractions and indirection and add complexity wherever it can. And it does so really, really, really badly. AI is great at writing procedural code, but it's a world class shit-for-brains at architecture. It has no taste, no restraint, no appreciation of simplicity.
And it wouldn't be so bad, except it's ALSO a complete toaster when it comes to naming things.
Depends on what it’s writing. There are times an LLM saves me a lot of time researching library functionality, especially with testing frameworks. There are so many strange and arcane features beyond the basics, but they aren't hard to understand once you see the code. On that topic, I should say that I am careful when reviewing the actual test cases.
However, if you’re highly familiar with a domain, then LLMs are much less useful.
I mean, the reality is that a ton of folks in the industry, myself included, are writing glorified CRUD apps in their day jobs. We're building into an existing codebase with established infrastructure and ways of working. What we're building isn't inherently complex or very interesting.
Meanwhile, those codebases often require a ton of boilerplate and drudgery to get anything done.
In these spaces, it's very easy to read and comprehend AI-generated output and review it fairly quickly. So the time savings from dealing with all that boilerplate and conforming to all that existing infrastructure are potentially substantial.
Most code they write is obviously fine. Much of the rest isn't obviously fine, but is in fact fine once you've gone through understanding it. But yes, there's some that still benefits from a human eye.
(For as long as that's true, "software developer" is still a job. It's not clear for how long it will be true.)
If you reject AI code that works then your mindset is still too hands on. Put another way - you still have some loops to work on taking yourself out of. The agent should’ve delivered code that was acceptable as a first pass.
Agents respond really well to feedback! They have no ego, and they’ll happily improve code if told where and how. But you need to build tools that provide that feedback without your involvement; otherwise you can’t scale.
All the linting and autoformatting you can put in is a good start. Next, create custom scripts that check for every single dumb AI-ism you can think of, tell the agent about them, tell it to use them to check its work, and put them in hooks so the harness refuses to let the agent stop until all your linters show no errors.
Then, keep iterating basically forever. Any dumb AI-ism you see, make a linter for it, give it to the agent, and enforce it using the harness.
I’ve spent months doing this. When I review a PR - which was built by the agent with TDD so it definitely works - I’m no longer asking if it did dumb stuff or confirming it conformed to the architecture or duplicated code or missed opportunities for reuse. That’s all linted for. I don’t worry about duplication or outdated docstrings/comments because the self review caught all that. I mostly read it to look for opportunities to make the feature even better & more useful.
If this makes no sense or you disagree it’s possible, my contact details are on my profile and I’ll be happy to give a demo.
I am very curious what some of your lint rules look like in practice. In my mind a lot of the AI-isms in my code that I hate are stylistic or a matter of taste, not necessarily something I could write a deterministic rule to check. But I want to hear more. Like, what kind of linters did you create and which were highest impact?
The script can exit 2 to block the agent, and whatever it prints to stderr is shown to the agent. That’s a pretty darn flexible way to enforce whatever you like.
Despite this being in the codebase, I still have no idea what Python’s ast stuff is or does. I just let the agent rip, ensured it did TDD, and reviewed it all to make sure the tests & code looked reasonable. I didn’t write this code and don’t want to. But I’ve watched it catch hundreds of dumb AI-isms, and watched the agent go “okay” and fix them ;) It’s been paying for itself over and over for months :)
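A minimal Python sketch of the kind of hook-driven linter described above, under stated assumptions: the harness runs the script after the agent edits files and passes the changed file paths as command-line arguments (real harnesses may deliver this differently, e.g. as JSON on stdin), and, as the comment says, blocks the agent when the script exits with code 2 and shows it whatever was printed to stderr. The two rules (a bare `except:` and an `except` block that only does `pass`) are hypothetical examples of "dumb AI-isms", not the commenter's actual checks, and `find_ai_isms` and `main` are illustrative names.

```python
import ast
import sys


def find_ai_isms(source: str, filename: str = "<string>") -> list[str]:
    """Return one message per flagged pattern found in `source`."""
    problems: list[str] = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if isinstance(node, ast.ExceptHandler):
            # Hypothetical rule 1: a bare `except:` catches everything, even KeyboardInterrupt.
            if node.type is None:
                problems.append(f"{filename}:{node.lineno}: bare 'except:'; catch a specific exception")
            # Hypothetical rule 2: an except block whose only statement is `pass` swallows errors.
            if len(node.body) == 1 and isinstance(node.body[0], ast.Pass):
                problems.append(f"{filename}:{node.lineno}: exception silently swallowed with 'pass'")
    return problems


def main(paths: list[str]) -> int:
    problems: list[str] = []
    for path in paths:
        if not path.endswith(".py"):
            continue  # this sketch only lints Python files
        with open(path, encoding="utf-8") as f:
            problems.extend(find_ai_isms(f.read(), path))
    if problems:
        # Assumed harness contract: stderr is shown to the agent, exit code 2 blocks it.
        print("\n".join(problems), file=sys.stderr)
        return 2
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # With no paths there is nothing to lint, so only exit explicitly when paths are given.
    sys.exit(main(sys.argv[1:]))
```

New rules slot into the `ast.walk` loop. The script never edits code itself; it only reports and exits, so the agent reads the message and fixes the problem, which is the feedback loop the comment describes.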
Frankly, if that's truly your flow, then you cannot possibly know if the code really does what you expect it to do.
"TDD" isn't some magic trick. The tests codify the expected behavior. But if you don't review them for correctness, if you let the LLM build them blindly, then you have no idea what those tests assert and can make no claims about whether the code then does what you expect.
That's fine. That's your choice.
But you have to acknowledge you've chosen to accept that you personally cannot vouch for the quality or correctness of that code.
I fully expect this to be the direction the industry goes, where increasingly complex systems exist that no human actually understands or can reason about.
I think it's bad for the industry. Very bad.
But I'm not making those decisions, so... it is what it is, I guess.
I design everything with plan mode and review every line. Nothing happens to my codebase that I don’t decide should happen. With my way of working, tech debt doesn’t exist because I never have to create it.
You’ve made a bunch of assumptions you’re not conscious of. And now you’re blaming me for that.
Open your mind, you never know what you might (un)learn.
So then your response has nothing to do with the post.
The thesis of the post is (paraphrasing): "if an AI wrote it, and I don't immediately grok it or if the code quality is low, I throw it away, even if on the surface it seems to work, because simply 'working' isn't enough to say a piece of code is acceptable."
I'd add as a corollary "and therefore I would never want to be accountable for that code."
If you're reviewing every line then it sounds like you have no argument with the writer and I don't understand what your point is.
Your very first paragraph says:
> If you reject AI code that works then your mindset is still too hands on. Put another way - you still have some loops to work on taking yourself out of.
But if you do indeed "review every line" then you seem pretty damn in the loop yourself and I don't understand what you think taking oneself out of the loop is.
I’m in the loop after the work is done to a minimum standard.
The comment was motivated by the complaint about first-draft code from an agent; my point is that its quality can be brought up significantly with a little bit of engineering.
The problem I have with this kind of approach is 1) it emphasizes scaling up as much as possible, which I don't believe is necessarily the most valuable thing, and 2) I really don't want my job to be band-aiding agent problems, because it's like herding cats and there will never be an end to it. I'd rather just...get hands on and be involved in the code I am working to create.
Kinda fascinating watching a fairly reasonable response get downvoted. The AI psychosis really is catching...
Incidentally I also don't understand the drive to scale up. Show me a successful tech company and I'll show you a company that won, not by delivering code the fastest, but by delivering the right product with the right features at the right time.
Hell, Anthropic itself is the perfect example: they're doing well because unlike their competitors they realized the real revenues come from enterprise not consumer. They're winning by identifying the right market and giving them the right product.
I use three AIs (Claude, GPT, and Gemini) to review each other's design plans and implementations on the same code base. Each often catches problems the others miss.
I try to make sure the architecture docs of the code base are refreshed regularly based on recent changes, so it's easier for humans and AI agents to make sense of the code.
I also regularly stop all other development and just focus on auditing the code base with these AIs to make sure it is secure, robust, clean, well structured, and well tested. Some refactoring is needed most of the time, and it's well worth it.
With this approach, nowadays I often merge code from AI without completely understanding what it's doing, but the code seems to have been working so far. :)
I do sometimes have to steer the discussions between the AIs in the right direction if they deviate too far from the real problem, either because they are missing some context or because my original description of the problem was misleading.
To do that formally, I have a mechanism built into the review loop: if a comment on a GitHub issue or PR is signed "-- Human Reviewer", then all AI agents have to treat the comment as the highest-priority item to address.
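One way such a rule might be implemented, as a minimal Python sketch: it assumes the comments have already been fetched from the issue or PR into simple records, treats a comment as signed only when its last non-empty line is exactly "-- Human Reviewer", and moves signed comments to the front of the agents' work queue. The `Comment` type and the function names are hypothetical.

```python
from dataclasses import dataclass

HUMAN_SIGNATURE = "-- Human Reviewer"


@dataclass
class Comment:
    author: str
    body: str


def is_human_reviewer(comment: Comment) -> bool:
    # Signed only if the last non-empty line is exactly the signature.
    lines = [line.strip() for line in comment.body.strip().splitlines()]
    return bool(lines) and lines[-1] == HUMAN_SIGNATURE


def prioritize(comments: list[Comment]) -> list[Comment]:
    # False sorts before True, so signed comments come first; sorted() is stable,
    # so the original order is kept within each group.
    return sorted(comments, key=lambda c: not is_human_reviewer(c))
```

A text signature is a convention, not authentication: anyone with write access, including an agent, could type it, so a stricter variant might also check the comment's author against a list of human accounts.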
This is the way. I use gh copilot and have Opus interrogate me and write the plan, then have GPT review the plan and provide feedback; I repeat this multiple times until GPT is either satisfied or starts to nitpick unimportant stuff. Then I sanity-check the plan myself and have GPT implement it.
Each implementation is also reviewed by me before merging to master. I complete PRs only when I'm satisfied with the implementation, my feedback is addressed, and I fully understand what is going on. Agents are the replacement for typing and productivity multipliers.
I have a big-picture view of the product; each plan implements only a part of it, scoped to avoid merging unreviewed slop. It's probably slower, but the result is much better.
I'm always curious when I see these stories. How long have you been doing this, for what sort of work, and was the codebase mature before you began working like this?
Yeah, this one is easy: I have been doing this for half a year. I have a couple of projects worked out this way, all greenfield, and each code base grew from zero to tens of thousands of lines.
That is interesting. Half a year is not nothing and I expect it's harder to keep a project functioning when the base is vibe coded rather than having mature abstractions and architecture already.
I am still skeptical of this method's ability to deliver polished products, though. I've kept an eye out for it in the OSS world and don't think I've seen anything big yet.
I think a particular challenge for developers embracing AI is fighting the sunk cost fallacy. While you might not have spent as much time putting together a non-working solution, you still did spend time working with the agent to slap it together.
Being able to step back and say "this was a failure and we need to discard the day's work and start over" is still hard with LLMs.
Completely disagree. I think this is one of the big wins of agentic engineering. When you look back at your own completed change and realize that you made it too complicated because your initial abstraction was wrong, you have to debate long and hard about whether it's worth going back and redoing the work -- is the abstraction actually that bad? Would you really get a huge win by changing it, enough to justify spending another day on the task?
But with the agent, you know that the change will be relatively quick and easy, so the bar to tell it to shift approaches is much, much lower.
Coding with AI eventually comes down to two paths, I've realized. One is using AI exclusively for everything. The other is not using it at all. There is almost no middle ground. The reason is that as the complexity and depth of the problem increase, the code AI generates increasingly follows enterprise-level patterns. The deeper the meaning of what I input, the more AI tends to produce code that goes beyond my own area of expertise. For example, a human expert's code is very powerful and deep within their own domain, but when you look at the entire codebase, it's often shallow and uneven outside that domain. But the moment you write code with AI, once you go deep in one part, AI tries to standardize the rest accordingly. This means the entire codebase converges toward enterprise-level standard code, which essentially reflects the average patterns of senior programmers who built large-scale systems.
The problem is this. Human cognitive resources are finite, so we inevitably become shallow outside our own expertise. There is no programmer who can do everything well. And as systems grow in scale, they become more modularized and fragmented, making it impossible to understand the whole system. So what should we do about this? That's always the question.
In the end, do I choose not to use AI, finish the project with uneven code outside my domain, and deliver it? Or do I use AI and deliver a program that is uniform and consistent, but not in my own style? I still don't know. I haven't found the answer yet.
As you know, the boundary ultimately depends on code quality. The problem is that AI generates code that looks high quality even outside my area of expertise, at least from my perspective. So now the boundary has to be redrawn. Refactoring usually ends up redefining those boundaries. At that point, the question becomes: do I rewrite my own code, or do I reject the AI code? Those are the two choices left.
In the end, an exceptionally skilled programmer might be able to keep their core domain intact, but I think the vast majority would find that very difficult. So it might be possible once you cross a certain threshold, but considering the sheer amount of code required to deliver a single modern program, it's hard to know which parts to focus on. However, my perspective might be different because I'm coming from the point of view of delivering a working program, not from the perspective of open-source development.
"In the discrete world of computing, there is no meaningful metric in which "small" changes and "small" effects go hand in hand, and there never will be." - E.W.Dijkstra (EWD1036)
Then it's probably small enough that you don't need the help of AI and should do it yourself.
My position is that AI could be useful to find the potential places for these changes, but it should be someone who's capable of thinking to implement them.
That might be true for you. For me, I can stay in flow state so long as I'm progressing, even if progressing doesn't involve writing every line by hand. Whereas I easily fall out of flow state as soon as I am frustrated by an inability to remember some particular bit of syntax, or a particular bit of architecture, or a particular code pattern.
For me, AI has been a godsend for productivity because it's great at what I'm bad at. I'm not spending 99% of my day grinding away at C++ code; I never write enough of any one language to become a world-class expert in it. I'm jumping between SQL queries, CSS, Java, bezier curves, Python, and shell. If I need to write something in a language I touch infrequently (e.g. Go or Ruby), it's nice to have individual blocks of code generated for me, so that I'm not slowed down by my ignorance of a language's iterator syntax, or whatever.
I think the differentiator is whether someone cares about what they build or not. Someone who doesn't care wouldn't produce masterpieces without AI, and using AI isn't going to prevent someone who does care from building something nice.
But everyone claims they care; everyone using AI is telling you that they're not like the slop merchants, that they're really building masterpieces/the next unicorn. Just like everyone using AI to write says they're just using it as a fancier spellcheck.
It doesn't seem like AI users are very good at telling how much or how well they're using it.
I use it rarely. I did have it rewrite some code, mainly from one language to another. That works really well. I also had it rewrite a database interface, which also seems to work (no time to test it thoroughly, yet, so it's not in production). But I'll be damned if I let it write new features. I've debugged other people's code, and it ain't fun. Debugging 10kLOC AI code sounds like hell to me.
I feel like my current approach is a decent middle ground.
In the past, I wrote code by first writing English pseudo-code as a series of self-documenting comments. These would be declarative assertions of what the code will do. (For example, "Method returns true if array values are within 0.5% of spherical.") I then wrote the real code next to each comment.
My current workflow is mostly the same as before, but as soon as I think there's nothing creative left to do, I allow AI to take a pass at it, insisting it include verbose comments. Next I read everything; its comments are often redundant but allow me to internalise the logic/intent more quickly. I make any corrections myself. And I strip any pointless AI comments.
In short, I stay in full control of the architecture while tasking AI with the grunt work, the implementation details, and the superficial correctness.
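To illustrate the comment-first style described above, here is a small hypothetical Python example: the declarative comments are written first, and the code is then filled in under each one. The original "within 0.5% of spherical" check isn't specified, so this uses a simpler stand-in (every value within 0.5% of the mean), and `values_within_tolerance` is an invented name.

```python
def values_within_tolerance(values: list[float], tolerance: float = 0.005) -> bool:
    # Returns True if every value is within 0.5% (by default) of the mean of the values.
    # An empty list has nothing out of tolerance, so it passes.
    if not values:
        return True
    # The mean is the reference value every element is compared against.
    mean = sum(values) / len(values)
    # Each value's deviation from the mean must not exceed tolerance * |mean|.
    return all(abs(v - mean) <= tolerance * abs(mean) for v in values)
```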
The middle ground is to use it as a power tool: give me an example of this, fix my types, do this fussy bit, find this in the docs, without ever letting go of control.
When using power tools, you make all the measurements and decisions; you just hammer, screw, drill, and cut faster. You cannot power-tool your way to building things that you don’t know how to build.
The other interesting thing about this is it works with smaller models and uses a fraction of the compute.
I'm part of the middle ground. I'm not able to do full agent coding, but I'm fine using it to generate snippets that I fully read. I find it great for using APIs with little documentation. For me, AI is similar to a Google search when I'm not able to find meaningful docs, or when I want a code snippet to refine.
I feel the same way: reading the entire output of an AI-built feature overloads me cognitively as well, and I can only do so much of that throughout the day.
What I found myself doing is operating in two modes:
1. For projects that require my attention, I plan and instruct the LLM; when needed, I draft some code and ask the agent to make it better or finish the mundane parts (I write code and leave gaps with comments asking the agent to finish them).
2. Full auto mode, where I use spec-driven development and TDD. I only ask for changes based on an existing PRD, which the agent also has to update. Here I do not look at the code at all.
Yesterday I started working on an agent harness that tries to address some of the issues here.
What I'm hoping to build ultimately is something that works more like a pair-programming partner than existing harnesses do. I want the user to be an engaged part of the development process all the way through, I don't want the agent disappearing to work on its own. I even want to make it possible for users to swap into the driver role and have the LLM automatically assume the role of navigator when that happens.
There's more info in the readme (actually the readme is all that exists so far, I wanted to get the idea straight in my head first):
Even if nobody else uses it, I hope it will be a useful tool for myself and help me find a way to work with LLMs that doesn't harm my mental models, which is what I feel current harnesses do.
It's hard to find a middle ground between fully understanding everything in a PR and a vibe-coding approach. Can you understand "just a little bit" of a PR and merge it into a code base you really care about? On the other hand, is it maybe fine to "mostly understand it"? It's definitely a tough call, and it's impossible to argue that no trade-off is being made.
LLMs are perfect for quick prototypes, speed runs, learning, etc., but if the code really matters, it's still not clear-cut. I think the definition of what "really matters" is very project-dependent, of course. As an extreme example, you would want to understand every line of the control system code that runs an MRI machine or a jet engine, since bugs might mean life or death. Depositing money into the wrong account might not kill anyone, but it could lead to severe economic losses. Then again, even problems in far less consequential software may be drastically uneconomical (e.g. saving $1,000 on the implementation might cost $10,000 if customers aren't happy and fail to renew). Pick your scenario, I guess.
The problem is, this isn't going to change regardless of how well a new model scores on a benchmark. It seems that actual AGI is needed.
If it's code where you can tolerate some messiness and suboptimality, you can run agents end to end. If it's a critical piece of code that has become part of your identity, better do the PR work and scrutinize it well. LLMs are still next-token predictors, no matter how many harnesses, hooks, skills, and tools are attached to them. LLMs only know that these are callable; interpreting the state and mitigating problems are still best effort.
Titles like these always make me point out the obvious: a working state is the absolute minimum requirement for any code to be merged, isn't it? ...Imagine merging something even though you know it's not working.
Besides, nothing in this post is specific to code produced by an LLM, and placing AI in the stated reasons feels completely arbitrary, or is rather a fallacy of our times:
- I reject [AI] code when I can’t explain the approach in my own words.
- I reject [AI] code when the diff is bigger than the problem.
- I reject [AI] code when it introduces abstractions before proving they’re needed.
- I reject [AI] code when it works locally but makes the system harder to reason about.
- I reject [AI] code when I’m trusting the output more than my understanding.
This resonates a lot with me. I often use AI for the plan and let it propose multiple possible implementations, and I often have to point out the glaringly easier, more logical solution.
When implementing, it's often a lot of misses with a few golden hits. The other day it used flex for a table layout, while our app uses tables everywhere. Sigh.
Another typical one is that it tends to prefer aggregating and looping over data in the frontend instead of letting the database and backend deal with it.
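A minimal sketch of the difference, using Python's built-in `sqlite3` module and a hypothetical `orders` table (application code stands in for the frontend here): the first function pulls every row into the application and sums there, which is the pattern the comment complains about; the second lets the database aggregate with `GROUP BY`, so only one row per customer comes back.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    # Hypothetical schema and data, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("ann", 10.0), ("bob", 5.0), ("ann", 2.5)],
    )
    return conn


def totals_in_app(conn: sqlite3.Connection) -> dict[str, float]:
    # Fetches every row, then aggregates in application code.
    totals: dict[str, float] = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def totals_in_db(conn: sqlite3.Connection) -> dict[str, float]:
    # Lets the database aggregate; one (customer, total) row comes back per customer.
    rows = conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer")
    return dict(rows)
```

Both return the same totals on this toy table; the difference shows up as row counts grow, since the first version ships every row out of the database just to add them up.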
I find Cursor/Composer is really good at mimicking existing code when writing new code. And it will often do so without being asked, but I try to always explicitly mention an existing bit of code for it to read as inspiration (e.g. "use the TPS 2.0 report as a style guide").
I wish it were clearer in these kinds of posts how "I use AI code I don't understand" is so different from "I use libraries written by other people I don't understand", or "I work in a large codebase which was 99% written by other people, and I haven't seen all of it", or even "I use software written by other people I don't understand".
I understand the reasons, but I don't think so. I have over 20 years of experience in software development and still develop software daily. Nowadays my code is nearly 100% AI-written. It looks good and works. Sure, you have to guide the AI, but this can be done with custom skills, agent files, code quality guards, test cases, and so on. Maybe in the end the code doesn't look the way I would have written it; maybe something is implemented in an overly complex way. But that's true for large developer teams too. In the end, it's way faster and it works. I think everyone who does not adapt to this new workflow will soon be left behind in professional development.
Yes, but I can't review it all anymore. It's too much code. But I at least "scroll" over all the code and check whether I can spot something obvious. You just can't keep up anymore. I guess you have to trust it and react fast if something goes wrong. It has become more stressful.
As someone with 20+ years of experience as well, can we agree that if you're doing this for any code that really matters, it is fundamentally irresponsible and, in some circumstances, unethical?
Suppose you were legally liable for your code misbehaving in a way that led to harm. Would you behave differently?
And do you do this by choice? Or is this a case of an employer forcing you to vibecode while skipping your due diligence as the author of that code?
It depends. There's of course code that must be checked deeply, and all shades of grey in between. I guess it takes experience to know when to do what.
As for being forced or not... There are many kinds of pressure: features, deadlines... Of course, I've learned how to deal with them and when to speak up or not. My boss pays for my AI subscription. He wants to get things done as fast and as well as possible. That's his job. We have to make sure not to fall behind. Other companies bring out new features faster and faster. Sadly, that's how the world goes round. I personally would take it slower... but it seems there's only one way: as long as a few go as fast as possible, you have to keep up.
Do you? Is shipping features faster really going to make or break a business?
I know that's not your call but IME it's simply not true: rarely do products win by simply being faster than their competition at delivering more features to market.
But the AI age has led to a panic among leaders as FOMO has taken over the industry. I can only hope one day that fever breaks.
It depends on the market. When I did Point-of-Sale software and couponing stuff, it was not that important. Now I'm in a business where marketing needs the features to sell consumer products. At least, that's what they tell everyone. So we have to deliver.
Yeah that's just the FOMO I'm talking about. Frankly, if your product is driven primarily by marketing you're already screwed.
Anyway, we're in this sh.t together so stay strong, keep your head up, and try not to compromise your ethics. The industry is seriously f.cked right now and it's going to be a rough ride for a while...
The bottleneck when using a "faster keyboard" is understanding. We have a tool for this in compsci. Not having to fully understand something in order to successfully exploit it is a staple of computer science; we use abstractions to help us reason at a higher level. You don't necessarily always have to understand the nuance involved in selecting a hash function just to put and get some items in a hash map. Specifically, when are these cases where you don't need to go that deep? Are there similar scenarios for ai written code?
I'm more interested right now in what that abstraction looks like for AI-generated code. Is there some reasonable solution wherein a sandboxed component in the enterprise architecture has various attributes (e.g. the bytes I stuff into this file store component are always the exact bytes I get back from it) confirmed by methods other than a human reading its code? Are those methods cheaper, faster, or safer than just having a human do it?
If your enterprise architects have to read every line of code in your system today, then I'd claim your architecture practices have room to mature. What can be derived from that, and in which scenarios, for the purpose of safely leveraging immutable, write-only code? I'm not interested in evolving the code if it wasn't hand-crafted by a human (lines of code spent to solve a business problem were never an asset; they were always a cost). I still have the requirements, so I can just regenerate the entire thing with the revised requirements.
The more I look into it, the more I am convinced that I don't want any AI-generated code in my project. I find the LLM useful for talking things through. It can offer some interesting takes during design and acts as a decent rubber duck companion when debugging. It is also very helpful when I need some help with syntax and/or feature discovery.
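One non-reading method for the file store example above is a randomized round-trip check run against the component's interface. A minimal Python sketch, assuming a hypothetical store with `put(key, data)` and `get(key)` methods; `InMemoryFileStore` is a stand-in, not a real component. A check like this only samples inputs, so it raises confidence rather than proving the property; a property-based testing library such as Hypothesis could generate the inputs more systematically.

```python
import random


class InMemoryFileStore:
    """Hypothetical stand-in for the sandboxed file store component."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def check_round_trip(store, trials: int = 200, seed: int = 0) -> int:
    """Check that get(key) returns exactly the bytes given to put(key, ...).

    Raises AssertionError on the first mismatch; returns the number of trials run.
    """
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for i in range(trials):
        # Mix edge cases (empty and single-byte blobs) with larger random blobs.
        size = rng.choice([0, 1, rng.randrange(2, 4096)])
        data = bytes(rng.getrandbits(8) for _ in range(size))
        key = f"blob-{i}"
        store.put(key, data)
        got = store.get(key)
        if got != data:
            raise AssertionError(
                f"round trip failed for {key}: {len(data)} bytes in, {len(got)} bytes out"
            )
    return trials
```

The same check can be pointed at any implementation with that interface, including a regenerated one, which is what makes it usable as a contract that does not depend on anyone reading the implementation.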
> I reject AI code when I can’t explain the approach in my own words.
I think that's the key problem. LLMs turn code into big, black boxes. Sure, theoretically nothing stops me from reading all that code. I don't, however, because it's wasted effort. The time it takes me to really understand the code is IMO better spent just writing it myself. Once written, I have a very good understanding. Read ten times, not so much.
It reminds me of pen and paper. Journaling the old way remains the best way to learn something, but writing on a computer is much more convenient.
It's kind of like the saying "the right note played at the wrong time is wrong." An implementation that works today but could break in the future is wrong. I'm not willing to treat an LLM as an oracle that knows the difference. It certainly hasn't earned that trust.
That’s why I think code reviews should be human-first. That’s where we get the most value out of them. AI code reviews are still good, but they should be treated as linters in your CI pipeline.
_wire_ | 21 hours ago
How do you verify that it works?
p1024k | 20 hours ago
However, if AI provides a solution, as the person using AI, one should conduct research before making a decision. This is not in conflict with or hindered by the use of the ideas provided by AI.
andyfilms1 | 20 hours ago
The obvious counterargument is "well, just ask the AI for those answers," but the AI lacks the context and experience that you have. Sometimes, genuinely, the user really is just "holding it wrong," but none of the current AI models would ever admit that, and you'd spend hours trying to solve an unsolvable problem.
Grombobulous | 20 hours ago
For example, I use a vibecoded internal tool written in Go. I don’t even know how to write Go. Haven’t read a single line of the code. I just wanted to move from bash scripts to using cloud SDKs for performance reasons.
But the internal tool is a convenience tool, and you can do everything it does using alternative methods. So if it breaks, there is no real negative impact besides the loss of convenience for anyone using it. There’s some documentation on how to do everything manually if needed.
Here’s another example: you’re making a static website. No JavaScript, no interactivity. Truly, what could go wrong? And while I do understand HTML a lot better than Go, it wouldn’t really matter if I didn’t.
skydhash | 19 hours ago
Linking a huge file that consumes clients’ bandwidth for no reason? Embedding PII in the HTML source? And if you're setting up your own server, misconfiguring it?…
Grombobulous | 10 hours ago
You also don’t need to know how to read HTML to recognize large files. You can catch issues like this with a simple website performance testing tool like pagespeed.web.dev.
I’m also not sure how PII would enter the HTML source.
what | 18 hours ago
What is this supposed to mean? How is a “cloud SDK” more performant than a shell script?
Grombobulous | 10 hours ago
There’s a bit less waiting around.
fzeroracer | 19 hours ago
serious_angel | 20 hours ago
xigoi | 15 hours ago
archargelod | 13 hours ago
A human developer can work on a program incrementally, ensuring at each step that it is mostly correct.
But LLMs can't think, they fake reasoning and explore problem space in random walk until they stumble into something that looks like a solution. And these "solutions" will have hilarious and absolutely unexpected failure modes.
datadrivenangel | 20 hours ago
Adequate often means done and cheap.
solid_fuel | 20 hours ago
DrewADesign | 20 hours ago
josephg | 20 hours ago
It really, REALLY depends on what you're working on. If you're throwing together an internal tool or simple dashboard, it doesn't really matter what the code looks like. But if you're writing software that other programs will depend on, bad design choices ripple out and affect another generation of software. Imagine slop in the Linux kernel, in Google Chrome, or in your compiler or runtime. It's not acceptable.
I know a lot of people spend their careers writing end-user software and web UIs. AI is increasingly a good choice for this sort of code. But that's not all of us. And it's not all of the software being written.
skydhash | 19 hours ago
Stakeholder needs: What people want to get done with the product
Management needs: How to manage the spending of resources (time, money,…) to create the product
Engineering needs: What is the product
You have to balance the three. Sometimes it’s simple and easy to get right. Sometimes it’s complex enough, you’re never truly sure until the product is out in the wild.
Software is malleable, and we can easily iterate, which is not possible with hardware. But today we have a skew towards engineering, where the whole focus is on creating a solution, whatever that is. No understanding of the problem, no proper allocation of resources, just do something. Even if it is plastering over the crack for the eleventh time.
littlecosmic | 17 hours ago
Aurornis | 20 hours ago
It’s really eye opening to work with these tools on a codebase you know deeply because these problems are everywhere.
However if I opened an unfamiliar project in another language and I wanted to add a little feature with no intention of maintaining it, I’d happily accept the changes and loop until it worked well enough for my temporary needs.
The scary middle is when you’re dealing with coworkers who don’t care about anything other than closing tickets and collecting credit. With enough of a token budget you can now wrap loops around an LLM and have it try things until the program appears to work. Ask it to do a code review and then submit the PR without having understood what it was doing. There are a lot of workplaces where there isn’t a good mechanism to push back on this and the tech debt just keeps growing.
busterarm | 20 hours ago
I'm not making an argument in favor of people using LLMs for this, but people were doing this before we had LLMs; it was just usually a bit slower. I can't even say it usually doesn't work out long term, because I worked with a lot of guys who did this and took a ton of Adderall while working practically around the clock. Every incentive structure in the organizations rewarded it, along with social credibility from more junior engineers. (The last cowboy I worked with who pulled this shit ended up becoming the most senior engineer in the company, a multi-millionaire, and worshipped like a god by 90% of the mostly fresh grads we were hiring.)
The problem is that these people invariably burn out and leave, and they leave a massive vacuum in their stead: not from the load they were carrying, but from the load they were creating.
I think the larger the organization I've been at, the more they reward the people making huge commits on nights and weekends. Worse, they could get away with TBRing their shit and merging it without review.
LLMs are often all of the bad habits and organizational problems that we already carried, just being speedrun. There are some places doing it right, but they already were.
timacles | 19 hours ago
Could you be more specific about what "right" is?
> I can't even say it usually doesn't work out long term, because I worked with a lot of guys who did this and took a ton of Adderall while working practically around the clock. Every incentive structure in the organizations rewarded it, along with social credibility from more junior engineers. (The last cowboy I worked with who pulled this shit ended up becoming the most senior engineer in the company, a multi-millionaire, and worshipped like a god by 90% of the mostly fresh grads we were hiring.)
I'm having a tough time believing this; it sounds like you're trying to backwards-rationalize that more productive engineers were "on drugs" and that they delivered but "did it wrong".
embedding-shape | 20 hours ago
If the "big ball of spaghetti" theory holds, where software companies who can't manage the debt stumble over themselves as they continue to add to the big ball of spaghetti code, I guess we'll see a row of companies declaring "software bankruptcy" or something in some/many months, depending on how well these workplaces learn to care slightly more and get better at pushing back against slop.
codemog | 19 hours ago
People call coding agents bad because they don't know the asinine meaningless conventions at their particular company while they themselves write awful abstractions and brittle tightly coupled systems, but hey, at least they know how to write a for loop how their particular company likes.
fzeroracer | 19 hours ago
And how long does it take a coding agent to output a thousand lines of code versus a human? The worst human at any company was rate limited by themselves. Those 'average enterprise' programmers aren't going away, they're the ones now spending tens of thousands on coding agents and filling your codebase with even more garbage without bothering to review an iota of it.
mkozlows | 18 hours ago
In the past, a team of five mid devs and one good one would be fine, because that good one would ride herd on the mid ones. But now those mid ones are slamming out robot code that they're incapable of meaningfully reviewing (because it's better than they are already), and they're just overwhelming the good dev's capacity.
The solution, of course, is to fire them all -- they're worthless now -- but this is not going to happen quickly, and it's probably for the best that it doesn't.
ben_w | 14 hours ago
Sometimes the human is faster.
I've seen someone duplicate a class file (already filled with duplicate methods) rather than subclassing, and when called out on this, their reason was that the properties were private.
This was a team with just me and him in it, it didn't even really benefit from things being private.
That said, the really important lesson I've learned over the years is that terrible code and practices are almost irrelevant: this app won awards and was highly regarded.
what | 18 hours ago
Why is this worse than splitting it across 1k files?
codemog | 16 hours ago
ben_w | 13 hours ago
(Personally my threshold is around 2-5 thousand lines per file depending on what it is; but that's me working solo, obviously I'll follow whatever standards any team I'm in gives me).
jeppester | 16 hours ago
An average enterprise developer would never add bloat like that up front, unless the ability to change the order was a requirement.
Obviously a stable order can be easily derived from the ID or a creation time (if available).
Setting a position however requires extra steps to ensure the integrity of the sequence.
I see things like that all the time, and it's always stuff that grows the code base and adds unnecessary complexity.
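A minimal Python sketch of the difference, assuming a hypothetical `Item` record with an auto-increment `id` and an optional `position` field (the names are illustrative, not from the comment). The stable order falls straight out of the ID, whereas an explicit position has to be renumbered on every move to keep the sequence free of gaps and duplicates:

```python
from dataclasses import dataclass


@dataclass
class Item:
    id: int            # hypothetical auto-increment ID, assumed never reused
    name: str
    position: int = 0  # only needed if users can reorder items


def stable_order(items):
    """Stable order derived from the ID alone: no extra state to maintain."""
    return sorted(items, key=lambda it: it.id)


def move(items, item_id, new_index):
    """Move one item to new_index, then renumber every position so the
    sequence stays contiguous (0, 1, 2, ...) with no gaps or duplicates."""
    ordered = sorted(items, key=lambda it: it.position)
    moving = next(it for it in ordered if it.id == item_id)
    ordered.remove(moving)
    ordered.insert(new_index, moving)
    for pos, it in enumerate(ordered):
        it.position = pos  # every row is touched on every move
    return ordered
```

Every call to `move` rewrites every row's position, which is the extra integrity work the comment describes; `stable_order` has nothing to keep in sync.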
kuschku | 13 hours ago
I've seen countless vibecoded implementations that look exactly like that. Especially painful is agents adding the same utility functions in each and every file instead of properly reusing or splitting things.
And then I have to fix them.
aryehof | 17 hours ago
onion2k | 15 hours ago
I don't think you will, because that would require the business to recognise the problem. That might happen in companies where the leadership team are engineers but it will never happen if they're not.
Instead you'll see:
- Churn in the dev team with senior developers leaving rather than try to deal with the mess
- Large scale projects to refactor or rewrite entire codebases, which will inevitably fail because you can't rewrite a big ball of spaghetti because you can't tell what it actually does (especially if it's in a language that allows side effects, or you've used a strategy like 'exceptions as flow of control').
- Companies just getting slower and slower to deliver anything. That's probably fine in many cases where they're big enough to still carry on without growing much, but anyone in the company will see their career die and pay rises dry up.
- Eventually, maybe, you'll see 'tech debt fixing' service companies start up to leverage AI in the effort to fix these problems. (AWS have a thing called 'Amazon Modernization Lab' that is exactly that, but only for companies running old tech on their services.)
danaris | 12 hours ago
But that's based on "spherical economy in a frictionless vacuum" type assumptions.
In the real world, in addition to the problems others have noted of it being hard to identify and fix the specific sources of problems, we have so much consolidation that it doesn't matter if something from any of the tech giants starts getting buggier and slower. What are you* going to do—switch from Windows to Linux, just because it's getting a bit buggy? Or worse, switch away from Banner, or Salesforce?
We cannot depend on "market forces" to prove whether LLM-assisted coding is actually a good idea. We have to push for universal personal accountability for the code we commit (at least internally; I'm not calling for legal liability here!). Which is, unquestionably, going to be a huge uphill slog.
* where "you" in this case is an average PC user, or a large institution
resonious | 20 hours ago
I don't want to start a fight or anything but IME Codex has a bit more of a spine. If you point out something weird, it sometimes gives a good reason for it. Whereas Claude will always say "whoopsie you're right as always sir" even when it's me who missed something.
teaearlgraycold | 19 hours ago
herdymerzbow | 19 hours ago
But your comment just made me think: is this tendency for LLMs to resort to flattery when found out a built-in strategy to distract the user from the error-prone fragility of much of the output? It's perhaps a stretch to think these canned responses were put in strategically, but the result is that the user's attention may be deflected to contemplating their own superior knowledge and insight, and basking in the glory of all that, while forgetting to appreciate that 'Hey, chatLLM is just making all this stuff up/doesn't know which way is up/or down!'
pyridines | 18 hours ago
Not sure if there are sycophancy benchmarks for coding agents
mcintyre1994 | 14 hours ago
It measures whether models push back on bullshit prompts or just go along with it, and Claude models are all the top performers.
abhgh | 19 hours ago
nostrebored | 18 hours ago
I was working on creating a next-n-actions predictor for one of our use cases and not paying much attention for a PoC. I was fairly happy with the progress for a few days, before actually reading the eval code and seeing that we leaked the final state in every eval.
It's nice to let claude run loose on porting from framework to framework (port my code from TRL to NemoRL to Tinker to VeRL) but looking at what it does in the intermediate steps makes me want to claw my eyes out. And getting it to adhere to our domain model (e.g. we have an SFTConfig and a .to_trl(), or a Row and a .to_harmony()) is impossible.
ffsm8 | 12 hours ago
Dude, the fucking model is great for sure, but there is nothing behind the illusion. It doesn't know if something is right or wrong - simpler or harder to reason about etc
It's just generating text, in a coherent manner while following rhetoric processes as a solid attempt at logical thinking
Why is that so hard for people to grok?
Our industry (and society after) is beyond doomed with people seeing these self affirmations as anything like "insightful" validation.
endofreach | 11 hours ago
Obscurity4340 | 11 hours ago
MarsIronPI | 11 hours ago
ffsm8 | 7 hours ago
That fundamentally wouldn't happen if it wasn't just an illusion.
There is value in it for sure and I can use it to write a lot of simple code, which is 99.99% of enterprise software - but that's another topic.
Obscurity4340 | 4 hours ago
Something that can write a correct code snippet or even larger program that accepts the correct input and provides the correct output and otherwise is consistent with the given spec is doing something substantially more than just autocomplete.
ffsm8 | 4 hours ago
> It's just generating text, in a coherent manner while following rhetoric processes as a solid attempt at logical thinking
So yeah, I do agree that they can do a very reasonable amount of reasoning. As a matter of fact, they reason about things better than an average Joe off the street, IME.
That's entirely unrelated to what I said though, I think you misinterpreted/misunderstood what I wrote earlier.
They can make solid attempts at reasoning; it's just not grounded in reality. It just applies these rhetoric processes to the current text, but it doesn't understand whether it has actually reasoned correctly. Hence the answer "you're right to push back on this" is just the model being a sycophant. The sentence does not mean that anything of value has been communicated in either direction, and thinking that it has means the person in question is suffering from AI psychosis.
GroksBarnacles | 5 hours ago
MarsIronPI | 11 hours ago
glimshe | 9 hours ago
Most of the time my pushbacks are true improvements, but I've seen a couple of instances where the LLM was happy to downgrade its own good solution.
cassianoleal | 9 hours ago
embedding-shape | 8 hours ago
I've used this a bunch as a suffix to try to prevent that. It works OK in most cases, but obviously not always, and it works better in the system/developer prompt if you have access to those. Seems I've used it about 1,000 times since 2025/08 when I started using Codex (minus transcription duplications, so maybe half of that?).
embedding-shape | 9 hours ago
Indeed, it's easy to surface this by sending one model a "review" of its proposal written by another model, then bouncing them back and forth. Ask which one is best and both models will almost always say something like "The other proposal/review is better". I'm guessing it's because they somehow think it comes from the human, and "the human is always right" or something.
darkerside | 19 hours ago
kerkeslager | 17 hours ago
Having worked 20 years in this field and managed a few projects, no, I wouldn't make a dozen mistakes, because I would refuse to take on work I can't responsibly do.
Invasive and risky work IS the thing I want to be working on because it's the place where I can be most valuable, but part of my value comes from asking the right people the right questions. If I'm working on something invasive and risky, I'm going to work directly with the people who wrote it, and only when THEY think I understand it well enough am I venturing in alone.
Absent access to the people who wrote the code, I'm going to start by writing tests around the code and spend a lot of time checking my initial assumptions upon reading the code, because I know that I don't know what I don't know.
Yeah, if I did foolishly just start making changes, I'd make mistakes, but that's missing the point: a good senior engineer knows not to do that.
That's the failure point of AI: it's arrogant. It will provide you statements without any idea if they're true and make changes without any idea if they're correct. It will never tell you "I don't know how to do that" or even "I am not sure if this is correct". It just does the work with infinite confidence even when that confidence is not justified and often it will be just as hard to figure out if the AI's work is correct as it would be to do the work yourself.
alex_suzuki | 17 hours ago
I agree with your take, but AI is exactly as arrogant as the human driving it.
danaris | 12 hours ago
...ah, what a boon it would be to be working with code written by people still working at the organization!
(No shade, just being wistful; I happen to have a history of coming in and having to deal with some messy codebases from the guy who just retired...)
justinclift | 16 hours ago
It sounds like you've not conditioned your Claude to stop being a sycophant yet?
fy20 | 16 hours ago
I have a Rails background, so maybe KISS is more ingrained in my philosophy than whatever training material was used on AI. At least it isn't heavily pushing design patterns...
dzonga | 9 hours ago
then you add the simplicity / lessons of Clojure, using simple data structures & functions, and agents simply become frustrating, because most of the things I need to get done are done in a few lines.
the majority of my time is spent thinking about how to save a few lines.
dapperdrake | 16 hours ago
latexr | 13 hours ago
https://en.wikipedia.org/wiki/Michael_Crichton#%22Gell-Mann_...
figassis | 12 hours ago
It all seems to behave correctly and then you run your test suite, and your e2e tests start failing in weird ways, a few but not many accounting discrepancies, and everything else passes. You spend a lot of time asking it to explain what's happening, you give it the data to browse, and it keeps giving you very plausible explanations of "found the issue, the data shows this clearly, therefore the bug is here, all I need to do is fix this thing", and it does this, and it still fails.
When you open the hood, man: the code salad, the hundreds of unnecessary, complex, and duplicate abstractions, the stacked mistakes and lazy corrective attempts, the comment pollution that overrides your instructions across sessions.
You realize that there are things and concepts that it just cannot wrap its "mind" around and you need to grab the wheel for a bit, make the corrections, remove all the comment litter, commit and then hand the wheel back and tell it to "look at the last commit to see what I mean. explain to me what you did wrong and update all documentation, memory and context with this new understanding".
So if you have no experience in the field, you won't even know how to test or how to find that there is an issue, and the appearance of "working" and the AI's confidence will trip you up in prod so hard.
itopaloglu83 | 8 hours ago
figassis | 6 hours ago
matltc | 6 hours ago
Not coworkers, but I started getting contributions on public GitHub repos that attempted to close issues tagged with the default "good first issue" label. Got real excited when one project I'm stoked for got its first contribution, until I looked at the PR. The account it was tied to was someone looking for work. Looked like what a model would output for a LinkedIn job seeker NPC; I'm sure you can imagine.
summerlight | 20 hours ago
resonious | 19 hours ago
Good ol' software architecture tricks can also help you slot "vibe coded" components into a larger system safely.
ecshafer | 20 hours ago
api | 19 hours ago
If it’s not good it’s not good.
mkozlows | 18 hours ago
(There are workplaces where that's the norm, I know -- it tends to be a thing with smaller teams with codebases that everyone understands fully, and much less a thing with larger teams where different people have areas of the code they understand more than others.)
With AI code, though, it's _your code_ and you can't give it an LGTM, you actually need to dig at it until you do fully understand it, fully agree with it, and could justify it to a hostile reviewer. It's a different level of rigor.
Not all engineers apply that rigor, though, which becomes a problem.
Agentlien | 11 hours ago
coffeefirst | 10 hours ago
I’m not saying you must see into the soul of every line, but “no idea what I’m looking at, LGTM” misses the point of code review.
I have never been on a team where that’s okay.
vikramkr | 5 hours ago
rvz | 19 hours ago
Now we are getting to the point where we are speed-running the deskilling of engineers into comprehension debt, with engineers rapidly losing confidence in reviewing code they did not write.
I think this blog post [0] is the best example of what could go entirely wrong and even worse when you do not know the technology.
If you cannot explain a change even when "the CI is green" or "all tests passing", I will immediately reject it.
Maybe great for vibe coding prototypes, but it all changes when that code is deployed onto mission critical systems. Just ask Amazon with Kiro. [1]
[0] https://sketch.dev/blog/our-first-outage-from-llm-written-co...
[1] https://www.reuters.com/business/retail-consumer/amazons-clo...
eranation | 19 hours ago
TLDR: Keeping your codebase human readable and reason-about-able is not just helping humans to stay relevant. It will save costs for LLMs to maintain it.
simondotau | 10 hours ago
If you use AI at the very start of a project, replace slightly with greatly. AI loves to write abstractions and indirection and add complexity wherever it can. And it does so really, really, really badly. AI is great at writing procedural code, but it's a world class shit-for-brains at architecture. It has no taste, no restraint, no appreciation of simplicity.
And it wouldn't be so bad, except it's ALSO a complete toaster when it comes to naming things.
AmareshHebbar | 19 hours ago
krupan | 19 hours ago
teaearlgraycold | 19 hours ago
However if you’re highly familiar with a domain then LLMs are much less useful.
unknownfuture | 18 hours ago
Meanwhile, those codebases often require a ton of boilerplate and drudgery to get anything done.
In these spaces it's very easy to read and comprehend AI generated output and review it fairly quickly. So the time savings from dealing with all that boilerplate and conforming with all that existing infrastructure are potentially substantial.
mkozlows | 18 hours ago
(For as long as that's true, "software developer" is still a job. It's not clear for how long it will be true.)
cadamsdotcom | 19 hours ago
Agents respond really well to feedback! They have no ego and they’ll happily improve code if told where and how. But you need to provide the tools that provide that feedback without your involvement - otherwise you can’t scale.
All the linting and autoformatting you can put in, is a good start. Next, create custom scripts that check for every single dumb AI-ism you can think of, tell the agent about them, tell it to use them to check its work, and put them in hooks so the harness refuses to let the agent stop until all your linters show no errors.
Then, keep iterating basically forever. Any dumb AI-ism you see, make a linter for it, give it to the agent, and enforce it using the harness.
I’ve spent months doing this. When I review a PR - which was built by the agent with TDD so it definitely works - I’m no longer asking if it did dumb stuff or confirming it conformed to the architecture or duplicated code or missed opportunities for reuse. That’s all linted for. I don’t worry about duplication or outdated docstrings/comments because the self review caught all that. I mostly read it to look for opportunities to make the feature even better & more useful.
If this makes no sense or you disagree it’s possible, my contact details are on my profile and I’ll be happy to give a demo.
equinumerous | 18 hours ago
cadamsdotcom | 18 hours ago
Then have a look at https://github.com/cadamsdotcom/CodeLeash/blob/main/scripts/... (which was test-driven alongside https://github.com/cadamsdotcom/CodeLeash/blob/main/tests/un...)
The script can exit 2 to block the agent, and whatever it prints to stderr is shown to the agent. That’s a pretty darn flexible way to enforce whatever you like.
Despite this being in the codebase I still have no idea what Python’s ast stuff is or does - I just let the agent rip, ensured it did TDD and reviewed it all to make sure the tests & code looked reasonable. I didn’t write this code and don’t want to. But I’ve watched it catch hundreds of dumb AI-isms, and watched the agent go “okay” and fix them ;) it’s been paying for itself over and over for months :)
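For anyone curious what such a check can look like, here is a hypothetical sketch (not the CodeLeash script linked above) that uses Python's `ast` module to flag the "same utility function in every file" AI-ism mentioned earlier in the thread. It follows the convention described in the comment: exit code 2 blocks the agent and stderr is what the agent sees. Passing the files to check as plain paths is an assumption; how a given harness actually hands over file paths may differ.

```python
import ast
import sys
from collections import defaultdict
from pathlib import Path


def find_duplicate_functions(sources):
    """Return {function_name: [files]} for top-level functions defined in
    more than one file. `sources` maps a filename to its source text."""
    defined_in = defaultdict(list)
    for filename, text in sources.items():
        tree = ast.parse(text, filename=filename)
        for node in tree.body:  # top-level statements only
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defined_in[node.name].append(filename)
    return {name: files for name, files in defined_in.items() if len(files) > 1}


def main(paths):
    """Check the given files; return the exit code the hook should use."""
    sources = {path: Path(path).read_text() for path in paths}
    dupes = find_duplicate_functions(sources)
    for name, files in sorted(dupes.items()):
        # Whatever goes to stderr is what the agent gets to read.
        print(
            f"duplicate function {name!r} in {', '.join(files)}: "
            "move it to a shared module and import it",
            file=sys.stderr,
        )
    return 2 if dupes else 0  # 2 blocks the agent, 0 lets it stop
```

A hook entry point would call `sys.exit(main(sys.argv[1:]))`. Only top-level functions are compared, so same-named methods on different classes are not flagged.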
unknownfuture | 18 hours ago
"TDD" isn't some magic trick. The tests codify the expected behavior. But if you don't review them for correctness, if you let the LLM build them blindly, then you have no idea what those tests assert and can make no claims about whether the code then does what you expect.
That's fine. That's your choice.
But you have to acknowledge you've chosen to accept that you personally cannot vouch for the quality or correctness of that code.
I fully expect this to be the direction the industry goes, where increasingly complex systems exist that no human actually understands or can reason about.
I think it's bad for the industry. Very bad.
But I'm not making those decisions, so... it is what it is, I guess.
cadamsdotcom | 18 hours ago
I design everything with plan mode and review every line. Nothing happens to my codebase that I don’t decide should happen. With my way of working, tech debt doesn’t exist because I never have to create it.
You’ve made a bunch of assumptions you’re not conscious of. And now you’re blaming me for that.
Open your mind, you never know what you might (un)learn.
unknownfuture | 18 hours ago
The thesis of the post is (paraphrasing): "if an AI wrote it, and I don't immediately grok it or if the code quality is low, I throw it away, even if on the surface it seems to work, because simply 'working' isn't enough to say a piece of code is acceptable."
I'd add as a corollary "and therefore I would never want to be accountable for that code."
If you're reviewing every line then it sounds like you have no argument with the writer and I don't understand what your point is.
Your very first paragraph says:
> If you reject AI code that works then your mindset is still too hands on. Put another way - you still have some loops to work on taking yourself out of.
But if you do indeed "review every line" then you seem pretty damn in the loop yourself and I don't understand what you think taking oneself out of the loop is.
cadamsdotcom | 14 hours ago
The comment was motivated by the complaint that first-draft code from an agent can be brought up in quality significantly with a little bit of engineering.
royal__ | 18 hours ago
unknownfuture | 17 hours ago
Incidentally I also don't understand the drive to scale up. Show me a successful tech company and I'll show you a company that won, not by delivering code the fastest, but by delivering the right product with the right features at the right time.
Hell, Anthropic itself is the perfect example: they're doing well because unlike their competitors they realized the real revenues come from enterprise not consumer. They're winning by identifying the right market and giving them the right product.
wwind123 | 18 hours ago
I try to make sure the architecture docs of the code base are refreshed regularly based on recent changes, so it's easier for humans and AI agents to make sense of the code.
I also regularly stop all other developments and just focus on auditing the code base with these AI's to make sure they are secure, robust, clean, and well structured and well tested -- some refactoring would be needed most of the time, and it's well worth it.
With this approach, nowadays I often merge code from AI without completely understanding what it's doing, but seems the code has been working so far. :)
BobbyTables2 | 18 hours ago
wwind123 | 18 hours ago
I do sometimes have to steer the discussions between the AI's to the right direction, if they deviate too far away from the real problem, either because they miss some context, or because my original description of the problem was misleading.
To do that formally, I have a mechanism built into the review loop where, if a comment on a GitHub issue or PR is signed "-- Human Reviewer", all AI agents have to treat the comment as the highest-priority item to address.
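A minimal sketch of that priority rule, assuming comments arrive as simple author/body records and the signature is the last line of the body (both assumptions; the comment doesn't say how the agents actually read GitHub comments):

```python
from dataclasses import dataclass

HUMAN_SIGNATURE = "-- Human Reviewer"  # signature convention from the comment


@dataclass
class Comment:
    author: str
    body: str


def is_human_review(comment):
    """True if the comment body ends with the human-reviewer signature."""
    return comment.body.rstrip().endswith(HUMAN_SIGNATURE)


def triage(comments):
    """Order comments for an agent: human-signed ones first, then the rest.
    The key is False for human comments, and False sorts before True."""
    return sorted(comments, key=lambda c: not is_human_review(c))
```

Because `sorted` is stable, the remaining comments keep their original order behind the human-signed ones.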
jimbobimbo | 18 hours ago
Each implementation is also reviewed by me before merging to master. I complete PRs only when I'm satisfied with the implementation, my feedback is addressed, and I fully understand what is going on. Agents are the replacement for typing and productivity multipliers.
I have a big-picture view of the product; each plan implements only a part of it, scoped to avoid merging unreviewed slop. Probably slower, but the result is much better.
wwind123 | 17 hours ago
kajman | 18 hours ago
wwind123 | 17 hours ago
kajman | 2 hours ago
I am still skeptical of this method's ability to deliver polished products though. I've kept an eye out for it in the OSS world and don't think I've seen anything big yet.
julianlam | 18 hours ago
Being able to step back and say "this was a failure and we need to discard the day's work and start over" is still hard with LLMs.
mkozlows | 18 hours ago
But with the agent, you know that the change will be relatively quick and easy, so the bar to tell it to shift approaches is much, much lower.
jdw64 | 18 hours ago
The problem is this. Human cognitive resources are finite, so we inevitably become shallow outside our own expertise. There is no programmer who can do everything well. And as systems grow in scale, they become more modularized and fragmented, making it impossible to understand the whole system. So what should we do about this? That's always the question.
In the end, do I choose not to use AI, finish the project with uneven code outside my domain, and deliver it? Or do I use AI and deliver a program that is uniform and consistent, but not in my own style? I still don't know. I haven't found the answer yet.
mkozlows | 18 hours ago
jdw64 | 18 hours ago
In the end, an exceptionally skilled programmer might be able to keep their core domain intact, but I think the vast majority would find that very difficult. So it might be possible once you cross a certain threshold, but considering the sheer amount of code required to deliver a single modern program, it's hard to know which parts to focus on. However, my perspective might be different because I'm coming from the point of view of delivering a working program, not from the perspective of open source development.
archargelod | 16 hours ago
usef- | 15 hours ago
archargelod | 13 hours ago
My position is that AI could be useful to find the potential places for these changes, but it should be someone who's capable of thinking to implement them.
simondotau | 10 hours ago
For me, AI has been a godsend for productivity because it's great at what I'm bad at. I'm not spending 99% of my day grinding away at C++ code; I'm never writing enough for it to become a world class language expert. I'm jumping between SQL queries, CSS, Java, bezier curves, Python, and shell. If I need to write something in a language I touch infrequently (e.g. Go or Ruby) it's nice to have individual blocks of code generated for me, so that I'm not slowed down by my ignorance on a language's iterator syntax, or whatever.
lemagedurage | 18 hours ago
tedajax | 17 hours ago
fzeroracer | 17 hours ago
lemagedurage | 16 hours ago
gib444 | 16 hours ago
Pinky promise that's enough to get good output.
Pinky promise we won't invent yet another body of work the whole industry must adopt to get good output.
Pinky promise the AI tool will properly read all your work
And then of course we are told you must never trust its output !? You must review all code it produces line by line and grok it fully !
And now we have: keep challenging it, keep rejecting it, keep interrogating it... That's just fancy words for spend more money (tokens)
lemagedurage | 15 hours ago
Planktonne | 13 hours ago
lemagedurage | 12 hours ago
Planktonne | 11 hours ago
It doesn't seem like AI users are very good at telling how much or how well they're using it.
tgv | 12 hours ago
I use it rarely. I did have it rewrite some code, mainly from one language to another. That works really well. I also had it rewrite a database interface, which also seems to work (no time to test it thoroughly, yet, so it's not in production). But I'll be damned if I let it write new features. I've debugged other people's code, and it ain't fun. Debugging 10kLOC AI code sounds like hell to me.
simondotau | 11 hours ago
In the past, I wrote code by first writing English pseudo-code as a series of self-documenting comments. These would be declarative assertions of what the code will do. (For example, "Method returns true if array values are within 0.5% of spherical.") I then wrote the real code next to each comment.
My current workflow is mostly the same as before, but as soon as I think there's nothing creative left to do, I allow AI to take a pass at it, insisting it include verbose comments. Next I read everything; its comments are often redundant but allow me to internalise the logic/intent more quickly. I make any corrections myself. And I strip any pointless AI comments.
In short, I stay in full control of the architecture while tasking AI with the grunt work, the implementation details, and the superficial correctness.
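As an illustration of the comment-first style, here is the example comment from above turned into a Python function. What "within 0.5% of spherical" means is an assumption here: every point's distance from the centroid is taken to be within 0.5% of the mean distance. The function name and `tolerance` parameter are hypothetical.

```python
import math


def is_nearly_spherical(points, tolerance=0.005):
    # Method returns true if array values are within 0.5% of spherical.
    # (Assumed meaning: every point's distance from the centroid is within
    # `tolerance` of the mean distance.)

    # Compute the centroid of the 3D points.
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n

    # Measure each point's distance from the centroid.
    radii = [math.dist(p, (cx, cy, cz)) for p in points]

    # Use the mean distance as the reference radius.
    mean_r = sum(radii) / n
    if mean_r == 0:
        return False  # all points coincide: there is no sphere to compare to

    # Every radius must deviate from the mean by at most the tolerance.
    return all(abs(r - mean_r) <= tolerance * mean_r for r in radii)
```

Each English comment was written first as a declarative statement of intent; the code under it is the part that can be handed off and then checked against the comment.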
coffeefirst | 10 hours ago
When using power tools you make all the measurements and decisions; you just hammer, screw, drill, and cut faster. You cannot power-tool your way to building things that you don’t know how to build.
The other interesting thing about this is it works with smaller models and uses a fraction of the compute.
Snacklive | 10 hours ago
piterrro | 18 hours ago
What I found myself doing is operating in two modes: 1. For projects that require my attention, I plan and instruct the LLM; when needed, I draft some code and ask the agent to make it better or finish the mundane part (write code and leave gaps with comments asking the agent to finish). 2. Full auto mode, where I use spec-driven development and TDD: I only ask for changes based on an existing PRD, which the agent also has to update. Here I do not look at the code at all.
Seems to be working just fine.
panchtatvam | 17 hours ago
philbo | 17 hours ago
What I'm hoping to build ultimately is something that works more like a pair-programming partner than existing harnesses do. I want the user to be an engaged part of the development process all the way through, I don't want the agent disappearing to work on its own. I even want to make it possible for users to swap into the driver role and have the LLM automatically assume the role of navigator when that happens.
There's more info in the readme (actually the readme is all that exists so far, I wanted to get the idea straight in my head first):
https://gitlab.com/philbooth/opair
Even if nobody else uses it, I hope it will be a useful tool for myself and help me find a way to work with LLMs that doesn't harm my mental models, which is what I feel current harnesses do.
osigurdson | 16 hours ago
LLMs are perfect for quick prototypes, speed runs, learning, etc., but if the code really matters, it's still not clear-cut. I think the definition of what "really matters" is very project-dependent, of course. As an extreme example, you would want to understand every line of the code for the control system that runs an MRI machine or a jet engine, since bugs might mean life or death. Depositing money into the wrong account might not kill anyone but could lead to severe economic losses. But, then again, even problems in far less consequential software may be drastically uneconomic (i.e. saving $1000 on the implementation might cost $10000 if customers aren't happy and fail to renew). Pick your scenario, I guess.
The problem is, this isn't going to change regardless of how well a new model scores on a benchmark. It seems that actual AGI is needed.
moezd | 15 hours ago
whilenot-dev | 15 hours ago
Besides, nothing in this post is specific to code produced by an LLM, and inserting AI into the stated reasons feels completely arbitrary, or rather like a fallacy of our times:
- I reject [AI] code when I can’t explain the approach in my own words.
- I reject [AI] code when the diff is bigger than the problem.
- I reject [AI] code when it introduces abstractions before proving they’re needed.
- I reject [AI] code when it works locally but makes the system harder to reason about.
- I reject [AI] code when I’m trusting the output more than my understanding.
utopiah | 14 hours ago
simondotau | 10 hours ago
danfritz | 15 hours ago
When implementing, it's often a lot of misses with a few golden hits. The other day it used flex for a table layout while our app uses tables everywhere. Sigh.
Another typical one is that it tends to prefer frontend aggregation and looping over data instead of letting the database and backend deal with it.
Using mix of claude, cursor composer and codex.
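A minimal sketch of the difference, using Python's built-in `sqlite3` with an in-memory database; the `orders` table, its columns, and the sample rows are hypothetical. Both functions compute per-customer totals, but the first ships every row to the application and loops over it there, while the second lets the database do the aggregation with `GROUP BY`.

```python
import sqlite3


def make_demo_db():
    # Hypothetical schema and data, kept in memory for the example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)],
    )
    return conn


def totals_in_app(conn):
    # The pattern being complained about: fetch every row, sum client-side.
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


def totals_in_db(conn):
    # Let the database aggregate; only one row per customer comes back.
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    return dict(rows)
```

On three rows the two give the same answer; the difference shows up when the table is large, since the first version transfers and iterates over every row while the second returns one row per customer.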
simondotau | 10 hours ago
edanm | 15 hours ago
I wish it were clearer in these kinds of posts how "I use AI code I don't understand" is so different from "I use libraries written by other people I don't understand", or "I work in a large codebase which was 99% written by other people, and I haven't seen all of it", or even "I use software written by other people I don't understand".
SunboX | 14 hours ago
PacificSpecific | 14 hours ago
SunboX | 11 hours ago
The last ones I worked on in industry were retail7 apps, the Migros self-scanning client, and EDEKA, LIDL, and other customer-facing apps.
My private interest is more in electronics.
SunboX | 11 hours ago
tom2026hn | 11 hours ago
SunboX | 11 hours ago
unknownfuture | 9 hours ago
Suppose you were legally liable for your code misbehaving in a way that led to harm. Would you behave differently?
And do you do this by choice? Or is this a case of an employer forcing you to vibe-code while skipping your due diligence as the author of that code?
SunboX | 8 hours ago
unknownfuture | 8 hours ago
I know that's not your call, but IME it's simply not true: rarely do products win simply by being faster than their competition at delivering more features to market.
But the AI age has led to a panic among leaders as FOMO has taken over the industry. I can only hope one day that fever breaks.
I'm not optimistic.
SunboX | 8 hours ago
unknownfuture | 8 hours ago
Anyway, we're in this sh.t together so stay strong, keep your head up, and try not to compromise your ethics. The industry is seriously f.cked right now and it's going to be a rough ride for a while...
tarkin2 | 12 hours ago
And the industry is rushing towards it, whilst failing to train people who are able to fix it
CraigJPerry | 12 hours ago
I'm more interested right now in what that abstraction looks like for AI-generated code. Is there some reasonable solution wherein a sandboxed component in the enterprise architecture has various attributes (e.g. the bytes I stuff into this file store component are always the exact bytes I get back from it) confirmed by methods other than a human reading its code? Are those methods cheaper, faster, or safer than just having a human do it?
If your enterprise architects have to read every line of code in your system today, then I'd claim your architecture practices have room to mature. What can be derived from that, and in which scenarios, for the purposes of safely leveraging immutable, write-only code? I'm not interested in evolving the code (lines of code spent to solve a business problem were never an asset; they were always a cost) if it wasn't hand-crafted by a human. I still have the requirements, so I can just regenerate the entire thing with the revised requirement.
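A minimal sketch of one such method for the file-store example: a randomized round-trip property check that treats the component as a black box and only exercises its interface. `InMemoryFileStore` with `put`/`get` is a hypothetical stand-in for a generated component, and the hand-rolled loop stands in for what a property-based testing library would normally do.

```python
import random
import uuid


class InMemoryFileStore:
    """Hypothetical stand-in for an AI-generated file-store component."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # Store an immutable copy under a fresh random key.
        key = uuid.uuid4().hex
        self._blobs[key] = bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def check_round_trip(store, trials=200, max_size=1024, seed=0):
    """Return True if every random payload comes back byte-for-byte identical."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    for _ in range(trials):
        # Random length (including empty) and random contents (including NUL bytes).
        size = rng.randint(0, max_size)
        payload = bytes(rng.getrandbits(8) for _ in range(size))
        if store.get(store.put(payload)) != payload:
            return False
    return True
```

Passing the check is evidence rather than proof: it samples the input space instead of covering it, which is the trade-off against a human reading the code.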
neonstatic | 11 hours ago
> I reject AI code when I can’t explain the approach in my own words.
I think that's the key problem. LLMs turn code into big black boxes. Sure, theoretically nothing stops me from reading all that code. I don't, however, because it's wasted effort. The time it takes me to really understand the code is IMO better spent just writing it myself. Once I've written it, I have a very good understanding. Having read it ten times, not so much.
It reminds me of pen and paper. Journaling the old way remains the best way to learn something, but writing on a computer is much more convenient.
sltr | 11 hours ago
mdavid626 | 10 hours ago
[OP] vnbrs | 7 hours ago
mdavid626 | 3 hours ago