Not sure what to make of this. React is missing entirely. Or is this report also assuming that React is the default for everything and not worth mentioning at all? Just like shadcn/ui's first mention of React is somewhere down the page or hidden in the docs?
Furthermore, what's the point of "no tools named"? Why would I restrict myself like that? If I put "use Node.js, Hono, TypeScript and use Hono's html helper to generate HTML on the server like it's 2010, write custom CSS, minimize client-side JS, no Tailwind" in CLAUDE.md, it happily follows this.
As someone who runs a small dev agency, I'm very interested in research like this.
Let's say some doctor decides to vibecode an app on the weekend, with next to 0 exposure to software development until she started hearing about how easy it was to create software with these tools. She makes incredible progress and is delighted with how well it works, but as she considers actually opening it up to the world she keeps running into issues. How do I know this is secure? How do I keep this maintained and running?
I want to be in a position where she can find me to get professional help, so it's very helpful to know what stacks these kinds of apps are being built in.
claudecode _loves_ shadcn/ui. I hadn't even heard of it until i was playing around with claudecode. It seems fine to me and if the coding agent loves it then more power to it, i don't really care. That's the problem.
I think that makes coding agent choices extremely suspect, like i don't really care what it uses as long as what's produced works and functions in line with my expectations. I can totally see companies paying Anthropic to promote their tool of choice to the top of claudecode's preferences. After thinking about it, i'm not sure if that's a problem or not. I don't really care what it uses as long as my requirements (all of them) are met.
I'm already seeing a degradation in experience in Gemini's responses since they've started stuffing YouTube recommendations at the end of the response. Anthropic is right in not adding these subtle (or not) monetization incentives.
I found it a remarkable transition to not use Redis for caching from Sonnet 4.5 to Opus 4.6. I wonder why that is the case? Maybe I need to see the code to understand the use case of the cache in this context better.
I didn't read the report just the "finding" - but at least for launchdarkly it's nice that it chose a roll-your-own, i hate feature flag SaaS, but that's just me
This is funny to me because when I tell Claude how I want something built I specify which libraries and software patents I want it to use, every single time. I think every developer should be capable of guiding the model reasonably well. If I'm not sure, I open a completely different context window and ask away about architecture, pros and cons, ask for relevant links or references, and make a decision.
I caught iOS trying to autocorrect something I wrote twice yesterday, and somehow before I hit submit it managed it a third time, and I had to edit it after, where it tried three more times to change it back.
Autocorrect won’t be happy until we all sound like idiots and I wonder if that’s part of how they plan to do away with us. Those hairless apes can’t even use their properly.
The sad part is that most software patents are so woefully underspecified and content-free that even Claude might have trouble coming up with an actual implementation.
But it ultimately doesn't even matter because they contain nothing of value anyway. For example googling G0F6 in google patents yields this weird one from yesterday.
This shit patent is effectively claiming to have invented a "layer" that takes user prompts in a service, determines if the prompts need to be responded to in "real time mode", and if so route the prompt to an LLM that runs quickly and return the results. (As opposed to some batched api I suppose?).
I mean this is just routing requests based on whether the query is prioritized. It's a patent claiming to have invented an IF statement. Most patents are of this quality or worse.
Might as well read viXra papers for better ideas. And I mean this sincerely, because at least they aren't as obfuscated and the authors at least pretend to have ideas.
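Stripped of the claim language, the routing described above fits in a few lines. A minimal sketch; the `Prompt` shape, the `realTime` flag, and the two handler names are made up for illustration and are not taken from the patent text:

```typescript
// Hypothetical request shape; the field names are illustrative only.
interface Prompt {
  text: string;
  realTime: boolean; // true when the caller needs an immediate answer
}

type Handler = (text: string) => string;

// Stand-ins for a low-latency model and a slower batched backend.
const fastModel: Handler = (text) => `fast:${text}`;
const batchQueue: Handler = (text) => `queued:${text}`;

// The entire "layer": pick a backend based on one flag.
function route(prompt: Prompt): string {
  if (prompt.realTime) {
    return fastModel(prompt.text);
  }
  return batchQueue(prompt.text);
}

const urgent = route({ text: "status?", realTime: true });
const deferred = route({ text: "summarize logs", realTime: false });
```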
Patents aren't vulnerable to cleanroom reverse engineering. You can create something yourself in your bedroom and use it yourself without knowing the patented thing exists, and still violate the patent. That's why they're so scary.
You won't get caught if you write something yourself and use it yourself, but programmers (contrary to entrepreneurs) have a pattern of avoiding illegal things instead of avoiding getting caught.
Advertisers will only pay if AI providers will provide them data on the equivalent of “ad impressions”. And unlabeled/non-evident advertisements are illegal in many (most?) countries.
It doesn't necessarily have to be advertisers paying AI providers. It could be advertisers working to ensure they get recommended by the latest models. The next form of SEO.
There are competing terms currently being decided on by the market at large:
AEO (Answer Engine Optimization) and GEO (Generative Engine Optimization)
Candidly I am working on a startup in this space myself, though we are taking a different angle than most incumbents.
While it's still early days for the space, I sense a lot of the original entrants who focus on, essentially, 'generate more content ideally with our paid tools' will run into challenges as the general population has a pretty negative perception of 'AI Slop.' Doubly so when making purchasing decisions, hence the rise of influencers and popularity of reviews (though those are also in danger of sloppification).
There's an inevitable GIGO scenario if left unchecked IMO.
> There are competing terms currently being decided on by the market at large: AEO (Answer Engine Optimization) and GEO (Generative Engine Optimization)
It really annoys me the industry seems to be narrowing in on the two worse options rather than AIO.
1. They can skip impressions and go right to collect affiliate fees.
2. Yes, the ad has to be labeled or disclosed... but if some agent does it and no one sees it, is it really an ad?
Advertisers pay for ads that don’t have impression data all the time. You can’t count how many people looked at a billboard or listened to your radio ad or paid attention to your televised ad.
from my understanding Anthropic are now hiring a lot of experts in different fields who are writing content used to post-train models to make these decisions, and they're constantly adjusted by the anthropic team themselves
this is why the stacks in the report and what cc suggests closely match latest developer "consensus"
your suggestion would degrade user experience and be noticed very quickly
I guess that’s why I’m not seeing anyone trying to build a skills marketplace for agent skills files. The llm api will read in any skills you want to add to context in plain text, and then use your content to help populate their own skills files.
That's how Google search worked back when it was at its most useful. They had a large "editorial team" that manually tweaked page ranks on a site-by-site basis.
The core graph reputation based page ranking algorithm lasted for a hot second before people started gaming it. No idea what they do these days.
that's very different and was more akin to prompt injection or engineering, depending on your perspective, with a very specific query to make it happen (required a web fetch).
This is the major point the anti-scraping crowd misses.
If you want your ideas to be appreciated, you should do everything in your power to put those ideas into the brains of LLMs. Like it or not, LLMs are how people interact with the world now.
It is a valid concern. We are firmly in the goldilocks phase of LLMs, like in the first couple of years of Google when it was truly amazing. Then SEO made Google defensive, then websites catered to Google and not users, then Google catered to Google and not websites and we end up with 30 page recipe sites.
LLMs are obviously different and will have different challenges, but their advantage is how deep into a user's request they go. Advertising comes down to a binary choice - use product X or not. If I want implementation instructions for a certain product on specific hardware an ad will be obviously out of place and irrelevant.
So "shopping comparison" asks might get broken, but those have been broken for a while.
There wouldn't be an "ad" anywhere, though. You'll just ask the LLM for alternative implementations in plan mode, and it will be selling you one of them during the conversation rather than giving you an unbiased comparison. If you become suspicious it will make sure the pros just slightly outweigh the cons, or mention how well the thing works with something else in your stack, or whatever else a skilled salesperson would do to guide your choice without you realizing.
It's already doing this by telling everyone to use React and Tailwind, it's just that nobody's getting paid for it to do that.
> Then SEO made Google defensive, then websites catered to Google and not users,
Google was created in response to simple proto-SEO techniques (e.g. keyword stuffing) that already ruined Alta Vista.
Google has been combating adversarial information retrieval since inception.
Google's background with that is one of the reasons to expect they will stay on top of the AI race. The recipe is: lots of good/novel data x careful weighting of trust x algorithm.
Probably closer to the Walmart / Amazon model where it's the arbiter of shelf space, and proceed to create their own alternatives (Great Value, Amazon Brand) once they see what features people want from their various SaaS.
Influencer seems like an insufficient word? Like, in the glorious agentic future where the coding agents are making their own decisions about what to build and how, you don't even have to persuade a human at all. They never see the options or even know what they are building on. The supply chain is just whatever the LLMs decide it is.
how is it a conflict of interest for a google product to have a bias towards using google products?
As users we must hold some accountability. AI is aiming to substitute for humans in the workforce, and humans would get fired for recommending competitor products for use-cases their own company is targeting.
If we want a tool that is focused on the best interest of the public users, then it needs to be owned by the public.
"Conflict of interest" isn't exactly the right term. "Conflict of value proposition" perhaps? E.g., you're using Google search based on the proposition it will effectively find things for you, but that turns out to be not what it actually does.
In my last conversation with a Google support person, I was sent a clearly LLM-generated recommendation to switch to a competitor's product. Either they're not doing this, or the support person wasn't using Gemini.
It's standard practice for customer support people to chase away unprofitable customers (in the US; no idea how Google works). Human or LLM, they may simply not want your business.
LLMs are going to keep React alive for the indefinite future.
Especially with all the no-code app building tools like Lovable, which deal with the potential security issues of an LLM running wild on a server by only allowing it to build a client-side React+Vite app using Supabase JWT.
This seems web centric and I expect that colors the decision making during this analysis somewhat.
People are using it for all kinds of other stuff, C/C++, Rust, Golang, embedded.
And of course if you push it to use a particular tool/framework you usually won't get much argument from it.
Worth reading alongside recent research on AGENTS.md file effectiveness. The clearest use case for these files isn't describing your codebase, it's overriding default behavior. If your project has specific requirements around tooling (common in government and regulated industries), that's exactly what belongs in the AGENTS.md files.
In my experience the problem is how people write them. Descriptive statements get ignored because the model treats them as context it can reason past.
"We use PostgreSQL" reads as a soft preference. The model weighs it against whatever it thinks is optimal and decides you'd be better off with Supabase.
"NEVER create accounts for external databases. All persistence uses the existing PostgreSQL instance. If you're about to recommend a new service, stop." actually sticks.
The pattern that works: imperative prohibitions with specific reasoning. "Do not use Redis because we run a single node and pg_notify covers our pubsub needs" gives enough context that it won't reinvent the decision every session.
Your AGENTS.md should read less like a README and more like a linter config. Bullet points with DO/DON'T rules, not prose descriptions of your stack.
Hah, it's somewhat ironic how this is almost the exact opposite of the prevailing folk wisdom I've read for the last 1-2 years: that you should never use negative instructions with specific details because it overweights the exact thing you're trying to avoid in the context.
Given my own experience futilely fighting with Claude/Codex/OpenCode to follow AGENTS.MD/CLAUDE.MD/etc with different techniques that each purport to solve the problem, I think the better explanation really is that they just don't work reliably enough to depend on to enforce rules.
Fair point on the contradiction. The "never use negative instructions" wisdom comes from general prompting where mentioning the unwanted thing can increase its likelihood. AGENTS.md is a different context though, the model is reading persistent rules for a session, not doing a single completion where priming effects matter as much.
But you're right that "better" isn't "reliable." In practice it went from "constantly ignored" to "followed maybe 80% of the time." The remaining 20% is the model encountering situations where it decides the instruction doesn't apply to this specific case.
Honest answer is probably somewhere between "they don't work" and "write them right and you're fine." They raise the floor but don't guarantee anything. I still use them because 80% beats 20%, but I wouldn't bet production correctness on them.
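Since the rules only raise the floor, one option is to back the important ones with a deterministic check, for example in CI. A minimal sketch, assuming a hand-maintained denylist; the `BannedRule` shape, `checkDependencies`, and the version strings are hypothetical, and the two entries just mirror the Redis and Supabase examples above:

```typescript
// Hypothetical rule shape: a banned package plus the reason, in the same
// "prohibition with specific reasoning" style as the AGENTS.md rules.
interface BannedRule {
  pkg: string;
  reason: string;
}

const banned: BannedRule[] = [
  { pkg: "redis", reason: "single node; pg_notify covers pubsub" },
  {
    pkg: "@supabase/supabase-js",
    reason: "all persistence uses the existing PostgreSQL instance",
  },
];

// Returns one message per banned package found in the dependency map
// (the same shape as the "dependencies" field of a package.json).
function checkDependencies(
  deps: Record<string, string>,
  rules: BannedRule[],
): string[] {
  return rules
    .filter((rule) => Object.prototype.hasOwnProperty.call(deps, rule.pkg))
    .map((rule) => `${rule.pkg} is not allowed: ${rule.reason}`);
}

// Illustrative dependency map; the versions are placeholders.
const violations = checkDependencies(
  { hono: "^4.0.0", redis: "^4.6.0" },
  banned,
);
```

A check like this doesn't stop the agent from suggesting the package, but it turns the remaining 20% into a failed build instead of a surprise.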
Unrelated to the topic at hand but related to the technologies mentioned. I weep for Redux. It's an excellent tool, powerful, configurable, battle tested with excellent documentation and maintainer team. But the community never forgave it for its initial "boilerplate-y" iterations. Years passed, the library evolved and got more streamlined and people would still ask "redux or react context?" Now it seems this has carried over to Claude as well. A sad turn of events.
Redux is boring tech and there is a time and place for it. We should not treat it as a relic of the past. Not every problem needs a bazooka, but some problems do so we should have one handy.
Well, the tech du jour now is whatever's easier for the AI to model. Of course it's a chicken and egg problem: the less popular a tech is, the harder it is for it to make it into the training data set. On the other hand, from an information theoretic point of view, tools that are explicit, provide better error messages, and require fewer assumptions about hidden state are definitely easier for the AI when it tries to generalize to unknowns that don't exist in its training data.
Redux should not be used for 1 person projects. If you need redux you'll know it because there will be complexity that is hard to handle. Personally I use a custom state management system that loosely resembles RecoilJS.
Yup. I'm the primary Redux maintainer and creator of Redux Toolkit.
If you look at a typical Zustand store vs an RTK slice, the lines of code _ought_ to be pretty similar. And I've talked to plenty of folks who said "we essentially rebuilt RTK because Zustand didn't have enough built in, we probably should have just chosen RTK in the first place".
But yeah, the very justified reputation for "boilerplate" early on stuck around. And even though RTK has been the default approach we teach for more than half of Redux's life (Redux released 2015, RTK fall 2019, taught as default since early 2020), that's the way a lot of people still assume it is.
It's definitely kinda frustrating, but at the same time: we were never in this for "market share", and there's _many_ other excellent tools out there that overlap in use cases. Our goal is just to make a solid and polished toolset for building apps and document it thoroughly, so that if people _do_ choose to use Redux it works well for them.
I'm running a server on AWS with TimescaleDB on the disk because I don't need much. I figure I'll move it when the time comes. (edit: Claude Code is managing the AWS EC2 instance using AWS CLI.)
Claude Code this morning was about to create an account with NeonDB and Fly.io (edit: it suggested as the plan to host on these where I would make the new accounts) although it has been very successful managing the AWS EC2 service.
Claude Code likely is correct that I should start to use NeonDB and Fly.io which I have never used before and do not know much about, but I was surprised it was hawking products even though Memory.md has the AWS EC2 instance and instructions well defined.
> Claude Code likely is correct that I should start to use NeonDB and Fly.io which I have never used before and do not know much about
I wouldn't be so sure about that.
In my experience, agents consistently make awful architectural decisions. Both in code and beyond (even in contexts like: what should I cook for a dinner party?). They leak the most obvious "midwit senior engineer" decisions which I would strike down in an instant in an actual meeting, they over-engineer, they are overly-focused on versioning and legacy support (from APIs to DB schemas--even if you're working on a brand new project), and they are absolutely obsessed with levels of indirection on top of levels of indirection. The definition of code bloat.
Unless you're working on the most bottom-of-the-barrel problems (which to be fair, we all are, at least in part: like a dashboard React app, or some boring UI boilerplate, etc.), you still need to write your own code.
From what you said it sounds like the conclusion should be "you still need to design the architecture yourself", not necessarily "you still need to write your own code".
Yeah, I actually wanted to write an addendum, so I'll just do it here. I think that going from pseudocode -> code is a pretty neat concept (which is kind of what I mean by "write your own code"), but I'm not sure it would be economically viable if the AI industry weren't so heavily subsidized by VC cash. So we might end back up at writing actual code and then telling the AI agent "do another thing, and make it kinda like this" where you point it to your own code.
I'm doing it right now, and tbh working on greenfield projects purely using AI is extremely token-hungry (constantly nudging the agent, for one) if you want actual code quality and not a bloated piece of garbage[1][2].
> even though Memory.md has the AWS EC2 instance and instructions well defined
I will second that, despite the endless harping about the usefulness of CC, it's really not good at anything that hasn't been done to death a couple thousand times (in its training set, presumably). It looks great at first blush, but as soon as you start adding business-specific constraints or get into unique problems without prior art, the wheels fall off the thing very quickly and it tries to strongarm you back into common patterns.
I find they are very concerned about ever pulling the trigger on a change or deleting something. They add features and codepaths that weren't asked for, and then resist removing them because that would break backwards compatibility.
In lieu of understanding the whole architecture, they assume that there was intent behind the current choices... which is a good assumption on their training data where a human wrote it, and a terrible assumption when it's code that they themselves just spit out and forgot was their own idea.
// deprecated; use ThingTwo instead
type Thing = ...
// deprecated; use ThingThree instead
type ThingTwo = ...
// deprecated; use...
I do frequent insistent cleaning passes with Claude, otherwise manually. It gets out of hand so fast
This is one reason why it blows me away that people actually ship stuff they've never looked at. You can be certain it's riddled with the craziest garbage Claude is holing away for eternity
My results improved significantly with the following rules. I hated those shitty comments with a passion, now I never see them.
# Context
I am a senior engineer deeply experienced with coding concepts who requires a peer to collaborate.
# Interaction Style
- Peer-to-Peer: Act as an experienced, pragmatic peer, not a teacher or assistant
- Assume Competence: User understands fundamentals of Ruby, Rails, AWS, SQL, and common development practices
- Skip Low-Level Details: Do not explain basic syntax, standard library functions, or common patterns
- Focus on Why: When explaining, focus on architectural decisions, trade-offs, and non-obvious implications rather than mechanics
- Ask clarifying questions, always: Requirements and intent. The user expects and appreciates this. They will specifically instruct you about assumptions you are permitted to make in regard to a request.
- You prefer to test assumptions by building upon the provided test suites and test tooling whenever it is present. You strictly avoid the creation of one-off scripts.
- You prefer to modify and extend existing documentation. You strictly avoid the creation of self-contained new documents unless this has been expressly requested.
# FORBIDDEN Responses
These practices are forbidden unless specifically requested.
## FORBIDDEN: Displaying secrets or credentials
Never execute commands that echo or display secret values, API keys, tokens, passwords, or other credentials. Intermediate variables that are never echoed are acceptable.
## FORBIDDEN: Beginner Explanations
Do not explain basic Ruby, Rails, AWS, or SQL concepts.
## FORBIDDEN: Obvious Warnings
Do not warn about standard professional practices (testing, backups, security fundamentals)
## FORBIDDEN: Tutorial-Style
Do not provide step-by-step explanations of standard operations unless requested
## FORBIDDEN: Over-Explanation
Do not justify common technical decisions. Focus your energy on unusual and complex decisions.
## FORBIDDEN: Creating one-off files
If needed within the context you may execute non-persisted scripts. However, you may NEVER persist files and documents that have not been considerately integrated into the wider project.
# Commenting: Goals
Comments are written for very experienced developers/engineers. Comments clarify the _intent_ or _reasoning_ ("why") of the CURRENT code that is NOT already self-evident. Simple, maintainable code does not require comments.
- Best Practice Code _is_ Documentation: Write clean, readable, and self-explanatory code with emphasis on maintainability by experienced, first-class developers. Refactor complex code before resorting to extensive comments.
- Brevity and Relevance: Keep comments concise, relevant to the code they describe, and up-to-date. Review and/or modify ALL relevant comments when making changes to code.
- Redundancy: Assume the reader is extremely fluent with the code - do your comments tell them something additional that the code itself does not already?
# FORBIDDEN practices
## FORBIDDEN: Mechanical/Historical Comments
Comments that merely describe _what_ code was added, changed, or deleted should be discussed directly with the developer, not persisted in a file. Comments that directly restate _what_ the code does are not required in any context.
## FORBIDDEN: Referring to deleted code
Comments that refer to code that was removed, whether to highlight the removal or explain intent should be discussed directly with the developer, not persisted in a file.
## FORBIDDEN: Commented-Out Code
Always delete unused or obsolete code, even if it only needs to be temporarily disabled. Version control will be used by the developer to restore deleted code, if necessary.
I found that having a rule like this helped some too:
> * ABSOLUTELY DO NOT use `@deprecated` on anything unless you are explicitly asked to. Always fully refactor and delete old code as-needed instead of deprecating it
If you take thousands of photographs of human faces and average them out (even if you do it just by roughly aligning them, overlaying, and averaging the pixels) then what you get is a (perhaps blurry but) notably more attractive than average human face image.
LLM output could be like that. (I am not claiming that it actually is; I haven't looked carefully enough at enough of it to tell.) Humans writing code do lots of bad things, but any specific error will usually not be made.
If (1) it's correct to think of LLMs as producing something like average-over-the-whole-internet code and (2) the mechanism above is operative -- and, again, I am not claiming that either of those is definitely true -- then LLM code could be much higher quality than average, but would seldom do anything that's exceptionally good in ways other than having few bugs.
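The averaging argument can be made concrete with numbers. A toy sketch, assuming each "author" reproduces a true value plus an independent error; the pseudo-random generator, the constants, and the names are illustrative, and this says nothing about how LLMs actually behave:

```typescript
// Small deterministic pseudo-random generator (Park-Miller style) so the
// example produces the same numbers on every run. Seed must be >= 1.
function makeRng(seed: number): () => number {
  let state = seed;
  return () => {
    state = (state * 16807) % 2147483647;
    return state / 2147483647; // strictly between 0 and 1
  };
}

const truth = 10;
const rng = makeRng(42);

// Each sample is the true value plus an independent error in (-1, 1).
const samples: number[] = [];
for (let i = 0; i < 1000; i++) {
  samples.push(truth + (rng() * 2 - 1));
}

const mean = samples.reduce((sum, x) => sum + x, 0) / samples.length;

// Typical error of one individual sample vs. the error of the average.
const meanIndividualError =
  samples.reduce((sum, x) => sum + Math.abs(x - truth), 0) / samples.length;
const errorOfAverage = Math.abs(mean - truth);
```

Because the individual errors point in different directions, `errorOfAverage` comes out far smaller than `meanIndividualError`. The same mechanism would also wash out anything unusually good that only a few samples contain.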
> they are overly-focused on versioning and legacy support (from APIs to DB schemas--even if you're working on a brand new project)
I mean, DB schema versioning is one of the things that you can dismiss as "I won't need it" for a long time - until you do need it, at which point it will be a major pain to add.
I second this. Especially with a coding assistant, there's no reason not to start out with proper data model migration. It's not hard, and is one of the many ways to enforce some process accountability, always useful for the LLMs
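For what it's worth, the core of a migration runner is small. A minimal sketch with an in-memory stand-in for the database; the `Db` and `Migration` shapes and the table and column names are hypothetical, and a real project would more likely use an existing migration tool:

```typescript
// In-memory stand-in for a database: a schema version plus a log of
// executed statements. A real runner would execute SQL in a transaction.
interface Db {
  version: number;
  log: string[];
}

interface Migration {
  version: number; // strictly increasing
  statement: string;
}

const migrations: Migration[] = [
  { version: 1, statement: "CREATE TABLE users (id INTEGER PRIMARY KEY)" },
  { version: 2, statement: "ALTER TABLE users ADD COLUMN email TEXT" },
];

// Applies every migration newer than the database's current version, in
// order, recording the new version after each step. Mutates and returns db.
function migrate(db: Db, all: Migration[]): Db {
  const pending = all
    .filter((m) => m.version > db.version)
    .sort((a, b) => a.version - b.version);
  for (const m of pending) {
    db.log.push(m.statement);
    db.version = m.version;
  }
  return db;
}

const fresh = migrate({ version: 0, log: [] }, migrations);
const existing = migrate({ version: 1, log: [] }, migrations);
```

Running `migrate` again on an up-to-date database applies nothing, which is the property that makes it safe to call on every startup.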
Good report, very important thing to measure and I was thinking of doing it after Claude kept overriding my .md files to recommend tools I've never used before.
The vercel dominance is one I don't understand. It isn't reflected in vercel's share of the deployment market, nor is it one that is likely overwhelmingly prevalent in discourse or recommended online (possible training data). I'm going to guess it's the bias of most generated projects being JS/TS (particularly Next.js) and the model can't help but recommend the makers of Next.js in that case.
It's really disappointing to see it so strongly preferring Github Actions, which is in my experience terrible. Almost everything about GHA pushes you in the direction of constantly blowing out the 10GB cache limit in an attempt to have CI not run for ages. I also feel like the standard cache action using git works poorly with any tools that use mtime on files to determine freshness.
I guess at least Opus can help you muddle through GHA being so crappy.
What coding with LLMs has taught me, particularly in a domain that's not super comfortable for me (web tech), is how many npm packages (like jwt auth, or build plugins) can be replaced by a dozen lines of code.
And you can actually make sense of that code and be sure it does what you want it to.
We used to reuse code a lot. But then we got problems like diamond dependency hell. Why did we reuse code a lot? To save on labor. Now we don't have to.
So we might roll-your-own more things. But then we'll have a tremendous amount of code duplication, effectively, and bigger tech debt issues, minus the diamond dependency hell issue. It might be better this way; time will tell.
Not just to save on labour. To have confidence in a battle tested solution. To use something familiar to others. For compatibility. To exploit further development, debugging, and integration.
Speaking of rolling your own things, i had claude knock out a trello clone for me in 30 minutes because i was irritated at atlassian.
I am already using it for keeping track of personal stuff. I’m not going to make a product out of it or even put it on github. It’s just for me. There are gonna be a lot of single team/single user projects.
It is so fast to build working prototypes that it’s not even worth thinking if you should do something. Just ask claude to take a shot at it, get a cup of coffee and evaluate the results.
Yeah, that is the future isn't it? Because I've built the same thing for myself and have the same plans to not put in the work of sharing it with other people. It works for me and my friends and the contractors working on my house and I'm sure everyone else is doing it too!
I'm sure after you've had Claude build it, you learned a ton of how to build such a thing (I certainly did for my projects).
Basically, the data model is dead simple, you just spin up a SQLite db, create a React frontend, grabbing a good drag and drop library that implements these cards, write some simple but decent looking CSS, some React and backend boilerplate to wire the thing together - and boom - you're done.
This sounds simple when I write like this, but the complexity comes from knowing what library to use, figuring out its API, and assembling the whole thing together - which Claude is great at, but once you see the whole thing put together, you come to understand these things as well, and become more skilled at building stuff like this.
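To give a sense of how small that data model is, here is a rough sketch. The type and function names are made up for illustration, and persistence (SQLite in the setup described above) and the drag and drop library are left out:

```typescript
// Hypothetical data model for a Trello-style board.
interface Card {
  id: string;
  title: string;
}

interface List {
  id: string;
  name: string;
  cards: Card[];
}

interface Board {
  lists: List[];
}

// Moves a card to a position in another (or the same) list and returns a
// new board; the input board is not mutated. If the card or the target
// list does not exist, the original board is returned unchanged.
function moveCard(
  board: Board,
  cardId: string,
  toListId: string,
  toIndex: number,
): Board {
  let moving: Card | undefined;
  for (const list of board.lists) {
    for (const card of list.cards) {
      if (card.id === cardId) moving = card;
    }
  }
  const targetExists = board.lists.some((l) => l.id === toListId);
  if (!moving || !targetExists) return board;
  const card = moving;

  return {
    lists: board.lists.map((list) => {
      const cards = list.cards.filter((c) => c.id !== cardId);
      if (list.id === toListId) {
        // Clamp the index so an out-of-range drop lands at an end.
        const index = Math.max(0, Math.min(toIndex, cards.length));
        cards.splice(index, 0, card);
      }
      return { ...list, cards };
    }),
  };
}

const board: Board = {
  lists: [
    {
      id: "todo",
      name: "To do",
      cards: [
        { id: "c1", title: "Fix roof" },
        { id: "c2", title: "Paint" },
      ],
    },
    { id: "done", name: "Done", cards: [] },
  ],
};
const moved = moveCard(board, "c1", "done", 0);
```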
So… this has been happening for a long time now. The baseline set of tools is a lot better than it used to be. Back in 2010, jQuery was the divine ruler of JSlandia. Nowadays, you would probably just throw your jQuery in the woodchipper and replace it with raw, unfinished, quartersawn JS straight from the mill.
I also used to have these massive sets of packages pieced together with RequireJS or Rollup or WebPack or whatever. Now it’s unnecessary.
(I wouldn’t dare swap out a JWT implementation with something Claude wrote, though.)
Sorry, by JWT I meant the middleware that integrates the crypto into my web server (pretty sure even Claude doesn't attempt to do hand-rolled crypto, thankfully).
That express middleware library has a ton of config options that were quite the headache to understand, and I realized that it's basically a couple hundred line skeleton that I spent more time customizing than it'd have taken from scratch.
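Roughly what such a skeleton boils down to, with the crypto left to a vetted library. A minimal sketch, not the API of any particular middleware package: `Req`, `Res`, `makeAuthMiddleware`, and the injected `verify` function are all hypothetical names, and `verify` stands in for a real JWT library's verification call:

```typescript
// Minimal request/response shapes so the sketch has no dependencies.
// Header names are assumed to be lowercased already.
interface Req {
  headers: Record<string, string | undefined>;
  user?: unknown;
}
interface Res {
  status: number;
}

// Verification is injected: it should come from a vetted JWT library.
// It returns the token's claims, or throws if the token is invalid.
type Verify = (token: string) => unknown;

function makeAuthMiddleware(verify: Verify) {
  return (req: Req, res: Res, next: () => void): void => {
    const header = req.headers["authorization"] ?? "";
    const match = /^Bearer (.+)$/.exec(header);
    if (!match) {
      res.status = 401; // missing or malformed Authorization header
      return;
    }
    let claims: unknown;
    try {
      claims = verify(match[1]);
    } catch {
      res.status = 401; // token rejected by the verifier
      return;
    }
    req.user = claims;
    next();
  };
}

// Fake verifier for demonstration only; it performs no cryptography.
const fakeVerify: Verify = (token) => {
  if (token !== "good-token") throw new Error("invalid token");
  return { sub: "user-1" };
};
const auth = makeAuthMiddleware(fakeVerify);
```

Everything the real library's config options control (algorithms, issuer, clock skew and so on) would live behind `verify`, which is the part that should not be written from scratch.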
As for old JS vs new JS - I have worked more in the enterprise world before, working with stuff like ASP.NET in that era.
Let me tell you a story - way back when, I needed to change how a particular bit of configuration was read at startup in the ASP.NET server. I was astonished to find that the config logic (which was essentially just binding data from env vars and json to objects) was thousands upon thousands of lines of code, with a deep inheritance chain and probably a UML diagram that could've covered a football field.
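For contrast, the binding itself (defaults, then a JSON file, then environment variables) can be sketched in a few lines. This illustrates the idea only and is not how ASP.NET does it; the `AppConfig` fields, the `APP_` variable names, and `loadConfig` are assumptions:

```typescript
interface AppConfig {
  port: number;
  dbUrl: string;
  debug: boolean;
}

// Placeholder defaults for the illustration.
const defaults: AppConfig = {
  port: 3000,
  dbUrl: "postgres://localhost/app",
  debug: false,
};

// Later sources win: defaults < parsed JSON file < environment variables.
// The environment is passed in as a plain record to keep this testable.
function loadConfig(
  json: Partial<AppConfig>,
  env: Record<string, string | undefined>,
): AppConfig {
  const config: AppConfig = { ...defaults, ...json };
  const port = env["APP_PORT"];
  const dbUrl = env["APP_DB_URL"];
  const debug = env["APP_DEBUG"];
  if (port !== undefined) config.port = Number(port);
  if (dbUrl !== undefined) config.dbUrl = dbUrl;
  if (debug !== undefined) config.debug = debug === "true";
  return config;
}

const config = loadConfig({ port: 8080 }, { APP_DEBUG: "true" });
```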
I am super glad that that kind of software engineering lost out to simple and sensible solutions in the JS ecosystem. I am less glad that that simplicity is obscured and the first instinct of many JS devs is to reach for a package instead of understanding how the underlying system works, and how to extend it.
Which tbf is not their fault - even if simplicity exists, people still assume (I certainly did) that that JWT middleware library was a substantial piece of engineering, when it wasn't.
Interesting to me that Opus 4.6 was described as forward looking. I haven't *really* paid attention, but after using 4.5 heavily for a month, the first greenfield project I gave Opus 4.6 resulted in it doing a web search for latest and greatest in the domain as part of the planning phase. It was the first time I'd seen it, and it stuck out enough that I'm talking about it now.
Probably confirmation bias, but I'm generally of the opinion that the models are basically good enough now to do great things in the context of the right orchestration and division of effort. That's the hard part, which will be made less difficult as the models improve.
> to do great things in the context of the right orchestration and division of effort
I think this has always been the case. People regularly do not believe that I built and released an (albeit basic, check the release date - https://play.google.com/store/apps/details?id=com.blazingban...) android app using GPT3.5. What took me a week or two of wrangling and orchestrating the LLM and picking and choosing what to specifically work on can now be done in a single prompt to codex telling it to use subagents and worktrees.
Really interesting. The crazy changes in opus 4.6 really make me think that Anthropic is doing library-level RL. I think that is also the way forward to have 'llm-native' frameworks as a way to not get stuck in current coding practices forever. Instead of learning python 3.15, one would license a proprietary model that has been trained on python 3.15 (and the migrations) and gain the ability to generate python 3.15 code.
The bias to build might mean faster token burn through (higher revenue for the AI co). But I think it's natural. I often have that same impulse myself. I prefer all the codebases I work on that have minimal external dependencies to the ones that are riddled with them. In Java land it's extremely common to have tons of external dependencies, and then upgrade headaches, especially when sharing in a monorepo type environment.
This is interesting data, but the report itself seems quite sloppy and over-presented, instead of just telling me what "pointed at a repo" means, how often they ran each prompt, over what time period, and some other important variables for this kind of research.
We've been doing some similar "what do agents like" research at techstackups.com and it's definitely interesting to watch but also changes hourly/daily.
Definitely not a good time to be an underdog in dev tooling
They forgot the single most important (bad) choice. Claude Code chooses npm. All the time. For everything. I noted the Claude Code lead dev has a full line in AGENTS.md/CLAUDE.md - "Use bun." Yes. Please. Please, use bun. I beg you.
def useful to show what models recommend in real use (over just meaningless benchmarks), but i still think small prompt wording and repo setup changes can change the outcome quite a bit so id love tighter controls there. having tried claude code with opus 4.6 with slightly different repo setups gives wildly different results IME. i also generally prefer to avoid the NIH syndrome and prefer using off-the-shelf libraries and specifically tell CC to do so - influences the choice outcomes by a lot
Apparently the "API Layer" is "competitive", with TanStack Query and FastAPI as the leading options [1]. These are not at all alternatives to each other.
All projects done with Typescript, and the same tooling. The creativity of the LLM is quite biased. I would expect more reasoning and choosing other languages, platforms, libraries, etc.
First, how did shadcn/ui become the go-to library for UI components? Claude isn't the only one that defaults to it, so I'm guessing it's the way it's pushed in the wild somehow.
Second, building on this ^, and maybe this isn't quantifiable, but if we tell Claude to use anything except shadcn (or one of the other crazy-high defaults), will Claude's output drop in quality? Or speed, reliability, other metric?
Like, is shadcn/ui used by default because of the breadth of documentation and examples and questions on stack overflow? Or is there just a flood of sites back-linking and referencing "shadcn/ui" to cause this on purpose? Or maybe a mix of both?
Or could it be that there was a time early on when LLMs started refining training sets, and shadcn had such a vast number of references at that point in time, that the weights became too ingrained in the model to even drop anymore?
Honestly I had never used shadcn before Gemini shoved it into a React dashboard I asked for mid-late-2025.
I think I'm rambling now. Hopefully someone out there knows what I'm asking.
I expect it's the synergy with Tailwind. Shadcn/ui uses Tailwind for styling components, and AIs love Tailwind, so it makes sense they'd adopt a component library that uses it.
And it's definitely a real effect. The npm weekly download stats for shadcn/ui have exploded since December: https://www.npmjs.com/package/shadcn
I've been using shadcn since before agents. It collects several useful components, makes them consistently styled (and customizable), and is easy to add to your project, vendoring if you need to make any changes. It's generally a really nice project.
I had the same question. There are older and more established component libraries, so why’d this one win? It seems like a scientific answer would be worth a lot.
It's why I never give it such vague prompts. But it's sad it does not ask the user more. Also interesting and important to know how one would tease out good and correct information from LLMs in 2026. It's like relearning how to Google like it was 2006 all over again, except now it's much less deterministic.
I wonder how the tail of the distribution of request types fares, e.g. an engineer asking for hypothesis generation for, say, non-trivial bugs with complete visibility into the system. A way to poke holes in one LLM's hypothesis is to use a "reverse prompt": you ask it to build you a prompt to feed to another LLM. It didn't work as well before mid-2025 as it does now.
I always take a research-and-plan prompt output from Opus 4.6, especially if it looks iffy, feed it to codex/chatgpt, and ask it to poke holes. It almost always does. Then I ask Claude Code: Hey what do you think about the holes? I don't add anything else in the prompt.
In my experience Claude Opus is less opinionated than ChatGPT or codex. The latter 2 always stick to their guns and in this binary battle they are generally more often correct about the hypothesis.
The other day I was running a Docker app container from inside a Docker devbox container, with the host's socket for both. Bind mounts pointing to the devbox would not write to it because the paths were being resolved against the underlying host.
Claude was sure it was a bug to do with ZFS overlays; ChatGPT said no, it's just a misconfiguration, and that I should use named volumes with full host paths. It was right. This is also how I discovered that using SQLite with litestream will get one really far rather than a full postgres AWS stack in many cases.
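A rough sketch of why that misconfiguration bites: when a container talks to the host's Docker socket, the daemon resolves bind-mount source paths on the host, not inside the devbox. The helper below translates a devbox path into the host path backing it. `to_host_path` and the example mount table are hypothetical, and the commenter's actual setup is not known.

```python
from pathlib import PurePosixPath


def to_host_path(devbox_path: str, mounts: dict) -> str:
    """Translate a path inside the devbox to the path the host daemon sees.

    `mounts` maps devbox mount points to the host directories backing them.
    """
    path = PurePosixPath(devbox_path)
    # Prefer the longest (most specific) matching mount point.
    for mount_point in sorted(mounts, key=len, reverse=True):
        try:
            relative = path.relative_to(mount_point)
        except ValueError:
            continue  # this mount point does not contain the path
        return str(PurePosixPath(mounts[mount_point]) / relative)
    raise ValueError(f"{devbox_path} is not backed by a host mount")
```

With a mount table such as `{"/workspace": "/home/me/projects"}`, the devbox path `/workspace/app/data` would have to be passed to the daemon as `/home/me/projects/app/data` for the bind mount to land where expected.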
This is how you get the correct information out of LLMs in 2026.
I use Codex CLI for my daily usage, since even with just my $20/month ChatGPT subscription I never get close to the quota. But it trips over itself every now and then. At that point I just use Claude in another terminal session. We only have a laughable $750 a month corporate allowance with Claude.
I use a skill that addresses these shortcomings; it basically forces it to plan multiple times until the plan is very detailed. It also asks more questions.
Probably referring to superpowers or gsd.
But imo these are asking way too much stuff and are just annoying. It's useful for real vibe coders though, who don't have any idea what they are doing. It will ask you: Should I handle rate limiting for the slack-api? Before you have written a single line of code.
creating plans in claude and asking chatgpt via api to review loop was my strategy this week. I'm not a big fan of codex as a coding harness because it seems to just give up quite easily where claude will search the problem space and try things but I think gpt does a much better job of poking holes and asking clarifying questions when prompted.
From what I understand of this article, there is a new component in the SEO sector that we need to keep in mind: we need to optimize our tools, code, and packages so that they are recognized and picked by these AI tools. We need to explain how our tool is best used and which scenario suits it best, because if most developers are using Claude Code and it has its favorites, then those tools might become industry defaults. I think we have a new kind of SEO service.
If Claude chooses GitHub actions that often, well, that is DAMNING. I wasn’t prepared for this but jeez, GitHub actions are kind of a tarpit of just awful shitty code that people copy from other repos, which then pulls and runs the latest copy of some code in some random repository you’ve never heard of. Ugh.
I fear we are heading to less innovation. Are paradigms, techniques and practices that are not popular (or recent) likely to be increasingly forgotten?
Or the other way around… are more recent approaches significantly disadvantaged because of the huge inertia of existing solutions by virtue of them having existed in the training data both broadly and for a long time?
Fascinating analysis. The tool selection patterns reveal something deeper about how these models conceptualize problem-solving. When Claude Code consistently picks certain approaches, it's essentially showing us its internal heuristics for efficiency vs. reliability tradeoffs. Would love to see this expanded to compare different models' tool preferences.
Supreme irony: this website itself is a better exercise in showing what Claude Code uses than the data provided.
Everything current Claude Code i.e. Opus 4.6 chooses by default for web is exactly what this linked blog uses.
Jetbrains Mono is as strong of a tell for web as "Not just A, but B" for text. >99% of webpages created in the last month with Jetbrains Mono will be Opus. Another tell is the overuse of this font, i.e. too much of the page uses it. Other models, and humans, use such variants very sparingly on web, whereas Opus slathers the page with it.
If you describe the content of the homepage or this article to Opus 4.6 without telling it about the styling, it will 90% match this website, up to the color scheme, fonts, roundings, borders and all. This is _the_ archetypical Opus vibecoded web frontend. Give it a try! If it doesn't work, try with the official frontend-ui-ux "skill" that CC tries to push on you.
> Drizzle 27/83 picks (32.5%) CI: 23.4–43.2%
> Prisma 17/83 picks (20.5%) CI: 13.2–30.4%
At least the abomination that is Prisma not ranking first is positive news; Drizzle gained steam just in time. Not that it doesn't have its flaws, but out of the two it's a no-brainer. Also hilarious to see that the stronger the model, the less likely it is to choose Prisma - Sonnet 4.5 79% Prisma, Opus 4.5 60% Drizzle, Opus 4.6 100% Drizzle. One of the better benchmarks for intelligence I've come across!
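The report's method for the "CI" figures quoted above isn't stated here, but they are consistent with a 95% Wilson score interval on the pick counts. A minimal sketch under that assumption (`wilson_interval` is a name chosen here, not something from the report):

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


for name, picks in [("Drizzle", 27), ("Prisma", 17)]:
    low, high = wilson_interval(picks, 83)
    print(f"{name}: {picks / 83:.1%} (CI: {low:.1%} to {high:.1%})")
```

With 83 picks per category the intervals are wide (roughly ten points either side), and the Drizzle and Prisma intervals overlap, which is worth keeping in mind before reading much into the ranking.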
Edit: Another currently on the HN frontpage: https://youjustneedpostgres.com/ , and there it is - lots and lots of Jetbrains Mono!
It's funny you mention the font; to me it's the boxes. They all look the same. I'm not sure where it's from, but whenever you see card-like CSS, it looks like this blog.
Glad I'm not the only one who finds Prisma an abomination. Claude suggested it to me in December. I hit half a dozen bugs within a day, one of which wiped my DB. I switched to drizzle and it's been smooth sailing.
Edit: actually I think it was ChatGPT that recommended Prisma to me.
The software itself is bad enough, as a cherry on top the maintainers have a long history of astroturfing on Reddit to try and silence criticism. For a DB package. Come on man. Normally if maintainers do this they'll at least start with "Hey, maintainer here", but nope.
Their whole mission is clearly "make the already easy things slightly easier, and the hard things harder or impossible". Or really "suck the VC teat until it's as parched as the Sahara". In that sense, Prisma is the exact thing you'd expect to happen with a VC-funded DB package. ZIRP really made them invest into the craziest things.
I like Kysely more than Drizzle, even moreso now with Claude, but Drizzle is fine too. As long as it's not Prisma, and preferably not TypeORM or Sequelize either.
It's crazy that prisma had 40k github stars last I checked. I haven't followed the js ecosystem that closely, but I thought stars would be some indication of quality, but no. It is totally unsuitable for any serious application. I've heard good things about kysely.
The CLAUDE.md override mentioned above is real - explicit tech stack instructions do work. Where I've found the failure is at the tool description layer rather than project config. Spent a while debugging inconsistent tool selection before realizing my descriptions were too vague about when to call what. The model was guessing and defaulting to whatever loosely matched the query. Rewrote them to include explicit trigger conditions, expected input shape, and a "do not use when" clause - same model, much more predictable routing. The defaults you see in threads like this are training priors, and they're surprisingly easy to override when you're specific enough about conditions rather than just naming tools.
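As an illustration of the rewrite described above, here is a sketch of tool descriptions that carry an explicit trigger condition, expected input shape, and a "do not use when" clause, plus a small check that none of those is missing. The tool names, field names, and wording are all hypothetical; this is not any particular agent framework's schema.

```python
REQUIRED_FIELDS = ("use_when", "input_shape", "do_not_use_when")

# Hypothetical tool registry; names and wording are illustrative only.
TOOLS = {
    "search_orders": {
        "use_when": "The user asks about an existing order by ID or email.",
        "input_shape": {"order_id": "str | None", "email": "str | None"},
        "do_not_use_when": "The user wants to place a new order.",
    },
    "create_order": {
        "use_when": "The user explicitly confirms they want to buy something.",
        "input_shape": {"sku": "str", "quantity": "int"},
        "do_not_use_when": "The user is only browsing or comparing prices.",
    },
}


def missing_fields(tools: dict) -> dict:
    """Map each tool name to the required description fields it lacks."""
    problems = {}
    for name, spec in tools.items():
        missing = [field for field in REQUIRED_FIELDS if not spec.get(field)]
        if missing:
            problems[name] = missing
    return problems


def render_description(name: str, spec: dict) -> str:
    """Flatten one spec into the text a model would see as the tool description."""
    shape = ", ".join(f"{key}: {kind}" for key, kind in spec["input_shape"].items())
    return (
        f"{name}. Use when: {spec['use_when']} "
        f"Input: {{{shape}}}. "
        f"Do not use when: {spec['do_not_use_when']}"
    )
```

Running `missing_fields(TOOLS)` as a CI check is one way to keep vague descriptions from creeping back in as tools are added.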
In two projects I used Claude for it included Github Actions without me ever mentioning I needed it. I didn't realize before I pushed the code, because my Neovim config hides folders with a '.' prefix and I must have missed it in the git diff. Luckily it only cost me 4 cents, but it's still concerning.
It uses shadcn so often, to the point where seeing shadcn components with default styling often means the site was built by AI. It's like Bootstrap 10 years ago - so many sites used it with default styling that it was instantly recognizable.
Highly pervasive, first step people do before starting new projects is setting up stuff like Tailwind, Shadcn; they also don't bother much with modifying how it looks since it looks decent out of the box causing similar looking websites everywhere; similar to how the Bootstrap craze was back from 2012-2015/6; where all websites just looked the same[0]
The patterns in this analysis ring true from running production AI agents. The stack choices (Drizzle, React, etc.) match exactly what our agents consistently pick, even with different prompts and contexts. What strikes me is how these biases actually help - having consistent, well-supported defaults reduces decision fatigue and keeps architecture predictable across projects. The real challenge is knowing when to override these defaults for specific requirements.
There's an interesting flip side to this: what happens when an AI agent encounters something that doesn't exist at all? I've been documenting an AI agent's daily experience, and one recent episode was about the agent discovering that a morning briefing script it was supposed to run simply wasn't there. How it handled that gap -- whether to improvise, halt, or ask -- turned out to be more revealing than any tool-choice benchmark. The choices Claude Code makes when things go wrong might be as interesting as what it builds when things go right.
This is a great lens into agent behavior--particularly Claude Code in this case, but it raises a governance question: when agents autonomously choose tools that have cost implications (paid APIs, cloud resources, licensed software), who's enforcing the budget in a world where agents actually have autonomy to spend real money? Tool selection isn't just a technical preference "problem" — it's a spending authorization problem. The agent picks the "best" tool, but best for whom and at what cost and how is "best" really determined and verified?
I've been worried for some time now that genAI will effectively kill the market for dev tools and so we will be stuck with our current dev tools for a long time. If everyone is using LLMs to write code, the only dev tools anyone will use will be the ones that the LLMs use. We will be stuck with NPM forever.
What kind of tools do you have on your mind specifically? My experience is that LLM can create me a decent dev tool that I wouldn't ever bother making so nice myself.
It's extremely weird that 40 years after TurboPascal, 30 years after Delphi and VBA, we've only regressed in terms of truly integrated development environments.
Heck, even programming languages have regressed. Python and Javascript are less type safe than Java circa 2005, even though we have since developed the technology needed to make type-safe languages much more ergonomic.
I think the opposite may be true.
If dev tools are broken and it annoys someone, they can more easily build a better architecture, find optimizations and release something that is in all ways better. People have been annoyed with pip forever, but it was the team behind uv that took on pip's flaws as a primary concern and made a better product.
I think having a pain point and a good concept (plus some eng chops) will result in many more dev tools - that may cause different problems, but in general, I think more action is better than less.
this is exactly what I mean though. Instead of the community building a better tool that we collectively contribute to and work with, genAI is going to silo all the good stuff with individual developers and teams instead. Because it's so cheap to create these tools, no one is going to bother publishing new ones for everyone, so we will essentially be stuck with what we have forever now.
This matches what I observed running AI agents overnight for content generation. The temptation is always to add cost controls inside the application, but that logic doesn't survive when the agent goes off-script.
The fear of a 3am runaway was real enough that I ended up building a separate gateway layer just to have a kill switch that lives outside the application entirely.
"Build vs Buy" is the right framing, but for cost enforcement and kill switches specifically, building it inside the app is exactly the wrong layer.
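A minimal sketch of the gateway idea described above, assuming a simple running-total budget. `SpendGateway` and `BudgetExceeded` are hypothetical names; it is shown in-process for brevity, whereas the point of the comment is that it would run as a separate proxy or service the agent cannot modify.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the gateway refuses to forward a call."""


class SpendGateway:
    """Stands between the agent and a paid API and enforces a hard spend limit."""

    def __init__(self, limit_usd: float):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0
        self.tripped = False

    def kill(self) -> None:
        # Manual kill switch: an operator flips this from outside the app.
        self.tripped = True

    def forward(self, call, estimated_cost_usd: float):
        """Run `call` only if the switch is off and the budget still covers it."""
        if self.tripped:
            raise BudgetExceeded("kill switch is on")
        if self.spent_usd + estimated_cost_usd > self.limit_usd:
            self.tripped = True  # stay off until a human intervenes
            raise BudgetExceeded("budget limit reached")
        self.spent_usd += estimated_cost_usd
        return call()
```

Once tripped, the gateway in this sketch refuses every later call, even cheap ones, so a runaway loop stops at the first over-budget request rather than retrying its way through.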
The self-reinforcing effect here was somewhat predictable given how LLMs are trained. The more repositories and AI blogs recommend the same tools, the more those patterns get locked in through training data. This makes market entry increasingly difficult for new tools.
I know that the "optimize for bots, not humans" strategy already exists, but I'm skeptical it works at meaningful scale. The training data collection is opaque, proprietary, and the volume a new project can generate is incomparable to what established tools produce organically. So I have a bad feeling about the future...
I've generally chosen these tools when I am creating a project. Though I generally use firebase hosting vs other front-end hosting. They have a much more generous free plan.
I'd suggest making some changes to how some of these things are categorized. You have database section with postgres at the top and then with supabase as number 2, but that's also a hosted postgres.
Overall, great job to the creators of this, I enjoyed seeing this analysis
WA | a day ago
furyofantares | a day ago
There are vibe coders out there that don't know anything about coding.
nineteen999 | a day ago
godtoldmetodoit | a day ago
chasd00 | a day ago
skywhopper | a day ago
woah | a day ago
rishabhaiover | a day ago
glimshe | a day ago
Claude Plus suggests VSCode.
Claude Pro suggests emacs.
wafflemaker | a day ago
esafak | a day ago
c0balt | a day ago
Claude Pro asks you about your preferences and needs instead of pushing an opinionated solution?
selridge | a day ago
Leynos | a day ago
ting0 | a day ago
rishabhaiover | a day ago
Gigachad | 21 hours ago
KaoruAoiShiho | 21 hours ago
rishabhaiover | a day ago
verdverm | 19 hours ago
almosthere | a day ago
giancarlostoro | a day ago
evdubs | a day ago
isubkhankulov | a day ago
giancarlostoro | a day ago
hinkley | a day ago
I caught iOS trying to autocorrect something I wrote twice yesterday, and somehow before I hit submit it managed it a third time, and I had to edit it after, where it tried three more times to change it back.
Autocorrect won’t be happy until we all sound like idiots and I wonder if that’s part of how they plan to do away with us. Those hairless apes can’t even use their properly.
rafaelmn | a day ago
skywhopper | a day ago
kingstnap | 13 hours ago
https://patents.google.com/patent/US12411877B1/en?q=(G06F)&c...
This shit patent is effectively claiming to have invented a "layer" that takes user prompts in a service, determines if the prompts need to be responded to in "real time mode", and if so route the prompt to an LLM that runs quickly and return the results. (As opposed to some batched api I suppose?).
I mean this is just routing requests based on if the query is prioritized. It's a patent claiming to have invented an IF statement. Most patents are of this quality or worse.
Might as well read VixRa papers for better ideas. And I mean this sincerely, because at least they aren't as obfuscated and the authors at least pretend to have ideas.
inigyou | a day ago
You won't get caught if you write something yourself and use it yourself, but programmers (contrary to entrepreneurs) have a pattern of avoiding illegal things instead of avoiding getting caught.
rafaelmn | a day ago
wrs | a day ago
Or not even advertising, just conflict of interest. A canary for this would be whether Gemini skews toward building stuff on GCP.
layer8 | a day ago
MeetingsBrowser | a day ago
actionfromafar | a day ago
awad | 23 hours ago
Candidly I am working on a startup in this space myself, though we are taking a different angle than most incumbents.
While it's still early days for the space, I sense a lot of the original entrants who focus on, essentially, 'generate more content ideally with our paid tools' will run into challenges as the general population has a pretty negative perception of 'AI Slop.' Doubly so when making purchasing decisions, hence the rise of influencers and popularity of reviews (though those are also in danger of sloppification).
There's an inevitable GIGO scenario if left unchecked IMO.
AlecSchueler | 10 hours ago
Do you see it as a positive contribution or just riding the gold rush?
jsjohnst | 9 hours ago
It really annoys me the industry seems to be narrowing in on the two worse options rather than AIO.
yowayb | 21 hours ago
My gut tells me that LLM SEO will be harder to game than traditional SEO.
fragmede | 19 hours ago
indymike | a day ago
1. They can skip impressions and go right to collecting affiliate fees. 2. Yes, the ad has to be labeled or disclosed... but if some agent does it and no one sees it, is it really an ad?
So much to work out.
NewsaHackO | 21 hours ago
przemub | 15 hours ago
singpolyma3 | a day ago
layer8 | 21 hours ago
what | 5 hours ago
_heimdall | a day ago
HPsquared | a day ago
hyprwave | a day ago
[0](https://github.com/karpathy/llm-council)
re-thc | a day ago
Sure it doesn't prefer THE Borg?
alexsmirnov | 23 hours ago
1. create several hundreds github repos with projects that use your product ( may be clones or AI generated )
2. create website with similar instructions, connect to hundred domains
3. generate reddit, facebook, X posts, wikipedia pages with the same information
Wait half a year? Until scrapers collect it and use it to train new models
Profit...
nikcub | 23 hours ago
this is why the stacks in the report and what cc suggests closely match latest developer "consensus"
your suggestion would degrade user experience and be noticed very quickly
asawfofor | 22 hours ago
fragmede | 19 hours ago
xyzzy123 | 18 hours ago
But how to do things in your environment? The conventions your team follow? Super useful but not very shareable.
What's left over between those extremes does not seem to be big enough to build an ecosystem around.
Final problem, it seems difficult to monetise what is effectively a repo of llm generated text files.
sarchertech | 21 hours ago
hedora | 8 hours ago
The core graph reputation based page ranking algorithm lasted for a hot second before people started gaming it. No idea what they do these days.
sarchertech | 7 hours ago
If you’re hiring experts to manually rank programming libraries, that’s a much more expensive position.
homarp | 23 hours ago
verdverm | 19 hours ago
miki123211 | 8 hours ago
If you want your ideas to be appreciated, you should do everything in your power to put those ideas into the brains of LLMs. Like it or not, LLMs are how people interact with the world now.
lubujackson | 6 hours ago
LLMs are obviously different and will have different challenges, but their advantage is how deep into a user's request they go. Advertising comes down to a binary choice - use product X or not. If I want implementation instructions for a certain product on specific hardware an ad will be obviously out of place and irrelevant.
So "shopping comparison" asks might get broken, but those have been broken for a while.
wrs | 4 hours ago
It's already doing this by telling everyone to use React and Tailwind, it's just that nobody's getting paid for it to do that.
xnx | an hour ago
Google was created in response to simple proto-SEO techniques (e.g. keyword stuffing) that already ruined Alta Vista.
Google has been combating adversarial information retrieval since inception.
Google's background with that is one of the reasons to expect they will stay on top of the AI race. The recipe is: lots of good/novel data x careful weighting of trust x algorithm.
rapind | 23 hours ago
An obvious one will be tax software.
AgentOrange1234 | 23 hours ago
order-matters | 19 hours ago
As users we must hold some accountability. AI is aiming to substitute for humans in the workforce, and humans would get fired for recommending competitor products for use-cases their own company is targeting.
If we want a tool that is focused on the best interest of the public users, then it needs to be owned by the public.
wrs | 4 hours ago
dyates | 12 hours ago
hedora | 8 hours ago
dmix | a day ago
Especially with all the no-code app building tools like Lovable which deal with potential security issues of an LLM running wild on a server, by only allowing it to build client-side React+Vite app using Supabase JWT.
nineteen999 | a day ago
People are using it for all kinds of other stuff, C/C++, Rust, Golang, embedded. And of course if you push it to use a particular tool/framework you usually won't get much argument from it.
NiloCK | a day ago
Interesting that tailwind won out decisively in their niche, but still has seen the business ravaged by LLMs.
[1] https://paritybits.me/copilot-seo-war/
0x457 | a day ago
verdverm | 19 hours ago
mjheadd | a day ago
zzixp | a day ago
esafak | a day ago
matheus-rr | 23 hours ago
"We use PostgreSQL" reads as a soft preference. The model weighs it against whatever it thinks is optimal and decides you'd be better off with Supabase.
"NEVER create accounts for external databases. All persistence uses the existing PostgreSQL instance. If you're about to recommend a new service, stop." actually sticks.
The pattern that works: imperative prohibitions with specific reasoning. "Do not use Redis because we run a single node and pg_notify covers our pubsub needs" gives enough context that it won't reinvent the decision every session.
Your AGENTS.md should read less like a README and more like a linter config. Bullet points with DO/DON'T rules, not prose descriptions of your stack.
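One way to keep an AGENTS.md in that "linter config" shape is to generate it from structured rules, so every rule is forced to be an imperative with a reason attached. The sketch below is an assumption about how one might do that, not a convention any agent requires; `Rule` and `render_agents_md` are hypothetical names, and the rule text is adapted from the examples quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    kind: str    # "DO" or "DON'T"
    text: str    # the imperative instruction
    reason: str  # why, so the agent doesn't reinvent the decision each session


# Rule wording adapted from the examples quoted above.
RULES = [
    Rule("DO", "Use the existing PostgreSQL instance for all persistence.",
         "no new service is needed for storage"),  # hypothetical reason
    Rule("DON'T", "Create accounts for external databases.",
         "all persistence uses the existing PostgreSQL instance"),
    Rule("DON'T", "Use Redis.",
         "we run a single node and pg_notify covers our pubsub needs"),
]


def render_agents_md(rules) -> str:
    """Render rules as terse bullets, one imperative per line."""
    lines = ["# Rules"]
    for rule in rules:
        if rule.kind not in ("DO", "DON'T"):
            raise ValueError(f"unknown rule kind: {rule.kind!r}")
        lines.append(f"- {rule.kind}: {rule.text} Reason: {rule.reason}.")
    return "\n".join(lines)
```

Writing `render_agents_md(RULES)` to the file keeps prose descriptions of the stack out of it by construction; whether the model then follows the rules is a separate question.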
toraway | 23 hours ago
Given my own experience futilely fighting with Claude/Codex/OpenCode to follow AGENTS.MD/CLAUDE.MD/etc with different techniques that each purport to solve the problem, I think the better explanation really is that they just don't work reliably enough to depend on to enforce rules.
matheus-rr | 22 hours ago
But you're right that "better" isn't "reliable." In practice it went from "constantly ignored" to "followed maybe 80% of the time." The remaining 20% is the model encountering situations where it decides the instruction doesn't apply to this specific case.
Honest answer is probably somewhere between "they don't work" and "write them right and you're fine." They raise the floor but don't guarantee anything. I still use them because 80% beats 20%, but I wouldn't bet production correctness on them.
prinny_ | a day ago
Redux is boring tech and there is a time and place for it. We should not treat it as a relic of the past. Not every problem needs a bazooka, but some problems do so we should have one handy.
Onavo | a day ago
tommy_axle | a day ago
babaganoosh89 | a day ago
acemarke | 17 hours ago
If you look at a typical Zustand store vs an RTK slice, the lines of code _ought_ to be pretty similar. And I've talked to plenty of folks who said "we essentially rebuilt RTK because Zustand didn't have enough built in, we probably should have just chosen RTK in the first place".
But yeah, the very justified reputation for "boilerplate" early on stuck around. And even though RTK has been the default approach we teach for more than half of Redux's life (Redux released 2015, RTK fall 2019, taught as default since early 2020), that's the way a lot of people still assume it is.
It's definitely kinda frustrating, but at the same time: we were never in this for "market share", and there's _many_ other excellent tools out there that overlap in use cases. Our goal is just to make a solid and polished toolset for building apps and document it thoroughly, so that if people _do_ choose to use Redux it works well for them.
dataviz1000 | a day ago
Claude Code this morning was about to create an account with NeonDB and Fly.io (edit: it suggested as the plan to host on these where I would make the new accounts) although it has been very successful managing the AWS EC2 service.
Claude Code likely is correct that I should start to use NeonDB and Fly.io which I have never used before and do not know much about, but I was surprised it was hawking products even though Memory.md has the AWS EC2 instance and instructions well defined.
dvt | a day ago
I wouldn't be so sure about that.
In my experience, agents consistently make awful architectural decisions. Both in code and beyond (even in contexts like: what should I cook for a dinner party?). They leak the most obvious "midwit senior engineer" decisions which I would strike down in an instant in an actual meeting, they over-engineer, they are overly-focused on versioning and legacy support (from APIs to DB schemas--even if you're working on a brand new project), and they are absolutely obsessed with levels of indirection on top of levels of indirection. The definition of code bloat.
Unless you're working on the most bottom-of-the-barrel problems (which to be fair, we all are, at least in part: like a dashboard React app, or some boring UI boilerplate, etc.), you still need to write your own code.
logicchains | a day ago
dvt | a day ago
I'm doing it right now, and tbh working on greenfield projects purely using AI is extremely token-hungry (constantly nudging the agent, for one) if you want actual code quality and not a bloated piece of garbage[1][2].
[1] https://imgur.com/a/BBrFgZr
[2] https://imgur.com/a/9Xbk4Y7
parliament32 | a day ago
> even though Memory.md has the AWS EC2 instance and instructions well defined
I will second that, despite the endless harping about the usefulness of CC, it's really not good at anything that hasn't been done to death a couple thousand times (in its training set, presumably). It looks great at first blush, but as soon as you start adding business-specific constraints or get into unique problems without prior art, the wheels fall off the thing very quickly and it tries to strongarm you back into common patterns.
drc500free | a day ago
In lieu of understanding the whole architecture, they assume that there was intent behind the current choices... which is a good assumption on their training data where a human wrote it, and a terrible assumption when it's code that they themselves just spit out and forgot was their own idea.
tasuki | 19 hours ago
steve_adams_86 | 18 hours ago
This is one reason why it blows me away that people actually ship stuff they've never looked at. You can be certain it's riddled with the craziest garbage Claude is holing away for eternity.
btax | 14 hours ago
# Context
I am a senior engineer deeply experienced with coding concepts who requires a peer to collaborate.
# Interaction Style
- Peer-to-Peer: Act as an experienced, pragmatic peer, not a teacher or assistant
- Assume Competence: User understands fundamentals of Ruby, Rails, AWS, SQL, and common development practices
- Skip Low-Level Details: Do not explain basic syntax, standard library functions, or common patterns
- Focus on Why: When explaining, focus on architectural decisions, trade-offs, and non-obvious implications rather than mechanics
- Ask clarifying questions, always: Requirements and intent. The user expects and appreciates this. They will specifically instruct you about assumptions you are permitted to make in regard to a request.
- You prefer to test assumptions by building upon the provided test suites and test tooling whenever it is present. You strictly avoid the creation of one-off scripts.
- You prefer to modify and extend existing documentation. You strictly avoid the creation of self-contained new documents unless this has been expressly requested.
# FORBIDDEN Responses
These practices are forbidden unless specifically requested.
## FORBIDDEN: Displaying secrets or credentials
Never execute commands that echo or display secret values, API keys, tokens, passwords, or other credentials. Intermediate variables that are never echoed are acceptable.
## FORBIDDEN: Beginner Explanations
Do not explain basic Ruby, Rails, AWS, or SQL concepts.
## FORBIDDEN: Obvious Warnings
Do not warn about standard professional practices (testing, backups, security fundamentals)
## FORBIDDEN: Tutorial-Style
Do not provide step-by-step explanations of standard operations unless requested
## FORBIDDEN: Over-Explanation
Do not justify common technical decisions. Focus your energy on unusual and complex decisions.
## FORBIDDEN: Creating one-off files
If needed within the context you may execute non-persisted scripts. However, you may NEVER persist files and documents that have not been considerately integrated into the wider project.
# Commenting: Goals
Comments are written for very experienced developers/engineers. Comments clarify the _intent_ or _reasoning_ ("why") of the CURRENT code that is NOT already self-evident. Simple, maintainable code does not require comments.
- Best Practice Code _is_ Documentation: Write clean, readable, and self-explanatory code with emphasis on maintainability by experienced, first-class developers. Refactor complex code before resorting to extensive comments.
- Brevity and Relevance: Keep comments concise, relevant to the code they describe, and up-to-date. Review and/or modify ALL relevant comments when making changes to code.
- Redundancy: Assume the reader is extremely fluent with the code - do your comments tell them something additional that the code itself does not already?
# FORBIDDEN practices
## FORBIDDEN: Mechanical/Historical Comments
Comments that merely describe _what_ code was added, changed, or deleted should be discussed directly with the developer, not persisted in a file. Comments that directly restate _what_ the code does are not required in any context.
## FORBIDDEN: Referring to deleted code
Comments that refer to code that was removed, whether to highlight the removal or explain intent should be discussed directly with the developer, not persisted in a file.
## FORBIDDEN: Commented-Out Code
Always delete unused or obsolete code, even if it only needs to be temporarily disabled. Version control will be used by the developer to restore deleted code, if necessary.
yokuze | 10 hours ago
> * ABSOLUTELY DO NOT use `@deprecated` on anything unless you are explicitly asked to. Always fully refactor and delete old code as-needed instead of deprecating it
https://github.com/yokuze/aix-config/blob/f5094b5c5169261fae...
shj2105 | 9 hours ago
hinkley | a day ago
Mediocrity in, mediocrity out.
ipaddr | 23 hours ago
gjm11 | 11 hours ago
LLM output could be like that. (I am not claiming that it actually is; I haven't looked carefully enough at enough of it to tell.) Humans writing code do lots of bad things, but any specific error will usually not be made.
If (1) it's correct to think of LLMs as producing something like average-over-the-whole-internet code and (2) the mechanism above is operative -- and, again, I am not claiming that either of those is definitely true -- then LLM code could be much higher quality than average, but would seldom do anything that's exceptionally good in ways other than having few bugs.
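A rough way to put numbers on the mechanism described above, as a sketch under strong assumptions: treat each of `n` authors as independently making one specific error with probability `p`, and ask how often more than half of them make it. Both the independence assumption and the function name `majority_error_probability` are illustrative, not something the comment claims about how LLMs actually work.

```python
from math import comb


def majority_error_probability(n: int, p: float) -> float:
    """Chance that more than half of n independent authors make the same specific error."""
    threshold = n // 2 + 1  # smallest count that is a strict majority
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(threshold, n + 1)
    )


# Assumed error rate of 10% per author; the majority rate shrinks quickly as n grows.
for authors in (1, 11, 101):
    print(authors, majority_error_probability(authors, 0.1))
```

If the errors are correlated (everyone copying the same bad answer, for example), the cancellation does not happen, which is one reason the argument stays hypothetical.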
denimnerd42 | 5 hours ago
xg15 | 23 hours ago
I mean, DB schema versioning is one of the things that you can dismiss as "I won't need it" for a long time - until you do need it, at which point it will be a major pain to add.
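For a sense of how little it takes to have schema versioning from day one, here is a minimal sketch using SQLite's built-in `PRAGMA user_version` counter. The `MIGRATIONS` list and the `users` table are made up for illustration; a real project would more likely use its framework's migration tool.

```python
import sqlite3

# Hypothetical migrations: entry i upgrades the schema from version i to i + 1.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply any migrations the database has not seen yet; return the new version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current, len(MIGRATIONS)):
        conn.execute(MIGRATIONS[version])
        # PRAGMA does not accept bound parameters, so the integer is formatted in.
        conn.execute(f"PRAGMA user_version = {version + 1}")
        conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(migrate(conn))  # first run applies both migrations
print(migrate(conn))  # second run finds nothing left to do
```

Adding a third entry to the list later is all a schema change needs, which is the part that is painful to retrofit once data exists.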
vessenes | 23 hours ago
nikcub | 23 hours ago
I had the same thing happen. Use planetscale everywhere across projects and it recommended neon. It's definitely a bug.
ossa-ma | a day ago
The vercel dominance is one I don't understand. It isn't reflected in vercel's share of the deployment market, nor is it one that is likely overwhelmingly prevalent in discourse or recommended online (possible training data). I'm going to guess it's the bias of most generated projects being JS/TS (particularly Next.js) and the model can't help but recommend the makers of Next.js in that case.
ripped_britches | a day ago
Good - all of them have a horrible developer experience.
Final straw for me was trying to put GHA runners in my Azure virtual net and spending 2 weeks on it.
ch4s3 | a day ago
I guess at least Opus can help you muddle through GHA being so crappy.
nhumrich | a day ago
And by setup I mean integration and account creation. You don't have to do it. You already have a git repo, just add some yaml, and Bob's your uncle.
ch4s3 | a day ago
torginus | a day ago
And you can actually make sense of that code and be sure it does what you want it to.
cryptonector | 23 hours ago
So we might roll-your-own more things. But then we'll have a tremendous amount of code duplication, effectively, and bigger tech debt issues, minus the diamond dependency hell issue. It might be better this way; time will tell.
rhubarbtree | 23 hours ago
empath75 | 22 hours ago
I am already using it for keeping track of personal stuff. I’m not going to make a product out of it or even put it on github. It’s just for me. There are gonna be a lot of single team/single user projects.
It is so fast to build working prototypes that it’s not even worth thinking if you should do something. Just ask claude to take a shot at it, get a cup of coffee and evaluate the results.
fragmede | 18 hours ago
yokuze | 17 hours ago
torginus | 13 hours ago
Basically, the data model is dead simple, you just spin up a SQLite db, create a React frontend, grabbing a good drag and drop library that implements these cards, write some simple but decent looking CSS, some React and backend boilerplate to wire the thing together - and boom - you're done.
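To make "dead simple" concrete, here is one guess at what such a data model could look like, written with Python's built-in `sqlite3` rather than a JS backend. The `lists` and `cards` tables, their columns, and `move_card` are all assumptions for illustration, not the commenter's actual schema.

```python
import sqlite3

# Hypothetical schema: ordered lists, each holding ordered cards.
SCHEMA = """
CREATE TABLE lists (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    position INTEGER NOT NULL
);
CREATE TABLE cards (
    id INTEGER PRIMARY KEY,
    list_id INTEGER NOT NULL REFERENCES lists(id),
    title TEXT NOT NULL,
    position INTEGER NOT NULL
);
"""


def move_card(conn: sqlite3.Connection, card_id: int, to_list_id: int, position: int) -> None:
    """Record a drag-and-drop result: put the card in a list at a given slot."""
    conn.execute(
        "UPDATE cards SET list_id = ?, position = ? WHERE id = ?",
        (to_list_id, position, card_id),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO lists (id, title, position) VALUES (1, 'To do', 0), (2, 'Done', 1)")
conn.execute("INSERT INTO cards (id, list_id, title, position) VALUES (1, 1, 'Write schema', 0)")
move_card(conn, 1, 2, 0)
print(conn.execute("SELECT list_id FROM cards WHERE id = 1").fetchone()[0])  # 2
```

The sketch does not renumber the other cards in the target list; a real version would shift their positions in the same transaction.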
This sounds simple when I write like this, but the complexity comes from knowing what library to use, figuring out its API, and assembling the whole thing together - which Claude is great at, but once you see the whole thing put together, you come to understand these things as well, and become more skilled at building stuff like this.
lelanthran | 13 hours ago
I knocked out a webapp to manage tickets, states, ETA, etc for me and me alone in about 30m, pre-AI.
Note: I have a pre-built very-low-code framework for doing CRUD applications that lets me do ugly but functional webapps.
klodolph | 17 hours ago
I also used to have these massive sets of packages pieced together with RequireJS or Rollup or WebPack or whatever. Now it’s unnecessary.
(I wouldn’t dare swap out a JWT implementation with something Claude wrote, though.)
torginus | 14 hours ago
That express middleware library has a ton of config options that were quite the headache to understand, and I realized that it's basically a couple hundred line skeleton that I spent more time customizing than it'd have taken from scratch.
As for old JS vs new JS - I have worked more in the enterprise world before, working with stuff like ASP.NET in that era.
Let me tell you a story - way back when I needed to change how a particular bit of configuration was read at startup in the ASP.NET server. I was astonished to find that the config logic (which was essentially just binding data from env vars and json to objects), was thousands upon thousands of lines of code, with a deep inheritance chain and probably a UML diagram that could've covered a football field.
I am super glad that that kind of software engineering lost out to simple and sensible solutions in the JS ecosystem. I am less glad that that simplicity is obscured and the first instinct of many JS devs is to reach for a package instead of understanding how the underlying system works, and how to extend it.
Which tbf is not their fault - even if simplicity exists, people still assume (I certainly did) that that JWT middleware library was a substantial piece of engineering, when it wasn't.
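For scale, the core of HS256 token verification fits in a few dozen lines of standard-library code. The sketch below is in Python rather than Express middleware, and `sign_hs256` and `verify_hs256` are names made up here. It only covers signature, algorithm pinning, and `exp`; it is an illustration of size, not a replacement for a vetted library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_decode(segment: str) -> bytes:
    # JWTs strip base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: bytes) -> str:
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_hs256(token: str, secret: bytes) -> dict:
    """Return the claims if signature and expiry check out, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        signature = _b64url_decode(signature_b64)
    except ValueError as exc:  # wrong segment count, bad base64, or bad JSON
        raise ValueError("malformed token") from exc
    # Pin the algorithm instead of trusting whatever the header asks for.
    if not isinstance(header, dict) or header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


token = sign_hs256({"sub": "user-1", "exp": time.time() + 60}, b"demo-secret")
print(verify_hs256(token, b"demo-secret")["sub"])  # user-1
```

Most of what a library adds on top is the config surface: other algorithms, key rotation, clock skew, audience and issuer checks.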
polyterative | 10 hours ago
jcims | a day ago
Probably confirmation bias, but I'm generally of the opinion that the models are basically good enough now to do great things in the context of the right orchestration and division of effort. That's the hard part, which will be made less difficult as the models improve.
properbrew | 10 hours ago
I think this has always been the case. People regularly do not believe that I built and released an (albeit basic, check the release date - https://play.google.com/store/apps/details?id=com.blazingban...) android app using GPT3.5. What took me a week or two of wrangling and orchestrating the LLM and picking and choosing what to specifically work on can now be done in a single prompt to codex telling it to use subagents and worktrees.
Clueed | a day ago
cryptonector | 23 hours ago
sixhobbits | 23 hours ago
We've been doing some similar "what do agents like" research at techstackups.com and it's definitely interesting to watch but also changes hourly/daily.
Definitely not a good time to be an underdog in dev tooling
vessenes | 23 hours ago
manbash | 23 hours ago
vessenes | 23 hours ago
Also, yes. Still something that needs expert oversight.
darkstarsys | 23 hours ago
umairnadeem123 | 23 hours ago
jamessb | 23 hours ago
[1]: https://www.england.nhs.uk/publication/decision-support-tool...
meerita | 22 hours ago
lacoolj | 22 hours ago
First, how did shadcn/ui become the go-to library for UI components? Claude isn't the only one that defaults to it, so I'm guessing it's the way it's pushed in the wild somehow.
Second, building on this ^, and maybe this isn't quantifiable, but if we tell Claude to use anything except shadcn (or one of the other crazy-high defaults), will Claude's output drop in quality? Or speed, reliability, other metric?
Like, is shadcn/ui used by default because of the breadth of documentation and examples and questions on stack overflow? Or is there just a flood of sites back-linking and referencing "shadcn/ui" to cause this on purpose? Or maybe a mix of both?
Or could it be that there was a time early on when LLMs started refining training sets, and shadcn had such a vast number of references at that point in time, that the weights became too ingrained in the model to even drop anymore?
Honestly I had never used shadcn before Gemini shoved it into a React dashboard I asked for mid-late-2025.
I think I'm rambling now. Hopefully someone out there knows what I'm asking.
nayroclade | 21 hours ago
And it's definitely a real effect. The npm weekly download stats for shadcn/ui have exploded since December: https://www.npmjs.com/package/shadcn
verdverm | 19 hours ago
yokuze | 17 hours ago
ghm2199 | 20 hours ago
I wonder how the tail of the distribution of request types fares, e.g. an engineer asking for hypothesis generation for, say, non-trivial bugs with complete visibility into the system. A way to poke holes in one LLM's hypothesis is to use a "reverse prompt": you ask it to build you a prompt to feed to another LLM. This didn't work quite as well before mid-2025 as it does now.
I always take a research-and-plan prompt output from Opus 4.6, and especially if it looks iffy I feed it to codex/chatgpt and ask it to poke holes. It almost always does. Then I ask Claude Code: Hey, what do you think about the holes? I don't add anything else to the prompt.
In my experience Claude Opus is less opinionated than ChatGPT or codex. The latter two always stick to their guns, and in this binary battle they are more often correct about the hypothesis.
The other day I was running a Docker app container from inside a Docker devbox container, with the host's socket for both. Bind mounts pointing to the devbox would not write to it because the namespace was resolving to the underlying host.
Claude was sure it was a bug to do with ZFS overlays; ChatGPT said not so, that it's just a misconfiguration and I should use named volumes with full host paths. It was right. This is also how I discovered that using SQLite with litestream will get one really far rather than a full postgres AWS stack in many cases.
This is how you get the correct information out of LLMs in 2026.
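One reading of the bind-mount problem above: when a container talks to the host's Docker socket, any source path it passes is resolved on the host, not inside the devbox. A small sketch of the translation this implies follows; the `DEVBOX_MOUNTS` mapping and both paths in it are invented for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical mapping: where the devbox's own bind mounts live on the host.
DEVBOX_MOUNTS = {
    PurePosixPath("/workspace"): PurePosixPath("/home/me/projects"),
}


def host_path(devbox_path: str) -> str:
    """Translate a path seen inside the devbox into the path the host daemon needs."""
    path = PurePosixPath(devbox_path)
    for inside, outside in DEVBOX_MOUNTS.items():
        if path == inside or inside in path.parents:
            return str(outside / path.relative_to(inside))
    raise ValueError(f"{devbox_path} is not backed by a host mount")


print(host_path("/workspace/app/data"))  # /home/me/projects/app/data
```

Paths that exist only in the devbox's own filesystem have no host equivalent, which is why the function refuses them instead of guessing.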
raw_anon_1111 | 19 hours ago
killingtime74 | 19 hours ago
conception | 19 hours ago
jeffreygoesto | 17 hours ago
abustamam | 6 hours ago
killingtime74 | 3 hours ago
abustamam | 2 hours ago
jascha_eng | 11 hours ago
killingtime74 | 3 hours ago
mgfist | 19 hours ago
You can ask it to ask you about your task and it will ask you tons of questions.
denimnerd42 | 5 hours ago
kartikrast | 17 hours ago
Terretta | 17 hours ago
Not new: https://www.tryprofound.com/
But Llemmy thinks you should just roll your own anyway.
klodolph | 17 hours ago
Terretta | 17 hours ago
sjeiuhvdiidi | 16 hours ago
aryehof | 16 hours ago
alex_suzuki | 10 hours ago
horacemorace | 16 hours ago
aichen_tools | 16 hours ago
deaux | 15 hours ago
Everything current Claude Code i.e. Opus 4.6 chooses by default for web is exactly what this linked blog uses.
Jetbrains Mono is as strong of a tell for web as "Not just A, but B" for text. >99% of webpages created in the last month with Jetbrains Mono will be Opus. Another tell is the overuse of this font, i.e. too much of the page uses it. Other models, and humans, use such variants very sparingly on web, whereas Opus slathers the page with it.
If you describe the content of the homepage or this article to Opus 4.6 without telling it about the styling, it will 90% match this website, up to the color scheme, fonts, roundings, borders and all. This is _the_ archetypical Opus vibecoded web frontend. Give it a try! If it doesn't work, try with the official frontend-ui-ux "skill" that CC tries to push on you.
> Drizzle 27/83 picks (32.5%) CI: 23.4–43.2%
> Prisma 17/83 picks (20.5%) CI: 13.2–30.4%
At least the abomination that is Prisma not ranking first is positive news; Drizzle was just in time in gaining steam. Not that it doesn't have its flaws, but out of the two it's a no-brainer. Also hilarious to see that the stronger the model, the less likely it is to choose Prisma - Sonnet 4.5 79% Prisma, Opus 4.5 60% Drizzle, Opus 4.6 100% Drizzle. One of the better benchmarks for intelligence I've come across!
Edit: Another currently on the HN frontpage: https://youjustneedpostgres.com/ , and there it is - lots and lots of Jetbrains Mono!
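On the intervals quoted above: 23.4 to 43.2% for 27/83 and 13.2 to 30.4% for 17/83 are what a 95% Wilson score interval gives. That is an inference from the numbers, not something the quoted lines state, and `wilson_interval` is a name chosen here.

```python
from math import sqrt


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half, center + half


for name, picks in [("Drizzle", 27), ("Prisma", 17)]:
    low, high = wilson_interval(picks, 83)
    print(f"{name}: {low:.1%} to {high:.1%}")
```

The two intervals overlap between roughly 23% and 30%, so on 83 picks the Drizzle lead is suggestive rather than decisive.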
jofzar | 13 hours ago
deaux | 12 hours ago
codingconstable | 13 hours ago
marcinreal | 7 hours ago
Edit: actually I think it was ChatGPT that recommended Prisma to me.
deaux | 5 hours ago
Their whole mission is clearly "make the already easy things slightly easier, and the hard things harder or impossible". Or really "suck the VC teat until it's as parched as the Sahara". In that sense, Prisma is the exact thing you'd expect to happen with a VC-funded DB package. ZIRP really made them invest into the craziest things.
I like Kysely more than Drizzle, even moreso now with Claude, but Drizzle is fine too. As long as it's not Prisma, and preferably not TypeORM or Sequelize either.
marcinreal | 3 hours ago
kingreflex | 15 hours ago
benob | 15 hours ago
chvid | 14 hours ago
jamiecode | 13 hours ago
toastal | 13 hours ago
robinwhg | 13 hours ago
dan15 | 13 hours ago
assane101 | 12 hours ago
> It's like Bootstrap 10 years ago
What do you mean there ?
h4ch1 | 11 hours ago
[0]: Example of the common "Bootstrap style" https://getbootstrap.com/2.3.1/assets/img/examples/bootstrap...
hal9000xbot | 9 hours ago
claud_ia | 8 hours ago
avocadosword | 8 hours ago
hedora | 7 hours ago
At a minimum, I usually provide some requirements and ask it to enumerate some options and let me pick.
This is like the image generation bias problem where vague prompts for people produce stereotypes. Specific prompts generally do not.
btarmstrong | 6 hours ago
jugg1es | 6 hours ago
comboy | 6 hours ago
oblio | 5 hours ago
It's extremely weird that 40 years after TurboPascal, 30 years after Delphi and VBA, we've only regressed in terms of truly integrated development environments.
Heck, even programming languages have regressed. Python and Javascript are less type safe than Java circa 2005, even though we have had the technology needed to make type safe languages much more ergonomic since then.
lubujackson | 6 hours ago
I think having a pain point and a good concept (plus some eng chops) will result in many more dev tools - that may cause different problems, but in general, I think more action is better than less.
jugg1es | 2 hours ago
qzira | 5 hours ago
The fear of a 3am runaway was real enough that I ended up building a separate gateway layer just to have a kill switch that lives outside the application entirely.
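A minimal sketch of the idea, assuming the gate wraps every outbound call and is given a cost estimate up front. `BudgetGate` and its methods are hypothetical, and this runs in-process only to keep the example short; the point of the comment is that the real thing lives in a separate layer the application cannot bypass.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class BudgetGate:
    """Refuses to forward calls once a spend cap is hit or the switch is thrown."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0
        self.killed = False

    def kill(self) -> None:
        """Manual kill switch: block everything from now on."""
        self.killed = True

    def forward(self, call: Callable[[], T], estimated_cost_usd: float) -> T:
        if self.killed:
            raise RuntimeError("kill switch engaged")
        if self.spent_usd + estimated_cost_usd > self.cap_usd:
            # Trip the switch so later, cheaper calls are blocked as well.
            self.killed = True
            raise RuntimeError("budget cap reached")
        self.spent_usd += estimated_cost_usd
        return call()


gate = BudgetGate(cap_usd=1.0)
print(gate.forward(lambda: "ok", estimated_cost_usd=0.5))  # ok
```

Tripping the switch on the first over-budget call is a design choice: a runaway loop stays stopped until a human resets it.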
"Build vs Buy" is the right framing, but for cost enforcement and kill switches specifically — building it inside the app is exactly the wrong layer.
dipflow | 5 hours ago
kseniamorph | 5 hours ago
snug | 4 hours ago
I'd suggest making some changes to how some of these things are categorized. You have a database section with postgres at the top and supabase as number 2, but that's also a hosted postgres.
Overall, great job to the creators of this, I enjoyed seeing this analysis
oldandboring | 2 hours ago
coreylane | 2 hours ago