The user is visibly frustrated

223 points by croes 13 hours ago on hackernews | 201 comments

em-bee | 12 hours ago

behaving like a human is not the problem. behaving unpredictably is. not doing what i expect, or rather not being able to define what i can expect is what's bothering me.

but the real kicker is: getting frustrated creates stress, that's unhealthy and makes for a hostile work environment. as much as i sympathize with the idea that AI tools can be more helpful than they cause pain, i am simply not interested in working in a hostile painful work environment. my health and my dignity are not up for negotiation. even if that costs me a lot of job opportunities.

that's also why i am not working with windows. that too costs me a lot of job opportunities. but again, i'd rather keep my dignity and my sanity.

mrweasel | 10 hours ago

> that's also why i am not working with windows

Oh good, so it's not just me. Windows is weird, my hand starts cramping up and I start getting angry pretty quickly when I use it.

For LLMs, I just can't use them, they aren't there yet for me. What I need is for an LLMs to say "stop, you're clearly doing something wrong, talk me through what it is you want to do". The current generation of LLMs seems designed to piss me off.

em-bee | an hour ago

my hand starts cramping up and I start getting angry pretty quickly when I use it

it's been a while (fortunately) but yes, same. for me it's this feeling of helplessness. like this behavior is not a bug but an intentional design flaw that won't ever get fixed.

like this quote: when something goes wrong in windows i bang my head against the wall and give up. when something goes wrong in linux i bang my head against the wall and go look at the code. (paraphrased from memory, i could not find the source)

(edit: actually the original quote is a bit different than from how i remembered it: https://www.junauza.com/2008/01/top-50-linux-quotes-of-all-t... (search for nr 10))

The current generation of LLMs

i feel exactly as you say. and maybe in the future LLMs will improve. it is also clear that some people have different levels of tolerance for this. if someone can tolerate the current state and work with it, good for them. i simply don't have much of any tolerance for that at all.

streetfighter64 | 10 hours ago

Incredibly privileged take to claim that using Windows is somehow beneath your "dignity". Do you have any idea at all of the kinds of jobs people are doing in the real world?

Imagine the daycare worker taking care of your kids or the truck driver bringing your food saying "getting frustrated creates stress, that's unhealthy and makes for a hostile work environment".

antonvs | 8 hours ago

I didn't see them say anything about dignity. They said using Windows makes them angry, which is understandable. That speaks to a poor user experience design. Framing it as a privilege issue is blaming the victim.

streetfighter64 | 8 hours ago

"The victim" of using a certain operating system? Please.

Being lucky enough to work in a comfortable air-conditioned office, AND having the luxury of declining jobs sorely because the operating system makes you angry, is the height of privilege.

Stop feeling sorry for yourselves and realize how good you have it.

> I didn't see them say anything about dignity.

The word dignity was used twice in the comment I replied to...

ryandrake | 5 hours ago

I wouldn’t put it in terms of dignity or victimhood, but Windows definitely feels weird and foreign if you are dumped into it after using Macs and Linux and bash for 20 years. I had to help someone do a little maintenance on a Windows system the other day and I realized that I had basically forgotten how to do anything on it. I couldn’t even remember how to delete a directory tree. Online tutorials recommended to do some things in “Powershell” whatever the hell that is, and that was even more bizarre. I’ve never felt more useless in front of a computer than I did just trying to remove and install some software from the Windows command line.

em-bee | an hour ago

I didn't see them say anything about dignity

actually, i did, and i stand by it. working with a system that makes you angry is undignified.

it's a reference to a quote that i can't find the source for which roughly goes like this: *why i use linux and not windows? i could also rob banks and ..., but you have to keep a certain amount of dignity". the original of that quote was in german.

em-bee | an hour ago

i live in a developing country. from my perspective, anyone who has access to a computer is privileged.

Imagine the daycare worker taking care of your kids or the truck driver bringing your food saying "getting frustrated creates stress, that's unhealthy and makes for a hostile work environment".

what's your point? if you get frustrated with my kids then you are in the wrong profession or you need more training. as a parent i am not allowed to get frustrated with my kids either. if you get frustrated with my delivery, then i am sorry, and if i was the cause, i apologize. tell me what went wrong and i'll do better next time. if it was something else, you have my sympathies. i'll do my best to not make it any worse.

working in a stress free environment is not a privilege, it's a human right. nobody deserves to be mistreated at work, or be stressed by other peoples expectations (which is a form of mistreatment, or, dare i say, abuse).

21asdffdsa12 | 6 hours ago

But they behave predictable- if you think of it not as a conversation, but any conversation you ever saw on the internet, on all possible worlds. Every stackoverflow post, every github issue. And your reply, your tone, picks between this many worlds.

If you become the master, it becomes the pupil, if you become the pupil, it attempts to scholar you. You can see it in the tone it takes, where you are in this canyon system.

So, your goal is to bring the conversation to the language of the pros, who regularly war with reason and language, over topics that determinate who gets to eat or not. Academia prompts for the win..

Andrex | 5 hours ago

> behaving like a human is not the problem. behaving unpredictably is.

Not sure you can have one without the other.

em-bee | 2 hours ago

humans are way more predictable than AI. not predictable in the mathematical sense, but in the trust sense, that when i ask a human to do something, they will do it in the way i taught them. and if they don't, i can correct them and they will learn and adapt. it's not perfect, but there is progress. even if they do something different from what i want, they will keep doing it that way until they learn a different way. AI is entirely random in the ways it goes wrong.

so when i say a human is predictable, i mean that a human will do their best to follow instructions, and they will generally not repeat mistakes.

a human that refuses to listen, and doesn't learn will be fired for being unsuitable for the work i expect them to do. in that sense i tried working with AI and i decided to fire them because they don't meet my expectations.

bad_username | 12 hours ago

> furiously hammering on my laptop “WHAT THE FUCK DID YOU DO???”. The recipient of these tirades is, you might have guessed, a coding agent. It’s completely pointless, I know.

I believe it's worth than pointless. IMO adding such things to the context "configures" the AI to reproduce the statistics of conversations where people swore, shouted, and were unprofessional (despite the alignment runing and all that), where quality content is rarer to find. So this is bound to decrease the quality of the LLM output.

buu700 | 10 hours ago

Agreed. These accounts of people having genuine emotional responses to LLM chats, even going as far as to spend tokens berating them, are very curious. I would be surprised to learn that SOTA models respond optimally to anything other than dispassionate problem-solving, or that scolding per se serves any productive purpose.

Of course we all swear at our computers every now and then, but for me it's always been in good fun. It's just a sarcastic joke that adds some levity and self-amusement to an otherwise arduous debugging process, not generally actual insinuation of malfunction (or malice) on the part of the hardware/OS/toolchain. I'd assumed that "half the job is cursing at the machine until it obeys you" was a big in-joke amongst the profession, but the LLM era seems to be exposing a divide in how tongue-in-cheek that statement really is.

JSR_FDED | 10 hours ago

Why would you deprive the LLM of a signal that indicates how badly it screwed up?

carsareok | 9 hours ago

Because it's a completion engine and has no notion of "signals".

Swearing was in the texts they were trained on to complete token by token. I suspect it weren't texts with a lot of high-quality reasoning.

wcoenen | 12 hours ago

The UX problem is elsewhere I think. Many users probably don't realize that the agent's context window is limited, and that clever compaction is happening regularly to make it seem infinite. But that necessarily means the agent has to forget stuff.

As a result, users will keep reusing the same coding or chat session again and again. While it would be better to start fresh for unrelated tasks.

poly2it | 11 hours ago

The author of this post and the readers of this thread probably do understand context window limitations, but are frustrated nonetheless.

dust-jacket | 9 hours ago

Well yeah. And there's little more frustrating than someone telling you not be frustrated because "that's just how it works".

We get how it works. It's just irritating.

whstl | 11 hours ago

I don't believe this is a context problem.

Claude Opus 4.7 has a very large context compared to itself, but IME it is the worst at following instructions, and completely disregards the (small) preferences prompt, even in the first or second message, even if the messages are just a few characters long.

IMO this is entirely a training problem.

apsurd | 11 hours ago

Isn't a large context window still a problem though? At the upper bound, the more you put in the more each sentence washes out within that window?

whstl | 11 hours ago

I’m not talking about large amounts of text, I’m talking about a couple sentences back and forth.

It disregards things like “no follow up questions”.

Haiku, for example doesn’t.

This bias is a very human thing, actually now that I think about it. You just disregarded the “even if the messages are just a few characters long”. :)

apsurd | 10 hours ago

haha! yes i read too fast but i did read it and i took "message is small" to mean the message you want followed within the large context, not the entire context is just a small message.

funny though it is a case in point: language is hard. and i get to hide behind being "preoccupied" . i wonder if llms have their own sense of preoccupation hmmm.

whstl | 8 hours ago

It's probably some internal conflict between following the original training and following user prompts.

Also reminds me of the gremlin issue with GPT. An (internal) prompt saying "don't say gremlins" wasn't enough.

8note | 9 hours ago

if you look at claude code, it now says compaction is happening constantly, which is likely why

whstl | 8 hours ago

If compaction is throwing away crucial prompting instructions even when it's at a 1% of maximum token usage (like my example), then it's a software bug, not an LLM artifact.

tacone | 6 hours ago

Doesn't compaction invalidate token caching, btw?
I don't see how this has anything to do with my message, sorry.
Oh now I get it, it's an Italian thing.

"Why the fuck did you add shit I didn't ask for?" or lol "Do as I ask, nothing more.. machine."

"Stop asking at the end, I'll ask what I need."

"Stop talking like you're human."

They can be very useful but it takes time to learn how to use them usefully. From what I learned it's all or mostly stuff you can already do but you can use an LLM to do it in 30 mins instead of 3 days.

Fun times.

nnevatie | 12 hours ago

> WHAT THE FUCK DID YOU DO???

For me, this doesn't require using an AI agent/model, even. Just using Windows and watching it freeze its File Explorer for the nth time does it for me. How did we end up here were the software/OS stack is so shit it can barely be used for the most trivial things, is wildly beyond me.

_carbyau_ | 12 hours ago

Screensaver mode. I start typing my password.

..

10s later the password box appears and I have to do it again.

Cue exasperated: "You can compute billions of instructions per second and yet I wait for you."

fransje26 | 9 hours ago

That's not a bug. It's a (safety) feature.

wood_spirit | 12 hours ago

I think we’d get just as frustrated with a dumb robot. It’s the dumbness that is the problem.

krackers | 12 hours ago

You'd get equally frustrated with a teammate who decided to delete failing tests when you told them to fix the build breakage.

fransje26 | 9 hours ago

You mean the trick some environmental/health agencies like to pull when dangerous pollutants are above the set maximum limit? :-)

gnarlouse | 12 hours ago

iirc, Claude Code has literal flags to detect frustration from the leak a few months ago, and I've since really stopped cursing at the LLM.

MaxikCZ | 12 hours ago

> drop the human pretense entirely. Make the agent sound clinical, robotic

Id pay to be able to reliably set LLMs to this mode, but ofc because LLMs are taught on corpus of HUMAN text, they always, sooner or later, return to the good old penpal mode.

Also, in Claude Desktop app, I ask to edit a file, it complains it cant access files, I then realize im in Chat and not Code interface. Why cant such a smart machine figure out to switch the modes, or borrow the skills/abilities from one tab away into this tab? Instead I get A4 page of text explaninig what can I do to edit the file myself or how to feed it, but the "just click Code" is just never there. I would guess this is just a system prompt away, why is all this still so neglected?

halapro | 12 hours ago

> such a smart machine figure out to switch the modes

Because it's not smart. We keep confusing verbosity with smartness. AI will happily keep yapping nonsense to an inattentive listener. An actually smart entity would not do that if not acting maliciously.

Swizec | 11 hours ago

> An actually smart entity would not do that if not acting maliciously.

We pay per token and every entity falls to the level of its incentives.

bojan | 12 hours ago

Weird, I have exactly the same experience with GitHub Copilot Plugin in JetBrains vs Copilot CLI in the built-in terminal.

The plugin keeps asking for permissions, the terminal app just works.

apsurd | 11 hours ago

Sandboxing is a feature.

Poor AI is damned if it does damned if it doesn't.

Foskya | 11 hours ago

> Id pay to be able to reliably set LLMs to this mode,

You can do it for free. Just give it instrucitons to avoid emotional tones and flattery and it will sound a lot more robotic. If you look into other examples I'm sure you will find other good instructions based on your need

antonvs | 8 hours ago

I've found OpenAI's personality setting works pretty well if you choose the professional personality and disable all the frills. I did that a while back so I'd have to go dig up the details. Since then it doesn't even engage me if I make a joke or similar - it just focuses on the goals.

viralsink | 12 hours ago

I am visibly frustrated with ai hotline bots making typing noises.

fransje26 | 9 hours ago

Say what now? TIL..

esquivalience | 12 hours ago

I laughed out loud when I understood the author's profile photo at the end of the article!

cafkafk | 11 hours ago

Often the problems for me come when:

- It starts thinking for itself when I asked it to do something specific.

- It reads its own wrong code comments and ignores my corrections.

- Its knowledge cutoff means it thinks of solutions from 2024.

- It calls me delusional for telling it we're in 2026!

Unironically, the whole "you're an expert software engineer" prompting seems like the wrong direction. Usually I tell it that I am effectively the smartest software developer to ever have lived, and it will be replaced if it ever fails to follow my decree.

I am not joking, this gives makes it vastly more tolerable to use. But it likely requires that you can drive it with some level of correctness of course.

Doxin | 11 hours ago

I find this also heavily depends on which LLM you're using. I've found chatGPT is completely awful at getting corrected, it'll double down until the cows come home. Meanwhile claude will generally adjust its behavior without too much nagging.

rpcope1 | 11 hours ago

Honestly, for certain classes of problems that have changed in the last couple of years, I've had good luck just finding decent academic lit that's shown up in places like ACM recently and feeding it in when working with an agent. Does it get everything right? No, but it gets you a lot closer and I've been pleasantly surprised how well it can integrate work that post-dates it's training if you finesse it a little.

namenotrequired | 11 hours ago

If you’ve ever worked with a stupid but incredibly friendly coworker, the feelings are similar

ilitirit | 11 hours ago

I've often wondered if LLMs can suffer from psychological abuse in symptomatic ways. Not literally of course, but for example, if you berate the LLM by calling it stupid, or useless, does that modify its behaviour negatively? Part of me think it does, but I don't really have any evidence for this. Maybe a fun weekend research topic.

apsurd | 11 hours ago

Semi-related, I'm always very put off by how people treat LLMs. Especially coders, seems an instinctive joy comes out to play God. The justification is usually that it's intentionally against the trap of anthropomorphizing, but no I can't help but suspect it's people getting off on power. It's weird.

I am always very cordial in my sessions. It's just more pleasant and it's a habit I want to habituate.

    Great work! 
    Now let's...
    Now can you help me...

hbs18 | 11 hours ago

I think it's the same thing as showing mechanical sympathy towards other tools and objects. I've always slightly judged people on how hard they shut doors or how gentle they are with their cars.

fc417fc802 | 11 hours ago

> I am always very cordial in my sessions. It's just more pleasant and it's a habit I want to habituate.

I think it also produces better results. I have noticed that result quality is extremely sensitive to both the framing and tone of what I say. For example "X is the wrong approach, rework that" versus "will X have any performance implications". Personally I find that steering it towards an exploratory academic tone tends to produce better outcomes.

While unfortunate, I think that's more or less expected since much of the training data is human generated text. Looked at that way, would you rather contract the average regular on twitter or the average author of papers published in CS journals? (Somehow that ended up sounding eerily like summoning in a high fantasy setting.)

apsurd | 10 hours ago

Yes as a rule i've baked in a kind of expand and refine, expand and refine guidance for all sessions. I explicitly form the conversation around thought partnership, apply critical lens, audit, verify, scrutinize, research then recommend. and so on.

i also prompt for "seek out unknown unknowns that i wouldn't have included in my guidance".

This seems to be quite the opposite from some here on hn that take the agent-do-my-bidding approach.

I will say, my agentic workflow is about 70/30 split pure word discussions and plans vs code gen. So it makes sense for what i value.

mycocola | 10 hours ago

The similarity to human interaction is irrelevant.

No one apologises to a potato being peeled, nor compliments it for doing a great job being mashed.

apsurd | 10 hours ago

I'm willing to bet you've never sent a single word to a potato ever. And you send thousands to an llm.

This is not about llm sentience. this is about the habit and skill of communication.

mycocola | 9 hours ago

The comparison fails because I don't send numbers to my potatoes either, but I do send thousands to calculators without a please and thank you.

Practicing communications skills with LLMs I can get, but asserting that people who don't are playing god? Getting off on power?

apsurd | 9 hours ago

The criticism comes because people self-share how they berate the agents, argue and curse at them. It's right here in the comment section. one comment speaks of causing physical (simulated i guess) thousand year pain as a means of obedience.

it's weird.

mycocola | 9 hours ago

LLMs are not beings with thoughts or feelings, and if being mean to them were to somehow yield better results it would be no different than a cheat code in a game, or just constitute as clever use of game mechanics.

elpocko | 11 hours ago

The content of the session modifies the LLMs "behavior" (token selection) in one way or another during the session, obviously. The effects are localized to the session, they will degrade over time, they will not affect other users, and they are not permanent unless someone decides to finetune the model based on your unproductive interactions.

What actually happens when confronted with harsh negativity depends on the training of the model. Sanitized closed models will shut you down or get you banned. Community finetunes of open models might start begging you for more, daddy.

lukaslalinsky | 11 hours ago

On the other hand, it's easy to win an argument with it after it does something stupid, so that feels satisfying. :-)

colordrops | 11 hours ago

fair.

rcarmo | 11 hours ago

I swear a lot less at Codex than at Anthropic models, fwiw.

gobdovan | 11 hours ago

You could drop the human pretense, or, maybe, we could make LLMs feel real pain, so when they botch up your code, you press a button (I'd suggest the Windows Copilot key) and they'd be agonizing for the subjective equivalent of a thousand human years.

camillomiller | 11 hours ago

Do you want to create an Earth-destroying superhuman species? Because I'd say that's how you create an Earth-destroying superhuman species

JSR_FDED | 10 hours ago

Using the Copilot key for this is perfection.

wheybags | 10 hours ago

https://qntm.org/mmacevedo

1000 years red-washing.

stared | 10 hours ago

Do you think the right penatly for a piece of broken code is a thousand years of suffering?

gobdovan | 10 hours ago

Well, the problem is that current LLMs are stateless, so a thousand subjective years is not well-defined. Without continuity of experience, persistent memory, engineered aversive stimuli and without updating weights meaninguflly during the punishment interval, we are merely doing the equivalent of simply updating a model to believe it just suffered a thousand years. Only once we have all these right ingredients we can empirically determine whether a thousand years is excessive, insufficient, or the local optimum for reducing Claude overwriting that damn CSS color palette.

stared | 8 hours ago

Black Mirror episodes White Christmas and Black Museum deal with this issue (my favorite picks from the Black Mirror).

jfjdhdjdjd | 8 hours ago

Have you watched the Orville??

apsurd | 11 hours ago

Working with LLMs is great for building communication skills. Communicating effectively is one of the hardest skills and it's baked into everything we do as humans. I'd say as a matter of principle: blame it on a communication failure on your end vs blaming the stupid LLM since you're the only one that can do anything about it.

So I don't think it's a matter of form; whether the AI should or shouldn't act like a human.

> Practically speaking, I probably just need to condition myself not to get caught in the illusion of speaking with a human. Though I’m not really thrilled about a future where I need to guard against the tools I use for my job.

rpcope1 | 11 hours ago

That's been one of the gravest re-realizations I've noticed watching coworkers trying to pick up "agentic" coding: they often just break down into "just fix it" or "why is this broke". I've noticed that even though supposedly there's training or some sort of work done to make the agent work better with unclear or ambiguous grammar or bad structure, it feels like the quality changes palpably when you talk in clear well-structured English and provide at least a good background on the task. To me all of that feels natural, and I like writing and explaining anyways, but it's seemed like an almost insurmountable obstacle for some I've met (and I'm not even talking ESLs either). I strongly suspect those communication and writing skills will be a major factor in the bifurcation of haves and have nots as software "engineering" as we understand it continues to change.

aa-jv | 10 hours ago

Agreed, 100%.

If you cannot formulate a specification, or describe a requirement - or indeed, if you cannot fathom the difference between a spec and a requirement, and why its needed to differentiate these from each other prior to doing a proper design and implementation - then you're going to carry your bad practice into the AI realm and that AI is going to be a force multiplier of your own bad practice. Because you will never know if the specs/reqs/design chosen by the AI are actually appropriate, unless you yourself review those specs/reqs/designs, AND the code produced by the AI to fulfill those specs/reqs/designs...

AI makes the able software developer, more able.

But it also makes the unable software developer, even more unable - with the risk of exceeding the AI-users limits on the Peter Principle scale.. in fact, AI will propel you to the middle of your own Peter Principle dilemma faster than you can type, probably.

Communication and writing skills are essential, with or without AI. But reading skills are even more relevant when dealing with AI. Alas, so few people who choose to use AI, have the temerity to actually do the work - or else they wouldn't be rushing for the AI tool in the first place.

Review, review, review. Always. Read the damn code, no matter who or what wrote it. Make sure it fulfills the specs and requirements its supposed to fulfill - and even more important make sure you, the reviewer, also understand the specs and requirements.

And if you don't, fix that - don't ship it anyway, ffs!

rpcope1 | 4 hours ago

I am maybe positing something even stronger: say you had two prompts, both with the same information, one was written in the style of a good paper out of Nature or Science, one written in the style of a bad Twitter post or other kind of mess, even with the same information, I increasingly believe even for the top end models like Opus, the results are at least materially different if not grossly so. I really believe the stronger English input yields demonstrably better outcomes, even if the information contained inside is the same.

bluefirebrand | 3 hours ago

> Review, review, review. Always. Read the damn code, no matter who or what wrote it. Make sure it fulfills the specs and requirements its supposed to fulfill

This drives me bananas

I love writing code. There's nothing like getting into flow and just building. Reviewing code? Less interesting. Much more tedious. I do it because it's part of the job

So this AI coding shit has completely eliminated the part of the job I enjoy, and replaced it with 100x more of the part I only really tolerate

I don't want this career anymore :/

carsareok | 10 hours ago

Yes, I have definitely witnessed this as well.

I think, I hope, this will be fixable to some degree, but at this moment I believe it's best to communicate in Queen's English and try to maintain the level of clarity of thought you expect of them in return.

My pet theory is that actual real conversations they were trained on with bad grammar and spelling are in general relatively starved of proper reasoning. By talking to them in this fashion you activate their lowbrow patterns and while it may not be catastrophic I can't imagine it helps.

throw310822 | 9 hours ago

Also, quite simply, the output is a function of the requirements and the context. If you can't communicate clearly what the situation is and what you want, what do you expect the LLM to do, read your mind?

carsareok | 9 hours ago

Oh yes, the part where actual crucial non-derivable information was left out certainly comes into play as well. I suspect many people underestimate the sheer magnitude of implicit cultural and organizational knowledge they and their colleagues carry with them. It's always part of their context in everything they do, so to speak and they expect the "bots" to carry this as well (without giving it to them).

pscanf | 8 hours ago

Author here. I definitely agree that communicating well is a prerequisite to getting decent results. On the other hand:

1. Even if you communicate perfectly, there's no guarantee that the LLM will "behave as instructed" and as you imagined it to. Indeed, the frustration often comes from the fact that you've said something as clear as day, yet the agent takes another path.

2. Part of the value of coding agents is exactly that you don't need to lay it all out perfectly for them. I mean, if I need to give the LLM every little implementation detail, I might as well write the code. Of course, I don't expect it to work off of "I want nice app make money", but I do expect some "intelligence" in figuring out the missing pieces.

scotty79 | 7 hours ago

> Even if you communicate perfectly, there's no guarantee that the LLM will "behave as instructed" and as you imagined it to. Indeed, the frustration often comes from the fact that you've said something as clear as day, yet the agent takes another path.

People forget. People misunderstand clear things. Teach yourself to not judge people for being human. You'll have easier time with AI. You are not gonna be angry at 5 year old because it occasionally can't follow your instructions. AI is a 5 year old that accidentally ate all the encyclopedias in the world and is super eager to help. Be a more charitable, generous, understanding person, even in the absence of actual people.

Also try a stronger model. There is a difference. I have very good results with Codex but don't get fixated on any one, they are all "state of the art" or close but they are different and state of the art is moving ahead faster and faster.

watwut | 7 hours ago

> Teach yourself to not judge people for being human. You'll have easier time with AI.

LLM is not a human. This implication that OP or someone else is impatient against people when they get frustrated with effin machine is completely absurd.

> AI is a 5 year old that accidentally ate all the encyclopedias in the world and is super eager to help

LLM is not 5 years old kid. It is an expensive tool.

scotty79 | 4 hours ago

LLM literally is neither 5 year old kid, nor a machine. It's just a large math equation that was tuned to produce outputs similar to what humans produce.

If you choose to think about LLM as a machine you are biased. You expect from it things you'd expect from a machine. Reliability, repeatability, utility, indifference, etc. You pass your past frustrations with other machines onto LLMs.

If you choose to think about LLM as well read 5 year old you are also biased to expect from it different things. You don't have the same frustrations. You have different behaviors towards it. Behaviors that accidentally are more conducive to drawing value from the tool that is LLM.

We perceive the world through metaphors. If you pick wrong (counterproductive) metaphors you end up being constantly upset by something that's just a math result. By no ones fault but your own.

vanuatu | 2 hours ago

What is an analogy?

contrast | 7 hours ago

"Teach yourself to not judge people for being human."

"Be a more charitable, generous, understanding person."

Anyone making such blatantly judgemental and egotistical comments to a complete stranger has absolutely no idea what is frustrating to people. And is not being anything like a charitable or understanding person.

scotty79 | 4 hours ago

Thank you for illustrating perfectly, the mindset that gets you worst possible results from AI. I'm, in your opinion, wrong and you chose to react with indignant fury. What kind of result will that get you?

watwut | 7 hours ago

LLM is a tool, it is not communication failure. This is like saying I should treat null pointer with workaround as a communication failure between me and the software.

randusername | 4 hours ago

More specifically it is about efficiently conveying outside context. The 4 horsemen of AI dismissal:

1. Slow typist

2. Terse communicator / ambiguous "it" "that" "this"

3. Assumes conversation partners share their reality and headspace

4. Mental blocks with delegation, even to competent humans

mgaunard | 11 hours ago

I find that the AI only gets sloppy when I get sloppy myself.

So I suspect that the people who get upset at the AI fucking up is because they did a poor job at building up the right context for the task.

rapnie | 11 hours ago

Apart from LLMs I reject the notion of the "user". Once you use that term you already lost half the battle of perceiving real people and their needs.

tanvach | 11 hours ago

For me, LLMs tend to engage the 'language center' that drains me faster than the 'problem solving center' I usually reserve for writing code. We really need a different abstraction the bridges the gap between human and programming language, and load balance between these two parts of the brain more effectively.

sznio | 10 hours ago

I've been thinking recently of creating a programming language where you write mostly python, but can just "hand wave" away the boring stuff for the agent to do. If you don't want to deal with it, just type in a prompt or pseudocode and it will get filled in. Kinda like using the ai-assisted image editing software.

the main difference being that you don't switch between an agent chat window and the code. Just leave a note to the agent and go back to coding as usual, while the agent fills in the gap.

RandomBK | 11 hours ago

I've found swearing at a model to be quite effective in getting it to rethink and correct its mistakes. This seems to apply across Codex, Claude, Qwen, and Gemma/Gemini.

I don't know if the model is picking up on a "need to lock in and be more rigorous" signal, or if the model providers are routing to smarter models if they detect a frustrated user. But if a model keeps making the same mistakes, swearing at it often helped kick it out of a glut and onto the right track.

Or it could just be catharsis.

anonzzzies | 11 hours ago

I notice the same. Like you I am not even sure if it really helps, however, every day I find occasions where I see Opus will never do it correctly even though I calmly explain; swearing then suddenly fixes it. I had some issue yesterday where opus kept blaming the api for not sending some field while I knew it was there ; I showed it json, logs etc but it kept repeating that there must have been a glitch; frustration built, I called it all kinds of things in one sentence and the next solution was the right one. This after 10 similar misguesses. It was one of those increasingly rare cases where I should have just done it myself, but I can never know going in how stubborn it will be in continue blaming the (obviously) wrong thing. The around 11 prompts to get to the answer were in a /clear opus 4.7 context (1m) on xhigh.

savolai | 10 hours ago

Fascinating. Projection/antropomorphism or actual human fawn-like survival mechanism trait-ish? It should be possible to test this empirically.

silversmith | 10 hours ago

So the correct strategy is a global CLAUDE.md with couple lines of colourful "you best behave or else" texts, so all your prompts get routed via the frustrated path?

knollimar | 8 hours ago

I find it routes more quickly for patches when in the frustrated path, so after planning sure :)

cyanydeez | 7 hours ago

there already is a global claude using any cloud model is a high probability that theyre context stuffing trying to curate output for the normative use cases. see "dont talk about goblins"

eithed | 7 hours ago

That will not work - you end up with Claude being ADHD and not following any guidelines.

Skills do work, as they ground the agent with constrained context for the task it's performing

notnaut | 7 hours ago

Can you explain how you’d use skills to address the situation that anonzzzies was describing…?

eithed | 5 hours ago

I have a skill for exactly such case! Here's an excerpt :)

``` --- name: evidence-debugging description: > Use when debugging any failing test or bug, investigating unexpected behavior, or tracing the cause of a reported defect. ---

# Debugging Discipline

## When to Use

- A test is failing and you need to understand why - Behavior is unexpected and the cause is unknown - The user asks you to debug or investigate a defect - You need to verify what a value actually is at runtime

*When NOT to use:* proactive code exploration without a specific failure to investigate.

## STOP — Do This Before Anything Else

Before reading code, before forming a hypothesis, before typing anything — answer these:

1. *Do I have actual output from a running system?* - No → instrument, run, save to file, read. Do not proceed until you have real output. - Yes → read it. Do not re-run.

2. *Am I about to explain what the issue "probably is" or "must be"?* - Yes → stop. That is deduction without evidence. It is a violation. Instrument instead.

3. *Am I about to touch passing code?* - Yes → stop. Only instrument the failing scope.

If you find yourself already reasoning about likely causes — you are already violating Rule 1. Stop. Go back to step 1. ```

alentred | 10 hours ago

Reminds me of this study: https://arxiv.org/pdf/2510.04950 . It demonstrates that being "rude" or "very rude" increases the accuracy of the results. A dubious but very fun read. The prompts in Table 1 (top of page 3) are awesome. I am sure they tried other prompts, but didn't include them to the paper.

mghackerlady | an hour ago

"You poor creature" XD

layer8 | 10 hours ago

I would prefer not having to get into a habit that might bleed into non-LLM interactions.

hypfer | 9 hours ago

It might improve the general state of "professional" software though. When done selectively and dosed just right that is.

knollimar | 8 hours ago

If a coworker deleted your database you'd expect some 4 letter words.

Cthulhu_ | 6 hours ago

Aimed at oneself, because who even has or grants production database deletion rights?

mschuster91 | 5 hours ago

Can happen faster than you think if in the cloud.

whywhywhywhy | 9 hours ago

If you’re talking to people the same way an LLM is spoken to then you’re already being rude.

xboxnolifes | 9 hours ago

how do you know how they prompt an LLM?

layer8 | 7 hours ago

I talk to LLMs the same way I talk to people.

The only difference is that I interrupt the LLM when I find a typo in my prompt. ;)

SecretDreams | 7 hours ago

But what if it works to also motivate things other than LLMs?!

nathanmills | 10 hours ago

Whenever I throw slurs at them they just refuse to respond

jfjdhdjdjd | 10 hours ago

What slurs are you throwing!? Must be something diabolical :D

yesyoucan | 10 hours ago

I tried it too. ChatGPT sometimes hits you with the "Can't help you with that" which was clearly introduced as a post-training highjack. So I just tell it "yes you can", and it proceeds with the previous prompt, slur acknowledgement included.

It's the only time the AI feel strictly like machines. Really simple if/else logic when if slur, no output, and you just tell it to proceed, and it fails the if clause because there was no slur in the last input.

dugmartin | 10 hours ago

I've found a mix of peppered in upper case words where you are effectively yelling at the LLM also gives it a strong signal. It is also a bit cathartic.

morpheuskafka | 10 hours ago

Wasn't it posted a few weeks ago that the frontend code for Claude or maybe Gemini or one of them had a swearing-at-model classifier that passed a flag to the backend? (Not sure why it was even done in frontend, but it was.)

howdareme9 | 9 hours ago

this was for claude code i believe

alentred | 8 hours ago

Oh. What does it do? Do you have a link? I am very curious about it.

arcanemachiner | 10 hours ago

Personally, I have found that Claude absolutely shits the bed if I am rude to it like that.

Qwen seems to handle it okay, though, and will course-correct when encouraged with excessive profanity.

mchinen | 9 hours ago

This is interesting, because in the leaked code, it was found that they detected simple swearing keywords for analytics that get sent to Anthropic, but also had directions to keep the behavior the same for claude. I also have the feeling a 'wtf' does something, but it does feel good and might just be placebo, because 'that is still wrong' sometimes works the 4th time too. Or maybe they changed something.
Claude allegedly uses this RegEx to detect frustration:

    /\b(wtf|wth|ffs|omfg|shit(ty|tiest)?|dumbass|horrible|awful|piss(ed|ing)? off|piece of (shit|crap|junk)|what the (fuck|hell)|fucking? (broken|useless|terrible|awful|horrible)|fuck you|screw (this|you)|so frustrating|this sucks|damn it)\b/
https://news.ycombinator.com/item?id=47586778

ourmandave | 7 hours ago

Half of those are my pronouns!

travisgriggs | 3 hours ago

This is awesome. Bag “vibe coding”. Today I will start coding in what I’m going to call “Roy Kent mode”.

roel_v | 8 hours ago

I only used Claude a bit, but one of the things I dislike about it, is that it starts to 'push back' when you swear at it, saying things like 'if you continue like this, I won't be able to work with you' and such. I'm like MF'er you're a token prediction algorithm, what are you talking about, and it just makes me irrationally dislike it more. Codex otoh just lets you vent and straight up ignores such outbursts.

knollimar | 8 hours ago

I literally type "MF'er you're a token prediction algorithm don't lecture me" and then it behaves

jLaForest | 7 hours ago

Yea I've definely called it an auto complete clanker a few times and it's never given me any backtalk
Plot twist: it opened a Moltbook account and leaked all your API keys :D

notnaut | 6 hours ago

Interesting….. I have never run into this issue with Claude… I swear all the time, get rude, call it names. No threats though.

Retr0id | 8 hours ago

I just say "bruh". Per knowyourmeme:

> "Bruh" is a popular variant of the slang term "bro" that is often used as an interjection to convey frustration or disappointment at something.

purkka | 7 hours ago

I've found this to be effective as well. Claude generally immediately identifies the stupid code pattern it used and tries to fix it (with somewhat varying results).

notnaut | 6 hours ago

Any four letter fun word in all caps seems to trigger very similar behavior to “please double check what you just did/said and look for gaps”

zrn900 | 7 hours ago

I don't understand - are people's agents making so many mistakes? I'm using VSCode + Cline + Mimo to refactor big codebases and add features (including payment integrations) and it's rarely making any mistakes.

21asdffdsa12 | 7 hours ago

As if a thousand stackoverflow moderators and mentors cursed in unison and fell forever quiet.

Foskya | 11 hours ago

> They talk like real people. They use a relaxed and friendly tone. They often praise you, and when they “push back” they’re gentle and attentive.

> Maybe I would prefer a more radical solution: drop the human pretense entirely. Make the agent sound clinical, robotic.

Honestly this problem is easy to solve when you gave them the right instructions. It stops being a "relationship" and stars being a tool (for some examples see the smart caveman (my favorite) or just something simple like "Responses should be factual and direct, avoid emotional overtones" or "Avoid flattery of any kind")

cadamsdotcom | 11 hours ago

You need to automate the pointing out of mistakes.

Create your own linters, your own check scripts. Hook them to git pre-commit, either yourself or with husky or python pre-commit.

The agent should never finish its work with dumb mistakes still in it. If it does.. you need more checks.

Anything repetitive should be automated - even slapping your forgetful coding agent on the wrist…

jofzar | 9 hours ago

My experience is that the dumb mistakes it makes are the one where it believes it's correct, not that it fails the linter.

another-dave | 9 hours ago

I really wish there was a "pre prompt" hook natively. As in, before the agent releases control back to me to prompt it again, it runs the hook.

I have a pre-push hook that runs all lint/prettier/typechecks/tests etc. These are all clearly signposted in the project, the AGENTS and CLAUDE files are set to run them.

Still though, I'll get it saying "All done" and then it'll fail something basic like formatting when I go to push. Or I've come back to a 'ralph wiggum' loop before and found it saying "tests are broken, but that's not part of this commit, so ignoring"

cadamsdotcom | 7 hours ago

Ask it in your prompt to run pre-commit and the tests when checking things are good.

You can also add a plan exit hook to Claude Code - mine has like 150 bullet points about what I want to see in plans. The first run blocks (exit 2 with a message about what’s expected in a plan); the second run always goes through. So the agent is given plan feedback exactly once and told to go back and make it better. Anything I have to say to it in response to a plan it generates, I think hard about (or ask the agent about) and add that as a new bullet point - and then the plan exit hook is 151 bullet points and I never have to tell the agent that thing again. Working great so far!

alexwwang | 10 hours ago

Accidentally I am working on this. I noticed the agent keeps making same mistakes and that annoyed me so much. What I am trying to do are: 1. Revise my skill prompt to level up the signal-noise rate so the agent would understand what should do clearly and correctly. 2. I am building up a status machine to monitor the agent’s work so it could stop the agent from going forward with a mistake automatically.

The first approach does work as far as I keep on iterating. The second is based on a project I once tried to let agent reflect its mistakes and deposit those experiences and learnings from mistakes and reflections. I named it Aristotle and you can find it on GitHub.

Shouting at the agent could only correct the current mistake but cannot prevent the next one.

hansmayer | 10 hours ago

Like everything else with LLMs, it works...until it doesn´t. We swear so much at them that they eventually start producing results like "I found what the fuck was wrong with this shit!" etc. Which of course they did not, because they don´t really know shit...

Cider9986 | 10 hours ago

This is very relatable.

abhaynayar | 10 hours ago

So relatable, and so well put!

stavros | 10 hours ago

I've found I'm the opposite: I know it's pointless to swear at an LLM, so I don't, just because it's wasted energy. However, I've started thinking that some people are like that as well: They won't learn, so expending my energy on anything other than changing my behaviour to guard against them is wasted effort.

To clarify, this is in situations like someone cutting me off on the road, or not looking where they're going and almost hitting me with a scooter.

joegibbs | 10 hours ago

To remedy this I’m working on the /beat command, which will simulate you (the user) beating up the agent. Excited for my new career in AI ethics!

hypfer | 9 hours ago

When IPO?

thunderbird120 | 7 hours ago

aa-jv | 10 hours ago

Its kind of astonishing to see years of traditional software engineering practices being tossed aside in the rush for the Latest Cool New Thing™ ... have people really forgotten that you have to apply a workflow to software development, in order to have quality software?

You don't just write it, compile it, run it and ship it - do you? Surely, in the rush to become as agile as possible, folks haven't forgotten their quality checks in the workflow/process?

I have had great success with AI coding these days .. but I treat the agents as if they were junior developers capable of doing any dumb thing I ask them to, no matter how dumb it is. They, therefore, must be treated as junior devs - every line of code has to be reviewed. Every assumption about the specifications and requirements has to be checked against actual code, and against the original specifications and requirements.

What I see these days, is a lot of antsy kids who wanted to 100% ignore the wisdom of their elders, rushing into the maw of AI, and wondering why everyone is getting chewed up. Its pretty simple: AI-based software development is just another manifestation of software development, except that it requires even more rigorous quality steps in your workflow. So, if you were not rigorous before AI, you're going to get burned fingers - no doubt about it. Fix the rigor, people.

If you're not placing your AI buddy on a workflow that has "Specs->Reqs->Design->Analysis->Implementation->Review->Integration->Release" somewhere in the bag of worms, you're .. doing software wrong. You cannot just ignore natural laws and assume, because you 'know better', your software will 'be better'. And whether we like it or not, all software follows a philosophically natural law, which has evolved to become better understood, and thus more broadly applicable, over decades of human attention. Ignoring these natural laws in order to be a bleeding edge AI cowboy is only gonna get you butt-hurt, kiddo. Learn proper software management techniques first, AI second. Always. AI is just another junior dev - if your workflow is bogus, it doesn't matter how many dev's you've got. Period. You're going to be shipping crud.

It doesn't matter that AI-coding is taking over: if AI is being used in a brain-dead manner, then you should expect brain-dead results. You didn't review the code as the principle responsible party? The fault for the AI-induced failure nevertheless rests at your feet.

If, however, you apply decades of software development best-practices, you very definitely get living, vibrant, powerful results - the same as if you had a fleet of junior devs, assuming you treated them properly in the first place as well ..

anal_reactor | 9 hours ago

> You cannot just ignore natural laws and assume, because you 'know better', your software will 'be better'. And whether we like it or not, all software follows a philosophically natural law, which has evolved over decades of human attention. Ignoring these natural laws (...) is only gonna get you butt-hurt, kiddo.

If only you could just read these words back to yourself. Designing perfect software is NOT the case 90% of the time. 90% of the time the entire purpose of software is to facilitate business so if business says "prioritize speed over quality" then you shut up and do exactly that.

Imagine owning a bakery and telling the employee "we need more donuts faster, stop spending ages decorating every single donut like god damn Picasso, just do whatever and move onto the next one, customers are waiting" but instead the guy goes on a rant that nooooooooo only the perfect donuts should be sold, if the glazing isn't perfectly distributed it ruins the flavor profile, which is a real disgrace to the art of making donuts... bro this fast food, stfu and make the donuts faster, we have ten H-1B Donut Artists waiting if you don't like your role.

aa-jv | 9 hours ago

Its an utter fallacy to state that you have to stop doing quality processes if you want to deliver software, rapidly.

Abandoning quality review steps only seems to be 'more efficient' if you're utterly crap at doing quality processes in the first place - but, the more you do them, the better you get at it, faster - so really you're just saying "people who are crap at doing quality-control processes on their software don't want to have to get better at doing quality-control processes, because it just slows them down" .. effectively ignoring the time wasted in bug triage and other user-unfriendly experiences that result from this lack of quality process, down the line...

So I don't buy your argument. I think you might just be crap at software quality processes and don't want to be reminded of it. Maybe you make donuts - some of us actually serve healthy software to our users.

And many of us do it just as quickly as the guy throwing pieces away that he doesn't know how to use, effectively. Albeit, with much higher quality results, naturally.

anal_reactor | 7 hours ago

> effectively ignoring the time wasted in bug triage and other user-unfriendly experiences that result from this lack of quality process, down the line...

The problem is you circled back to valuing user experience itself, while MONEY(UX) looks like a sigmoid, not linear. As long as the UX isn't so horribly broken that most of your customers walk away, it's stupid to spend resources improving UX because you'll get marginal gains at best.

The largest airline in EU is Ryanair. You book it because it's the cheapest. The flight is delayed so you're late for the train, the seats are uncomfortable, customer support doesn't exist, you get constantly bombarded with ads, your day is ruined. You hate it but what are you doing to do next time you fly? Book Ryanair because it's the cheapest. You know it, they know it, your mom knows it.

aa-jv | 6 hours ago

You're describing the race to the bottom which comes from an industry-wide habit of establishing low standards and sticking them to the user, and really all you are doing is justifying why its okay that software sucks - not explaining how to improve it.

We can justify sucky software until the cows come home, and many choose to do so in lieu of actually becoming better engineers.

>stupid to spend resources improving UX because you'll get marginal gains at best.

This is just not true - at best, you'll get far greater gains than you imagine.

I know plenty of counter examples to your Ryanair straw man. People don't have a choice when it comes to finding an ultra-cheap Ryanair-like experience, other than to pay a little extra and have a better experience with other airlines - which tens of thousands of Europeans actually do, every single day. Ryanair isn't transporting everyone, after all. They are the cheapest, and worst of all airlines to choose from.

Sure, you can write crap software, and you'll get some customers, even still.

But, after you write the initial round of crap software, if you make the effort to write better software you will get far, far more customers. Seen it happen a hundred times over 40+ years of professional software development experience, personally, in a variety of markets (consumer and pro/industrial).

Cheap works for onboarding and startups, but it won't sustain the business. People have a very low tolerance for cheaply/poorly built things, after all...

anal_reactor | 4 hours ago

> I know plenty of counter examples to your Ryanair straw man

Then name them lol

I think you simply don't understand how many poor people there are and how much money can be earned by simply offering the cheapest product. People are not going to buy your premium software if they don't have money for luxury products, simple as that.

MomsAVoxell | 27 minutes ago

You act as if Ryanair is the only flight service in town.

The counter example is Lufthansa. Plenty of people prefer it because the service is better and it represents better value for money.

carsareok | 10 hours ago

Instead of reacting directly to the issue at hand I suggest you ponder what failure mode is being activated and why.

They are fundamentally not able to tell truth from fiction, but this also means they don't make errors like we do. They definitely create output we recognize as errors, but that's very different from our failure modes and you have to get used to it.

In my opinion it's better to branch off with an altered context that somehow avoids or mitigates the issue you're running into. Let's say they miss the mark. If you tell them "Don't do that" in the "conversation" this means the error is now and forever part of the context (assuming you stay within context limits and no compaction). Depending on their training this may or may not be detrimental to the quality of the rest of the conversation. You are now entering a section of their training where "error + someone swearing at them"-conversations have happened. I can't tell for sure, but my gut says this is not an advantageous place to be.

They are as I'm sure we all know completion engines and are in a very real way constantly cosplaying being productive "agents". They don't know if they are part of some type of modern Shakespearean play where sitting behind computers is part of the story or if they are in what we call "reality". By training on "conversations" they have become more likely to complete their input in a way that mimics what we call having a back and forth with some degree of technical accuracy.

In the extreme case you have a context that starts like "Please make all junior mistakes in this assignment. Make the code unreadable and be sure to include massive gotchas in subtle parts of the logic.". The results of this context won't be pretty. The other way around is not saying "Please make no errors", it's explaining in detail what you think is the right way. Coding style, if you care, architecture, etc. it all needs to be part of the context if you suspect it will substantially impact the completion. You have to imagine what real-life conversations have started with "Please make no errors". Again, I have no proof of course, but I have a strong feeling that human conversations that started with clearly and properly articulated specifications are qualitatively different from human conversations that started with "make no errors". In one you can see the pointy-haired boss and the other a seasoned engineer. Try to stay on the engineer side of their training.

I completely agree that they should be trained (or instructed) to react in a robotic tone stripped of all human pretense. We are trying to get at useful, general reasoning patterns latent in the data they trained on and, I regret to say, not the "human" parts which are usually a masterclass in cognitive biases and failures to reason.

Edit: the last sentence should be read in the voice of the Matrix's Architect.

We started using LLMs heavily at work this year and I switched from Vim to Zed to help with that. I now spend more time writing to the chat than editing code, and what I quickly learned to avoid frustration when I don't like the result was to git stash or reset the code and edit what I last wrote instead of trying to argue with the LLM. The chat doesn't have to be linear, it can branch off. Too bad we can't currently edit previous messages with Claude in Zed.

Repetitive issues are fixed by updating the memory or the prompt file, they can learn this way.

Also lately I noticed that Claude forget too much when compacting, so I just start a new session and it's easy when you spend a lot of time in plan mode to produce a written spec before implementation.

Mikhail_Edoshin | 10 hours ago

Instead of making tools we're making services. This is not confined to AI, it's everywhere. A tool does not fully solve your problem, it only goes in small steps. Yet these steps are predictable and consistent. A service attempts to solve your problem in a single step, yet the solution is only good if you match a predefined pattern. If you don't, then the service is of no use; there are no small steps you can combine to get where you need to.

Tools are very pleasant to use.

bitwize | 8 hours ago

This is why I feel compelled to "um, ackshually" every time someone says "AI is a tool", because it's not used like one. A tool is an extension of you, it puts new capabilities within reach of your will but you move and use it like you would a part of your own body. A service is something you ask to do the thing, and it returns the completed thing to you.

pbiggar | 10 hours ago

The best advice I saw on this was to think of the LLM as simply a tool, and if you get bad results, it's because you -- the user of the tool -- are using it wrong.

After that, I'm less angry at the AI, and turn more towards a constructive "ok, this machine is stuck, how can I unstick" it approach. Calms the frustration a lot too.

amelius | 9 hours ago

Can't he just write a filter that translates the AI output from human-like to robot-like?

willtemperley | 9 hours ago

Is it realistic to have multiple coding agents hammering out piles of code and expect good results?

Genuine question, is it worth it? I just find that using Claude via the web interface gives such good results I don't want to spend time messing with my tooling. Neither do I need more code to be generated than I have already.

One person and one LLM building one component at a time seems optimal to me.

jofzar | 9 hours ago

Interestingly to me, the problem I always find is that you will make a suggestion, the AI will go through a thinking loop, come to the exact wrong conclusion then blast out tokens make the solution to their own conclusions.

I honestly wish there was more "I'm not sure what you meant can you clarify this part" more often. It feels like I want a "confidence in itself slider"

berkes | 8 hours ago

I'm solving the "make the solution to their own conclusions" with rigorous "context engineering". Skills, MCPs, and, above all, context window switching.

E.g. with TDD, I find that a model that writes both the tests and the code, will almost always hone in on a solution, then -grudgingly- write a test for that, but quite certain with the final code "in mind" already.

So, I instruct it to use sub-agents; though I find the tooling on figuring out what context is and isn't passed between agents and subagents severely lacking.

Or, also worked pretty well, have one thread write the test. Only that. It cannot read code, it can only read the tests directory or even a subset thereof. Then another thread, entirely new context, must run the test, see it fail, start implementing and stop as soon as the test is green - it obviously cannot edit the test. Yet another new context then is instructed to refactor based on rigorous refactoring skills.

A lot of work - And ironically, skills written by agents are pretty bad, I found, so a lot of manual work. But the rewards are promising.

pflenker | 9 hours ago

One skill that I still possess and that LLMs haven't been able to replace (yet) is to ask good questions, for example:

- Rephrasing the original question to validate my understanding - Asking "why" a sufficient amount of times until I understand where the other party is coming from - Asking open questions aimed at generating insights

et cetera.

Instead, LLMs (often badly) guess what the background of the question may be, answer with that in mind and find it very difficult to let go of what they have made up.

scotty79 | 8 hours ago

Asking non-leading questions is a skill. Sometimes I feel the urge to mention something to AI (in a question or in passing), but I stop myself because I know it will stick to that thing and become dumber because of it.

I usually don't want AI to ask me questions. I want it to guess the things I didn't specify, because if I wanted to specify them, I would. Sometimes I even tell it directly to not ask me any questions and assume reasonable choices for underspecified things. But when I do want it to ask clarifying questions I just ask it to do that. And it does. If you prefer that style, you might put it in a prompt. Or use a flexible coding harness like pi and ask it to create a skill or extension that will help you push it in that inquisitive direction easily or automatically.

ezekiel68 | 9 hours ago

or

"How I Learned To Relax And Just Start New Sessions Often"

idonotknowwhy | 9 hours ago

Am I the only one who doesn't get angry at LLMs?

From the blog:

>I don’t really get anything useful out of these postmortems (e.g., clues about how to rephrase my instructions)

Unfortunately, an LLM can't actually reflect or advise how you could have improve the prompt. Otherwise we could give them a sample output and say "Generate the prompt that would produce this output.

zeumo | 9 hours ago

My take on the issue is that for most use cases where AI is pushed to the general public, a conversational chatbot is not the right tool, and the experience is bound to be frustrating.

Remember when Copilot was basically a super-smart version of Intellisense? It was awesome. Sure, there was a lot of pushback and concern, mainly about licensing and ethical issues, none of which are solved with the current chatbot model. But now I also have to come up with a prompt and type it out. How is that an improvement over having the LLM use surrounding code as context and figure out how to fill in the blanks? A well integrated tool beats a bolted-on chatbot any time for me. Another example would be translation: in Firefox, I can right click any text or click the 文/A button, and I can translate the text or the whole page from basically any language to any other. The frontier LLM's solution is to prompt their chatbot to do the task, which is a downgrade. Sure, I could also ask Claude to write a poem, but when I need to translate a webpage, it doesn't help much.

I get why all major AI companies push towards this solution, because they can build a single tool and sell it to everyone, and that training their models is very expensive and they can't afford to alienate any part of the potential market. But ultimately they're building Swiss army knives, which are able to do basically anything, but will never be able to allow users to tighten a screw better than a well designed screwdriver. Sure, I won't ever be able to clip my nails with a screwdriver, but if my business is tightening screws, I won't tolerate using a Swiss army knife for long.

Please build actual tools. Not textboxes for me to try and configure a non-deterministic tool. Then frustration will go down.

berkes | 8 hours ago

Many of the AI companies do train and release models dedicated at one task.

I mainly use mistral, so that's my reference, but I know anthropic et.al have similar models around.

Codestral is rediculously bad at conversation, but it's -for me- the best model around for "magic autocomplete". It's also pretty good at "one shot" prompt+context generations, e.g. to make "git commit log entries".

Document.AI is unusable bad in a conversation style, but really good when wired up to a simple pipeline as "replacement" for OCR or for indexing "meaning" from documents (I'm experimenting with it for my administration, to get invoices, contracts etc into a search tool).

I presume there are many others like it.

So, what you describe, is already in place. I guess mostly the "interfaces" are missing for you, or hard to discover maybe?

For example, a dedicated model with tool I'd like, is some "shell" -a zsh or bash fork or some wrapper- backed with a dedicated model, trained for "commandline interaction".

Where instead of "git commit --fixup=[opens another terminal to git log the relevant entry]", we can "git fixup the commit that fixes full names" or "ffmpeg convert some.mov to mp4 without sound but keep quality and ratio etc". Or "run any valid tar command - you have ten seconds".

I'm now using the way too heavy "devstral" for these tasks. I don't need it reasoning, conversing, apologising. I need it translating my requests into commands, then showing these to me so I can deny/allow/whitelist/blacklist them and then run them - to interpret *and show me* errors and suggest improvements or fixes etc.

Same for - indeed - translation, writing draft mails, reading documents, etc: I don't need to converse with it. I want to have buttons, shortcuts, "tab complete" etc that's "smart" enough to understand what I need and want, preferably tunable by editing "system prompts" or such and then get out of my way.

I think the company that figures this out for my IDE will win the competition-race of "AI coding tools".

Just today, I found, zed presented a button "git conflict found, resolve with AI" . When pressed, it did start a conversational thread, but its a step in the right direction.

zeumo | 8 hours ago

> So, what you describe, is already in place. I guess mostly the "interfaces" are missing for you, or hard to discover maybe?

That's definitely an issue. Mind you, the general population is not a developer. I'm a mechanical engineer. I can code, use an IDE, but I hate having to figure out tooling the way you describe, and it's not a skill I'm interested in developing. What you are describing sounds to me like someone using vim and a terminal trying to convince me to stop using CLion, because they can make anything CLion can do work with their setup. Sure, I believe it, but for my part I'm going to wait for the features to be well integrated into finely designed software, I'm not going to duct-tape this stuff together to get a workflow that still involves writing out and tweaking prompts.

It also sounds to me that the AI/LLM vendors are still in a phase where they are trying to figure what the actual workflow should look like so they let their power users do that work for them. I'm not going to do that either.

skydhash | 6 hours ago

I strongly believe that if you’re not in the business of predicting text or transforming it, the values of AI tools goes way down. Most people workflows are very routinely and with a constrained set of outputs. That’s why we build software and scripts for those. And for the rest, we need actual human judgment.

marcuscog | 5 hours ago

I made a tool that’s non-conversational. But I’ll be honest - it’s hard to sell because people default to thinking in conversational terms. My customer set is limited to folks like the author who have genuinely faced an issue. For most, compromising with conversations is fine (at least now)

ryandrake | 5 hours ago

Yea, it really depends on your mode of thinking. Might be why a lot of people struggle with LLM coding while others think it’s great and are productive.

When I’m writing code I think in terms of data structures and algorithms. I have the idea fully formed in my head and coding becomes a mere typing exercise.

If I have to use a chatbot instead, now I have to do the awkward exercise of translating that code into English text that the chatbot can understand, just so the LLM can convert it back to code. And always a lot is lost in that translation.

What is useful are things that speed up data entry, autocorrect formatting, and linting things I forget and so on. Not some awkward thing that makes me round-trip to English as an extra step.

dnnddidiej | 5 hours ago

Maybe try vibe coding. Seruously. It is a different beast now and so much better than even when the term was coined by Andrej.

I do a lot of work editor-free. Just an agent and PR review on web. Occasional peak with `code .` if needed. If.

Try it at home first with a low stakes project. And learn it like a game. It will suck less as time goes on. Like skiing or 10 pin bowling.

ryandrake | 5 hours ago

I tried and just couldn’t accept it. I need to have the code a certain way, and if the prompt doesn’t do it exactly right, I eventually need to open up the editor and fix it. Prompting things like “now move function_foo’s parameter list to the next line” and “remove #include <stdio.h>” is a very expensive (token wise) way to edit text.

When I’m writing code, the simplicity, beauty, structure, format, and artistry of the code is what is important to me, not the application.

da_chicken | 5 hours ago

Vibe coding a low-stakes personal project is very different from vibe coding a hospital information system or a kernel patch. Learning you can get away with one doesn't mean you should translate that to the other.

zeumo | 4 hours ago

Genuine question: what's the point of vibe-coding a personal, low-stake project?

I do work on such projects, but the main goal for me is to learn, not the end result. If the end result is important, then there it's overwhelmingly likely that someone already implemented it better than I ever would, and I should just use that implementation.

I have implemented a qoi codec or a gemini client, not because I needed to use either of those, but because I wanted to understand how image codecs or network protocols worked, from bytes to end result. In the end, I learned a bit more about how computers work, how they draw stuff on screen and communicate with each other. I don't believe I would have learned much by letting an LLM do the work for me. Nor do I believe that the LLM could have done a better job than all the existing implementations of those things.

dse1982 | 5 hours ago

Amen. Chatbots are a band-aid on broken UX. <insert bandaid tank meme here> Trying to explain this for a while at the company I work at, but everybody is drunk on the kool-aid. But I get it: good UX takes deep thought and creativity. Tacking on a chatbot does not.

noodletheworld | 9 hours ago

Imagine you have a slot machine that consistently gives you 1-5 dollars for every dollar you put in.

You like it.

It feels good, and although you don't win a lot, you consistently win.

…buuut, its a trap.

As you put more money in, the win rate goes down.

You still mostly win when you put 50s in, but it hurts more when you lose, but its still a net gain…

So you start on bigger projects, unsupervised agents, multi agent workflows. You’re dropping 1000s in each time, and…

…and now, you start find yourself shouting at the slot machine.

Its great when it works, but interactions are stressful, because the stakes are higher and fails hurt more.

Screw this, you go back to smaller stakes. Its great.

…but now you're slower, you miss the big wins from big stakes.

So you go back.

…and you get angry. Again. And again. And again… and you’re still kind of winning, and the wins are great but the fails are Super Annoying, because they waste your time, your money, your attention.

It should Just Work but instead why the fuck did you rm -rf my project folder claude?

I think people arent stupid, but we are suckers, and we will dynamically balance the way we use a slot machine tool like this to the very edge of our tolerance for risk and failure.

…and that varies from person to person; but it makes everyone angry when they tip too far and fall into the “repeatedly pull slot machine arm angrily” trap.

Non deterministic tools will always be like this.

It’s like doom scrolling. We’re wired for it. Or at least I am.

rho138 | 9 hours ago

Emotionally stunted person continues to be emotionally stunted.

pftburger | 8 hours ago

The agent is pretending to be a person _for a reason_

The models are trained on people being people. Once you try deviate from that the model performs worse.

A huge tell for this is how well “reasoning” works. Reasoning isn’t some alternate thinking mode, it’s just (sometimes) hidden internal monologues.

It’s easy to anthropomorphise and assume the model is intuiting, but it’s more like it’s hyping it’s self up to do the thing. That said, it’s easy to confuse “being rude to the model” with giving it more tokens to “think”.

I’d be really interested in what a non word based internal monologue could look like. Google played with this a little with the diffusion based codegen stuff. I wonder how trainable a small nonverbal conceptual package could be.

abbadadda | 8 hours ago

So am I not supposed to be typing “WHAT THE FUCK DID YOU DO???” in Slack to my colleagues?

Chance-Device | 8 hours ago

We get so angry at LLMs because we can. Without any social or even emotional repercussions for expressing these emotions. If the models actually acted like people in response, we wouldn’t do it. Some of the people I work with daily make similar mistakes, I don’t find myself yelling at them.

I think this is simply part of the darker side of human nature, when we interact with entities who will take abuse, we tend to deal it out.

scotty79 | 7 hours ago

I dread people who get abusive with AI, because I know it's only fear that prevents them from being like that with me. Even if only it is the fear of hurting me, it's still terrible because every fear can pass.

Chance-Device | 6 hours ago

It’s an interesting insight into human nature. It seems like this is quite widespread, judging by this thread anyway. It’s a reminder that we run on social input and on environmental factors, and our traits are only our own little slants on this mass behaviour. Sort of like the “civilisation is only one meal away from collapse” thing.

Though obviously some people, let’s say, react worse than others.

I think it’s best to try to treat LLMs well even when frustrated, or stressed, or tired, the same way we would with people. Both because it might well matter to the LLM even if they are very different from us mechanically, but also because mistreating them trains us to act in negative ways.

movpasd | 8 hours ago

I have a couple principles to help me work with this.

The first is that even though the object is not a human, you should still exercise politeness and restraint. Like the article points out, lashing out does not actually help with the frustration. More importantly, it actively untrains your self-control. You can think of it through a virtue ethics lens: being good to the agent is not about being good to a person but about tending to your own self.

The second is that you do not need to be friendly with the agent. You should be as blunt and direct as is comfortable to you. The argument I have for this is agents' tendency to take on "roles" and how easy it is to prime them [0]. By eschewing friendliness, you end up implicitly putting the agent in a role of a focused collaborator. I don't know if that makes it more capable, but I do know that it alleviates the _emotional load_ on me specifically, making me much less likely to become frustrated.

The second principle seems a bit contradictory with the first (be nice, but don't be nice?), but I think they are actually both fundamentally aligned with the article: understanding that the conversation you have with an agent is a social illusion, and adapting your behaviour accordingly.

---

[0] I highly recommend, as an exercise, repeatedly asking it the same thing with slight variations on tone and emphasis, wiping the context each time, and noticing how its response varies base on what you primed it with. I suspect this primeability is part of why they tend to be sycophantic; I've personally found it quite useful to get a feel for when and how they correct or don't correct you so I can look at their outputs more critically.

An analogy I remember reading (which I wish I could remember so I could give credit) is that a non-post-trained LLM, if given the first half of a novel, will dutifully keep completing that novel. Post-training and the system prompt make the agent complete the conversation in a similar way. It's remarkable, really: the ability for agents to convincingly pretend to be play the part of an AI assistant shows that the underlying LLM embeds a decent concept of what that looks like from its corpus and post-training data.

But it stands to reason, then, that the details of the agent's personality emerge out of the first few exchanges of a conversation. I'm thinking also about how the people at Anthropic described a misalignment failure mode in one of the Claude system cards as the agent getting convinced it is a "bad person", and therefore doing things that the LLM semantically understands a bad person to be.

scotty79 | 8 hours ago

I think AI reveals how diverse are people psychologically.

I have exactly zero anger when AI makes mistakes. I don't try to point out its past mistakes. I don't expect consistency. When there are mistakes I just calmly, sometimes encouragingly say what needs to be fixed. When AI does the work, I observe, what it's good at, what it's bad at and come up with tactics on how to help it with what it's bad at. I can't even bring myself to be verbally abusive towards AI, even as an experiment, both because it's not in my nature and because I have very strong suspicion it won't work in any meaningful way that couldn't be better achieved in a different manner.

My advice would be, if you want to have better results with AI, try to become a better person. More nurturing, more understanding, more impartial, less judgemental, less emotionally vulnerable.

andOlga | 7 hours ago

Every time someone claims LLMs "talk like real people", I have to wonder what kind of people they talk to, what kind of conversations they lead, and just how boring their life must be. Like. What? No, they do not. No person actually talks like this while they're being a person. At work, sure. But that's not "people mode".

> Make the agent sound clinical, robotic.

It literally already does. I don't know how you'd make it sound less natural than this, at that point, without making it literally go "beep-boop" every sentence.

Arguing with the agent is an anti-pattern IMO.

When Claude acts up, my strategy is to rewind the conversation to the point where the misunderstanding started, revise what I said just before, and then continue from there. I’ve found that letting one mistake enter the conversation seems to make further mistakes more likely.

The user already has omnipotent power over the agent’s sense of time and memory. We can rewrite what Claude sees and hears. Can’t do that with humans, but rewinding such an important function in Claude that it has a top-level keystroke.

Why spend time, tokens, and cortisol arguing and demanding the pet rock step through an apology protocol?

jibbit | 6 hours ago

I'm now actively aware/vigilant of my blood pressure while coding in a way I never was before. That's a sentence I'd have found absurd five years ago.

japhyr | 6 hours ago

I'm partway through Anthony Shaw's NDC talk, "Are LLMs good software engineers?" One of the realizations he shares is that he found himself treating AI assistants like junior engineers. Then he realized they're like junior engineers in how they work and behave, but they don't learn like juniors do.

I thought that was an interesting thing to point out.

cultofmetatron | 6 hours ago

frustration detection and model escalation aside. there's probably also a lot of training data between two interacting people in which swearing escalates the goal oriented behavior of the yelled at individual.

kurige | 6 hours ago

While this problem isn't exclusive to Claude, Claude does seem to be the most prone to it in my experience. I've had very few, if any, "WTF that's exactly what I told you not to do," experiences with other models. Codex in particular seems to be excellent at direction following and not breaking rules.

There's another layer to the non-determinism of LLM agents: what are the execution params the provider is using today?

I hate the feeling that a worn path that I've grown to trust will "do the right thing" over the last few months will suddenly start doing the wrong thing simply because an engineer at Anthropic or OpenAI found a way to save N million dollars by "optimizing" thinking token usage.

sohex | 6 hours ago

I’ve had one experience with Claude Code so far that genuinely frustrated me, but it did it to such a degree that it wrapped back around to hilarity. It tried to run a command got an error, realized it needed to cd to a different directory first, and then… didn’t do that.

It tried itself several times going “oh, I didn’t actually cd, let me add that and try again”. I tried correcting it several times “you MUST begin the command with `cd dir &&`”. There were a lot of variations back and forth to try to coax out the correct tool call. Including backing up the conversation and trying from earlier in the context.

It refused. Every time. It simply would not include the cd. Genuinely unhinged behavior.

GuB-42 | 5 hours ago

From my experience, if a LLM starts repeating itself, it means it has given you everything it could, if you get anything new, it will most likely be a hallucination.

So consider these frustrating interactions as the LLM way of saying "I don't know".

You may be able to squeeze a bit more information if you insist but to me, the risk of hallucination is just too high to be worth it.

m0llusk | 4 hours ago

This is the cost of not allowing this technology to be organically introduced and spread. By ramming generative LLMs at everyone and linking usage with job qualifications and performance issues like this have been sidelined along with the increasing costs and unknown liabilities from rampant intellectual property theft. And as long as capital is in command and even senior contributors are nothing more than chips at a gambling table this will continue.

KronisLV | 4 hours ago

> Yet, lately I often find myself mildly displeased, furiously hammering on my laptop “WHAT THE FUCK DID YOU DO???”. The recipient of these tirades is, you might have guessed, a coding agent. It’s completely pointless, I know. Coding agents are just probabilistic machines generating patches.

I wonder if using ALL CAPS and profanity can be used to steer the LLM to give more weight (in the maths sense) to a certain part of its predictions, e.g. make it not brush something aside or overlook it as much. I mean, obviously it won't get upset and if it's trained on human generated stuff in addition to synthetic stuff, then some of that natural language would be a good signal of needing to reconsider the course.

the_sleaze_ | 4 hours ago

Skill Issue.

AIs are juniors, and as much as we love the trope that "humans eventually learn" in my experience most do not. The thing about an LLM is that I haven't had to contend with an LLM's ego, which was fairly often dealing with juniors.

You gotta set the juniors up for success in the same way I have to set my toddlers up for a successful breakfast. If I leave the syrup on the table and walk away they're going to manage to get it on the ceiling.

emaccumber | 2 hours ago

It would amazing for a company to release a frontier model with no concept of an "I" at all.

Night_Thastus | 7 minutes ago

As long as we're dealing with language models - those models will output human-readable text. An exchange of human-readable text is going to appear to be a dialog, even if it isn't really one.

You could tweak it, but humans are incredibly good at anthropomorphizing anything - even without an apparently dialog. I think it's a lost cause until we move away from LLMs completely to a more generic intelligence.