The "are you sure?" Problem: Why AI keeps changing its mind

41 points by turoczy a day ago on hackernews | 20 comments

philipp-gayret | a day ago

I am seriously tired of every other paragraph I read ending in an It isn't just X, it's Y. I'm sure there is something insightful in between this slop but to the author: Please write using your own voice, if I wanted ChatGPT's take on it I would ask.

robertlagrant | a day ago

Exactly. It's not just nauseating—it's sickening.

jofzar | a day ago

I miss people having their own voice. I can't keep reading slop.

I wish hackernews banned slop, or atleast required disclosure.

srean | a day ago

I think HN might need a downvote button for stories if this continues.

: jagged-chisel | a day ago
We have "flag." Flag 'em.

tyleo | a day ago

Agreed. I don't even necessarily have anything against AI edited text but there's a way to sharpen your own writing and there's a way to let its voice dominate. There's a lot of idioms it tends to fall back on (em dashes being the most well known). I'm surprised that folks don't notice these and aggressively reassert their voice.

I use LLMs in my own writing because they have benefits for conciseness but it tends to be a fairly laborious process of putting my text in the LLM for shortening and grammar, getting something more generic out, putting my soul back in, putting it back in the LLM for shortening, etc. I tend to do this at the paragraph level rather than the page level.

gmerc | a day ago

AI Slop. Unfortunately

RugnirViking | a day ago

The article's main idea is that for an AI, sycophancy or adversarial are the two available modes because they don't have enough context to make defensible decisions. You need to include a bunch of fuzzy stuff around the situation, far more than it strictly "needs" to help it stick to its guns and actually make decisions confidently

I think this is interesting as an idea. I do find that when I give really detailed context about my team, other teams, ours and their okrs, goals, things I know people like or are passionate about, it gives better answers and is more confident. but its also often wrong, or overindexes on these things I have written. In practise, its very difficult to get enough of this on paper without a: holding a frankly worrying level of sensitive information (is it a good idea to write down what I really think of various people's weaknesses and strengths?) and b: spending hours each day merely establishing ongoing context of what I heard at lunch or who's off sick today or whatever, plus I know that research shows longer context can degrade performance, so in theory you want to somehow cut it down to only that which truly matters for the task at hand and and and... goodness gracious its all very time consuming and im not sure its worth the squeeze

catigula | a day ago

An AI can only be tuned to either be sycophantic or adversarial.

It isn't possible to tune an AI to have some sort of 'correct answer' orientation because that would be full AGI.

agentultra | a day ago

There isn’t a mind to change. Unfortunately the article is slop. Too bad, won’t read the rest.

I wish there was a tag or something we could put on headlines to avoid giving views to slop.

: sunir | a day ago
There is a mind; the model + text + tool inputs is the full entity that can remember, take in sensory information, set objectives, decide, learn. The Observe, Orient, Decide, Act loop.
: gebalamariusz | 21 hours ago
In AWS, for example, DNSSEC Route53 signing is possible, but almost no one configures it. Generally, most people do a lot of good things about security, but they somehow forget about DNS.

josefritzishere | a day ago

AI slop about AI slop. The internet is dead.

trusche | a day ago

This is real, but (at least in a coding context) easily preventable. Just append "don't assume you're wrong - investigate" or something to that effect. Annoying, but usually effective.

: simsla | 13 hours ago
I think my experience as an interviewer has helped. If you ask non-leading questions, sycophancy doesn't come into play as much.
Instead of saying "are you sure?" or "shouldn't we do X instead?" you could say "give me the benefits and drawbacks of this compared to X".
Also, when you yourself are sure, give clear stear. "This overcomplicates A, let's do B instead."

johndhi | a day ago

The article posits that sycophancy is inherent to how models are trained.

I think there's a simpler explanation. Every leaked system prompt from every model pretty much includes instructions to "be helpful," and the models are trained to be assistants, not just general knowledge repositories or research tools.

My hunch is that's the core of the problem -- the system prompt.

Eddy_Viscosity2 | 20 hours ago

My prompts always contain the phrase 'no sycophancy'. The results are more direct.

: fleischhauf | 18 hours ago
I wonder what happens if you prompt it to be a tool and not an assistant and that it does not need to be helpful just do as instructed or something like this

zarzavat | a day ago

In computer chess there's a concept of "contempt". When you set the engine to have high contempt it evaluates the opponent's moves lower, essentially assuming that the opponent will make a mistake. Conversely with low contempt the engine evaluates the opponent's moves higher, expecting the opponent to play better than it.

There is a similar trade-off with LLMs. Sometimes their human conversant wants assistance and so the LLM should be more deferential. At other times the human wants a bias towards correctness rather than their own opinions.

It would be nice to have a contempt knob that you can adjust, instead of blindly trying to emulate one through prompting.

satisfice | a day ago

I call this self-repudiation. I performed a systematic experiment on this exact matter, a couple of years ago. I found that ChatGPT 3.5 frequently self-repudiated, whereas 4.0, under identical circumstances, rarely did.

These experiments are a bit expensive to run because you are forced to read all the responses to judge repudiation. Sometimes it is subtle.

Also, behavior changes with the exact wording of the question.