"I can't do that, Dave" – No agent yet

34 points by freediver a month ago on Hacker News | 24 comments

dvfjsdhgfv | a month ago

Actually, once when testing one of the recent non-thinking Qwen models, I was very surprised when it said "I'm sorry, this project is too complex, I can't do that." I was very impressed by this answer. So far, it is the only model that has ever reacted to this task that way. The remaining ones agreed to proceed and failed.

lukan | a month ago

ChatGPT went through a phase with me where it refused to do anything bigger than a few lines of code because it was "too complex". But step by step it did it anyway.

(No idea about the current state, since I switched to Claude.)

QuadmasterXLII | a month ago

What would be great is if they tried for a while, explicitly reported that they failed if they didn't succeed, and then presented their best effort.

porphyra | a month ago

I would much rather have models that try and fail than to have false refusals (which do happen and are really annoying).

speedgoose | a month ago

For coding, perhaps. For general-purpose use, current models know how and when to refuse: politics, sexual taboos, drugs, …

Perhaps we should train them to refuse to develop any more of (insert your most hated stack here).

ElFitz | a month ago

I’ve been running a Claude Code "thing" in a loop for a few days, and that has been extremely frustrating.

But after tons of nudging it has started developing a sort of "improvement engine", as it calls it, for itself to help address that.

It goes through its own logs and sessions, documents and keeps track of patterns, signals, and associated strategies, then regularly evaluates their impact independently of the agent itself, and feeds those back to it in each loop.

It’s been quite fascinating to watch.

rdiddly | a month ago

Boy, that was fragmented. What should I have done for years leading up to today to prepare for reading this? Gaming? Doomscrolling social media? Chugging Mountain Dew? Reading poetry?

esafak | a month ago

Idiocracy is reality: people can't even form paragraph-length thoughts any more. I just noped out.

wolf4earth | a month ago

You don't understand my writing. Hence I must be stupid.

Interesting conclusion. ;)

woeirua | a month ago

Yeah, are these poems? I feel like it's just more AI slop.

blharr | a month ago

It is

wolf4earth | a month ago

AI content is clearly marked as such.

The rest is written by me personally on my shoddy MacBook.

wiseowise | a month ago

Fr bruh rizzing hard to this og jester gooning.

hmokiguess | a month ago

Try LinkedIn

mkl95 | a month ago

It's very unique to LinkedIn. OP's prose is difficult to process even if you've abused your brain for years with LinkedIn content, though. In a more merciful timeline, only people like James Ellroy or Cormac McCarthy would ever attempt to write like that.

wolf4earth | a month ago

Or people like the neuroqueer author.

wolf4earth | a month ago

It's a synthesis of multiple problem domains that thought they were special. When the truth is: they weren't.

It was fragmented? Good.

Welcome to reality. ;)

Frost1x | a month ago

It’s interesting because I’m seeing some emerging conversations where users tend to prefer general agents with their preferential bias over more constrained or specially built agents. There are certain arbitrary goal criteria they either have forced on them or want to force upon the agent, and the general-purpose agents tend to do well at this because they just trudge along and do whatever.

Meanwhile, more specialized agents that try to add or enforce constraints around a problem space where certain aspects tend to be well established don’t sit well with a lot of users. “No, you and general knowledge don’t know best, I know best… do this.”

I can see the use case for both, but I’m seeing a whole lot more willingness toward confirmation bias: essentially automating away parts of jobs and tasks people already do, but in the personalized or opinionated way they’ve established, without exploring alternative options.

So the general-purpose agent structures that just kick off whatever they can tend to fare best in terms of positive feedback from agent users. Meanwhile, this to some degree ignores many of the potential benefits of having agents with general knowledge, bounded by generally established limits. It’s basically the whole “please do parts of my job for me, but only the way I want them done.”

People aren’t ready to be wrong or to change; they just want to automate parts of their processes away. So I’m not sure “no” is going to sit well with a lot of people.

matchagaucho | a month ago

Agents can propose refactoring just as readily as humans.

If coding agents already read AGENTS.md before making changes, they can also maintain a TECHNICAL_DEBT.md checklist.

Keep the loop intact: AGENTS.md ensures technical debt remains in context whenever changes are planned.
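Only the two file names come from the comment above; the checkbox entry format and the helper functions are a hypothetical sketch of how an agent hook might maintain such a checklist:

```python
# Hypothetical sketch: maintain a TECHNICAL_DEBT.md checklist that an agent
# can append to and re-read on each loop. The entry format (GitHub-style
# task-list checkboxes) is an assumption, not anything from the comment.
from pathlib import Path

DEBT_FILE = Path("TECHNICAL_DEBT.md")

def add_debt_item(description: str) -> None:
    """Append an unchecked item, creating the file with a header if needed."""
    header = "# Technical Debt\n\n" if not DEBT_FILE.exists() else ""
    with DEBT_FILE.open("a") as f:
        f.write(header + f"- [ ] {description}\n")

def open_items() -> list[str]:
    """Return unchecked items, e.g. to feed back into the agent's context."""
    if not DEBT_FILE.exists():
        return []
    return [line[len("- [ ] "):].strip()
            for line in DEBT_FILE.read_text().splitlines()
            if line.startswith("- [ ]")]
```

An agent instruction in AGENTS.md could then simply say to call the equivalent of `open_items()` before planning changes and `add_debt_item()` whenever a shortcut is taken, keeping the loop the comment describes intact.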

grian42 | a month ago

haven't read through the crap poetry lol nobody got time for that, but have experienced the same "i can't do that" – no agent yet. llms are very eager to apply a bodge fix or something rather than going "this design is shit, consider changing it pal lol", which is the "fix" i did myself.

nilirl | a month ago

Huh? The point of the article is that we should use git to store an LLM's output as it works?

How do any of the quotes and citations used coherently form that argument?

What is this writing style? Why does it feel like it doesn't want me to understand what the heck it's saying?

wolf4earth | a month ago

The point of the argument is that meaning emerges in conversation. A session between human and AI is a conversation.

Current AI storage paradigms offer lateral memory across the time axis. What exists around me?

A git branch is longitudinal memory along the time axis. What exists behind me?

Persist type checked decision trees within it. Your git history just became a tamper-proof, reproducible O(1) decision tree. Execution becomes a tree walk.

It works. And it's not production ready yet.

bronlund | a month ago

«Think ultra deep and analyze this article. Make a detailed list of the top five alternatives as to what he is talking about.»

throwway262515 | a month ago

Qwen, is that you?

My experience with it is that it tends to create 3-word sentences like that when asked to write an article.