Is AI ruining our skills? Early results are in and they’re not good

63 points by jaypatelani 12 hours ago on lobsters | 38 comments

I don't think it's a problem that skills can atrophy. Programmers used to be good at writing assembly on punch cards and we make do without that knowledge. But for LLMs specifically, there are three things that should give us pause:

1. A lot of existing social structures are built around liability / ownership of the work product. So in some situations, you get the worst of both worlds: your ability to evaluate machine output for correctness is deteriorating the longer you use an LLM, but you still go to prison if you vibecode your taxes too much.
2. The skills we're talking about here are... the quintessence of the human condition? Creating art, formulating ideas, making complex decisions, expressing emotions, teaching others. So, before we reach the point where every song, painting, film, book, and blog post is coming off an LLM production line, we should try to figure out what else is there to make us proud of being humans. If the answer is "nothing", then we have some interesting choices to make.
3. Although AI can work as a skill multiplier, it's also a surprisingly powerful sameness multiplier: https://lcamtuf.substack.com/p/the-100000-whys-of-ai . I think it's clear that content consumers have high tolerance for sameness; AI-generated op-eds about AI now make up a significant proportion of our social feeds, and half the top-grossing films are MCU sequels, prequels, spin-offs, and remakes. But on the content production end: the tools give you an edge at the expense of stripping away all individuality. So, how do you compete in a world where your output is perfectly fungible with n billion people who can also write a prompt?

I wish we spent more time figuring this out instead of having constant "AI will cure cancer" vs "BS machines!" pie fights - which, again, are increasingly automated with LLMs.

jibsen | 5 hours ago

I don't think it's a problem that skills can atrophy. Programmers used to be good at writing assembly on punch cards and we make do without that knowledge.

I think it is less the practical skills like assembly or Rust, but rather the experience that makes you able to judge things like what is good design, or what possibilities for optimization it might be worth exploring.

wrs | 5 hours ago

All analogies are bad, and today I’m trying this one. Bear with me, it’s a slow Sunday and the coffee hasn’t really kicked in yet.

Photography was an exciting, hyped technology. It had obvious practical benefits, but there was a strong argument that it would destroy the visual arts. Why bother spending years learning to draw and paint by hand if anybody can just point and click? And indeed, now we produce a trillion (?) photos a year, very few of which are as interesting as the average painting.

But of course the actual art wasn’t painting, it was image-making. Painting was just the last step in the process. Qualities like composition, lighting, texture, choice of subject, and — most importantly — human experience still apply to photographs. Thoughtless pointing and clicking captures an image but it doesn’t make art.

“Artless” photos aren’t useless; they’re great for documenting parties, remembering that you once saw the Golden Gate Bridge, and millions of other things you would never have commissioned a painting of. But we all see that those are in a different category.

Artistic and practical images used to be done by the same highly educated people using the same techniques, and both used to be expensive and rare. Photography democratized the technique and made practical images essentially free to create, but the fundamental skills required to make an artistic image are pretty much the same as ever. And I’d guess the auction value of a good original artwork remains higher than a good photograph.

emk | 5 hours ago

Some of the underlying anxiety here is that these models are barely 4 years old, and we've gone from "can sort of write a factorial function" to "can one-shot a 3D adventure game demo, complete with quest-givers." Or "can one-shot a custom (toy) programming language."

And I remember what I read about horses. The first steam engines were wildly impractical, and they were basically only used in coal mines because coal is nearly free in a mine. Obviously, they weren't going to replace horses.

And in fact, the horse population went up for decades after the introduction of the steam engine. But eventually, various sorts of internal combustion engines became an almost fully general replacement for horses. And the horse population dropped.

So now I wonder, how many more years will it be before we build a fully-general replacement for the human mind? And what would happen then?

I may complain about the quality of Fable's code. But it can complete medium-sized projects end-to-end in a couple of hours for a few hundred dollars. And no, it isn't quite yet a fully general replacement for the human mind. But it's another big step closer.

Replacing your paintbrush with a camera isn't quite the same thing as replacing your own mind with something that thinks for you. That's a step where I fear analogies will begin to fail.

orib | 4 hours ago

Geoff Hinton has said that an optimistic endpoint is that humans have a similar relationship with AI as a toddler has with their parents.

I leave it as an exercise to the reader to imagine whether Musk and Altman are well enough aligned with the needs of humanity to get us to an outcome nearly as good as that.

wrs | 3 hours ago

So far, as someone who uses LLMs extensively, they are nowhere near thinking for me. They have no idea what to do until I prompt them, and no way to judge the result unless I tell them how. Much like my camera.

esak | an hour ago

The consensus (outside of disconnected management, ai-pilled true-believers, and marketing-savvy CEOs) seems to be that LLMs aren’t currently, and likely won’t be, capable of serving as “a fully general replacement for the human mind”. Whether or not “fully general” is possible remains debatable (e.g., substrate independent consciousness is still squarely the stuff of sci-fi). However, LLMs and agent-based workflows are clearly a step in the direction of wholesale automated cognition - a point that TFA makes and that the parent touches on. The more you leverage them to solve your problems in some domain, the more your own problem-solving skills atrophy in that domain. As that domain specificity gives way, so too will your problem-solving skills more broadly. It takes something a lot less than “a fully general replacement of the human mind” to become a crutch that wrecks your ability to solve hard problems on your own. You can use a calculator, horseless carriage, or camera analogy, but the idea of “intelligence on tap” strikes me as a net loss in some sense.

wrs | an hour ago

I understand and respect the argument, but in practice it just doesn’t seem like “intelligence” to me, at least for software. It feels closer to eliminating brushstrokes than eliminating composition and choice of subject, much less choice of goal and purpose, which to me is where the “intelligence” comes in.

varjag | 15 minutes ago

You can split hairs whether it's worthy of being called intelligence, but it already turned software development into another bullshit job.

orib | 3 hours ago

Can you define a Turing test that would tell you if someone was thinking or not?

(Does a human who is dragging their feet on a task think?)

wrs | an hour ago

Perhaps to avoid a side argument on whether they’re “thinking” at all, I should say they’re nowhere near replacing the need to have me involved and thinking.

k749gtnc9l3w | an hour ago

Does a human who is dragging their feet on a task think?

Depends on how inventive the feet-dragging is!

But yeah, it's a spectrum, and many tasks can be done «mindlessly» which is not literal zero but relatively low.

Just comparing how much a heatwave impairs understanding or planning a novel proof, versus explaining previously understood things to more heat-resistant colleagues, tells me that explaining an already known plan is not full thinking.

orib | 4 hours ago

What do you value about the human condition?

lcamtuf | 5 hours ago

Correct me if I misunderstood what you're saying, but I think you might be making two arguments in defense of the tech, right? The first one is that AI is just another tool for creative expression - just as photography supplanted painting, but didn't kill the visual arts. The second argument may be that not all uses of technology need to serve an artistic purpose, just as a photo can be useful even if it's not good.

If I'm reading this right, I think it's somewhat tangential to what I'm trying to say: I'm not rejecting AI, I'm just saying that it is different in ways that should probably give us pause.

But to address your points, I don't disagree with #2. LLMs have plenty of beneficial uses; my subjective POV as an old-fashioned internet user is that I encounter negative externalities more often than I benefit from the tech, but I accept that it's different for other folks. For #1, I get it in theory, but I'm not seeing it in practice, probably because the creation of wholesale AI art doesn't leave a whole lot of room for individualistic perspective or learned skill. You're just writing a short prompt; further, prompt generation can be automated with an LLM too. Do human prompts have a "spark" that a machine could never approximate? I don't know... maybe for now?

wrs | 3 hours ago

On #1, the analogy is that the result from a short prompt is like a point and shoot picture. You didn’t put any effort into it, you just wanted to get a quick result. And we all know it when we see the picture. That can be useful, but it’s not art.

My real-life pursuits are software, music, and a little bit of photography, and in all three areas I currently seem to have no trouble distinguishing between the result of a one-shot brief AI prompt and a lengthy session of a human using AI assistance.

Assertions that humans are replaceable at a high level just seem unfounded to me. That’s not to say we won’t see a lot more artless machine-generated artifacts, just like we see a lot of artless photos.

(Edit for clarity: I’m not “defending the tech”, and in no way trivializing or dismissing the vast uncertainties and concern here — just trying to relate it to a process that’s been going on long enough to characterize with some perspective.)

orib | an hour ago

I would be curious what your score is on this test: https://ai-art-turing-test.com/

Note that this is well over a year old at this point, and the state of the art has advanced further since.

wrs | an hour ago

That test is uninteresting because it’s completely context-free. To give the reverse example, humans can draw photorealistic pictures, but we don’t say that makes them equivalent to cameras. The art is not in the technique.

It would be more interesting to compare images made without a prompt to images made with extensive human prompting.

orib | an hour ago

I take it that means you failed to tell apart the AI and human art.

wrs | 52 minutes ago

I didn’t even try. But I don’t judge human art without context either. (I might judge technique, and certainly AI is good at copying technique, I think we can all see that.)

orib | 16 minutes ago

So, when you say

in all three areas I currently seem to have no trouble distinguishing between the result of a one-shot brief AI prompt and a lengthy session of a human using AI assistance.

can you tell me what you mean by distinguishing the results?

steveklabnik | 4 hours ago

https://en.wikipedia.org/wiki/The_Work_of_Art_in_the_Age_of_Mechanical_Reproduction

k749gtnc9l3w | 4 hours ago

So, before we reach the point where every song, painting, film, book, and blog post is coming off an LLM production line, we should try to figure out what else is there to make us proud of being humans.

Careful with that question. Even without asking about LLMs, what is there now to make us proud of being humans, full stop, no in-group restriction? How many people would already answer «nothing»?

As a decades-old observation goes, the intelligence needed to bring down civilisation goes down, and the intelligence needed to see a reason «why not» goes up. LLMs might have accelerated one or both of the trends, but surely haven't started either. Although, there is a risk of moving the crossover point quite a bit closer.

tools give you an edge at the expense of stripping away all individuality

There are many dimensions, and one chooses which to give up.

perfectly fungible with n billion people who can also write a prompt

On the one hand there are people who would meaningfully compete on prompt.

On the flip side it is a bit of a Hollywood problem, at least in English, given that AO3 and Jamendo and… DeviantArt, I guess? seem to provide a practically unbounded amount of no-charge-for-personal-enjoyment art, with the best examples very well done and probably with a wider range of originality than commercial offerings.

I still regret the loss of «sort by popularity, skip to the last page» on Jamendo, fallen at the hand of infiniscroll. Being good at being popular comes at a price, sequel-itis just put it under a sharper light, and artificial-neural-net models might simply increase the contrast yet a bit more. But for the consumption types where current popularity-optimised stuff is suitable, maybe a flow will be found where people won't mind anyway?

scraps | 4 hours ago

I think it's clear that content consumers have high tolerance for sameness; AI-generated op-eds about AI now make up a significant proportion of our social feeds

The latter isn't really evidence of the former; the fact that something is in your feed isn't necessarily a sign of success. I'm under the impression engagement numbers on social media are down, not up, and that the preponderance of AI content is making people pull back from social media. Social media seems to be losing cultural relevance, not gaining.

Using the volume of content as evidence is a confusion of cause and effect. The content in a user's feed is the cause. The desired effect is the user's behavior from consuming the feed. Let's say the content is short videos of people telling jokes in front of brick walls at comedy clubs. The thing to measure isn't the number of clips: it's the number of laughs.

In your blog post you use number of books on Amazon as an example. In that scenario, the number of books is actually the input, not the output: the output is purchases, not books. Gary Marcus wrote about this exact example recently: the number of books on Amazon has gone up, but the number of book sales has gone down over the same period. https://garymarcus.substack.com/p/slop-productivity-and-why-the-ai

emk | 7 hours ago

I discussed a related issue in my weekend update here on Lobsters. Basically, I just refuse to do blind vibecoding for anything beyond throwaway experiments. If I commit something, I want to understand it, and to be happy with the quality. And during the few days I used Fable, I wasn't happy with the quality of the code it wrote: There were lots of subtly-cut corners, tons of code to silently recover from things that should have been treated as invariant violations, and a tendency to throw another thousand lines of code at the wall instead of teasing out an actual architecture. For some kinds of software, sure, you can get away with this. But for the core infrastructure that does the heavy lifting, I don't think it's wise.

But there are some serious problems with relying too heavily on the models:

Reading code will never produce the same depth of insight that writing it does.
A series of individually plausible-looking PRs can add up to a real hairball of code.
Popular wisdom says that reading code is harder than writing it.

And so I'm realizing that while I'm OK with some AI, I really want to spend a lot more time with my hands on the keyboard, and with my understanding fully engaged. As I noted in my earlier post, my personal sweet spot is somewhere between Qwen3.6 27B and DeepSeek V4 Flash. Which is, coincidentally, about the size range of models that a programmer could run locally if fast RAM were cheaper.

Fable, well, I'm really concerned. It does far more "mostly OKish" work than any human could ever hope to understand or review. Not sure where this will end up.

mattgreenrocks | 5 hours ago

Yep. It is an OK-ish code generator. Current models won't write great code reliably. The fact that even devs don't care/discuss it much says a lot.

Claude tends to write too much code, interpret requirements in a maximalist manner, and be overly defensive. A lot of time with Claude is spent with me essentially gaslighting it with, "but is there a simpler way?" (There usually is.)

erock | 2 hours ago

Well, when you can have it gen some code while you do house chores, it’s hard to complain if it isn’t perfect. SWEs are more than happy to slop, especially when execs are requesting it. There’s no judgement there, but we will see blowback when the impact settles.

alandekok | 5 hours ago

For code generation, anything past a few lines or a "one off" script is pretty bad. After all, AI is trained on the Internet, and the Internet is full of garbage.

What I've found is that its main use is in processing large amounts of data. "Summarize this", or "do this vague set of tasks". Those are all things which would have been difficult before.

So if you have a good architecture in mind, the AI tools can help you craft a lot of things quickly.

simonw | 9 hours ago

For anyone without a Nature account, the Anthropic report it uses to talk about computer science skill atrophy is this one: https://www.anthropic.com/research/AI-assistance-coding-skills

Student | 9 hours ago

Replying because you link the actual report.

The study had junior developers do a 35-minute task with or without AI. The AI group overall scored much worse on a quiz about the library. The best performing AI users did as well as the worst performing non-AI users on the test. The best performing AI users also did various things to build understanding like asking for explanations.

I don’t think we can conclude that AI users will never learn adequately. But we certainly can conclude that learning by doing is faster than learning by watching and asking questions.

TBH this reinforces my prior that AI is a lot like cars in terms of benefits and drawbacks. And as someone who has been vibecoding hard recently I definitely need to make an effort to pull back at least a little bit.

orib | 4 hours ago

Maybe a better analogy would be a chauffeur, with opinions on where you should be going. The better AI gets, the less qualified people will be to pick their own directions.

"Look, I told Claude to get me to a bar. I got shit faced, of course I don't remember where it was."

k749gtnc9l3w | 4 hours ago

If we are drawing analogies to existing technologies with known and stabilised deployment track record, it's GPS navigation. There are some places where local police gets annoyed at the readiness of people to literally drive into a large body of water if Google Maps tells them to continue straight ahead.

Student | 4 hours ago

I think it depends how you use it. But yes, the maximum vibecoding you can do is using it like a chauffeur.

[OP] jaypatelani | 9 hours ago

Thanks for the link. I think AI should not be allowed up to Master's or even Ph.D. level.

lake | 5 hours ago

Once physicians began using it, their performance dropped significantly whenever the system was unavailable. During the three-month period before the AI tool was introduced, the specialists found at least one adenoma during 28.4% of colonoscopies. During the three-month period after the tool was introduced, the adenoma detection rate for colonoscopies performed without AI assistance decreased to 22.4%.

I'm scared of the equivalent happening to me. Our brains are very efficient at pruning connections that don't seem worth keeping up. That's one of multiple reasons I am currently drawing a red line at having LLMs generate my code. I just know the second I get used to that, my brain would start to screech if I then went back to writing code without them.

gspr | 3 hours ago

Ditto. I find them moderately useful for search, moderately useful for ideas for debugging when I'm fresh out, and very useful for giving clues about where I should start focusing my attention to understand an unknown-to-me technical system (say a giant, unknown codebase) from a particular angle.

But these things will not write for me – neither code, nor documentation, nor notes, nor papers, nor emails, nor presentations, nor "funny" gags.

vegai | 4 hours ago

Is AI at the same time so weak that it cannot do anything right, and so powerful that it will stop us from thinking?

Whatever you spend your time doing, you become good at. So be mindful to direct your actions in a fruitful direction when LLMs take over most of what you have been doing until now.

k749gtnc9l3w | 4 hours ago

Is AI at the same time so weak that it cannot do anything right, and so powerful that it will stop us from thinking?

That would be an almost good description of how a lot of people interacted with 2021 google.com — except «search» post-quality-nosedive instead of «LLMs», and probably will be a good description of how a lot of people will interact with 2027 google.com (now with LLMs).

Both can get some things right, sure.

hyperpape | 2 hours ago

The Lancet study seems interesting. I found a preprint of the Lancet study[0], alongside some quotes in reaction to it[1]. The fact that colonoscopy volume doubled and the fact that the confidence intervals are pretty wide both give me pause.

It occurred to me to wonder if AI use could drive an increased number of colonoscopies, as they have more time to schedule the colonoscopies when doctors are using AI for diagnosis. I have no idea if that's plausible or not, though.

[0] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5070304

[1] https://www.sciencemediacentre.org/expert-reaction-to-observational-study-looking-at-detection-rate-of-precancerous-growths-in-colonoscopies-by-health-professionals-who-perform-them-before-and-after-the-routine-introduction-of-ai/