On mobile, it’s hijacking my scroll in such a way that I literally cannot move further down the page. And “reader mode” is only showing me the first paragraph or so.
I’ll have to try again later on desktop. The content looks interesting but it’s literally impossible to read. I cannot get past the section that introduces Ernst and Young.
On desktop it keeps adding forced pauses to scrolling, of varying sizes, and you need to scroll down between 1 and 10 pages' worth to begin scrolling again.
It might "work" just fine on mobile (or not) but you may have stopped trying before reaching the point of re-scrolling, because it's insane.
I eventually managed to get far enough into the article that I thought I saw the main stat - the stat that 26% of the citations were hallucinated. Then the scroll threw me back to the top again and I gave up entirely on reading from my phone.
Coming back later on desktop, I see that the percentage keeps climbing the further you manage to make it down the page. The real stat is 60% of the citations were hallucinated.
I've stopped reading because of it. I can't scroll. Was this thing vibe-coded? Funny they are picking on EY for not reading their reports but it looks like they didn't test their website.
The problem we're seeing across many professions is AI output is not getting vetted by knowledgeable people, whether it's an experienced analyst, senior engineer, expert attorney, or the resident physician. At best they skim, at worst they don't even see it at all before it's published, pushed to production, distributed to clients, or submitted to the court.
In many cases the skills are available in house to do the necessary vetting, but these people are already overwhelmed with their existing day to day.
Anyone remember that item a few months back about Amazon now having senior engineers vet generative AI output (https://news.ycombinator.com/item?id=47323017)? I had to LOL when I read that. These folks are already slammed. And the idea that Amazon would allow human bottlenecks to multiply across projects and underlying infrastructure development is ridiculous.
Part of the problem: you get given a complete document to review after it's been fully baked.
I'm pushing the need for basic engineering principles across whole organisations.
You wouldn't give an engineer 1000 lines of code to review without the original spec of what you're trying to achieve for context (at a minimum, ideally the reviewer was in the room when the work was introduced, and has full context).
So, these docs, they're given as an all or nothing.
Do you push back on the 39th metric that is defined to the utmost detail? Or just resign yourself to the fact that it is what it is?
A one (6 is the goto if we're talking Amazon?!) pager.. "this is what I am proposing" at least gives the skeleton of the idea to push back at the general shape of the idea, refine it, before all the emotional investment of your precious report being complete.
Y'know.. the traditional product running through the spec in a SCRUM* environment.. the engineers doing proper code reviews..
I've had this situation and basically just had to throw out stuff that was written because it's completely terrible/wrong. Either start again or just give up.
So many people get buried in tech debt due to constantly choosing "let's be productive now and iterate later". Then later never comes.
This was already proof that, when a company is given the chance of getting a reward now and putting in the effort later, they will take the reward and postpone the effort indefinitely.
AI now offers "generate now, review later". Fill in the pattern.
We're cooked and it's not due to AI, it's due to the fundamentals of an economy and society that only sees the short term.
> In many cases the skills are available in house to do the necessary vetting, but these people are already overwhelmed with their existing day to day.
I think a lot of the time it's just pure laziness. AI gives people a magical "do all the work for me" button and it can bring out the worst in them.
I constantly battle this dichotomy where I care about the work I do but I also cannot possibly care about the corporate model, given 0 ownership of flawed processes across the org and the looming layoff that'll happen any day now.
Some people are given the button and really do not care.
> AI output is not getting vetted by knowledgeable people
You mean the people they fired and demoralized?
One of the things that "great [wo]men" like about "vibe-coding" (and that includes blindly producing non-code product), is that they, and they alone can now do what used to require the painful process of "passing it to context experts."
Now, the LLM is a "built-in context expert," and they don't need to vet the output anymore.
> Now, the LLM is a "built-in context expert," and they don't need to vet the output anymore.
Serious orgs are going to have to figure out the human layer. It will be needed, no matter how 'hallucination-free' the AI tooling gets. AI will still have some spectacularly bad fuck ups or even worse time bombs that get embedded in a system and don't become apparent until months or years later.
A lot of this will be dumped on existing staff with predictable results as they don't have the bandwidth to do it right. I can envision "output compliance" or "AI QA" becoming dedicated positions at many orgs. It's clearly needed.
Let's be honest, how many orgs are really serious? Playing the game of the day for shareholder appeasement is taken far more seriously than whatever the domain experts might think.
If the main job is putting out a report, starting with AI is wrong in any case. What's the value of an AI-generated report, even if experts fix the biggest issues with it? Maybe this kind of report didn't have all that much value before, I don't know. But starting with AI just makes sure it's generic drivel.
As an attorney, I feel like vetting AI output takes longer than just doing it from scratch, let alone versus just using a traditional form.
With AI, I have to read through everything, often explain why it's wrong, and then rewrite everything anyways. I mean, I get way more billables, but I think it's symptomatic of how AI loses its advantage of being quick and accessible to those who don't understand the subject matter.
It's not really any different in programming. Like if you have well structured code and want to do a clear refactoring across it and you know what to expect, it can speed things up. But if it's generating any significant (and relatively complex) new code, you have to go through the whole thing manually again and then you find out you have to fix way too many things and get bogged down in different paths the AI didn't do correctly.
Of course, it's pretty much impossible to hear a dissenting point of view today and everyone is going crazy on these drugs. I might be hilariously wrong but I think this is the best time to start a software company.
I think it's the perfect time to be contrarian - think about it. If you're wrong - so what? The world will have changed for everyone in the field. If you are right? You stand to be positioned to win big financially whilst everyone else's brain is rotting away.
Fact-checking and editing a mediocre piece of writing can be way harder than writing from scratch. Proving that something isn’t true or can’t be substantiated is hard work, and so is arguing that a word choice is subtly inappropriate.
And making a ton of corrections to a document everyone was hoping was ready to go is never fun politically.
I have experienced this several times lately when writing software with claude/codex. Sometimes vetting and steering the agent takes longer than it would have taken me if done manually. Sure you can just decide not to vet the output and go into full vibecode, but agents tend to do a lot of dumb things (such as not deleting unused private methods or having temporary variables that are not needed).
In my experience the most effective work pattern for me is using agents to perform research and feedback on high level design, then I write the code manually, then I ask the agent to review the code for potential bugs/issues and fix those. The agents have a much easier time making small changes once the design is 90% there without going fully off the rails and generating slop.
I am working on writing skills to make the agent better but it is a bit painstaking. For example I had to write this inside of a skill because sometimes the agent would just stub out methods and leave TODOs: “always fully complete the requested task before finishing edits unless input is needed”.
This is the realization I had too. We had a manager update a policy at our org. He just shit it out through AI. It had tons of mistakes, and people who read it had questions. Not only did it have mistakes, it was causing people to do things in a way that added a manual step when an automatic process existed. Then the engineering VP commented on it asking the original author what it's about, who then had to bring it back to the attention of the manager who made the first change.
It wasted many people's time, probably an order of magnitude more time (and money) than if the initial person had put a modicum of effort into making it right in the first place. Instead they hand it off to their life partner claude and just assume it's good enough.
It's to the point where I am feeling insulted when I get ai slop like this from people. If I am expected to perform at a high level then I expect that at the very minimum the slop throwers will proofread their slop.
You can also feed the document or source file to another frontier-level model, ideally two others, and tell it to vet it aggressively. The goal is to goad the models into erring on the side of false positive findings rather than potentially missing true positives.
I find that if Gemini Pro agrees with Claude Opus 4.8 and GPT 5.5 on something, it's almost certainly correct at a level where I wouldn't be likely to catch any errors myself.
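To make the cross-check idea above concrete, here is a minimal sketch, assuming each reviewer model has already returned the set of claim IDs it considers suspect. The reviewer names, claim IDs, and the `triage_claims` helper are all made up for illustration, and no real model API is called. A claim is flagged as soon as any single reviewer objects, which deliberately errs toward false positives.

```python
from collections import Counter


def triage_claims(all_claims, findings_by_reviewer):
    """Split claims into 'flagged' (at least one reviewer objected)
    and 'unchallenged' (no reviewer objected).

    findings_by_reviewer maps a hypothetical reviewer name to the
    set of claim IDs that reviewer considered suspect.
    """
    objections = Counter()
    for suspect_ids in findings_by_reviewer.values():
        for claim_id in set(suspect_ids):
            objections[claim_id] += 1

    # Keep the objection count so a human can start with the worst offenders.
    flagged = {c: objections[c] for c in all_claims if objections[c] > 0}
    unchallenged = [c for c in all_claims if objections[c] == 0]
    return flagged, unchallenged


# Hypothetical example data, not real model output.
claims = ["c1", "c2", "c3", "c4"]
findings = {
    "reviewer_a": {"c2"},
    "reviewer_b": {"c2", "c4"},
    "reviewer_c": set(),
}
flagged, unchallenged = triage_claims(claims, findings)
print(flagged)       # {'c2': 2, 'c4': 1}
print(unchallenged)  # ['c1', 'c3']
```

The "unchallenged" bucket is only as good as the reviewers: models that share a blind spot will agree with each other and still be wrong, so it narrows where a human looks rather than replacing the look.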
Another attorney here. I understand your plight. But I can't believe law firms are sending out briefs and opinions without carefully checking all of the citations. I mean, even when Lexis or Westlaw identifies an (actual) case on point, you still have to check if the case has been overturned, whether it is truly on point, or if it can be distinguished from your case. So even if the cited case is not a hallucination, someone would still have to read and analyze the cited case in the context of the present case.
>As an attorney, I feel like vetting AI output takes longer than just doing it from scratch, let alone versus just using a traditional form.
This is my issue with AI.
In the type of work I do, the work needs to be precise down to the context of how individual words are used.
Having AI pump out 20 pages of content but then me having to go through the 20 pages word by word, cross checking references and prior statements is going to take a long time.
Not to mention I didn’t write it, so my brain doesn’t already know what’s been written, and it takes several passes to confirm it’s complete and that it all fits together.
I find it easier to just write it myself and use AI for the more menial tasks like logic checks, completeness checks, etc.
One task AI was very useful in was where we wanted to understand the gaps in a submission relative to a process requirement document. We didn’t care if the output was 100% complete or perfect, we just wanted a few examples.
Being able to input a couple of 300-page documents and have AI spit out a dozen examples in 30 seconds was a huge time saver.
Also wondering about this whole review process with someone who wrote it with AI. Even if you comment and note all the issues, do they have the skills or willingness to correct it all? And how many times would you need to keep the loop going for an error-free outcome? Is there even enough calendar time for that?
> The problem we're seeing across many professions is AI output is not getting vetted by knowledgeable people, whether it's an experienced analyst, senior engineer, expert attorney, or the resident physician.
Yeah probably not for the same reason I left VFX rather than have a lifetime of completely disregarding my own generative creativity and cleaning up LLM-generated bullshit. Fuck that. Double-fuck creating ‘content’ to train the models.
In code, LLMs automate away a lot of the drudgery. I wasn’t sad to avoid spending a couple hours looking up the usage patterns and idioms for some ported library, or do some rote task that didn’t make the project significantly better. In most other jobs, they automate away the only fun part and leave humans with all of the drudgery.
The tech industry has always been arrogant to some extent, but assuming the world of talented professional knowledge workers and creatives would be content to professionally proofread, apply lipstick to pigs, and polish turds is a whole new level of out-of-touch. I’d rather live out of my car and dig through the garbage for bottles with deposits.
I loved solving problems with code but I don’t think it’s controversial to say most people enjoy the novel problems much much more than pulling apart an obtusely designed, non-idiomatic API. The AI is better at the details than the novel problem, so whoever is driving still needs to sort that out. In art, writing, and so-on, it only does the interesting part, but it does it poorly. So for professional use cases, it involves a lot of tedious cleanup of other people’s art that a machine (very economically) amalgamated into a pile of shit, and you’ve no opportunity to do anything interesting. At all.
But wait, if knowledgeable people have to vet the output, the process will not be 10X faster and you will not be able to fire the knowledgeable people. Therefore, your objection makes no sense. QED.
> In many cases the skills are available in house to do the necessary vetting, but these people are already overwhelmed with their existing day to day.
This is an interesting topic. We treat vetting output the same as doing the work ourselves, but that is not the case.
Doing the work is not the same as reviewing work done by others.
I have heard reports of software engineering companies that have gone full agentic. Their seniors only review stuff written by LLMs and it burns them out, because they have to switch context constantly.
I find this interesting because part of being a senior developer is that you are experienced enough that you won’t make grave mistakes anymore. This is the case in many professions: you are relied upon to not make grave mistakes.
But those same people are now swamped with stuff that they are not able to review, so they will let a grave mistake slip through at some point.
>>The problem we're seeing across many professions is AI output is not getting vetted by knowledgeable people
I am particularly interested in Education and Human Knowledge Management. I have seen the rate of IT training going to zero. Think about specialized training, where if you make a mistake, the consequences of your errors are talked about on the evening TV news.
The whole idea that everybody is just planning to save their butt, using these strings coming out of these numeric matrices while suspending judgement, makes me shudder in horror. A bit like those South Asia airline companies that were forbidding their pilots from landing airplanes with manual piloting, leading to an increased loss of skills and causing some well known disasters...
If well paid consultants can't even bother to check their links...
>> Amazon now having senior engineers vet generative AI output
Software and news are apples and oranges. The software engineer looks at AI code to spot errors and make/suggest corrections. But the software still exists. A doctor vetting a news article ("Owning a Boat Causes Cancer") isn't there to tweak language. The doctor's job is to stop the entire article. It would be like the Amazon engineer deleting all the source code and telling the dev team to abandon the project. That will never be a popular task.
It's similar to tokenmaxxing in various companies. Who cares about reviewing the design and the code, just code generation all the way. If it runs it's good
It looks like when you sum up: the cost to generate information using an LLM + the cost to actually verify the information, the result on average is the same as not using the LLM. That does not mean some times it isn’t faster and cheaper. It just means other times it’s slower and more costly. This along with different people’s tolerance for accuracy explains why we see such diverging experiences with it.
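The comparison above can be written down as a tiny cost model. This is only a sketch: the function and parameter names are illustrative, and the hours in the example calls are placeholders, not measurements of anything.

```python
def llm_path_cost(generate_hours, verify_hours, rework_hours, manual_hours):
    """Return (total hours on the LLM path, True if that beats doing it manually).

    The LLM path costs generation plus verification plus whatever rework
    verification uncovers; the manual path is a single figure.
    """
    llm_total = generate_hours + verify_hours + rework_hours
    return llm_total, llm_total < manual_hours


# Placeholder numbers for two hypothetical tasks.
# A rote refactor: cheap to verify, little rework.
print(llm_path_cost(0.1, 0.5, 0.2, manual_hours=3.0))
# A legal brief: every citation must be read and much of the text rewritten.
print(llm_path_cost(0.1, 4.0, 3.0, manual_hours=5.0))
```

Generation is nearly free in both cases. What decides the outcome is the verification and rework terms, which is where the subject-matter expertise and the tolerance for inaccuracy come in.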
So in order to make it pan out the forces at play are trying to make everyone believe we now have to accept wrong, even dangerous results.
The trope about external consultants is that your VP brings them in to review the company, and they talk to everybody and write a report on how to improve the business, and the report says exactly what you've been telling your VP but they've been ignoring you.
They are paid to justify decisions executives have already made. It's often referred to as due diligence, but in practice these reports mostly just allow executives to tell the board it wasn't their fault if it goes wrong.
This sort of thing is a complete embarrassment to a firm like EY, where people are paying them a lot of money for advice. They’ve basically demonstrated that their market leading research is just someone asking questions to ChatGPT.
If you ever needed evidence to not buy “advice” from such outfits, this is exhibit one.
Hopefully they at least fired the partner that published this steaming pile of AI slop.
Executives pay them a lot of money to launder blame. If a project fails after consulting EY, well, what can you do. If a project fails without consulting anyone externally, it's obviously a failure of the executive.
Exactly--they're paid a lot of money for their reputation, which is valuable in offering cover for politically difficult decisions. This was certainly net-negative for E&Y's reputation.
The Big Four have become a shadow of their former selves. They have become so risk averse that their advice is already incredibly generic and non-actionable.
I think their audit work is in a downwards spiral. Audit has become so competitive that they are struggling to find ways to make it cheaper. They have become slaves to reducing the hours booked, and the rate of those hours. To do this they substitute less experienced people all the time. You used to be able to chat with your partner about an issue you have coming up, now you get their assistant if you are lucky. By chasing 'efficiency' they have lost their value-add. Now the first time the partner has looked at your file is right before the clearance meeting, and they spot issues that should have been picked up earlier and tested on the day you should be signing. So you end up doing it all again. I'm trying to coin a term for the inefficiency caused by chasing efficiency.
I worked at a top 5 hedge fund in the early 2000s. They had a large team of E&Y auditors onsite at all times that I worked somewhat closely with.
Some things stuck out at me:
- They were all in their early 20s.
- They were all incredibly checked out. Honestly they still seem like an outlier to me decades later.
- They partied hard. Yes, with drugs.
- Most of them were in rotating intimate relationships with each other and unusually open about it. Office scuttlebutt was literally "who is fucking who this week".
- They seemed busy for maybe two or three weeks out of the entire year and then it was long stretches of Minesweeper/Solitaire.
I filed this away in my head as "provides no value" and that was decades ago. If the industry itself is worse off today I can't imagine how much worse it actually is from my experience.
You definitely could chat to your audit partner in the past. Otherwise it is the same old bunch of recent graduates giving out advice that they have zero experience to support
I don't quite get it why they can't take another LLM and vet the output of the first with the second one. Surely they would not have the same hallucinations and would be able to detect hallucinations of the earlier LLM. Maybe it would cost too much in terms of tokens?
I don't know but I would expect it to be relatively easy for an LLM to detect "hallucinations".
>I don't quite get it why they can't take another LLM and vet the output of the first with the second one.
I think this may be part of the problem. The actual humans creating the report don't have the expertise to know which one to trust. At least that was what consulting was like in my experience at a similar firm.
Because they used LLMs to do the work. What you are suggesting is to use the LLMs to create more work, which is counter to the shortcut they were trying to take.
Good point with some irony. They don't want to do a better job, they want to do an easier job. But a company like E&Y should realize shortcuts like these don't work. And their customers are paying them.
> I don't quite get it why they can't take another LLM and vet the output of the first with the second one.
Yes, this technique and its variations[1][2] "work" but it's still not 100% perfect. And it's not as widely used as it might be because, among other reasons:
a. it takes longer to implement
b. it costs more (more tokens spread across multiple llm calls)
c. higher latency (getting an answer takes longer due to multiple llm calls involved)
d. the final answer is probabilistically more likely to be correct, but is still not guaranteed to be error free, so you can never fully escape the need for Human in the Loop.
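Point (d) can be made concrete with a bit of arithmetic. A minimal sketch, assuming each checking pass misses a given error with the same probability and that the passes fail independently of each other. That is a generous assumption, since models trained on similar data may share blind spots. The 20% miss rate is an illustrative number, not a measurement.

```python
def residual_miss_probability(miss_rate, passes):
    """Probability that every one of `passes` independent checks
    misses the same error, given a per-check miss rate."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    if passes < 0:
        raise ValueError("passes must be non-negative")
    return miss_rate ** passes


# Illustrative numbers only: assume each pass misses 20% of errors.
for passes in (1, 2, 3):
    p = residual_miss_probability(0.2, passes)
    print(f"{passes} pass(es): {p:.3%} of errors slip through")
```

Under those assumptions the residual error rate shrinks geometrically but never reaches zero, while token cost and latency (points b and c) grow with every extra pass.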
I am not exactly sure if this would solve the overall problem. The main one being lack of oversight. The solution to a social issue generally isn’t to throw more technology at it.
I think the AIs don't have enough information about the problem. There's many things those who wrote the prompts forgot to mention. And some of it maybe is tacit knowledge?
Then, it doesn't matter if you add 1000 frontier models -- they still can't generate a good report.
But yes I suppose you can get rid of hallucinated citations though
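Checking that cited links at least resolve is the cheap part. A minimal sketch, assuming the citations are plain URLs; `url_resolves` and `find_dead_links` are illustrative names, not part of any existing tool.

```python
import urllib.error
import urllib.request


def url_resolves(url, timeout=10):
    """Return True if a HEAD request to `url` succeeds with a status below 400."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # HTTP errors, DNS failures, timeouts, and malformed URLs all land here.
        return False


def find_dead_links(urls, resolves=url_resolves):
    """Return the URLs for which the `resolves` check fails, in input order."""
    return [url for url in urls if not resolves(url)]
```

Calling `find_dead_links(list_of_cited_urls)` gives a list to chase up by hand. Some servers reject HEAD requests, so a "dead" result may be a false alarm, and a link that resolves only shows the page exists, not that it says what the report claims. Someone still has to read the source.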
What’s strange about how things have developed is that this report 12-18 months ago would have been a massive scandal and would have caused durable brand damage.
This proves (again) one thing for sure: The "Big x" Consulting Firms were always BS - and now them generating all their work themselves using LLMs just proves that their 'clients' can just skip their Million Dollar fees and just ask the LLM directly.
Basically the entire consulting industry should die due to AI.
Performative executives of yesteryear that constantly need external validation and direction and operate through hive mind and groupthink are weak and will die.
I believe some of the biggest problems in today's business leaders are an inability to be open to new information, to think across traditional professional boundaries, or to ask meaningful questions.
AI simply exposes this unapologetically.
Bad management (this includes most government): up your game or get out of the way.
Firefox has a handy "Reader view" (Opt + CMD + R on Mac) that you can activate to get a stripped down view of just the text on the page. Unfortunately, it also removes the images which contain some of the sources they use.
The real comedy is seeing this garbage come down from senior management, clumsy prompting, hallucinated garbage that’s all fluff and zero actionable information, zero real informed analysis. “See this analysis of our support issues from jira, we must fix these top three problems!!!” And it’s all the stuff everyone has known for years but management has refused to give anyone the authority to fix anything. I’ve seen this more than twice now; needs a name. Garbagemaxxing?
> we must fix these top three problems!!!” And it’s all the stuff everyone has known for years but management has refused to give anyone the authority to fix
I wish we could just stop destroying people's jobs and lives using AI. The statistics I have heard quoted say, that merely 25% of the people actually like their job. Meaning they like doing what they do for its own sake, not because it gets them money, which they desperately need to live. I get it, most people don't want to do the work. But can we stop ruining the jobs of people, who are actually dedicated to their job and would like to keep doing their job properly?
But I guess since EY is a CYA hedge anyway, no one really cares about whether the reports are hallucinations or not. Someone high up spent money on EY, so that they can justify some decision and won't be held responsible that much, when it turns out the decision was shit. All that matters to them is, that it has the appearance of something genuine and then they can base the decision on what they receive from EY, which better be what they already wanted to hear/read anyway.
>The statistics I have heard quoted say, that merely 25% of the people actually like their job. Meaning they like doing what they do for its own sake, not because it gets them money, which they desperately need to live.
Even people who like their jobs work because they need money to live.
My point is, that people who want to do their job properly are less likely to sling AI slop and be found out to do that, and that I wish we could stop destroying their jobs or lives, to chase stakeholder wet dreams of the companies they invested in letting go almost everyone.
I guess this is a great report, but the parallax landing page shenanigans disrupt my reading flow, you cannot easily scroll back to get an overview of the key facts, so I stopped.
Not by me, but by the mods. They also changed from "full of hallucinations" to "and most citations were hallucinated". Maybe a rep from "EY Global" filed a complaint ;)
I did some ghost writing for EY. I wrote cheat sheets about international tax transfer pricing, mining and metals, and life sciences for its then CEO Mark Weinberger.
I had no experience and knew absolutely zero about any of those sectors.
Off topic but: the scroll mechanism on mobile is so horribly irritating and unpredictable that I just can’t be bothered fighting against it to read what sounds like at least a mildly interesting article.
Those are the people who rejected you for a job you applied for. AI has amplified the Dunning-Kruger effect to the point that, unfortunately, real experts in their field are overlooked now, because a wall of text with numbers sounds and looks professional enough.
Any person with above-average knowledge of a specific topic can tell when AI starts hallucinating and making things up, or at least introducing new problems through added complexity rather than solving the original one. That’s my observation using all the top tier ones too. It’s like they are designed to solve a problem regardless, so they start making things up or piling on workarounds. A person with no deep knowledge of that topic will just copy it all and call it a day.
Just yesterday, I asked claude 4.8 about something specific that I know the answer to. It gave a long list of solutions, none of which were close to the real answer. When I replied with the real answer and pushed back, I got the famous quote “you are right, thanks for pushing back”.
> Instead of releasing our results all at once, we're going to focus on one report at a time. This approach both prevents individual examples being overlooked and allows us to illustrate the negative impacts of vibe citing on research quality and public trust.
Not to take away from the actually great reporting here, but what they mean is, this approach allows them to milk it for as many clicks as possible.
People don't get it, this is marketing: an example of what they could do for you. They can produce reports that say what you want to say, filtered through third party diligence and E&O policies, then take flak and blowback for tough policy choices. For the client there will be no consequences. It's not just ai slop or garbage, it's what makes them well worth it.
Slop signalling may be the new power play. Nothing quite says "FU" like a low-effort AI hallucination.
Ernst & Young again proving they're leading in the race to the bottom.
Why would anyone trust these large contractor companies enough to pay them the huge amounts of money to have juniors learning the ropes on their dime?
"Customers" were the content the juniors were trained on in the same way that scraped internet data is what LLMs are trained on, except Customers were paying for the privilege of being 'scraped'.
Now there are no juniors, just LLMs being asked questions that aren't specific enough, and assuming the answer is one-shot correct.
It saves E&Y lots of money though, and their (confusing) reputation will provide a surprising amount of momentum such that plenty of work will keep rolling in for a few years to come.
How does such a thing even happen? I know for example in Qwen Chat or Perplexity, they produce citations at the end of each generated sentence. So I can hover my mouse over each citation and see which website it was scraped from.
Did they just prompt ChatGPT with no web search and copy-paste the result?
raro11 | 21 hours ago
bokkies | 21 hours ago
umpalumpaaa | 21 hours ago
snailmailman | 21 hours ago
1000100_1000101 | 21 hours ago
lelandfe | 21 hours ago
snailmailman | 20 hours ago
kavok | 21 hours ago
bbddg | 21 hours ago
nntwozz | 21 hours ago
canyp | 21 hours ago
Some people should not be allowed to make a website.
IshKebab | 20 hours ago
csomar | 20 hours ago
chaidhat | 21 hours ago
331c8c71 | 21 hours ago
nilirl | 21 hours ago
cwillu | 21 hours ago
AshamedBadger56 | 20 hours ago
ilamont | 21 hours ago
ChrisLTD | 21 hours ago
Why?
SoftTalker | 21 hours ago
So if they're having humans proofread what the AI produces, they must have found that to be necessary.
ChrisLTD | 17 hours ago
bluefirebrand | 16 hours ago
_puk | 21 hours ago
* Yes SCRUM is dead, but that's another thing.
JoshTriplett | 21 hours ago
Not fully baked, worse: made to sound confidently correct, orthogonal to its actual correctness.
bradleyankrom | 21 hours ago
s0rce | 20 hours ago
torben-friis | 18 hours ago
This was already proof that, when a company is given the chance of getting a reward now and putting in the effort later, they will take the reward and postpone the effort indefinitely.
AI now offers "generate now, review later". Fill in the pattern.
We're cooked and it's not due to AI, it's due to the fundamentals of an economy and society that only sees the short term.
uxhacker | 17 hours ago
stackghost | 17 hours ago
xienze | 21 hours ago
I think a lot of the time it's just pure laziness. AI gives people a magical "do all the work for me" button and it can bring out the worst in them.
canyp | 21 hours ago
Some people are given the button and really do not care.
ChrisMarshallNY | 21 hours ago
You mean the people they fired and demoralized?
One of the things that "great [wo]men" like about "vibe-coding" (and that includes blindly producing non-code product), is that they, and they alone can now do what used to require the painful process of "passing it to context experts."
Now, the LLM is a "built-in context expert," and they don't need to vet the output anymore.
ilamont | 21 hours ago
Serious orgs are going to have to figure out the human layer. It will be needed, no matter how 'hallucination-free' the AI tooling gets. AI will still have some spectacularly bad fuck ups or even worse time bombs that get embedded in a system and don't become apparent until months or years later.
A lot of this will be dumped on existing staff with predictable results as they don't have the bandwidth to do it right. I can envision "output compliance" or "AI QA" becoming dedicated positions at many orgs. It's clearly needed.
anal_reactor | 19 hours ago
Once the hallucination rate drops below the error rate of human workers, it won't be needed anymore.
cassianoleal | 19 hours ago
asdff | 19 hours ago
fabian2k | 21 hours ago
mminer237 | 21 hours ago
With AI, I have to read through everything, often explain why it's wrong, and then rewrite everything anyways. I mean, I get way more billables, but I think it's symptomatic of how AI loses its advantage of being quick and accessible to those who don't understand the subject matter.
SV_BubbleTime | 21 hours ago
I can’t cite “from scratch” for something outside of my knowledge but I side LLM training or assisted search.
Izikiel43 | 21 hours ago
I do the second approach for coding with smallish steps and the output is fine
csomar | 20 hours ago
Of course, it's pretty much impossible to hear a dissenting point of view today and everyone is going crazy on these drugs. I might be hilariously wrong but I think this is the best time to start a software company.
2fff | 19 hours ago
I think it's the perfect time to be contrarian - think about it. If you're wrong - so what? The world will have changed for everyone in the field. If you are right? You stand to be positioned to win big financially whilst everyone else's brain is rotting away.
smelendez | 20 hours ago
And making a ton of corrections to a document everyone was hoping was ready to go is never fun politically.
__turbobrew__ | 20 hours ago
In my experience the most effective work pattern for me is using agents to perform research and feedback on high level design, then I write the code manually, then I ask the agent to review the code for potential bugs/issues and fix those. The agents have a much easier time making small changes once the design is 90% there without going fully off the rails and generating slop.
I am working on writing skills to make the agent better but it is a bit painstaking. For example I had to write this inside of a skill because sometimes the agent would just stub out methods and leave TODOs: “always fully complete the requested task before finishing edits unless input is needed”.
claaams | 19 hours ago
It wasted many people's time, probably an order of magnitude more time (and money) than if the initial person had put a modicum of effort into making it right in the first place. Instead they hand it off to their life partner claude and just assume it's good enough.
It's to the point where I am feeling insulted when I get ai slop like this from people. If I am expected to perform at a high level then I expect that at the very minimum the slop throwers will proofread their slop.
CamperBob2 | 19 hours ago
I find that if Gemini Pro agrees with Claude Opus 4.8 and GPT 5.5 on something, it's almost certainly correct at a level where I wouldn't be likely to catch any errors myself.
jimmydddd | 19 hours ago
root-parent | 18 hours ago
Update your priors: https://www.damiencharlotin.com/hallucinations/
root-parent | 18 hours ago
"AI Hallucination Cases" - https://www.damiencharlotin.com/hallucinations/
refurb | 14 hours ago
This is my issue with AI.
In the type of work I do, the work needs to be precise down to the context of how individual words are used.
Having AI pump out 20 pages of content but then me having to go through the 20 pages word by word, cross checking references and prior statements is going to take a long time.
Not to mention I didn’t write it, so my brain doesn’t already know what’s been written, and it takes several passes to confirm it’s complete and that it all fits together.
I find it easier to just write it myself and use AI for the more menial tasks like logic checks, completeness checks, etc.
One task AI was very useful in was where we wanted to understand the gaps in a submission relative to a process requirement document. We didn’t care if the output was 100% complete or perfect, we just wanted a few examples.
Being able to input a couple of 300-page documents and have AI spit out a dozen examples in 30 seconds was a huge time saver.
But that was a non-critical task.
Ekaros | 20 hours ago
kloop | 20 hours ago
The problem is that output sometimes takes longer to verify than to create in the first place.
That turns AI into a deeply negative ROI system for many applications.
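To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it is a made-up assumption for illustration, not a measurement, and `net_minutes_saved` is a hypothetical helper name.

```python
def net_minutes_saved(write_minutes, generate_minutes, verify_minutes,
                      rework_probability, rework_minutes):
    """Time saved by generating and verifying instead of writing by hand.

    A negative result means the AI route costs more time than doing the
    work yourself. All inputs are estimates supplied by the caller.
    """
    # Expected cost of the AI route: prompting, checking, and the average
    # amount of rework when the check finds problems.
    ai_route = generate_minutes + verify_minutes + rework_probability * rework_minutes
    return write_minutes - ai_route


# Hypothetical case: a report that takes 2 hours to write by hand,
# 5 minutes to generate, 150 minutes to verify line by line, and has a
# 25% chance of needing an hour of rework.
print(net_minutes_saved(120, 5, 150, 0.25, 60))  # -50.0
```

With those assumed inputs the verification step alone already exceeds the time it would take to write the thing, so the "savings" come out negative.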
DrewADesign | 20 hours ago
Yeah probably not for the same reason I left VFX rather than have a lifetime of completely disregarding my own generative creativity and cleaning up LLM-generated bullshit. Fuck that. Double-fuck creating ‘content’ to train the models.
In code, LLMs automate away a lot of the drudgery. I wasn’t sad to avoid spending a couple hours looking up the usage patterns and idioms for some ported library, or do some rote task that didn’t make the project significantly better. In most other jobs, they automate away the only fun part and leave humans with all of the drudgery.
The tech industry has always been arrogant to some extent, but assuming the world of talented professional knowledge workers and creatives would be content to professionally proofread, apply lipstick to pigs, and polish turds is a whole new level of out-of-touch. I’d rather live out of my car and dig through the garbage for bottles with deposits.
bluefirebrand | 16 hours ago
I don't know about "most other jobs", I feel like this applies strongly to software too. AI is being used to automate the fun stuff
You may view writing code as drudgery, I like doing it
DrewADesign | 15 hours ago
I loved solving problems with code but I don’t think it’s controversial to say most people enjoy the novel problems much much more than pulling apart an obtusely designed, non-idiomatic API. The AI is better at the details than the novel problem, so whoever is driving still needs to sort that out. In art, writing, and so-on, it only does the interesting part, but it does it poorly. So for professional use cases, it involves a lot of tedious cleanup of other people’s art that a machine (very economically) amalgamated into a pile of shit, and you’ve no opportunity to do anything interesting. At all.
wrs | 20 hours ago
fzeindl | 20 hours ago
This is an interesting topic. We treat vetting output the same as doing the work ourselves, but that is not the case.
Doing the work is not the same as reviewing work done by others.
I have heard reports of software engineering companies that have gone full agentic. Their seniors only review stuff written by LLMs and it burns them out, because they have to switch context constantly.
I find this interesting because part of being a senior developer is that you are experienced enough that you won’t make grave mistakes anymore. This is the case in many professions: you are relied upon to not make grave mistakes.
But those same people are now swamped with stuff that they are not able to review, so they will let a grave mistake slip through at some point.
So they really can’t trust themselves anymore?
watwut | 20 hours ago
root-parent | 18 hours ago
I am particularly interested in Education and Human Knowledge Management. I have seen the rate of IT training going to zero. Think about specialized training, where if you make a mistake, the consequences of your errors are talked about on the evening TV news.
The whole idea that everybody is just planning to save their butt, using these strings coming out of these numeric matrices while suspending judgement, makes me shudder in horror. A bit like those South Asia airline companies that were forbidding their pilots from landing airplanes with manual piloting, leading to an increased loss of skills and causing some well-known disasters...
If well-paid consultants can't even bother to check their links...
roncesvalles | 18 hours ago
sandworm101 | 18 hours ago
Software and news are apples and oranges. The software engineer looks at AI code to spot errors and make/suggest corrections. But the software still exists. A doctor vetting a news article ("Owning a Boat Causes Cancer") isn't there to tweak language. The doctor's job is to stop the entire article. It would be like the Amazon engineer deleting all the source code and telling the dev team to abandon the project. That will never be a popular task.
gloryjulio | 17 hours ago
erentz | 13 hours ago
So in order to make it pan out the forces at play are trying to make everyone believe we now have to accept wrong, even dangerous results.
mapontosevenths | 21 hours ago
It's unsurprising that trying to do more with less results in lower quality.
onlyrealcuzzo | 20 hours ago
There may be a lot of demand for do-nothing services.
A lot of corporate work is just do-nothing box-ticking.
Boss: get me a report about X, so I can give that report to my boss who won't read it.
You: E&Y, please get me a report. Here's $200k.
bombcar | 20 hours ago
fragmede | 20 hours ago
2fff | 19 hours ago
they are not simply paid to do nothing. They are paid to do dirty work.
mapontosevenths | 18 hours ago
Our_Benefactors | 21 hours ago
cmiles8 | 21 hours ago
If you ever needed evidence to not buy “advice” from such outfits, this is exhibit one.
Hopefully they at least fired the partner that published this steaming pile of AI slop.
ralph84 | 21 hours ago
elmomle | 21 hours ago
jimnotgym | 21 hours ago
I think their audit work is in a downward spiral. Audit has become so competitive that they are struggling to find ways to make it cheaper. They have become slaves to reducing the hours booked, and the rate of those hours. To do this they substitute less experienced people all the time. You used to be able to chat with your partner about an issue you have coming up, now you get their assistant if you are lucky. By chasing 'efficiency' they have lost their value-add. Now the first time the partner has looked at your file is right before the clearance meeting, and they spot issues that should have been picked up earlier and tested on the day you should be signing. So you end up doing it all again. I'm trying to coin a term for the inefficiency caused by chasing efficiency.
busterarm | 21 hours ago
Some things stuck out at me:
- They were all in their early 20s.
- They were all incredibly checked out. Honestly they still seem like an outlier to me decades later.
- They partied hard. Yes, with drugs.
- Most of them were in rotating intimate relationships with each other and unusually open about it. Office scuttlebutt was literally "who is fucking who this week".
- They seemed busy for maybe two or three weeks out of the entire year and then it was long stretches of Minesweeper/Solitaire.
I filed this away in my head as "provides no value" and that was decades ago. If the industry itself is worse off today I can't imagine how much worse it actually is from my experience.
jimnotgym | 18 hours ago
mrgoldenbrown | 21 hours ago
Penny wise, pound foolish? Measure twice cut once?
slater | 21 hours ago
"don't let the perfect be the enemy of the good" ?
bobnamob | 20 hours ago
BLKNSLVR | 16 hours ago
It's just that people are more aware / less trusting these days.
jimnotgym | 3 hours ago
galaxyLogic | 21 hours ago
I don't know but I would expect it to be relatively easy for an LLM to detect "hallucinations".
operatingthetan | 21 hours ago
I think this may be part of the problem. The actual humans creating the report don't have the expertise to know which one to trust. At least that was what consulting was like in my experience at a similar firm.
TZubiri | 21 hours ago
galaxyLogic | 20 hours ago
mindcrime | 21 hours ago
Yes, this technique and its variations[1][2] "work" but it's still not 100% perfect. And it's not as widely used as it might be because, among other reasons:
a. it takes longer to implement
b. it costs more (more tokens spread across multiple llm calls)
c. higher latency (getting an answer takes longer due to multiple llm calls involved)
d. the final answer is probabilistically more likely to be correct, but is still not guaranteed to be error free, so you can never fully escape the need for Human in the Loop.
[1]: https://en.wikipedia.org/wiki/LLM-as-a-Judge
[2]: https://github.com/karpathy/llm-council
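A minimal sketch of the voting idea, to make the trade-offs in (b) through (d) concrete. This is not how either linked project works internally; the judges here are plain stub functions standing in for separate LLM calls, and `council_verdict` and `min_agreement` are names made up for this example.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# A "judge" is any callable that takes a claim and returns a verdict string.
# In practice each one would wrap a call to a different model; stubs are
# used here so the example runs without network access.
Judge = Callable[[str], str]


def council_verdict(claim: str, judges: Sequence[Judge],
                    min_agreement: float = 1.0) -> Tuple[str, bool]:
    """Ask every judge, then report the majority verdict.

    Returns (verdict, needs_human). needs_human is True whenever the share
    of judges backing the majority verdict is below min_agreement.
    """
    if not judges:
        raise ValueError("at least one judge is required")
    # One call per judge: cost and latency grow with the size of the council.
    votes = Counter(judge(claim) for judge in judges)
    verdict, count = votes.most_common(1)[0]
    return verdict, (count / len(judges)) < min_agreement


# Stub judges (hypothetical behaviour, for illustration only).
def always_supported(claim: str) -> str:
    return "supported"


def flags_numbers(claim: str) -> str:
    return "unsupported" if any(ch.isdigit() for ch in claim) else "supported"


judges = [always_supported, always_supported, flags_numbers]
print(council_verdict("The report cites a 2019 survey", judges))
print(council_verdict("The report cites a survey", judges))
```

The first call has one dissenting judge, so it comes back flagged for human review; the second is unanimous. Even a unanimous verdict only means the judges agree, not that they are right, which is point (d).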
s0ulf3re | 18 hours ago
s0ulf3re | 18 hours ago
KajMagnus | 11 hours ago
Then, it doesn't matter if you add 1000 frontier models -- they still can't generate a good report.
But yes I suppose you can get rid of hallucinated citations though
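For the citation part specifically, a rough sketch of what an automated first pass could look like, assuming the citations are plain URLs in the text. `find_unresolvable_citations` and `http_resolves` are hypothetical helper names, and the URL pattern is deliberately simple.

```python
import re
import urllib.request
from typing import Callable, List

# Deliberately simple pattern: grabs http(s) URLs up to whitespace or a
# closing bracket/quote. Real documents will need something more careful.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def http_resolves(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to the URL succeeds (needs network)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # Covers HTTP errors, DNS failures, timeouts and malformed URLs.
        return False


def find_unresolvable_citations(text: str,
                                resolves: Callable[[str], bool] = http_resolves) -> List[str]:
    """List each distinct URL in the text that the resolver rejects."""
    urls = [match.rstrip(".,;:") for match in URL_PATTERN.findall(text)]
    unique_urls = list(dict.fromkeys(urls))  # de-duplicate, keep order
    return [url for url in unique_urls if not resolves(url)]


# Offline demo with a fake resolver; the example.org URLs are placeholders.
known = {"https://example.org/a"}
report = ("See https://example.org/a (and https://example.org/missing). "
          "Also https://example.org/a.")
print(find_unresolvable_citations(report, resolves=lambda url: url in known))
```

This only catches links that do not resolve at all. A link that resolves can still point at a page that does not say what the report claims, so it narrows the manual check rather than replacing it.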
gdulli | 20 hours ago
jonwinstanley | 21 hours ago
rao-v | 21 hours ago
Now nobody will remember or notice.
mentalgear | 21 hours ago
contingencies | 21 hours ago
Performative executives of yesteryear that constantly need external validation and direction and operate through hive mind and groupthink are weak and will die.
I believe some of the biggest problems in today's business leaders are an inability to be open to new information, to think across traditional professional boundaries, or to ask meaningful questions.
AI simply exposes this unapologetically.
Bad management (this includes most government): up your game or get out of the way.
Sycophantic consultant firms: die.
The Economist should do an article on this.
scotty79 | 21 hours ago
meibo | 21 hours ago
cwillu | 21 hours ago
_tk_ | 20 hours ago
addandsubtract | 19 hours ago
solomonxiexie | an hour ago
throwrioawfo | 21 hours ago
jiveturkey | 18 hours ago
As is noted in the conclusion.
le-mark | 21 hours ago
CuriousSkeptic | 7 hours ago
So a net positive then?
zb3 | 21 hours ago
zelphirkalt | 21 hours ago
But I guess since EY is a CYA hedge anyway, no one really cares about whether the reports are hallucinations or not. Someone high up spent money on EY, so that they can justify some decision and won't be held responsible that much, when it turns out the decision was shit. All that matters to them is, that it has the appearance of something genuine and then they can base the decision on what they receive from EY, which better be what they already wanted to hear/read anyway.
krapp | 20 hours ago
Even people who like their jobs work because they need money to live.
zelphirkalt | 19 hours ago
wg0 | 21 hours ago
~ A greedy, dishonest and unethical capitalist.
biosboiii | 20 hours ago
sourcecodeplz | 20 hours ago
rescripting | 20 hours ago
[OP] smartmic | 19 hours ago
atom058 | 3 hours ago
Unfortunately, they forgot to check the name beforehand, and it turned out that "EY!" already existed: a gay porn magazine.
Yes, this really happened.
0898 | 20 hours ago
I had no experience and knew absolutely zero about any of those sectors.
themafia | 20 hours ago
FearNotDaniel | 20 hours ago
themafia | 19 hours ago
henry2023 | 20 hours ago
FearNotDaniel | 20 hours ago
dragonfax | 19 hours ago
yieldcrv | 20 hours ago
okay that makes me feel better, I think January's frontier models and beyond are better at this
but check your sources folks
aneutron | 20 hours ago
s0rce | 20 hours ago
tipsytoad | 19 hours ago
tamimio | 19 hours ago
Anyone with above-average knowledge of a specific topic can tell when AI starts hallucinating and making things up, or at least introducing new problems through added complexity rather than solving the original one. That’s my observation from using all the top-tier ones too: it’s as if they are designed to solve a problem regardless, so they start making things up or piling on workarounds. A person with no deep knowledge of the topic will just copy it all and call it a day.
Just yesterday, I asked Claude 4.8 about something specific that I know the answer to. It gave a long list of solutions, none of which were close to the real answer. When I replied with the real answer and pushed back, I got the famous quote: “you are right, thanks for pushing back”.
dwa3592 | 18 hours ago
jiveturkey | 18 hours ago
Not to take away from the actually great reporting here, but what they mean is: this approach allows them to milk it for as many clicks as possible.
motohagiography | 17 hours ago
Slop signalling may be the new power play. Nothing quite says "FU" like low-effort AI hallucinations.
bleepblap | 17 hours ago
BLKNSLVR | 16 hours ago
Why would anyone trust these large contractor companies enough to pay them the huge amounts of money to have juniors learning the ropes on their dime?
"Customers" were the content the juniors were trained on in the same way that scraped internet data is what LLMs are trained on, except Customers were paying for the privilege of being 'scraped'.
Now there are no juniors, just LLMs being asked questions that aren't specific enough, and assuming the answer is one-shot correct.
It saves E&Y lots of money though, and their (confusing) reputation will provide a surprising amount of momentum such that plenty of work will keep rolling in for a few years to come.
emeril | 16 hours ago
ChoGGi | 15 hours ago
Alifatisk | 2 hours ago
Did they just prompt ChatGPT with no web search and copy-paste it?
solomonxiexie | an hour ago