That's why LLMs will eventually be used only for the initial interaction with the user in their own language, preparing the data for a specialized model.
Huh? Images are tokenized the same way language is, and the whole thing is fed into one single model, not multiple smaller expert models.
The experts in MoEs aren't specialized in any meaningful task sense. At the level of what we would think of as tasks, experts are selected essentially arbitrarily, per token and per block.

It's unsupervised, yes, but "not specialized in any meaningful task sense" is incorrect; specialization is the whole point. It's just not specialization in the sense of "this is a legal expert, this is a software developer".
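A toy sketch of the per-token routing being discussed (the gate weights, sizes, and top-k value are all made up for illustration; real routers are learned jointly with the experts):

```python
# Toy MoE router: a gate scores every expert for each token, and only the
# top-k experts run. Routing is independent per token, so there is no
# task-level assignment like "legal expert" or "code expert".
import math
import random

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def route(token_vec, gate_weights, k=2):
    """Score every expert for this one token; return the top-k expert indices."""
    scores = [sum(w * x for w, x in zip(row, token_vec)) for row in gate_weights]
    probs = softmax(scores)
    return sorted(range(len(probs)), key=lambda i: -probs[i])[:k]

random.seed(0)
n_experts, dim = 8, 4
gate = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(n_experts)]

# Nearby tokens can hit completely different experts.
for tok in ([1, 0, 0, 0], [0, 1, 0, 0], [0.5, 0.5, 0, 0]):
    print(route(tok, gate))
```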
Wouldn't this be faster with an agent skill that has code?
You could read about that in Vernor Vinge's 1992 novel "A Fire Upon the Deep". There is prompt injection in communication: in the book, certain communication protocols cannot be deterministic, so if the other party is too smart, you get hacked.
Now do the equivalent of just-in-time compilation: Claude sees that we need to respond to a lot of pings and writes a program to compute the responses instead of thinking about each one.
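That idea can be sketched directly (the ping format and the generated program are hypothetical; the point is generating code once instead of "thinking about" each request):

```python
# "JIT" sketch: detect the repetitive pattern, emit a small program, and
# reuse it for every ping instead of handling each one individually.

def handle_individually(pings):
    # The slow path: reason about each ping separately.
    return [f"pong {p}" for p in pings]

def compile_responder():
    # The fast path: generate source code once, then reuse the compiled result.
    src = "def respond(p):\n    return 'pong ' + p\n"
    namespace = {}
    exec(src, namespace)  # "compile" the generated program into a function
    return namespace["respond"]

respond = compile_responder()
pings = [f"ping-{i}" for i in range(1000)]
assert [respond(p) for p in pings] == handle_individually(pings)
print("1000 pings answered by generated code")
```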
Oh, they are. It's just that the harness around the model picks up the commands it "autocompletes" and runs them for you. An LLM can't run anything; it never could.
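A bare-bones sketch of such a harness (the model stub and the ```bash fence convention are hypothetical, not any particular product's protocol): the model only ever emits text, and the surrounding program is what actually executes anything.

```python
# Tool-use harness sketch: scan the model's text output for a command
# block and run it on the model's behalf.
import re
import subprocess

def fake_model(prompt):
    # Stand-in for the LLM: it can only produce text, never run anything.
    return "Sure, checking that for you:\n```bash\necho pong\n```"

def harness(prompt):
    completion = fake_model(prompt)
    match = re.search(r"```bash\n(.*?)\n```", completion, re.S)
    if not match:
        return completion
    # The harness, not the model, actually runs the command.
    result = subprocess.run(match.group(1), shell=True,
                            capture_output=True, text=True)
    return result.stdout.strip()

print(harness("ping"))  # pong
```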
ValdikSS | 8 hours ago
Imagine face recognition working like a text chat, where the PC gets the frame from the camera and writes in the chat: "Who's that? Here's the RGB888 image in hex: ...".
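Taken literally, that chat message for a made-up 2x2 RGB888 frame (3 bytes per pixel, hex-encoded) would look like:

```python
# A tiny, made-up 2x2 frame: red, green, blue, white pixels, 3 bytes each.
frame = bytes([255, 0, 0,    0, 255, 0,
               0, 0, 255,    255, 255, 255])
message = f"Who's that? Here's the RGB888 image in hex: {frame.hex()}"
print(message)
```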
FeepingCreature | 4 hours ago
stingraycharles | 4 hours ago
Image gets rasterized into smaller pieces (e.g. 4x4 pixels) and each of those is assigned a token, similar to how text is broken up into tokens. And the whole thing is fed into a single model.
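A minimal sketch of that patch-tokenization idea (the patch size and the plain flattening are illustrative; real vision transformers learn an embedding projection per patch):

```python
# Cut an H x W image into patch x patch tiles; each flattened tile plays
# the same role in the sequence as a text token's embedding input.

def patchify(image, patch=4):
    """Split an image (list of rows of pixel values) into patch x patch
    tiles, each flattened into one 'token' vector."""
    h, w = len(image), len(image[0])
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tile = [image[top + dy][left + dx]
                    for dy in range(patch)
                    for dx in range(patch)]
            tokens.append(tile)
    return tokens

# An 8x8 grayscale image becomes a sequence of 4 patch-tokens, 16 values each.
img = [[row * 8 + col for col in range(8)] for row in range(8)]
tokens = patchify(img, patch=4)
print(len(tokens), len(tokens[0]))  # 4 16
```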
FeepingCreature | 3 hours ago
> Imagine face recognition working like a text chat, where the PC gets the frame from the camera and writes in the chat: "Who's that? Here's the RGB888 image in hex: ...".
that's p much how it works.
stingraycharles | 2 hours ago
Dylan16807 | 55 minutes ago
wongarsu | an hour ago
Vision language models are an incredible achievement in generality and usability, but they pay a hefty price in fidelity and speed.
stingraycharles | 4 hours ago
jampekka | 4 hours ago
stingraycharles | 4 hours ago
brcmthrowaway | 8 hours ago
westurner | 8 hours ago
/skill-creator [or /create-skill] Write an agent skill with code script(s) that use an existing user space IP library that works with your agent runtime, to [...]
ComposioHQ/awesome-claude-skills: https://github.com/ComposioHQ/awesome-claude-skills
anthropics/skills//skill-creator/SKILL.md: https://github.com/anthropics/skills/blob/main/skills/skill-...
/.agents/skills/skill-name/SKILL.md, scripts/{script_name.py,__init__.py}
https://agentskills.io/what-are-skills
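A skill along those lines is just a directory with a SKILL.md manifest plus scripts, matching the layout above; a minimal hypothetical example (all names and the respond.py contents are invented) could be generated like:

```python
# Create a minimal agent-skill directory layout:
#   .agents/skills/ping-responder/SKILL.md
#   .agents/skills/ping-responder/scripts/respond.py
import os

root = ".agents/skills/ping-responder"
os.makedirs(os.path.join(root, "scripts"), exist_ok=True)

skill_md = """---
name: ping-responder
description: Answer ping requests by running a script instead of reasoning token-by-token.
---
Run scripts/respond.py with the incoming request on stdin.
"""
with open(os.path.join(root, "SKILL.md"), "w") as f:
    f.write(skill_md)

# scripts/respond.py is what the agent runtime would execute for this skill.
with open(os.path.join(root, "scripts", "respond.py"), "w") as f:
    f.write("import sys\nprint('pong ' + sys.stdin.read().strip())\n")

print(sorted(os.listdir(root)))  # ['SKILL.md', 'scripts']
```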
trollbridge | 7 hours ago
Even faster would be to just use code in the first place!
jeremyjh | 6 hours ago
codezero | 6 hours ago
vrighter | 5 hours ago
pastage | 4 hours ago
lionkor | 2 hours ago
fouc | 5 hours ago
twoodfin | 4 hours ago
1,000 pings, how many correctly ponged?
ShinyLeftPad | 3 hours ago
bot403 | 3 hours ago
mintflow | an hour ago
fl7305 | 53 minutes ago
ForHackernews | an hour ago
I think this author and I have different definitions of fun.
fl7305 | 52 minutes ago
Because this seems to disprove that claim pretty convincingly?
mystifyingpoi | 19 minutes ago
self_awareness | 24 minutes ago
If you wonder why your Copilot subscription has new limits that you hit every few days, it's because of PhDs like Adam.
Could Adam use a local model hosted on his own box? Probably yes. But he preferred to waste the service we all use just to produce a weak blog post that introduces absolutely no knowledge and serves no other purpose than to tell everyone that the author likes to waste resources and calls it "fun".
> Ridiculous? Yes. Wasteful of tokens? Sure. Fun? Oh yeah!
Do you really think it's fun to be one of these people who are the reason why the rest of us get more limits?
mystifyingpoi | 21 minutes ago