> Next up, LLMs as actors & processes in π-calculus.
You jest, but agents are of course already useful and fairly formal primitives. Distinct from actors, agents can have things like goals/strategies. There's a whole body of research on multi-agent systems that already exists and is even implemented in some model-checkers. It's surprising how little interest that creates in most LLM / AI / ML enthusiasts, who don't seem that motivated to use the prior art to propose / study / implement topologies and interaction protocols for the new wave of "agentic".
Ten years ago at my old university we had a course called Multi-Agent Systems. The whole year built up to it: a course in Formal Logic with Prolog, Logic-Based AI (LBAI) with a robot in a block world, also with Prolog, and finally Multi-Agent Systems (MAS).
In the MAS course, we used GOAL, which was a system built on top of Prolog. Agents had Goals, Perceptions, Beliefs, and Actions. The whole thing was deterministic. (Network lag aside ;)
The actual project was that we programmed teams of bots for a Capture The Flag tournament in Unreal Tournament 3.
So it was the most fun possible way to learn the coolest possible thing.
The next year they threw out the whole curriculum and replaced it with Machine Learning.
--
The agentic stuff seems to be gradually reinventing a similar setup from first principles, especially as people want to actually use this stuff in serious ways, and we lean more in the direction of determinism.
The main missing feature in LLM land is reliability. (Well, that and cost and speed. Of course, "just have it be code" gives you all three for free ;)
Regardless of whether it's framed as old-school MAS or new-school agentic AI, it seems like an inherently multi-disciplinary area where it's good to be humble. You do see some research that's interested in leveraging the strengths of both (e.g. https://www.nature.com/articles/s41467-025-63804-5.pdf), but even if news of that kind of cross-pollination were more common, we should go further. Pleased to see TFA connecting agentic AI to Amdahl's law, for example, but we should be aggressively stealing formalisms from economics, game theory, and anywhere else we can get them. Somewhat related here is the CAMEL-AI mission and white papers: https://www.camel-ai.org/
I have an example from 2023, when Auto-GPT (think OpenClaw but with GPT-3.5 and early GPT-4 — yeah it wasn't great!) was blowing up.
Most people were just using it for the same task. "Research this stuff and summarize it for me."
I realized I could get the same result by just writing a script to do a Google search, scrape top 10 results and summarize them.
Except it runs in 10 seconds instead of 10 minutes. And it actually runs deterministically instead of getting side tracked and going in infinite loops and burning 100x as much money.
It was like 30 lines of Python. GPT wrote it for me.
My takeaway here was, LLMs are missing executive function. The ability to consistently execute a plan. But code runs deterministically every time. And - get this - code can call LLMs!
So if your LLM writes a program which does the task (possibly using LLMs), the task will complete the same way every time.
And most of the tasks people use LLMs for are very predictable, and fit in this category.
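To make the "code as the driver" idea concrete, here is a minimal sketch of that kind of script. It is not the original 30 lines: `search`, `fetch`, and `summarize` are hypothetical callables you would supply yourself (a search API wrapper, an HTTP fetcher, and an LLM call), and summarizing each page before a final summary is just one plausible shape for the plan.

```python
from typing import Callable, List


def research(
    query: str,
    search: Callable[[str], List[str]],   # hypothetical: query -> list of result URLs
    fetch: Callable[[str], str],          # hypothetical: URL -> page text
    summarize: Callable[[str], str],      # hypothetical: text -> summary (the LLM call)
    top_n: int = 10,
) -> str:
    """Fixed plan: search, fetch the top N results, summarize each, summarize the summaries."""
    urls = search(query)[:top_n]
    page_summaries = [summarize(fetch(url)) for url in urls]
    return summarize("\n\n".join(page_summaries))
```

The control flow never changes between runs; only the text coming back from `summarize` can vary, so there is no way for the script to get sidetracked or loop forever.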
People are now repeating the exact same Auto-GPT thing with OpenClaw. They're using the slow, non-deterministic thing as the driver.
It actually kinda works this time — it usually doesn't get stuck anymore, if you use a good model — but they're still burning a hundred times more money than necessary.
i can't wait for the world to catch up to process, session, et al. calculi. the closest i’ve seen is all this “choreo” stuff that is floating around nowadays, which is pretty neat in itself.
Once you run more than one agent in a loop, you inevitably recreate distributed systems problems: message ordering, retries, partial failure, etc.
Most agent frameworks pretend these don’t exist. Some of them address those problems partially. None of the frameworks I've seen address all of them.
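For anyone who hasn't hit these yet, here is a rough sketch of what handling two of them looks like by hand. It is not taken from any agent framework; `Inbox` and `call_with_retry` are made-up names. The inbox uses sequence numbers to put messages back in order and to drop duplicates caused by retries, and the retry helper gives up explicitly so the caller can deal with partial failure.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Inbox:
    """Releases messages in sequence order and ignores duplicate sequence numbers."""
    next_seq: int = 0
    pending: Dict[int, str] = field(default_factory=dict)
    delivered: List[str] = field(default_factory=list)

    def receive(self, seq: int, payload: str) -> None:
        if seq < self.next_seq or seq in self.pending:
            return  # duplicate (e.g. the sender retried): ignore it
        self.pending[seq] = payload
        # Release every message that is now contiguous with what was delivered.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


def call_with_retry(fn: Callable[[], str], attempts: int = 3, base_delay: float = 0.0) -> str:
    """Retry with exponential backoff; raise once the attempts are used up."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error
```

Retries are only safe here because the receiver deduplicates; without that, a retried message would be applied twice.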
The current fad for "agent swarms" or "model teams" seems misguided, although it definitely makes for great paper fodder (especially if you combine it with distributed systems!) and gets the VCs hot.
An LLM running one query at a time can already generate a huge amount of text in a few hours, and drain your bank account too.
A "different agent" is just different context supplied in the query to the LLM. There is nothing more than that. Maybe some of them use a different model, but again, this is just a setting in OpenRouter or whatever.
Agent parallelism just doesn't seem necessary and makes everything harder. Not an expert though, tell me where I'm wrong.
LLMs mostly do useful work by writing stories about AI assistants who issue various commands and reply to a user's prompts. These do work, but they are fundamentally like a screenplay that the LLM is continuing.
An "agent" is a great abstraction since the LLM is used to continuing stories about characters going through narrative arcs. The type of work that would be assigned to a particular agent can also keep its context clean and distraction-free.
So parallelism could be useful even if everything runs completely sequentially, as a way to study how these separate characters and narrative arcs intersect, much like real characters acting independently and simultaneously, which is what LLMs are good at writing about.
Seems like the important thing would be to avoid getting caught up on actual "wall time" parallelism.
> Agent parallelism just doesn't seem necessary and makes everything harder. Not an expert though, tell me where I'm wrong.
I use parallel agents for speed or when my single agent process loses focus due to too much context. I determine context problems by looking at the traces for complaints like "this is too complicated so I'll just do the first part" or "there are too many problems, I'll display the top 5".
If you're trying a "model swarm" to improve reliability beyond 95% or so, you need to start hoisting logic into Python scripts.
It's hard to speed up running a single prompt through a model because it's a sequential memory-bandwidth limited process (you roughly need to cycle through all of the weights in the model to get the next token, before starting again, but a GPU can do many more operations than a single weight application in between memory fetches). So it's a lot more efficient with current hardware to run multiple prompts in parallel on the same weights.
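A back-of-envelope version of that argument, as a sketch. The numbers are assumptions for illustration (14 GB of weights, roughly a 7B-parameter model at 2 bytes per parameter, and 1 TB/s of memory bandwidth), and the model deliberately ignores KV-cache traffic and compute limits, so treat the results as upper bounds.

```python
def decode_tokens_per_second(
    weight_bytes: float,
    bandwidth_bytes_per_s: float,
    batch_size: int = 1,
) -> float:
    """Upper bound for bandwidth-bound decoding.

    Each decoding step reads all weights once and produces one token
    for every sequence in the batch.
    """
    steps_per_second = bandwidth_bytes_per_s / weight_bytes
    return steps_per_second * batch_size


# Assumed hardware and model size, for illustration only.
single = decode_tokens_per_second(14e9, 1e12)                 # roughly 71 tokens/s
batched = decode_tokens_per_second(14e9, 1e12, batch_size=8)  # roughly 571 tokens/s in total
```

Each individual prompt still gets about the same tokens per second; the gain is in total throughput, because the same weight reads are shared across the batch.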
Also, the limiting factor on a single instance of an agent is generally how much of its context window gets filled up, as opposed to how much time it has to 'think'. Generally the model performance decreases as the context grows (more or less getting dumber the more it has to think about), so agent frameworks try to mitigate this by summarizing the work of one instance and passing it into another instance with a fresh context. This means if you have five tasks that are all going to fill up a model's addressable context, there's no real benefit to running them sequentially unless they naturally feed into each other.
You can increase LLM inference throughput by using larger batch sizes, but that scales non-linearly in practice. It probably isn't worth it unless your model provider makes it really easy.
Where we've had some success is with heterogeneous agents with some cheap quantised/local models performing certain tasks extremely cheaply that are then overseen or managed by a more expensive model.
I've played with this type of thing and I couldn't justify it vs just using a premium model, which seems more direct and error-proof. Cheap models in my experience could really consume tokens and generate cost.
I also really appreciate the point about using LLM teams for fault-tolerance protocols in the future (in addition to improving efficiency). Since agents tend to hallucinate and fail unpredictably, coordinating several of them to verify each other and reach a consensus could reduce those errors.
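The simplest form of this is plain majority voting over independent answers. A minimal sketch, with `majority_answer` and its `quorum` parameter being made-up names rather than anything from the paper:

```python
from collections import Counter
from typing import List, Optional


def majority_answer(answers: List[str], quorum: float = 0.5) -> Optional[str]:
    """Return the most common answer if its share exceeds the quorum, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) > quorum else None
```

Returning `None` when there is no clear majority lets the caller escalate instead of guessing. Voting only helps to the extent that the agents' errors are independent; several instances of the same model given the same prompt may well fail the same way.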
> A "different agent" is just different context supplied in the query to the LLM. There is nothing more than that.
Yup, but the context includes the prompt, which can strongly control LLM behavior. Sometimes the harness restricts some operations to help the LLM stay in its lane. And starting with a fresh context and a clear description of the thing it should work on is great.
People get angry when their 200k or million token context gets filled. I can't ever understand why. Keeping that amount of info in working memory just can't work well, for any mind. Divide and conquer, don't pile up all the crap till it overflows.
An agent is a way of performing an action that will generate context or a useful side effect without having to worry about the intermediate context.
People already do this serially by having a model write a plan, clearing the context, then having the same or a cheaper model action the plan. Doing so discards the intermediate context.
Sub-agents just let you do this in parallel. This works best when you have a task that needs to be done multiple times that cannot be done deterministically. For example, applying the same helper class usage in multiple places across a codebase, finding something out about multiple parts of the codebase, or testing a hypothesis in multiple places across a codebase.
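A sketch of that fan-out, assuming a hypothetical `run_agent(prompt) -> str` that starts a fresh-context model call and returns only its final answer:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def fan_out(
    task: str,
    targets: List[str],
    run_agent: Callable[[str], str],  # hypothetical: one fresh-context agent run
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run the same task against each target in parallel and keep only the final results."""
    prompts = [f"{task}\n\nTarget: {target}" for target in targets]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        results = list(pool.map(run_agent, prompts))
    return dict(zip(targets, results))
```

The parent only ever sees the returned strings, so each sub-agent's intermediate context is discarded, just as in the serial plan-then-execute flow. Threads are enough here because the work is waiting on network calls.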
We should be thinking of agents like Celery / Sidekiq tasks, not employees.
Agents are extremely easy to spawn, extremely easy to dismiss, beautifully scale down to 0, take up no resources when inactive, and there's usually no limit on how many you can have at once.
You should have exactly as many agents running as the situation at hand requires, no more and no less.
This is how we design at HewesNguyen AI. We are both MIS, so once LLMs came out we were like: sweet, whole teams that can each be tasked with one thing done well. Thank you, Unix philosophy.
Struggling to find anything interesting or non-obvious about this article. You give a bunch of LLMs various parallelizable tasks, and some models manage to do them well but others don't. No insights as to why. As someone with a distributed systems background, the supposed 'insights' from distributed computing are almost trivial.
I think that's kind of the point. Many people deploying these teams don't have a strong systems background, and they're empirically documenting where teams break down / making decisions inspired by human orgs rather than first principles. If people begin from a basic intuition for systems thinking, the tradeoffs should become obvious.
Everyone wants to be the CEO of their own megacorp managing thousands of AI engineers I guess. Just like microservices, there’s probably a ton of overhead doing things this way vs monolithic / single agent. Certain types of engineers just love over-engineering hugely complex stuff to see it work. Rube Goldberg architecture was already prevalent and bad enough in enterprise before the AI boom.
I find depth to be far more interesting than breadth with these models.
Descending into a problem space recursively won't necessarily find the best solution, but it's going to tend to find some solution faster than going wide across a swarm of agents. Theoretically it's exponentially faster to have one symbolically recursive agent than to have any number of parallel agents.
I think agent swarm stuff sucks for complex multi-step problems because it's mostly a form of BFS. It never actually gets to a good solution because it's searching too wide and no one can afford to wait for it to strip mine down to something valuable.
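A toy illustration of the depth-vs-breadth point, under a big assumption: a uniform tree where the solution sits at the end of the first branch tried. That is the best case for depth-first search; in the worst case it also walks the whole tree. `nodes_expanded` is a made-up helper, and nodes are represented as paths of child indices.

```python
from collections import deque
from typing import Tuple

Node = Tuple[int, ...]  # path of child indices from the root


def nodes_expanded(branching: int, depth: int, goal: Node, depth_first: bool) -> int:
    """Count how many nodes are visited before the goal is reached."""
    frontier = deque([()])  # start at the root (empty path)
    expanded = 0
    while frontier:
        # DFS takes the newest node (stack); BFS takes the oldest (queue).
        node = frontier.pop() if depth_first else frontier.popleft()
        expanded += 1
        if node == goal:
            return expanded
        if len(node) < depth:
            children = [node + (i,) for i in range(branching)]
            # Reverse for DFS so that child 0 is popped first.
            frontier.extend(reversed(children) if depth_first else children)
    return expanded


goal = (0, 0, 0, 0)  # leftmost leaf of a depth-4 tree
dfs = nodes_expanded(3, 4, goal, depth_first=True)   # 5 nodes
bfs = nodes_expanded(3, 4, goal, depth_first=False)  # 41 nodes
```

In this setup the depth-first count grows linearly with depth while the breadth-first count grows with `branching ** depth`, which is one way to read the "exponentially faster" claim.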
We’ve been building exactly this as an open-source ecosystem at consensus-tools. It’s a governance layer for multi-agent systems with a runtime wrapper that intercepts agent decisions before they execute: .consensus(fn, opts).
The coordination and consistency problems the paper describes are what the monorepo is designed around. Giving agents auditable stake in decisions. Happy to share more if anyone’s working in this space.
Apart from rediscovering all the problems with distributed systems, I think LM teams will also rediscover their own version of the mythical man-month, and very quickly too.
There were 3 core insights: adding people to a late project makes it later, communication cost grows as n^2, and time isn't fungible.
For agents, maybe the first insight won't hold, and adding a new agent won't necessarily increase dev time, but the second will be worse: communication cost will grow faster than n^2 because of LLM drift and orchestration overhead.
The third doesn't translate cleanly, but I'll try: time isn't fungible for us, and assumptions and context, however fragmented, aren't fungible for agents in a team. If they hallucinate at the wrong time, even a little, it could be the equivalent of a human developer doing a side project during company time.
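The n^2 figure comes from counting pairwise communication channels in a fully connected team, n(n-1)/2. A quick sketch of how fast that grows (the function name is made up):

```python
def pairwise_channels(n: int) -> int:
    """Number of distinct pairs among n team members: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (2, 5, 10, 50):
    print(n, pairwise_channels(n))
# 2 1
# 5 10
# 10 45
# 50 1225
```

This is only the baseline; any extra cost from drift or orchestration would come on top of it.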
An agent should write an article on it and post it on moltbook: "The Inevitable Agent Drift"
measurablefunc | 20 hours ago
timcobb | 19 hours ago
measurablefunc | 19 hours ago
robot-wrangler | 18 hours ago
measurablefunc | 18 hours ago
antonvs | 17 hours ago
measurablefunc | 17 hours ago
andai | 18 hours ago
robot-wrangler | 17 hours ago
andai | 16 hours ago
troelsSteegin | 13 hours ago
Homepage - https://goalapl.atlassian.net/wiki/spaces/GOAL/overview
Wikipedia - https://en.wikipedia.org/wiki/GOAL_agent_programming_languag...
Programming Guide - https://goalapl.dev/GOALProgrammingGuide.pdf
Game case study - https://multiagentcontest.org/publications/AppliedGOAL.pdf
charcircuit | 18 hours ago
keeganpoppen | 18 hours ago
50lo | 18 hours ago
woah | 17 hours ago
nateroling | 17 hours ago
conception | 17 hours ago
woah | 17 hours ago
jjmarr | 15 hours ago
bee_rider | 15 hours ago
rcxdude | 13 hours ago
jjmarr | 10 hours ago
rando1234 | 15 hours ago
timcobb | 9 hours ago
htrp | 15 hours ago
goldretriever | 14 hours ago
scotty79 | 13 hours ago
Leynos | 9 hours ago
miki123211 | 2 hours ago
bhewes | 17 hours ago
ElijahLynn | 16 hours ago
tomhow | 11 hours ago
rando1234 | 15 hours ago
goldretriever | 15 hours ago
seanp2k2 | 15 hours ago
bob1029 | 14 hours ago
babblingfish | 13 hours ago
kaicianflone | 10 hours ago
timcobb | 9 hours ago
causalityltd | 10 hours ago