> Why is it insecure? Well, Pi tells you: "No permission popups."
Pi supports permission popups, but doesn't use them by default. Their example extensions show how to do it (add an event listener for `tool_call` events; to block the call put `block: true` in its result).
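As a sketch (the `tool_call` event name and the `block: true` result come from pi's docs; the handler signature and event shape here are assumptions, not pi's actual API), a blocking listener might look like:

```typescript
// Hypothetical sketch of a pi-style tool_call gate. The event fields
// below are assumed for illustration; only the block/reason result
// shape follows what pi's example extensions describe.

interface ToolCallEvent {
  toolName: string;
  input: Record<string, unknown>;
}

// Commands we refuse to run without confirmation.
const DENYLIST = [/\brm\s+-rf\b/, /\bcurl\b.*\|\s*(sh|bash)\b/];

function onToolCall(event: ToolCallEvent): { block: boolean; reason?: string } {
  if (event.toolName === "bash") {
    const cmd = String(event.input["command"] ?? "");
    for (const pattern of DENYLIST) {
      if (pattern.test(cmd)) {
        return { block: true, reason: `refused: matched ${pattern}` };
      }
    }
  }
  return { block: false };
}
```

Instead of returning `block: true` outright, a real extension could surface a confirmation prompt and block only on rejection.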
> there's no secure way to know whether what they're asking to do is what they'll actually do
What do you mean? `tool_call` event listeners are given the parameters of the tool call; so e.g. a call to the `bash` tool will show the exact command that will execute (unless we block it, of course).
Has anyone used an open coding agent in headless mode? I have a system cobbled together with exceptions going to a centralized service, where I can pull each one out and `claude -p` it, but I'd rather integrate an open coding agent into the loop because it's less janky; I'd then have it try to fix the problem and propose a PR for me to review. If anyone else has used pi.dev or opencode or aider in this mode (completely non-interactive until the PR), I'd be curious to hear.
EDIT: Thank you to both responders. I'll just try the two options out then.
pi has an RPC mode which just sends/receives JSON lines over stdio (including progress updates, and "UI" things like asking for confirmation, if it's configured for that).
That's how the pi-coding-agent Emacs package interacts with pi; and it's how I write automated tests for my own pi extensions (along with a dummy LLM that emits canned responses).
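The framing itself is trivial, which is part of the appeal. A sketch of generic JSON-lines-over-stdio handling (the message fields like "type" and "id" are illustrative, not pi's actual RPC schema):

```typescript
// Generic JSON-lines framing of the kind pi's RPC mode uses: one JSON
// object per line over stdio. The field names here are made up for
// illustration; consult pi's docs for the real message schema.

type RpcMessage = { type: string; id: number; [key: string]: unknown };

function encodeLine(msg: RpcMessage): string {
  return JSON.stringify(msg) + "\n";
}

// Stdio reads are not line-aligned, so accumulate chunks and yield
// complete messages as full lines arrive.
function makeDecoder() {
  let buffer = "";
  return (chunk: string): RpcMessage[] => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line
    return lines.filter((l) => l.length > 0).map((l) => JSON.parse(l));
  };
}
```

The same decoder works for a test harness driving the agent with a dummy LLM: write lines to the child's stdin, decode its stdout, assert on the messages.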
Aider's `--yes` flag combined with a git-based loop works honestly better than I expected for this, like it'll just commit and you review the diff.
Pi I've tried headless and it's fine but you kinda have to wire up the exit conditions yourself since it's so minimal by design.
Fwiw the janky `claude -p` approach you described is actually pretty solid once you stop fighting it, the simplicity is the feature I think.
I've been using Pi day to day recently for simple, smaller tasks. It's a great harness for use with smaller parameter size models given the system prompt is quite a bit shorter vs Claude or Codex (and it uses a nice small set of tools by default).
For local models I've been trying it with GLM-4.7-Flash and the new LFM2 24B model. I'm excited to try it with the new Qwen3.5 models that came out today as well.
That's interesting; I've found Pi really shines for rapid prototyping. Balancing minimalism and functionality is tricky, but it sounds like they're nailing it with these constraints.
My current fave harness. I've been using it to great effect, since it is self-extensible, and added support for it to https://github.com/rcarmo/vibes because it is so much faster than ACP.
The better web UI is now part of https://github.com/rcarmo/piclaw (which is essentially the same, but with more polish and a claw-like memory system). So you can pick if you want TS or Python as the back-end :)
The claw version’s web UI essentially has better thinking output, more visibility of tool calls, and slightly better SSE streaming. I’ve backported some of it to vibes, but if you want to borrow UI stuff, the better bits are in piclaw. I use both constantly on my phone/desktop.
No, literally. Mistral, Gemini, opencode, everything supported by Toad, etc. I’ve tried them all. I just don’t like using either Claude Code or Codex, so I didn’t add them to agentbox and stuck with Copilot because it gives me both OpenAI and Anthropic models.
Ok, maybe we need to establish what "literally" means before we try to figure out "all of them" it seems...
I was curious about your project, but the sloppy usage of even the most basic terms kind of makes me not want to dive deeper. How could I even trust it does what it says on the tin if apparently we don't even have a shared vocabulary?
I think the thesis of Pi is that there isn't much special about agents.
Model + prompt + function calls.
There are many such wrappers, and they differ largely on UI deployment/integration. Harness feels like a decent term, though "coding harness" feels a bit vague.
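To make that concrete: the whole loop can be written in miniature, with a stub standing in for the model API (all names here are illustrative):

```typescript
// The "nothing special" agent loop spelled out: model + prompt +
// function calls, repeated until the model stops asking for tools.
// The Model type is a stand-in for a real LLM API call.

type ModelReply =
  | { kind: "text"; text: string }
  | { kind: "call"; tool: string; args: string };

type Model = (transcript: string[]) => ModelReply;

// The tool registry: plain functions the model can invoke.
const tools: Record<string, (args: string) => string> = {
  echo: (args) => `echoed: ${args}`,
};

function runAgent(model: Model, prompt: string, maxSteps = 10): string {
  const transcript = [prompt];
  for (let i = 0; i < maxSteps; i++) {
    const reply = model(transcript);
    if (reply.kind === "text") return reply.text; // model is done
    const result = tools[reply.tool]?.(reply.args) ?? "unknown tool";
    transcript.push(`tool ${reply.tool} -> ${result}`); // feed result back
  }
  return "step limit reached";
}
```

Everything else a harness adds (UI, permissions, sessions) is layered around this loop.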
Can you shed some light on the speed difference of the direct integration vs. ACP?
I’m still looking for a generic agent interaction protocol (to make it worth building around) and thought ACP might be it. But (and this is from a cursory look) it seems that even OpenCode, which does support ACP, doesn’t use it for its own UI. So what’s wrong with it and are there better options to hopefully take its place?
Yeah, ACP adds another layer of marshaling/unmarshaling (or two, one on each side) and can be slower than direct API calls on occasion. Like MCP, it adds JSON overhead that doesn't really need to be there.
The best option will always be in-memory exchanges. Right now I am still using the pi RPC, and that also involves a bit of conversion, but it’s much lighter.
I've used ACP extensively because agent-shell in emacs uses it, although the Anthropic license change means I'm not sure if I can continue to use Claude through it without getting banned. I kind of wish it integrated more tightly but also you can't really expect someone to have magit involved such that agent-shell (or the like) starts interacting with emacs directly. I'd love it if it did though.
I've started using OpenCode for some things in a big window because its side-by-side diff is great.
This looks great, but it feels really risky to add more and more tools to the harness from random repos. Nothing against this repo in particular, but I wish we had better security and isolation, so that I knew nothing could go wrong and I could just test a bunch of these every day, the same way I can install an app on my phone and feel confident it's not going to steal my data.
I test a bunch of these every day too, so I made a local sandbox to jail all TUI clunkers to $CWD and run all of them in `--yolo` mode: https://agent-safehouse.dev/
I feel like this misses the point of pi somewhat. The allure of pi is that it allows you to start from scratch and make it entirely your own; that it’s lightweight and uses only what you need. I go through the list of features in this and I think, okay, cool, but why should I use this over OpenCode if I just want a feature-packed (and honestly -bloated) ready-made harness?
A harness is a collection of stubs and drivers configured to assist with automation or testing. It's a standard term often used in QA, where they'd been automating things for ages before GenAI came onto the scene.
Yes. It seems to be the term that stands out the most, as terms like "AI coding assistant", "agentic coding framework", etc. are too vague to really differentiate these tools.
"harness" fits pretty nicely IMO. It can be used as a single word, and it's not too semantically overloaded to be useful in this context.
Honestly, I'm not interested in this if it can't use my subscription, but now I really want to understand this idea of a coding harness. I've been exploring ideas that might be quite similar, though more in line with the scope of an IDE, and it sounds like "coding harness" fits my mental model better.
I'm not interested in having it code for me, as I still like writing code; I'm interested in that idea of task delegation, e.g. "research this topic" or "do this". Having a bunch of agents doing things, that could be cool.
For me, I'm looking to stick with Python, so I'll whip something up with Tkinter later for the desktop GUI aspect, although I still like Electron/JS primarily.
I do this with an extension. I run all bash tools with bwrap and ACLs for the write and edit tools. Serves my purposes. Opens up access to other required directories, at least for git and rust.
I think I published it. Check the pi package page.
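For anyone curious, the core of such a setup boils down to constructing a bwrap invocation: system directories read-only, project directory writable. The flags below are standard bubblewrap options; how this gets wired into pi's bash tool is up to your extension and not shown here.

```typescript
// Sketch: build a bubblewrap (bwrap) argv for a jailed bash tool.
// Read-only binds for system dirs, a writable bind for the project.
// Wiring this into a harness's bash tool is left to the extension.

function bwrapArgs(projectDir: string, command: string): string[] {
  return [
    "bwrap",
    "--ro-bind", "/usr", "/usr",      // binaries and libraries
    "--ro-bind", "/etc", "/etc",      // resolv.conf, SSL certs, etc.
    "--symlink", "usr/bin", "/bin",   // merged-/usr layout
    "--symlink", "usr/lib", "/lib",
    "--proc", "/proc",
    "--dev", "/dev",
    "--bind", projectDir, projectDir, // only the project is writable
    "--chdir", projectDir,
    "--unshare-pid",
    "bash", "-c", command,
  ];
}
```

Hand the resulting argv to your process spawner of choice; anything the agent writes outside the project dir fails at the kernel level, not at a "please don't" level.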
Yeah I wrote a small landlock wrapper using go-landlock to sandbox pi that works well (not public, similar projects are landrun and nono).
Note that if you sandbox to literally just the working directory, pi itself won't run, since pretty much every Linux application needs to be able to read from /usr and /etc.
I got pi to write me a very basic sandbox based on an example from the pi github. Added hooks for read/write/edit/bash, some prompts to temp/perm override. Have a look, copy-paste what you like.
The way you’re able to extend the harness through extension/hook architecture is really cool.
E.g. some form of comprehensive planning/spec workflow is best modeled as an extension vs. natively built in. And the extension still ends up feeling "native" in use.
Pi ships with powerful defaults but skips features like sub-agents and plan mode
Does anyone have an idea as to why this would be a feature? Don't you want to have a discussion with your agent to iron out the details before moving on to the implementation (build) phase?
In any case, looks cool :)
EDIT 1: Formatting
EDIT 2: Thanks everyone for your input. I was not aware of the extensibility model that pi had in mind or that you can also iterate your plan on a PLAN.md file. Very interesting approach. I'll have a look and give it a go.
Agreed. I rarely find the guardrails of plan to be necessary; I basically never use it on opencode. I have some custom commands I use to ask for plan making, discussion.
As for subagents, Pi has sessions. And it has a full session tree & forking. This is one of my favorite things, in all harnesses: build the thing with half the context, then keep using that as a checkpoint, doing new work, from that same branch point. It means still having a very usable lengthy context window but having good fundamental project knowledge loaded.
Yes! I just don't understand that as well. Up until some time ago, Claude Code's preferred install was an `npm i`, wasn't it? Serious answers, please, for why anyone would use a web language for a terminal app.
It’s straightforward: JavaScript is a dynamic language, which allows code (for instance, code implementing an extension to the harness) to be executed and loaded while the harness is running.
This is quite nice — I do think there's a version of pi's design choices that could live in a static harness, but fully covering the same capabilities as pi without a dynamic language would be difficult. (You could imagine specifying a programmable UI, etc. — various ways to extend the behavior of the system — and you'd likely end up with an interpreter in the harness.)
At least, you’d like to have a way to hot reload code (Elixir / Erlang could be interesting)
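The property being described, loading new behavior while the process runs, can be shown in a few lines. This is a deliberately unsandboxed illustration using runtime `Function` construction; real pi extensions are modules, but the load-while-running point is the same.

```typescript
// Why a dynamic language helps a harness: extension code can arrive as
// data and become behavior at runtime, no restart or recompile needed.
// (Illustrative only; evaluating untrusted source like this is unsafe.)

type Tool = (input: string) => string;

function loadTool(source: string): Tool {
  // Evaluate the source to a function value at runtime.
  return new Function("input", source) as Tool;
}

// "Hot reload": replacing a registry entry swaps behavior in a live
// process, which is exactly what a static binary can't easily do.
const registry = new Map<string, Tool>();
registry.set("shout", loadTool("return input.toUpperCase();"));
registry.set("shout", loadTool("return input.toUpperCase() + '!';")); // reloaded
```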
Sure, but why end up implementing a novel language with said feature if your concern is the harness, not a brand new language?
I'm super on board the rust train right now & super loving it. But no, code hot loading is not common.
Most code in the world is dead code. Most languages are for dead code. It's sad. Stop writing dead code (2022) was nowhere near the first, and decades and decades late in calling this out, but still a good one. https://jackrusher.com/strange-loop-2022/
I built my own harness on Elixir/Erlang[0]. It's very nice, but I see why TypeScript is a popular choice.
There's no serialization/JSON-RPC layer between a TS CLI and an Elixir server. TS TUI libraries and utilities are really nice (I rewrote the Elixir-based CLI prototype, as it was slowing me down). It's easy to extend with custom tools without having to write them in Elixir, which can be intimidating.
But you're right that Erlang's computing vision lends itself super well to this problem space.
Fwiw, @dicklesworthstone / Jeff Emanuel is definitely my favorite dragon rider right now, doing the most with AI, to the most effect.
Their agent mail was great & very early in agent orchestration. Code agent search is amazing & will tell you what's happening in every harness. Their Franktui is a ridiculously good rust tui. They have project after project after project after project and they are all so good.
This matters less and less in the new world. The fact that a fully compatible, 10x-faster clone came up, and is continuously working and adapting/improving, tells you that this is hugely valuable. It has users and it's thriving.
Caring about taste in coding is past now. It's sad :( but also something to accept.
Yeah, I tried to use this clone of pi for a while and it's very, very broken.
First of all, it wouldn't build; I had to mess around with git submodules to get it building.
Then there's actually using it. The scrolling behavior is broken: you cannot scroll properly when there are lots of tool outputs, and the window freezes. I also hit lots of weird UI bugs when trying to use slash commands. Sometimes they stop the window scrolling; sometimes the slash commands don't show up at all.
The general text output is flaky, how it shows results of tools, the formatting, the colors, whether it auto-scrolls or gets stuck is all very weird and broken.
You can easily force it into a broken state by just running lots of tool calls, then the UI just freezes up.
If you look at that code it’s possibly the worst rust code I’ve seen in my life. There are several files with 5000 to 10000 lines of code in a single file.
It looks 100% vibe coded by someone who’s a complete neophyte.
This looked interesting because I prefer rust over npm.
The first issue I had was figuring out the schema of models.json, as someone who hadn't used the original pi before. Then I noticed the documented `/skill:` command doesn't exist. That's also hard to see, because the slash menu is rendered off screen if the prompt is at the bottom of the terminal. And when I can see it, the selected menu item always jumps back to the first line, though it looks like he fixed that yesterday.
The tool output appears to mangle the transcript, and I can't even see the exact command it ran, only the output of the command. The README is overwhelmingly long and I don't understand what's important for me as a first time user and what isn't. Benchmarks and code internals aren't too terribly relevant to me at this point.
I looked at the original pi next and realized the config schema is subtly different (snake_case instead of camelCase). Since it was advertised as a port, I expected it to be a drop-in replacement, which is clearly not the case.
All in all it doesn't inspire confidence. Unfortunate.
I am building an entire GPT model framework from the ground up in TypeScript plus small amounts of C bindings for the GPU stuff: https://github.com/thomasdavis/alpha2 (using Claude)
Don't hate me, haha. And no, there is no reason other than "because I can".
This confused me about openclaw for quite some time. The whole lobster/crustacean theme is just firmly associated with rust in my head. Guess it's just a claude/claw wordplay.
Pi easily makes GPT-5.3-Codex act about on par with Claude.
There's something in the default Codex harness that makes it fight with both arms behind its back, maybe the sandboxing is overly paranoid or something.
With Pi I can one-shot many features faster and more accurately than with Codex-cli.
I’m working with a friend to build a UI around Pi to make it more user-friendly for people who prefer to work with a GUI (à la Conductor). You can check out the repo: https://github.com/philipp-spiess/modern
Written by a person who is infamous for annoying open source maintainers with AI slop PRs (see the DWARF debacle in OCaml) … and it misses much of pi’s philosophy.
I haven’t met a single person who has tried pi for a few days and not made it their daily driver. Once you taste the freedom of being able to set up your tool exactly how you like, there’s really no going back.
Not the person you replied to, but I'll stress the point that it is not just what you can add that Claude Code doesn't offer, but also what you don't need to add that Claude Code does offer that you don't want.
I dislike many things about Claude Code, but I'll pick subagents as one example. Don't want to use them? Tough luck. (AFAIK, it's been a while since I used CC, maybe it is configurable now or was always and I never discovered that.)
With Pi, I just didn't install an extension for that, which I suspect exists, but I have a choice of never finding out.
IME CLAUDE.md rarely gets fully honored. I've left HN comments before about how I had to convert some CLAUDE.md instructions into deterministic pre-commit checks due to how often they were ignored. My guesstimate is that it is about 70% reliable. That's with Opus 4.5. I've since switched to GPT-5.2 and now GPT-5.3 Codex and use Codex CLI, Pi and OpenCode, not CC, so maybe things have changed with a new system prompt or with the introduction of Opus 4.6.
> I haven’t met a single person who has tried pi for a few days and not made it their daily driver.
Pleased to meet you!
For me, it just didn’t compare in quality with Claude CLI and OpenCode. It didn’t finish the job. Interesting for extending, certainly, but not where my productivity gains lie.
I've spent way too long working around the jank and extra features in Other People's Software.
Now I can just make my own that does exactly what I want and need, nothing more and nothing less. It's just for me, it's not a SaaS or a "start-up" I'm the CEO of.
I came here to say the same thing. It basically _is_ Emacs: heavily configurable tool, text-focused UI, primary interaction via a minibuffer ..er.. box to prompt at the bottom of the screen, a package distribution mechanism, etc. etc.
With Emacs modes like agent-shell.el available and growing, why not invest in learning a tool that is likely to survive and have mindshare beyond the next few months?
If you ever want to use other models, pi can do that. In the middle of a session I might switch from gpt-5.2 to opus and get it to do something or review something and then switch back to gpt. Since models are being released every few weeks this is interesting to compare models without having to switch to a different harness.
And if there’s any feature Codex has that you want, just have pi run codex in a tmux session, interrogate it about how said feature works, and recreate it in pi.
Pi was probably the best ad for Claude Code I ever saw.
After my Max sub expired I decided to try Kimi on a more open harness, and it ended up being one of the worst (and most eye-opening) experiences I've had with the agentic world so far.
It was completely alienating and so much 'not for me', that afterwards I went back and immediately renewed my claude sub.
> I would say that the project actively expects you to be downloading them to fill any missing gaps you might have.
Where did you get this perspective from?
> I thought pi and its tools were supposed to be minimal and extensible. So why is a subagent extension bundling six agents I never asked for that I can’t disable or remove?
Why do you think a random subagents extension operates under the same philosophy as pi?
Your blog post says little about pi proper; it's essentially concerned with issues you had with the ecosystem of extensions, often made by random people who may or may not get the philosophy. Why would that be up to pi to enforce?
> if I start the agent in ./folder then anything outside of ./folder should be off limits unless I explicitly allow it, and the same goes for bash where everything not on an allowlist should be blocked by default.
Here's the problem with Claude Code: it acts like it's got security, but it's the equivalent of a "do not walk on grass" sign. There are no technical restrictions at play, and the agent can (maliciously or accidentally) bypass the "restrictions".
That's why Pi doesn't have restrictions by default. The logic is: no matter what agent you are using, you should be using it in a real sandbox (container, VM, whatever).
But the agent has to interact with the world: fetch docs, push code, fetch comments, etc. You can't sandbox everything. So you push that configuration to your sandbox, which is a worse UX than the harness just asking you at the right time what you'd like to do.
Well, you are imagining a worse UX, but it doesn't have to be. Pi doesn't include a sandboxing story at all (Claude provides an advisory but not mandatory one), but the sandbox doesn't have to be a simple static list of allowed domains/files. It's totally valid to make the "push code" tool in the sandbox send a trigger to code running outside of the sandbox, which then surfaces an interactive prompt to you as a user. That would give you the interactivity you want and be secure against accidentally or deliberately bypassing the sandbox.
So you have to set up that integration instead of letting the agent do it. I suppose the sandbox is more configurable, but do you need that? I thought the draw of pi was that you didn't do all that and let it fly, wheeee!
edit: You're not making it sound easy at all. I don't have to build anything with the other agents.
Certainly not. Pi is "minimalist", so the draw is that it's "easy" to set it up yourself. You can also skip that and run it in yolo mode, and you can do that with Claude Code too. Heck, you could even use this hypothetical real-sandbox-with-interactive-prompts with Claude Code instead, once you build it.
Back to my original point: Claude Code gives you a false feeling of security, Pi gives you the accurate feeling of not having security.
I too would like to know what a good UX looks like here but I have doubts that the permission prompts of Claude are the way to go right now.
Within days, people become used to just hitting accept and allowlisting pretty much everything. The agents write lengthy logic into shell scripts or test runners that can themselves be destructive, but those get immediately allowlisted.
I had a very similar experience. I have different preferences, but ultimately, my takeaway was that if I want to follow my own version of their philosophy, I should just create my own thing.
In the meantime, the codex/cc defaults are better for me.
> As it turns out, the opinions in question are that bash should be enabled by default with no restrictions, that the agent should have access to every file on your machine from the start, and that npm is the only package manager worth supporting.
Yep. This is why I've been going "Hell, no!" and will probably keep doing so.
Technically you're not allowed to use a Claude subscription account with Pi (according to Anthropic's policy). So yeah, Pi is the best anti-ad against Anthropic.
Interesting approach to planning via extensions. I took a similar direction with enforcement: a governance loop that hooks into the agent's tool calls and blocks execution until protocol is followed. Every 10 actions (configurable), the agent re-centers. No permission popups, but the agent literally can't skip steps.
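A minimal sketch of that gate, with hypothetical names throughout (the real wiring into a harness's tool-call hooks will differ):

```typescript
// Sketch of a governance loop: count tool calls and, every N actions
// (configurable), refuse further calls until a re-center step runs.
// Class and method names here are made up for illustration.

class GovernanceGate {
  private actions = 0;
  private centered = true;

  constructor(private readonly interval: number = 10) {}

  // Called from the tool-call hook; returns whether to block this call.
  onAction(): { block: boolean; reason?: string } {
    if (!this.centered) {
      return { block: true, reason: "re-center required before more actions" };
    }
    this.actions++;
    if (this.actions % this.interval === 0) {
      this.centered = false; // close the gate until recenter() runs
    }
    return { block: false };
  }

  // Called once the agent has completed its re-centering step.
  recenter(): void {
    this.centered = true;
  }
}
```

Because the block happens in the hook rather than the prompt, the model can't talk its way past it.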
I've been using pi via the pi-coding-agent Emacs package, which uses its RPC mode to populate a pair of Markdown buffers (one for input, one for chat), which I find much nicer than the awful TUIs used by harnesses like gemini-cli (Emacs works perfectly well as a TUI too!).
The extensibility is really nice. It was easy to get it using my preferred issue tracker; and I've recently overridden the built-in `read` and `write` commands to use Emacs buffers instead. I'd like to override `edit` next, but haven't figured out an approach that would play to the strengths of LLMs (i.e. not matching exact text) and Emacs (maybe using tree-sitter queries for matches?). I also gave it a general-purpose `emacs_eval`, which it has used to browse documentation with EWW.
Nice! I'm curious to hear how you're mapping `read` and `write` to Emacs buffers. Does that mean those commands open those files in Emacs and read and write them there?
Let me also drop a link to the Pi Emacs mode here for anyone who wants to check it out: https://github.com/dnouri/pi-coding-agent -- or use: M-x package-install pi-coding-agent
We've been building some fun integrations in there like having RET on the output of `read`, `write`, `edit` tool calls open the corresponding file and location at point in an Emacs buffer. Parity with Pi's fantastic session and tree browsing is hopefully landing soon, too. Also: Magit :-)
The implementation is pretty terrible: a giant string of vibe-coded Emacs Lisp is sent to emacsclient, which performs the actions and sends back a string of JSON.
It's been interesting to iterate on the approach: watching the LLM (in my case Claude) attempting to use the tools; noticing when it struggles or makes incorrect assumptions; and updating the tool, documentation and defaults to better match those expectations.
It feels similar to the file-watching provided by Aider (which uses inotify to spot files containing `# AI!` or `# AI?`), which I've previously used with FIXME and TODO comments in code; but it also works well in non-file things, e.g. error messages and test failures in `shell-mode`, and issues listed in the Emacs UI I wrote for the Artemis bug tracker (Claude just gets the issue number from the current line, and plugs that into a Pi extension I made for Artemis :-) )
Qwen3 Coder Next in llama.cpp on my own machine. I'm an AI hater, but I need to experiment with it occasionally, and I'm not going to pay someone rent for something they trained on my own GitHub, Stack Overflow, and Reddit posts.
MiniMax has an incredibly affordable coding plan for $10/month. It has a rolling five hour limit of 100 prompts. 100 prompts doesn't sound like much, but in typical AI company accounting fashion, 1 prompt is not really 1 prompt. I have yet to come even close to hitting the limit with heavy use.
Run Qwen3-coder-next locally. That's what I'm doing (using LM Studio). It's actually a surprisingly capable model. I've had it working on some LLVM-IR manipulation and microcode generation for a kind of VLIW custom processor, and I've been pleasantly surprised that it can handle this (LLVM is not easy). There is also Verilog code defining the processor's behavior, which it reads to determine the microcode format and expected processor behavior. When I do hit something it seems to struggle with, I can go over to Antigravity and get some free Gemini 3 Flash usage.
Pi treats you like an adult and shows whatever the fuck LLM is doing rather than actively hiding shit from the user. And just for that, once you tasted the freedom and transparency, there’s no way to go back to CC.
After Claude Code 2.20.0, where they stopped showing by default which files are read and which searches are made... I fucking love how easy it was to ditch Claude Code for pi.
To me, the most interesting thing about Pi and the "claw" phenomenon is what it means for open source. It's becoming passé to ask for feature requests and even to submit PRs to open source repos. Instead of extensions you install, you download a skill file that tells a coding agent how to add a feature. The software stops being an artifact and starts being a living tool that isn't the same as anyone else's copy. I'm curious to see what tooling will emerge for collaborating with this new paradigm.
So everybody will be using (sometimes slightly, sometimes entirely) different software. Like mutations, these variants adapt to the specific problems they were prompted to solve.
It's like the dude who monkey-patches their car and then goes to the dealer to complain that the suspension is stiff.
It's because you put 2x4s in place of the shocks, you absolute muppet. And then they either hand them a massive bill to fix it properly or politely show them out.
Same will happen in self-modifying software. Some people are self-aware enough to know that "I made this, it's my problem to fix", some will complain to the maker of the harness they used and will be summarily shown the door.
That's just because corporations got greedy and made their apps suck.
Strip away the ads and the data harvesting, add back the power features, and we'll be happy again. I'm more willing than ever to pay a one-time fee for good software. I've started donating to all the free apps I use on a regular basis.
I don't want to own my own slop. That doesn't help me. Use your AI tools to build out the software if you want, but make sure it does a good job. Don't make me fiddle with nondeterministic, flavor-of-the-month AI agents.
I think there's room for both visions. Big Tech is generating more toxic sludge than ever, and yeah sure this is because they're greedy, but more precisely the root cause is how they lobbied Washington and our elected officials agreed to all kinds of pro-corporate, anti-human legislation. Like destroying our right to repair, like criminalizing "circumvention" measures in devices we own, like insane life-destroying penalties for copyright infringement, like looking the other way when Big Tech broke anti-trust laws, etc.
The Big Tech slop can only be fixed in one way, and actually it's really predictable and will work - we need to fix the laws so that they put the rights and flourishing of human beings first, not the rights and flourishing of Big Tech. We need to fix enforcement because there are so many times that these companies just break the law and they get convicted but they get off with a slap on the wrist. We need to legislate a dismantling of barriers to new entrants in the sectors they dominate. Competition for the consumer dollar is the only thing that can force them to be more honest. They need to see that their customers are leaving for something better, otherwise they'll never improve.
But our elected officials have crafted laws and an enforcement system which make 'something better' impossible (or at least highly uneconomical).
Parallel to this if open source projects can develop software which is easier for the user to change via a PR, they totally should. We can and should have the best of both worlds. We should have the big companies producing better "boxed" software. Plus we should have more flexibility to build, tweak and run whatever we want.
What you're describing is the expected and correct outcome inside a profit-oriented, capitalist system. So the only way I see out of this situation would be changing policy to a more socialist one, which doesn't seem to be so popular among the tech elite, who often think they deserve their financial status because of the 'value' they provide, without specifying what that value is (or its second-order consequences). Whether that's abusing a monopolistic market position they lucked into, making apps as addictive as possible, or building drones that throw bombs on newborns in hospitals.
I think we're after the same goal but have a different view of mechanism.
Regulation enforcement against the anti-market behaviors would bring a lot of good.
Putting too much power in any centralized authority - company or government - seems to lead to oppression and unhealthy culture.
Fair markets are the neatest trick we have. They put the freedom of choice in the hands of the individual and allow organic collaboration.
The framing should not be government vs company. But distributed vs centralized power. For both governance and commerce.
The entire world right now suffers from too much centralized power. That comes in the form of both corporate and government. Power tends to consolidate until the bureaucracy of the approach becomes too inefficient and collapses under its own weight. That process is painful, and it's not something I enjoy living through.
If you see through that lens, it has explaining power for the problems of both the EU countries and the US.
> That's just because corporations got greedy and made their apps suck.
It is true for me with Linux. I code for a living and I can't change anything because I can't even build most software -- the usual configure/make/make install runs into tons of compiler errors most of the time.
Loss of control is an issue. I'm curious if AI tools will change that though.
I'm presently in the process of building (read: directing claude/codex to build) my own AI agent from the ground up, and it's been an absolute blast.
Building it exactly to my design specs, giving it only the tool calls I need, owning all the data it stores about me for RAG, integrating it to the exact services/pipelines I care about... It's nothing short of invigorating to have this degree of control over something so powerful.
In a couple of days work, I have a discord bot that's about as useful as chatgpt, using open models, running on a VPS I manage, for less than $20/mo (including inference). And I have full control over what capabilities I add to it in the future. Truly wild.
I'm using kimi-k2-instruct as the primary model and building out tool calls that use gpt-oss-120b to allow it to opt-in to reasoning capabilities.
Using Vultr for the VPS hosting, as well as their inference product which AFAIK is by far the cheapest option for hosting models of these class ($10/mo for 50M tokens, and $0.20/M tokens after that). They also offer Vector Storage as part of their inference subscription which makes it very convenient to get inference + durable memory & RAG w/ a single API key.
Their inference product is currently in beta, so not sure whether the price will stay this low for the long haul.
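For a rough sense of the economics, here's a sketch of the cost model implied by the pricing quoted above ($10/mo including 50M tokens, $0.20 per additional 1M). The numbers come from the comment and are beta pricing, so treat this as an illustration, not a reference:

```typescript
// Rough cost model for the quoted beta pricing: $10/mo base fee covering
// the first 50M tokens, then $0.20 per additional 1M tokens.
function monthlyCost(tokensUsed: number): number {
  const baseFee = 10; // USD, includes the first 50M tokens
  const includedTokens = 50_000_000;
  const overagePerMillion = 0.2; // USD per 1M tokens beyond the allowance
  const overageTokens = Math.max(0, tokensUsed - includedTokens);
  return baseFee + (overageTokens / 1_000_000) * overagePerMillion;
}
```

At 100M tokens a month that works out to roughly $20, which is what makes an always-on bot viable at this budget.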
> It's nothing short of invigorating to have this degree of control over something so powerful
I'm a SWE w/ >10 years, and you're right, this part has always been invigorating.
I suppose what's "new" here is the drastically reduced amount of cognitive energy I need to build complex projects in my spare time. As someone who was originally drawn to software because of how much it lowered the barrier to entry of birthing an idea into existence (when compared to hardware), I am genuinely thrilled to see said barrier lowered so much further.
Sharing my own anecdotal experience:
My current day job is leading development of a React Native mobile app in Typescript with a backend PaaS, and the bulk of my working memory is filled up by information in that domain. Given this is currently what pays the bills, it's hard to justify devoting all that much of my brain deep-diving into other technologies or stacks merely for fun or to satisfy my curiosity.
But today, despite those limitations, I find myself having built a bespoke AI agent written from scratch in Go, using a janky beta AI Inference API with weird bugs and sub-par documentation, on a VPS sandbox with a custom Tmux & Neovim config I can "mosh" into from anywhere using finely-tuned Tailscale access rules.
I have enough experience and high-level knowledge that it's pretty easy for me to develop a clear idea of what exactly I want to build from a tooling/architecture standpoint, but prior to Claude, Codex, etc., the "how" of building it tended to be a big stumbling block. I'd excitedly start building, only to run into the random barriers of "my laptop has an ancient version of Go from the last project I abandoned" or "neovim is having trouble starting the lsp/linter/formatter" and eventually go "ugh, not worth it" and give up.
Frankly, as my career progressed and the increasingly complex problems at work left me with vanishingly less brain-space for passion projects, I was beginning to feel this crushing sense of apathy & borderline despair. I felt I'd never be able to make good on my younger self's desire to bring these exciting ideas of mine into existence. I even got to the point where I convinced myself it was "my fault" because I lacked the mettle to stomach the challenges of day-to-day software development.
Now I can just decide "Hmm.. I want a lightweight agent in a portable binary. Makes sense to use Go." or "this beta API offers super cheap inference, so it's worth dealing with some jank" and then let an LLM work out all the details and do all the troubleshooting for me. Feels like a complete 180 from where I was even just a year or two ago.
At the risk of sounding hyperbolic, I don't think it's overstating things to say that the advent of "agentic engineering" has saved my career.
I'm actually relieved they're doing it now because it's going to be a forcing function for the local LLM ecosystem. Same thing with their "distillation attack" smear piece -- the more of a spotlight people get on true alternatives + competition to the 900 lb gorillas, the better for all users of LLMs.
I really hope so. I moved to Codex, only to get my account flagged and my requests downgraded to 5.2 because of some "safety" thing. Now OpenAI demands I hand my ID over to Persona, the incredibly dodgy US surveillance company Discord just parted ways with, to get back what I paid for.
This timeline sucks, I don't want to live in a future where Anthropic and OpenAI are the arbiters of what we can and cannot do.
> The nerds could always make a home with their linux desktop. Now everyone can. It'll change the equation.
Problem is, to be able to do what you're describing, you still need the source code and the permission to modify it. So you will need to switch to the FOSS tools the nerds are using.
Think of skills more like Excel macros (or any other software with robust macro support). It doesn't make sense for Microsoft to provide the specific workflow you need, but your own sheet needs it.
> a living tool that isn't the same as anyone else's copy
Yes, which is why this model of development is basically dead-in-the-water in terms of institutional adoption. No large firm or government is going to allow that.
It wasn't "inevitable", it took Red Hat and some other key players addressing the concerns the businesses and governments had, which took the better part of a decade. If LLMs as an ecosystem don't implode in the next year or so I imagine you'll start to see some big consultancies starting that same process for them.
> it took Red Hat and some other key players addressing the concerns the businesses and governments had
Red Hat? I don't think they are involved in the moves to FOSS for government agencies, mostly because they're American, and the ones who are currently moving quickly (in the government world at least) are the ones who aren't American and want to get rid of their reliance on American infrastructure and software.
> Visit Washington DC some time and ride the metro. Red Hat puts out ads about all their public sector offerings.
I haven't had a single need to visit the US, and I still have zero needs for it. If I need to read subway ads to understand how a company is connected to FOSS, I think I'll skip that and continue using and working with companies who make that clear up front :) Thanks for the offer though!
Most American government infrastructure runs on Red Hat. Almost all of Amazon's internal operations runs on Amazon Linux, which is a rebranded Red Hat, and it powers Gov Cloud.
Right, but is "the States" currently trying to migrate away from US infrastructure and choosing FOSS to do so? That was the context I was entering this thread with, since most of the organizations moving to FOSS right now are doing so to move away from US infrastructure.
The whole context was how Red Hat was historically involved in addressing the concerns that were hindering government adoption. Are you just being intentionally obtuse to denigrate the US for some reason?
Worse is better for you when it meets your needs better.
I use a lot of my own software. Most of it is strictly worse both in terms of features and bugs than more intentional, planned projects. The reason I do it is because each of those tools solve my specific pain points in ways that makes my life better.
A concrete example: I have a personal dashboard. It was written by Claude in its entirety. I've skimmed the code, but no more than that. I don't review individual changes. It works for me. It pulls in my calendar, my fitbit data, my TODO list, various custom reminders to work around my tendency to procrastinate, it surfaces data from my coding agents, it provides a nice interface for me to browse various documentation I keep to hand, and a lot more.
I could write a "proper" dashboard system with cleanly pluggable modules. If I were to write it manually I probably would because I'd want something I could easily dip in and out of working on. But when I've started doing stuff like that in the past I quickly put it aside because it cost more effort than I got out of it. The benefit it provides is low enough that even a team effort would be difficult to make pay off.
Now that equation has fundamentally changed. If there's something I don't like, I tell Claude, and a few minutes - or more - later, I reload the dashboard and 90% of the time it's improved.
I have no illusions that code is generic enough to be usable for others, and that's fine, because the cost of maintaining it in my time is so low that I have no need to share that burden with others.
I think this will change how a lot of software is written. A "dashboard toolkit", for example, would still have value to my "project" - but as something for my agent to pull in and use to put together my dashboard faster.
A lot of "finished products" will be a lot less valuable because it'll become easier to get exactly what you want by having your agent assemble what is out there, and write what isn't out there from scratch.
It is a way of sharing and improving software already today. Not a major way, yet, but I don't agree with you it would be a bad thing for that to become more common, in as much as - to go back to my dashboard example - sharing a skill that contains some of the lessons learned, and packages small parts would seem far more flexible and viable as a path for me to help make it easier for others to do the same, than packaging up something in a way that'd give the expectation that it was something finished.
But also, note that skills can carry scripts with them, so they are definitely also more than a my_feature.md.
I've been thinking about this lately too. I think we're going to see the rise of Extremely Personal Software, software that barely makes any sense outside of someone's personal context. I think there is going to be _so_ much software written for an audience of 1-10 people in the next year. I've had Claude create so much tooling for me and a small number of others in the last few months. A DnD schedule app; a spoiler-free formula e news checker; a single-use voting site for a climbing co-op; tools to access other tools that I don't like using by hand; just absolutely tons of stuff that would never have made any sense to spend time on before. It's a new world. https://redfloatplane.lol/blog/14-releasing-software-now/
I think people overestimate the general population's ability and interest in vibe coding. Open source tools are still a small niche. Vibe code customized apps are an even bigger niche.
Maybe so. I guess I feel that in a couple of years it may not be called vibe coding, or even coding, I think it might be called 'using a computer'. I suppose it's very hard to correctly estimate or reason about such a big change.
I actually look at this another way. I think we’re going to see a lot more open source. Before you had to get your pr merged into main. Now people will just ask ai to build the tool they need and then open source it.
Maintainers won’t have to deal with an endless stream of PRs. Now people will just clone your library the second it has traction and make it perfect for their specific use case.
Cherry pick the best features and build something perfect for them. They’ll be able to do things your product can’t, and individual users will probably find a better fit in these spinoffs than in the original app.
> It's becoming passé to ask for feature requests and even to submit PRs to open source repos.
Yet, the first impact on FOSS seems to be quite the opposite: maintainers complaining about PRs and vulnerability disclosures that turn out to be AI hallucinations, wasting their time. It seems to be so bad that now GitHub is offering the possibility of turning off pull requests for repositories. What you present here is an optimistic view, and I would be happy for it to be correct, but what we've seen so far unfortunately seems to point in a different direction.
We might be witnessing some survivor bias here based on our own human conditioning. Successful PRs aren't going to make the news like the bad ones do.
With that said, we are all dealing with AI still convincingly writing code that doesn't work despite passing tests or introducing hard to find bugs. It will be some time until we iron that out fully for more reliable output I suspect.
Unfortunately, now that the abstraction language is human language, we won't be able to stop humans from thinking they are software engineers when they are not, so guarding against spam will be more important than ever.
> Instead of extensions you install, you download a skill file that tells a coding agent how to add a feature. The software stops being an artifact and starts being a living tool that isn't the same as anyone else's copy. I'm curious to see what tooling will emerge for collaborating with this new paradigm.
I built my own inspired by Beads - not quite as you're describing, but I store todos in a SQLite database (Beads used SQLite AND git hooks; I didn't want to be married to git), and I let them sync to and from GitHub Issues. So in theory I can fork a GitHub repo and have my tool pull down issues from the original repo (haven't tried it when it's a fork, so that's a new task for the task pile).
You can see me dogfooding my tool on my tool's codebase, with my issues on GitHub for anyone to see, including the closed ones. I do think we will see an increase in local dev tooling that is tried and tested by its own creators, which will yield better purpose-driven tooling that is generic enough to be useful to others.
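A toy version of the sync idea described above: reconcile a local todo table with GitHub Issues by issue number, last-write-wins on an updated timestamp. The field names and the conflict rule here are my guesses for illustration, not the commenter's actual tool:

```typescript
// Hypothetical local/remote todo reconciliation: merge two lists keyed by
// issue number, keeping whichever copy was updated most recently.
interface Todo {
  issueNumber: number;
  title: string;
  closed: boolean;
  updatedAt: number; // unix seconds
}

function reconcile(local: Todo[], remote: Todo[]): Todo[] {
  const byNumber = new Map<number, Todo>();
  for (const t of [...local, ...remote]) {
    const existing = byNumber.get(t.issueNumber);
    if (!existing || t.updatedAt > existing.updatedAt) {
      byNumber.set(t.issueNumber, t);
    }
  }
  return [...byNumber.values()].sort((a, b) => a.issueNumber - b.issueNumber);
}
```

The same shape works in either direction: run it, then push any rows that differ from the remote back to GitHub Issues.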
I used to use Beads for all my Claude Code projects, now I just use GuardRails because it has safety nets and works without git which is what I wanted.
I could have forked Beads, but the other thing is Beads is a behemoth of code, it was much easier to start from nothing but a very detailed spec and Claude Code ;)
I spent 3 months adopting the Codex and Claude Code SDKs only to realize they're just vendor lock-in, and brittle. They're intended to be used as CLIs, so they're not programmable enough as libraries. After digging into the OpenClaw codebase, I can safely say that most of its success comes from the underlying harness, the pi agent.
pi plugins support adding hooks at every stage, from tool calls to compaction, and let you customize the TUI as well. I use it for my multi-tenant Openclaw alternative https://github.com/lobu-ai/lobu
If you're building an agent, please don't use proprietary SDKs from model providers. Just stick to ai-sdk or pi agent.
IIUC to reliably use 3P tools you need to use API billing, right? Based on my limited experimentation this is an order of magnitude more expensive than consumer subscriptions like Claude Pro, do I have that right?
("Limited experimentation" = a few months ago I threw $10 into the Anthropic console and did a bit of vibe coding and found my $10 disappeared within a couple of hours).
If so, that would support your concern, it does kinda sound like they're selling marginal Claude Code / Gemini CLI tokens at a loss. Which definitely smells like an aggressive lockin strategy.
Technically you're still using claude CLI with this pattern so it's not 3P app calling Anthropic APIs via your OAuth token. Even if you would use Claude Code SDK, your app is 3P so it's in a gray area.
Anthropic's docs are intentionally unclear about how 3P tools are defined: is it calling the Claude app, or calling the Anthropic API with the OAuth tokens?
Unfortunately it's currently very utopian for (I would assume) most devs to use something like this when API cost is so prohibitively expensive compared to e.g. Claude Code. I would love to use a lighter and better harness, but I wouldn't love to quintuple my monthly costs. For now the pricing advantage is just too big for me compared to the inconvenience of using CC.
Is this in line with Anthropic ToS? They cracked down hard on Clawdbot and the like from what I gathered. I guess if you are still invoking CC it might be fine, but isn't that gonna lead to weird behavior from basically doubling up on harnesses?
I left some notes about this. I agree with you directionally but practically/economically you want to let users leverage what they're already paying for.
In an ideal world we would have a pi-cli-mono or similar, like something that is not as powerful as pi but gives a least common denominator sort of interface to access at least claude/codex.
ACP is also something interesting in this space, though I don't honestly know how that fits into this story.
For all of the recent talk about how Anthropic relies on heavy cache optimization for claude-code, it certainly seems like session-specific information (the exact datestamp, the pid-specific temporary directory for memory storage) enters awfully early in the system prompt.
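The tension is easy to demonstrate: prompt-prefix caches only reuse the longest shared leading span, so any per-session detail placed early invalidates everything after it. This sketch is an illustration of that mechanism, not Anthropic's actual prompt:

```typescript
// A prefix cache can only reuse the span where two prompts agree, so
// session-specific data placed early shrinks the cacheable prefix to
// almost nothing.
function sharedPrefixLength(a: string, b: string): number {
  let i = 0;
  while (i < a.length && i < b.length && a[i] === b[i]) i++;
  return i;
}

// Stand-in for the large static part of a system prompt.
const staticRules = "You are a coding agent. Follow the rules below...\n".repeat(100);

// Session-specific line first: the prompts diverge at the first date digit.
const earlyA = `date=2026-02-01 tmp=/tmp/agent-1234\n${staticRules}`;
const earlyB = `date=2026-02-02 tmp=/tmp/agent-5678\n${staticRules}`;

// Session-specific line last: the whole static block is shared and cacheable.
const lateA = `${staticRules}date=2026-02-01 tmp=/tmp/agent-1234\n`;
const lateB = `${staticRules}date=2026-02-02 tmp=/tmp/agent-5678\n`;

console.log(sharedPrefixLength(earlyA, earlyB)); // small
console.log(sharedPrefixLength(lateA, lateB));   // at least staticRules.length
```

Which is why it's surprising to see datestamps and pid-specific paths near the top rather than appended at the end.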
I don't. I use this as my coding harness (a replacement for gemini-cli/claude-code etc.). I don't want to sandbox it because I expect it to be used only for coding on projects. I don't want to overcomplicate it.
I am building my own assistant as an AI harness - that is definitely getting sandboxed to run only as a VM on my Mac.
A few things. I intentionally clone the repo, build it locally, and use it as my-omp. This way I can make customisations to oh-my-pi - skills, tools, anything - and yet retain the ability to do a git pull from upstream, cherry-picking if necessary.
I have this in my shell rc.
# bun
export BUN_INSTALL="$HOME/.bun"
export PATH="$BUN_INSTALL/bin:$PATH"
alias my-omp="bun /Users/aravindhsampathkumar/ai_playground/oh-my-pi/packages/coding-agent/src/cli.ts"
and do
1. git pull origin main
2. bun install
3. bun run build:native
every time I pull changes from upstream.
Until yesterday, this process was purely bliss - my own minimal custom system prompt, minimal AGENTS.md, and self-curated skills.md. One thing I was wary of when switching from pi to oh-my-pi was its use of native Rust tools via NAPI. Whatever changes I pulled from upstream over the last couple of days are causing the models to get confused about which tool to use, and how, while editing/patching files. They're getting extremely annoyed - I saw 11 iterations of a tool call to edit a damn file, after which the model resorted to rewriting the whole file from memory, and we all know how that goes. This may not be a bug in oh-my-pi per se. My guess is that the agent developed its memory based on prior usage of the tools, and my updating oh-my-pi changed how they're used. It might be okay if I could lose all agent memory and begin again, but I don't want to.
I'm going to be more diligent about pulling upstream changes from now on, and only do when I can afford a full session memory wipe.
Otherwise, the integrations with exa for search, LSP servers on local machine, syntax highlighting, steering prompts, custom tools (using trafilatura to fetch contents of any url as markdown, use calculator instead of making LLM do arithmetic) etc work like a charm. I haven't used the IPython integration nor do I plan to.
Stop advertising pi, people. It _somehow_ continued to fly somewhat under the radar after that whole OpenClaw nonsense. Don't make Anthropic sic their bloodhounds on them like they did with OpenCode.
People deserve to know it exists. I got tired of even OpenCode's workflows/agents; I installed OpenSpec, but all those wrapped todos were still not how I wanted things.
I needed more control but didn't want to write my own tool. Then I found out about pi, and this got me interested on first read:
No plan mode. Write plans to files, or build it with extensions, or install a package.
No built-in to-dos. Use a TODO.md file, or build your own with extensions.
No background bash. Use tmux. Full observability, direct interaction.
This is very important to have control and ownership.
Pi is not for everyone, but it is for those who eventually want tools like read, bash, edit, write, grep, find, and ls as building blocks.
It even runs inside a browser. I'll publish my browser pi if someone is interested; I didn't dare open a pull request with my slop, but I'd love to show the fork and create a pull request if there is broader interest.
I happen to be somewhat familiar with OpenCode and am considering using it as a personal AI workspace (some chat & agentic behavior; not worrying about initiative behavior just yet; I'd try to DIY memory with local files and access to my notes) because it seems to have a decent ecosystem.
Pi appears to have a smaller, less “pre-made” ecosystem, but with more flexibility, enthusiasm and extensibility.
Is this correct? Should I look towards Pi over OpenCode? What are the UI options?
I have the same question as you, but I want to add that I've used OpenCode for general tasks like writing, organization and such, with a context of .md files, and it works wonders. And like you, I am considering trying a harness better suited to this task.
I looked a bit into the reasoning for Pi's design (https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to...) and, while it does seem to do a lot of things very well around extensibility, I do miss support for permissions, MCP, and perhaps todos and a server mode. OpenCode seems a lot more complete in that regard, but I can well imagine that people have adapted Pi for these use cases (OpenClaw seems to have all of these). So it's definitely not out of the race yet, but I still appreciate OpenCode's seeming relative completeness in comparison.
As soon as your agent can write and execute code, your permissions are just security theater. If you care, do proper sandboxing. If not, there are extensions for that.
> MCP
Again, Pi is extensible.
pi install pi-mcp-adapter
Now you can connect to any MCP server.
> and perhaps Todos
At least 10 different todo extensions. Pick which one you like. If you don't like any of them, ask Pi to write one for you.
> and a server mode.
Pi has rpc mode, which is a kind of server. If that's not enough, you could extend it.
> OpenCode seems a lot more complete in that regard,
Yes, but good luck working with Opencode if you don't like their plan-mode. Or todo support. And MCP. You pay their cost in complexity and tokens even if you don't use them or you don't like how they work.
> but I can well imagine that people have adapted Pi for these use cases (OpenClaw seems to have all of these). So it’s definitely not out of the race yet, but I still appreciate OpenCodes relative seeming completeness in comparison.
There's also an oh-my-pi fork if you want an out-of-the-box experience. Still, in my experience, nothing beats Pi in terms of customizability. It's the first piece of software that I can easily make completely to my liking. And I say that as someone who has used Emacs for a decade.
To be honest, none for what I am using it for (organizing documents, cross-referencing information, writing summaries of documents). However, it feels wrong using OpenCode for this. I somehow think there must be a better way of doing it.
I've been using Pi for this - just switched to "oh my pi" and am liking it!
Honestly, it's been a dream, I have it running in a docker-sandbox with access to a single git repo (not hosted) that I am using for varied things with my business.
Try it out, it's super easy to setup. If you use docker sandbox, you can just follow what is necessary for claude, spin up the sandbox, exit out, exec into it with bash and switch to Pi.
The people pushing oh-my-pi seem to have missed the point of pi... Downloading 200k+ lines of additional code seems completely against the philosophy of building up your harness, letting your agent self-improve, relying on code that you control.
If you want bags of features, rather clone oh-my-pi somewhere, and get your agent to bring in bits of it a time, checking, reviewing, customising as you go.
I still don't get why you would want to use a terminal app to code when you can do all of this through an IDE extension, which does the same thing except better integrated.
You can open a grid of windows inside vscode too and it comes back up exactly as it was on reload.
When I use a CLI agent to code, I don't need the IDE for anything.
Think of it more like directing a coworker or subcontractor via text chat. You tell them what you want and get a result, then you test it if it's what you want and give more instructions if needed.
I literally just fixed a maintenance program on my own server while working my $dayjob. ssh to server, start up claude and tell it what's wrong, tab away. Then I came back some time later, read what it had done, tested the script and immediately got a few improvement ideas. Gave them to Claude, tabbed out, etc.
Took me maybe 15 minutes of active work while chatting on Slack and managing my other tasks. I never needed to look at the code at any point. If it works and tests pass, why do I care what it looks like?
In my own experience I cannot blindly accept code without even looking at it, even for a few moments, because I've had many situations where the code was simply doing the wrong things - including tests that were completely wrong and testing the wrong assumptions.
So yeah, even when I review trivial changes I still look at the diff view to see if it makes sense. And IDEs make code review a lot easier than a plain diff.
Btw, this experience is not from lack of trying. We use coding agents extensively (I would assume more than the typical org, looking at our bill) and while they are certainly very, very helpful and I cannot describe how much effort they are really saving us, there is absolutely zero chance of pushing something out without reviewing it first - the same applies to code written by an AI agent or a coworker.
> I still don't get why you would want to use a terminal app to code when you can do all of this through an IDE extension, which does the same thing except better integrated.
I agree. I tried Gemini CLI for a while, and didn't like how separate I felt from the underlying files: rather than doing minor cleanup myself, the activation energy of switching to a separate editor and opening the same files was too high, so I'd prompt the LLM to it instead. Which was often an exercise in frustration, as it would take many rounds of explanation for such tiny payoffs; maybe even fiddling with system prompts and markdown files, to try and avoid wasting so much time in the future...
I've been using Pi for a few weeks now, and have managed to integrate it quite deeply into Emacs. I run it entirely via RPC mode (JSON over stdio), so I don't really know (or care) about its terminal UI :-)
> I still don't get why you would want to use a terminal app to code when you can do all of this through an IDE extension, which does the same thing except better integrated.
Because then you need to make an extension for every IDE. Isn't it better to make a CLI tool with a server, and let people make IDE extensions to communicate with it?
Claude Code has an update every few days. Imagine now propagating those changes to 20+ IDEs.
I've found VSCode _ok_ to work with across different workspaces/projects. The window memory is hit and miss. There's a secondary side bar I've been trying NOT to have open on startup, but it always seems to stick around. I'd prefer to programmatically manage the windows so I can tinker with an automated setup, but the VSCode API/plugins for managing this are terrible and tend to fail silently.
CLI within VSCode is workable but most of my VSCode envs are within a docker container. This is a pattern that I'm moving more and more away from as agents within a container kind of suck.
Doesn't need a terminal: run it in RPC mode to send/receive JSON over stdio. That's how the pi-coding-agent Emacs package works, which is the only way I've ever used Pi.
It seems pretty well done: when I added permission requests to the `bash` tool, the "Are you sure y/N" requests started appearing as if they were native to Emacs.
cyanydeez | a day ago
Anyway, even if you give your agent permission, there's no secure way to know whether what they're asking to is what they'll actually do, etc.
chriswarbo | 21 hours ago
Pi supports permission popups, but doesn't use them by default. Their example extensions show how to do it (add an event listener for `tool_call` events; to block the call put `block: true` in its result).
> there's no secure way to know whether what they're asking to is what they'll actually do
What do you mean? `tool_call` event listeners are given the parameters of the tool call; so e.g. a call to the `bash` tool will show the exact command that will execute (unless we block it, of course).
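The gating logic being described can be sketched like this. The event and result shapes here are assumptions for illustration, not pi's real extension API; the point is just that a `tool_call` listener sees the exact parameters and can set `block: true` in its result:

```typescript
// Sketch of tool-call gating as described above. Interface shapes are
// hypothetical; only the pattern (inspect params, optionally block) is
// from the thread.
interface ToolCallEvent {
  toolName: string;
  params: Record<string, unknown>;
}

interface ToolCallResult {
  block: boolean;
  reason?: string;
}

// Commands we never want to run unattended.
const DANGEROUS = [/\brm\s+-rf\b/, /\bcurl\b.*\|\s*sh\b/, /\bsudo\b/];

function reviewToolCall(event: ToolCallEvent): ToolCallResult {
  if (event.toolName !== "bash") return { block: false };
  const command = String(event.params["command"] ?? "");
  for (const pattern of DANGEROUS) {
    if (pattern.test(command)) {
      return { block: true, reason: `matched ${pattern}` };
    }
  }
  return { block: false }; // a real extension might prompt y/N here instead
}
```

A denylist is only a demo policy, of course; the interesting part is that the listener sees the literal command string before anything executes.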
arjie | 23 hours ago
EDIT: Thank you to both responders. I'll just try the two options out then.
chriswarbo | 21 hours ago
That's how the pi-coding-agent Emacs package interacts with pi; and it's how I write automated tests for my own pi extensions (along with a dummy LLM that emits canned responses).
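The transport is simple enough to sketch: one JSON object per line over stdio. The message fields below ("type" etc.) are made up for illustration; only the framing is from the comment:

```typescript
// Minimal JSON-lines framing of the kind described for pi's RPC mode.
// Field names are hypothetical; the transport shape is the point.
type RpcMessage = { type: string; [key: string]: unknown };

function encodeLine(msg: RpcMessage): string {
  return JSON.stringify(msg) + "\n";
}

// Accumulates stdout chunks and yields complete messages as they arrive,
// since a chunk boundary can split a line.
function makeDecoder(): (chunk: string) => RpcMessage[] {
  let buffer = "";
  return (chunk: string) => {
    buffer += chunk;
    const lines = buffer.split("\n");
    buffer = lines.pop() ?? ""; // keep the trailing partial line
    return lines.filter((l) => l.trim()).map((l) => JSON.parse(l));
  };
}
```

Wire `encodeLine` to the child process's stdin and feed its stdout chunks through the decoder; a dummy LLM for tests is then just a process that emits canned lines in the same format.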
mihneadevries | 11 hours ago
Fwiw the janky `claude -p` approach you described is actually pretty solid once you stop fighting it, the simplicity is the feature I think.
embedding-shape | 11 hours ago
https://github.com/search?q=repo%3Arcarmo%2Fagentbox%20codex...
rcarmo | 8 hours ago
Before Pi, I actually preferred Mistral Vibe’s UX
embedding-shape | 8 hours ago
I was curious about your project, but the sloppy usage of even the most basic terms kind of makes me not want to dive deeper. How could I even trust it does what it says on the tin if apparently we don't even have a shared vocabulary?
furryrain | 17 hours ago
Model + prompt + function calls.
There are many such wrappers, and they differ largely on UI deployment/integration. Harness feels like a decent term, though "coding harness" feels a bit vague.
solarkraft | 12 hours ago
I’m still looking for a generic agent interaction protocol (to make it worth building around) and thought ACP might be it. But (and this is from a cursory look) it seems that even OpenCode, which does support ACP, doesn’t use it for its own UI. So what’s wrong with it and are there better options to hopefully take its place?
rcarmo | 8 hours ago
The best option will always be in-memory exchanges. Right now I am still using the pi RPC, and that also involves a bit of conversion, but it’s much lighter.
ljm | 5 hours ago
I've started using OpenCode for some things in a big window because its side-by-side diff is great.
virtuallynathan | 18 hours ago
Went from codex/claude code -> opencode -> pi -> oh-my-pi
jannniii | 13 hours ago
I am still an avid user of opencode, my own fork though with async tools etc, but it is cumbersome and tries to do too many things.
himata4113 | 4 hours ago
It's just an opinionated fork, either you like it or you don't. I personally really like it.
ge96 | 23 hours ago
Wondering: if you wanted a similar interface (though a GUI, not just a CLI) that's not for coding, what would you call that?
Same idea cycle through models, ask question, drag-drop images, etc...
arcanemachiner | 22 hours ago
"harness" fits pretty nicely IMO. It can be used as a single word, and it's not too semantically overloaded to be useful in this context.
ge96 | 4 hours ago
For me I'm looking to stick with Python so will whip something up with Tkinter later for the desktop GUI aspect although I still like Electron/JS primarily.
fred_tandemai | 23 hours ago
fjk | 23 hours ago
Here’s an example config: https://github.com/earendil-works/gondolin/blob/main/host/ex...
rcarmo | 22 hours ago
monkey26 | 22 hours ago
I think I published it. Check the pi package page.
ac29 | 21 hours ago
Note that if you sandbox to literally just the working directory, pi itself won't run, since pretty much every Linux application needs to be able to read from /usr and /etc.
carderne | 10 hours ago
https://github.com/carderne/pi-sandbox
ramoz | 23 hours ago
Eg some form of comprehensive planning/spec workflow is best modeled as an extension vs natively built in. And the extension still ends up feeling “native” in use
mongrelion | 23 hours ago
Does anyone have an idea as to why this would be a feature? Don't you want to have a discussion with your agent to iron out the details before moving on to the implementation (build) phase?
In any case, looks cool :)
EDIT 1: Formatting
EDIT 2: Thanks everyone for your input. I was not aware of the extensibility model that pi had in mind, or that you can also iterate on your plan in a PLAN.md file. Very interesting approach. I'll have a look and give it a go.
ramoz | 23 hours ago
https://github.com/badlogic/pi-mono/tree/main/packages/codin...
alvivar | 23 hours ago
jauntywundrkind | 18 hours ago
As for subagents, Pi has sessions. And it has a full session tree & forking. This is one of my favorite things, in all harnesses: build the thing with half the context, then keep using that as a checkpoint, doing new work, from that same branch point. It means still having a very usable lengthy context window but having good fundamental project knowledge loaded.
miroljub | 23 hours ago
There are already multiple implementations of everything.
With a powerful and extensible core, you don't need everything prepackaged.
mccoyb | 23 hours ago
semiinfinitely | 22 hours ago
Blackarea | 22 hours ago
fragmede | 18 hours ago
So it can share code with the web app.
Because writing it in javascript is easier than writing it in raw brute forced assembly.
mccoyb | 22 hours ago
This is quite nice — I do think there’s a version of pi’s design choices which could live in a static harness, but fully covering the same capabilities as pi without a dynamic language would be difficult. (You could imagine specifying a programmable UI, etc. — various ways to extend the behavior of the system — and you’d likely end up with an interpreter in the harness.)
At least, you’d like to have a way to hot reload code (Elixir / Erlang could be interesting)
This is my intuition, at least.
jatari | 21 hours ago
mccoyb | 21 hours ago
jauntywundrkind | 18 hours ago
I'm super on board the rust train right now & super loving it. But no, code hot loading is not common.
Most code in the world is dead code. Most languages are for dead code. It's sad. "Stop writing dead code" (2022) was nowhere near the first to call this out (it's decades and decades late, in fact), but it's still a good one. https://jackrusher.com/strange-loop-2022/
jasonjmcghee | 16 hours ago
Rust can dynamically link with dylib, though I believe that's still unstable.
It can also dynamically load with libloading.
sergiomattei | 21 hours ago
No serialization/JSON-RPC layer between a TS CLI and an Elixir server. TS TUI libraries and utilities are really nice (I rewrote the Elixir-based CLI prototype as it was slowing me down). Easy to extend with custom tools without having to write them in Elixir, which can be intimidating.
But you're right that Erlang's computing vision lends itself super well to this problem space.
[1]: https://github.com/matteing/opal
sean_pedersen | 21 hours ago
jauntywundrkind | 18 hours ago
Their agent mail was great & very early in agent orchestration. Code agent search is amazing & will tell you what's happening in every harness. Their Franktui is a ridiculously good rust tui. They have project after project after project after project and they are all so good.
Didn't know they had a rust Pi. Nice.
saberience | 12 hours ago
It’s clear it was 100% written by Claude using sub-agents, which explains the many classes with 5,000 lines of Rust in a single file.
It’s a huge buggy mess which doesn’t run on my Mac.
If you’re a rust engineer and want a good laugh, go take a look at the agent.rs, auth.rs, or any of the core components.
orangecoffee | 9 hours ago
Caring about taste in coding is a thing of the past now. It's sad :( but also something to accept.
mr_mitm | 8 hours ago
orangecoffee | 7 hours ago
mr_mitm | 7 hours ago
Looks like a lot of nonsensical commits.
saberience | 6 hours ago
First of all, it wouldn't build; I had to mess around with git submodules to get it building.
Then I tried to use it. The scrolling behavior is broken: you cannot scroll properly when there are lots of tool outputs, and the window freezes. I also ended up with lots of weird UI bugs when trying to use slash commands. Sometimes they stop the window from scrolling; sometimes the slash commands don't show at all.
The general text output is flaky: how it shows tool results, the formatting, the colors, whether it auto-scrolls or gets stuck is all very weird and broken.
You can easily force it into a broken state by just running lots of tool calls, then the UI just freezes up.
But just try it and see for yourself...
saberience | 12 hours ago
It looks 100% vibe coded by someone who’s a complete neophyte.
mr_mitm | 8 hours ago
The first issue I had was figuring out the schema of models.json, as someone who hadn't used the original pi before. Then I noticed the documented `/skill:` command doesn't exist. That's also hard to see, because the slash menu is rendered off screen if the prompt is at the bottom of the terminal. And when I do see it, the selected menu item always jumps back to the first line, though it looks like he fixed that yesterday.
The tool output appears to mangle the transcript, and I can't even see the exact command it ran, only the output of the command. The README is overwhelmingly long and I don't understand what's important for me as a first time user and what isn't. Benchmarks and code internals aren't too terribly relevant to me at this point.
I looked at the original pi next and realized the config schema is subtly different (snake_case instead of camelCase). Since it was advertised as a port, I expected it to be a drop-in replacement, which is clearly not the case.
All in all it doesn't inspire confidence. Unfortunate.
Edit: The original pi also says that there is a `/skill` command, but then it is missing in the following table: https://github.com/badlogic/pi-mono/tree/main/packages/codin...
The `/skill` command also doesn't seem registered when I use pi. What is going on? How are people using this?
Edit2: Ah, they have to be placed in `~/.pi/agent/skills`, not `~/.pi/skills`, even though according to the docs, both should work: https://github.com/badlogic/pi-mono/tree/main/packages/codin...
This is exhausting.
moonlion_eth | 20 hours ago
andai | 18 hours ago
https://news.ycombinator.com/item?id=47120784
thomasfromcdnjs | 15 hours ago
Don't hate me aha and no, there is no reason other than I can
KeplerBoy | 14 hours ago
raincole | 12 hours ago
solarkraft | 12 hours ago
rahimnathwani | 22 hours ago
https://x.com/victormustar/status/2026380984866710002
lukasb | 22 hours ago
mccoyb | 22 hours ago
ac29 | 21 hours ago
theshrike79 | 22 hours ago
There's something in the default Codex harness that makes it fight with both arms behind its back, maybe the sandboxing is overly paranoid or something.
With Pi I can one-shot many features faster and more accurately than with Codex-cli.
suralind | 22 hours ago
muratsu | 22 hours ago
ramoz | 22 hours ago
https://plannotator.ai/blog/plannotator-meets-pi/
elyase | 22 hours ago
https://github.com/elyase/awesome-personal-ai-assistants?tab...
_neil | 20 hours ago
snthpy | 16 hours ago
mccoyb | 10 hours ago
Pass for me.
tmustier | 21 hours ago
and you can build cool stuff on top of it too!
ck_one | 20 hours ago
tomashubelbauer | 13 hours ago
I dislike many things about Claude Code, but I'll pick subagents as one example. Don't want to use them? Tough luck. (AFAIK, it's been a while since I used CC, maybe it is configurable now or was always and I never discovered that.)
With Pi, I just didn't install an extension for that, which I suspect exists, but I have a choice of never finding out.
prettyblocks | 8 hours ago
tomashubelbauer | 8 hours ago
extr | 5 hours ago
theshrike79 | 10 hours ago
Can't do that with Claude =)
cudgy | 6 hours ago
sshine | 18 hours ago
Pleased to meet you!
For me, it just didn’t compare in quality with Claude CLI and OpenCode. It didn’t finish the job. Interesting for extending, certainly, but not where my productivity gains lie.
esafak | 17 hours ago
ixsploit | 15 hours ago
theshrike79 | 13 hours ago
Now I can just make my own that does exactly what I want and need, nothing more and nothing less. It's just for me, it's not a SaaS or a "start-up" I'm the CEO of.
insin | 10 hours ago
raincole | 7 hours ago
ngrilly | 10 hours ago
PessimalDecimal | 8 hours ago
With Emacs modes like agent-shell.el available and growing, why not invest in learning a tool that is likely to survive and have mindshare beyond the next few months?
johanyc | 9 hours ago
jsumrall | 4 hours ago
And if there’s any feature codex has that you want, just have pi run codex in a tmux session, interrogate it about how said feature works, and recreate it in pi.
thevinter | 21 hours ago
After my Max sub expired I decided to try Kimi on a more open harness, and it ended up being one of the worst (and most eye-opening) experiences I've had with the agentic world so far.
It was completely alienating and so much 'not for me' that afterwards I went back and immediately renewed my Claude sub.
https://www.thevinter.com/blog/bad-vibes-from-pi
mccoyb | 21 hours ago
Where did you get this perspective from?
> I thought pi and its tools were supposed to be minimal and extensible. So why is a subagent extension bundling six agents I never asked for that I can’t disable or remove?
Why do you think a random subagents extension is under the same philosophy as pi?
Your blog post says little about pi proper; it's essentially concerned with issues you had with the ecosystem of extensions, often made by random people who may or may not get the philosophy. Why would that be up to pi to enforce?
the_mitsuhiko | 14 hours ago
Pi ships with docs that include extensions and the agent looks there for inspiration if you ask it to build a custom extension.
Looking at what others publish is useful!
NamlchakKhandro | 19 hours ago
CGamesPlay | 19 hours ago
Here's the problem with Claude Code: it acts like it's got security, but it's the equivalent of a "do not walk on grass" sign. There's no technical restrictions at play, and the agent can (maliciously or accidentally) bypass the "restrictions".
That's why Pi doesn't have restrictions by default. The logic is: no matter what agent you are using, you should be using it in a real sandbox (container, VM, whatever).
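For those who do want confirmation behavior: pi's example extensions reportedly show how, by listening for `tool_call` events (which include the tool's parameters, e.g. the exact bash command) and returning `block: true`. A rough sketch of just the decision logic; the `allowlist` and function names here are hypothetical, not pi's actual extension API:

```typescript
// Hypothetical gate for tool_call events. Per the thread, a pi extension
// can inspect a tool call's parameters and block it by putting
// { block: true } in its result. Allowlist entries are illustrative.
const allowlist = [/^git (status|diff|log)\b/, /^ls\b/, /^cat\b/];

function isAllowed(command: string): boolean {
  return allowlist.some((re) => re.test(command));
}

// Shape of the decision a tool_call listener would return in this sketch.
function reviewToolCall(tool: string, params: { command?: string }) {
  if (tool === "bash" && params.command !== undefined) {
    return { block: !isAllowed(params.command) };
  }
  return { block: false }; // non-bash tools pass through in this sketch
}
```

Of course, this only gates what the agent declares it will run, which is why the real-sandbox advice above still applies.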
esafak | 16 hours ago
CGamesPlay | 15 hours ago
esafak | 8 hours ago
edit: You're not making it sound easy at all. I don't have to build anything with the other agents.
CGamesPlay | 7 hours ago
Back to my original point: Claude Code gives you a false feeling of security, Pi gives you the accurate feeling of not having security.
the_mitsuhiko | 15 hours ago
Within days people become used to just hitting accept and allowlisting pretty much everything. The agents write lengthy shell scripts or test runners that can themselves be destructive, but those get immediately allowlisted.
tern | 16 hours ago
In the meantime, the codex/cc defaults are better for me.
rcarmo | 14 hours ago
a96 | 13 hours ago
Yep. This is why I've been going "Hell, no!" and will probably keep doing so.
raincole | 12 hours ago
isagawa-co | 21 hours ago
Open source: https://github.com/isagawa-co/isagawa-kernel
chriswarbo | 21 hours ago
The extensibility is really nice. It was easy to get it using my preferred issue tracker; and I've recently overridden the built-in `read` and `write` commands to use Emacs buffers instead. I'd like to override `edit` next, but haven't figured out an approach that would play to the strengths of LLMs (i.e. not matching exact text) and Emacs (maybe using tree-sitter queries for matches?). I also gave it a general-purpose `emacs_eval`, which it has used to browse documentation with EWW.
dnouri | 20 hours ago
Let me also drop a link to the Pi Emacs mode here for anyone who wants to check it out: https://github.com/dnouri/pi-coding-agent -- or use: M-x package-install pi-coding-agent
We've been building some fun integrations in there like having RET on the output of `read`, `write`, `edit` tool calls open the corresponding file and location at point in an Emacs buffer. Parity with Pi's fantastic session and tree browsing is hopefully landing soon, too. Also: Magit :-)
chriswarbo | 9 hours ago
The implementation is pretty terrible: a giant string of vibe-coded Emacs Lisp is sent to emacsclient, which performs the actions and sends back a string of JSON.
It's been interesting to iterate on the approach: watching the LLM (in my case Claude) attempting to use the tools; noticing when it struggles or makes incorrect assumptions; and updating the tool, documentation and defaults to better match those expectations.
I've also written some Emacs Lisp which opens Pi and tells it to "Action the request/issue/problem at point in buffer '<current-buffer>'" https://github.com/Warbo/warbo-emacs-d/blob/a13a1e02f5203476...
It feels similar to the file-watching provided by Aider (which uses inotify to spot files containing `# AI!` or `# AI?`), which I've previously used with FIXME and TODO comments in code; but it also works well in non-file things, e.g. error messages and test failures in `shell-mode`, and issues listed in the Emacs UI I wrote for the Artemis bug tracker (Claude just gets the issue number from the current line, and plugs that into a Pi extension I made for Artemis :-) )
dnouri | 3 hours ago
type4 | 20 hours ago
ChatGPT $20/month is alright but I got locked out for a day after a couple hours. Considering the GitHub pro plus plan.
beacon294 | 20 hours ago
lambda | 19 hours ago
rahimnathwani | 19 hours ago
ursuscamp | 19 hours ago
UncleOxidant | 18 hours ago
[OP] kristianpaul | 18 hours ago
zirror | 16 hours ago
UncleOxidant | 2 hours ago
raffkede | 11 hours ago
indigodaddy | 20 hours ago
gtirloni | 20 hours ago
ErikBjare | 20 hours ago
jasonjmcghee | 16 hours ago
Nearly all of its value is facilitating your interaction with the LLM, the tools it can use, and how it uses them.
gtirloni | 2 hours ago
qazplm17 | 20 hours ago
TZubiri | 19 hours ago
They are all open source though, so you can just find out what's going on if you want, right?
WXLCKNO | 13 hours ago
moonlion_eth | 20 hours ago
mobrienv | 20 hours ago
https://github.com/mikeyobrien/rho
rglover | 19 hours ago
TZubiri | 19 hours ago
The prompt shown is
"Who's your daddy and what does he do?"
Is this a joke or tech? Is the author a dev or a clown?
NamlchakKhandro | 19 hours ago
This coding agent certainly couldn't give a fuck.
enneff | 18 hours ago
fnord77 | 19 hours ago
CGamesPlay | 19 hours ago
axelthegerman | 18 hours ago
sshine | 18 hours ago
krickelkrackel | 16 hours ago
wrxd | 15 hours ago
It’s very likely that once something is patched, it has to be considered diverged and very hard to upgrade.
theshrike79 | 15 hours ago
It's because you put 2x4s in place of the shocks, you absolute muppet. And then they either give them a massive bill to fix it properly or politely show them out.
Same will happen in self-modifying software. Some people are self-aware enough to know that "I made this, it's my problem to fix", some will complain to the maker of the harness they used and will be summarily shown the door.
throwaway13337 | 18 hours ago
We know that a lack of control over their environment makes animals, including humans, depressed.
The software we use has so much of this lack of control. It's their way, their branding, their ads, their app. You're the guest on your own device.
It's no wonder everyone hates technology.
A world with software that is malleable, personal, and cheap - this could do a lot of good. Real ownership.
The nerds could always make a home with their linux desktop. Now everyone can. It'll change the equation.
I'm quite optimistic for this future.
hdjrudni | 17 hours ago
Strip away the ads, the data harvesting, add back the power features, and we'll be happy again. I'm more willing than ever to pay a one-time fee for good software. I've started donating to all the free apps I use on a regular basis.
I don't want to own my own slop. That doesn't help me. Use your AI tools to build out the software if you want, but make sure it does a good job. Don't make me fiddle with nondeterministic flavor-of-the-month AI agents.
safety1st | 16 hours ago
The Big Tech slop can only be fixed in one way, and actually it's really predictable and will work - we need to fix the laws so that they put the rights and flourishing of human beings first, not the rights and flourishing of Big Tech. We need to fix enforcement because there are so many times that these companies just break the law and they get convicted but they get off with a slap on the wrist. We need to legislate a dismantling of barriers to new entrants in the sectors they dominate. Competition for the consumer dollar is the only thing that can force them to be more honest. They need to see that their customers are leaving for something better, otherwise they'll never improve.
But our elected officials have crafted laws and an enforcement system which make 'something better' impossible (or at least highly uneconomical).
Parallel to this if open source projects can develop software which is easier for the user to change via a PR, they totally should. We can and should have the best of both worlds. We should have the big companies producing better "boxed" software. Plus we should have more flexibility to build, tweak and run whatever we want.
bergfest | 15 hours ago
mentalgear | 13 hours ago
LancelotLac | 4 hours ago
peepee1982 | 15 hours ago
throwaway13337 | 3 hours ago
Regulation enforcement against the anti-market behaviors would bring a lot of good.
Putting too much power in any centralized authority - company or government - seems to lead to oppression and unhealthy culture.
Fair markets are the neatest trick we have. They put the freedom of choice in the hands of the individual and allow organic collaboration.
The framing should not be government vs company. But distributed vs centralized power. For both governance and commerce.
The entire world right now suffers from too much centralized power. That comes in the form of both corporate and government. Power tends to consolidate until the bureaucracy of the approach becomes too inefficient and collapses under its own weight. That process is painful, and it's not something I enjoy living through.
If you see through that lens, it has explaining power for the problems of both the EU countries and the US.
moring | 14 hours ago
It is true for me with Linux. I code for a living and I can't change anything because I can't even build most software -- the usual configure/make/make install runs into tons of compiler errors most of the time.
Loss of control is an issue. I'm curious if AI tools will change that though.
h14h | 13 hours ago
Building it exactly to my design specs, giving it only the tool calls I need, owning all the data it stores about me for RAG, integrating it to the exact services/pipelines I care about... It's nothing short of invigorating to have this degree of control over something so powerful.
In a couple of days work, I have a discord bot that's about as useful as chatgpt, using open models, running on a VPS I manage, for less than $20/mo (including inference). And I have full control over what capabilities I add to it in the future. Truly wild.
afro88 | 11 hours ago
h14h | 6 hours ago
Using Vultr for the VPS hosting, as well as their inference product, which AFAIK is by far the cheapest option for hosting models of this class ($10/mo for 50M tokens, and $0.20/M tokens after that). They also offer Vector Storage as part of their inference subscription, which makes it very convenient to get inference + durable memory & RAG with a single API key.
Their inference product is currently in beta, so not sure whether the price will stay this low for the long haul.
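Taking that beta pricing at face value ($10/mo including 50M tokens, then $0.20 per extra million), the monthly cost is easy to model:

```typescript
// Monthly cost under the pricing quoted above (assumed, beta pricing):
// $10 base covers 50M tokens; overage billed at $0.20 per million.
function monthlyCost(tokens: number): number {
  const included = 50e6;
  const base = 10;
  const overage = (Math.max(0, tokens - included) / 1e6) * 0.2;
  return base + overage;
}
```

For example, 80M tokens in a month would come to $10 + 30 × $0.20 = $16 under these assumed numbers.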
discreteevent | 6 hours ago
Is this really that different to programming? (Maybe you haven't programmed before?)
h14h | an hour ago
> It's nothing short of invigorating to have this degree of control over something so powerful
I'm a SWE w/ >10 years, and you're right, this part has always been invigorating.
I suppose what's "new" here is the drastically reduced amount of cognitive energy I need to build complex projects in my spare time. As someone who was originally drawn to software because of how much it lowered the barrier to entry of birthing an idea into existence (when compared to hardware), I am genuinely thrilled to see said barrier lowered so much further.
Sharing my own anecdotal experience:
My current day job is leading development of a React Native mobile app in Typescript with a backend PaaS, and the bulk of my working memory is filled up by information in that domain. Given this is currently what pays the bills, it's hard to justify devoting all that much of my brain deep-diving into other technologies or stacks merely for fun or to satisfy my curiosity.
But today, despite those limitations, I find myself having built a bespoke AI agent written from scratch in Go, using a janky beta AI Inference API with weird bugs and sub-par documentation, on a VPS sandbox with a custom Tmux & Neovim config I can "mosh" into from anywhere using finely-tuned Tailscale access rules.
I have enough experience and high-level knowledge that it's pretty easy for me to develop a clear idea of what exactly I want to build from a tooling/architecture standpoint, but prior to Claude, Codex, etc., the "how" of building it tended to be a big stumbling block. I'd excitedly start building, only to run into the random barriers of "my laptop has an ancient version of Go from the last project I abandoned" or "neovim is having trouble starting the lsp/linter/formatter" and eventually go "ugh, not worth it" and give up.
Frankly, as my career progressed and the increasingly complex problems at work left me with vanishingly less brain-space for passion projects, I was beginning to feel this crushing sense of apathy & borderline despair. I felt I'd never be able to make good on my younger self's desire to bring these exciting ideas of mine into existence. I even got to the point where I convinced myself it was "my fault" because I lacked the mettle to stomach the challenges of day-to-day software development.
Now I can just decide "Hmm.. I want a lightweight agent in a portable binary. Makes sense to use Go." or "this beta API offers super cheap inference, so it's worth dealing with some jank" and then let an LLM work out all the details and do all the troubleshooting for me. Feels like a complete 180 from where I was even just a year or two ago.
At the risk of sounding hyperbolic, I don't think it's overstating things to say that the advent of "agentic engineering" has saved my career.
cedws | 7 hours ago
yowlingcat | 3 hours ago
cedws | 2 hours ago
This timeline sucks, I don't want to live in a future where Anthropic and OpenAI are the arbiters of what we can and cannot do.
GTP | 6 hours ago
Problem is, to be able to do what you're describing, you still need the source code and the permission to modify it. So you will need to switch to the FOSS tools the nerds are using.
throwaway13337 | 5 hours ago
It means normies will finally see value in open source beyond just being free. They'll choose it over closed source alternatives.
This, too, makes a brighter future.
blubber | an hour ago
There is OSS you are not allowed to modify etc.
CuriouslyC | 17 hours ago
theshrike79 | 15 hours ago
navigate8310 | 8 hours ago
bandrami | 13 hours ago
Yes, which is why this model of development is basically dead-in-the-water in terms of institutional adoption. No large firm or government is going to allow that.
raincole | 12 hours ago
bandrami | 12 hours ago
embedding-shape | 11 hours ago
Red Hat? I don't think they are involved in the moves to FOSS for government agencies, mostly because they're American, and the ones who are currently moving quickly (in the government world at least) are the ones who aren't American and want to get rid of their reliance on American infrastructure and software.
bandrami | 11 hours ago
embedding-shape | 10 hours ago
I haven't had a single need to visit the US, and I still have zero needs for it. If I need to read subway ads to understand how a company is connected to FOSS, I think I'll skip that and continue using and working with companies who make that clear up front :) Thanks for the offer though!
petcat | 9 hours ago
IBM didn't acquire Red Hat for no reason.
navigate8310 | 8 hours ago
embedding-shape | 8 hours ago
hrimfaxi | 6 hours ago
ambicapter | 5 hours ago
lugao | 11 hours ago
vidarh | 10 hours ago
I use a lot of my own software. Most of it is strictly worse both in terms of features and bugs than more intentional, planned projects. The reason I do it is because each of those tools solve my specific pain points in ways that makes my life better.
A concrete example: I have a personal dashboard. It was written by Claude in its entirety. I've skimmed the code, but no more than that. I don't review individual changes. It works for me. It pulls in my calendar, my fitbit data, my TODO list, various custom reminders to work around my tendency to procrastinate, it surfaces data from my coding agents, it provides a nice interface for me to browse various documentation I keep to hand, and a lot more.
I could write a "proper" dashboard system with cleanly pluggable modules. If I were to write it manually I probably would because I'd want something I could easily dip in and out of working on. But when I've started doing stuff like that in the past I quickly put it aside because it cost more effort than I got out of it. The benefit it provides is low enough that even a team effort would be difficult to make pay off.
Now that equation has fundamentally changed. If there's something I don't like, I tell Claude, and a few minutes - or more - later, I reload the dashboard and 90% of the time it's improved.
I have no illusions that code is generic enough to be usable for others, and that's fine, because the cost of maintaining it in my time is so low that I have no need to share that burden with others.
I think this will change how a lot of software is written. A "dashboard toolkit" for example would still have value to my "project". But for my agent to pull in and use to put together my dashboard faster.
A lot of "finished products" will be a lot less valuable because it'll become easier to get exactly what you want by having your agent assemble what is out there, and write what isn't out there from scratch.
lugao | 10 hours ago
> you download a skill file that tells a coding agent how to add a feature
This is suggesting a my_feature.md would be a way of sharing and improving software in the future, which I think is mostly a bad thing.
vidarh | 10 hours ago
But also, note that skills can carry scripts with them, so they are definitely also more than a my_feature.md.
hebejebelus | 10 hours ago
boh | 8 hours ago
hebejebelus | 8 hours ago
tagami | 6 hours ago
thierrydamiba | 9 hours ago
Maintainers won’t have to deal with an endless stream of PRs. Now people will just clone your library the second it has traction and make it perfect for their specific use case.
Cherry pick the best features and build something perfect for them. They’ll be able to do things your product can’t, and individual users will probably find a better fit in these spinoffs than in the original app.
rbren | 7 hours ago
https://github.com/rbren/personal-ai-devbox
GTP | 6 hours ago
Yet, the first impact on FOSS seems to be quite the opposite: maintainers complaining about PRs and vulnerability disclosures that turn out to be AI hallucinations, wasting their time. It seems to be so bad that now GitHub is offering the possibility of turning off pull requests for repositories. What you present here is an optimistic view, and I would be happy for it to be correct, but what we've seen so far unfortunately seems to point in a different direction.
brandensilva | 4 hours ago
With that said, we are all dealing with AI still convincingly writing code that doesn't work despite passing tests or introducing hard to find bugs. It will be some time until we iron that out fully for more reliable output I suspect.
Unfortunately, now that the abstraction language is human language, we won't be able to stop people from thinking they're software engineers when they're not, so guarding against spam will be more important than ever.
davej | 5 hours ago
giancarlostoro | 5 hours ago
I built my own, inspired by Beads. Not quite as you're describing, but I store todos in a SQLite database (Beads used SQLite AND git hooks; I didn't want to be married to git), and I let them sync to and from GitHub Issues. So in theory I can fork a GitHub repo and have my tool pull down issues from the original repo (haven't tried it when it's a fork, so that's a new task for the task pile).
https://github.com/Giancarlos/guardrails/issues
You can see me dogfooding my tool on its own codebase and keeping my issues on GitHub for anyone to see, including the closed ones. I do think we will see an increase in local dev tooling that is tried and tested by its own creators, which will yield better purpose-driven tooling that is generic enough to be useful to others.
I used to use Beads for all my Claude Code projects, now I just use GuardRails because it has safety nets and works without git which is what I wanted.
I could have forked Beads, but the other thing is Beads is a behemoth of code, it was much easier to start from nothing but a very detailed spec and Claude Code ;)
brandensilva | 4 hours ago
20022026 | 18 hours ago
rcarmo | 14 hours ago
TacticalCoder | 18 hours ago
ianlpaterson | 18 hours ago
[OP] kristianpaul | 18 hours ago
jasonjmcghee | 16 hours ago
Qwen3.5 released a couple of days ago but I'm not that RAM rich
breisa | 15 hours ago
jasonjmcghee | 6 hours ago
buremba | 16 hours ago
pi plugins support adding hooks at every stage, from tool calls to compaction, and let you customize the TUI as well. I use it for my multi-tenant Openclaw alternative https://github.com/lobu-ai/lobu
If you're building an agent, please don't use proprietary SDKs from model providers. Just stick to ai-sdk or pi agent.
siva7 | 15 hours ago
bjackman | 14 hours ago
("Limited experimentation" = a few months ago I threw $10 into the Anthropic console and did a bit of vibe coding and found my $10 disappeared within a couple of hours).
If so, that would support your concern, it does kinda sound like they're selling marginal Claude Code / Gemini CLI tokens at a loss. Which definitely smells like an aggressive lockin strategy.
buremba | 14 hours ago
Anthropic's docs are intentionally unclear about how 3P tools are defined: is it calling the Claude app, or calling the Anthropic API with the OAuth tokens?
vanillameow | 14 hours ago
buremba | 14 hours ago
vanillameow | 12 hours ago
buremba | 10 hours ago
badlogic | 11 hours ago
kzahel | 12 hours ago
https://yepanywhere.com/subscription-access-approaches/
Captures the ai-sdk and pi-mono.
In an ideal world we would have a pi-cli-mono or similar, like something that is not as powerful as pi but gives a least common denominator sort of interface to access at least claude/codex.
ACP is also something interesting in this space, though I don't honestly know how that fits into this story.
buremba | 9 hours ago
burgerquizz | 10 hours ago
Munksgaard | 9 hours ago
0: https://cchistory.mariozechner.at/
Majromax | 7 hours ago
reacharavindh | 14 hours ago
https://github.com/can1357/oh-my-pi
More of a batteries-included version of pi.
self_awareness | 13 hours ago
reacharavindh | 10 hours ago
I am building my own assistant as an AI harness - that is definitely getting sandboxed to run only in a VM on my Mac.
mr_o47 | 12 hours ago
reacharavindh | 10 hours ago
I have this in my shell rc, and do:
1. git pull origin main
2. bun install
3. bun run build:native
every time I pull changes from upstream.
Until yesterday, this process was pure bliss: my own minimal custom system prompt, minimal AGENTS.md, and self-curated skills.md. One thing I was wary of when switching from pi to oh-my-pi was its use of Rust tools (pi-native) via NAPI. Over the last couple of days, whatever changes I pulled from upstream have been causing the models to get confused about which tool to use, and how, while editing/patching files. They get extremely annoyed: I saw 11 iterations of a tool call to edit a single file, after which the model resorted to rewriting the whole file from memory, and we all know how that goes. This may not be a bug in oh-my-pi per se. My guess is that the agent developed its memory based on prior usage of the tools, and my updating oh-my-pi changed how they're used. It might be okay if I could wipe all agent memory and begin again, but I don't want to.
I'm going to be more diligent about pulling upstream changes from now on, and only do when I can afford a full session memory wipe.
Otherwise, the integrations with exa for search, LSP servers on local machine, syntax highlighting, steering prompts, custom tools (using trafilatura to fetch contents of any url as markdown, use calculator instead of making LLM do arithmetic) etc work like a charm. I haven't used the IPython integration nor do I plan to.
thepasch | 13 hours ago
tietjens | 13 hours ago
raincole | 12 hours ago
[OP] kristianpaul | 5 hours ago
No plan mode. Write plans to files, or build it with extensions, or install a package. No built-in to-dos. Use a TODO.md file, or build your own with extensions. No background bash. Use tmux. Full observability, direct interaction.
This is very important for control and ownership.
Pi is not for everyone, but it is for those who want tools like (read, bash, edit, write, grep, find, ls) as building blocks.
mr_o47 | 12 hours ago
I really like the customization aspect of it: you can build tools on the fly and even switch models mid-session.
There's another project mentioned here called oh-my-pi. Has anyone here tried it?
raffkede | 12 hours ago
vinibrito | 5 hours ago
sheerun | 11 hours ago
thomascountz | 11 hours ago
solarkraft | 11 hours ago
Pi appears to have a smaller, less “pre-made” ecosystem, but with more flexibility, enthusiasm and extensibility.
Is this correct? Should I look towards Pi over OpenCode? What are the UI options?
amunozo | 10 hours ago
solarkraft | 10 hours ago
I looked a bit into the reasoning for Pi's design (https://mariozechner.at/posts/2025-11-30-pi-coding-agent/#to...) and, while it does seem to do a lot of things very well around extensibility, I do miss support for permissions, MCP, and perhaps todos and a server mode. OpenCode seems a lot more complete in that regard, but I can well imagine that people have adapted Pi for these use cases (OpenClaw seems to have all of these). So it's definitely not out of the race, but I still appreciate OpenCode's relative completeness in comparison.
miroljub | 9 hours ago
https://pi.dev/packages
> I do miss support for permissions,
As soon as your agent can write and execute code, permissions are just security theater. If you care, do proper sandboxing. If not, there are extensions for that.
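For illustration, a permission-gating extension along those lines might look roughly like this. The event and result shapes (`toolName`, `args`, `block: true`) are assumptions based on how `tool_call` listeners are described elsewhere in this thread, not Pi's actual API:

```typescript
// Hypothetical sketch of a tool_call listener that vetoes risky bash
// commands. Names and shapes are illustrative; check Pi's extension docs.
type ToolCallEvent = { toolName: string; args: { command?: string } };
type ToolCallResult = { block?: boolean; reason?: string };

// Patterns we refuse to run without human review (illustrative only).
const DENYLIST: RegExp[] = [/\brm\s+-rf\b/, /\bcurl\b[^|]*\|\s*(ba)?sh\b/];

function onToolCall(ev: ToolCallEvent): ToolCallResult {
  if (ev.toolName === "bash" && ev.args.command) {
    for (const pat of DENYLIST) {
      if (pat.test(ev.args.command)) {
        // Returning block: true is how the thread describes vetoing a call.
        return { block: true, reason: `command matches ${pat}` };
      }
    }
  }
  return {}; // allow everything else
}
```

A denylist like this is still theater against a determined agent (it can write a script and run it), which is the point being made above; real isolation needs a sandbox.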
> MCP
Again, Pi is extensible.
`pi install pi-mcp-adapter`
Now you can connect to any MCP server.
> and perhaps Todos
At least 10 different todo extensions. Pick which one you like. If you don't like any of them, ask Pi to write one for you.
> and a server mode.
Pi has an RPC mode, which is a kind of server. If that's not enough, you can extend it.
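Since the RPC mode is described elsewhere in this thread as just JSON lines over stdio, the client side is mostly line framing. A minimal sketch of that half, with the actual flag name and message shapes assumed for illustration:

```typescript
// Frame an outgoing message as one JSON object per line.
function toJsonLine(msg: object): string {
  return JSON.stringify(msg) + "\n";
}

// Split a raw stdout chunk into parsed messages, keeping any trailing
// partial line to prepend to the next chunk.
function parseJsonLines(chunk: string): { msgs: unknown[]; rest: string } {
  const parts = chunk.split("\n");
  const rest = parts.pop() ?? ""; // incomplete final line, if any
  const msgs = parts.filter((l) => l.trim() !== "").map((l) => JSON.parse(l));
  return { msgs, rest };
}

// Wiring this to a real process would look roughly like (untested; the
// "--rpc" flag and "prompt" message type are assumptions):
//   const pi = spawn("pi", ["--rpc"]);
//   pi.stdin.write(toJsonLine({ type: "prompt", text: "run the tests" }));
//   pi.stdout.on("data", (d) => { /* feed d.toString() into parseJsonLines */ });
```

The buffering matters because stdout chunks don't respect line boundaries; a message can arrive split across two `data` events.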
> OpenCode seems a lot more complete in that regard,
Yes, but good luck working with OpenCode if you don't like their plan mode. Or todo support. Or MCP. You pay the cost in complexity and tokens even if you don't use them or don't like how they work.
> but I can well imagine that people have adapted Pi for these use cases (OpenClaw seems to have all of these). So it’s definitely not out of the race yet, but I still appreciate OpenCodes relative seeming completeness in comparison.
There's also the oh-my-pi fork if you want an out-of-the-box experience. Still, in my experience, nothing beats Pi in terms of customizability. It's the first piece of software that I can easily shape completely to my liking. And I say that as a decade-long Emacs user.
amunozo | 6 hours ago
mritchie712 | 10 hours ago
mikodin | 4 hours ago
Honestly, it's been a dream, I have it running in a docker-sandbox with access to a single git repo (not hosted) that I am using for varied things with my business.
Try it out, it's super easy to set up. If you use the Docker sandbox, you can just follow what's necessary for Claude, spin up the sandbox, exit out, exec into it with bash, and switch to Pi.
carderne | 11 hours ago
If you want bags of features, rather clone oh-my-pi somewhere and get your agent to bring in bits of it at a time, checking, reviewing, and customising as you go.
manojlds | 10 hours ago
_pdp_ | 10 hours ago
You can open a grid of windows inside vscode too and it comes back up exactly as it was on reload.
theshrike79 | 10 hours ago
Think of it more like directing a coworker or subcontractor via text chat. You tell them what you want and get a result, then you test whether it's what you want and give more instructions if needed.
I literally just fixed a maintenance program on my own server while working my $dayjob. ssh to server, start up claude and tell it what's wrong, tab away. Then I came back some time later, read what it had done, tested the script and immediately got a few improvement ideas. Gave them to Claude, tabbed out, etc.
Took me maybe 15 minutes of active work while chatting on Slack and managing my other tasks. I never needed to look at the code at any point. If it works and tests pass, why do I care what it looks like?
_pdp_ | 10 hours ago
In my own experience I cannot blindly accept code without looking at it even for a few moments, because I've had many situations where the code was simply doing the wrong things, including tests that were completely wrong and testing the wrong assumptions.
So yeah, even when I review trivial changes I still look at the diff view to see if it makes sense. And IDEs make code review a lot easier than a plain diff.
Btw, this experience is not from lack of trying. We use coding agents extensively (I would assume more than the typical org, looking at our bill), and while they are certainly very, very helpful and I cannot describe how much effort they are really saving us, there is absolutely zero chance of pushing something out without reviewing it first. The same applies whether the code was written by an AI agent or a coworker.
chriswarbo | 5 hours ago
I agree. I tried Gemini CLI for a while, and didn't like how separate I felt from the underlying files: rather than doing minor cleanup myself, the activation energy of switching to a separate editor and opening the same files was too high, so I'd prompt the LLM to do it instead. That was often an exercise in frustration, as it would take many rounds of explanation for tiny payoffs; maybe even fiddling with system prompts and markdown files to try to avoid wasting so much time in the future...
I've been using Pi for a few weeks now, and have managed to integrate it quite deeply into Emacs. I run it entirely via RPC mode (JSON over stdio), so I don't really know (or care) about its terminal UI :-)
BeetleB | 5 hours ago
Because then you need to make an extension for every IDE. Isn't it better to make a CLI tool with a server, and let people make IDE extensions to communicate with it?
Claude Code has an update every few days. Imagine now propagating those changes to 20+ IDEs.
rubenflamshep | 3 hours ago
A CLI within VSCode is workable, but most of my VSCode envs are inside Docker containers. This is a pattern I'm moving away from more and more, as agents within a container kind of suck.
alabhyajindal | 10 hours ago
bankombinator | 7 hours ago
squeefers | 6 hours ago
jacobgorm | 6 hours ago
chriswarbo | 5 hours ago
It seems pretty well done: when I added permission requests to the `bash` tool, the "Are you sure y/N" requests started appearing just like they were native to Emacs.
nacozarina | 2 hours ago
Veen | 56 minutes ago
https://shittycodingagent.ai
twsted | an hour ago
It seems strange also that even Steinberger, in his interviews, is not giving pi proper attribution.