At its core, libretto generates and validates RPA scripts and helps you debug them. As far as I understand, tools like playwright-cli are more focused on letting your agent use playwright to perform one-off automations.
The implementation is also pretty different:
- libretto gives your agent a single exec tool (instead of different tools for each action) so it can write arbitrary playwright/javascript and is more context efficient
- We also gave libretto instructions on avoiding bot detection, so it prefers network requests for automation (something other tools don’t support) but falls back to playwright if it judges the network requests too risky
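To make the "single exec tool" idea concrete, here is a minimal sketch, assuming the tool receives a string of agent-written JavaScript and runs it with a Playwright-style `page` object in scope. The names `createExecTool` and `execute` are hypothetical, not libretto's actual API, and `new AsyncFunction(...)` is not a sandbox; a real tool would need isolation and timeouts.

```javascript
// Hypothetical sketch: one generic "exec" tool instead of separate
// click/type/goto tools. The agent sends a snippet of JavaScript; the tool
// runs it with `page` in scope and returns the result or the error message.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

function createExecTool(page) {
  return {
    name: "exec", // hypothetical tool name
    async execute(code) {
      try {
        // The snippet body may use `await` and `return`,
        // e.g. "return await page.title();"
        const run = new AsyncFunction("page", code);
        const result = await run(page);
        return { ok: true, result };
      } catch (err) {
        // Returning the error (instead of throwing) lets the agent read it
        // and retry with a corrected snippet.
        return { ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    },
  };
}
```

With a real Playwright `page`, the agent could send one snippet that navigates, reads, and returns data in a single call, rather than spending a tool call (and its context) on each individual action.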
playwright-cli is very simple and meant for humans - it basically generates a first draft of a script, and was originally meant for writing e2e tests. You need to do a lot of post-processing to turn it into a reliable automation.
libretto gives a similar ability for agents for building scripts but:
- agents automatically run, debug, and test the integrations they write
- they have a much better understanding of the semantics of the actions you take (vs. playwright guessing based on where you clicked)
- they can parse network requests and use those to make direct API calls instead
there's a fundamental mismatch: playwright-cli is for building e2e test scripts for your own app, while libretto is for building robust web automations
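As a rough illustration of "parse network requests and use those to make direct API calls", here is a sketch that turns one captured request into arguments for `fetch`. It assumes the capture is shaped like a HAR 1.2 `request` object (`method`, `url`, `headers` as name/value pairs, optional `postData.text`); `toFetchArgs` and the skipped-header list are hypothetical, not libretto's code.

```javascript
// Hypothetical sketch: convert a captured HAR-style request into [url, init]
// arguments for fetch(), so a script can call the API directly.
const SKIP_HEADERS = new Set(["host", "content-length", "connection"]);

function toFetchArgs(harRequest) {
  const headers = {};
  for (const { name, value } of harRequest.headers || []) {
    const key = name.toLowerCase();
    // Skip HTTP/2 pseudo-headers (":authority", ":path", ...) and headers
    // that fetch computes itself; passing them through can make fetch throw.
    if (key.startsWith(":") || SKIP_HEADERS.has(key)) continue;
    headers[key] = value;
  }
  const init = { method: harRequest.method || "GET", headers };
  // Only attach a body when the captured request actually carried one.
  if (harRequest.postData && harRequest.postData.text !== undefined) {
    init.body = harRequest.postData.text;
  }
  return [harRequest.url, init];
}
```

Usage would be `const [url, init] = toFetchArgs(entry.request); const res = await fetch(url, init);`. Session cookies, CSRF tokens, and rate limits still need separate handling, which is where the risk judgement between network requests and the DOM comes in.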
Lol sorry for the misleading click. We named it libretto after the term in theater, inspired by Playwright. No retro gaming here, just browser automation!
Cool. Thank you for sharing. While AI tools are extremely powerful, packages like this help create some good standards and stepping stones for connectivity that the models haven’t gotten around to yet. Thanks again.
Thanks for this! We have clear answers for things that are 100% and 0% automated, but it’s always that 80%-99% automated slice where the frontier is, great idea.
script maintenance is exactly where that middle slice bites - the app keeps evolving and the scripts lag behind. we took the angle of having the agent re-explore from scratch on each run with autonoma (https://github.com/autonoma-ai/autonoma) for e2e qa: no maintained scripts, and it adapts naturally. different goal than libretto, but same core intuition
Right now libretto only captures HTTP requests, which the coding agent can use to determine how to perform the automation.
For more complex cases, where libretto can't validate that the network approach would produce the right data (like sites that rely on WebSockets or heavy client-side logic), it falls back to using the DOM with playwright.
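The fallback rule described here (use the network path only when it can be shown to produce the right data) could look roughly like the sketch below. It assumes two hypothetical extractor functions, one reading data through captured API calls and one scraping the DOM, and simply checks that their normalized output agrees. This is an illustration, not libretto's actual validation logic.

```javascript
// Hypothetical sketch: run a network-based and a DOM-based extractor once,
// and only choose the network path if both return the same data.
function normalize(rows) {
  // Sort keys within each row and sort the rows, so ordering differences
  // are not counted as mismatches.
  return rows
    .map((row) => JSON.stringify(Object.keys(row).sort().map((k) => [k, row[k]])))
    .sort();
}

async function chooseStrategy(fromNetwork, fromDom) {
  const domRows = await fromDom(); // the DOM result is treated as ground truth
  let netRows;
  try {
    netRows = await fromNetwork();
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    return { strategy: "dom", reason: `network path failed: ${msg}` };
  }
  const a = normalize(netRows);
  const b = normalize(domRows);
  const same = a.length === b.length && a.every((row, i) => row === b[i]);
  return same
    ? { strategy: "network", reason: "network data matches DOM" }
    : { strategy: "dom", reason: "network data differs from DOM" };
}
```

A real check would also need to coerce types (an API often returns numbers where the DOM shows text) and decide how to treat extra fields the API returns but the page never displays.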
It's a good callout. We have a BAA + ZDR with Anthropic and OpenAI, and if you want to use libretto for healthcare use cases, having a BAA is essential. I was using Codex in the demo, and we've seen that both Claude and Codex work pretty well.
I built something very similar for my company internally. The idea was that the maintenance of the code is on the agent and the code is purely an optimization. If it breaks, the agent runs it iteratively and fixes the code for next time. Happy to replace my tool with this and see how it does!
Super cool! Please let me know how it goes. Since agents are so good at writing code, we think letting the agent rewrite/test the code on failure is better than just using a prompt at runtime
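The "code is purely an optimization" pattern from these two comments can be sketched as a run, repair, retry loop. `runScript` and `askAgentToFix` are hypothetical placeholders (the second would call a coding agent with the failing code and its error); neither is part of libretto or any specific library.

```javascript
// Hypothetical sketch: run the generated script; on failure, hand the code
// and the error to an agent, keep the repaired code, and retry a bounded
// number of times.
async function runWithRepair({ code, runScript, askAgentToFix, maxRepairs = 2 }) {
  let current = code;
  const history = [];
  for (let attempt = 0; attempt <= maxRepairs; attempt++) {
    try {
      const result = await runScript(current);
      // Returning the (possibly repaired) code lets the caller save it,
      // so the agent is only involved again on the next failure.
      return { ok: true, result, code: current, history };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      history.push({ attempt, error: message });
      if (attempt === maxRepairs) break; // repair budget exhausted
      current = await askAgentToFix(current, message);
    }
  }
  return { ok: false, code: current, history };
}
```

The happy path never calls a model; the agent cost is paid only when the site changes and the stored script fails.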
I literally _just_ put up an announcement on our internal Slack of a tool I had spent a few weeks trying to get right. Strange to post the announcement and, literally the same day, see a better, publicly available toolkit to enable that very workflow!
I'm also using Playwright, to automate a platform that has a maze of iframes, referer links, etc. Hopefully I can replace the internals with a script I get from this project.
There are a couple ways to handle JS components rendered at runtime:
- Libretto prefers network requests over DOM interaction when possible, so this will circumvent a lot of complex JS rendering issues
- When you do need the DOM, playwright can handle a lot of the complexity out of the box: it re-queries the live DOM at action time and automatically waits for elements to populate. Libretto is also set up to prefer selectors like data-testid, aria-label, role, and id over class names or positional selectors, which are more likely to be dynamic.
- At the end of the day the files still live as code so you could always just throw a browser agent at it to handle a part of a workflow if nothing else works
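The selector preference in the second bullet can be expressed as a small ranking function. This is a sketch under assumptions: elements are described by a plain attribute map rather than a live DOM node, and `pickSelector`, the exact order, and the CSS attribute-selector output are illustrative, not libretto's implementation.

```javascript
// Hypothetical sketch: choose the most stable selector for an element,
// preferring test/accessibility attributes over class names or position.
const PREFERENCE = ["data-testid", "aria-label", "role", "id"]; // most to least stable

function quote(value) {
  // Escape backslashes and double quotes for a CSS attribute selector.
  return `"${String(value).replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

function pickSelector(tag, attrs) {
  for (const name of PREFERENCE) {
    if (attrs[name]) return `${tag}[${name}=${quote(attrs[name])}]`;
  }
  // Class names often change between builds, so they are only a fallback.
  if (attrs.class && attrs.class.trim()) {
    return `${tag}.${attrs.class.trim().split(/\s+/).join(".")}`;
  }
  return null; // caller would need a positional locator (least stable)
}
```

For example, `pickSelector("button", { class: "btn x1", "data-testid": "save-btn" })` yields `button[data-testid="save-btn"]`, ignoring the generated class names.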
we initially started with stagehand! But it doesn't follow the same model of pre-generating deterministic code. Your code is meant to look like this:
// Let AI click
await stagehand.act("click on the comments link for the top story");
the issue with this is that there's now runtime non-determinism. We move the AI work to dev time: the AI explores and crawls the website first, and generates a deterministic, legible script.
Tangentially, Stagehand's model may have worked 2 years ago when humans still wrote the code, but that's no longer the case. We want to empower agents to do the heavy lifting of building a browser automation for us, while reaping the benefits of running deterministic, fast, cheap, straightforward code.
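To illustrate what "deterministic, legible script" means in contrast to the `stagehand.act(...)` call above, here is a hypothetical example of code that dev-time exploration might produce. The app, URL path, and `data-testid` selectors are invented for illustration; the `page` methods used (`goto`, `fill`, `click`, `textContent`) do exist on Playwright's `Page`.

```javascript
// Hypothetical output of dev-time exploration: a fixed script with explicit
// selectors. At runtime there is no model call; the same inputs always
// produce the same sequence of actions. All selectors below are invented.
async function exportInvoiceTotal(page, { baseUrl, month }) {
  await page.goto(`${baseUrl}/invoices`);
  await page.fill('[data-testid="month-filter"]', month);
  await page.click('[data-testid="apply-filter"]');
  // If the site changes and a selector stops matching, this fails loudly
  // instead of a model silently guessing a different element.
  return page.textContent('[data-testid="invoice-total"]');
}
```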
Very interesting idea. Old school solutions but with new methods.
But maybe we can't make everything deterministic, especially for the complex scenarios that opened up after LLMs arrived on the scene. Maybe we need a mix of both.
The interesting part to me is recovery after the first generated script goes stale. I’d be curious whether you measure success as 'initial generation works' or 'the same flow still passes after small DOM/layout changes a week later', since that seems like the boundary between a neat demo and something a team can rely on.
1. playwright-cli for exploration and ad-hoc scraping, in order to determine what works.
2. playwright code generation based on 1, which captures a repeatable workflow
3. agent skills - these can be playwright based, but in some cases if I can just rely on built-in tools like Web Search and Web Fetch, I will.
playwright is one of the unsung heroes of agentic workflows. I heavily rely on it. In addition to the obvious DOM inspection capabilities, the fact that the console and network can be inspected is a game changer for debugging. watching an agent get rapid feedback or do live TDD is one of the most satisfying things ever.
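On the point about inspecting the console and network while debugging: Playwright's `Page` emits `console`, `pageerror`, and `response` events. The sketch below collects them into one log an agent could read after a run; `attachDebugLog` and the log format are hypothetical, while the event names and the `type()`, `text()`, `status()`, and `url()` methods are Playwright's.

```javascript
// Collect console messages, uncaught page errors, and failed HTTP responses
// from a Playwright Page into a single array. `attachDebugLog` is a
// hypothetical helper; the events and methods belong to Playwright.
function attachDebugLog(page) {
  const log = [];
  page.on("console", (msg) => log.push(`[console.${msg.type()}] ${msg.text()}`));
  page.on("pageerror", (err) => log.push(`[pageerror] ${err.message}`));
  page.on("response", (res) => {
    // Only record 4xx/5xx responses to keep the log short.
    if (res.status() >= 400) log.push(`[http ${res.status()}] ${res.url()}`);
  });
  return log;
}
```

Attaching this before `page.goto(...)` gives the agent a compact, text-only view of what went wrong, which is what makes the rapid feedback loop practical.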
Browser automation, plus the ability to record the graphics buffer as video during a run, opens up many possibilities.
Looks awesome, but I wonder if its functionality could be exposed to existing CLIs such as Claude Code instead of having to run it through its own CLI, mainly because I don't want to spend money on credits when I've already got a CC subscription.
devstatic | 18 hours ago
tanishqkanc | 7 hours ago
messh | 18 hours ago
[OP] muchael | 17 hours ago
tanishqkanc | 7 hours ago
seagull | 18 hours ago
tanishqkanc | 7 hours ago
surgical_fire | 17 hours ago
[OP] muchael | 15 hours ago
dang | 14 hours ago
gbibas | 17 hours ago
tanishqkanc | 7 hours ago
arpadav | 17 hours ago
tanishqkanc | 7 hours ago
etwigg | 17 hours ago
canarias_mate | 15 hours ago
alexbike | 16 hours ago
[OP] muchael | 15 hours ago
dang | 14 hours ago
daveguy | 16 hours ago
Edit: never mind. I see from the website that it's MIT-licensed. You should probably add a COPYING.md or LICENSE.md to the repository itself.
tanishqkanc | 7 hours ago
z3ugma | 16 hours ago
[OP] muchael | 15 hours ago
tanishqkanc | 8 hours ago
heyitsaamir | 15 hours ago
[OP] muchael | 15 hours ago
anthuswilliams | 14 hours ago
[OP] muchael | 13 hours ago
boriskurikhin | 14 hours ago
[OP] muchael | 13 hours ago
yehia2amer | 11 hours ago
tanishqkanc | 8 hours ago
coderw | 4 hours ago
admiralrohan | 3 hours ago
voidUpdate | 3 hours ago
afro88 | 2 hours ago
potter098 | an hour ago
skapadia | 48 minutes ago
arizen | 35 minutes ago
terabytest | 25 minutes ago