OP here - yes, this is my use case too: integration and regression testing, as well as providing learning environments. It makes working with larger datasets a breeze.
OP here - still have to try (generally operate on VM/bare metal level); but my understanding is that ioctl call would get passed to the underlying volume; i.e. you would have to mount volume
We do this, preview deploys, and migration dry runs using Neon Postgres’s branching functionality - seems one benefit of that vs this is that it works even with active connections which is good for doing these things on live databases.
Aurora clones are copy-on-write at the storage layer, which solves part of the problem, but RDS still provisions you a new cluster with its own endpoints, etc, which is slow ~10 mins, so not really practical for the integration testing use case.
For anyone looking for a simple GUI for local testing/development of Postgres based applications. I built a tool a few years ago that simplifies the process: https://github.com/BenjaminFaal/pgtt
Is this basically using templates as "snapshots", and making it easy to go back and forth between them? Little hard to tell from the README but something like that would be useful to me and my team: right now it's a pain to iterate on sql migrations, and I think this would help.
In theory, a database that uses immutable data structures (the hash array mapped trie popularized by Clojure) could allow instant clones on any filesystem, not just ZFS/XFS, and allow instant clones of any subset of the data, not just the entire db. I say "in theory" but I actually built this already so it's not just a theory. I never understood why there aren't more HAMT based databases.
This is typical HN: everyone is here. I've seen a number of threads that unfold like this: "Lately I hacked up a satellite link to..." → "As an engineer who built the comm equipment of that satellite,.." → "As the astronaut who launched the satellite from ICS,..", etc.
Does datomic have built in cloning functionality? I’ve been wanting to try datomic out but haven’t felt like putting in the work to make a real app lol
Surprisingly, no it does not. Datomic has a more limited feature that lets you make an in-memory clone of the latest copy of the db for speculative writes, which might be useful for tests, but you can't take an arbitrary version of the db with as-of and use it as the basis for a new version on disk. See: https://blog.danieljanus.pl/2025/04/22/datomic-forking-the-p...
There's nothing technically that should prevent this if they are using HAMTs underneath, so I'm guessing they just didn't care about the feature. With HAMT, cloning any part of the data structure, no matter how nested, is just a pointer copy. This is more useful than you'd think but hardly any database makes it possible.
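To illustrate the "cloning is just a pointer copy" point, here is a minimal sketch. It is not a real HAMT and not how Datomic or any particular database is implemented: it is a simplified fixed-depth trie with path copying, and every name in it (`Node`, `assoc`, `get`, `BITS`, `DEPTH`) is made up for the example. It only aims to show the sharing property described above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional

BITS = 2            # hash bits consumed per level (real HAMTs typically use 5)
WIDTH = 1 << BITS   # number of child slots per interior node
DEPTH = 4           # fixed depth keeps the sketch short


@dataclass(frozen=True)
class Node:
    """Immutable node: interior nodes use `children`, leaves use `entries`."""
    children: tuple = (None,) * WIDTH
    entries: tuple = ()


EMPTY = Node()


def _index(key: Any, level: int) -> int:
    """Pick the child slot for `key` at the given level from its hash bits."""
    return (hash(key) >> (level * BITS)) & (WIDTH - 1)


def assoc(node: Optional[Node], key: Any, value: Any, level: int = 0) -> Node:
    """Return a new trie with `key` set. Only nodes on the path are copied."""
    node = node or EMPTY
    if level == DEPTH:
        kept = tuple((k, v) for k, v in node.entries if k != key)
        return Node(entries=kept + ((key, value),))
    i = _index(key, level)
    child = assoc(node.children[i], key, value, level + 1)
    return Node(children=node.children[:i] + (child,) + node.children[i + 1:])


def get(node: Optional[Node], key: Any, default: Any = None) -> Any:
    """Look up `key`, returning `default` when it is absent."""
    level = 0
    while node is not None and level < DEPTH:
        node = node.children[_index(key, level)]
        level += 1
    if node is None:
        return default
    for k, v in node.entries:
        if k == key:
            return v
    return default


v1 = EMPTY
for i in range(8):
    v1 = assoc(v1, i, f"row-{i}")

clone = v1                        # "cloning" the whole map is a reference copy
v2 = assoc(clone, 3, "changed")   # a write to the clone copies one path only
```

After this runs, `v1` still sees the old value for key 3, `v2` sees the new one, and every subtree the write did not touch is the same object in both versions, which is why a clone costs nothing up front.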
Uff, I had no idea that Postgres v15 introduced WAL_LOG and changed the defaults from FILE_COPY. For (parallel CI) test envs, it makes so much sense to switch back to the FILE_COPY strategy ... and I previously actually relied on that behavior.
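For anyone wanting to switch back explicitly, a minimal sketch of building the statement, assuming Postgres 15 or later where `CREATE DATABASE` accepts a `STRATEGY` option. The helper names and the database names (`ci_worker_3`, `app_template`) are hypothetical; the function only builds the SQL string and leaves execution to whatever client you already use.

```python
def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def clone_database_sql(new_db: str, template: str, strategy: str = "FILE_COPY") -> str:
    """Build a CREATE DATABASE statement that clones `template`.

    `strategy` must be one of the two values the STRATEGY option accepts.
    """
    strategy = strategy.upper()
    if strategy not in ("FILE_COPY", "WAL_LOG"):
        raise ValueError(f"unknown strategy: {strategy}")
    return (
        f"CREATE DATABASE {quote_ident(new_db)} "
        f"TEMPLATE {quote_ident(template)} "
        f"STRATEGY = {strategy};"
    )


# Hypothetical names: one database per parallel CI worker, cloned from a template.
statement = clone_database_sql("ci_worker_3", "app_template")
```

Note that the template database must have no other active connections while it is being copied, regardless of strategy.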
As an aside, I just jumped around and read a few articles. This entire blog looks excellent. I’m going to have to spend some time reading it. I didn’t know about Postgres’s range types.
Main difference from PG18's approach: you get complete server isolation (useful for testing migrations, different PG configs, etc.) rather than databases sharing one instance.
Not sure why this is downvoted. For a critical tool like DB cloning, I’d very much appreciate if it was hand written. Simply because it means it’s also hand reviewed at least once (by definition).
We wouldn’t have called it reviewed in the old world, but in the AI coding world we’re now in it makes me realise that yes, it is a form of reviewing.
I use Claude a lot btw. But I wouldn’t trust it on mission critical stuff.
App migrations that may fail and need a rollback have the problem that you may not be allowed to wipe any transactions so you may want to be putting data to a parallel world that didn't migrate.
> App migrations that may fail and need a rollback have the problem that you may not be allowed to wipe any transactions so you may want to be putting data to a parallel world that didn't migrate.
This is why migrations are supposed to be backwards compatible
> Eh, DB branching is mostly only necessary for testing - locally
For local DB's, when I break them, I stop the Docker image and wipe the volume mounts, then restart + apply the "migrations" folder (minus whatever new broken migration caused the issue).
It's being downvoted because the commenter is asking for something that is already in the readme. Furthermore, it's ironic that the person raising such an issue is performing the same mistake as they are calling out - neglecting to read something they didn't write.
It’s at the very bottom of the readme, below the MIT license mention. Yes, it’s there, but very much in the fine print. I think the easier thing to spot is the CLAUDE.md in the code (and in particular how comprehensive it is).
Again, I love Claude, I use it a ton, but a topic like database cloning requires a certain rigour in my opinion. This repo does not seem to have it. If I had hired a consultant to build a tool like this and would receive this amount of vibe coding, I’d feel deceived. I wouldn’t trust it on my critical data.
And are we really doing this? Do we need to admit how every line of code was produced? Why? Are you expecting to see "built with the influence of Stackoverflow answers" or "google searches" on every single piece of software ever? It's an exercise of pointlessness.
I think you need to start with the following statement:
> We would like to acknowledge the open source people, who are the traditional custodians of this code. We pay our respects to the stack overflow elders, past, present, and future, who call this place, the code and libraries that $program sits upon, their work. We are proud to continue their tradition of coming together and growing as a community. We thank the search engine for their stewardship and support, and we look forward to strengthening our ties as we continue our relationship of mutual respect and understanding
Then if you would kindly say that a Brazilian invented the airplane that would be good too. If you don’t do this you should be cancelled for your heinous crime.
Indeed. There is a difference between "I have learned by reading a lot of SO" and "I have copied the contents of this file verbatim from SO". Using Claude is very close to the latter without saying it.
It is the new normal, whether you are against it or not.
If someone used AI, it is a good discussion to see whether they should explicitly disclose it, but people have been using assisted tools, from auto-complete, text expanders, IDE refactoring tools, for a while - and you wouldn't make a comment that they didn't build it. The lines are becoming more blurry over time, but it is ridiculous to claim that someone didn't build something if they used AI tools.
Let's say there is an architect and he also owns a construction company. This architect then designs a building and gets it built by his employees and contractors.
In such cases the person says, I have built this building. People who found companies, say they have built companies. It's commonly accepted in our society.
So even if Claude built it for GP, as long as GP designed it, paid for tools (Claude) to build it, and also tested it to make sure that it works, I personally think he has the right to say he has built it.
If you don't like it, you are not required to use it.
What an outrageously bad analogy. Everyone involved in that building put their professional reputations and licenses on the line. If that building collapses, the people involved will lose their livelihoods and be held criminally liable.
Meanwhile this vibe coded nonsense is provided “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. We don’t even know if he read it before committing and pushing.
Even billion dollar software products have similar clauses, it doesn't have anything to do with vibe coding. To build and sell software no educational qualification is needed.
Quality of the software comes from testing. Humans and LLMs both make mistakes while coding.
As an autodidact, and someone who has seen plenty of well educated idiots in the software profession, I'm happy there are no such requirements... I think a guild might be more reasonable than a professional org more akin to how it works for other groups (lawyers, doctors, etc).
There are of course projects that operate at higher development specification standards, often in the military or banking. This should be extended to all vehicles and invasive medical devices.
Depends on the building type/size/scale and jurisdiction. Modern tract homes are really varied, hit or miss and often don't see any negative outcomes for the builders in question for shoddy craftsmanship.
Same with any OSS. Up to you to validate whether or not it is worth depending on, regardless of how built. Social proof is a primary avenue to that and has little to do with how built.
But here's the problem. Five years ago, when someone on here said, "I wrote this non-trivial software", the implication was that a highly motivated and competent software engineer put a lot of effort into making sure that the project meets a reasonable standard of quality and will probably put some effort into maintaining the project.
Today, it does not necessarily imply that. We just don't know.
Agree that just being hand-written doesn’t imply quality, but based on my priors, if something obviously looks like vibe-code it’s probably low quality.
Most of the vibe-code I’ve seen so far appears functional to the point that people will defend it, but if you take a closer look it’s a massively over complicated rat’s nest that would be difficult for a human to extend or maintain. Of course you could just use more AI, but that would only further amplify these problems.
If someone puts weeks and months of their time into building something, then I'm willing to take that as proof of their motivation to create something good.
I'm also willing to take the existence of non-trivial code that someone wrote manually as proof of some level of competence.
The presence of motivation + competence makes it more likely that the result could be something good.
Even with LLMs, delivering software that consistently works requires quite a bit of work and in most cases requires a certain level of expertise. Humans also write quite a bit of garbage code.
People using LLMs to code these days is similar to how the majority of people stopped using assembly and moved to C and C++, then to garbage collected languages and dynamically typed languages. People were always looking for ways to make programmers more productive.
Programming is evolving. LLMs are just next generation programming tools. They make programmers more productive and in the majority of cases people and companies are going to use them more and more.
I'm not opposed to AI generated code in principle.
I'm just saying that we don't know how much effort was put into making this and we don't know whether it works.
The existence of a repository containing hundreds of files, thousands of SLOCs and a folder full of tests tells us less today than it used to.
There's one thing in particular that I find quite astonishing sometimes. I don't know about this particular project, but some people use LLMs to generate both the implementation and the test cases.
What does that mean? The test cases are supposed to be the formal specification of our requirements. If we do not specify formally what we expect a tool to do, how do we know whether the tool has done what we expected, including in edge cases?
I fully agree with your overall message and sentiment. But let me be nit-picky for a moment.
> The test cases are supposed to be the formal specification of our requirements
Formal methods folks would strongly disagree with this statement. Tests are informal specifications in the sense that they don't provide a formal (mathematically rigorous) description of the full expected behavior of the system. Instead, they offer a mere glimpse into what we hope the system would do.
And that's an important part, which is where your main point stands. The test is what confirms that the thing the LLM built conforms to the cases the human expected to behave in a certain way. That's why the human needs to provide them.
(The human could take help of an LLM to write the tests, as in they give an even-more-informal natural language description of what the test should do. But the human then needs to make sure that the test really does that and maybe fill in some gaps.)
> If we do not specify formally what we expect a tool to do, how do we know whether the tool has done what we expected, including in edge cases?
You don’t. That’s the scary part. Up until now, this was somewhat solved by injecting artificial friction. A bank that takes 5 days for a payment to clear. And so on.
But it’s worse than this, because most problems software solves cannot even be understood until you partially solve the problem. It’s the trying and failing that reveals the gap, usually by someone who only recognizes the gap because they were once embarrassed by it, and what they hear rhymes with their pain. AI doesn’t interface with physical reality, as far as we know, or have any mechanism to course correct like embarrassment or pain.
In the future, we will have flown off the cliff before we even know there was a problem. We will be on a space ship going so fast that we can’t see the asteroid until it’s too la...
We know. It is not difficult to tell them apart. Good taste is apparent and beauty is universal.
The amount of care and attention someone put into a craft is universally appreciated.
Also, I am 100% confident this comment was the output of a human process.
We can tell.
There is something more. It is obvious for those that have a soul.
We know if we make the effort to find out. But what we really want to know is not whether AI was used in the process of writing the software. What we want to know is whether it's worth checking out. That's what has become harder to know.
> the implication was that a highly motivated and competent software engineer put a lot of effort into making sure that the project meets a reasonable standard of quality and will probably put some effort into maintaining the project
That is entirely an assumption on the part of the reader. Nothing about someone saying "I built this complicated thing!" implies competence, or any desire to maintain it beyond building it.
The problem you're facing is survivorship bias. You can think of lots of examples of where that has happened, and very few where it hasn't, because when the author of the project is incompetent or unmotivated the project doesn't last long enough for you to hear about it twice.
>Nothing about someone saying "I built this complicated thing!" implies competence, or any desire to maintain it beyond building it.
I disagree. The fact that someone has written a substantial amount of non-trivial code does imply a higher level of competence and motivation compared to not having done that.
You never knew. There are plenty of intelligent, well-intentioned software engineers that publish FOSS that is buggy and doesn’t meet some arbitrary quality standards.
Every single commit is Claude.
No human expert involved.
Would you trust your company database to a 25-dollar vibe session?
Would you live in a 5-dollar building?
Is there any difference between a hand-tailored suit, constructed to your measurements, and a 5-dollar t-shirt?
Some people don't want to live in a five-dollar world.
Yes but there’s no evidence this is vibe coded or not. You’re cynically claiming it due to agent authorship. As if there is no legitimate use.
> No human expert involved
You don’t know this, you are just hating.
Besides the close review and specification that may be conducted with agents, even if you handwrite / edit code, it will say that it was co-authored by the agent if you have the agent do the commit for you.
There was a recent wave of such comments on the Rust subreddit - exactly in this shape: "Oh you mean you built this with AI". This is highly toxic, leads to no discussion, and is literally driven by some dark thought from the commenter. I really hope HN will not jump on this bandwagon and will focus instead on creating cool stuff.
Everybody in the industry is vibecoding right now - the things that stick are due to sufficient quality being pushed on it. Having a pessimistic / judgmental surface reaction to everything as being "ai slop" is not something that I'm going to look forward in my behavior.
Why is good faith a requirement for commenting but not for submissions?
I would argue the good faith assumption should be disproportionately more important for submissions given the 1 to many relationship.
You're not lying, it indeed is toxic and rapidly spreading. I'm glad this is the case.
Most came here for the discussion and enlightenment to be bombarded by heavily biased, low effort marketing bullshit. Presenting something that has no value to anyone besides the builder is the opposite of good faith.
These submissions bury and neglect useful discussion; it is difficult to claim they are harmless and merely not useful.
Not everyone in the industry is vibe coding, that is simply not true. But that's not the point I want to make.
You don't need to be defensive about your generative tools usage, it is ok to use whatever, nobody cares. Just be ready to maintain your position and defend your ideals. Nothing is more frustrating than giving honest attention to a problem, considering someone else's perspective, only to then realize it was just words words words spewed by a slop machine. Nobody would give a second thought if that was disclosed.
You are responsible for your craft. The moment you delegate that responsibility, into the trash you belong.
If the slop machine is so great, why in hell would I need you to ask it to help me? Nonsensical.
Your bias is that you think that because you can use a bike, my bike efforts are worthless. Considering that I often trash what I generate, and that I do not simply generate -> ship but have a quality process that validates my work by itself, the way I'm reaching my goals presents no value to my public.
The reason this discussion is pathetic, is that it shifts the discussion from the main topic (here it was a database implementation) - to abide by a reactionary emotive emulation with no grace or eloquence - that is mostly driven by pop culture at this point with a justification mostly shaping your ego.
There is no point in putting yourself above someone else just to justify your behavior - in fact it only tells me what kind of person you were in the first place - and as I said, this is not the kind of attitude that I'm looking up to.
Justifiably, there is 0 correlation between something written manually and quality - in fact I argue it's quite the opposite since you were unable to process as much play and architecture to try & break, you have spent less time experimenting, and more time pushing your ego.
Do you take issue with companies stating that they (the company) built something, instead of stating that their employees built something? Should the architects and senior developers disclaim any credit, because the majority of tickets were completed by junior and mid-level developers?
Do you take issue with a CNC machinist stating that they made something, rather than stating that they did the CAD and CAM work but that it was the CNC machine that made the part?
Non-zero delegation doesn’t mean that the person(s) doing the delegating have put zero effort into making something, so I don’t think that delegation makes it dishonest to say that you made something. But perhaps you disagree. Or, maybe you think the use of AI means that the person using AI isn’t putting any constructive effort into what was made — but then I’d say that you’re likely way overestimating the ability of LLMs.
Could we please avoid the strawmen? Nowhere have I claimed that they didn't put work into this. Nowhere did I say that delegation is bad. I'd like to encourage a discussion, but then please counter the opinion that I gave, not a made-up one that I neither stated nor actually hold.
We all agree that crafting the right prompts (or however we call the CLAUDE.md instructions) is a lot of work, don't we? Of course they put work into this, it's a file of substantial size. And then Claude used it to build the thing. Where is a contradiction? I don't see the mental gymnastics, sorry.
I think they may be jumping on the "shit on AI assisted project" bandwagon. I am by no means reaching for AI tools at every turn, but to suggest it's plagiarized is laughable.
Despite all of the complaints in other comments about the use of Claude Code, it looks interesting and I appreciated the video demo you put on the GitHub page.
Agentic coding detractors: "If AI is so great, where are all the thriving new open source projects to prove it?"
Also agentic coding detractors: "How dare you use AI to help build a new open source project."
I'm joking and haven't read the comments you're referring to, but whether or not AI was involved is irrelevant per se. If anyone finds themselves having a gut reaction to "AI", just mentally replace it with "an intern" or "a guy from Fiverr". Either way, the buck stops with whomever is taking ownership of the project.
If the code/architecture is buggy or unsafe, call that out. If there's a specific reason to believe no one with sufficient expertise reviewed and signed off on the implementation, call that out. Otherwise, why complain that someone donated their time and expertise to give you something useful for free?
For real. For someone to even understand why this tool is useful and functions as intended, they need to have some deeper understanding of software development. Who cares if the implementation was done with AI. With Claude Code, I rarely write code by hand these days, yet my brain hurts more than ever from all the actual problem solving I’m able to drill into with all the programming cruft out of the way. I did it by hand for 15 years, and I don’t feel bad at all for handing that part over.
A decade ago, a senior staff engineer at Google told me that he doesn't mind delegating the data-entry parts of his job to junior SWEs, so he can focus on higher-level problem solving.
This is how I've been treating AI, except instead of assuming your junior SWE is generally sane and has some understanding of what you're doing, you have to make sure you double check everything.
> If anyone finds themselves having a gut reaction to "AI", just mentally replace it with "an intern" or "a guy from Fiverr"
It’s not the guy from Fiverr anyone is annoyed with. It’s the tech CEOs who beat everyone over the head with:
- ”the future will be a-guy-from-Fiverr-native”
- ”we are mandating that 80% of our employees incorporate a-guy-from-Fiverr into their daily workflow by year end”
And everyone pretends this is serious.
Then there are people who are pulling off cool demo stunts that amount to duct taping fireworks to a lawn mower but they post about it on X doing their best Steve Jobs thought leader impersonation.
And again everyone pretends like this is serious.
The annoyance is like that friend you tell about this great new song, and they’re excited, but only because it’s something they can tell other people and look cool. Not because they’re into music.
I mean, if the end result is that I get a bunch of guys from Fiverr constantly at my beck and call for pennies on the dollar, I'm not sure why I should care what some CEO thinks they have to say to make money.
(Regarding mandates, of course they're a hamfisted solution, but it's not totally unreasonable that management would attempt to establish an incentive for its workforce to learn and put into practice a valuable new skill.)
Either way, that doesn't address the response to this project. Johan isn't Sam Altman. All Johan is guilty of here is building something useful and giving it to the rest of us for free.
Thanks for sharing, it's an interesting approach. I am not sure why people are complaining; most software is written with the help of agents these days.
It’s rampant. Launch anything these days and it’s bombarded with “vibe-coded” comments.
The issue of quality makes sense since it’s so easy to build these days, but when the product is open-source, these vibe coded comments make no sense. Users can literally go read the code or my favorite? Repomix it, pop it into AI Studio, and ask Gemini what this person has built, what value it brings, and does it solve the problem I have?
For vibe coded proprietary apps, you can’t do that so the comments are sort of justified.
This is really cool, looking forward to trying it out.
Obligatory mention of Neon (https://neon.com/) and Xata (https://xata.io/) which both support “instant” Postgres DB branching on Postgres versions prior to 18.
Assuming I'd like to replicate my production database for either staging, or to test migrations, etc,
and that most of my data is either:
- business entities (users, projects, etc)
- and "event data" (sent by devices, etc)
where most of the database size is in the latter category, and that I'm fine with "subsetting" those (eg getting only the last month's "event data")
what would be the best strategy to create a kind of "staging clone"? ideally I'd like to tell the database (logically, without locking it expressly): do as though my next operations only apply to items created/updated BEFORE "currentTimestamp", and then:
- copy all my business tables (any update to those after currentTimestamp would be ignored magically even if they happen during the copy)
- copy a subset of my event data (same constraint)
pg_dump has a few annoyances when it comes to doing stuff like this — tricky to select exactly the data/columns you want, and also the dumped format is not always stable. My migration tool pgmigrate has an experimental `pgmigrate dump` subcommand for doing things like this, might be useful to you or OP maybe even just as a reference. The docs are incomplete since this feature is still experimental, file an issue if you have any questions or trouble
Indeed, but is there a way to do it as a "point in time", eg do a "virtual checkpoint" at a timestamp, and do all the copy operations from that timestamp, so they are coherent?
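One possible way to get that "virtual checkpoint", assuming the database is Postgres: run every copy inside a single `REPEATABLE READ` transaction, where all statements see the snapshot taken at the first query, so writes that land during the copy are not visible to it. The sketch below only builds such a script. The table names, the `created_at` column and the one-month filter are hypothetical stand-ins for the business and event tables described above.

```python
from typing import Dict, List


def quote_ident(name: str) -> str:
    """Quote a Postgres identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def snapshot_copy_script(full_tables: List[str],
                         filtered_tables: Dict[str, str]) -> List[str]:
    """Build statements that export tables from one consistent snapshot.

    full_tables are copied whole; filtered_tables maps a table name to a
    WHERE clause (trusted input, not user-supplied) selecting the subset.
    """
    statements = ["BEGIN ISOLATION LEVEL REPEATABLE READ, READ ONLY;"]
    for table in full_tables:
        statements.append(f"COPY {quote_ident(table)} TO STDOUT;")
    for table, where in filtered_tables.items():
        statements.append(
            f"COPY (SELECT * FROM {quote_ident(table)} WHERE {where}) TO STDOUT;"
        )
    statements.append("COMMIT;")
    return statements


# Hypothetical schema: two business tables and one large event table.
script = snapshot_copy_script(
    ["users", "projects"],
    {"device_events": "created_at >= now() - interval '1 month'"},
)
```

A real export would direct each `COPY` to its own file rather than to a shared STDOUT; the part that matters here is that every statement runs inside the same transaction.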
Is anyone aware of something like this for MariaDB?
Something we've been trying to solve for a long time is having instant DB resets between acceptance tests (in CI or locally) back to our known fixture state, but right now it takes decently long (like half a second to a couple seconds, I haven't benchmarked it in a while) and that's by far the slowest thing in our tests.
I just want fast snapshotted resets/rewinds to a known DB state, but I need to be using MariaDB since it's what we use in production, we can't switch DB tech at this stage of the project, even though Postgres' grass looks greener.
Restarting the DB is unfortunately way too slow. We run the DB in a docker container with a tmpfs (in-memory) volume which helps a lot with speed, but the problem is still the raw compute needed to wipe the tables and re-fill them with the fixtures every time.
But how does the reset happen fast, the problem isn't with preventing permanent writes or w/e, it's with actually resetting for the next test. Also using overlayfs will immediately be slower at runtime than tmpfs which we're already doing.
Yeah unfortunately I think that it's not really possible to hit the speed of a TEMPLATE copy with MariaDB. @EvanElias (maintainer of https://github.com/skeema/skeema) was looking into it at one point, might consider reaching out to him about this; he's the foremost mysql expert that I know.
There's actually a potential solution here, but I haven't personally tested it: transportable tablespaces in either MySQL [1] or MariaDB [2].
The basic idea is it allows you to take pre-existing table data files from the filesystem and use them directly for a table's data. So with a bit of custom automation, you could have a setup where you have pre-exported fixture table data files, which you then make a copy of at the filesystem level, and then import as tablespaces before running each test. So a key step is making that fs copy fast, either by having it be in-memory (tmpfs) or by using a copy-on-write filesystem.
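The "custom automation" could look roughly like the sketch below, which is untested against a real server and only produces a plan of steps rather than executing anything. It assumes InnoDB file-per-table tablespaces, that the fixture `.ibd`/`.cfg` files were exported earlier with `FLUSH TABLES ... FOR EXPORT`, and that the target tables already exist with matching definitions. The function name, schema name and paths are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

Step = Tuple[str, ...]


def quote(name: str) -> str:
    """Quote a MySQL/MariaDB identifier with backticks."""
    return "`" + name.replace("`", "``") + "`"


def reset_plan(schema: str, tables: List[str],
               fixture_dir: str, data_dir: str) -> List[Step]:
    """Plan the per-test reset: discard, copy fixture files, import.

    Returns ("sql", statement) and ("copy", source, destination) steps for
    the caller to run in order.
    """
    fixtures = PurePosixPath(fixture_dir)
    target = PurePosixPath(data_dir) / schema
    steps: List[Step] = []
    for table in tables:
        qualified = f"{quote(schema)}.{quote(table)}"
        # Detach the table's current data file so it can be replaced.
        steps.append(("sql", f"ALTER TABLE {qualified} DISCARD TABLESPACE"))
        # Copy the pre-exported data file and its metadata file into place.
        for ext in ("ibd", "cfg"):
            steps.append(("copy",
                          str(fixtures / f"{table}.{ext}"),
                          str(target / f"{table}.{ext}")))
        # Attach the copied file as the table's data.
        steps.append(("sql", f"ALTER TABLE {qualified} IMPORT TABLESPACE"))
    return steps


# Hypothetical layout: fixture files on tmpfs or a copy-on-write filesystem.
plan = reset_plan("app_test", ["users"], "/fixtures/app_test", "/var/lib/mysql")
```

The copy step is where tmpfs or a copy-on-write filesystem would make the difference; the copied files also need to be readable by the database server's user.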
If you have a lot of tables then this might not be much faster than the 0.5-2s performance cited above though. iirc there have been some edge cases and bugs relating to the transportable tablespace feature over the years as well, but I'm not really up to speed on the status of that in recent MySQL or MariaDB.
Resetting is free if you discard the overlayfs writes, no? I am not sure if one can discard at runtime, or if the next test should be run in a new container. But that should still be fast.
If your db is small enough to fit in tmpfs, then sure, that is hard to beat. But then xfs and zfs are overkill too.
EDIT: I see you mentioning that starting the db is slow due to wiping and filling at runtime. But the idea of a snapshot is that you don't have to do that, unless I misunderstand you.
LVM snapshots work well. Used it for years with other database tools.. But make sure you allocate enough write space for the COW.. when the write space fills up, LVM just 'drops' the snapshot.
I was able to accomplish this by doing each test within its own transaction session that gets rolled-back after each test. This way I'm allowed to modify the database to suit my needs for each test, then it gets magically reset back to its known state for the next test. Transaction rollbacks are very quick.
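A minimal sketch of that rollback-per-test pattern. It uses Python's built-in `sqlite3` purely so the example is self-contained; the same idea applies with a MariaDB or Postgres driver. The `users` table and the `rolled_back` helper are made up for the example.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def rolled_back(conn: sqlite3.Connection):
    """Run a block inside a transaction that is always rolled back."""
    conn.execute("BEGIN")
    try:
        yield conn
    finally:
        conn.execute("ROLLBACK")


def count_users(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


# isolation_level=None puts the connection in autocommit mode, so the only
# transactions are the ones opened explicitly with BEGIN.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('fixture-user')")  # known fixture state

with rolled_back(conn) as test_conn:
    test_conn.execute("INSERT INTO users (name) VALUES ('created-by-test')")
    inside = count_users(test_conn)   # the test sees its own write

after = count_users(conn)             # fixture state is back after the block
```

In a test framework the context manager would typically live in a per-test fixture, so every test body runs inside it.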
Unfortunately a lot of our tests use transactions themselves because we lock the user row when we do anything to ensure consistency, and I'm pretty sure nested transactions are still not a thing.
Really interesting article, I didn't know that the template cloning strategy was configurable. Huge fan of template cloning in general; I've used Neon to do it for "live" integration environments, and I have a golang project https://github.com/peterldowns/pgtestdb that uses templates to give you ~unit-test-speed integration tests that each get their own fully-schema-migrated Postgres database.
Back in the day (2013?) I worked at a startup where the resident Linux guru had set up "instant" staging environment databases with btrfs. Really cool to see the same idea show up over and over with slightly different implementations. Speed and ease of cloning/testing is a real advantage for Postgres and Sqlite, I wish it were possible to do similar things with Clickhouse, Mysql, etc.
Once upon a time, MySQL/InnoDB was a better performance choice for UPDATE-heavy workloads. There was a somewhat famous blog post about this from Uber[1]. I'm not sure to what extent this persists today. The other big competitor is sqlite3, which fills a totally different niche for running databases on the edge and in-product.
Personally, I wouldn't use any SQL DB other than PostgreSQL for the typical "database in the cloud" use case, but I have years of experience both developing for and administering production PostgreSQL DBs, going back to 9.5 days at least. It has its warts, but I've grown to trust and understand it.
Any non-trivial amount of data and you’ll run into non-trivial problems.
For example, some of our pg databases got into such a state that we had to write a custom migration tool because we couldn’t copy data to a new instance using standard tools. We had to re-write the schema to use custom partitions because perf on built-in partitioning degrades as the number of partitions gets high, and so on.
To be fair, postgres still suffers from a poor choice of MVCC implementation (copy on write rather than an undo log). This one small choice has a huge number of negative knock on effects once your load becomes non-trivial
I set this up for my employer many years ago when they migrated to RDS. We kept bumping into issues on production migrations that would wreck things. I decided to do something about it.
The steps were basically:
1. Clone the AWS RDS db - or spin up a new instance from a fresh backup.
2. Get the arn and from that the cname or public IP.
3. Plug that into the DB connection in your app
4. Run the migration on pseudo prod.
This helped us catch many bugs that were specific to production db or data quirks and would never have been caught locally or even in CI.
Then I created a simple ruby script to automate the above and threw it into our integrity checks before any deployment. Last I heard they were still using that script I wrote in 2016!
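The four steps above can be sketched roughly as follows. This is Python rather than the Ruby script mentioned, and it is not that script: the function names and the `DATABASE_HOST` environment variable are hypothetical, and the `rds` argument is assumed to be a boto3 RDS client (`boto3.client("rds")`), using its point-in-time restore, waiter, and describe calls.

```python
import os
import subprocess
from typing import List


def clone_and_get_endpoint(rds, source_id: str, clone_id: str) -> str:
    """Steps 1-2: restore a throwaway copy of prod and return its hostname.

    `rds` is assumed to be a boto3 RDS client; it is passed in so it can be faked in tests.
    """
    rds.restore_db_instance_to_point_in_time(
        SourceDBInstanceIdentifier=source_id,
        TargetDBInstanceIdentifier=clone_id,
        UseLatestRestorableTime=True,
    )
    # Block until the new instance is reachable (this is the slow part).
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=clone_id)
    desc = rds.describe_db_instances(DBInstanceIdentifier=clone_id)
    return desc["DBInstances"][0]["Endpoint"]["Address"]


def run_migration_against(host: str, migrate_cmd: List[str]) -> int:
    """Steps 3-4: point the app at the clone and run the migration command.

    DATABASE_HOST is a made-up variable name; use whatever your app actually reads.
    """
    env = dict(os.environ, DATABASE_HOST=host)
    return subprocess.run(migrate_cmd, env=env, check=False).returncode
```

A deploy check would call `clone_and_get_endpoint`, run the migration, fail the pipeline on a non-zero exit code, and delete the clone afterwards (cleanup is omitted here).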
I love those "migration only fails in prod because of data quirks" bugs. They are the freaking worst. Have called off releases in the past because of it.
we just build the database, commit it to a container (without volumes attached), and programmatically stop and restart the container per test class (testcontainers.org). the overhead is < 5 seconds and our application recovers to the reset database state seamlessly. it's been awesome.
mvcosta91 | a day ago
[OP] radimm | a day ago
febed | 15 hours ago
drakyoko | a day ago
[OP] radimm | a day ago
odie5533 | 22 hours ago
presentation | a day ago
1f97 | a day ago
horse666 | a day ago
nateroling | a day ago
TimH | a day ago
1a527dd5 | a day ago
BenjaminFaal | a day ago
okigan | a day ago
Also docker link seems to be broken.
BenjaminFaal | a day ago
peterldowns | a day ago
BenjaminFaal | a day ago
radarroark | a day ago
zX41ZdbW | a day ago
ozgrakkurt | a day ago
orthecreedence | a day ago
nine_k | a day ago
chamomeal | a day ago
radarroark | a day ago
There's nothing technically that should prevent this if they are using HAMTs underneath, so I'm guessing they just didn't care about the feature. With HAMT, cloning any part of the data structure, no matter how nested, is just a pointer copy. This is more useful than you'd think but hardly any database makes it possible.
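To make the "just a pointer copy" point concrete, here is a toy hash trie with path copying. It is a simplified sketch, not a real HAMT (fixed depth, no bitmap compression) and not how Datomic or any particular database stores data; all names are made up for illustration.

```python
BITS = 4                      # hash bits consumed per level (real HAMTs typically use 5)
WIDTH = 1 << BITS             # 16 child slots per interior node
DEPTH = 3                     # fixed depth keeps the sketch short
HASH_MASK = (1 << (BITS * DEPTH)) - 1


def _slot(key, depth):
    """Index of the child slot for `key` at the given depth."""
    return ((hash(key) & HASH_MASK) >> (BITS * depth)) & (WIDTH - 1)


def insert(node, key, value, depth=0):
    """Return a new trie containing key -> value; the input trie is never mutated."""
    if depth == DEPTH:
        # Leaf level: a small immutable bucket of (key, value) pairs.
        bucket = node or ()
        kept = tuple(pair for pair in bucket if pair[0] != key)
        return kept + ((key, value),)
    children = node or (None,) * WIDTH
    idx = _slot(key, depth)
    new_child = insert(children[idx], key, value, depth + 1)
    # Path copying: only one slot changes, every sibling is shared by reference.
    return children[:idx] + (new_child,) + children[idx + 1:]


def lookup(node, key, depth=0, default=None):
    """Return the value stored for key, or default if it is absent."""
    if node is None:
        return default
    if depth == DEPTH:
        for k, v in node:
            if k == key:
                return v
        return default
    return lookup(node[_slot(key, depth)], key, depth + 1, default)


base = insert(insert(None, 1, "alice"), 2, "bob")
clone = base                          # "cloning" is a reference copy, O(1)
changed = insert(clone, 1, "carol")   # writes copy one root-to-leaf path only
```

After this, `base` still maps 1 to "alice" while `changed` maps it to "carol", and the subtree holding key 2 is the very same object in both. The same trick works for any nested subtree, which is what makes cloning a subset of the data cheap.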
majodev | a day ago
Raised an issue in my previous pet project for doing concurrent integration tests with real PostgreSQL DBs (https://github.com/allaboutapps/integresql) as well.
christophilus | a day ago
pak9rabid | a day ago
zachrip | a day ago
christophilus | a day ago
elitan | a day ago
Works with any PG version today. Each branch is a fully isolated PostgreSQL container with its own port. ~2-5 seconds for a 100GB database.
https://github.com/elitan/velo
Main difference from PG18's approach: you get complete server isolation (useful for testing migrations, different PG configs, etc.) rather than databases sharing one instance.
teiferer | a day ago
Mind you, I'm not saying it's bad per se. But shouldn't we be open and honest about this?
I wonder if this is the new normal. Somebody says "I built Xyz" but then you realize it's vibe coded.
earthnail | a day ago
We wouldn’t have called it reviewed in the old world, but in the AI coding world we’re now in it makes me realise that yes, it is a form of reviewing.
I use Claude a lot btw. But I wouldn’t trust it on mission critical stuff.
ffsm8 | a day ago
Or at least I cannot come up with a usecase for prod.
From that perspective, it feels like it'd be a perfect usecase to embrace the LLM guided development jank
notKilgoreTrout | a day ago
App migrations that may fail and need a rollback have the problem that you may not be allowed to wipe any transactions so you may want to be putting data to a parallel world that didn't migrate.
parthdesai | a day ago
This is why migrations are supposed to be backwards compatible
notKilgoreTrout | a day ago
You can certainly bet you followed that advice correctly, now what are the odds you could test a what-if like that in sufficient depth?
gavinray | a day ago
dpedu | a day ago
earthnail | a day ago
Again, I love Claude, I use it a ton, but a topic like database cloning requires a certain rigour in my opinion. This repo does not seem to have it. If I had hired a consultant to build a tool like this and would receive this amount of vibe coding, I’d feel deceived. I wouldn’t trust it on my critical data.
rat9988 | a day ago
This is where it belongs, at best. He doesn't even have to disclose it. Prompting so that the ai writes the code faster than you is okay.
renewiltord | a day ago
dpedu | a day ago
https://github.com/elitan/velo/blame/12712e26b18d0935bfb6c6e...
And are we really doing this? Do we need to admit how every line of code was produced? Why? Are you expecting to see "built with the influence of Stackoverflow answers" or "google searches" on every single piece of software ever? It's an exercise of pointlessness.
renewiltord | a day ago
> We would like to acknowledge the open source people, who are the traditional custodians of this code. We pay our respects to the stack overflow elders, past, present, and future, who call this place, the code and libraries that $program sits upon, their work. We are proud to continue their tradition of coming together and growing as a community. We thank the search engine for their stewardship and support, and we look forward to strengthening our ties as we continue our relationship of mutual respect and understanding
Then if you would kindly say that a Brazilian invented the airplane that would be good too. If you don’t do this you should be cancelled for your heinous crime.
stronglikedan | a day ago
lol, good one!
hu3 | a day ago
last I checked, Wright brothers used a catapult while Santos-Dumont made a plane that took off by itself.
pbh101 | a day ago
teiferer | a day ago
elAhmo | a day ago
If someone used AI, it is a good discussion to see whether they should explicitly disclose it, but people have been using assisted tools, from auto-complete, text expanders, IDE refactoring tools, for a while - and you wouldn't make a comment that they didn't build it. The lines are becoming more blurry over time, but it is ridiculous to claim that someone didn't build something if they used AI tools.
pritambarhate | a day ago
In such cases the person says, I have built this building. People who found companies, say they have built companies. It's commonly accepted in our society.
So even if Claude built it for GP, as long as GP designed it, paid for tools (Claude) to build it, also tested it to make sure that it works, I personally think, he has the right to say he has built it.
If you don't like it, you are not required to use it.
greatgib | a day ago
pebble | a day ago
foobarbecue | a day ago
testdelacc1 | a day ago
Meanwhile this vibe coded nonsense is provided “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. We don’t even know if he read it before committing and pushing.
pritambarhate | a day ago
Quality of the software comes from testing. Humans and LLMs both make mistakes while coding.
tracker1 | a day ago
There are of course projects that operate at higher development specification standards, often in the military or banking. This should be extended to all vehicles and invasive medical devices.
tracker1 | a day ago
pbh101 | a day ago
rootnod3 | a day ago
fauigerzigerk | a day ago
But here's the problem. Five years ago, when someone on here said, "I wrote this non-trivial software", the implication was that a highly motivated and competent software engineer put a lot of effort into making sure that the project meets a reasonable standard of quality and will probably put some effort into maintaining the project.
Today, it does not necessarily imply that. We just don't know.
wahnfrieden | a day ago
foltik | a day ago
Most of the vibe-code I’ve seen so far appears functional to the point that people will defend it, but if you take a closer look it’s a massively over complicated rat’s nest that would be difficult for a human to extend or maintain. Of course you could just use more AI, but that would only further amplify these problems.
fauigerzigerk | a day ago
If someone puts weeks and months of their time into building something, then I'm willing to take that as proof of their motivation to create something good.
I'm also willing to take the existence of non-trivial code that someone wrote manually as proof of some level of competence.
The presence of motivation + competence makes it more likely that the result could be something good.
dabber | a day ago
fauigerzigerk | a day ago
pritambarhate | a day ago
People using LLMs to code these days is similar to how the majority of people stopped using assembly and moved to C and C++, then to garbage-collected and dynamically typed languages. People were always looking for ways to make programmers more productive.
Programming is evolving. LLMs are just next-generation programming tools. They make programmers more productive, and in the majority of cases people and companies are going to use them more and more.
fauigerzigerk | a day ago
I'm just saying that we don't know how much effort was put into making this and we don't know whether it works.
The existence of a repository containing hundreds of files, thousands of SLOCs and a folder full of tests tells us less today than it used to.
There's one thing in particular that I find quite astonishing sometimes. I don't know about this particular project, but some people use LLMs to generate both the implementation and the test cases.
What does that mean? The test cases are supposed to be the formal specification of our requirements. If we do not specify formally what we expect a tool to do, how do we know whether the tool has done what we expected, including in edge cases?
teiferer | a day ago
> The test cases are supposed to be the formal specification of our requirements
Formal methods folks would strongly disagree with this statement. Tests are informal specifications in the sense that they don't provide a formal (mathematically rigorous) description of the full expected behavior of the system. Instead, they offer a mere glimpse into what we hope the system would do.
And that's an important part, which is where your main point stands. The test is what confirms that the thing the LLM built conforms to the cases the human expected to behave in a certain way. That's why the human needs to provide them.
(The human could take help of an LLM to write the tests, as in they give an even-more-informal natural language description of what the test should do. But the human then needs to make sure that the test really does that and maybe fill in some gaps.)
halfcat | 15 hours ago
You don’t. That’s the scary part. Up until now, this was somewhat solved by injecting artificial friction. A bank that takes 5 days for a payment to clear. And so on.
But it’s worse than this, because most problems software solves cannot even be understood until you partially solve the problem. It’s the trying and failing that reveals the gap, usually by someone who only recognizes the gap because they were once embarrassed by it, and what they hear rhymes with their pain. AI doesn’t interface with physical reality, as far as we know, or have any mechanism to course correct like embarrassment or pain.
In the future, we will have flown off the cliff before we even know there was a problem. We will be on a space ship going so fast that we can’t see the asteroid until it’s too la...
heliumtera | a day ago
fauigerzigerk | a day ago
pbh101 | a day ago
onion2k | a day ago
That is entirely an assumption on the part of the reader. Nothing about someone saying "I built this complicated thing!" implies competence, or any desire to maintain it beyond building it.
The problem you're facing is survivorship bias. You can think of lots of examples of where that has happened, and very few where it hasn't, because when the author of the project is incompetent or unmotivated the project doesn't last long enough for you to hear about it twice.
fauigerzigerk | a day ago
I disagree. The fact that someone has written a substantial amount of non-trivial code does imply a higher level of competence and motivation compared to not having done that.
dirtbag__dad | a day ago
You never knew. There are plenty of intelligent, well-intentioned software engineers that publish FOSS that is buggy and doesn’t meet some arbitrary quality standards.
happymellon | a day ago
risyachka | a day ago
When you order a website on upwork - you didn't build it. You bought it.
victorbjorklund | a day ago
heliumtera | a day ago
wahnfrieden | a day ago
heliumtera | a day ago
wahnfrieden | a day ago
> No human expert involved
You don’t know this, you are just hating.
Besides the close review and specification that may be conducted with agents, even if you handwrite / edit code, it will say that it was co-authored by the agent if you have the agent do the commit for you.
pbh101 | a day ago
6r17 | a day ago
Everybody in the industry is vibecoding right now - the things that stick are due to sufficient quality being pushed on it. Having a pessimistic / judgmental surface reaction to everything as being "ai slop" is not something that I'm going to look forward in my behavior.
heliumtera | a day ago
Why is good faith a requirement for commenting but not for submissions? I would argue the good faith assumption should be disproportionately more important for submissions given the 1 to many relationship. You're not lying, it indeed is toxic and rapidly spreading. I'm glad this is the case.
Most came here for the discussion and enlightenment, only to be bombarded by heavily biased, low effort marketing bullshit. Presenting something that has no value to anyone besides the builder is the opposite of good faith. These submissions bury and neglect useful discussion; it is difficult to claim they are harmless and just not useful.
Not everyone in the industry is vibe coding, that is simply not true. But that's not the point I want to make. You don't need to be defensive about your generative tools usage, it is ok to use whatever, nobody cares. Just be ready to maintain your position and defend your ideals. Nothing is more frustrating than giving honest attention to a problem, considering someone else's perspective, only to then realize it was just words words words spewed by a slop machine. Nobody would give a second thought if that was disclosed. You are responsible for your craft. The moment you delegate that responsibility, into the trash you belong. If the slop machine is so great, why in hell would I need you to ask it to help me? Nonsensical.
6r17 | a day ago
The reason this discussion is pathetic, is that it shifts the discussion from the main topic (here it was a database implementation) - to abide by a reactionary emotive emulation with no grace or eloquence - that is mostly driven by pop culture at this point with a justification mostly shaping your ego.
There is no point in putting yourself above someone else just to justify your behavior - in fact it only tells me what kind of person you were in the first place - and as I said, this is not the kind of attitude that I'm looking up to.
rileymichael | a day ago
no ‘everybody’ is not. a lot of us are using zero LLMs and continuing to build (quality) software just fine
6r17 | a day ago
cstrahan | a day ago
Do you take issue with a CNC machinist stating that they made something, rather than stating that they did the CAD and CAM work but that it was the CNC machine that made the part?
Non-zero delegation doesn’t mean that the person(s) doing the delegating have put zero effort into making something, so I don’t think that delegation makes it dishonest to say that you made something. But perhaps you disagree. Or, maybe you think the use of AI means that the person using AI isn’t putting any constructive effort into what was made — but then I’d say that you’re likely way overestimating the ability of LLMs.
teiferer | a day ago
eudoxus | 22 hours ago
> Nowhere have I claimed that they didn't put work into this.
There's some mental gymnastics.
> please counter the opinion that I gave
The reply you're responding to did exactly that, and you just gave more snarky responses.
teiferer | 12 hours ago
taude | 23 hours ago
whalesalad | a day ago
elitan | 22 hours ago
Rovanion | a day ago
anonymars | a day ago
wahnfrieden | a day ago
elitan | 22 hours ago
eudoxus | 22 hours ago
Don't worry about these trolls.
72deluxe | a day ago
buu700 | a day ago
Also agentic coding detractors: "How dare you use AI to help build a new open source project."
I'm joking and haven't read the comments you're referring to, but whether or not AI was involved is irrelevant per se. If anyone finds themselves having a gut reaction to "AI", just mentally replace it with "an intern" or "a guy from Fiverr". Either way, the buck stops with whomever is taking ownership of the project.
If the code/architecture is buggy or unsafe, call that out. If there's a specific reason to believe no one with sufficient expertise reviewed and signed off on the implementation, call that out. Otherwise, why complain that someone donated their time and expertise to give you something useful for free?
bicx | 21 hours ago
QuercusMax | 21 hours ago
This is how I've been treating AI, except instead of assuming your junior SWE is generally sane and has some understanding of what you're doing, you have to make sure you double check everything.
halfcat | 16 hours ago
It’s not the guy from Fiverr anyone is annoyed with. It’s the tech CEOs who beat everyone over the head with:
- ”the future will be a-guy-from-Fiverr-native”
- ”we are mandating that 80% of our employees incorporate a-guy-from-Fiverr into their daily workflow by year end”
And everyone pretends this is serious.
Then there are people who are pulling off cool demo stunts that amount to duct taping fireworks to a lawn mower but they post about it on X doing their best Steve Jobs thought leader impersonation.
And again everyone pretends like this is serious.
The annoyance is like that friend you tell about this great new song, and they’re excited, but only because it’s something they can tell other people and look cool. Not because they’re into music.
buu700 | 16 hours ago
(Regarding mandates, of course they're a hamfisted solution, but it's not totally unreasonable that management would attempt to establish an incentive for its workforce to learn and put into practice a valuable new skill.)
Either way, that doesn't address the response to this project. Johan isn't Sam Altman. All Johan is guilty of here is building something useful and giving it to the rest of us for free.
newusertoday | a day ago
theturtletalks | a day ago
The issue of quality makes sense since it’s so easy to build these days, but when the product is open-source, these vibe coded comments make no sense. Users can literally go read the code or my favorite? Repomix it, pop it into AI Studio, and ask Gemini what this person has built, what value it brings, and does it solve the problem I have?
For vibe coded proprietary apps, you can’t do that so the comments are sort of justified.
anonzzzies | 15 hours ago
horse666 | a day ago
Obligatory mention of Neon (https://neon.com/) and Xata (https://xata.io/) which both support “instant” Postgres DB branching on Postgres versions prior to 18.
oulipo2 | a day ago
and that most of my data is either:
- business entities (users, projects, etc)
- and "event data" (sent by devices, etc)
where most of the database size is in the latter category, and that I'm fine with "subsetting" those (eg getting only the last month's "event data")
what would be the best strategy to create a kind of "staging clone"? ideally I'd like to tell the database (logically, without locking it expressly): do as though my next operations only apply to items created/updated BEFORE "currentTimestamp", and then:
- copy all my business tables (any update to those after currentTimestamp would be ignored magically even if they happen during the copy)
- copy a subset of my event data (same constraint)
what's the best way to do this?
gavinray | a day ago
Something like:
https://www.postgresql.org/docs/current/sql-copy.html
It'd be really nice if pg_dump had a "data sample"/"data subset" option but unfortunately nothing like that is built in that I know of.
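One way to get the "pretend nothing changes after currentTimestamp" behaviour with COPY is to run every export inside a single REPEATABLE READ transaction, so all statements see the same snapshot without taking explicit locks. A rough sketch that only generates the statements follows; the helper name, the `created_at` column, and the one-month window are assumptions for illustration, not something from the question or the docs link.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def subset_export_sql(full_tables: List[str], event_tables: List[str],
                      ts_column: str = "created_at",
                      window: str = "1 month") -> List[str]:
    """Build COPY statements that export a consistent subset (hypothetical helper).

    All statements are meant to run on ONE connection: inside a REPEATABLE READ
    transaction every query sees the snapshot taken by the first one.
    """
    stmts = ["BEGIN ISOLATION LEVEL REPEATABLE READ;"]
    for table in full_tables:
        # Business entities: copy everything.
        stmts.append(f"COPY {quote_ident(table)} TO STDOUT WITH (FORMAT csv, HEADER);")
    safe_window = window.replace("'", "''")
    for table in event_tables:
        # Event data: only the recent slice.
        query = (f"SELECT * FROM {quote_ident(table)} "
                 f"WHERE {quote_ident(ts_column)} >= now() - interval '{safe_window}'")
        stmts.append(f"COPY ({query}) TO STDOUT WITH (FORMAT csv, HEADER);")
    stmts.append("COMMIT;")
    return stmts


for stmt in subset_export_sql(["users", "projects"], ["device_events"]):
    print(stmt)
```

Foreign keys from the subsetted event rows to business tables stay valid because the business tables are copied in full from the same snapshot; references between event tables would need extra care.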
peterldowns | a day ago
https://github.com/peterldowns/pgmigrate
oulipo2 | 22 hours ago
francislavoie | a day ago
Something we've been trying to solve for a long time is having instant DB resets between acceptance tests (in CI or locally) back to our known fixture state, but right now it takes decently long (like half a second to a couple seconds, I haven't benchmarked it in a while) and that's by far the slowest thing in our tests.
I just want fast snapshotted resets/rewinds to a known DB state, but I need to be using MariaDB since it's what we use in production, we can't switch DB tech at this stage of the project, even though Postgres' grass looks greener.
proaralyst | a day ago
francislavoie | a day ago
renewiltord | a day ago
1. Have a local data dir with initial state
2. Create an overlayfs with a temporary directory
3. Launch your job in your docker container with the overlayfs bind mount as your data directory
4. That’s it. Writes go to the overlay and the base directory is untouched
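The steps above can be sketched as two commands. This only builds the argument lists (actually mounting needs root); the directory layout, the helper names, and the `mariadb:11` image tag are placeholders chosen for illustration.

```python
from typing import List


def overlay_mount_cmd(base_dir: str, scratch_dir: str, merged_dir: str) -> List[str]:
    """Build the mount command for an overlay over a read-only base data dir.

    upperdir and workdir live under scratch_dir because overlayfs requires
    them to be on the same filesystem.
    """
    opts = (f"lowerdir={base_dir},"
            f"upperdir={scratch_dir}/upper,"
            f"workdir={scratch_dir}/work")
    return ["mount", "-t", "overlay", "overlay", "-o", opts, merged_dir]


def docker_run_cmd(merged_dir: str, image: str = "mariadb:11") -> List[str]:
    """Build a docker command that bind-mounts the overlay as the data directory.

    /var/lib/mysql is the data directory used by the official MariaDB image.
    """
    return ["docker", "run", "--rm", "-v", f"{merged_dir}:/var/lib/mysql", image]


print(" ".join(overlay_mount_cmd("/srv/db-base", "/tmp/run1", "/tmp/run1/merged")))
print(" ".join(docker_run_cmd("/tmp/run1/merged")))
```

Resetting between test runs then means stopping the container, unmounting, and throwing away the scratch directory; the base directory is never written to.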
francislavoie | a day ago
peterldowns | a day ago
evanelias | a day ago
There's actually a potential solution here, but I haven't personally tested it: transportable tablespaces in either MySQL [1] or MariaDB [2].
The basic idea is it allows you to take pre-existing table data files from the filesystem and use them directly for a table's data. So with a bit of custom automation, you could have a setup where you have pre-exported fixture table data files, which you then make a copy of at the filesystem level, and then import as tablespaces before running each test. So a key step is making that fs copy fast, either by having it be in-memory (tmpfs) or by using a copy-on-write filesystem.
If you have a lot of tables then this might not be much faster than the 0.5-2s performance cited above though. iirc there have been some edge cases and bugs relating to the transportable tablespace feature over the years as well, but I'm not really up to speed on the status of that in recent MySQL or MariaDB.
[1] https://dev.mysql.com/doc/refman/8.0/en/innodb-table-import....
[2] https://mariadb.com/docs/server/server-usage/storage-engines...
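For reference, the per-test reset with transportable tablespaces would look roughly like the plan below. It is untested, like the idea itself, and assumes file-per-table InnoDB tables whose `.ibd` and `.cfg` fixture files were exported earlier with `FLUSH TABLES ... FOR EXPORT`; the helper name and paths are made up.

```python
from typing import List, Tuple

Step = Tuple[str, ...]


def tablespace_import_plan(schema: str, tables: List[str],
                           fixture_dir: str, data_dir: str) -> List[Step]:
    """Return the ordered steps to swap fixture data files in under existing tables.

    ("sql", statement) steps run on the server; ("copy", src, dst) steps are
    filesystem copies (ideally on tmpfs or a copy-on-write filesystem).
    """
    steps: List[Step] = []
    for table in tables:
        # Detach the current data file from the (already created) table.
        steps.append(("sql", f"ALTER TABLE `{schema}`.`{table}` DISCARD TABLESPACE;"))
    for table in tables:
        for ext in ("ibd", "cfg"):
            steps.append(("copy",
                          f"{fixture_dir}/{table}.{ext}",
                          f"{data_dir}/{schema}/{table}.{ext}"))
    for table in tables:
        # Attach the freshly copied data file.
        steps.append(("sql", f"ALTER TABLE `{schema}`.`{table}` IMPORT TABLESPACE;"))
    return steps


for step in tablespace_import_plan("app_test", ["users"], "/fixtures", "/var/lib/mysql"):
    print(step)
```

The number of statements grows with the number of tables, which is why this may not beat the 0.5-2s figure for schemas with many tables.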
exceptione | a day ago
If your db is small enough to fit in tmpfs, then sure, that is hard to beat. But then xfs and zfs are overkill too.
EDIT: I see you mentioning that starting the db is slow due to wiping and filling at runtime. But the idea of a snapshot is that you don't have to do that, unless I misunderstand you.
renewiltord | 22 hours ago
ikatson | a day ago
Then spin up the DB using that image instead of an empty one for every test run.
This implies starting the DB through docker is faster than what you're doing now of course.
francislavoie | a day ago
briffle | a day ago
pak9rabid | a day ago
hu3 | a day ago
The only detail is that autoincrements (SEQUENCEs for PostgreSQL folks) get bumped even if the transaction rolls back.
So tables tend to get large ids quickly. But it's just dev database so no problem.
fanf2 | a day ago
francislavoie | a day ago
hu3 | a day ago
peterldowns | a day ago
riskable | a day ago
I'm wondering why anyone would want to use anything else at this point (for SQL).
wahnfrieden | a day ago
efxhoy | a day ago
scottyah | a day ago
aftbit | a day ago
1: https://www.uber.com/blog/postgres-to-mysql-migration/
vl | a day ago
nine_k | a day ago
* MySQL has a much easier story of master-master replication.
* Mongo has a much easier story of geographic distribution and sharding. (I know that Citus exists, and have used it.)
* No matter how you tune Postgres, columnar databases like Clickhouse are still faster for analytics / time series.
* Write-heavy applications still may benefit from something like Cassandra, or more modern solutions in this space.
(I bet Oracle has something to offer in the department of cluster performance, too, but I did not check it out for a long time.)
pstuart | 19 hours ago
hu3 | a day ago
bddicken | a day ago
https://www.neki.dev
hu3 | 23 hours ago
It will take years to call this mature. Certainly not "soon"
oblio | 23 hours ago
dangoodmanUT | 23 hours ago
turtles3 | 21 hours ago
sheepscreek | a day ago
Tostino | 22 hours ago
QuercusMax | 21 hours ago
leetrout | 18 hours ago
https://www.honeycomb.io/blog/testing-in-production
tehlike | a day ago
hmokiguess | a day ago
wayeq | a day ago
eatsyourtacos | 23 hours ago