Works with any PG version today. Each branch is a fully isolated PostgreSQL container with its own port. ~2-5 seconds for a 100GB database.
https://github.com/elitan/velo
Main difference from PG18's approach: you get complete server isolation (useful for testing migrations, different PG configs, etc.) rather than databases sharing one instance.
The steps were basically:
1. Clone the AWS RDS db - or spin up a new instance from a fresh backup.
2. Get the arn and from that the cname or public IP.
3. Plug that into the DB connection in your app
4. Run the migration on pseudo prod.
This helped up catch many bugs that were specific to production db or data quirks and would never haven been caught locally or even in CI.
Then I created a simple ruby script to automate the above and threw it into our integrity checks before any deployment. Last I heard they were still using that script I wrote in 2016!
Back in the day (2013?) I worked at a startup where the resident Linux guru had set up "instant" staging environment databases with btrfs. Really cool to see the same idea show up over and over with slightly different implementations. Speed and ease of cloning/testing is a real advantage for Postgres and Sqlite, I wish it were possible to do similar things with Clickhouse, Mysql, etc.
Raised an issue in my previous pet project for doing concurrent integration tests with real PostgreSQL DBs (https://github.com/allaboutapps/integresql) as well.
I'm wondering why anyone would want to use anything else at this point (for SQL).
We've built Xata with this idea of using copy-on-write database branching for staging and testing setups, where you need to use testing data that's close to the real data. On top of just branching, we also do things like anonymization and scale-to-zero, so the dev branches are often really cheap. Check it out at https://xata.io/
> The source database can't have any active connections during cloning. This is a PostgreSQL limitation, not a filesystem one. For production use, this usually means you create a dedicated template database rather than cloning your live database directly.
This is a key limitation to be aware of. A way to workaround it could be to use pgstream (https://github.com/xataio/pgstream) to copy from the production database to a production replica. Pgstream can also do anonymization on the way, this is what we use at Xata.
and that most of my data is either:
- business entities (users, projects, etc)
- and "event data" (sent by devices, etc)
where most of the database size is in the latter category, and that I'm fine with "subsetting" those (eg getting only the last month's "event data")
what would be the best strategy to create a kind of "staging clone"? ideally I'd like to tell the database (logically, without locking it expressly): do as though my next operations only apply to items created/updated BEFORE "currentTimestamp", and then:
- copy all my business tables (any update to those after currentTimestamp would be ignored magically even if they happen during the copy) - copy a subset of my event data (same constraint)
what's the best way to do this?
Something we've been trying to solve for a long time is having instant DB resets between acceptance tests (in CI or locally) back to our known fixture state, but right now it takes decently long (like half a second to a couple seconds, I haven't benchmarked it in a while) and that's by far the slowest thing in our tests.
I just want fast snapshotted resets/rewinds to a known DB state, but I need to be using MariaDB since it's what we use in production, we can't switch DB tech at this stage of the project, even though Postgres' grass looks greener.
Obligatory mention of Neon (https://neon.com/) and Xata (https://xata.io/) which both support “instant” Postgres DB branching on Postgres versions prior to 18.