Looks interesting, curious what your moat here is. What prevents Supabase/Neon from doing this? Actually don't they already do this? How does this differ from the branching Neon and Supabase already offer?
We enable branching on any Postgres DB through our architecture. So if you're on RDS, PlanetScale, etc., you can keep your DB where it is and still get the ability to branch with a full clone of the DB.
Neon does support copy-on-write branching and autoscaling compute natively, but you make certain performance tradeoffs. A lot of the folks we've talked to who use RDS or PlanetScale rely on things like the query latencies that platform's specific architecture supports, but also want the ability to test on branches. We let you get the best of both worlds: branch but leave your DB where it is, and freely choose your production environment based on prod concerns.
Supabase does have branching, but they don't branch the data, so you can't test any interactions that rely on the data. You can restore from backup as an option, but this slows down with data size since you're actually moving data as opposed to using copy-on-write.
Longer term we want to be the place you branch all your data infra, so expanding to S3, Snowflake, MySQL, etc.
For now though we're focusing on just Postgres and getting it right!
“Never impacts production data” is impossible to guarantee. Playing with real-world data often has side effects outside of the database. For example, if you store OAuth tokens to external services in your DB (customer integrations), it’s easy to mess up your customers' data through a bad API call (been there, done that).
There is still value in carefully testing on your prod DB, but for that you could just easily maintain a read replica. I don’t see the need for a SaaS here.
One of the main things people use us for is ease of testing writes on a per-dev/agent basis, which would be difficult on a read replica!
On the real-world data impact, I absolutely agree. We added something called "branch hooks", which let you define SQL to run against the branch before it's returned.
This lets you anonymize and modify the branch to scrub data that could cause unintended external side effects.
It's something we're still working on, though, and we're trying to design the right abstractions around it because we want to get that part right.
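To make the branch-hook idea concrete, here is a rough sketch of what a scrub step could look like. This is not the product's actual API: the `apply_branch_hook` function, the `users` and `integrations` tables, and their columns are all hypothetical, and an in-memory SQLite database stands in for a freshly created Postgres branch.

```python
import sqlite3

# Hypothetical hook: SQL to run on a fresh branch before it is handed out.
# Table and column names are invented for illustration only.
BRANCH_HOOK_SQL = """
UPDATE users SET email = 'user' || id || '@example.invalid';
UPDATE integrations SET oauth_token = NULL;
"""


def apply_branch_hook(conn, hook_sql=BRANCH_HOOK_SQL):
    """Run the scrub statements against the branch connection, then commit."""
    conn.executescript(hook_sql)
    conn.commit()


def make_demo_branch():
    """Build a tiny stand-in for a branch that still holds prod-like data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE integrations (user_id INTEGER, oauth_token TEXT);
    INSERT INTO users VALUES (1, 'alice@corp.com'), (2, 'bob@corp.com');
    INSERT INTO integrations VALUES (1, 'secret-token-1'), (2, 'secret-token-2');
    """)
    return conn


if __name__ == "__main__":
    branch = make_demo_branch()
    apply_branch_hook(branch)
    print(branch.execute("SELECT id, email FROM users ORDER BY id").fetchall())
    print(branch.execute("SELECT user_id, oauth_token FROM integrations").fetchall())
```

Nulling out stored tokens is the part that addresses the concern above: a bad API call made from a branch can't reach a customer's external account if the credentials never leave prod.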
A true read replica won't let you write! So if you need to test something like a backfill and see if anything goes wrong, you wouldn't be able to do it as easily.
We'd let you instantly clone prod and apply user-defined auto-anonymization so you can test writes. The architecture can also somewhat take the place of an existing read replica, if you want to use it like that, to make it more cost efficient.
Also, since we're using copy-on-write for the clones, they're very storage efficient, and the autoscaling compute helps minimize cost by minimizing excess compute uptime on clones.
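For anyone unfamiliar with why copy-on-write clones are cheap, here is a toy sketch of the general technique. It is not a description of this product's storage layer: the `Branch` class and its page model are assumptions for illustration, and the parent is treated as a frozen snapshot once a branch is taken.

```python
class Branch:
    """Toy copy-on-write branch: stores only the pages written on it."""

    def __init__(self, parent=None):
        self.parent = parent  # snapshot this branch was created from
        self.pages = {}       # page_id -> data, only for pages written here

    def read(self, page_id):
        # Walk up the ancestry until some branch owns the page.
        node = self
        while node is not None:
            if page_id in node.pages:
                return node.pages[page_id]
            node = node.parent
        raise KeyError(page_id)

    def write(self, page_id, data):
        # The write lands on this branch only; ancestors are untouched.
        self.pages[page_id] = data

    def branch(self):
        # Creating a branch copies nothing, regardless of data size.
        return Branch(parent=self)


if __name__ == "__main__":
    prod = Branch()
    prod.write(0, "accounts page")
    prod.write(1, "orders page")

    dev = prod.branch()
    dev.write(1, "orders page after test backfill")

    print(dev.read(0), "|", dev.read(1), "|", prod.read(1))
    print("pages stored on dev:", len(dev.pages))
```

Creating the branch is constant-time and storage grows only with what the branch changes, which is the contrast with restore-from-backup, where time scales with total data size.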
Doesn't look open-source. If you're interested in a Neon-like or git-like branching experience for PostgreSQL, have a look at Xata, which is based on ZFS like Delphix was:
How does this compare to managing our own read-only replica with anonymized data?
I mean, they said "read-only" ...
https://github.com/xataio/xata