FRESH

Hacker News

Home

Scaling PostgreSQL to power 800M ChatGPT users

343 points by mustaphah

by ninkendo

3 subcomments

> It’s also common to find long-running idle queries in PostgreSQL. Configuring timeouts like idle_in_transaction_session_timeout is essential to prevent them from blocking autovacuum.
Idle transactions have been a huge footgun at $DAYJOB… our code base is full of “connect, start a transaction, do work, if successful, commit.” It means you’re consuming a connection slot for all work, even while you’re not using the database, and not releasing it until you’re done. We had to bump the Postgres connection limits by an order of magnitude, multiple times, and before you know it Postgres takes up more RAM than anything else just to support the number of connections we need.
The problem permeated enough of our (rust) codebase that I had to come up with a compile time check that makes sure you’re not awaiting any async functions while a Postgres connection is in your scope. Using the .await keyword on an async function call, but not passing the pg connection to that function, ends up being a nearly perfect proxy for “doing unrelated work while not releasing a connection”. It worked extremely well, the compiler now just straight up tells us where we’re doing it wrong (in 100+ places in fact.)
Actually getting away from that pattern has been the hard part, but we’re almost rid of every place we’re doing it, and I can now run with a 32-connection pool in load testing instead of a 10,000 connection pool and there’s no real slowdowns. (Not that we’d go that low in production but it’s nice to know we can!)
Just decreasing the timeout for idle transactions would have probably been the backup option, but some of the code that holds long transactions is very rarely hit, and it would have taken a lot of testing to eliminate all of it if we didn’t have the static check.

by AlisdairO

1 subcomments

Regarding schema changes and timeouts - while having timeouts in place is good advice, you can go further. While running the schema rollout, run a script alongside it that kills any workload conflicting with the aggressive locks the schema change is trying to take. This will greatly reduce the pain caused by lock contention, and prevent you from needing to repeatedly rerun statements on high-throughput tables.
This would be a particularly nice-to-have feature for Postgres - the option to have heavyweight locks just proactively cancel any conflicting workload. For any case where you have a high-throughput table, the damage of the heavyweight lock sitting there waiting (and blocking all new traffic) is generally much larger than just cancelling some running transactions.

by winterrx

1 subcomments

First OpenAI Engineering blog? I'm definitely interested in seeing more and how they handled the rapid growth.

by n_u

1 subcomments

Cool! I'd love to know a bit more about the replication setup. I'm guessing they are doing async replication.
> We added nearly 50 read replicas, while keeping replication lag near zero
I wonder what those replication lag numbers are exactly and how they deal with stragglers. It seems likely that at any given moment at least one of the 50 read replicas may be lagging cuz CPU/mem usage spike. Then presumably that would slow down the primary since it has to wait for the TCP acks before sending more of the WAL.

by huksley

1 subcomments

"... If a new feature requires additional tables, they must be in alternative sharded systems such as Azure CosmosDB rather than PostgreSQL...."
So it is not really scaling too much now, rather maintaining current state of things and new features go to a different DB?

by dbuser99

1 subcomments

I don’t get it. This whole thing says single writer does not scale, so we stopped writing as much and removed reads away from it, so it works ok and we decided that’s enough. I guess thats great.

by ggregoire

2 subcomments

> scaled up by increasing the instance size
I always wondered what kind of instance companies at that level of scalability are using. Anyone here have some ideas? How much cpu/ram? Do they use the same instance types available to everyone, or does AWS and co offer custom hardware for these big customers?

by everfrustrated

2 subcomments

This is why I love Postgres. It can get you to being one of the largest websites before you need to reconsider your architecture just by throwing CPU and disk at it. At that point you can well afford to hire people who are deep experts at sharding etc.

by kuon

0 subcomment

The article is basically "we use PostgreSQL, it works, but we had to do some optimization to make it scale".
I don't really get the point here. What is novel and great? It feels they followed the first " how to scale pg" article.

by mrweasel

0 subcomment

Running this on Azure Postgresql, even migrating to CosmosDB, cannot be cheap. I know that OpenAI have to deal/relationship with Microsoft, but still, this has to be expensive.
This is however the most down to earth: How we scale Postgresql I've read in a long time. No weird hacker, no messing around with the source code or tweaking the Linux kernel. Running on Azure Postgresql it's not like OpenAI have those options anyway, but still it seems a lot more relatable than: We wrote our own drive/filesystem/database-hack in Javascript.

by bzmrgonz

5 subcomments

Someone ask Microsoft what does it feel to be bested by an open source project on their very own cloud platform!!! Lol.

by hu3

2 subcomments

From what I understand they basically couldn't scale writes in PostgreSQL to their needs and had to offload what they could to Azure's NoSQL database.
I wonder, is there another popular OLTP database solution that does this better?
> For write traffic, we’ve migrated shardable, write-heavy workloads to sharded systems such as Azure CosmosDB.
> Although PostgreSQL scales well for our read-heavy workloads, we still encounter challenges during periods of high write traffic. This is largely due to PostgreSQL’s multiversion concurrency control (MVCC) implementation, which makes it less efficient for write-heavy workloads. For example, when a query updates a tuple or even a single field, the entire row is copied to create a new version. Under heavy write loads, this results in significant write amplification. It also increases read amplification, since queries must scan through multiple tuple versions (dead tuples) to retrieve the latest one. MVCC introduces additional challenges such as table and index bloat, increased index maintenance overhead, and complex autovacuum tuning.

by ahachete

1 subcomments

Postgres can really scale well vertically (and horizontally for read-only workloads) as the post shows.
However, I'm still surprised about the reasons for not sharding. They have been mentioned before, but I haven't seen a substantial rationale.
Sharding is almost only analyzed from the perspective of write scaling. But sharding may not only be about write scaling, but a path to reducing blast radius. And this is something that triggers much earlier than write scaling needs (especially given how well Postgres scales vertically and reads).
When you shard your database, you end up having N clusters (for HA purposes, each "shard" must be a primary-replica(s) cluster itself), each holding 1/Nth of the data.
There are certain scenarios which, while unlikely, may hit you: data corruption in the WAL replication stream, a problem during a major upgrade, a situation that requires a whole cluster restore from a backup, you name it. For those cases, the whole cluster may experience notable downtime.
If you have a single cluster, 100% of your users experience downtime. If you sharded into N clusters, only 1/Nth of your users experience downtime. For a company servicing 800M users the difference from both scenarios is dramatically different. Even for much much smaller companies.
I'm puzzled why this is not perceived as a risk, and if it is not, how it is mitigated.
While I wouldn't advocate to shard "too early", given that it comes with notable caveats, I believe more and more in sharding your workloads when possible more earlier than later. Way before truly needing it from a write scaling perspective. Because apart from reducing the blast radius, it applies implicitly the principle of "divide-and-conquer", and your problems become much more manageable (your number of connections per cluster decreases at will, backup restore times can be a fraction of the time, logical replication can be considered as a true option for replication/upgrades/etc if you keep shards relatively small and many other operational procedures are greatly simplified if now you have much smaller databases, even if you have many more of them).

by _ink_

2 subcomments

Did I miss it, or did they not say why they picked CosmoDB? Postgres has also sharding, so instead of moving to a different DB they could have added a new postgres instance with sharding for the new requests.

by nasretdinov

1 subcomments

I honestly don't understand such negative response tone from the comments. Yes, it does promote Azure, but that's to be expected from a company with is part owned by Microsoft :).
The main point of the article is that it's actually not that hard to live with a single primary Postgres for your transactional workloads (emphasis on _transactional_), and if OpenAI with their 800M+ users can still survive on a single primary (with 50(!) read replicas), so could you, especially before you've reached your first 100M users.
Any non-distributed database or setup is orders of magnitude easier to design for, and it's also typically much more cost efficient too, both in terms of hardware and software too.
There are some curious details, e.g.:
- you can ship WAL to 50 read replicas simultaneously from a single primary and be fine - you can even be using an ORM and still get decent performance - schema changes are possible, and you can just cancel a slow ALTER to prevent production impact - pgbouncer is ok even for OpenAI scale
There are so many things that contradict current "conventional wisdom" based on the experience from what was possible with the hardware 10+ (or even 20+) years ago. Times finally changed and I really welcome articles like these that show how you can greatly simplify your production setup by leveraging the modern hardware.

by trhway

0 subcomment

"However, some read queries must remain on the primary because they’re part of write transactions. "
if there is a read replica that has reached required snapshot - it is usually enough (depends on your task of course) for it to be the snapshot that was at the start of your transaction - and if the read query doesn't need to read your transaction uncommitted data, then that replica can serve the read query.

by QuiCasseRien

1 subcomments

I like the way of thinking. Instead of migrating to another database, they keep that awesome one running and found smart workaround to push limits.

by neya

0 subcomment

Out of pure boredom and tired of all these Chat websites selling my data and with ChatGPT's new update on ads - I decided enough was enough and created my own Chat application for privacy. Like every other architect, I searched for a good database and eventually gave up on specialized ones for chat because they were either too expensive to host or too complex to deal with. So, I simply just used PostgreSQL. My chat app has basic RAG, not ground breaking or anything - but the most important feature I made was ability to add different chat models into one group chat. So, when you ask for opinions on something - you are not relying on just a single model and you can get a multi-model view of all the possible answers. Each model can have its own unique prompt within the group chat. So basically, a join table.
Months passed by since this application was developed (a simple Phoenix/Elixir backend), and yesterday I was casually checking my database to see how many rows it had - about 500,000+ roughly. I didn't notice a single hint of the volume the Postgres was handling, granted - I'm the only user, but there's always a lot going on - RAG, mostly that requires searching of the database for context before multiple agents send you a response (and respond amongst themselves). Absolutely zero performance degradation.
I'm convinced that Postgres is a killer database that doesn't get the attention it deserves over the others (for chat). Already managing some high traffic websites (with over 500M+ requests) with no issues, so I am extremely unsurprised that it works really well for chat apps at scale too.

by DeathArrow

1 subcomments

"This effort demonstrates that with the right design and optimizations, Azure PostgreSQL can be scaled to handle the largest production workloads."
Sure, but choosing from the start a DB that can scale with ease would have taken far less time and effort.
You can bend any software into doing anything, but is it worth it?

by ph4evers

0 subcomment

Nice write up! It is cool to see that PostgreSQL is still standing. Adyen has some nice blog posts about squeezing the max out of PostgreSQL https://medium.com/adyen/all?topic=postgres

by KellyCriterion

0 subcomment

I would be super curious about:
How do they store all the other stuff related to operating the service? This must be a combination of several components? (yes, including some massdata storage, Id guess?)
This would be cool to understand, as Ive absolutely no idea how this is done (and could be done :-)

by zallarak

0 subcomment

Anyone have any idea of what their caching strategy may be? I think that may be the most interesting and impactful thing here.
ChatGPT is definitely the snappiest web UI of any LLM.

by ed_mercer

2 subcomments

Why does the [Azure PostgreSQL flexible server instance] link point to Chinese Azure?

by ASalazarMX

0 subcomment

I was confused when I saw elsewhere a "goose with a knife" meme asking someone if they had 800M users, and then if they had donated to PostgreSQL. Now I understand.

by kachapopopow

0 subcomment

Why does everyone make a "how we scaled PostgreSQL" article.

by killingtime74

1 subcomments

Why a single postgres? Why not shard by users?

by PunchyHamster

0 subcomment

Weird, I'd imagine for kind of use they are getting it would be easy to shard the infrastructure to entirely separate instances
.

by mannyv

0 subcomment

Uh, they scaled PostgreSQL by offloading a lot of it to Azure CosmosDB.
I'm not sure that's the answer people are looking for.

0 subcomment

by ahmetozer

0 subcomment

ai written blog, its very generic and same context is repated many times

0 subcomment

by HenriTEL

0 subcomment

Yeah that's an ad for Azure Cosmos DB

by szastamasta

4 subcomments

Article has so much fluff and only some very coarse information like (we sharded writes, yay!). Almost no detail just keywords for SEO, or whatever they’re aiming for.
There’s also a lot of repetition. Maybe it was AI generated…?

by bhouston

0 subcomment

Ah yes, OpenAI is sharding now. Surprise surprise.
I mentioned that as a right solution to the problem last time they posted about Postgres performance issues:
https://news.ycombinator.com/item?id=44072645
But the response from an OpenaI engineer (who is the author of this article) was that sharding isn't the solution:
https://news.ycombinator.com/item?id=44074702

by qaq

0 subcomment

for people not burning billions of VC $ sharding Postgres is not a bad option.

by lighthouse1212

6 subcomments

The 'single primary with read replicas' pattern scaling to 800M users is the real insight here. Most startups reach for sharding or distributed databases way too early, adding complexity for scale they don't have. If OpenAI can serve hundreds of millions from one Postgres primary by offloading reads and pushing new write-heavy features elsewhere, that's a strong argument for simplicity.

by LudwigNagasena

0 subcomment

TL;DR There is no secret sauce, it's the same set of techniques you’ve seen in most PostgreSQL scaling guides. Those techniques do work.

by resdev

0 subcomment

They could’ve used mongodb which is web scale NoSQL database because SQL is 1990’s era legacy technology.
/s

by MarginalGainz

0 subcomment

[flagged]