FRESH

Hacker News

Home

Postgres Postmaster does not scale

128 points by davidgu

by kayson

2 subcomments

> sudo echo $NUM_PAGES > /proc/sys/vm/nr_hugepages
This won't work :) echo will run as root but the redirection is still running as the unprivileged user. Needs to be run from a privileged shell or by doing something like sudo sh -c "echo $NUM_PAGES > /proc/sys/vm/nr_hugepages"
The point gets across, though, technicality notwithstanding.

by jstrong

1 subcomments

From the article:
> The real bottleneck is the single-threaded main loop in the postmaster.
A single-threaded event loop can do a lot of stuff. Certainly handle 4000 tasks of some sort in under 10s. Just offhand it seems like it would be eminently possible to handle incoming connections on the scale they describe in a single-threaded event loop.
Clearly the existing postgres postmaster thread is a bottleneck as it is implemented today. But I'd be interested to go deeper into what it's doing that causes it to be unable to keep up with a fairly low workload vs. what is possible to do on a single thread/core.

by paulkre

5 subcomments

Can’t believe they needed this investigation to realize they need a connection pooler. It’s a fundamental component of every large-scale Postgres deployment, especially for serverless environments.

by vel0city

3 subcomments

Isn't this kind of the reason why teams will tend to put database proxies in front of their postgres instances, to handle massive sudden influxes of potentially short lived connections?
This sounds exactly like the problem tools like pgbouncer were designed to solve. If you're on AWS one could look at RDS Proxy.

by _bohm

0 subcomment

Good reminder to always remember Chesterton's Fence. The post indicates that the bottleneck occurs when "many thousands" of EC2 instances are connecting simultaneously. In order for this to happen, presumably someone had to turn `max_connections` way up on their database server to make this to work at all. Seems like the issue could have been avoided at that point with a bit more understanding about why the default is an order of magnitude or more lower than whatever they tuned it to.

by haki

2 subcomments

Some a prime example of a service that naturally peaks at round hours.
We have a habbit of never scheduling long running processes at round hours. Usually because they tend to be busier.
https://hakibenita.com/sql-tricks-application-dba#dont-sched...

by iamleppert

1 subcomments

Why do you need a connection to a database during the meeting? Doesn't it make more sense to record the meeting data to some local state first, and then serialize it to database at the end of the meeting or when a database connection is available? Or better yet, have a lightweight API service that can be scaled horizontally that is responsible for talking to the database and maintains its own pool of connections.
They probably don't even need a database anyway for data that is likely write once, read many. You could store the JSON of the meeting in S3. It's not like people are going back in time and updating meeting records. It's more like a log file and logging systems and data structures should be enough here. You can then take that data and ingest it into a database later, or some kind of search system, vector database etc.
Database connections are designed this way on purpose, it's why connection pools exist. This design is suboptimal.

by the_mitsuhiko

1 subcomments

I'm not working at this company but I found that these types of problems can often be simplified in the architecture.
> Most meetings start on the hour, some on the half, but most on the full. It sounds obvious to say it aloud, but the implication of this has rippled through our entire media processing infrastructure.
When you can control when it happens, you can often jitter things. For instance the naive approach of rate limiting users down to quantized times (eg: the minute, the hour, etc.) leads to every client coming back at the same time. The solution there is to apply a stable jitter so different clients get different resets.
That pattern does not go all that well with meetings as they need to happen when they happen, which is going to be mostly the hour and 30 minutes etc. However often the lead up time to those meetings is quite long, so you can do the work needed that should happen on the hour, quite a bit ahead of time and then apply the changes in one large batch on the minute.
You have similar problems quite often with things like weekly update emails. At scale it can take you a lot of time to prepare all the updates, often more than 12 hours. But you don't want the mails to come in at different times of the day so you really need to get the reports prepared and then send them out when ready.

by PunchyHamster

1 subcomments

> We record millions of meetings every week.
My first thought was "why even use big databases, you have perfect workload to shard it between a bunch of instances and as a bonus any downtime would only affect smaller part of customers"

by mannyv

1 subcomments

Note that they were running Postgres on a 32 CPU box with 256GB of ram.
I'm actually surprised that it handled that many connections. The data implies that they have 4000 new connections/sec...but is it 4000 connections handled/sec?

by rsanek

0 subcomment

Great investigation. Slight aside, but I found the few typos that I noticed to make me feel better about continuing to read -- as a sign that the post wasn't AI-generated.

by truekonrads

0 subcomment

Cool debugging, but… 1) if you have very spiky loads (on the hour) and you can distribute them a little it’s obvious that this is will be good thing. 2) they had the answer all along in their telemetry Sometimes wisdom beats effort

by j16sdiz

0 subcomment

>... we run an unusual workload
ya, right. just make up some reason not following the best practices

by atherton94027

3 subcomments

I'm a bit confused here, do they have a single database they're writing to? Wouldn't it be easier and more reliable to shard the data per customer?

by levkk

1 subcomments

One of the many problems PgDog will solve for you!

by vivzkestrel

1 subcomments

very stupid question: similar to how we had a GIL replacement in python, cant we replace postmaster with something better?

by moomoo11

2 subcomments

maybe this is silly but these days cloud resources are so cheap. just loading up instances and putting this stuff into memory and processing it is so fast and scalable. even if you have billions of things to process daily you can just split if needed.
you can keep things synced across databases easily and keep it super duper simple.

by clarity_hacker

0 subcomment

[dead]

by ltxdsf

0 subcomment

[dead]

by parentheses

0 subcomment

I think this is the kind of investigation that AI can really accelerate. I imagine it did. I would love to see someone walk through a challenging investigation assisted by AI.