Hacker News

vm.overcommit_memory=2 is the right setting for servers

40 points by signa11

by LordGrey

2 subcomments

For anyone not familiar with the meaning of '2' in this context:
The Linux kernel supports the following overcommit handling modes:
0 - Heuristic overcommit handling. Obvious overcommits of address space are refused. Used for a typical system. It ensures a seriously wild allocation fails while allowing overcommit to reduce swap usage. root is allowed to allocate slightly more memory in this mode. This is the default.
1 - Always overcommit. Appropriate for some scientific applications. Classic example is code using sparse arrays and just relying on the virtual memory consisting almost entirely of zero pages.
2 - Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable amount (default is 50%) of physical RAM. Depending on the amount you use, in most situations this means a process will not be killed while accessing pages but will receive errors on memory allocation as appropriate. Useful for applications that want to guarantee their memory allocations will be available in the future without having to initialize every page.
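To see which of these modes a machine is actually running, the setting can be read from `/proc/sys/vm/overcommit_memory`. A minimal sketch, assuming a Linux host; the helper names `describe_overcommit` and `read_overcommit_mode` are made up for illustration, and the descriptions are paraphrased from the kernel text quoted above:

```python
from pathlib import Path

# Short labels paraphrased from the kernel documentation quoted above.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "don't overcommit",
}


def describe_overcommit(mode: int) -> str:
    """Map a numeric vm.overcommit_memory value to a short label."""
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def read_overcommit_mode(path: str = "/proc/sys/vm/overcommit_memory"):
    """Return the current mode as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # Not Linux, no procfs, or unexpected file contents.
        return None


if __name__ == "__main__":
    mode = read_overcommit_mode()
    if mode is None:
        print("overcommit mode unavailable on this system")
    else:
        print(f"{mode}: {describe_overcommit(mode)}")
```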

by c0l0

1 subcomments

I realize this is mostly tangential to the article, but a word of warning for those who are about to mess with overcommit for the first time: In my experience, the extreme stance of "always do [thing] with overcommit" is just not defensible, because most (yes, also "server") software is just not written under the assumption that being able to deal with allocation failures in a meaningful way is a necessity. At best, there's a "malloc() or die"-like stanza in the source, and that's that.
You can and maybe even should disable overcommit this way when running postgres on the server (and only a minimum of what you would these days call sidecar processes (monitoring and backup agents, etc.) on the same host/kernel), but once you have a typical zoo of stuff using dynamic languages living there, you WILL blow someone's leg off.

by EdiX

1 subcomments

This is completely wrong. First, disabling overcommit is wasteful because of fork and because of the way thread stacks are allocated. Sorry, you don't get exact memory accounting with C, not even Windows will do exact accounting of thread stacks.
Secondly, memory is a global resource, so you don't get local failures when it's exhausted. Whoever allocates first after memory has been exhausted will get an error; they might be the application responsible for the exhaustion or they might not be. They might crash on the error, or they might "handle it", keep going and render the system completely unusable.
No, exact accounting is not a solution. Ulimits and configuring the OOM killer are solutions.
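As a rough illustration of the ulimit approach, a per-process address-space limit makes an oversized allocation fail inside the offending process regardless of the system-wide overcommit mode. This is only a sketch, assuming Linux (where `RLIMIT_AS` is enforced) and a CPython interpreter that starts comfortably within the limit; `run_with_address_space_limit` and `CHILD_CODE` are hypothetical names:

```python
import resource
import subprocess
import sys

# Child program: tries to allocate 2 GiB and reports what happened.
CHILD_CODE = """
try:
    buf = bytearray(2 * 1024**3)
    print("allocated")
except MemoryError:
    print("MemoryError")
"""


def run_with_address_space_limit(limit_bytes: int) -> str:
    """Run the child under a soft RLIMIT_AS and return what it printed."""

    def apply_limit():
        # Runs in the child between fork and exec; keep the hard limit as is.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = limit_bytes
        if hard != resource.RLIM_INFINITY:
            soft = min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        preexec_fn=apply_limit,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # With a 1 GiB limit the 2 GiB request should be refused in the child.
    print(run_with_address_space_limit(1024**3))
```

The point of the sketch is that the failure lands in the process that exceeded its own budget, not in whichever process happens to allocate next.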

by vin10

0 subcomment

For anyone feeling brave enough to disable overcommit after reading this, be mindful that the default `vm.overcommit_ratio` is 50%. With no swap available, a system with 2GB of total RAM can't commit more than 1GB, and requests beyond that will fail with preemptive OOMs. (e.g. postgresql servers typically disable overcommit)
- https://github.com/torvalds/linux/blob/master/mm/util.c#L753
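The 2GB example works out as follows. This is a simplified sketch of the mode 2 limit: the kernel also excludes huge page reservations from the RAM figure, and `vm.overcommit_kbytes` can replace the ratio, neither of which is modeled here. The function name `commit_limit` is illustrative:

```python
GIB = 1024**3


def commit_limit(ram_bytes: int, swap_bytes: int, overcommit_ratio: int = 50) -> int:
    """Approximate mode 2 limit: swap + RAM * overcommit_ratio / 100."""
    return swap_bytes + ram_bytes * overcommit_ratio // 100


# 2 GiB of RAM, no swap, default ratio of 50.
limit = commit_limit(2 * GIB, 0)
print(limit / GIB)  # 1.0
```

Raising the ratio to 100, or adding swap, moves the limit accordingly: `commit_limit(2 * GIB, 1 * GIB, 100)` gives 3 GiB.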

by wmf

1 subcomments

This doesn't address the fact that forking large processes requires either overcommit or a lot of swap. That may be the source of the Redis problem.
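A back-of-the-envelope version of the fork problem, assuming that under strict accounting the child's private writable memory is charged against the commit limit a second time at fork, even though copy-on-write means most of it is never copied. The numbers and the name `fork_would_fit` are made up for illustration:

```python
GIB = 1024**3


def fork_would_fit(commit_limit: int, committed_as: int, parent_private_writable: int) -> bool:
    """True if charging the parent's private writable memory again stays within the limit."""
    return committed_as + parent_private_writable <= commit_limit


# Hypothetical host: 12 GiB already committed, 10 GiB of it by the process that forks.
no_swap = fork_would_fit(16 * GIB, 12 * GIB, 10 * GIB)    # limit 16 GiB: 22 GiB needed
with_swap = fork_would_fit(24 * GIB, 12 * GIB, 10 * GIB)  # 8 GiB of swap raises the limit
print(no_swap, with_swap)  # False True
```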

by laurencerowe

1 subcomments

Disabling overcommit on V8 servers like Deno will be incredibly inefficient. Your process might only need ~100MB of memory or so but V8's cppgc caged heap requires a 64GB allocation in order to get a 32GB aligned area in which to contain its pointers. This is a security measure to prevent any possibility of out of cage access.
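The 64GB figure follows from the usual over-reserve-then-align trick: reserving twice the alignment guarantees that an aligned region of the full size fits somewhere inside, whatever base address the kernel returns. The sketch below is only the arithmetic, not V8's actual code, and `aligned_cage_base` is a hypothetical name:

```python
GIB = 1024**3
CAGE_SIZE = 32 * GIB         # size and alignment of the cage, per the comment above
RESERVATION = 2 * CAGE_SIZE  # 64 GiB reserved to make alignment possible


def aligned_cage_base(reservation_base: int, alignment: int = CAGE_SIZE) -> int:
    """Round the reservation's base address up to the next multiple of alignment."""
    return (reservation_base + alignment - 1) // alignment * alignment


# An arbitrary, page-aligned but not 32 GiB-aligned base address.
base = 0x7F12_3456_7000
cage = aligned_cage_base(base)
assert cage % CAGE_SIZE == 0
assert base <= cage and cage + CAGE_SIZE <= base + RESERVATION
```

With overcommit enabled, the unused part of the reservation costs only address space; with strict accounting, whether it counts against the commit limit depends on how the mapping is created.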

by Animats

1 subcomments

Setting 2 is still pretty generous. It means "Kernel does not allow allocations that exceed swap + (RAM × overcommit_ratio / 100)." It's not a "never swap or overcommit" setting. You can still get into thrashing by memory overload.
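How close a system is to that limit can be read from `/proc/meminfo`, which reports both `CommitLimit` and `Committed_AS` in kB. A minimal parsing sketch; the sample text below is invented for illustration and `parse_commit_fields` is a hypothetical helper:

```python
def parse_commit_fields(meminfo_text: str) -> dict:
    """Extract CommitLimit and Committed_AS (in kB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        if name in ("CommitLimit", "Committed_AS"):
            fields[name] = int(rest.split()[0])
    return fields


# Made-up sample in the /proc/meminfo format (2 GiB RAM, no swap, ratio 50).
SAMPLE = """MemTotal:        2097152 kB
SwapTotal:             0 kB
CommitLimit:     1048576 kB
Committed_AS:     786432 kB
"""

fields = parse_commit_fields(SAMPLE)
headroom_kb = fields["CommitLimit"] - fields["Committed_AS"]
print(headroom_kb)  # 262144
```

On a real host the same function can be fed `open("/proc/meminfo").read()`.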
We may be entering an era when everyone in computing has to get serious about resource consumption. NVidia says GPUs are going to get more expensive for the next five years. DRAM prices are way up, and Samsung says it's not getting better for the next few years. Bulk electricity prices are up due to all those AI data centers. We have to assume for planning purposes that computing gets a little more expensive each year through at least 2030.
Somebody may make a breakthrough, but there's nothing in the fab pipeline likely to pay off before 2030, if then.

by deathanatos

0 subcomment

This is quite the bold statement to make with RAM prices sky high.
I want to agree with the locality of errors argument, and while in simple cases, yes, it holds true, it isn't necessarily true. If we don't overcommit, the allocation that kills us is simply the one that fails. Whether this allocation is the problematic one is a different question: if we have a slow leak that allocs and leaks once every 10k allocations, we're probably (9999 / 10k, assuming spherical allocations) going to fail on one that isn't the problem. We get about as much info as the oom-killer would have, anyways: this program is allocating too much.

by jleyank

1 subcomments

As I recall, this appeared in the 90’s and it was a real pain debugging then as well. Having errors deferred added a Heisenbug component to what should have been a quick, clean crash.
Has malloc ever returned zero since then? Or has somebody undone this, erm, feature at times?

by renehsz

2 subcomments

Strongly agree with this article. It highlights really well why overcommit is so harmful.
Memory overcommit means that once you run out of physical memory, the OOM killer will forcefully terminate your processes with no way to handle the error. This is fundamentally incompatible with the goal of writing robust and stable software which should handle out-of-memory situations gracefully.
But it feels like a lost cause these days...
So much software breaks once you turn off overcommit, even in situations where you're nowhere close to running out of physical memory.
What's not helping the situation is the fact that the kernel has no good page allocation API that differentiates between reserving and committing memory. Large virtual memory buffers that aren't fully committed can be very useful in certain situations. But it should be something a program has to ask for, not the default behavior.

by pizlonator

0 subcomment

This is such an old debate. The real answer, as with all such things, is "it depends".
Two reasons why overcommit is a good idea:
- It lets you reserve memory and use the dirtying of that memory to be the thing that commits it. Some algorithms and data structures rely on this strongly (i.e. you would have to use a significantly different algorithm, which is demonstrably slower or more memory intensive, if you couldn't rely on overcommit).
- Many applications have no story for out-of-memory other than halting. You can scream and yell at them to do better, but that won't help, because those apps that find themselves in that supposedly-bad situation ended up there for complex and well-considered reasons. My favorite: having complex OOM error handling paths is the worst kind of attack surface, since it's hard to get test coverage for it. So, it's better to just have the program killed instead, because that nixes the untested code path. For those programs, there's zero value in having the memory allocator be able to report OOM conditions other than by asserting in prod that mmap/madvise always succeed, which then means that the value of not overcommitting is much smaller.
Are there server apps where the value of gracefully handling out of memory errors outweighs the perf benefits of overcommit and the attack surface mitigation of halting on OOM? Yeah! But I bet that not all server apps fall into that bucket.
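The first reason, reserving memory and letting writes commit it, can be sketched with a private anonymous mapping. This assumes a Unix-like system; `sparse_buffer` is a hypothetical name. With overcommit enabled, only the pages actually written are expected to consume physical memory, while under strict accounting the whole mapping would count against the commit limit up front:

```python
import mmap

SIZE = 64 * 1024 * 1024  # 64 MiB of address space
PAGE = mmap.PAGESIZE


def sparse_buffer(size: int = SIZE) -> mmap.mmap:
    """Reserve a zero-filled, private anonymous region of the given size."""
    return mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)


buf = sparse_buffer()
buf[0] = 1            # dirties the first page
buf[SIZE - PAGE] = 2  # dirties the last page; everything in between is untouched
print(len(buf), buf[0], buf[PAGE], buf[SIZE - PAGE])
```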

by charcircuit

0 subcomment

>Would you rather debug a crash at the allocation site
The allocation site is not necessarily what is leaking memory. What you actually want in either case is a memory dump where you can tell what is leaking or using the memory.

by blibble

2 subcomments

Redis uses the copy-on-write property of fork() to implement saving, which is elegant and completely legitimate.

by jcalvinowens

3 subcomments

There's a reason nobody does this: RAM is expensive. Disabling overcommit on your typical server workload will waste a great deal of it. TFA completely ignores this.
This is one of those classic money vs idealism things. In my experience, the money always wins this particular argument: nobody is going to buy more RAM for you so you can do this.