FRESH

Hacker News

Home

How We Found 7 TiB of Memory Just Sitting Around

205 points by anurag

by Aeolun

4 subcomments

I read this and I have to wonder, did anyone ever think it was reasonable that a cluster that apparently needed only 120gb of memory was consuming 1.2TB just for logging (or whatever vector does)

by nitinreddy88

1 subcomments

The other way to look is why adding NS label is causing so much memory footprint in Kubernetes. Shouldn't be fixing that (could be much bigger design change), will benefit whole Kube community?

by shanemhansen

1 subcomments

The unreasonable effectiveness of profiling and digging deep strikes again.

by liampulles

1 subcomments

I'm a little surprised that it got to the point where pods which should consume a couple MB of RAM were consuming 4GB before action was taken. But I can also kind of understand it, because the way k8s operators (apps running in k8s that manipulate k8s resource) are meant to run is essentially a loop of listing resources, comparing to spec, and making moves to try and bring the state of the cluster closer to spec. This reconciliation loop is simple to understand (and I think this benefit has led to the creation of a wide array of excellent open source and proprietary operators that can be added to clusters). But its also a recipe for cascading explosions in resource usage.
These kind of resource explosions are something I see all the time in k8s clusters. The general advice is to always try and keep pressure off the k8s API, and the consequence is that one must be very minimal and tactical with the operators one installs, and then engage in many hours of work trying to fine tune each operator to run efficiently (e.g. Grafana, whose default helm settings do not use the recommended log indexing algorithm, and which needs to be tweaked to get an appropriate set of read vs. write pods for your situation).
Again, I recognize there is a tradeoff here - the simplicity and openness of the k8s API is what has led to a flourish of new operators, which really has allowed one to run "their own cloud". But there is definitely a cost. I don't know what the solution is, and I'm curious to hear from people who have other views of it, or use other solutions to k8s which offer a different set of tradeoffs.

by hinkley

1 subcomments

Keys require O(logn) space per key or nlogn for the entire data set, simply to avoid key collisions. But human friendly key spaces grow much, much faster and I don’t think many people have looked too hard at that.
There were recent changes to the NodeJS Prometheus client that eliminates tag names from the keys used for storing the tag cardinality for metrics. The memory savings wasn’t reported but the cpu savings for recording data points was over 1/3. And about twice that when applied to the aggregation logic.
Lookups are rarely O(1), even in hash tables.
I wonder if there’s a general solution for keeping names concise without triggering transposition or reading comprehension errors. And what the space complexity is of such an algorithm.

by timzaman

0 subcomment

7tib.. that's like 3 servers..