On a system with integrated graphics, 8 physical (16 logical) cores, and 32 GB of system memory, I achieve what appears to be optimal performance using:
zramen --algorithm zstd --size 200 --priority 100 --max-size 131072 make
sysctl vm.swappiness=180
sysctl vm.page-cluster=0
sysctl vm.vfs_cache_pressure=200
sysctl vm.dirty_background_ratio=1
sysctl vm.dirty_ratio=2
sysctl vm.watermark_boost_factor=0
sysctl vm.watermark_scale_factor=125
sysctl kernel.nmi_watchdog=0
sysctl vm.min_free_kbytes=150000
sysctl vm.dirty_expire_centisecs=1500
sysctl vm.dirty_writeback_centisecs=1500
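A quick way to verify the resulting compression factor, assuming the device is zram0 and the mm_stat column order documented in the kernel's zram.rst (orig_data_size first, compr_data_size second):

# compression factor = orig_data_size / compr_data_size
awk '{ printf "compression factor: %.2f\n", $1 / $2 }' /sys/block/zram0/mm_stat

zramctl prints a similar per-device summary. Note that neither the zramen invocation nor these sysctls persist across reboots; a boot-time service and an /etc/sysctl.d/*.conf file, respectively, cover that.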
Compression factor tends to stay above 3.0. At very little cost I more than doubled my effective system memory. If an individual workload uses a significant fraction of system memory at once, complications may arise.

I had considered some kind of test where each parameter is perturbed a bit in sequence, so that you get an estimate of a point partial derivative, and then doing an iterative hill climb. That probably won't work well in my case, since the devices I'm optimizing have too much variance to give a clear signal on benchmarks of reasonable duration.
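For reference, one pass of that coordinate-wise search could look roughly like this; a sketch only, where bench_score is a hypothetical stand-in for any benchmark that prints a single integer (higher is better) and the ranges are illustrative:

tune() {
  knob=$1 lo=$2 hi=$3 step=$4
  best=$(sysctl -n "$knob")
  best_score=$(bench_score)
  # perturb one step in each direction and keep whichever scores best
  for cand in $((best - step)) $((best + step)); do
    if [ "$cand" -lt "$lo" ] || [ "$cand" -gt "$hi" ]; then continue; fi
    sysctl -q "$knob=$cand"
    score=$(bench_score)
    if [ "$score" -gt "$best_score" ]; then best=$cand best_score=$score; fi
  done
  sysctl -q "$knob=$best"
}
tune vm.swappiness 0 200 10
tune vm.vfs_cache_pressure 50 500 25
tune vm.watermark_scale_factor 10 1000 25

Repeating the passes until scores stop improving is the hill climb; the variance problem above is exactly that bench_score has to be stable enough for a one-step perturbation to move it more than the noise floor.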
I use lz4-rle as the first layer, but if a page sits idle for an hour it gets recompressed with zstd level 22 in the background.
It's a great balance of responsiveness vs. compression ratio.
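Recent kernels can do this natively. A sketch of that route, assuming the device is zram0, a kernel with CONFIG_ZRAM_MULTI_COMP (6.2+) for the recompression attributes, CONFIG_ZRAM_TRACK_ENTRY_ACTIME for time-based idle marking, and the newer algorithm_params attribute for setting the level; a cron job or systemd timer rerunning the last two writes supplies the "background" part:

# register zstd as a secondary algorithm for recompression
echo "algo=zstd priority=1" > /sys/block/zram0/recomp_algorithm
# set its compression level (passed through to the backend)
echo "priority=1 level=22" > /sys/block/zram0/algorithm_params
# mark pages not touched in the last hour (3600 s) as idle
echo 3600 > /sys/block/zram0/idle
# recompress everything currently marked idle
echo "type=idle" > /sys/block/zram0/recompress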