About "Since allocations are expensive due to them requiring a system interaction via syscalls" and "For this only bump allocation is an option, simply because syscalls are slow, multiple for many small objects are even slower and hot paths would explode" there's no reason that a free-list allocator has to do more syscalls than a bump allocator; any malloc under the sun is going to have at least one layer of caching before it makes a syscall for more memory. You could do a doubling scheme like with the bump allocator, or allocate a lot at a time; mimalloc for example requests segments of 4MiB from the kernel [2].
[0] It's approximately steps #1 and #3 fused together, then #2 in https://en.wikipedia.org/wiki/Mark%E2%80%93compact_algorithm...
[1] Something like for (void *object = segment->start; object < segment->end; object += size_in_words(object))
[2] https://www.microsoft.com/en-us/research/wp-content/uploads/...