I am also not entirely sure whether "manorboy" is a good benchmark, but for estimating function call overhead it should be ok. For the variable-access part, other considerations are far more important IMHO, so I would not put too much weight on the accessing "k" part of the post.
n3654 contains a criticism of the lambda approach from a language-design perspective. The issue is that it does not seem to be a good fit for C. The assumption that copying values is always cheap obviously holds in toy examples, but not necessarily for interesting data structures. In C++ you could then capture a pointer by value, but then you haven't avoided an indirection either. In general, this is fine in C++, as smart pointers can then deal with the memory management, but in C, having a captured value in a lambda that you no longer have explicit access to does not make much sense.
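To make the copy-versus-indirection trade-off concrete, here is a rough C sketch. All names (`struct big`, `env_by_value`, `env_by_pointer`, and the accessor functions) are made up for illustration and are not taken from n3654 or the post. It hand-writes the two kinds of "captured environment": one that copies the data, one that only keeps a pointer to it.

```c
/* Hypothetical illustration only: none of these names come from n3654. */

/* Stand-in for an "interesting" data structure that is not cheap to copy. */
struct big {
    int data[1024];
};

/* Capture by value: the whole array is copied when the environment is built. */
struct env_by_value {
    struct big copy;
};

/* Capture a pointer by value: cheap to build, but every access goes
 * through an indirection, and the environment does not own the data. */
struct env_by_pointer {
    const struct big *ref;
};

/* Reads from the private snapshot taken at capture time. */
int first_by_value(const struct env_by_value *e)
{
    return e->copy.data[0];
}

/* Reads through the pointer, so it sees the current state of the original. */
int first_by_pointer(const struct env_by_pointer *e)
{
    return e->ref->data[0];
}
```

If the original `struct big` is modified after both environments are built, `first_by_value` still returns the old value while `first_by_pointer` returns the new one. With the pointer version it is also the caller's job to keep the original alive for as long as the environment is used.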
The "unfortunately, is that unlike C++ there are no templates in C" part is also interesting. I fled from C++ back to C exactly because of templates. In a performance context, the fallacy is that you can always create super-optimized code using compile-time techniques that absolutely shines in microbenchmarks (such as this one) but causes a lot of bloat at larger scale (and long compilation times). If you want this, I think you should stick to C++.
So, we need to swap to the logarithmic graphs to get a better picture.
I wish more people knew about decibels.
C does not have closures. You could simulate closures, but it is neither robust nor automatic compared to languages that truly support them.
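For reference, the usual way to simulate one by hand looks roughly like the sketch below: a function pointer paired with a `void *` environment. The names (`struct closure`, `adder_env`, `make_adder`, `closure_call`) are invented for this example, not from any library. It also shows why this is neither robust nor automatic: the environment struct is written manually, the `void *` cast is unchecked, and the caller has to provide and manage the storage.

```c
/* Hypothetical names throughout; this is a sketch, not a standard idiom's API. */

/* A hand-rolled "closure": code pointer plus a pointer to captured state. */
struct closure {
    int (*fn)(void *env, int arg);
    void *env;
};

/* The captured variables must be spelled out manually. */
struct adder_env {
    int k; /* captured by value */
};

/* The function body receives its environment explicitly. */
int adder_fn(void *env, int arg)
{
    struct adder_env *e = env; /* unchecked conversion from void * */
    return e->k + arg;
}

/* The caller supplies the storage, so its lifetime is the caller's problem. */
struct closure make_adder(struct adder_env *storage, int k)
{
    storage->k = k;
    return (struct closure){ adder_fn, storage };
}

/* Invoking the closure means passing the environment back in by hand. */
int closure_call(struct closure c, int arg)
{
    return c.fn(c.env, arg);
}
```

Usage would be something like `struct adder_env env; struct closure add5 = make_adder(&env, 5);` followed by `closure_call(add5, 10)`. Nothing stops `add5` from outliving `env`, which is exactly the kind of thing the compiler handles for you in a language with real closures.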