[disclaimer] Without brushing up on the details of this, I strongly suspect that this is more about removing the need for executable stacks than about performance. Allocating a trampoline on the stack rather than the heap is good for efficiency.
These days, many GNU/Linux distros are disabling executable stacks by default in their toolchain configuration, both for building the distro and for the toolchain offered by the system to the user.
When you use GCC local functions, GCC overrides the linker's default so that the executable is marked as requiring an executable stack.
Of course, that is a security concession: when your stack is executable, remote code execution exploits that inject code into the stack via a buffer overflow and trick the process into jumping to it can work.
If trampolines can be allocated in a heap, then you don't need an executable stack. You do need an executable heap, or an executable dedicated heap for these allocations. (Trampolines are all the same size, so they could be packed into an array.)
Code that calls GCC local functions through a pointer is not aware of the trampolines. Stack-allocated trampolines are deallocated naturally when the stack rolls back on function return or longjmp, or when a C++ exception passes through.
Heap-allocated trampolines have an obvious deallocation problem; it would be interesting to see what strategy is used for that.
The most striking surprise is the magnitude of the gap between std::function and std::function_ref. It turns out std::function (the owning container) forces "copy-by-value" semantics deep into the recursion. In the "Man-or-Boy" test, this apparently causes an exponential explosion of copies of the closure state at every recursive step. std::function_ref (the non-owning view) avoids this entirely.
Therefore it's very jarring to read this text after the first C code example:
This uses a static variable to have it persist between both the compare function calls that qsort makes and the main call which (potentially) changes its value to be 1 instead of 0
This feels completely made up, and/or some confusion about things that I would expect an author of a piece like this to really know.
In reality, in this usage (at the global outermost scope level) `static` has nothing to do with persistence. All it does is make the variable "private" to the translation unit (C parlance, read as "C source code file"). The value will "persist" since the global outermost scope can't go out of scope while the program is running.
It's different when used inside a function: there it makes the value persist between invocations, in practice typically by moving the variable from the stack to the program's static data area (not the heap), which is set up as the program loads. Note that the C standard does not mention the existence of a stack for local variables, but of course that is the typical implementation on modern systems.
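To make the two meanings of `static` concrete, here is a minimal sketch in standard C. The names `file_private_counter`, `shared_counter`, `bump_file_counter` and `next_id` are made up for illustration and do not come from the article.

```c
/* File scope: `static` only gives the name internal linkage. The object
   has static storage duration with or without the keyword. */
static int file_private_counter = 0;

/* No `static`: visible to other translation units, equally persistent. */
int shared_counter = 0;

int bump_file_counter(void) {
    return ++file_private_counter;
}

int next_id(void) {
    /* Block scope: here `static` changes the storage duration, so the
       value survives between calls instead of being re-created. */
    static int id = 0;
    return ++id;
}
```

Calling `next_id()` twice yields 1 and then 2, which is the "persistence" that the quoted text attributed to the file-scope use.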
Something I've been thinking about lately is having a "state" keyword for declaring variables in a "stateful" function. This works just like "static", except that instead of there being a single global instance of each variable, the variables are added to an automatically defined struct, whose type is available using "statetype(foo)" or some other mechanism. You can then invoke foo with an instance of the state (in C this would be an explicit first parameter, also marked with the "state" keyword). Stateful functions are colored in the sense that if you invoke a nested stateful function, its state gets added to the caller's state. This probably won't fly with separate compilation though.
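As a rough illustration of what such a "state" keyword might desugar to, here is a hand-written sketch in plain C. Everything in it is an assumption: the keyword does not exist, and `counter`, `averager`, the struct names and the `*_STATE_INIT` macros are hypothetical stand-ins for what a compiler could generate.

```c
/* Hypothetical source, using the imagined keyword:
       int counter(int step) { state int total = 0; total += step; return total; }
   Below is what statetype(counter) and the rewritten function could look like. */
struct counter_state {
    int total;
};

/* Initial values of the `state` variables. */
#define COUNTER_STATE_INIT { 0 }

/* The state instance becomes an explicit first parameter. */
int counter(struct counter_state *st, int step) {
    st->total += step;
    return st->total;
}

/* A stateful caller embeds the callee's state in its own struct, which is
   the "coloring" described in the comment. */
struct averager_state {
    struct counter_state sum;   /* state of the nested stateful call */
    int samples;
};

#define AVERAGER_STATE_INIT { COUNTER_STATE_INIT, 0 }

int averager(struct averager_state *st, int value) {
    int total = counter(&st->sum, value);
    st->samples += 1;
    return total / st->samples;   /* integer running average */
}
```

Two `averager_state` instances evolve independently, which is the difference from `static` locals. The layout of `averager_state` depends on the body of `counter`, which hints at why separate compilation would be awkward.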
int main(int argc, char* argv[]) {
    if (argc > 1) {
        char* r_loc = strchr(argv[1], 'r');
        if (r_loc != NULL) {
            ptrdiff_t r_from_start = (r_loc - argv[1]);
            if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
                in_reverse = 1;
            }
        }
    }
    ...
}
Why not
if (argc > 1 && strcmp(argv[1], "-r") == 0) {
    in_reverse = 1;
}
for example?
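For what it's worth, the two checks quoted above do seem to accept exactly the same argument. A small sketch that wraps each in a function so they can be compared side by side; the function names are made up here and are not from the article.

```c
#include <stddef.h>
#include <string.h>

/* The article's check, as quoted above, wrapped in a function. */
int is_reverse_flag_original(const char *arg) {
    const char *r_loc = strchr(arg, 'r');
    if (r_loc != NULL) {
        ptrdiff_t r_from_start = r_loc - arg;
        /* First 'r' at index 1, preceded by '-', and nothing after it. */
        if (r_from_start == 1 && arg[0] == '-' && strlen(r_loc) == 1) {
            return 1;
        }
    }
    return 0;
}

/* The suggested replacement. */
int is_reverse_flag_simple(const char *arg) {
    return strcmp(arg, "-r") == 0;
}
```

Both return 1 only for the exact string "-r": the first version requires the first 'r' to sit at index 1 after a '-' and to be the last character, which pins the whole string down.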
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3654.pdf
(and I am not impressed by micro benchmarks)
With "static", it is implemented as an ordinary function, but the name is local to the function that contains it; it cannot access stuff within the function containing it unless those things are also declared as "static".
With "register", the address of the function cannot be taken, and if the function accesses other stuff within the function that contains it then the compiler will add additional arguments to the function so that its type does not necessarily match the type which is specified in the program.
This is not good enough for many uses though, so having the other extensions would also be helpful (possibly including implementing Apple Blocks in GCC).
You can call the local functions directly and get the benefits of the specialized code.
There's no way to spell out this function's type, and no way to store it anywhere. This is true of regular functions too!
To pass it around you need to use the type-erased "fat pointer" version.
I don't see how anything else makes sense for C.
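A minimal sketch of the type-erased "fat pointer" idea in standard C, without GCC extensions. `IntFn`, `adder_env` and the function names are made up for illustration; the only point is that the code pointer and its environment travel together.

```c
/* Type-erased callable: a code pointer plus an opaque environment. */
typedef struct {
    int (*call)(void *env, int arg);
    void *env;
} IntFn;

/* Captured state for one particular "closure". */
struct adder_env {
    int offset;
};

static int adder_call(void *env, int arg) {
    const struct adder_env *e = env;
    return arg + e->offset;
}

/* A consumer that knows nothing about the environment's layout. */
int apply_twice(IntFn f, int x) {
    return f.call(f.env, f.call(f.env, x));
}

int add_five_twice(int x) {
    struct adder_env env = { 5 };        /* lives in this frame */
    IntFn f = { adder_call, &env };      /* valid only while env is alive */
    return apply_twice(f, x);
}
```

No trampoline is needed because the environment is passed explicitly. The cost is that every consumer has to accept the two-word `IntFn` rather than a bare function pointer.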
(I can't be bothered to run his benchmarks)
#include <stdio.h>
typedef struct env_ E;
typedef struct fat_ptr_ Fp;
typedef int fn(E*);
struct fat_ptr_ {
    fn *f;
    E *e;
};
#define INT(body) ({ int lambda(E*){ return body; }; (Fp){lambda,0}; })
struct env_ {
    int k;
    Fp xl; Fp x2; Fp x3; Fp x4;
};
#define FpMk(fn,e) {fn, e}
#define FpCall(fn) (fn.f(fn.e))
int main(){
    int a(E env, Fp x5){
        int b(E *ep){
            return a( (E){--(ep->k), FpMk(b, ep), ep->xl, ep->x2, ep->x3}, ep->x4 );
        }
        return env.k<=0 ? FpCall(env.x4) + FpCall(x5) : b(&env);
    }
    printf(" %d\n", a( (E){10, INT(1), INT(-1), INT(-1), INT(1)}, INT(0)) );
}
I have a case where I need to create a static templated lambda to be passed to C as a pointer. Such a thing is impossible in Rust, which I considered at first.
// imagine my_function takes 3 ints, the first 2 args are captured and curried.
Function<void(int)> my_closure(&my_function, 1, 2);
my_closure(3);
I've never implemented it myself, as I don't use C++ features all too much, but as a pet project I'd like to someday. I wonder how something like that compares!
Practically speaking all lambda options except for the one involving allocation (why would you even do that) are equivalent modulo inlining.
In particular, the caveat with the type erasure/helper variants is precisely that they prevent inlining, but given that everything is in the same translation unit and isn't runtime-driven, it's still possible for the compiler to devirtualize the calls.
I think it would be more interesting to make measurements when controlling explicitly whether inlining happens or the function type can be deduced statically.