“This is why Linux now provides rseq() which is a much more enlightened solution. With restartable sequences, you actually can get rid of both the mutex and atomics, while the OS continues to fully abstract scheduling. The way it works is you advise the kernel whenever your program enters a critical section of code that you don't want interrupted. It's probably going to be maybe 10 assembly instructions tops. The first assembly opcode should be a move instruction that sets the rseq_cs field. The last instruction needs to be the thing that makes the modification to your global data structure. Think of it sort of like a really tiny database transaction. What makes it go fast, is that the bidirectional communication with the kernel happens via shared memory.”
https://github.com/compudj/librseq
This library has helpers for common use cases like counters and linked lists, so in most applications you shouldn't need to write any assembly to use rseq.
The key insight is that the preempter can inspect the program counter of the code being preempted (which is stable now that the code has been stopped) and act accordingly. The simplest mechanism is to reset the preempted thread's program counter to the start of the critical section if it was inside one. The more generic mechanism is to redirect the thread to an address it supplied in advance. This allows you to do things like hard abort and more.
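To make that concrete, here is a rough user-space model of the check the preempter performs. It is only a sketch: the three address fields mirror the ones in the kernel's `struct rseq_cs`, but `sim_rseq_cs` and `sim_resume_ip` are names I made up for illustration, and nothing here talks to the real kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a critical-section descriptor. The three fields
 * mirror the address fields of the kernel's struct rseq_cs; the version
 * and flags fields are left out to keep the sketch short. */
struct sim_rseq_cs {
    uint64_t start_ip;           /* first instruction of the critical section */
    uint64_t post_commit_offset; /* section length; ends right after the commit */
    uint64_t abort_ip;           /* where to resume if the section was interrupted */
};

/* Hypothetical preempter-side helper: given the program counter at which a
 * thread was stopped, return the program counter it should resume at. */
uint64_t sim_resume_ip(uint64_t preempted_ip, const struct sim_rseq_cs *cs)
{
    if (cs == NULL)
        return preempted_ip; /* no critical section registered */

    /* Unsigned subtraction: an ip below start_ip wraps around to a huge
     * value, so this single comparison checks both bounds. */
    if (preempted_ip - cs->start_ip < cs->post_commit_offset)
        return cs->abort_ip; /* interrupted mid-section: redirect */

    return preempted_ip;     /* outside the section: resume unchanged */
}
```

With a section at `0x1000` of length `0x20` and an abort address of `0x2000`, a thread stopped at `0x1008` resumes at `0x2000`, while a thread stopped at `0x1020` (just past the commit instruction) resumes where it was. The "hard abort" variant is the same check with `abort_ip` pointing at code that gives up instead of retrying.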
You can further remove the need for the preempter to understand the preempted code by having the preempted code provide its own self-introspection snippet, which is handed the program counter at the moment of preemption. The preempter then just vectors the thread to its own code, which knows how to interpret its own state at any preemption point.
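A sketch of that self-introspection variant, again as a plain user-space model rather than anything the kernel actually exposes. Every name here (`sim_thread`, `sim_preempt`, `sim_section`, `sim_restart_if_inside`) is hypothetical, and the vectoring is modelled as an ordinary function call instead of a real program-counter redirect.

```c
#include <stddef.h>
#include <stdint.h>

/* Handler supplied by the preempted code: receives the program counter at
 * preemption and returns the program counter to resume at. */
typedef uint64_t (*sim_introspect_fn)(uint64_t preempted_ip, void *state);

struct sim_thread {
    sim_introspect_fn introspect; /* may be NULL if the thread opted out */
    void *state;                  /* opaque to the preempter */
};

/* Preempter side: knows nothing about critical sections. It only hands the
 * preempted program counter to the thread's own handler. */
uint64_t sim_preempt(const struct sim_thread *t, uint64_t preempted_ip)
{
    if (t->introspect == NULL)
        return preempted_ip;
    return t->introspect(preempted_ip, t->state);
}

/* Thread side: one possible handler. It restarts a critical section if the
 * thread was stopped inside it and counts how often that happened. */
struct sim_section {
    uint64_t start_ip;   /* inclusive */
    uint64_t end_ip;     /* exclusive */
    uint64_t restart_ip; /* where to resume after an interruption */
    unsigned aborts;     /* number of interrupted attempts */
};

uint64_t sim_restart_if_inside(uint64_t preempted_ip, void *state)
{
    struct sim_section *s = state;
    if (preempted_ip >= s->start_ip && preempted_ip < s->end_ip) {
        s->aborts++;
        return s->restart_ip;
    }
    return preempted_ip;
}
```

The point of the split is that `sim_preempt` never changes when the thread invents a new kind of critical section; only the handler the thread registered does.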