> Why you should never suspend a thread in your own process.
This sounds like a good general principle, but suspending threads in your own process is kind of necessary for e.g. many GC algorithms. Now imagine multiple of those runtimes running in the same process.
The tricky part is ensuring that the signal handler code is async-signal-safe (which pretty much boils down to "ensure you're not acquiring any locks and be careful about reentrant code"), but at least that only has to be verified for a self-contained small function.
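To make the "self-contained small function" point concrete, here is a minimal sketch of a handler that stays async-signal-safe by doing nothing except setting a flag. The names `on_sample_signal`, `install_sample_handler` and `take_sample_request`, and the choice of `SIGUSR1`, are made up for illustration; a real sampler or GC would do more work, and every extra call would need the same scrutiny.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* sig_atomic_t is the one object type the C standard guarantees can be
 * written from a signal handler and read from normal code. */
static volatile sig_atomic_t sample_requested = 0;

/* The handler: no locks, no malloc, no stdio, nothing reentrant.
 * It only records that the signal arrived. */
static void on_sample_signal(int signo) {
    (void)signo;
    sample_requested = 1;
}

/* Install the handler for SIGUSR1 (an arbitrary choice for this sketch).
 * Returns 0 on success, -1 on failure, like sigaction itself. */
static int install_sample_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sample_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART; /* restart interrupted syscalls where possible */
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called from normal code: report whether a request is pending and clear it.
 * A signal landing between the read and the clear is dropped; that is
 * acceptable for a sketch but not for anything that must count signals. */
static int take_sample_request(void) {
    int was_set = sample_requested;
    sample_requested = 0;
    return was_set;
}
```

The heavy lifting then happens outside the handler, in code that is free to take locks, whenever `take_sample_request()` returns nonzero.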
Is there anything similar to signals on Windows?
Why was the service holding things up? Because it was waiting on acquiring a lock held by one of its other threads.
What was that other thread doing? It was deadlocked because it tried to recursively acquire an exclusive srwlock (exactly what the docs say will happen if you try).
Why was it even trying to reacquire said lock? Ultimately because of a buffer overrun that ended up overwriting some important structures.
Just curious, is this customer a game studio? I have never done any serious system programming but the gist feels like one.
Unfortunately sometimes you don't have the luxury of being able to do this (e.g. on iOS, especially pre-MetricKit). We shipped one such implementation in the Twitter app (which was still there last I checked) and as far as I can tell it's safe, but mostly by accident: I didn't want to pause things for very long, so the code just suspends the thread, grabs register state, then writes the backtrace to a stack buffer before resuming. I originally wanted to grab traces without suspending the process, which is something you can actually "do" because getting register state doesn't require suspension and you need to put guards on your frame decoding anyway ("is this address I am about to dereference actually in the stack?"). But unfortunately after thinking about it I added the suspension back, because trying to collect a trace from a running thread could give you a fragmented backtrace as the thread modifies its stack out from under you.
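For anyone wondering what those guards on frame decoding look like, here is a rough sketch, not the code that shipped. It assumes the usual frame-pointer layout on x86-64 and arm64 (saved caller frame pointer first, return address in the next word) and a stack that grows downward. The names `stack_bounds`, `frame_in_stack` and `walk_frames` are hypothetical, and how you obtain the target thread's stack bounds and initial frame pointer is platform-specific and left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Address range of the target thread's stack: [lo, hi). */
typedef struct {
    uintptr_t lo;
    uintptr_t hi;
} stack_bounds;

/* "Is this address I am about to dereference actually in the stack?"
 * A frame record is two words, so both must fit below hi. */
static int frame_in_stack(const stack_bounds *b, uintptr_t fp) {
    const uintptr_t need = 2 * sizeof(uintptr_t);
    if (fp % sizeof(uintptr_t) != 0) return 0;  /* misaligned: garbage */
    if (fp < b->lo || fp > b->hi) return 0;     /* outside the stack */
    if (b->hi - fp < need) return 0;            /* record would run off the end */
    return 1;
}

/* Follow the frame-pointer chain starting at fp, storing up to max return
 * addresses in out. Returns the number of addresses written. */
static size_t walk_frames(const stack_bounds *b, uintptr_t fp,
                          uintptr_t *out, size_t max) {
    size_t n = 0;
    while (n < max && frame_in_stack(b, fp)) {
        const uintptr_t *frame = (const uintptr_t *)fp;
        uintptr_t next_fp = frame[0]; /* saved caller frame pointer */
        out[n++] = frame[1];          /* return address */
        /* Caller frames live at higher addresses on a downward-growing
         * stack, so a link that does not move up is a loop or corruption. */
        if (next_fp <= fp) break;
        fp = next_fp;
    }
    return n;
}
```

The guards only keep the walker from dereferencing junk. They do nothing for consistency: if the thread is still running, each link can be valid on its own while the chain as a whole mixes frames from different moments, which is why the suspension went back in.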
The correct terminology is 'stopped responding', Raymond. You need to consult the style guide.