rather than "What Python's asyncio primitives get wrong" this seems more like "why we chose one asyncio primitive (queue) instead of others (event and condition)"
also, halfway through the post, the problem grows a new requirement:
> Instead of waking consumers and asking "is the current state what you want?", buffer every transition into a per-consumer queue. Each consumer drains its own queue and checks each transition individually. The consumer never misses a state.
if buffering every state change is a requirement, then...yeah, you're gonna need a buffer of some kind. the previous proposed solutions (polling, event, condition) would never have worked.
given the full requirements up-front, you can jump straight to "just use a queue" - with the downside that it would make for a less interesting blog post.
also, this is using queues without any size limit, which seems like a memory leak waiting to happen if events ever get enqueued more quickly than they can be consumed. notably, this could not happen with the simpler use cases that could be satisfied by events and conditions.
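A bounded queue makes that failure mode explicit. A minimal sketch of one possible policy (names and the drop-oldest eviction are my own, not the article's) that keeps memory bounded when a consumer falls behind:

```python
import asyncio

def publish(queues: list[asyncio.Queue], transition: str) -> None:
    """Fan one state transition out to every per-consumer queue."""
    for q in queues:
        if q.full():
            q.get_nowait()  # evict the oldest buffered transition
        q.put_nowait(transition)

async def main():
    # one consumer whose queue only holds the two most recent transitions
    queues = [asyncio.Queue(maxsize=2)]
    for t in ("starting", "running", "closing"):
        publish(queues, t)
    drained = []
    while not queues[0].empty():
        drained.append(queues[0].get_nowait())
    return drained

result = asyncio.run(main())
print(result)  # ['running', 'closing'] -- 'starting' was evicted
```

Dropping the oldest entry is only one policy; blocking the producer or failing loudly are equally valid, but any of them beats silent unbounded growth.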
> A threading.Lock protects the value and queue list.
unless I'm missing something obvious, this seems like it should be an asyncio.Lock?
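Assuming the whole thing runs on a single event loop (an assumption; the post's full code isn't quoted here), the async-native version would look roughly like this, with asyncio.Lock suspending contended tasks instead of blocking the event loop the way threading.Lock would:

```python
import asyncio

value = None
queues: list[asyncio.Queue] = []
lock = asyncio.Lock()  # cooperative: contended waiters suspend, the loop keeps running

async def set_value(new_value):
    """Update the shared value and fan it out, all under the async lock."""
    global value
    async with lock:
        value = new_value
        for q in queues:
            q.put_nowait(new_value)

async def main():
    q = asyncio.Queue()
    queues.append(q)
    await set_value("closing")
    return await q.get()

result = asyncio.run(main())
print(result)  # closing
```

A threading.Lock would be defensible only if set_state can also be called from worker threads outside the loop, but then acquiring it from a coroutine stalls every other task.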
But coroutines still interleave execution at every await point, so shared mutable state can become just as fragile as in multithreaded code — the scheduling boundary just moves from OS threads to cooperative yield points.
In practice that tends to push designs toward queues, actors, or message-passing patterns if you want to avoid subtle state corruption.
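The fragility is easy to demonstrate. In this sketch (a minimal made-up example, not from the article), every coroutine reads the shared counter, yields at an await, and then writes back a stale value:

```python
import asyncio

counter = 0

async def unsafe_increment():
    global counter
    current = counter       # read shared state
    await asyncio.sleep(0)  # yield point: every other task gets to run here
    counter = current + 1   # write back a now-stale value

async def main():
    global counter
    counter = 0
    await asyncio.gather(*(unsafe_increment() for _ in range(100)))
    return counter

result = asyncio.run(main())
print(result)  # 1, not 100: every task read 0 before any wrote
```

No threads involved, yet the lost-update bug is identical to the classic multithreaded one; the only difference is that the interleaving happens exactly at the await.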
In attempt 2, the old-school C way of writing the state machine would work just fine in Python, avoid a bunch of the boilerplate, and avoid the "state setter needs to know a bunch of stuff" problem. Basically, you make the states into a table and put the methods you need in the table, so in Python a dictionary is convenient. Then you have
> def set_state(new_state):
>     global state
>     state = new_state
>     events[new_state].set()
Aaand you're done. When you add a new state, you add an event corresponding to that state into the events table. If the stuff you would put into a conditional in set_state is more complicated, you could make a state-transition method and link to it in the table. Or you could make a nested dict or whatever. It's not hard, and the fact that the author doesn't know an idiomatic way to write an FSM definitely isn't something that's wrong with Python's asyncio and shared state. In general, if you're writing a state machine and you have a lot of "if curr_state == SOME_STATE" logic, chances are it would be better if you used tables.
You can often take the naive solution and it will be the correct one. Your code will look like your intent.
TFA's first attempt:
    async def drain_requests():
        while state != "closing":
            await asyncio.sleep(0.1)
        print("draining pending requests")
Got it. Let's port it to STM:

    let drain_requests = do
          atomically $ do
            s <- readTVar state
            when (s /= "closing") retry
          putStrLn "draining pending requests"
Thread-safe and no busy-waiting. No mention of 'notify' or 'sleep'. No attempt to evade the concurrency issues, as in the article's "The fix: per-consumer queues - Each consumer drains its own queue and checks each transition individually."

There are so many solutions in the middle. I have this theory that most people who get into async don't really know what threading is. Maybe they have a world view where, before 2023, Python just could not do more than one thing at once, that's what the GIL was, right? But now, after 3.12, Guido really pulled himself up by his bootstraps, removed the GIL, and implemented async, and now Python can do more than one thing at a time, so they start learning about async to be able to do more than one thing at a time.
There is a huge disconnect between what Python devs are actually building, which is a different API for concurrency, and some junior devs who think they are learning bleeding-edge stuff when they are actually learning fundamentals through a very contrived lens.
It 100% comes from ex-Node devs. I'll spare you the Node criticism, but Node has a very specific concurrency model, and Node devs who try out Python sometimes reach for asyncio as a way to soften the learning curve of the new language. And that's how they get into this mess.
The Python devs are working on these features because they have to work on something, and updates to foundational tech are supposed to have effects over decades; it's very rare that you need to use bleeding-edge features. In 95% of cases you should be restricting yourself to features from versions that are 5-10 years old, especially if you come from other languages! You should go old to new, not new to old.
Sorry for the rant, or if I misjudged by making a broad claim from a few perspectives.