For comparison, the current agent swarm challenge on HF is at 508 tok/s on a A10G GPU:
https://huggingface.co/spaces/gemma-challenge/gemma-dashboar...
> Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s.
Wow. Limitnig access to models for other reasons than that you can't physically provide it should be a crime against humanity or the planet or something. So much immediate efficency left on the table for stupid reasons.