From the conclusion, I agree with:
> I wouldn't make either one the top-level coordinator by default.
But I do not agree with the follow-up sentence:
> The best shape is still a frontier coordinator or judge above them: GPT-5.5 or Claude Opus deciding what to delegate, checking the finished work, and rerunning narrow pieces when the answer looks wrong. These models make the worker layer much more serious, not the coordinator layer unnecessary.
For the coordinator or judge above them I would put myself, not a too expensive LLM under the control of an external entity, achieving thus simultaneously higher quality, lower cost and greater security.
Currently testing M3 for agentic tasks. It works OK and their token plan is very cheap. Highly recommend for claw / hermes type of work.
Tested GLM 5.1 for coding last month and it burned through my tokens a bit too quickly, but it worked well enough.
Considering how close the models are, the extra free queries may be worth it.