The specifics of Python were chosen only due to the language ecosystem being fragmented and inconsistent while Python remains an essential learning, research, and now ML programming language (it was my first language and I still love it).
My thoughts on LLM-generated code have changed immensely in the last 9 months as I've taken on teams and projects through my consulting work [1] as a fractional CTO. Python remains a difficult, flaky, and inconsistent programming language for complex production systems. Most other programming languages suffer from fragmented toolchains and ecosystems: JavaScript (famously), PHP, and even C/C++ to a degree.
Languages with a single way to do things benefit the most: Ruby, Rust, Swift (even). Low entropy is the way to go and convention > configuration seems to pay off with LLMs.
Mean cost of management is more important than specific edge examples like "X company runs on Y language". I think that 'boring' languages with rock-solid compilers, toolchains, testing frameworks, and package managers make for high return on engineering time and production maintenance.
[1]: sancho.studio
The parallelism issue in particular was also not something I noticed agents struggling with in JavaScript, although JavaScript's concurrency model is clearly fundamentally different.
The concurrency issues that I saw LLMs face were one reason why I created freelang, which uses a very boring and auditable concurrency model of OS processes that use the file system to talk instead of IPC, shared state, or anything like that. Higher overhead, lower throughput, but more boring and hopefully fewer bugs: https://github.com/DO-SAY-GO/freelang
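To make the "processes that talk through files" idea concrete, here is a minimal Python sketch. It is not freelang's implementation or API; the worker script, the file names `request.txt` and `response.txt`, and the function `square_in_child_process` are all hypothetical, chosen only to illustrate the general pattern of a parent and child process exchanging data through the file system.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical worker program: reads an integer from one file and
# writes its square to another. It shares no memory with the parent.
WORKER_SOURCE = """
import sys
from pathlib import Path
inbox, outbox = Path(sys.argv[1]), Path(sys.argv[2])
n = int(inbox.read_text())
outbox.write_text(str(n * n))
"""


def square_in_child_process(n: int) -> int:
    """Run the worker as a separate OS process, exchanging data only via files."""
    with tempfile.TemporaryDirectory() as workdir:
        inbox = Path(workdir) / "request.txt"    # parent -> child
        outbox = Path(workdir) / "response.txt"  # child -> parent
        inbox.write_text(str(n))
        # check=True raises CalledProcessError if the child exits non-zero.
        subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE, str(inbox), str(outbox)],
            check=True,
        )
        return int(outbox.read_text())


if __name__ == "__main__":
    print(square_in_child_process(7))
```

Every message ends up as a file that can be inspected after the fact, which is where the auditability comes from; the cost is a process spawn and disk I/O per exchange.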
Most models come up with the least effective solutions when writing Python.
The more assumptions I can move to compile time the better models are at dealing with emerging complexity.
I would go the other way with LLMs and I wish for liquid types and effects in Rust to make type specifications even more strict.
P.S. Effects, liquid types, and type specifications in general add a lot of busywork, but models have a higher tolerance for busywork than developers do.
This is annoying but only needs to be solved once at the start, either by the LLM or the human guiding it. A single prompt of "Set up a uv project in this directory with Python 3.13" is enough that it's never an issue again for that repo.
> Goroutines are a far more tractable primitive for coding agents than threads, callbacks, async/await, or any of the colored-function regimes that dominate elsewhere.
I disagree with this. Goroutines, along with threads, callbacks, and traditional async, are all in the same category: spaghetti of unbounded background tasks. Structured concurrency [1] on the other hand is dramatically easier to reason about. Python has support for this (in Trio and asyncio.TaskGroup) as do other languages like Kotlin and Swift. Function colouring is a red herring; if anything, it's useful because it highlights the scheduling/cancellation points in your code.
[1] https://vorpus.org/blog/notes-on-structured-concurrency-or-g...
-----
This really does read as "Go is my favourite language". In fairness, that's a good reason to choose a language to use with an LLM (so long as it's powerful enough and not too obscure). But let's not pretend it's the best language for everyone.
Therefore the best language for agents is likely the one that, on one hand erases all irrelevant details (ie. raises the level of abstraction and does not force focusing on eg. memory management), and on the other hand encodes any domain-relevant details in the code (eg. using advanced type systems, annotations, contracts, spec-like tests eg. property-based).
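As one illustration of "spec-like tests", here is a small property-based check written with only the Python standard library. The run-length encoder and the round-trip property are hypothetical examples picked for brevity; a dedicated library such as Hypothesis would add input shrinking and smarter generation, which this sketch does not attempt.

```python
import random


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse consecutive repeated characters into (char, count) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (char, count) pairs back into the original string."""
    return "".join(ch * count for ch, count in runs)


def check_round_trip(trials: int = 200, seed: int = 0) -> bool:
    """Property: decoding an encoded string yields the original string."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    for _ in range(trials):
        length = rng.randint(0, 20)
        text = "".join(rng.choice("ab") for _ in range(length))
        if rle_decode(rle_encode(text)) != text:
            return False
    return True


if __name__ == "__main__":
    print(check_round_trip())
```

The property states what must hold for all inputs rather than listing hand-picked cases, which is the sense in which such tests read like a specification.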
Human readability is a separate concern and still relevant, but the two mentioned properties actually generally improve on that as well (at least for engineers persistent enough to scale the tower of abstraction).
Based on this, it seems Go is certainly not that "agent endgame" language. It has large amounts of boilerplate, a general lack of safety around concurrency features, a pretty middling static safety story overall with a generally underpowered type system.
I don't think the perfect language exists, yet, but just wildly imagining, it would probably be something like a cross between Scala, Elixir and Lean (or equivalents). Unfortunately none of those languages has the large training corpus required to make it perform well in all agentic engineering situations (yet).
For any language comparison, one must separate the expressiveness of the language, which limits the long-term possibilities for agents, and the training corpus, which is what mostly gives it the current standing. I think we are still in the phase where the languages are separated by essentially random non-design factors such as the number of training environments the frontier labs are willing to create for them.
Given that, the syntax does not matter all that much, as long as the base language itself is flexible enough - as another wild idea, it's also possible that eg. Python could mostly swallow all these features through external tools (eg. the pre-existing type checkers or linters), and if the frontier labs bother to RL on those tools, that would also work (see also: Mojo).
From what I can tell, LLMs know three things: patterns that sit above the syntax and idioms of any specific language, the syntax and idioms of the specific languages themselves, and how to apply the former to the latter.
The bottleneck isn't what languages the LLM can handle, but what I can handle coming out of the LLM. The general advice, then, is to use the language (and related setup/environment) you're familiar with.
There's a lot of stuff in Python's favor in regard to coding with LLMs: it's wildly popular so there's a lot of references for the right and wrong ways to use it, it can be typed using included libraries - it's as simple as telling the LLM "use typing for this", and there are several great lint and unit testing tools to cover the hallucinations and poor decisions. The flexibility seems like an advantage to me personally, but I've always been a Python stan.
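As a rough idea of what "use typing for this" tends to produce, here is a small annotated sketch. The `Order` class and `order_total_cents` function are invented for illustration. Note that Python does not enforce annotations at runtime; they pay off when an external checker such as mypy or pyright is run over the code.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical record type; field types are visible to a type checker."""
    sku: str
    quantity: int
    unit_price_cents: int


def order_total_cents(orders: Sequence[Order]) -> int:
    """Sum the line totals; a checker can flag callers passing the wrong type."""
    return sum(order.quantity * order.unit_price_cents for order in orders)


if __name__ == "__main__":
    basket = [Order("widget", 2, 150), Order("gadget", 1, 99)]
    print(order_total_cents(basket))
```

With annotations in place, a mistake like passing a list of dicts instead of `Order` objects can be caught by the checker before the code runs, which gives the LLM a fast feedback signal.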
Times change, and I work more in R&D space than on legacy codebases, but I still ask it to write something in Python then convert it to the actual language on occasion. I don't know if I'm tricking the context window, forcing alternate pathways, or both, but it works.
In the last year or so, I have been using LLMs to assist my work, with generally excellent results.
I have noticed that the LLM delivers much better PHP than Swift. I seldom need to rewrite or correct the PHP code I get from it, and am constantly correcting the Swift. Part of the reason may be that I am a much better Swift programmer than PHP programmer, and there’s just a lot more Swift code. I haven’t really taken the time to analyze it.
I have my theories as to why, but it’s not something I’m really into researching. I’ve just noted the trend.
I don't really buy the intuition (aka Goroutines are more 'clear' than 'coloured' functions or threads), and there's no evidence presented for this either.
Although this could very well be true, I'm doubtful without seeing some real world data points.
The 'general premise' aka 'cosine similarity' may have been true before, but it may not be true anymore.
AI is just pretty good at anything it has 'seen enough' of, and that's it. I think it's more likely a 'threshold' problem than an ability problem, at least for most things.
'Rust' may represent a different domain, given the very detailed nature of notation and the vast possibilities that arise from that.
So I think the author is saying that Go is a simple language that tends to have fewer solutions to the same problem. I personally agree with that to a degree.
What I don't agree on is that we can choose what "low variance" is. There is a lot of Go code out there; its shape may have little "noise", but the variance is massive.
We have decades of compiler research, static code analysis etc, why do these extremely complicated black boxes of billions of parameters have to produce readable source code as their main output?
Without any typechecking, LLMs obviously find it harder to work agentically and validate their work.
With too much typechecking (I'm looking at you, Rust), I've found agents get themselves stuck in local "architectural minima" and end up doing insane shit to mitigate ownership/borrow-checker issues inherent in the design they ended up with.
That said, if you're hands-on I think Rust is a fantastic language for pairing with an LLM.
I've also been doing quite a bit of Rust for web services and wasm targets, which has worked exceedingly well... similarly with Tokio + Axum, etc.
I have seen very few issues with either of the above... that said, C# has been a bit more painful by comparison... I often rely on FastEndpoints for services and Grate for database migrations, and LLMs often get a bit tangled with those libraries in practice.
On the other hand, even if that were true, I don’t know how important it would actually be since LLMs can generalise across languages well.
It might be best to pick languages where it’s just harder to screw up, the canonical example being to prefer TypeScript over JavaScript.
They seem quite good at figuring this out in my experience.
But as someone who has been working in Python for ages, I think it is pretty easy too, and not as hard as you described. LOL, but whatever, this post of yours was really amazing.
Additionally, fault-tolerant languages such as Erlang/Elixir allow me to not worry about the billions of edge-cases, and let Claude aggressively implement a mostly good-enough application. With LLMs, accepting a limited amount of failure may be a necessity (depending on the business/domain), and that's exactly what the BEAM enables.
This is another way of saying that the tools you equip the LLM with affect its effectiveness. In other words, the harness you build around the model matters, and matters a lot.
At the end of the day, the language you pick enriches the harness with the toolchain, libraries, etc. it offers. This is most evident with the toolchain, as the author mentioned, but if you think about it, picking a specific framework that constrains the choices the model can make (e.g. the Ruby on Rails example) also affects its behavior.
The best language I have seen an LLM use was Kotlin. It actually surprised me how well it wrote the language. I wrote a project in it and I think I didn't have to correct it once. Like I was seriously impressed. I just wish Kotlin had better tooling so I didn't have to use Gradle or Maven lol.
* Chances are that fewer people (maybe even none) will look at the code when it's LLM-generated
* Amount of code being written isn't all that critical anymore
* Keeping patches small isn't that big of a deal anymore (because it's now the LLM's job to maintain it, not the human's)
All of this implies: boilerplate isn't a good reason to avoid a language anymore. (I hate this conclusion, because I hate boilerplate).
Then the question is: what kind of language can you use that buys safety with boilerplate? Probably a statically typed one, possibly with lots of asserts... Eiffel? I don't know if there's enough Eiffel code around the Internet to train LLMs, so maybe a more popular one would be better.
Maybe Java or C#? Haskell? OCaml?
The article suggests golang, and I think there are use cases where golang would be a good candidate.
It would be quite interesting to run an experiment: give separate instances of the same LLM coding agent the task to implement a specific application, and use different languages. Then compare quality, code size, runtime performance and token cost. Ideal would be a multi-stage development that better simulates a real development workflow (bug reports and new feature requests come in over time).
Just narrow your window of thought to easier problems for the LLM, and all of a sudden the LLMs do everything you want!
Reminds me of playing around with image generation models. Someone who's been practicing can crank out prompts for really impressive images back to back. But you try to use an everyday object or concept the model isn't trained on? Everybody will race to show off how smart they are by saying "just don't hold it like that."
Python __is__ a boring language (it is mature and well supported) with a somewhat convoluted package manager that has gotten a lot better since that xkcd came out.
Yeah, I get it, Go is better for distributing your code: just one binary you can copy. But what does that have to do with "boring"?