In my work, the bigger bottleneck to productivity is that very few people can correctly articulate requirements. I work in backend API development, which is quite different from fullstack development that happens to include a backend. If you ask PMs about backend requirements, they dodge you; if you ask front-end or web developers, they're waiting for you to hand them the API. The hardest part is understanding the requirements. It's not illiteracy: software development is a lot more than coding, and it takes critical thinking to discover what the requirements actually are.
On the other hand, I've enjoyed vibe coding Rust more, because I'm interested in Rust and felt like my understanding improved along the way as I saw what code was produced.
A lot of coding "talent" isn't skill with the language, it's learning all the particularities of the dependencies: the details of the Smithay crate in Rust, the complex set of GTK modules, or the Wayland protocol implementation.
On a good day, AI can help navigate all that "book knowledge" faster.
Without checks and feedback, LLMs can easily generate unsafe code. So even if they can generate C or assembly that works, they're likely to produce code riddled with mishandled edge cases, memory leaks, and so on.
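A minimal sketch of that failure mode (my own example, not the article's): the function below compiles without warnings and survives a quick happy-path test, yet it under-allocates by one byte and then writes past the end of the buffer.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Plausible-looking code an LLM might emit with full confidence. */
static char *dup_upper(const char *s) {
    size_t n = strlen(s);
    char *out = malloc(n);            /* bug: no room for the trailing NUL */
    if (!out) return NULL;
    for (size_t i = 0; i <= n; i++)   /* bug: writes n + 1 bytes */
        out[i] = (char)toupper((unsigned char)s[i]);
    return out;
}

int main(void) {
    char *s = dup_upper("hello");
    printf("%s\n", s);                /* usually prints HELLO anyway */
    free(s);                          /* the heap was already corrupted */
    return 0;
}
```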
Also, abstraction isn’t only for humans; it’s also for LLMs. Sure, they might benefit from different kinds of abstraction - but that doesn’t mean “oh, just write machine code” is the way to go.
Vibe-coding a program that segfaults and you don't know why and you keep burning compute on that? Doesn't seem like a great idea.
Doubt. These things have been trained to emulate humans; why wouldn't they make the same mistakes that humans do? (Yes, they don't make spelling errors, but most published essays etc. don't have spelling errors, whereas most published C codebases do have undefined behaviour.)
A language designed for vibe coding could certainly be useful, but what that means is the opposite of what the author thinks that means.
The author thinks that such a language wouldn't need to have lots of high-level features and structure, since those are things that exist for human comprehension.
But actually, the opposite is true. If you're designing a language for LLMs, the language should be extremely strict, verbose, and inconvenient. You should have to organize your code in a certain way, and be forced to check every condition, catch every error, and consider every edge case, or the code won't compile.
Such a language would aggravate a human, but a machine wouldn't care. And LLMs would benefit from the rigidness, as it would help prevent any confusion or hallucination from causing bugs in the finished software.
Because I want to be able to review it, and extend it myself.
edit: Pure vibe coding is a joke or thought exercise, not a goal to aspire to. Do you want to depend on a product that has not been vetted by any human? And if it is your product, do you want the risk of selling it?
I can imagine a future where AI coders and AI QA bots do all the work but we are not there yet. Besides, an expressive language with safety features is good for bots too.
The reason is that you want to have some kind of guidance from a larger perspective in the long run. And that is exactly what types and module systems provide. The LLM has to create code which actually type checks, and it can use type checking as an important part of verification.
If you push this idea further: use Lean, Agda or Rocq. Let the LLM solve the nitty gritty details of proof, but use the higher-level theorem formation as the vessel for doing great things.
If you ask for a red-black tree, you get a red-black tree. If you ask for a red-black tree where all the important properties are proven, you don't have to trust the LLM anymore. The proof is the witness of correctness. That idea is extremely powerful, because it means you can suddenly lift software quality by an order of magnitude without having to trust the LLM at all.
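To make the proof-as-witness idea concrete, here is a rough Lean 4 sketch (a toy property of my own, not the parent's red-black tree): the human writes the theorem statement; the proof script is exactly the grunt work we could hand to an LLM, because the kernel checks it and no trust is needed.

```lean
-- Toy function the LLM might have written.
def insertSorted (x : Nat) : List Nat → List Nat
  | [] => [x]
  | y :: ys => if x ≤ y then x :: y :: ys else y :: insertSorted x ys

-- Human-written spec: insertion grows the list by exactly one element.
-- If the LLM's proof below were wrong, the kernel would reject it.
theorem insertSorted_length (x : Nat) (l : List Nat) :
    (insertSorted x l).length = l.length + 1 := by
  induction l with
  | nil => simp [insertSorted]
  | cons y ys ih =>
    simp only [insertSorted]
    split <;> simp [ih]
```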
We currently don't do this. I think it's because proving software correct is just 50x more work, and it moves too slowly. But if you could get an amplifier (an LLM) to help out, it might become feasible for a lot of software.
If a project is important enough to require C or x86 assembly, where memory management and undefined behavior have real consequences, then it’s important enough to warrant a real developer who understands every line. It shouldn’t be vibe coded at all.
Python’s “adorable concern for human problems” isn’t a bug here, it’s a feature. The garbage collection, the forgiving syntax, the interpreted nature: these create a sandbox where vibe coded solutions can fail safely. A buggy Python script throws an exception. A buggy C program gives you memory corruption or security holes that show up three deployments later.
The question isn’t what language should AI write. It’s what problems should we trust to vibe coding. The answer: problems where Python’s safety net is enough. The moment you need C’s performance or assembly’s precision, you’ve crossed into territory that demands human accountability.
My philosophy regarding AI is that you should never have it do something you couldn't do yourself.
Of course people break this rule, or the concept of vibe coding wouldn't exist. But some of us actually get a lot of value from AI without succumbing to it. It just doesn't make sense to me to trust a machine's hallucinations for something like programming code. It fabricates things with such confidence that I can't even imagine how it would go if I didn't already know the topic I had it work on.
Both the author and I agree: yes, it can.
Does it always generate good code?
Here is where the author and I disagree vehemently. The author implies that the AI-generated code is always correct. My personal experience is that it often isn't. And not just for big projects: even for small bugfixes it misunderstands and hallucinates solutions.
So no C or assembly for me, thank you very much.
Well, because you can do it in Fortran, of course!
What else do you want? Multidimensional arrays out of the box, fast loops, native CUDA support, a trivial build and packaging system, zero version churn... all of this with the bare language. It's the anti-Python! The perfect language, you could say! Strings and I/O are a bit cumbersome, agreed, but your LLM can take care of those without any trouble, no matter the language.
- a lot of C code out there is not safe, so the LLM outputs that
- C encodes way less of the programmer's intention, and way more implementation details. So unless the author is extremely good at naming, encapsulating and commenting, the LLM just has less to work with. Not every C codebase is SQLite/Redis/FFmpeg quality.
- the feedback loop is slower, so the LLM has less chance to brute force a decent answer
- there is no npm/PyPI equivalent for C on which to train the LLM, so the pool for training is less diverse
- the training pool is vastly Linux-oriented, with the Linux kernel and distro system libs being very prominent in the training data, because C programs on Windows are often proprietary. But most vibe coders are not on Linux, nor into systems programming.
Sure, you can vibe code in C. Antirez famously states he gets superb ROI out of it.
But it's likely you'll get even better results with other languages.
What frontier models also excel at is writing their own libraries and skipping third-party dependencies. It's very easy for a human to just pick up a bloated 750kb library they're only going to use 15kb of. BUT that library can serve as a valuable implementation model for someone very patient and willing to "reinvent the wheel" a little bit, which is definitely going to be AI and not me, because I just want to point at a black box and tell it what to do. For big things like a web server, I'm still defaulting to Axum, but for things like a JS soundbank parser/player or a WebGL2 mp4 & webm parser/demuxer & player, these are tasks frontier models are good at with the right prompting.
To an extent, maybe counter-intuitively, I think the big thing we'll see out of AI is an explosion of artisanship -- with humanoid robots in 2040, perhaps, our households may be making their own butter again, for example.
As to why not use C, or assembly, it’s not just about the code, but the toolchains. These require way more knowledge and experience to get something working than, say, Python - although that has its own rather horrible complexities with packaging and portability on the back end of the code authoring process.
And just in my experience, everyone is slowly learning the same lesson: all models are better at the common thing. They are better at bash, better at Python and JS, and so on. Everyone trying to invent at that layer has failed to beat that truth. That bootstrapping challenge is dismissed much too easily in the article, in my opinion.
I'll admit that I'd like to do a programming challenge like "Advent of Code" in assembly, with or without AI. But if it were actual "Advent of Code", the direct route is to write something that looks like a language runtime, so you have the dynamic data structures you need at your fingertips.
I love C. I came up on C. But C does not tell you a story. It tells you about the machine. It tells you how to keep the machine happy. It tells you how to translate problems into machine operations. It is hard to read. It takes serious effort to discern its intent.
I think any time you believe the codebase you're developing will have to be frequently modified by people unfamiliar with it, you should reach for a language which is both limiting and expressive. That is, the language states the code intent plainly in terms of the problem language and it allows a limited number of ways to do that. C#, Java (Kotlin) and maybe Python would be big votes from me.
And FYI, I came up on C. One of the first senior engineers I was tutored by in this biz loved to say, good code will tell you a story.
When you're living with a large, long-lived codebase, essenti
Now, it is true that vibing produces a larger quantity of lower-level code than we would stomach on our own. But that has consequences for the resulting maintenance challenge, since the system as a whole is less structured by its boundaries.
I think a reasonable approach when using the tools is to address problems "one level down" from where you'd ordinarily do it, and to allow yourself to use something older where there is historical source for the machine to sample from. So, if you currently use Python, maybe try generating some Object Pascal. If you use C++, maybe use plain C. If there were large Forth codebases I'd recommend targeting that since it breaks past the C boundary into "you're the operator of the system, not just a developer", but that might be the language that the approach stumbles over the most.
I was thinking that this week. We are quickly reaching a point where the quality of the code isn't as important as the test suite around it and reducing the number of tokens. High-level languages are for humans to read and write; if most people aren't reading the code, we should just skip this step.
It's an ugly future but it seems inevitable.
After having understood the context, I still believe that a strongly typed language would be a much better choice, for exactly the same reason I wouldn't recommend starting a project in C unless there is a strong preference (and even then, Rust would probably be better still).
LLMs are not perfect, just like humans, so I would never vibe code in any environment other than one in which many or most logical errors simply won't compile.
Not sure if C is worse than Python/JS in that respect (I'd argue it is better for some things and worse for others, regarding safety), but Java, Swift, C#, Go, Rust, etc. are great languages for vibe coding, since the compiler gives you almost instant feedback on how well your vibe coding is going.
> Thus, programs must be written for people to read, and only incidentally for machines to execute
But that's... terrible. Humans can barely communicate with each other. And now you want to take our terrible communication and make a machine try to guess what the hell we want to happen? You want a plane to operate like that? ...well, you are wrong.
I recently gave the "vibe" AI the assignment of "using GTK [4, I think], establish a global shortcut key".
No amount of massaging the prompt, specifying the version of GTK, etc. could prevent it from just outright hallucinating the functions it wanted to call into existence. The entire reason I was asking was because I did not know what function to call, and was having difficulty discerning that from GTK's documentation. (I know how to do this now, and it is effectively undocumented.)
Prior to that, an assignment to determine some information from Alembic. Again, the AI desired to just hallucinate the functions it required into existence.
A script to fetch the merge queue length from GH. It decided to call GH's GraphQL API, which is fine, and doable for the task, but the query was entirely hallucinated.
A bash script to count files change in git. The code ran, and the output was wrong. The author did not check the LLM's code.
Even non-programming tasks are the same. Image generation is a constant fight to get the AI to understand what you mean, when it isn't just ignoring your prompts. I went about 10 prompts deep trying to get an image of a stone statue of 4 ASCII characters in a field. The last character was consistently just wrong, and no amount of prompting would fix it.
"Generate a character with a speech bubble that says 'Hi'" -> speech bubble has Japanese in it! (And the Japanese is gibberish, but if you ask AI to translate it, it "will".)
I prompt my agents to use proper OO-encapsulated, idiomatic Ruby paradigms. Your goal should be reduced cognitive load.
Even if you never write a line of code, you will still need to understand your problems to solve them.
"Vibe debugging" will get you stuck in loops of hallucinated solutions.
If an LLM is in fact capable of generating code free of memory safety errors, then it's certainly also capable of writing the Rust types that guarantee this and are checkable. We could go even further and have automated generation of proofs, either in C using tools similar to CompCert, or perhaps something like ATS2. The reason we don't do these at scale is that they're tedious and verbose, and that's presumably something AI can solve.
Similar points were also made in Martin Kleppmann's recent blog post [1].
[1]: https://martin.kleppmann.com/2025/12/08/ai-formal-verificati...
I don't get how the second follows from the first. One of the main complaints levelled against Rust is that it is not ergonomic for humans, specifically because it forces you to work to a set of restrictions that benefit the machine.
With an LLM coding agent that quickly produces volumes of somewhat-scattershot code, surely we're better off implementing incredibly onerous guardrails (that a human would never be willing to deal with)?
- The C compiler. AI tools work better when their automated feedback loop includes feedback on correctness, safety, etc. The C compiler is not great at that. It requires a lot of discipline from the programmer; there mostly isn't a compile-time safety net.
- Macros add to this mess. C's macros are glorified string replacements (see the sketch after this list).
- Automated tests are another tool that helps improve the quality of vibe-coded code. While you can of course write tests for C code, the test frameworks are a bit immature, and it's hard to write testable code in C due to the lack of abstractions.
- Small mistakes can have catastrophic consequences (crashes, memory overflows)
- A lot of libraries (including the standard library) contain tools with very sharp edges.
- Manual memory management adds a lot of complexity to code bases and the need for more discipline.
- Weak/ambiguous semantics mean that it's harder to reason about code.
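To make the macro point concrete, here is the textbook illustration (not from the parent comment): because expansion is pure text substitution, operator precedence silently rewrites the arithmetic.

```c
#include <stdio.h>

#define SQUARE(x) x * x     /* textual substitution, no parentheses */

int main(void) {
    int n = 3;
    printf("%d\n", 100 / SQUARE(n)); /* expands to 100 / 3 * 3 == 99, not 11 */
    printf("%d\n", SQUARE(n + 1));   /* expands to n + 1 * n + 1 == 7, not 16 */
    return 0;
}
```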
There are counter arguments to each of those things. Compilers have flags. There are static code analyzers. And with some discipline, it gets better. You could express that discipline in additional instructions for your agent. And of course people do test C code. There just are a lot of projects where none of that stuff is actively used. Vibe coding on those projects would probably be a lot harder than on a project that uses more structured languages and tools.
All these things make it harder to work with C code, for humans and for AIs alike. But not impossible, of course. AI coding models are getting quite good at coding, including coding in C.
But it makes C a poor default language for AI coding. The ideal vibe coding language would be simple and expressive, with great tools and compilers, fast feedback loops, etc. That means the AI has less work to do: shorter, faster feedback loops, fewer iterations and less reasoning, less complex problems to solve, less ambiguity, entire categories of bugs avoided. The same reasons it is a poor default choice for most human programmers.
> But this leads me to my second point, which I must make as clearly and forcefully as I can. Vibe coding actually works. It creates robust, complex systems that work. You can tell yourself (as I did) that it can’t possibly do that, but you are wrong. You can then tell yourself (as I did) that it’s good as a kind of alternative search engine for coding problems, but not much else. You are also wrong about that. Because when you start giving it little programming problems that you can’t be arsed to work out yourself (as I did), you discover (as I did) that it’s awfully good at those. And then one day you muse out loud (as I did) to an AI model something like, “I have an idea for a program…” And you are astounded. If you aren’t astounded, you either haven’t actually done it or you are at some stage of grief prior to acceptance. Perfect? Hardly. But then neither are human coders. The future? I think the question answers itself.
This cannot be repeated enough. For all the AI hype, if you think AI isn't the most useful programming tool invented in the last 20 years, you're ignorant of the SOTA or deeply in denial.
As @tptacek recently wrote:
> All progress on LLMs could halt today, and LLMs would remain the 2nd most important thing to happen over the course of my career.
This exactly. Programming is art, because it comes from the soul. You can tackle a problem a million ways, but there's only one way that YOU would solve it.
Vibe coding feels like people who aren't creative, stealing everyone's creativity, and then morphing it into something they find appealing.
There is no skill.
There is no talent.
You are asking the machine to do ALL THE THINKING. All the little decisions, the quirks, the bugs, the comments to yourself. All the things that make a piece of code unique.
Perhaps the programming languages of the future will be designed with AI in mind: to properly put guardrails on it.
Could also be that the models accelerate into the future so fast that they simply stop making mistakes. Then we'll be coding in assembler, because why waste CPU time on anything else?
I've had the most success running Claude iteratively with mypy and pytest. But it regularly wants to just delete the tests or get rid of static typing. A language like Haskell augmented with contracts over tests wouldn't allow that. (Except for diverging into a trivial ad-hoc dynamic solution, of course.)
- Everything about rust enforcing correctness catches lots of bugs
- Using a high-level API means I can easily hand-check things in a repl
- In addition to tests, I required a full “demo notebook” with any PR — I should be able to read through it and confirm that all the functionality I wanted has actually been implemented
If the philosophy is (and it should be) "LOC is free", it's worth thinking about how we can make LLMs produce more LOC to give us additional comfort about correctness. Language choice is very much one lever.
I do vibe code in C; I'm not a C programmer and I certainly couldn't do a security audit of any serious C codebase, but I can read and understand a simple C program, and debug and refactor it (as long as it's still quite simple).
And it's super fun! Being able to compile a little C utility that lives in the Windows tray and has a menu, etc. is exhilarating.
But I couldn't do that in assembly; I would just stare at instructions and not understand anything. So, yes for C, no for assembly.
C code that survives in the wild tends to be written by experienced devs who care about correctness. The JS corpus includes mountains of tutorial code, Stack Overflow copypasta, and npm packages with 3 downloads. In my experience, generated C is noticeably better—and this might be why.
Now personally I don't think this ultimate vibe-coding paradigm is just around the corner, but it does seem that it's the direction we're heading and I think this article does a good job of explaining why.
In C, without anything like a borrow checker or such, I'd be very worried about there being subtle pointer safety issues...
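A minimal sketch of the kind of subtle issue I mean (my own example): the program below is legal C that a compiler will typically accept without complaint, while Rust's borrow checker rejects the equivalent aliasing at compile time.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void) {
    char *name = malloc(16);
    if (!name) return 1;
    strcpy(name, "alice");

    char *alias = name;     /* second pointer to the same allocation */
    free(name);             /* silently invalidates alias too */

    printf("%s\n", alias);  /* use-after-free: often "works", until it doesn't */
    return 0;
}
```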
Now, some of that could be volumes of training data, etc., but Rust is widely discussed these days in the places these models are trained on, so I'm not certain it's a training problem rather than an attention-to-detail-across-files problem. I.e., since LLMs are trained to mimic human language, programming languages that are most procedural-human-language-like (vs. having other levels of meaning embedded in the syntax) may be exactly the "LLM-friendly" languages.
But then again, LLMs in their current form are trained on mountains of human language, so maybe having them output human-readable code makes sense, at least for now.
It is not portable to computers other than x86. It is one of the reasons I do not use x86 assembly much even though I have an x86 computer; I prefer C. It is not about vibe coding.
> I suppose what I’m getting at, here, is that if vibe coding is the future of software development (and it is), then why bother with languages that were designed for people who are not vibe coding? Shouldn’t there be such a thing as a “vibe-oriented programming language?” VOP. You read it here first.
Someone told me that two companies (one of which is Google) were working on such a thing, although I do not know the details (or if they were correct about that), and I do not know whether or not it resembles what is described in that article.
I do not use LLMs myself, although I have seen a few examples of their output. I have not seen very many, so the sample size is too small, but in what I have seen (simple example programs), the program works although it is not written very well.
Yes, it can consistently generate code that works and seems to be on top of it all, thanks to a lot of training data. But wouldn't that use up more tokens and computational resources than, e.g., the same program in Python?
If using more complex languages requires more resources from the LLM, the same principles apply. One-off scripts are better in high-level languages. Hot paths executed millions of times a second are better in lower-level languages with high optimisation potential.
LLMs might slightly shift the correct choice towards lower-level languages. E.g. a small one-off script in C is much more viable with an LLM's help. But the moment it needs to be reused, grows, and needs to be modified, one might regret not using a higher-level language.
And these are systems that require a human in the loop to verify the output because you are ultimately responsible for it when it makes a mistake. And it will.
It’s not fun because it’s not fun being an appendage to a machine that doesn’t know or care that you exist. It will generate 1200 lines of code. You have to try and make sure it doesn’t contain the subtle kinds of errors that could cost you your job.
At least if you made those errors you could own them and learn from it. Instead you gain nothing when the machine makes an error except the ability to detect them over time.
I think if you don’t know C extremely well, there’s no point vibe coding it. If you don’t know anything about operating systems, you’re not going to find the security bugs or know if the scheduler you chose does the right thing. You won’t be able to tell the difference between good code and bad.
But code doesn’t only need to be understood for maintenance purposes: code is documentation for business processes. It’s a thing that needs to be understandable and explainable by humans anytime the business process is important.
LLMs can never / should never replace verifiability, liability, or value judgment.
C extensions for SQLite: https://simonwillison.net/2024/Mar/23/building-c-extensions-...
This one is closest to something I might use because it's C compiled to WebAssembly, so the blast radius for any dumb bugs is strictly limited: https://github.com/simonw/research/blob/main/cmarkgfm-in-pyo...
It's one thing to program as a hobby or to do programming in an institutional environment free of economic pressures like academia (like this educator), it's another thing to exist as a programmer outside that.
My partner was telling me her company is now making all their software engineers use ChatGPT Codex. This isn't a company with a great software engineering culture, but it's probably more representative of the median enterprise (non-SV, non-tech-startup) employer than people realise.
Or why not just produce a binary directly? It seems we've just invented a compiler.
I think that’s what makes it so common in codebases that have long-term maintenance stories.
(I say that because my personal project has me reading great loads of C written by diverse authors and I am surprised at how easy it is to figure out, compared to most other languages)
Going forward, once LLM coding tools are able to learn new languages, languages designed for machines rather than humans certainly make sense.
Languages designed for robust error detection and checking, etc. Prefer verbosity where it adds information rather than succinctness. Static typing over dynamic. Contractual specification of function input/output guarantees. Modular, localized design.
It's largely the same set of considerations that makes a language good for large-team, large-codebase projects, at the opposite end of the spectrum from scripting languages; except that if it's machine-generated, you can really go to town on adding as much verbosity as needed to tighten the specification and catch bugs at compile time rather than runtime.
So, vibe coding in C feels like playing with a loaded gun.
My approach is evolving thanks to NixOS and home-manager, with vibe coding doing the lifting. I increasingly lean on vibe coding to handle the fiddly details of safely writing shell scripts (escaping strings, fml) and C/C++ apps. The complexity is minimized, allowing me to almost one-shot small utilities, and Nix handles the long-term maintenance.
With NixOS, a simple C/C++ application can often replace a Python one. Nix manages building from source and pulling dependencies, effectively eliminating the overhead that used to favor scripting languages, while yielding marginal power savings in everyday use.
Or assembly, or binary
Yes, this is a completely valid take, and it is the ultimate answer to why vibe coding, the way most people define it, is a dead end.
The point is we want the LLM to generate code that is first and foremost readable by humans and structured in such a way that a human can take over control at any time.
If you think this is how LLM should generate code, congratulations we are already in agreement.
If you think programmers should not exist, and that you will help your bottom line by reducing the number of programmers on your payroll, or worse, eliminating them entirely in favor of product managers who will never ever look at the code (which is what vibe coding, the way I understand it, requires), then the question at the top is for you.
Many people I've seen have taken existing software and 'ported' it to more performant languages like C, Rust, etc.
LLMs are extremely efficient and good at translation.
The biggest question is maintainability and legibility. If you want it for your own proprietary software, this can be (and probably is, generally) a good pattern, if you can get the LLM to nail the language-specific challenges (e.g. memory allocation in C).
However, fewer people can write C code generally, and even fewer can use it to build things like UI's. So you're by definition moving the software away from a collaborative mechanism.
The abstraction layers were built for human maintenance. LLMs don't need that.
Also, vibe coding is a mistake; it will undoubtedly turn anything more elaborate than a simple script into a monstrosity. DO NOT TELL AI TO WRITE MORE THAN A FUNCTION OR PART OF A FUNCTION.
I would absolutely love to teach programming to non-programmers. I have also been offered a job at the technical school where I graduated. But remembering how uninterested the vast majority of my classmates were back then discouraged me from even trying. I guess what I'd want is to teach a room full of people excited to learn about programming.
This is a big issue, personally. I write Python and bash these days, and I love that we're not bottlenecked by IDE-based autocomplete anymore, especially for dynamic languages; a huge amount of fixing and incremental feature work can be done in minutes instead of days thanks to AI being able to spot patterns. Simultaneously, I'm frustrated when these agents fail to deliver small changes and I have to jump in and change something I don't have a good mental model of or, worse still, something that's all Greek to me, like JavaScript.
The problem here is that human languages are terrible programming languages and LLMs are terrible compilers.
We can't teach AI to code in languages that do not have human ergonomics because, as of now, all AI is based on human example.
I must take issue with the central point, however: the machines of LLMs are very different from the machines of CPUs. While it is true that Claude writes fewer memory errors than even expert C programmers (I’ve had to fully accept this only this week), the LLM is still subject to mistakes that the compiler will catch. And I dare say the categories of error coding agents commit are eerily similar to those of human developers.
Of course in practice I think the author is actually correct: LLMs struggle more than humans with sophisticated formal constraints, and less than humans with remembering to write a bunch of boilerplate. But I think it's a pretty counterintuitive result and I'd love to have seen more discussion of it.
Or... I want to only write the tests. The implementation is... an implementation detail!
Also, like others said, even once you have your formal spec, C is a particularly bad choice (unless you want to specify quite a bit more). You want the program implemented in a language with as many safety constraints on it as possible, not one where you have to mentally track memory.
Also I want to understand the code as much as possible - sometimes the agent overcomplicates things and I can make suggestions to simplify. But if it's writing in a language only it can understand, that's going to be harder for me.
Unless it's an existing project where migration is too costly, choosing C is just entering a time-wasting pact along with a lot of other people who like suffering for free.
(https://chatgpt.com/share/693891af-d608-8002-8b9b-91e984bb13...)
* boring and straightforward syntax and file structure: none of the syntax sugar, aliases, and formatting freedom that humans cherish but machines get confused by; no context-specific syntax
* explicitness: no hidden global state, shortcuts, or UB
* basic static types and constraints
* tests optimized for machine evaluation
etc.
On the topic: I feel like we still need at least a few more innovations in the space before we can rely on them to work in areas where we as humans still have trouble (that pesky training data!). Even when providing documentation, I still find LLMs to often have trouble creating code in newer versions of libraries.
My biggest fear with LLMs is that they will steer a lot of development into a more homogeneous space over time (even just via the types and versions of libraries they choose when vibing).
If you don't give a damn about integrity though, then may as well get funky with it. Hell, go hard: Do it in brainfuck and just let it rip.
Alternatively, use a language like ZL that embeds C/C++ in a macro-supporting, high-level language (e.g. Scheme). Encode higher-level concepts in it, with generation of human-readable, low-level code. F* did this. Now you get C with higher-level features we can train AIs on.
IF this is true, you have bad PMs.
My experience with LLMs is that they are not good at tracking resources and perform much better with languages that reduce cognitive load for humans.
Agreed. It's like replacing a complex and fulfilling journey with drugs.
That has not been my experience when using Codex, Composer, Claude, or ChatGPT.
Things have just gotten to the point over the last year that the undefined behavior, memory safety, and thread safety violations are subtler and not as blindingly obvious to the person auditing the code.
But I guess that's my problem, because I'm not fully vibing it out.
If you could invent a language that is somehow tailored for vibe coding _and then_ produce a sufficiently large, high-quality corpus of it to train the AI on, that would be something.
I was really hoping you were going to make this argument, based upon the title of the piece! Still a good read, but if you have the inclination I hope you loop back around to weighing the pros and cons of vibe coding in different languages
Because you would not be able to audit the code if you don't: you'll be terribly slow at reading and understanding the inner flows correctly, and that's if they aren't so bad they do you some brain damage.
Dang, AI is pushing us all to become managers.
I find CUX to be very intuitive for prototyping. But my game is Language and HCI at heart, logic that allows the development process to go smoothly. It is certainly not for everyone or every project.
If I'm making a C#/WPF app, I can't just decide to make part of it C.
I get it's just a generalised criticism of vibe coding, but "why not use a harder language then" doesn't seem to make any sense.
I used to say that implementation does not matter and that tests should be your main focus. Now I treat every bit of code I wrote with my bare hands like candy; agents have sucked the joy out of building things.
Claude flatly refused to rewrite my Rails codebase in Brainfuck... not that I really expected it to. But it went on a hilarious tirade about how doing so was a horrible idea and I would probably be fired if I did.
Tons of people make the above claim, and tons of other people make the exact opposite claim: that it's just a search engine and can't actually code. I'm utterly shocked at how two people can look at ground-truth reality and derive two different factual conclusions. Make no mistake, one of these two groups is utterly wrong.
The only thing I can conclude is that many people are either in denial or out of date. A year ago, none of what this man said was true.
While I am not a big fan of Rust, the philosophy is likely useful here. Perhaps something like it, with a lot of technical validation pushed to the compiler, could actually be really useful here.
Getting rid of the garbage collector with no major increase in human cognitive load might actually be a big win.
Then show us this robust, complex code that was produced by vibe coding and let us judge for ourselves.
C is actually pretty good, if you can manage to architect your project cohesively.
The author's point is correct IMO. If you have direct mappings between assembly and natural language, there's no functional need for these intermediate abstractions to act as pseudo-LUIs. If you could implement it, you would just need two layers above assembly: an LLM OS [1], and a LUI-GUI combo.
However, I think there's a non-functional, quality need for intermediate abstractions - particularly to make the mappings auditable, maintainable [2], understandable, etc. For most mappings, there won't be a 1:1 representation between a word and an assembly string.
It's already difficult for software devs to balance technical constraints and possibilities with vague user requirements. I wonder how an LLM OS would handle this, and why we would trust that its mappings are correct without wanting to dig deeper.
[1] Coincidentally, just like "vibe coding", this term was apparently also coined by Andrej Karpathy.
[2] For example, good luck trying to version control vectors.
I suspect the assembly would not be as highly optimized as what a modern C compiler would output.
I propose WASM, or an updated version of it
I sincerely hope the author is joking.
Including shooting yourself in the foot.
/Rust
It produces hot garbage when it needs to bring two tokens from the far ends of a large codebase together.
This comes as no surprise to anyone who understands what the attention mechanism actually is, and as a great surprise to everyone who thinks transformers are AI magic.
I would appreciate a post with examples, not just prose. It helps to put things in a more grounded reality.
for all the same reasons I wouldn’t have done it in C a decade ago!
Plus, now: credit limits!
No one (other than computer people) wants computers and software, they want results.
This generation of AI will be used to bootstrap the next generation of AI.
Programmers getting excited about vibe coding is like candlemakers getting excited about installing electric lights in their shops, so they can make more candles!
Currently, using Claude to vibe code Rust is _much_ more hit-or-miss than using it for Python... so Python has become the lingua franca or IR I use with it.
Often I'll ask Claude to implement something in Python, validate and correct the implementation, and in a separate session ask it to translate it from Python to Rust (with my requirements). It often helps.
Claude is particularly bad about hallucinating crate APIs, something it does a lot less for Python.
I spend all day in Claude Code, and use Codex as a second-line code reviewer.
They do not create robust systems. They’ve been a huge productivity boost for me in certain areas, but as soon as you stop making sure you understand every line it’s writing, or give it free rein where you’re auto-approving everything, the absolute madness sets in.
And then you have to unpick it when it’s trying to read the source of npm because it’s decided that’s where the error in your TypeScript project must lie, and if you weren’t on top of the whole thing from the start, this will be very difficult.
Don’t vibe-code in C unless you are a very strong C developer who can reliably catch subtle bugs in other people’s code. These things have no common sense.
Or greater hell, why not binary?
If we ignore human readability (as the author suggests), the answer is context. The token count explodes as you fall down the abstraction rabbit hole. Context consumption means worse reasoning.
In turn, this means expressiveness matters to LLMs just as much as it matters to us. A functional map reduce can be far simpler to write and edit than an imperative loop. Type safety and borrow checking free an LLM from having to reason about types and race conditions. Interpreted languages allow an LLM to do rapid exploration and iteration. Good luck with live reloading a GUI in C.
And if you were to force the LLM to do all that in C, at some point it might decide to write its own language.
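On the race-condition point specifically, a minimal illustration (mine, not the parent's): C happily compiles the unsynchronized sharing below, so an LLM must spot the bug by reasoning alone, whereas Rust's ownership rules turn it into a compile error.

```c
/* build with: cc -O2 -pthread race.c */
#include <pthread.h>
#include <stdio.h>

static long counter = 0;              /* shared, unsynchronized */

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;                    /* load, add, store: not atomic */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, bump, NULL);
    pthread_create(&b, NULL, bump, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("%ld\n", counter);         /* rarely the expected 2000000 */
    return 0;
}
```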
The second and more important point is that what makes coding simpler for humans is the language's ability to facilitate communication, so you don't have to translate. Now, LLMs are good at translation, but it's still work. Imagine you have an implementation of a program and you want to see what it does: for any non-trivial program you must scan millions of tokens. Are current LLMs even physically capable of attending to that? No, no, no: we need names.
Besides, how can you even vibe code if you aren't (just) using names?
This is such a bad take. I'm convinced that engineers simply don't understand what the job is. The point was never "does it output code that works", the point is "can it build the right thing in a way that is maintainable and understandable". If you need an LLM to understand the output then you have failed to engineer software.
If all you're doing is spitting out PoCs and pure greenfield development then I'm sure it looks very impressive, as the early language models did when it looked like they were capable of holding a conversation. But 99% of software engineering is not that kind of work.
For this, it’s the web deployment target and fast compile times, rather than the language itself, that are useful.
Routinely I am sent code that works, but obviously nobody has looked at. Because they don’t even actually know what the code does.
For prototyping this is quite great. I think it’s the biggest competitor to tools like Figma, because writing an actually functional program with access to real APIs beats mocks. Now, how often will these end up in production and blow everything up…
A legitimate point, there are lots of performance and fine grain changes you can make, and it's a simple, common language many people use. Perhaps we could realize some of these benefits from a simple, fast language.
> Or hell, why not do it in x86 assembly?
A terrible take, IMO. This would be impossible to debug, and the program is complex enough that you likely won't see any performance improvement from writing it in assembly. It's also not portable, meaning you'd have to rewrite it for every platform you want to run on.
I think there's an argument that if machines are writing code, they should write for a machine optimized language. But even using this logic I don't want to spend a bunch of time and money writing multiple architectures, or debugging assembly when things go wrong.
Recently I've been preparing a series that teaches how to use AI to assist with coding, and in preparation for that there's this thing I've coded several times in several different languages. In the process of that, I've observed something that's frankly bizarre: I get a 100% different experience doing it in Python vs C#. In C#, the agent gets tripped up in doing all kinds of infrastructure and overengineering blind alleys. But it doesn't do that when I use Python, Go, or Elixir.
My theory is that there are certain habits and patterns that the agents engage with that are influenced by the ecosystem, and the code that it typically reads in those languages. This can have a big impact on whether you're achieving your goals with the activity, either positive or negative.
Which will probably store up a problem for the future, with an outsized amount of programs written in languages that were popular on SO when models started learning...
I'm planning to, why bother with react when I can jump straight into WASM?
No, it absolutely doesn't. We've seen so much vibe coded slop that it's very clear that vibe coding produces a hot mess which no self respecting person would call acceptable. No idea how you can say this as it isn't remotely true.
In the game we're building we generate, compile and run code (C#) in real time to let the player "train and command" its monster in creative ways. So, I've thought about this.
You need both a popular language and one that has a ton of built-in verifying tools.
The author correctly highlights the former, but dismisses the latter as being targeted to humans. I think it is even more important for LLMs!
These coding agents are excellent at generating plausible solutions, but they have no guarantees whatsoever. So you need to pair them with a verifying system. This can be unit tests, integration tests, static / type checks, formal methods, etc. The point is that if you don't have these "verifier" systems you are creating an open loop and your code will quickly devolve to nonsense [0].
In my view, the best existing languages for vibe coding are:
- Rust: reasonably popular, very powerful and strict type system, excellent compiler error messages. If it compiles, you can be confident that a whole class of errors won't exist in your program. Best for "serious" programs, but probably requires more back-and-forths with the coding agent.
- TypeScript: extremely popular, powerful type system, ubiquitous. Best for rapid iteration.
- Luau: acceptably popular, but typed and embeddable. Best as a real-time scripting sandbox for LLMs (like our use case).
I think there is space for a "Vibe-Oriented Programming" language (VOP, as the author says), but I think it will require the dust to settle a bit on LLM capabilities before we understand how much we can sacrifice to the language's lack of popularity (since it's new!) versus the verifiability we should endow it with. My bet is that something like AssemblyScript would be the way to go, i.e., something very, very similar to an existing typed, popular language (TS) but with extra features that serve the VOP needs.
Another aspect to consider besides verifiability is being able to incrementally analyze code. For structured outputs, we can generate guaranteed structures thanks to grammar-based sampling. There are papers studying how to use LSPs to guide LLM outputs at the token level [1] . We can imagine analyzers that also provide context as needed based on what the LLM is doing, for example there was this recent project that could trace all upstream and downstream information flow in a program thanks to Rust's ownership features [2].
Finally, the importance of an LLM-coding-friendly sandbox will only increase: we are already seeing Anthropic move towards using LLMs to generate scripts as a way to make tool calls, rather than calling tools directly. And we know that verifiable outputs are easier to hill-climb. So coding will get increasingly better and will probably mediate everything these agents do. I think this is why Anthropic bought Bun.
[0] Very much in the spirit of the LLM-Modulo framework: https://arxiv.org/pdf/2402.01817
[1] https://proceedings.neurips.cc/paper_files/paper/2023/file/6...
[2] https://cel.cs.brown.edu/paper/modular-information-flow-owne...
It was Schadenfreude watching the CEO's son (one of the LLM guys) implode a public-facing production server (https://en.wikipedia.org/wiki/Dunning-Kruger_effect).
Slop content about slop code is recursive slop. Languages like C are simply very unforgiving to amateurs and to naive arbitrary code generators. Bad workmanship writes bad code in any language. Typically, the "easier" the compiler is to use... the more complex the failure mode. =3
Vibe coders usually offer zero workmanship, and are enamored with statistically salient generated arbitrary content. https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev...
For reference, here are the two heavy-lifting workers:
- https://github.com/akaalias/bipscan/blob/main/src/c/find_seq...
- https://github.com/akaalias/bipscan/blob/main/src/c/check_se...
and here's a screenshot of the thing running:
- https://x.com/SpringStreetNYC/status/1996951130526425449/pho...
and here's the full story:
LOL, I got 100% nerd-sniped by my friend Sönke this week and wound up building a small spaceship.
On Monday he's like "Hey, what if you found obscure seed phrases embedded in public texts? You'd only need to remember the name of the book and the paragraph and go from there."
I honestly couldn't care less about crypto(currencies), and I'm 100% sure this is like cryptanalysis 101. But, yeah, it seemed like an interesting problem anyway.
First, I downloaded a few hundred books from Gutenberg, wrote a Ruby script, and found BIP39 word sequences with a tolerable buffer for filler words.
Then I was like, okay, gotta now check them against actual addresses. Downloaded a list of funded ETH addresses. Wrote the checker in Ruby. Ran it. No hits, but this was now definitely, weirdly interesting.
Because: And what if I downloaded the whole pg19 text corpus to scan! And what if I'd add BTC addresses! And what if I checked every permutation of the seed phrase!
Everything got really slow once I got to processing 12GB of raw text to find sequences and then checking a few million candidates with 44,000+ variations per candidate.
So, let's rewrite this in C! And since I've got 16 cores, let's parallelize this puppy! And since it's a MacBook, let's use GCD! Optimize all the things!
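(For the curious, here's a rough sketch of what that GCD fan-out looks like. This is not the actual scanner; scan_chunk is a stand-in for the real per-slice worker.)

```c
/* macOS only; build with: clang -O2 scan.c -o scan */
#include <dispatch/dispatch.h>
#include <stdio.h>

enum { NCHUNKS = 16 };                /* one chunk per core */

/* stand-in for scanning one slice of the corpus for BIP39 sequences */
static void scan_chunk(size_t i) {
    printf("scanning chunk %zu\n", i);
}

int main(void) {
    /* dispatch_apply fans iterations out across cores and blocks
       until every chunk has been processed */
    dispatch_apply(NCHUNKS,
                   dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0),
                   ^(size_t i) { scan_chunk(i); });
    puts("all chunks scanned");
    return 0;
}
```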
Lol, so NOW this thing is so fucking FAST. Takes four minutes to go through the full pg19 corpus and generates 64,205,390 "interesting" seed phrases. The fully parallelized checker (see Terminal screenshot) processes 460 derived addresses per second.
I really don't care if I get a match or not. The fact that I started out building a canoe and wound up with a spaceship is in itself just the best thing in the world.
But anyway. That’s all beside the point. Because the progress apologists[1] come in all shapes and forms (we are led to believe), now also the uber-passionate college professor who aah, loves programming as much as the day he met her. But unlike you he’s a hard-prostheticed pragmatist. He both knows and sympathises with your “passion” but is ready to assert, in a tptacek-memetic style, that it is the way it is—and if you think otherwise (pause for effect), you are wrong.
Because don’t you see? Why are you so blind? No, we can’t let the chips fall as they may and just send you a “told you so” letter once everything you know-now is resolutely quaint. No, we must assert it right now. (So you don’t miss out on the wonderful ride.)
Aah the text complains. It saddens me to think of “coding by hand” becoming a kind of quaint Montessori-school... Oh, the twists and turns of the turbulent text, so organic. Just like your mind. But awake.
The room is at this point drenched in a mist of farts. Yes, programming by-hand, I think we ought to call it a quaintism at this point.
And did you know: people used to resist mechanical computers. Hmm? Yes, indeed, favoring people computers. The text prompts another model to make an image of a person smirking so hard that their eyes become kind of diagonal and their cheeks disappear. But not in an evil cartoon-character way. In a human way. That three years ago felt slightly off-putting. Now it just looks like, well, you know.
- - -
Ahh. (Again.) These fools. With their hand-coding. Do they really think they will be employable three years from now? Well, no matter. I have a PhD from MIT along with my associate professorship. I only came out here to Iowa Community College because of my disabled son. Needed more time with him. And to get away from dat citation grind. Man. I have many organic hobbies. And a few very, really incredibly specific collections, as is fitting. puffs pipe Mmm yeah what do I care, so what if programming is quaint now—I’m already in my “ivory tower”, baby. People will listen to my takes on AI. They are appropriately detached, informal, just saying it like it is, you know? And if they don’t? Well, there’s an army of texts right behind me. They’ll be convinced to suppress any feelings of alienation eventually. Eventually, there will just be their own vanishing, small-minded, petty, “thoughts” on the matter. That tiny holdout. Against all content they can sense.
[1] Insert scare quotes here. All history is whitewashed. “We” progressed and defeated “them”. It’s all just a linear curve. No critical thinking is supposed to occur here. Those idiots thirty years ago used reusable underwear and had to load detergent into a washing machine and then even bend over to turn on a “button” to make the underwear reusable. Our underwear costs fifty cents, is made from the most comfortable plastic you can get, and dissolves and crumbles when it gets into contact with water; down the bathroom drain it goes.
No it doesn't. Just for the fun of it, because I'm somewhat familiar with the VLC codebase, I tried to fix some bugs with "agentic tooling" and "vibe coding". And it just produces crap. Which suggests one metric for the usefulness of these tools: why aren't they fixing real bugs in the large open-source codebases of this world? You'd be a hero; VLC has something like 4000 open issues.
The answer, of course, is that these tools, particularly in the manually memory-managed languages the author proposes to use, don't work at all. Maybe they work on a toy project of 500 lines of code, which is all any demo ever shows, but these text-based systems have no actual understanding of the hardware underlying a complex program. That's just not how they work.