https://medium.com/@tridge60/rsync-and-outrage-d9849599e5a0
(Disclosure: while I haven't talked with him in years, Tridge was my colleague and mentor for many years. I feel it is worth considering his view before joining a crusade)
original commit: https://github.com/RsyncProject/rsync/commit/d046525de39315d...
```
-	if (!ptr)
-		ptr = malloc(num * size);
-	else if (ptr == do_calloc)
+	if (!ptr || ptr == do_calloc)
 		ptr = calloc(num, size);
```
Written with Claude. This is a good example of what slips through LLM attention. It forces all allocations to be calloc as if it is a strict upgrade. For large and recursive allocations, this becomes a significant cost.
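To make the difference concrete, here is a minimal sketch of the two behaviors as I read them from the diff. Only the name `do_calloc` comes from the commit. The function names `alloc_before` and `alloc_after`, the sentinel definition, and the overflow guard are assumptions made for illustration, and the non-NULL, non-sentinel case is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sentinel: passing this address as `ptr` asks for zeroed memory.
 * Only the name do_calloc appears in the diff; this definition is an assumption. */
static char do_calloc_marker;
#define do_calloc ((void *)&do_calloc_marker)

/* Guard against num * size overflowing size_t (an addition for this sketch). */
static int mul_overflows(size_t num, size_t size)
{
    return size != 0 && num > SIZE_MAX / size;
}

/* Removed lines: plain malloc unless the caller explicitly asked for zeroing. */
void *alloc_before(void *ptr, size_t num, size_t size)
{
    if (mul_overflows(num, size))
        return NULL;
    if (!ptr)
        return malloc(num * size);   /* contents are indeterminate */
    if (ptr == do_calloc)
        return calloc(num, size);    /* contents are zero */
    return NULL;                     /* other cases are not modelled here */
}

/* Added line: every fresh allocation goes through calloc and is zero-filled. */
void *alloc_after(void *ptr, size_t num, size_t size)
{
    if (mul_overflows(num, size))
        return NULL;
    if (!ptr || ptr == do_calloc)
        return calloc(num, size);
    return NULL;                     /* other cases are not modelled here */
}
```

With `alloc_after`, a caller that passes NULL and immediately overwrites the buffer still gets it zero-filled first. How much that costs in practice depends on the allocator and the allocation size, so treat this as an illustration of the behavior change, not a measurement.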
reverted in https://github.com/RsyncProject/rsync/commit/7db73ad9a1b8721...
If you read the description of the revert even half carefully, it's easy to tell that it too was written by an LLM.
I can understand the sentiment of whoever posted the original thread.
- The release with the highest number of attributed bugs is the release _right before_ the first release with Claude-coauthored commits, released in January; is there a chance that unattributed LLM-authored commits made it into this release?
- The release attribution methodology is not great, since it will tend to attribute bugs introduced in a minor version update to the longest-lived patch release of that minor version. I doubt that 3.4.1 actually introduced a lot of bugs, but since it was released a day after 3.4.0, bugs that were introduced in that release get attributed to 3.4.1.
- Relatedly, more recent releases have had less time to have bugs filed against them, so there may be a bit of a bias toward evaluating recent releases as less buggy.
TFA is defending the use of AI, and it very clearly (to me) used AI to analyze the data and present the results.
In doing so, the author used statistics in a way they do not appear to understand, and ended up making numerous false claims (you can see the thread discussing these here https://news.ycombinator.com/item?id=48417626 )
In short, the study doesn't have sufficient statistical power, and is making "no difference" claims that aren't justified.
The meta-irony is this: the author used an LLM to interpret data in this study, and seems to have made the same category of mistake (confidently asserting falsehoods) that the study was supposed to be investigating (confidently submitting bad commits to the rsync project).
I don't know how to word this in a non-confrontational, respectful way, but this article just feels like ammo for your next "debate with your anti-AI enemies" where you get to say "look, I proved with data that those people had a disproportionate reaction and have double standards, therefore anyone who dislikes LLMs or their impact are the same!". Like, sorry, I know that sounds really reductive, but this really is the vibe I get when reading this and your other replies where you repeatedly talk about "showing the hypocrisy and double standards".
The global LLM discourse has grown massive, it spans trillions of dollars in promises and investments and affects pretty much everyone, so it's the easiest thing in the world for both sides to just find some people being assholes in the other camp and say "look, here's how [other camp] behaves".
The irrational, extreme, and heinous reactions are partly bandwagoning, and you can go about your day thinking that anyone who reacts like that is evil. But if you wanna dig a bit further, you'll notice that the entire media sphere has been screaming in everyone's ears for a few years now, that they're expendable, low-value-human-capital. All the money in the world (exaggerating a little) is being spent on making sure to remind anyone who opens a computer, opens a website, looks at a billboard, or turns on his tv... that their boss really really really wants to replace them.
Now you'll say that the friendly rsync contributor has nothing to do with any of this and... well yeah he doesn't. You don't need to agree with an emotional response to understand where it's coming from, and even if you're still dead set on considering them "the enemy", then understanding why the anti-AI crowd reacts like that is STILL a positive for you.
If by fairest you mean to say that this analysis and response is sufficient, then I'm sorry but I have to disagree. We really need to understand if the nature of the bugs is worse from a user's perspective. Even if the rate stayed unchanged, if the result is that the perceived quality of the software declined, then I would personally consider that worse, especially if I were a project maintainer.
That's not meant to be wholly dismissive either. But in general, I don't think quantitative analysis alone is enough to fully answer this type of question.
* Why was v3.4.1 the most buggy, right before the Claude commits? Why did "nobody notice"? It's way too strange to just say welp, it must be human error.
* Why does v3.4.2 have 0 bugs, or 0 bug score? And why was such an outlier (no other commit seemingly has this??) allowed to mix into aggregate statistics and bring all the "is Claude buggy?" scores down? Tbh idk how that _wasn't_ a red flag in the author's analysis...
This article feels like half of an analysis presented as a highly complex finished product due to all the advanced stats they're running.
Here is the process I used to do it, which was way more complex than I thought it would be:
https://gist.github.com/e40/caa67c1b8d439a528695f996d0519d8e
Bugs per commit as a metric papers over severity, both in terms of security severity as well as the effect on the user. A mislabeled button has the same weight as the entire app crashing in this framework.
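As a rough illustration of that point, the sketch below compares a raw bugs-per-commit rate with a severity-weighted one. The severity classes, the weights, and the function names are all hypothetical and chosen only for illustration. Nothing here comes from the article's dataset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical severity classes for a filed bug. */
enum severity { SEV_COSMETIC = 0, SEV_MAJOR = 1, SEV_CRASH = 2 };

/* Hypothetical weights, indexed by enum severity. */
static const double severity_weight[] = { 1.0, 5.0, 20.0 };

/* Raw metric: every bug counts the same. */
double bugs_per_commit(size_t nbugs, size_t ncommits)
{
    assert(ncommits > 0);
    return (double)nbugs / (double)ncommits;
}

/* Weighted metric: each bug contributes its severity weight. */
double weighted_bugs_per_commit(const enum severity *bugs, size_t nbugs,
                                size_t ncommits)
{
    double total = 0.0;

    assert(ncommits > 0);
    for (size_t i = 0; i < nbugs; i++)
        total += severity_weight[bugs[i]];
    return total / (double)ncommits;
}
```

Two releases with two bugs each over ten commits get the same raw rate, but under these made-up weights a release with two crashes scores twenty times worse than one with two cosmetic issues. Picking the weights is of course its own judgment call.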
I was an AI skeptic some months ago but truly Claude and Codex have changed my development style and velocity in a way I never imagined would ever be possible. With that, yes, I produce more code and am finding more bugs.
So looking over the comments on HN articles, the amount of polarising hate towards anything produced with AI is quite surprising. Just because some AI helped with a project, or even produced it entirely, doesn't suddenly make it 'vibe coded', as if that's meant to be some insult levelled at users of LLMs.
It reminds me a lot of when offshore outsourcers started getting more software development work from the mid-90s, with all the derogatory remarks made towards 'Indian developers'. Now we're in the mid 2020s and similar remarks are made towards AI.
I don't get it. I really don't. What I do know for sure is more and more code will be AI generated with or without the detractors.
I don't have empirical evidence for this claim, but best I can tell, security patches are the principal source of observed bugs in software of a certain vintage, because they cause churn. (Just think of Windows updates that break drivers.)
This would be even harder to measure.
So the criticism was bad, and that somehow makes it ok to use a bad metric?
I run a smallish project with ~1k stars and I stopped maintaining it last year because people feel like they're absolutely owed features or bug-fixes or whatever. It's tiring and a complete shame that the author has to make such an insane deep dive into a random accusation that just caught on social media. I want to emphasize that this has nothing to do with AI, it's just tech tourists, consumers (as opposed to creators), and engagement farmers that have taken over. AI slop probably doesn't help, but the underlying issue has been brewing for at least a decade.
Also, the "making soup for the homeless & pissing in it" is not only an off-base analogy (software is pretty low on Maslow’s Hierarchy of Needs), but also somehow looks down on both people in need and the volunteers that help them. Just absolutely gross.
Even this report is full of claude-introduced bugs
There was one regression bug apparently (related to multiple destinations and the way people do backups), but all the attention/anger has been about a test suite that makes the rsync development better, more rigorous and copes with the onslaught of both good and bad AI generated PRs as well as hardening something that has two decades of C code in it.
People need to grow up and appreciate what others in the community (especially people like tridge) have provided.
But the reality is that if you were already set enough to call rsync slop because of a single post, you aren't going to be more down now. Even in these responses I see everyone nitpicking and moving goalposts as if one more commit being actually claude-aided will tip the scales from stable project to "vibe coded slop".
Software has always been fuzzy, we have never come up with an objective way to handle software quality, and this Uber hatred of llm contributions lets the humans who make egregious bugs and mistakes off the hook.
Taking a step back, we need to have more empathy and thoughtfulness for one another in this space. It's new and people are experimenting, and there will be nothing good coming from personal insults and DDoSing a good project just because someone got ragebaited on threads, x, mastodon or whatever else.
How do we determine bugs and increase quality? It's almost like we have been grappling with this question for decades and I still hear people fight on the best way forward. Simple design, test driven development, user surveys, all of the above have been used as a proxy for software quality and they all failed to capture everything. Back in the day we used that ambiguity to give each other grace, now we use that ambiguity to tear down other creators. Whatever, if open source software really is dying it's because of this toxic shit just as much as the LLMs.
Your verbosity and sentence structure are not a problem. I hope that publishing this gives you a bit more confidence in your writing, because it's legitimately good.
Is this a configuration that's not common and thus not tested?
If people think they can do better, I want to see their forks and them keeping up with it.
https://github.com/RsyncProject/rsync/graphs/contributors?fr...
Hey, 'logicprog, your writing is fine!
Use LLMs to critique your writing, check its structure, vet your choice of topic sentences, check flow from graf to graf and section to section, look for passive voice and overused words. LLMs are fantastic for that. But don't use a single word an LLM suggests in your actual writing. If it suggests something really fucking good, too bad, those words are disqualified. It's an easy red line to adhere to, easier than it sounds, and it'll keep your writing human.
(You ended up somewhere around here anyways, but that was after you posted something with LLM-written language because you weren't confident enough in your own writing. The things you do "worse" than an LLM are what make you you; be protective of them!)
I think it will be up to some group in academia to make a real full blown study across several repositories.
There must be tons to learn on how LLMs have changed software development and perhaps the cleanest separation will simply be going by what repositories declare e.g. "No LLM involved" vs those that proudly do the opposite or are neutral.
Bugs is not the only variable of interest here. I am guessing someone is already doing this as we discuss it here...
The best approach I've tried that actually increases quality (and _may_ speed up development) is to write ~80% of the code yourself and then ask LLM to review it thoroughly. While it's doing its thing you're also thinking about the code and reviewing it yourself in parallel. You then merge the findings and fix stuff worth fixing. At this point the authorship of the code is still mostly yours, you _understand_ the system and you ship fewer bugs, slightly faster than otherwise. It's a moderate improvement to the workflow, but it actually doesn't cost nearly as much either, and definitely doesn't produce rage at the machine from the slop. The only downside is that it requires lots of discipline, and it's a relatively rare commodity among software engineers these days.
HN is, relatively, a very intellectual part of the internet, yet it's still really common to see very uneducated opinions here. Not that everyone needs to be very educated, but posts with plainly wrong assumptions and biases shouldn't go unchecked so rampantly.
This is kind of a sad situation. Tridge is an excellent programmer and a very respected member of the community, and I totally get it. rsync, like most old C projects, has a lot of accumulated cruft, and things that would be nice to fix, and bugs. And those bugs come in at least three classes: semantic bugs, improper interactions with the OS, and memory safety bugs. And the author and long-time maintainer has the same problem as every other maintainer and team: not enough time to deal with everything.
And now LLMs come along, and they are so, so seductive. They will fix your bugs if you ask them to. They will even find your bugs. And they're right a remarkably large fraction of the time. It's magic! You can write an agent loop or magic harness or swarm and let them do this on their own if you want. And so you start getting through your backlog, and it's fun, and you feel good, and you let your guard down. And you start having problems:
- Your favorite LLM does not have the context that lives in your head. I use rsync because Tridge wrote a fine piece of software, and he knows how to write serious software, and I'm willing to accept that it's in C and therefore almost certainly has a safety bug or three. If I wanted to use claude-ersatz-rsync, I'd use that instead, but I really don't, TYVM.
- Remember how LLMs are right a remarkable fraction of the time? The fraction is remarkable, but it's nowhere close to 100%. (Yet? Who knows. Right now, it's DEFINITELY nowhere near 100%.)
- The training process for the current crop of LLMs does not adequately reinforce long-term maintainability of the outputs. And, for all the LLMs seem magic, they seem to love a workload in which they write code with poorly named functions and no docs and sort of assume that they can parse their own code down the road and figure out WTF is going on, and they are AT BEST only a tiny bit right, because every project has interfaces where one module touches another, and every LLM has very limited context (larger than humans' in straight up verbatim working memory but MUCH MUCH WORSE than humans' (for now, anyway) in actual broad picture retention). This workload doesn't work. If it did, we could give up on structured programming and just have the LLMs vomit up uncommented asm.
And so, where humans have conventions and decently named functions and ideas that you shouldn't churn your code just for funsies (at least not in a production context), LLMs do this: https://github.com/RsyncProject/rsync/commit/30656c5e358b1c6...
Most of that is blindly changing calls to functions like do_foo(args) (which makes sense) to do_foo_at(the same args), which makes no sense. Sorry, but the world of POSIXish-targeting programmers (including, presumably, Claude) knows what _at means, and it means "at" the specified directory fd. Which is not specified in the call sites. It makes no sense at all. Buried in all that mess [0] are the implementations, which are sloppy. Seriously:
- There's a function called do_utimensat_at. Is Claude stuttering?
- There's a lovely comment in syscall.c:1660-1673 that's quite bad. It's handling strings that contain "/../" and such. If there's some actual contract that the function makes to its callers (and there surely is -- this is critical security-sensitive code), then SAY WHAT THE CONTRACT IS. Don't bury a partial explanation in a comment in the middle.
- There's a repeated pattern: In do_foobar_at(path), there is, in effect: if (!path) do_foobar(path); Nice NULL pointer handling. Is NULL a valid argument or not? Why handle it by forwarding it to the less secure variant?
- Those nice, supposedly secure "at" variants check for paths that start with '/' and forward to the raw insecure syscall. And they don't check for .. in the middle. So what, exactly, is the special code for .. promising to do? (See above.)
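For readers less familiar with the convention: in POSIX, the `*at` calls such as `openat` resolve a relative path against an explicit directory file descriptor, and ignore that descriptor when the path is absolute. Below is a minimal sketch of what a wrapper with a stated contract could look like. The names `has_dotdot` and `open_rel_at` are hypothetical and are not taken from rsync. The wrapper deliberately does not defend against symlinks in intermediate path components, so it is an illustration of stating a contract, not a hardened implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>

/* Returns 1 if any '/'-separated component of `path` is exactly "..". */
int has_dotdot(const char *path)
{
    const char *p = path;

    assert(path != NULL);
    while (*p) {
        const char *end = strchr(p, '/');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == 2 && p[0] == '.' && p[1] == '.')
            return 1;
        if (!end)
            break;
        p = end + 1;
    }
    return 0;
}

/*
 * Contract (hypothetical):
 *  - Opens `relpath` read-only, resolved relative to the directory `dirfd`.
 *  - NULL, empty, absolute, or ".."-containing paths fail with EINVAL.
 *    They are never forwarded to a plain open().
 *  - O_NOFOLLOW only rejects a symlink as the final component; symlinks
 *    earlier in the path are NOT handled by this sketch.
 */
int open_rel_at(int dirfd, const char *relpath)
{
    if (!relpath || relpath[0] == '\0' || relpath[0] == '/' ||
        has_dotdot(relpath)) {
        errno = EINVAL;
        return -1;
    }
    return openat(dirfd, relpath, O_RDONLY | O_NOFOLLOW);
}
```

The design choice worth noting is that invalid input is rejected outright instead of falling back to the non-`at` variant, which is the opposite of the NULL and leading-'/' forwarding pattern criticized above.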
I don't think more details are needed. But my take is that this whole thing is a mistake. I personally work on the sort of code where messes like this are entirely unacceptable. And using an LLM while maintaining the kind of oversight that prevents it is mentally taxing and not exactly fun. If you want to fix all the gunk in a C program like rsync by LLM magic, go rewrite it in Rust or something -- you're already exposing yourself to a massive rewrite and all the risks that entails, and you're pretty much guaranteeing a high level of sloppiness, so at least use a language that is more resistant to slop.
[0] Which GitHub doesn't even render by default because their diff viewer is so bad.
[There were follow-ups. See https://news.ycombinator.com/item?id=48352182]
Instead we have a shitstorm over a presumably legit issue, for which the only source is some mastodon post.
One command that used to work in 3.4.1 and stopped working in 3.4.3. Just one! We could have already bisected the living shit out of this and gone home, but no.
By the way, I did find this a bit hard to read but, as instructed by OP, I'll go fuck myself.
For what it's worth, I find AI written prose easy to read, and am annoyed by all the constant HN comments which just point out the author was AI, without anything else substantive to add.
> v3.4.3 has been out long enough that its rate (5.00) is already comparable to historical releases. The "wait and see" argument is an appeal to an unknowable future that shifts the burden of proof away from the critics. If more bugs surface, they will enter the distribution like every other release. There is no reason to expect a regime break.
I mean, as someone who uses LLMs, it might be a good idea to consider how one might limit the amount of bugs that will appear in the future at least a little bit: parallel iterative code review loops would probably be the easiest and most applicable to LLMs, though I guess test coverage and other code analysis tools help too.
Also, if you write a paper where you draw statistical conclusions from a whole 2 data points, you'd be laughed out of the room.
So what? You've saved a significant amount of time for a decent number of humans, and if those humans are working on other projects, the overall net output for the world is net positive compared to without LLMs.
You have to broaden your perspective. It's not just about how rsync was affected.
Can someone explain why one would ever use rsync (pre vibecode version) instead of cp and dd?
Can't we just 'apt remove rsync' and save ourselves the time even spent on evaluating this dependency?
Thanks
You can write for an audience or you can write for yourself. Which is fine either way but you shouldn't pass the blame for bad results on to your audience.
> and recieving almost no substantive input, discussion, or response on the actual content of the article
Well did you write it for that purpose?
> "Just wait, more bugs will surface" -- v3.4.3 has been out long enough
Wait for _more releases_. As your own data shows the bug rate is not consistent between releases. So this is probably not a worthwhile metric. Perhaps systems touched, new features included, or attempted fixes would be a better way to contextualize releases and the goals of the author.
What followed was extraordinary: 329 comments and counting, ranging from thoughtful concern to outright harassment.
The thread did not stop at words. One user posted My Little Pony drawings of themselves strangling the "project janitor that pushed vibecoded commits":
It spread to Hacker News and Lobsters, generating hundreds more comments.
This is false, it did not appear on Lobsters. Here is the function in the codebase that prohibits this kind of brigading: https://github.com/lobsters/lobsters/blob/main/app/models/st... Please correct your article.
[see https://news.ycombinator.com/item?id=48416020 for how all this happened in the first place]
Yes, it did. Here is some math showing that you shouldn’t care about that.
```
$ apt-cache policy rsync | grep Installed
  Installed: 3.4.1+ds1-7ubuntu0.2
$ sudo apt-mark hold rsync
rsync set on hold.
```
It's open source, no one is forcing you to use it.
If you don't trust the newer versions; use the old versions.
If you no longer like the maintainer because of reasons, fork it/start your own.
It's not that hard.
Storm in a teacup.