It has also enabled a few people who haven't written code or planned out implementation details in a long time (sometimes a decade or more) to do so again, and so I'm getting some bizarre suggestions.
Otherwise, it really does depend on what kind of code. I hand write prod code, and the only thing that AI can do is review it and point out bugs to me. But for other things, like a throwaway script to generate a bunch of data for load testing? Sure, why not.
Last year I was working on implementing a pretty big feature in our codebase. It required a lot of focus to get the business logic right, and at the same time you had to be very creative to make it feasible to run without hogging too many resources.
When I was nearly done and working on catching bugs, team members grew tired of waiting and started taking my code from x weeks ago (I have no idea why), feeding it to Claude or whatever, and then came back with a solution. So instead of finishing my code I had to go through their versions of it.
Each one of the proposals had one or more business requirements wrong and several huge bugs. Not one was any closer to a solution than mine was.
I would have appreciated any contribution to my code, but the assumption that it would be so easy to just take my code and finish it by asking Claude was rather insulting.
At work, the devs up the chain now do everything with AI – not just coding – then task me with cleaning it up. It is painful and time consuming, and the code base is a mess. In one case I had to merge a feature from one team into the main code base, but the feature was AI coded, so it did not obey the API design of the main project. It also included a ton of stuff you don't need in a first pass - error checking, hand-rolled parsing, etc. - that I had to spend over a week unrolling so that I could trim it down and redesign it to work in the main codebase. It was a slog, and it also made me look bad because it took me forever compared to the team who originally churned it out almost instantly. AI tools are not good at this kind of design-deconflicting task, so while it's easy to get the initial concept out the gate almost instantly, you can't just magically fit it into the bigger codebase without facing the technical debt you've generated.
In my personal projects, I get to experience a bit of the fun I think others are having. You can very quickly build out new features, explore new ideas, etc. You have to be thoughtful about the design because the codebase can get messy and hard to build on. Often I design the APIs and then have Claude critique them and implement them.
I think the future is bleak for people in my spot professionally – not junior, but also not leading the team. I think the middle will be hollowed out and replaced with principals who set direction, coordinate, and execute. A privileged few will be hired and developed to become leaders eventually (or strike gold with their own projects), but everyone in between is in trouble.
I know my mind fairly well, and I know my style of laziness will result in atrophying skills. Better not to risk it.
One of my co-workers admitted as much to me around six months ago: he was trying not to use AI for any code generation anymore, but it was really difficult to stop because it was so easy to reach for. Sounded kind of like a drug addiction to me. And I had the impression he only felt comfortable admitting it because I don't make it a secret that I don't use it.
Another co-worker did stop using it to generate code because (if I'm remembering right) he can tell what it generates is messy for long-term maintenance, even if it does work and even though he's new to React. He still uses it often for asking questions.
A third (this one a junior) seemed to get dumber over the past year, opening merge requests that didn't solve the problem. In a couple of these cases my manager mentioned either seeing him use AI while they were pairing (it looked good enough, so the problems just slipped by) or seeing hints in the merge request in how the AI names or structures code.
Most of my gripes are with the harness; CC is way better.
In terms of productivity I'm def 2-4X more productive at work, >10x more productive on my side business. I used to work overtime to deliver my features. Now I work 9-5 and am job hunting on the side while delivering relatively more features.
I think a lot of people are missing that AI is not just good for writing code. It's good for data analysis and all sorts of other tasks like debugging and deploying. I regularly use it to manage deployment loops (e.g. make a code change, deploy it to gamma, and verify it works by making a sample request and checking the output in CloudWatch logs). I have built features in 2 weeks that would otherwise take me a month, just because I'd have to learn some nitty-gritty technical details that I'd never use again in my life.
For data analysis, I have an internal Glue catalog; I can just tell it to query the data and write a script that analyzes X for me.
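To give a concrete flavor of that data-analysis side, here's a minimal sketch of the kind of throwaway script I have it produce - assuming the Glue catalog is queryable through Athena; the database name, output bucket, and query are all made up:

```python
# Hypothetical one-off analysis script: run an Athena query over the Glue
# catalog and pull the result rows back for further processing.
import time

import boto3

athena = boto3.client("athena")

def run_query(sql: str) -> list[list[str]]:
    # Database name and S3 output location are placeholders for this sketch.
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )["QueryExecutionId"]
    # Poll until the query finishes.
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)[
            "QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {qid} finished in state {state}")
    # First returned row is the header row.
    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    return [[col.get("VarCharValue", "") for col in row["Data"]] for row in rows]

# Example: print(run_query("SELECT status, COUNT(*) FROM requests GROUP BY 1"))
```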
AI and agents particularly have been a huge boon for me. I'm really scared about automation but also it doesn't make sense to me that SWE would be automated first before other careers since SWE itself is necessary to automate others. I think there are some fundamental limitations on LLMs (without understanding the details too much), but whatever level of intelligence we've currently unlocked is fundamentally going to change the world and is already changing how SWE looks.
Professionally, I have had almost no luck with it, outside of summarizing design docs or literally just finding something in the code that a simple search might not find, such as: where is this team's code that does X?
I have yet to successfully prompt it and get a working commit.
Further, I will add that I don't personally know any ICs who have successfully used it. There are endless posts of people talking about how they're now 10x more productive and how everyone needs to do x, y, and z now. I just don't know any of these people.
Non-professionally, it's amazing how well it does on a small greenfield task, and I have seen that 10x improvement in velocity. But, at work, close to 0 so far.
Of the posts I've seen at work, they typically tend to be teams doing something new / greenfield-ish or a refactor. So I'm not surprised by their results.
This year I grudgingly bit the bullet and began using AI tools, and to my dismay they've been a pretty big boon for me, in this case. Not just for code generation - they're really good at probing the monolith and answering questions I have about how it works. Before, I'd spend days poring over code before starting work to figure out the right way to build something or where to break in, pinging people over in India or eastern Europe with questions and hoping they'd reply overnight. AI's totally replaced that, and it works shockingly well.
When I do fall back on it for code generation, it's mostly just to mitigate the tedium of writing boilerplate. The code it produces tends to be pretty poor - both in terms of style and robustness - and I'll usually need to take at least a couple of passes over it to get it up to snuff. I do find this faster than writing everything out by hand in the end, but not by a lot.
For my personal projects I don't find it adds much, but I do enjoy rubber ducking with ChatGPT.
In 10 minutes, Gemini correctly diagnosed and then fixed a bug in a fairly subtle body of code that I'd expected to spend a couple of hours on.
I spent much of the past week using Gemini to build a prototype of a clean new (green field) system involving RPCs, static analysis, and sandboxing. I give it very specific instructions, usually after rounds of critical design discussions, and it generates structurally correct code that passes essentially valid tests. Error handling is a notable weakness. I review the code by hand after each step and often make changes, and I expect to go over the whole thing very carefully at the end, but it has saved me many hours this week.
Perhaps more valuable than the code has been the critical design conversation, in which it is mostly fluent at the level of an experienced engineer and has been able to explain, defend, and justify design choices quite coherently. This saved time I would otherwise have spent debating with coworkers. But it's not always right, and it is easily led astray (and will lead you astray), so you need a clear idea in mind, a firm hand, and good judgment.
A couple "win" examples: add in-text links to every term in this paragraph that appears elsewhere on the page, plus corresponding anchors in the relevant page parts. Or, replace any static text on this page with any corresponding dynamic elements from this reference URL.
"Lose" examples: constant small glitches - edit formats not matching the searched text (even the venerable Opus 4.6 constantly screws this up), unnecessary intermediate variables, ridiculously over-cautious exception handling, failing to see opportunities to isolate repeated code into a function or to use an existing function that exactly implements said N lines of code, etc.
I'm enjoying myself so much. Projects I've been thinking about for years are now a couple of hours of hacking around. I'm readjusting my mental model of what's possible as a single developer. And I'm finally learning Go!
The biggest challenge right now is keeping up with the review workload. For low stakes projects (small single-purpose HTML+JS tools for example) I'm comfortable not reviewing the code, but if it's software I plan to have other people use I'm not willing to take that risk. I have a stack of neat prototypes and maybe-production-quality features that I can't ship yet because I've not done that review work.
I mainly work as an individual or with one other person - I'm not working as part of a larger team.
I have 10 years of experience. I am a reasonable engineer. I can tell you that about half of the hype on twitter is real. It is a real blessing for small teams.
We have 100k DAU for a consumer CRUD app. We built and maintain everything in-house with 3 engineers. This would have taken at least 10 engineers 3-4 years back.
We don't have a bug list. We are not "vibe coding"; 2 of us understand almost all of the codebase. We have processes to make sure the core integrity of the codebase doesn't go for a toss.
None of us has touched the editor in months.
Even the product folks can raise a PR for small config changes from slack.
Velocity is through the roof and code quality is as good if not better than when we write by hand.
We refactor A LOT more than before because we can afford to.
I love it.
I have a lot of experience, low and high level. These AI tools allow me to "discuss" possibilities, research approaches, and test theories orders of magnitude faster than I could in the past.
I would roughly estimate that my ability to produce useful products is at least 20x. A good bit of that 'x' is because of the elimination of mental barriers. There have always been good ideas I had which I knew could work, but I also knew that to prove that they could work would take a lot of focus and research (leveling up on specific things). And that takes human energy - while I'm busy also trying to do good things in my day job.
Now I have immensely powerful minions and research assistants. I can test any theory I have in an hour or less.
While these minions are being subsidized in the wonderful VC way, I can get a lot done. If the real costs start to bleed through, I'll have to scale back my explorations. (Because at a point, I'll have to justify testing my theories against spending $200-300.)
To your questions, I'm usually a solo builder anyway. I've built serious things for serious companies, but almost always solo. So that's quite a burden. And now I'm weary of all that corporate stuff, so I build for myself. And what a joy it is, having these powertools.
If I were in a company right now, I could absolutely replace a team of 5 people with me + AI... assuming the CTO wasn't the (usual) limiting factor.
We got broad and wide access to AI tools maybe a month ago now. AI tools meaning claude code, codex, cursor and a set of other random AI tools.
I use them very often. They've taken away a lot of the fun and relaxing parts of my job and have overall increased my stress. I am on the product side of the business, and it feels necessary to have 10 new ideas; the people with the most ideas will now be rewarded, which I am not as good at. I've tried having the agents identify opportunities for infra improvements and had no luck there. I haven't tried it for product suggestions, but I think it would be poor at that too.
I get sent huge PRs and huge docs now that I wasn't sent before, with pressure to accept them as is.
I write code much faster but commit it at the same pace due to reviews taking so long. I still generate single-task PRs to keep them reviewable and do my own thorough review beforehand. I always have an idea in my head about how it should work before getting started, and I push the agent to use my approach. The AI tools are good at catching small bugs, like mutating things across threads. I like to use it to generate plans for implementation (that only I and the bots read; I still handwrite docs that are broadly shared and referenced).
Overall, AI has me nervous. Primarily because it does the parts that I like very well, and it has me spending a higher portion of my job on the things I don't like or find more tiresome.
We have Cursor with essentially unlimited Opus 4.6, and it's fundamentally changed my workflow as a senior engineer. I find I spend much more time designing and testing my software, and development time is almost entirely prompting and reviewing AI changes.
I'm afraid my coding skills are atrophying - in fact I know they are - but I'm not sure coding was the part of my job I truly enjoyed. I enjoy thinking higher-level: architecture, connecting components, focusing on the user experience. But I think using these AI tools is a form of golden handcuffs. If I go work at a startup without the money to pay for these models, I think for the first time in my career I would be less able to successfully code a feature than I was last year.
So professionally there are pros and cons. My design and architecture skills have greatly improved as I am spending more time doing this.
Personally it’s so much fun. I’ve made several side projects I would have never done otherwise. Working with Claude code on greenfield projects is a blast.
I use the code generation heavily in my day to day, though verification is a priority for me, as is gaining an understanding of the business logic + improving my skills as a developer. There’s a healthy balance between deploying 100% generated code and not using the tools at all.
It’s useful for research tasks, identifying areas I’ll be working in when developing a feature. However, this team has a gigantic backlog and there are TONS of things we are behind on, so it does feel like AI isn’t moving the needle for us, though it is helpful. I’d like to apply it in different areas, but my senior engineer is very anti-AI, so he doesn’t find the tools useful and is actively against using them. Like I said, there’s surely a balance…
I see us using / relying on them more in the future, due to pressure from above, along with the general usefulness of them.
Copilot completions are amazingly useful. Chatting with the chatbot is a super useful debugging tool. Giving it a function or database query and asking the AI to optimize it works great. But true vibe coding is still, imho, more of a party trick than an actual productivity multiplier. It can do things that look useful, and it can do things that solve immediate self-contained problems. But it can't create launchable products that serve the needs of multiple users.
For this specific use case, LLMs and their integrations with tools like VSCode have been excellent. A simple instruction file dictating what libraries to use, and lines about where to look for up-to-date API docs, increases the chances of one-shots significantly.
My favorite part has been that I'm able to use libraries I wouldn't have used previously like openpyxl. A use case like "get data from an API, transform it, and output it to an excel file with these columns" is super fast, and outputs data to a stakeholder/non-techy format.
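As a sketch of that pattern (the endpoint, field names, and columns here are invented; only the openpyxl calls are the real API):

```python
# Pull JSON from an API, reshape it, and write an .xlsx a stakeholder can open.
import requests
from openpyxl import Workbook

# Hypothetical API returning a list of order records.
orders = requests.get("https://api.example.com/orders", timeout=30).json()

wb = Workbook()
ws = wb.active
ws.append(["Order ID", "Customer", "Total (USD)"])  # header row
for o in orders:
    ws.append([o["id"], o["customer_name"], o["total_cents"] / 100])
wb.save("orders_report.xlsx")
```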
It made me chuckle when Claude etc. released Excel integrations, since working with Excel files was already in a great place for people who've worked with Excel/CSV libraries.
The number 1 suggestion I'd have for people eager to work with text is to use the models to learn about old Unix tools like grep/sed etc. With these powerful tools + modern tools + code, you can build quite complex integration code for many uses. Don't sleep on the classic Unix CLI commands and go downloading stuff from GitHub to achieve things that were already solved 40 years ago :)
These past months I've been working with two agents, developing two things practically in parallel, and I've experienced the fastest, most motivating development sessions I've ever had. Together with these two agents I was able to build two very complex systems that do all sorts of data gathering, then ETL it into a format that can be queried and maintained, and it all ends up in some awesome web UIs. I used them not only to write code, but to do the design and architecture, and to discuss the front end and the business reqs.
What I can say is that it felt like a conversation with a crazy fast person who did everything I needed in seconds. As a tech guy, I know what I want and I know how to describe it. That helped A LOT! I know when we lost context, and yes, there were stupid consequences that we had to fix. But my impression is that many of the things I see criticized here say more about the people using the tools than about the AI and its output. From my point of view, the output is what I wanted, only 250x faster than I ever expected. And as for the critiques targeting the AIs themselves: after this, I am sure they will learn to fill in all those gaps. We will not be criticizing them then. By then my only possible job will be to translate somebody's business reqs for an agent to implement as I speak.
The productivity comes from three main areas for me:
- Having the AI coding assistant write unit tests for my changes. This used to be by far my least favorite part of writing software: instead of solving problems, it was the monotonous process of gathering mock data to exercise specific pathways, trying to make sure I was covering all the cases, and then debugging the tests. With AI assistance, I just have to review the tests to make sure they cover all the cases I can think of and that there aren't any overtly wrong assumptions.
- Research. It has been extraordinarily helpful in giving me insight into how to design some larger systems when I have extremely specific requirements but don't necessarily have the experience to architect them myself - I know enough to tell whether the system will correctly accomplish the requirements, but not necessarily enough to have come up with the architecture as a whole.
- Quick test scripts. It has been extremely useful for generating quick SQL data for testing things, along with quick one-off scripts to test things like external provider APIs
My team has largely avoided AI; our sister team has been quite gung-ho on it. I recently handed off a project to them that I'd scoped at about one sprint of work. They returned with a project design that involved four microservices, five new database tables, and an entirely new orchestration and observability layer. It took almost a week of back-and-forth to pare things down.
Since then, they've spent several sprints delivering PRs that I now have to review. There are lots of things that don't work, don't make sense, or reinvent things we already have from scratch. Almost half the code is dedicated to creating 'reusable' and 'modular' classes (read: boilerplate) for a project that was distinctly scoped as a one-off. As a result, reviewing takes hours, and it's cut into my own sprint velocity. I'm doing all the hard work but receiving none of the credit.
Management just told me that every engineer is now required to use AI. I'm tired.
I've started using it to write some code, which I then use further prompting to review before my own final review. I feel a lot more productive, I can focus on high level ideas and not think about tiny implementation details.
Having instrumentation code magically created in minutes and being able to validate assumptions before/after making changes by doing manual testing and feeding AI logs has been a great use for me - this kind of stuff is boring and would kill my motivation and productivity in the past. AI helps here so I can move on to the fun stuff, helps me stay engaged and interested.
It's great for writing unit tests and doing log analysis. The usual AI pitfalls apply like going into loops that lead nowhere and hallucinating things, but I've gotten better at spotting it and steering it away. I try not to take what it gives me at face value and use follow up prompts to challenge assumptions or verify things.
So overall, it's been an immense help for me. I've got some interesting projects coming up that are more greenfield work, we'll see if this holds up compared to an existing codebase.
I have a lot of worry that I will end up having to eventually trudge through AI generated nightmares since the major projects at work are implemented in Java and Typescript.
I have very little confidence in the models' abilities to generate good code in these or most languages without a lot of oversight, and even less confidence in many people I see who are happy to hand over all control to them.
In my personal projects, however, I have been able to get what feels like a huge amount of work done very quickly. I just treat the model as an abstracted keyboard-- telling it what to write, or more importantly, what to rewrite and build out, for me, while I revise the design plans or test things myself. It feels like a proper force multiplier.
The main benefit is actually parallelizing the process of creating the code, NOT coming up with any ideas about how the code should be made or really any ideas at all. I instruct them like a real micro-manager giving very specific and narrow tasks all the time.
The suggestions are correct about 40% of the time, so I'm actually surprised when they're right, rather than becoming reliant on them. It saves me maybe 10 minutes a day.
In my day job I’m currently a PM/operations director at a small company. We don’t have programmers. I have used AI to build about 12 internal tools in the past year. They’re not very big, but provide huge productivity gains. And although I do not fully understand the codebase, I know what is where. Three of these tools I’m now recreating based on our usage and learnings.
I have learned a ton about all kinds of development concepts in a ridiculously short timeframe.
On the other hand, I have tried them a number of times in greenfield situations with Python and the web stack and experienced the simultaneous joy and existential dread of others. They can really stand new projects up quick.
As a founder, this leaves me with what I describe as the "generation ship" problem. Is it possible that the architecture we have chosen for my project is so far out of the training data that it would be faster to ditch the project and reimplement it from scratch in a Claude-yolo style? So far, I'm convinced not because the code I've seen in somewhat novel circumstances is fairly mid, but it's hard to shake the thought.
I do find chatting with the models incredibly helpful in all contexts. They are also excellent at configuring services.
Cursor and Claude Code have undoubtedly accelerated certain aspects of my technical execution. In particular, root causing difficult bugs in a complicated codebase has been accelerated through the ability to generate throwaway targeted logging code and just generally having an assistant that can help me navigate and understand complex code.
However, overall I would say that AI coding tools have made my job harder in two other ways:
1. There’s an increased volume of code that requires more thorough review and/or testing or is just generally not in keeping with the overall repo design.
2. The cost is lowered for prototyping ideas so the competitive aspect of deciding what to build or which experiment to run has ramped up. I basically need to think faster and with more clarity to perform the same as I did before because the friction of implementation time has been drastically reduced.
I'm getting tired, honestly. I'd prefer the simpler "I don't know" of old to six pages of bullshit I have to review.
In my domain (signal processing, high-load systems, embedded development, backend in Go) it doesn't do great on coding tasks, and I'm very opposed to giving it the lead to create files, do mass edits, et cetera. I found it failing even on recent versions of Go: imagining interfaces, not knowing about changes in some library interfaces (pre-2024 changes, at least). Both ChatGPT and Claude failed to create a proper application for me (parsing incoming messages and drawing real-time graphics), both getting stuck at some point. The application worked more or less, but with bugs and huge performance issues.
I found it useful for quickly creating skeletons for scripts/tools that I can then fill with actual logic, or for making examples of how a library is used.
So there is usability in it for me: it has replaced Stack Overflow and, sometimes, reading actual documentation.
I own a few repositories of our system, and contribution guides I create explicitly forbid use of LLMs and agents to create PRs. I had some experience with developers submitting vibe coded PRs and I do not want to waste my time on them anymore.
These are usually internal tools, workflow improvements, and one off features. Anything really central to the game’s code gets human coded.
I think the further you are from the idea part, the less fun AI coding will be for you. Because now you need to not just translate some spec to code, you have to translate it to a prompt, which ups the chances of playing the telephone game. At least when you write the code yourself you are getting real with it and facing all the ambiguities as a matter of course. If you just pass it to an LLM you never personally encounter the conflicts, and it might make assumptions you would not… but you don’t even realize it because they are assumptions!
All I can say for sure is that it is absolutely useful, it has improved my quality of life without a doubt. I stick to the principle that it's here to improve my work life balance, not increase output for our owners.
And that it has done, so far. I can accomplish things that would have taken me weeks of stressful and hyperfocused work in just hours.
I use it very carefully, and sparingly, as a helpful tool in my toolbox. I do not let it run every command and look into every system, just focused efforts to generate large amounts of boilerplate code that would require me to have a lot of docs open if I were to do it myself.
I definitely don't let it read or write my e-mails, or write any text. Because I always loved writing, and will never stop loving it.
It's here to stay, because I'm not alone in feeling this way about it. So the staunch AI-deniers are just wasting their time. Just like any other tech, it's going to be used against humans, against the already oppressed.
I definitely recognize that the tech has made some people lose their minds. Managers and product owners are now vibe coding thinking they can replace all their developers. But their code base will rot faster than they think.
When you hit a runtime bug, the agent's only tool is "let me add a print statement and restart". That works for simple cases but it's the exact same log-and-restart loop we fall back to in cloud and containerized environments, just with faster typing.
Where it breaks down: timing-sensitive code, Docker services, anything where restarting changes the conditions you need to reproduce.
I've had debugging sessions where the agent burned through 10+ restart cycles on a bug that would've been obvious if it could just watch the live values.
We've given agents the ability to read and write code. We haven't given them the ability to observe running code. That's a pretty big gap.
We have this one performance-critical part of a 3D reconstruction engine that just has to go FAST through billions of voxels. From time to time we try to improve it, by just a bit. I have probably wasted at least 2 full days with various models, trying out their suggestions for optimizations and benchmarking them on real-world data. NONE produced an improvement. The suggested changes look promising programming-wise, but all failed with real-world data.
These models just always want to help. Even if there is just no way forward, they will suggest something, just for the sake of it. I would like the model to say "I do not know", or "This is the best thing that I can come up with"... Niche/expert positions are still safe IMHO.
On the other hand - for writing REST with some simple business logic - it's a real time saver.
We use a mix of agentic and conversational tools, just pick your own and go with it.
For Unity development (our main codebase and source of value) I give current gen tools a C- for effectiveness. For solving confined, well-modularisable problems (e.g. refactor this texture loader; implement support for this material extension) it's good. For most real day-to-day problems it's hopelessly confused by the large codebase full of state, external dependency on chunks of Unity, implicit hardware-dependent behaviours, etc. It has no idea how to work meaningfully with Unity's scene graph or component model. I tried using MCP to empower it here: on a trivial test project it was fine. In a real project it got completely lost and broke everything after eating 30k tokens and 40 minutes of my time, mostly because it couldn't understand the various (documented) patterns that straddled code files and scene structure.
For web and API development I give it an A, with just a little room for improvement. In this domain it's really effective all the way down the stack, from architectural and deployment decisions to implementation details and debugging, including digging really deep into package version incompatibilities and figuring out in seconds problems that would take me hours. My one criticism would be the - now familiar - "junior developer" effect, where it'll often run ahead with an over-engineered lump of machinery without spotting a simpler, more coherent pattern. As long as you keep an eye on it, it's fine.
So in summary: if what you’re doing is all in text, nothing in binary, doesn’t involve geometric or numerical reasoning, and has billions of lines of stack overflow solutions: you’ll be golden. Otherwise it’s still very hit and miss.
I use Opus 4.6 with pi.dev (one agent). I give it detailed instructions on what to do. I essentially use it to implement things the same way I would manually: small commits, every commit tested and verified. I don't use plan mode, just let the agent code - code review is faster than reading a plan. This approach works only if you make small changes. I build a mental model of the code the same way as when writing it manually.
Some people on my team code with AI without reading the code. That's mostly a mess: no mental model, lower quality. They are really proud of it, though, and think they are really smart. Not sure how this will turn out.
My main work is training Text-to-Speech models, and the friction of experimenting with model features or ideas has dropped massively. If I want to add a new CFG implementation, or conditioning vector, 90% of the time Opus can one-shot it. It generally does a good job of making the model, inference and training changes simultaneously so everything plays nicely. Haven't had any major regressions or missed bugs yet, but we'll see!
The downside is reviewing shitty PRs where it's clear the engineer doesn't fully understand what they're doing, and just a general attitude of "I dunno, Claude suggested it" that's getting pretty exhausting.
When it comes to personal projects I'm feeling extremely unmotivated. Things feel more in reach and I've probably built ten times the number of throwaway projects in the past year than I have in previous years. Yet I feel no inspiration to see those projects through to the end. I feel no connection to them because I didn't build them. I have a feeling of 'what's the point' publishing these projects when the same code is only a few prompts away for someone else too. And publishing them under my name only cheapens the rest of my work which I put real cognitive effort into.
I think I want to focus more on developing knowledge and skills moving forward. Whatever I can produce with an LLM in a few hours is not actually valuable unless I'm providing some special insight, and I think I'm coming to terms with that at the moment.
Also some not-so-nice moments (small Rust changes were OK, but with a big one Claude fumbled, and I couldn't really verify that it worked, so I didn't merge the code to master even when it seemingly worked).
I think it really helps to break the ice, so to speak. You no longer feel the tension, the pain of an empty page. You ask Claude to write something, and improving on something is so much easier mentally.
I also mostly use Claude as a spell checker / linter for the projects I'm too lazy to install proper tools for. vim + claude, what else would you need?
Luckily my company pays for the subscription; spending personal money on LLMs (especially on US LLMs) would feel strange for some reason. Ideally I want to own an LLM, have it at home, but I am too lazy.
For throw away code, I might let the agent do some stuff. For example, we needed to test timing on DNS name resolution on a large number of systems to try and track down if that was causing our intermittent failures. I let an agent write that and was able to get results faster than if I did it myself, and I ultimately didn’t have to care about the how… I just needed something to show to the network team to prove it was their problem.
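For illustration, the probe was roughly this shape (hostnames and counts are made up; the real script fanned out across many systems):

```python
# Throwaway probe: time repeated DNS lookups and summarize the outliers.
import socket
import statistics
import time

HOSTS = ["internal-db.example.com", "api.example.com"]  # hypothetical targets

for host in HOSTS:
    samples_ms = []
    for _ in range(20):
        start = time.perf_counter()
        try:
            socket.getaddrinfo(host, 443)
        except socket.gaierror as exc:
            print(f"{host}: resolution failed: {exc}")
            break
        samples_ms.append((time.perf_counter() - start) * 1000)
    if samples_ms:
        print(f"{host}: median {statistics.median(samples_ms):.1f} ms, "
              f"max {max(samples_ms):.1f} ms over {len(samples_ms)} lookups")
```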
For larger projects that need to plugin to the legacy code base, which I’ll need to maintain for years, I still prefer to do things myself, using AI here and there as previously mentioned to help with little things. It can also help finding bugs more quickly (no more spending hours looking for a comma).
I had an agent refactor something I was making for a larger project. It did it, and it worked, but it didn’t write it in a way that made sense to my brain. I think others on my team would have also had trouble supporting it too. It took something relatively simple and added so many layers to it that it was hard to keep all the context in my head to make simple edits or explain to someone else how it worked. I might borrow some of the ideas it had, but will ultimately write my own solution that I think will be easier for other people to read and maintain.
Borrowing some of these ideas and doing it myself also allows me to continue to learn and grow, so I have more tools in my tool belt. With the DNS thing that was totally vibe coded, there were some new things in there I hadn’t done before. While the code made sense when I skimmed through it, I didn’t learn anything from that effort. I couldn’t do anything it did again without asking AI to do it again. Long-term, I think this would be a problem.
Other people on my team have been using AI to write their docs. This has been awful. Usually they don't write anything at all, but at least then I know they didn't write anything. The AI docs are simply wrong, 100% hallucinations. I have to waste time checking the doc against the code to figure that out, and then go to the person who did it to make them fix it. Sometimes no doc is better than a bad doc.
I really liked writing code, so this is all a big negative for me. I genuinely think we have built a really bad thing, that will take away jobs that people love and leave nothing but mediocrity. This thing is going to make the human race dumber and it's going to hold us back.
2. Incremental cleanup: I also use it as a fancier upgrade of Visual Studio's Code Analysis feature and aid me in finding areas to refactor.
3. Treating the model as a corpus of prior knowledge and discussions, I can form a 'committee of agents' (Security, Reliability, UX engineer POVs) to help me view my work at a more strategic level.
My additional twist is to check against my organization's mission statement. That way, I hope I can help reduce the mission drift that I've observed to be a big issue behind dysfunctional companies.
"Implement JWT token verification and role checking in Spring Boot. Secure some endpoints with Oauth2, some with API key, some public."
C# and Java are so old, whatever solutions you find are 5 years out of date. Having an agent implement and verify the foundation is the perfect fit. There's no design, just ever-chaning framework magic. I'd do the same "Google and debug" cycle, but 10 times slower.
I have moved away from using an LLM before having figured out the specifications; otherwise it's very, very risky to go down a wrong rabbit hole the LLM seduced you into via its "user engagement" training.
What works: I stay in the driver's seat. I own the architecture, make the decisions, validate everything. But I don't need a team to execute — Claude does the implementation. I went from being a solo dev limited by time to running a complex project (multi-agent system, Docker, Synology integration, PHP API) that would normally need 2-3 people.
The key is a good CLAUDE.md file with strict rules, and pushing Claude to think ahead and propose multiple options instead of just doing the first thing that comes to mind. Claude is also surprisingly powerful for audits — security audits, config audits, log analysis.
What doesn't work: it confidently generates plausible-looking code that's subtly wrong. Never trust it on things you can't verify. It also over-engineers everything if you don't rein it in.
The biggest shift: I went from "write code" to "review and direct code." Not sure it's making me a better engineer, but it's making me a more effective one. It extends me.
1. Gemini as a replacement for Stack Overflow, but I always have to check the source because it sometimes gives examples that are 10 or even 15+ years old as if they were the definitive answer. We cannot and should not trust that anything AI produces is correct.
2. Co-Pilot to assist in code snippets and suggestions, like a better Intellisense. Comes in handy for CLI tools such as docker compose, etc.
3. Co-Pilot to help comprehension of a code base. For example, to ask how a particular component works or to search for the meaning of a term of reference to it, especially if the term is vague or known by another name.
Believe it or not, we have just recently received guidance on AI-assisted work in general, and it’s mostly “it’s ok to use AI, but always verify it”, which of course seems completely reasonable, as you should do this with any work that you wouldn’t have done yourself.
I write stuff for free. It's definitely "professional grade," and lots of people use the stuff I ship, but I don't earn anything for it.
I use AI every day, but I don't think that it is in the way that people here use it.
I use it as a "coding partner" (chat interface).
It has accelerated my work 100X. The quality seems OK. I have had to learn to step back, and let the LLM write stuff the way that it wants, but if I do that, and perform minimal changes to what it gives me, the results are great.
I also use it as a final check on all my manually written code before sending it for code review.
With all that said, I have this weird feeling that my ability to quickly understand and write code is no longer noticeable, nor necessary.
Everyone now ships tons of code and even if I do the same without any LLM, the default perception will be that it has been generated.
I am not depressed about it yet, but it will surely take a while to embrace the new reality in its entirety.
Important things I've figured out along the way:
1. Enable the agent to debug and iterate. Whatever you'd do to test and verify after you write your first pass at an implementation, figure out a way for an agent to do it too. For example: every API call is instrumented with OpenTelemetry, and the agent has a local collector to query (see the sketch after this list).
2. Make scripts or skills to increase the reliability of fallible multi-step processes that need to be repeated often. For example: getting an oauth token to call some api with the appropriate user scopes for the task.
3. Continually revise your AGENTS.md. I'll often end a coding session by asking the agent whether there's anything from this session that should be captured there. That adds more than it removes, so every few days I'll compact it by having an agent reword the important stuff for conciseness and get rid of anything obvious from the implementation.
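Here's a minimal sketch of point 1, assuming the Python OpenTelemetry SDK and an OTLP collector listening on localhost:4318 (the endpoint and names are just my setup, not a prescription):

```python
# Instrument outbound API calls so the agent can query the spans afterwards.
import requests
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("api-client")

def call_api(url: str) -> requests.Response:
    # Every call gets a span; the agent reads these back from the collector.
    with tracer.start_as_current_span("api.call") as span:
        span.set_attribute("http.url", url)
        resp = requests.get(url, timeout=10)
        span.set_attribute("http.status_code", resp.status_code)
        return resp
```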
I'm still learning how to make the most of it but my current state is one of total amazement. I can't believe how well this works now.
One game-changer has been custom agents and agent orchestration, where you let agents kick off other agents and each one is customized and keeps a memory log. This lets me build several 1,000-LOC features in large existing codebases without reaching context limits, and with documentation that lets me review the work with some confidence.
I have delivered several features in large legacy codebases that were implemented while I attended meetings. Agents have created greenfield dashboards, admin consoles and such from scratch that would have taken me days to do myself, during daily standups. If it turned out bad, I tweaked the request and made another attempt over lunch. Several useful tools have been made that save me hours per week but I never took the time to make myself.
For now, I love it. I do feel a bit of "mourning the craft" but love seeing things be realized in hours instead of days or weeks.
What works:
-Just pasting the error and asking what's going on here.
-"How do I X in Y considering Z?"
-Single-use scripts.
-Tab (most of the time), although that doesn't seem to be Claude.
What doesn't:
-Asking it to actually code. It's not going to do the whole thing, and even if it does, it will take shortcuts, occasionally removing legitimate parts of the application.
-Tests. Obvious cases it can handle, but once you reach a certain threshold of coverage, it starts producing nonsense.
Overall, it's amazing at pattern matching, but doesn't actually understand what it's doing. I had a coworker like this - same vibe.
I'm learning all the time and it's fun, exasperating, tremendously empowering and very definitely a new world.
For personal projects and my side company, I get to join in on some of the fun and really multiply the amount of work I can get through. I tend to like to iterate on a project or code base for a while, thinking about it and then tearing things down and rebuilding until I arrive at what I think is a good implementation. Claude Code has been a really great companion for this. I'd wager that we're going to see a new cohort of successful small or solo-founder companies that come around because of tools like this.
For work, I would say 60% of my company's AI usage is probably useless. Lots of churning out code and documents that generate no real value or are never used a second time. I get the sense that the often claimed "10x more productive" is not actually that, and we are creating a whole flood of problems and technical debt that we won't be able to prompt ourselves out of. The benefit I have mostly seen myself so far is freeing up time and automating tedious tasks and grunt work.
I feel it made me better and other people worse.
GOOD:
I feel that I’m producing more and better code even with unfamiliar and tangled codebases. For my own side projects, it’s brought them from vague ideas to shipped.
I can even do analyses I never could otherwise. On Friday I converted my extensive unit test suite into a textual simulation of what messages it would show in many situations and caught some UX bugs that way.
Cursor’s Bugbot is genuinely helpful, though it can be irritatingly inconsistent. Sometimes on round 3 with Bugbot it suddenly notices something that was there all along. Or because I touch a few lines of a library suddenly all edge cases in that library are my fault.
NOT GOOD:
The effect on my colleagues is not good. They are not reading what they are creating. I get PRs that include custom circular dependency breakers because the LLM introduced a circular dependency, and decided that was the best solution. The ostensible developer has no idea this happened and doesn’t even know what a circular dependency breaker is.
Another colleague does an experiment to prove that something is possible, and I am tasked with implementing it. The experiment consists of thousands of lines of code. After I dig into it, I realize the code assumes that something magically happened and then reports that the thing is possible.
I was reflecting on this and realized the main difference between me and my current team is that I won't commit code I don't understand. I even use the LLMs to do refactors just for clarity, while my colleagues are sometimes creating 500-line methods.
Meanwhile our leaders are working on the problem of code review because they feel it’s the bottleneck. They want to make some custom tools but I suspect they are going to be vastly inferior to the tools coming from the major LLM providers. Or maybe we’ll close the loop and we won’t even be reviewing code any more.
I'm lucky enough to have upper management not pressuring me to use it this or that way, and I'm using it mostly to assist with programming languages/frameworks I'm not familiar with. Also: test cases (these sometimes come out wrong and I need to review them thoroughly), updating documentation, my rubber duck, and some other repetitive/time-consuming tasks.
Sometimes, if I have a simple, self-contained bug scenario where extensive debug won't be required, I ask it to find the reason. I have a high rate of success here.
However, it will not help you avoid anti-patterns. If you introduce one, it will indulge you instead of pointing the problem out.
I did give it a shot on full vibe-coding a library into production code, and the experience was successful; I'm using the library - https://youtu.be/wRpRFM6dpuc
My observations:
1. What works for me is the usual, work iteratively on a plan then implement and review. The more constraints I put into the plan the better.
2. The biggest problem for me is LLM assuming something wrong and then having to steer it back or redoing the plan.
3. Exploring and onboarding to new codebases is much faster.
4. I don’t see the 10x speedup but I do see that now I can discard and prototype ideas quickly. For example I don’t spend 20-30 minutes writing something just to revert it if I don’t like how it looks or works.
5. Mental exhaustion when working on multiple different projects/agent sessions is real, so I tend to have only one. Having to constantly switch the mental model of a problem is much more draining than the "old" way of working on a single problem. Basically, the more I give in to vibing, the harder it is to review and understand.
To be clear, this is not vibecoding. I have a strong sense of the architecture I want, and explicitly keep Claude on the desired path much like I would a junior programmer. I also insist on sensible unit and E2E test coverage with every incremental commit.
I will say that after several months of this, the signalling between UI components is getting a bit spaghetti-like, but that would've happened anyway, and I bet Claude will be good at restructuring it when I get around to that.
I also work in a giant Rails monolith with 15 years of accumulated cruft. In that area, I don’t write a whole lot, but CC Opus 4.6 is fantastic for reading the code. Like, ask “what are all the ways you can authenticate an API endpoint?” and it churns away for 5 minutes and writes a nice summary of all four that it found, what uses them, where they’re implemented, etc.
Management uses it to make mock websites, then doesn't listen when we point out flaws, so nothing new there.
Some in digital marketing are using it for data collection/analysis, but it reaches wrong conclusions 50% of the time (their words), so they are slowly dropping it and using it for menial tasks and simple automations.
In design we had a trial period, but it has the same issue as coding: either it makes something a senior designer could have made in 2 minutes, or it introduces errors that take a long time to fix, only to do it again on the next prompt.
We are a senior dev team, although relatively small, and to me it seems like it only really works as a substitute for junior devs... but the point of junior devs is to grow someone into a senior with the knowledge you need in the company, so I don't really get the use case overall.
Where it breaks down is state management. The suggestions look right but introduce subtle bugs in how data flows between views. I've learned to only use it for isolated, well-scoped tasks. Anything that touches multiple components, I write myself.
1. Workplace, where I work on a lot of legacy code for a crusty old CRM package (Saleslogix/Infor), and a lot of SQL integration code between legacy systems (System21).
So far I've avoided using AI generated code here simply because the AI tools won't know the rules and internal functions of these sets of software, so the time wrangling them into an understanding would mitigate any benefits.
In theory where available I could probably feed a chunk of the documentation into an agent and get some kind of sensible output, but that's a lot of context to have to provide, and in some cases such documentation doesn't exist at all, so I'd have to write it all up myself - and would probably get quasi hallucinatory output as a reward for my efforts.
2. Personally where I've been working on an indie game in Unity for four years. Fairly heavy code base - uses ECS, burst, job system, etc. From what I've seen AI agents will hallucinate too much with those newer packages - they get confused about how to apply them correctly.
A lot of the code's pretty carefully tuned for performance (thousands of active NPCs in game), which is also an area I don't trust AI coding at all, given it's a conglomeration of 'average code in the wild that ended up in the training set'.
At most I sometimes use it for rubber ducking or performance. For example at one point I needed a function to calculate the point in time at which two circles would collide (for npc steering and avoidance), and it can be helpful to give you some grasp of the necessary math. But I'll generally still re-write the output by hand to tune it and make sure I fully grok it.
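The math it walked me through reduces to a quadratic; here's a Python sketch of the idea (my real version lives in the game's tuned codebase, so treat this purely as illustration):

```python
import math

def time_to_collision(p1, v1, r1, p2, v2, r2):
    """Earliest t >= 0 at which two moving circles touch, or None.

    In circle 1's frame: relative position p = p2 - p1, relative velocity
    v = v2 - v1, combined radius R = r1 + r2. Collision when |p + v*t| = R,
    i.e. the quadratic (v.v) t^2 + 2 (p.v) t + (p.p - R^2) = 0.
    """
    px, py = p2[0] - p1[0], p2[1] - p1[1]
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]
    R = r1 + r2
    a = vx * vx + vy * vy
    b = 2.0 * (px * vx + py * vy)
    c = px * px + py * py - R * R
    if c <= 0.0:
        return 0.0   # already overlapping
    if a == 0.0:
        return None  # no relative motion, never collide
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return None  # closest approach still leaves a gap
    t = (-b - math.sqrt(disc)) / (2.0 * a)  # earlier of the two roots
    return t if t >= 0.0 else None
```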
Also tried to use it recently to generate additional pixel art in a consistent style with the large amount of art I already have. Results fell pretty far short unfortunately - there's only a couple of pixel art based models/services out there and they're not up to snuff.
For my side projects, I do like to offload the tedious steps like setup, scaffolding or updating tasks to Claude. Things like weird build or compile errors that I usually would have to spend hours Googling to figure out I can get sorted in a matter of minutes. Other than that, I still like to write my own code as I enjoy doing it.
Overall, I like it as a tool to assist in my work. What I dislike is how much peddling is being done to shove AI into everything.
What’s working:
Boilerplate & Layout Shifting: AI (specifically Claude 4.x/5) is excellent for generating Astro components and complex Tailwind layouts. What used to take 2 hours of tweaking CSS now takes 15 minutes of prompt-driven iteration.
Programmatic SEO (pSEO) Analysis: I use Python scripts to feed raw data into LLMs to generate high-volume, structured analysis (300+ words per page). For zero-weight niche sites, this has been a massive leverage point for driving organic traffic.
Logic "Vibe Checks": When building strategy engines (like simulators for complex games), I use AI to stress-test my decision-making logic. It’s not about writing the core engine—which it still struggles with for deep strategy—but about finding edge cases in my "Win Condition" algorithms.
The Challenges:
The "Fragment" Syntax Trap: In Astro specifically, I’ve hit issues where AI misidentifies <> shorthand or hallucinates attribute assignments on fragments. You still need to know the spec inside out to catch these.
Context Rot: As a project grows, the "context window" isn't the problem; it's the "logic drift." If you let the AI handle too many small refactors without manual oversight, the codebase becomes a graveyard of "almost-working" abstractions.
The Solution: I treat AI as a junior dev who is incredibly fast but lacks a "mental model" of the project's soul. I handle the architecture and the "strategy logic," while the AI handles the implementation of UI components and repetitive data transformations.
Stack: Astro, TypeScript, Python scripts for data. Experience: 10 years, independent/solo.
I think it's a useful tool, but whenever I have an LLM attempt to develop an entire feature for me, the solution becomes a pain to maintain (because I don't have the mental model around it, or the solution has subtle issues).
Maybe people who are really deep into using AI are using Claude? Perhaps it's way better, I don't know.
It's going very poorly, where the engineers are emboldened by speed and are vacating their normal code-review responsibilities. I would also say they are shirking ethical behavior by domineering other people's time, energy, and open source projects. Moreover, these forays into generic packages are largely vanity projects, an excuse to play with LLMs.
My only solution is to increase my level of code-review, which aggravates everybody involved, including me. It is not a good solution.
I could definitely see hardline rules being valuable surrounding LLM use (e.g. "LLM PRs must be less than n logical statements, no exceptions" is just one example rule off the top of my head), especially if the LLM can be made to stridently follow those rules, but the idea of hashing those out sounds unproductive.
- a web-based app for a F500 client for a workflow they’ve been trying to build for 2 years; won the contract
- built an iPad app for same client for their sales teams to use
- built the engineering agent platform that I'm going to raise funding for
- a side project to do rough cuts of family travel videos (https://usefirstcut.com, soft launch video: https://x.com/xitijpatel/status/2026025051573686429)
I see a lot of people in this thread struggling with AI coding at work. I think my platform is going to save you. The existing tools don’t work anymore, we need to think differently. That said, the old engineering principles still work; heck, they work even better now.
It is great for getting an overview on a pile of code that I'm not familiar with.
It has debugged some simple little problems I've had, e.g., a complex regex isn't behaving, so I'll give it the regex and a sample string and ask, "why isn't this matching?" and it will figure it out.
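(A toy example of what I mean, invented for illustration: a pattern that silently fails because `.` doesn't cross newlines by default.)

```python
import re

text = "BEGIN\npayload\nEND"
# Looks right but returns None: '.' doesn't match newlines by default.
print(re.search(r"BEGIN(.*)END", text))
# The kind of fix the LLM points out: add the DOTALL flag.
print(re.search(r"BEGIN(.*)END", text, re.DOTALL).group(1))
```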
I've used it only a little for writing new code. In those cases I will write the shell of a subroutine and a comment saying what the subroutine takes in and what it returns, then ask the LLM to fill in the body. Then I review it.
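A hypothetical example of such a shell, with the body as the LLM might fill it in (the name and behavior are invented for illustration):

```python
def dedupe_records(records):
    """Takes a list of dicts, each with an 'id' key; returns a new list
    with duplicate ids removed, keeping the first occurrence of each."""
    # The body below is what I'd ask the LLM to produce, then review.
    seen = set()
    unique = []
    for rec in records:
        if rec["id"] not in seen:
            seen.add(rec["id"])
            unique.append(rec)
    return unique
```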
It has been useful for translating ancient perl scripts into something more modern, like python.
I am a data engineer maintaining a big data Spark cluster as well as a dozen Postgres instances - all self hosted.
I must confess it has made me extremely productive if we measure in terms of writing code. I don't even do a lot of special AGENTS.md/CLAUDE.md shenanigans, I just prompt CC, work on a plan, and then manually review the changes as it implements it.
Needless to say this process only works well because: A) I understand my code base. B) I have a mental structure of how I want to implement it.
Hence it is easy to keep the model and me in sync about what's happening.
For other aspects of my job I occasionally run questions by GPT/Gemini as a brainstorming partner, but it seems a lot less reliable. I only use it as a sounding board. It does not seem to make me any more effective at my job than simply reading documents or browsing GitHub issues/Stack Overflow myself.
What it has done is replace my Googling, asking people, and looking stuff up on Stack Overflow.
It's also good for generating small boilerplate code.
I don't use the whole agents thing, and there are so many edge cases that I always need to understand and be aware of that I honestly think the AI cannot capture.
- Figuring out the architecture of a project you just came into
- Tracing the root cause of a bug
- Quickly implementing a solution with known architecture
I figured out that, above all, what makes or breaks success is context engineering: keeping your project and session documentation in order, documenting every learning you've made along the way (with the help of AI), asking the AI to compose a plan before implementing, and iterating on the plan until it looks good to you. Sometimes I spend several hours on a plan markdown document, iterating on it with AI, before pressing the "Build" button and having the AI do it in 10 minutes.
Another important thing is a verification harness. Tell the agent how to compile the code and run the tests - that way it's less likely to go off the rails.
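A minimal sketch of what such a harness might look like for a Python project (ruff and pytest here are stand-ins; substitute your project's own build and test commands):

    #!/usr/bin/env python3
    # verify.py - the single command the agent is told to run after every change.
    import subprocess
    import sys

    STEPS = [
        ["ruff", "check", "."],           # lint
        ["pytest", "-q", "--maxfail=1"],  # tests; stop at the first failure
    ]

    for cmd in STEPS:
        print(">>", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            sys.exit(1)  # a non-zero exit is the signal the agent can act on

    print("all checks passed")

The specific tools matter less than giving the agent one unambiguous pass/fail signal.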
Overall, since a couple of months ago, I feel like I got rid of the part of programming that I liked the least - swimming in technicalities irrelevant to the overall project's objectives - while keeping what I liked the most - making the actual architectural and business decisions.
I wrote a blog recently about the approach that works for me: https://anatoliikmt.me/posts/2026-03-02-ai-dev-flow/
And this is a tool for context engineering I made specifically to support such a flow: https://ctxlayer.dev/
On my first day, the manager and a senior dev told me, "Don't try to write code yourself; you should be using AI." I got encouraged to use spec-driven development and frameworks like superpowers, gsd, etc.
I'm definitely moving faster using AI in this way, but I legitimately have no idea what the fuck I am doing. I'm making PRs I don't know shit about. I don't understand how the code works because the emphasis is on speed, so instead of ramping up in languages/technologies I've never used, I'm just shipping a ton of code I didn't write and have no real way to vet, unlike someone who has been working with them regularly and has actually mastered them.
This time last year, I was still using AI, but as a pair-programming utility, where I got help learning things I don't know, probing topics and concepts I need exposure to, and reasoning through problems that arose.
I can't control the direction of how these tools are going to evolve and be used, but I would love it if someone could explain to me how I can continue to grow if this actually is the future of development. Because while I am faster, the hope seems to be that AI/agents/LLMs will only ever get better and I will never need to have an original thought or use critical thinking.
I have just about 4 years of professional experience. I had about 10-12 months at the start of my career where I used Google to learn things, before LLMs became the sole, singular focus.
I wake up every day with existential dread of what the future looks like.
I do wonder what will happen when real costs are billed. It might end up being a net positive since that will make you think more about what you prompt, and perhaps the results will be much better than lazily prompting and seeing what comes out (which seems to be a very typical case).
Experience level: very senior, programming for 25 years, have managed platform teams at Heroku and Segment.
Project type: new startup started Jan ‘26 at https://housecat.com. Pitch is “dev tools for non developers”
Team size: currently 2.
Stack: Go, vanilla HTML/CSS/JS, Postgres, SQLite, GCP and exe.dev.
Claude code and other coding harnesses fully replaced typing code in an IDE over the past year for me.
I’ve tried so many tools. Cursor, Claude and Codex, open source coding agents, Conductor, building my own CLIs and online dev environments. Tool churn is a challenge but it pays dividends to keep trying things as there have been major step functions in productivity and multi tasking. I value the HN community for helping me discover and cut through the space.
Multiple VMs available over SSH with an LLM pre-configured has been the latest level up.
Coding is still hard work: designing tests, steering agents, reviewing code, and splitting up PRs. I still use every bit of my experience every day and feel tired at the end of the day.
My non-programmer co-founder, more of a product manager and biz ops person, has challenges all the time. He generally can only write functional prototypes. We solve this by embracing the functional prototype and doing a lot of pair programming. It is much more productive than design docs or Figma wireframes.
In general the game changer is how much a couple of people can get done. We’re able to prototype ideas, build the real app, manage SOC2 infra, marketing and go to market better than ever thanks to the “willing interns” we have. I’ve done all this before and the AI helps with so much of the boilerplate and busywork.
I’m looking for beta testers and security researchers for the product, as well as a full time engineer if anyone is interested in seeing what a “greenfield” product, engineering culture and business looks like in 2026. Contact info in my profile.
It works really well (using Claude Code and Opus 4.6 primarily). Incremental changes tend to be well done and mostly one-shotted provided I use plan mode first, and larger changes are achievable by careful planning with split phases.
We have skills that map to different team roles, and 5 different skills used for code review. This usually gets you 90% there before opening a PR.
Adopting the tool made me more ambitious, in the sense that it lets me try approaches I would normally discard because of gaps in my knowledge and expertise. This doesn't mean blindly offloading work, but rather isolating parts where I can confidently assess risk, and then proceeding with radically different implementations guided by metrics. For example, we needed a way to extract redlines from PDF documents, and in a couple of days went from a prototype with embedded Python to an embedded Rust version with a robust test oracle against hundreds of documents.
I don't have multiple agents running at the same time working on different worktrees, as I find that distracting. When the agent is implementing I usually still think about the problem at hand and consider other angles that end up in subsequent revisions.
Other things I've tried which work well: share an Obsidian note with the agent, and collaboratively iterate on it while working on a bug investigation.
I still write a percentage of code by hand when I need to clearly visualise the implementation in my head (e.g. if I'm working on some algo improvement), or if the agent loses its way halfway through because they're just spitballing ideas without much grounding (rare occurrence).
I find Elixir very well suited for AI-assisted development because it's a relatively small language with strong idioms.
So now a lot of different parts of the company are trying to replicate their workflow. The process is showing what works: you need AI-first documentation (a readme with one line per file to help manage context), and you need to develop skills and steering docs for your codebase, code style, etc. And it mostly works!
For me personally, it has drastically increased productivity. I can pick up something from our infinitely huge backlog, provide some context, and let the agent go ham on fixing it while I do whatever other stuff is assigned to me.
On the other hand, I tried to get help debugging a test failure and Claude spat out paragraph after paragraph arguing with itself, going back and forth. Not only did it not help, none of the intermediate explanations were useful either. It ended up being a waste of time. If I didn't know better, I could easily have been sent on multiple wild goose chases.
I'm using `auggie` which is their CLI-based agentic tool. (They also have a VS Code integration - that became too slow and hung often the more I used it.) I don't use any prompting tricks, I just kind of steer the agent to the desired outcome by chatting to it, and switch models as needed (Sonnet 4.6 for speed and execution, GPT 5.1 for comprehension and planning).
My favorite recent interaction with Augment was to have one session write a small API and its specification within the old codebase, then have another session implement the API client entirely from the specification. As I discovered edge cases I had the first agent document them in the spec and the second agent read the updated spec and adjust the implementation. That worked much, much better than the usual ad hoc back and forth directly between me and one agent and also created a concise specification that can be tracked in the repo as documentation for humans and context for future agentic work.
So far, it's been fantastic. I can do more things for clients, much faster, than I ever dreamed would be possible when I've attempted work like this before.
I think the biggest problem with AI coding is that it simply doesn't fit well into existing enterprise structures. I couldn't imagine being able to do anything productive when I'm stuck having to rely on other teams or request access to stuff from the internet like I did in previous jobs.
The good thing is that the work gets done much quicker than before, and it's actually a boon for that.
The issue is inflated expectations.
For example: if a work item would ideally take two weeks before AI, it is now expected to be done in something like 2 days.
So we still need to find a sweet spot so that the expectations are not unbelievable.
MS is a mature place, so they're still working on it and take our feedback seriously. At least that's what I have seen.
People having easy access to LLMs makes this job much harder. LLMs can create what looks at the surface like expert-written code, but suffers from below-the-surface issues that will reveal themselves as intermittent issues or subtle bugs after being deployed.
Inexperienced devs create huge commits full of such code, and then expect me to waste an entire day searching for such issues, which is miserable.
If the models don't improve significantly in the future, I expect that most high-stakes software teams will fire all the inexperienced devs and have super-experienced engineers work with the bots directly.
For main coding tasks, it is imho not suitable because you still have to read the code and I hate reading other people's code.
And also, the AI is still slow, so it is hard to stay focused on a task.
My solution was to write code to force the model down a deterministic path.
It’s open source here: https://codeleash.dev
It’s working! ~200k LOC python/typescript codebase built from scratch as I’ve grown out the framework. I probably wrote 500-1000 lines of that, so ~99.5% written by Claude Code. I commit 10k-30k loc per week, code-reviewed and industrial strength quality (mainly thanks to rigid TDD)
I review every line of code but the TDD enforcement and self-reflection have now put both the process and continual improvement to said process more or less on autopilot.
It’s a software factory - I don’t build software any more, I walk around the machine with a clipboard optimizing and fixing constraints. My job is to input the specs and prompts and give the factory its best chance of producing a high quality result, then QA that for release.
I keep my operational burden minimal by using managed platforms - more info in the framework.
One caveat; I am a solo dev; my cofounder isn’t writing code. So I can’t speak to how it is to be in a team of engineers with this stuff.
My workday is fairly simple. I spend all day planning and reviewing.
1. For most features, unless it's small things, I will enter plan mode.
2. We will iterate on planning. I built a tool for this, and it seems that this is a fairly desired workflow, given the popularity through organic growth. https://github.com/backnotprop/plannotator
- This is a very simple tool that captures the plan through a hook (ExitPlanMode) and creates a UI for me to actually read the plan and annotate, with qol things like viewing plan diffs so I can see what the agent changed.
3. After the plan's approved, we hit eventual review of the implementation. I'll use AI reviewers, but I will also manually review using the same tool so that I can create annotations and iterate through a feedback loop with the agents.
4. Do a lot of this / multitasking with worktrees now.
Worktrees weren't something I truly understood the value of for a while, until a couple weeks ago, embarrassingly enough: https://backnotprop.com/blog/simplifying-git-worktrees/
I only use Claude Code with Opus 4.6 on High Effort.
I always, ALWAYS treat my “new job” as writing a detailed ticket for whatever it is I need to do.
I give the model access to a DB replica of my prod DB that I create manually.
I do NOT waste time with custom agents, Claude.md files or any of that stuff.
When I put ALL of the above together, the results ARE THE PROMISED LAND: I simply haven’t written a single line of code manually in the last 3 months.
At least at my company, the problem is the business hasn’t caught up. We can code faster, but our stakeholders can’t decide what they want us to build any faster, or test faster, or grasp the new modalities LLMs make possible.
That’s where I want to go next: not just speeding up and increasing code quality but improving business analytics and reducing the amount of meetings I have to be in to get business problems understood and solved.
It’s a lot of fun for exploring ideas. I’ve built things very fast that I would not have built at all otherwise. I have rewritten a huge chunk of semi-outdated docs into something useful with a couple of prompts in a day. Claude does all the annoying "dependency update breaks the build" kinds of things. And the reviews are extremely useful and a perfect complement to human review, as they catch things extremely well that humans are bad at catching.
But in the production codebase, changes must be made with much more consideration. Claude tends to perform well on some tasks, but for others I end up wasting time because I just don’t know up front how the feature must look, so I cannot write a spec at the level of precision that Claude needs, and changing code manually is more efficient for this kind of discovery for me than dealing with large chunks of constantly changing code.
And then there’s the fact that Claude produces things that work and do the thing described in the prompt extremely well, but they are always also wrong in some way. When I let AI build a large chunk of code and actually go through the code, there’s always a mess somewhere that AI review doesn’t see because it looks completely plausible but contains some horrible security issue, or a complete inconsistency with the rest of the codebase, or, you know, that custom YAML parser nobody asked for and that you don’t want your day job to depend on.
Personally, it’s been decent for generating tedious boilerplate. Though I’m not sure if reading the docs and just writing things myself would have been faster when it comes time to debug. I’m pretty fast at code editing with vim at this point. I’m also hesitant to feed any fixes back to the AI companies.
I’ve found “better google” to be a much more comfortable if not faster way to use the tools. Give me the information, I’ll build an understanding and see the big picture much better.
This is a key candidate for AI use, as we have built hundreds of warehouses in the past. We have a standard product that spans over a hundred thousand lines of code to build upon. Still, we rely on copying code from previous projects if features have been implemented before. We have stopped investing in the product in order to migrate everything to microservices, for some reason, so this code copying is increasingly common as projects keep getting more complex.
Teams to implement warehouses are generally around eight developers. We are given a design spec to implement, which usually spans a few hundred pages.
AI has more than doubled the speed at which I can write backend code. We've done the same task so many times before with previous warehouses that we have a gold mine of patterns that AI can pick up on, if we give it a folder of previous projects to read. I also feel that the code I write is higher quality, though I have to think more about the design, as previously I would realize something wouldn't work while writing the code. At GWT, though, it's hopeless, as there are almost no public GWT projects to train an AI on. It's also very helpful in tracing logs and debugging.
We use Cursor. I was able to use $1,300 worth of Claude Opus 4.6 tokens at a cost of $100 to the company. Sadly, Cursor discontinued its legacy pricing model due to it being unsustainable, so only the non-frontier models are priced low enough to use consistently. I'm not sure what I'm going to do when the new pricing model takes effect tomorrow; I guess I will have to go back to writing code by hand or figure out how to use models like Gemini 3.1. GPT models also write decent code, but they are always so paranoid and follow prompts so strictly that it works to their own detriment. Gemini just feels unstable and inconsistent, though it does write higher quality code.
I'm not being paid any more for doubling my output, so it's not the end of the world if I have to go back to writing code by hand.
I have to think like a micro-manager, coming up with discrete (and well-defined) tasks for the AI to do, and I periodically review the code to make it cleaner/more efficient.
But I'm confident that it is saving me time. And my love for programming has not diminished. I'm still driving the architecture and writing code, but now I have a helper who makes progress in parallel.
Honestly, I don't want to go back.
Another teammate added a length check to an input field, and his request was merged near instantly, even though it had zero unit testing. This team is incredibly cooked in the long term; I just need to ensure that I survive the short term somehow.
If this is what the industry is now… this will be my last job in it.
Curse everyone involved with creating this nightmare.
Tasks where, in the past, I have thought "if I had a utility to do x it would save me y time" - and I'd either start and give up, or spend much longer than y on it - are now super easy: create a directory, claude "create an app to do x", so simple.
Other areas of success have been just offloading the typing/prototyping. I know exactly what the code should look like, so I rarely run into issues.
Stack: Go, Python. Team size: 8. Experience: mixed.
I'm using a code review agent which sometimes catches a critical bug humans miss, so that is very useful.
Using it to get to know a code base is also very useful. Questions like "which functions touch this table" or "describe the flow of this API endpoint" are usually answered correctly. This is a huge time saver when I need to work on a code base I'm less familiar with.
For coding, agents are fine for simple, straightforward tasks, but I find the tools are very myopic: they prefer very local changes (adding new helper functions all over the place, even when such helpers already exist).
For harder problems I find agents get stuck in loops, and coming up with the right prompts and guardrails can be slower than just writing the code.
I also hate how slow and unpredictable the agents can be. At times it feels like gambling. Will the agents actually fix my tests, or fuck up the code base? Who knows, let's check in 5 minutes.
IMO the worst thing is that juniors can now come up with large change sets that seem good at a glance but then turn out to be fundamentally flawed, and it takes tons of time to review them.
I've become somewhat addicted to using coding agents, in the sense that I feel I can finally realize a lot of the fantasies about code cleanup and modernization I've had over the past decade, and also fulfill user requests, without spending a lot of time writing code and debugging. During the last few months I've been spending my weekends prompting and learning the ropes. I've been using GPT 5.x, and GPT 4 before that.
I've tried giving it both big cleanup tasks and big design tasks. It was OK, but mentally very exhausting, especially as it tends to stick to my original prompt, which included a lot of known unknowns, even after I told it I'd settled on a design decision; then I have to go over its generated code line by line and verify that earlier decisions I had already rejected aren't slipping back into the code. In some instances I've had to tell it again and again that the code it's working on is greenfield and no backwards compatibility should be kept. In other instances I had to tell it that it shouldn't touch the public API.
Also, a lot of things which I take for granted aren't done, such as writing detailed comments above each piece of code that is due to a design constraint or an obscure legacy reason. Even though I explicitly prompt it to do so.
Hand-holding it is a chore. It's like coaching a junior dev. This is on top of me having 4 actual real-life junior devs sending me PRs to review each week. It's mentally exhausting. At least I know it won't take offense when I'm belittling its overly complicated code and bad design decisions (which I NEVER do when reviewing PRs for the actual junior devs, so in this sense I get something to throw my aggression against).
I have tried using it for 3 big tasks in the last 5 months. I have shelved the first one (modernizing an ancient codebase written more than 20 years ago), as it still doesn't work even after I spent about a week on it, and I can't spare any more time. The second one (getting another huge C# codebase to stop rebuilding the world on every compilation) seemed promising and in fact did work, but I ended up shelving it after discovering its solution broke auto-complete in Visual Studio. An MS bug, but still.
The 3rd big task is actually a user-facing one, involving a new file format, a managed reader and a backend writer. I gave it a more-or-less detailed design document. It went pretty OK, especially after I made the jump to GPT 5.2 and now 5.4. Both of them still tended to hallucinate too much when the code size passed a certain threshold.
I don't use it for bug fixing or small features, since it requires a lot of explaining and isn't worth it. Our system has a ton of legacy requirements and backwards compatibility guarantees that would take many days to specify properly.
I've become disillusioned last week. It's all for the best. Now that my addiction has lessened maybe I can have my weekends back.
An example from last week:
Me: Do this.
AI: OK.
<Brings me code that looks like it accomplishes the task but after looking at it it’s accomplishing it in a monkey’s paw/spiteful genie kind of way.>
Me: Not quite, you didn’t take this into account. But I made the same mistake while learning so I can pull it back on track.
AI: OK
<It’s worse, and why are all the values hardcoded now?>
…
40 minutes go by. The simplest, smallest bit of code is almost right.
Me: Alright, abstract it into a Sass mixin.
AI: OK.
<Has no idea how to do it. It installed Sass, but with no understanding of what it’s working on so the mixin implementation looks almost random. Why is that the argument? What is it even trying to accomplish here?>
At which point I just give up and hand code the thing in 10 minutes.
It would be neat if AI worked. It doesn’t.
Professionally I hardly use the tools for coding, since I’m in an architecture role and mostly write design docs and do reviews. And I write the occasional prototype.
I have started building tools to integrate copilot (Opus) better with $CORP. This way I can ask it questions across confluence and github.
Leveraging Claude for a project feels very addictive to me. I have to make a conscious effort to stop and I end up working on multiple projects at the same time.
The negatives are that AI clearly loves to add code, so I do need to coach it into making nice abstractions and keeping it on track.
When I need to type stuff myself it's mostly just minor flavour changes like Claude adding docstrings in a silly way or naming test functions the wrong way - stuff that I fixed in the prompt for the next time.
And yes, I read and understand the code produced before I tag anyone to review the PR. I'm not a monster =)
This comment section is exactly the same, of course.
> I'd like to cut through the noise
Me too, but it's not happening here.
Mostly using Gemini Flash 3 at a FAANG.
- Think about requirement
- Spend 0-360 minutes looking through the code
- Start writing code
- Realize I didn't think about it quite enough and fix the design
- Finish writing code
- Write unit tests
- Submit MR
- Fix MR feedback
Until recently no LLM was able to properly disrupt that, however the release of Opus 4.5 changed that.
Now my workflow is:
- Throw as much context into Opus as possible about what I want in plan mode
- Spend 0-60 minutes refining the plan
- Have Opus do the implementation
- Review all the code and nitpick small things
- Submit MR
- Implement MR feedback
Einstein said something like: "To punish my disdain for authority, fate made me an authority." I feel like, to punish my disdain for dev managers, techbro Jesus has made me a dev manager, of AI agents.
1. Correct, maintainable changes
2. Correct, not maintainable changes
3. Correct diff, maintains expected system interaction
4. Correct diff, breaks system interaction
In no way are they consistent or deterministic, but they are _always_ convinced they are correct.
I have to very much be in the loop and constantly guiding it with clarifying questions but it has made running multiple projects in parallel much easier and has handled many tedious tasks.
The speed we can move at is astounding. We're going to finish our backlog next quarter. We're conservatively planning on launching 3x as many features next quarter.
Claude is far from perfect: it's made us reassess our coding standards since code is primarily for Claude now, not for humans. So much of what we did was to make code easier for the next dev, and that just doesn't matter anymore.
I find it the most exciting time for me as a builder, I can just get more things done.
Professionally, I'm dreading our future, but I'm sure it will be better than I fear and worse than I hope.
Toolset-wise, I use the usual: Cursor (super expensive if you go with Opus 4.6 Max, but their computer use is game changing, although it will soon become a commodity) and Claude Code (pro max plan), which is my new favorite. I'm trying out Codex, and even Copilot, as it's practically free if you have enterprise GitHub. I'm probably going to move to Claude Code entirely; I'm paying way too much for Cursor, and I don't really need tab completion anymore... Once Claude Code has a decent computer-use environment, I'll probably cancel my Cursor account. Or I'll just use my own with OpenClaw, but I'm not going to give it any work / personal access, only access to stuff that is publicly available (e.g. run sanity as a regular user). Playing with skills, subagents, agent teams, etc... it's all just markdown and JSON files all the way down...
About our professional future:
I'm not going to start learning to be a plumber / electrician / A/C repair tech, etc., and I am not going to recommend that my children do so either. But I am also not sure I will push them to learn Computer Science, unless they really want to do Computer Science.
What excites me the most right now is my experiments with OpenClaw / NanoClaw, I'm just having a blast.
tl;dr most exciting yet terrifying times of my life.
POCC (Plain Old Claude Code). Since the 4.5 models, it does 90% of the work. I do a final tinkering and polishing pass for the PR, because by this point it is easier for me to fix the code than to ask the model to fix it for me.
The work: fairly straightforward UI + backend work on a website. We have designers producing Figma and we use the Figma MCP to convert that to web pages.
POCC reduces the time taken to complete the work by at least 50%. The last mile problem exists; it's not a one-shot story-to-PR prompt. There are a few back-and-forths with the model, some direct IDE edits, offline tests, etc. I can see how having subagents/skills/hooks/memory could reduce the manual effort further.
Challenges:
1) AI-first documentation: stories have to be written with greater detail and acceptance criteria.
2) Code reviews: Copilot reviews on GitHub are surprisingly insightful, but waiting on human reviews is still a bottleneck.
3) AI-first thinking: some of the lead devs are still hung up on certain best practices that are not relevant in a world where the machine generates most of the code. There is friction between the code the LLM is good at and the standards expected from an experienced developer. This creates busy work at best, frustration at worst.
4) Anti-AI sentiment: there is a vocal minority who oppose AI for reasons ranging from craftsmanship to capitalism to the global environmental crisis. It is a bit political, and the Slack channels are getting interesting.
5) Prompt engineering: I'm in the EU; when the team is multi-lingual and English is adopted as the language of communication, some members struggle more than others.
6) Losing the will to code: I can't seem to make up my mind whether the tech is like the invention of the calculator or the creation of social media. We don't know its long-term impact on producing developers who can code for a living.
Personally, I love it. I mourn the loss of the 10x engineer, but those 10x guys have already boarded the LLM ship.
Stack is a monolith SaaS dashboard in Vue / Typescript on the frontend, Node.js on the backend, first built in 2019, with something like 5 different frontend state management technologies. Everyone is senior level.
We use Cursor and Opus 4.6 mainly, and are trying to figure out a more agentic process so we can work on multiple tasks in parallel. Right now we are still mainly prompting.
Net negative for the ones who care and still need to work closely with others
Net positive for the ones who don't and/or are lone wolves
Maybe the future is lone wolves working on their thing without a care in the world. Accountable to no one but themselves. Bus factor dialed up to 11.
Answering your questions:
At my job we've been spoon-fed GH Copilot everywhere we can use it. It's been configured to review PRs, make corrections, etc. I'd say it's good enough, but from time to time it will raise false positives on issues. It works fine, but you still need to keep an eye on generated BS.
I've seen coworkers show me amazing stuff done with agentic coding, and I've seen coworkers open up slop PRs with a bunch of garbage generated code, which is kind of annoying, but I'll let it slide...
Stack - .NET, Angular, SQL Server and ofc hosted in Azure.
The team is composed of about 100 engineers (devs, QA, devops, etc.), and from what I can see there are no juniors, which is sad to see if you ask me.
I find the most use from it as a search engine the same way I’d google “x problem stackoverflow”.
When I was first tasked with evaluating it for programming assistance, I thought it was a good “rubber duck” - but my opinion has since changed. I found that if I documented my goals and steps, using it as a rubber duck tended to lead me away from my goals rather than refine them.
Outside of my role they can be a bit more useful and generally impressive when it comes to prompting small proof of concept applications or tools.
My general take on the current state of LLMs for programming in my role is that they are like having a junior engineer that does not learn and has a severe memory disorder.
I do a lot of green field research adjacent work, or work directly with messy code from our researchers. It's been excellent at building small tools from scratch, and for essentially brute forcing undocumented code. I can give it a prompt like "Here is this code we got from research, the docs are 3 months out of date and don't work, keep trying things until you manage to get $THING running".
Even for more production and engineering related tasks I'm finding it speeds up velocity. But my engineering is still closer to greenfield than a lot of people here.
I do however feel less connected to the code, even when reviewing thoroughly, I feel like I internalize things at a high level, rather than knowing every implementation detail off the dome.
The other downside is I get bigger and more frequent code review requests from colleagues. No one is just handing me straight-up slop (yet...).
They simplify discrete tasks. Feature additions, bug fixes, augmenting functionality.
They are incapable of creating good-quality (easily expandable, etc.) architecture or overall design, but that's OK. I write the structs, module layout, etc., and let it work on one thing at a time. In the past few days, I've had it:
- Add a ribbon/cartoon mesh creator
- Fix a logical vs physical pixel error for positioning text and setting window size on devices where the two differed
- Fix a bug with selecting things with the mouse under specific conditions
- Fix the BLE advertisement payload format when integrating with a service
- Input tax documents for stock sales, from the PDF my broker gives me to the CSV format the tax software uses
Overall, great tool! But I think a lot of people are lying about its capabilities. Software engineering has never been more enjoyable.
Python, C++, Docker, ML infra, frontend, robotics software
I have 5 concurrent Claude Code sessions on the same mono repo.
Thank you Anthropic!
My prompts tend to follow the pattern of "I am looking to implement <X>. <Detailed description of what I expect X to do.> Review the code base to find similar examples of how this is currently done, and propose a plan for how to implement this."
These days I'm on Claude Code, and I do that first part in plan mode, though even a few months ago, on earlier, not-as-performant models and tools, I was still finding value with this approach. It's just getting better, as the company is investing in shared skills/tools/plugins/whatever the current terminology is, specific to various use cases within the code base.
I haven't been writing so much code directly, but I do still very much feel that this is my code. My sessions are very interactive -- I ask the agent to explain decisions, question its plans, review the produced code and often revise it. I find it frees me up to spend more time thinking through and having higher level architecture applied instead of spending frustrating hours hunting down more basic "how does this work" information.
I think it might have been an article by Simon Willison that made the case that there is a way to use AI tooling to make you smarter, and a way to make you dumber. Pointing, shooting, and blindly accepting output makes you dumber - it places more distance between you and your code base. Using AI tools to automate away a lot of the toil gives you energy and time to dive deeper into your code base and develop a stronger mental model of how it works - it makes you smarter. I keep in mind that at the end of the day, it's my name on the PR, regardless of how much Claude directly created or edited the files.
This fellow is one of the few mature software engineers I have ever met who is rigorously and consistently productive in a very challenging, mature code base, year in and year out. Or WAS. Yes, this is from *cough* Google *cough* in California.
In the end, I am generally using the search engine to find examples because I am too lazy to look at the source for the library I'm using, but if the choice is between an LLM that fabricates stuff some percentage of the time and just reading the fucking code like I've been doing for decades, I'd rather just take my chances with the search engine. If I'm unable to understand the code I'm reading enough to make it work, it's a good signal that maybe I shouldn't be using it at all since ultimately I'm going to be on the hook to straighten things out if stuff goes sideways.
Ultimately that's what this is all about- writing code is a big part of my career but the thing that has kept me employed is being able to figure out what to do when some code that I assembled (through some combination of experimentation, documentation, or duplication) is not behaving the way I had hoped. If I don't understand my own code chances are I'll have zero intuition about why it's not working correctly, and so the idea of introducing a bunch of random shit thrown together by some service which may or may not be able to explain it to me would be a disservice to my employers who trust me on the basis of my history of being careful.
I also just enjoy figuring shit out on my own.
I'm building out large multi-repo features in a 60 repo microservice system for my day job. The AI is very good at exploring all the repos and creating plans that cut across them to build the new feature or service. I've built out legacy features and also completely new web systems, and also done refactoring. Most things I make involve 6-8 repos. Everything goes through code review and QA. Code being created is not slop. High quality code and passes reviews as such. Any pushback I get goes back in to the docs and next time round those mistakes aren't made.
I did a demo of how I work with AI to the dev team at Math Academy, who were complete skeptics before the call; 2 hours later they were converts.
I just got started using Claude very recently; I had not been in the loop on how much better it got. Now it's obvious that no one will write code by hand. I genuinely fear for my ability to make a living as soon as 2 years from now, if not sooner. I figure the only way is to enter the red queen race and ship some good products. This is the positive I see: if I put 30h/week into something, I have the productivity of 3 people. If it's a weekend project at 10h/week, I now have what used to be a full week of productivity. The economics of developing products solo have vastly changed for the better.
- 1.5x more commits.
- 2x more issues closed.
The commits are real. I'm not doing "vibe coding" or even agentic coding. I'm doing turn-by-turn where I micromanage the LLM, give specific implementation instructions, and then read and run the output before committing the code.
I'm more than happy with 2x issues closed. For my client work it means my wildly optimistic programmer estimates are almost accurate now.
I did have a frustrating period where a client was generating specs using ChatGPT. I was simply honest: "I have no idea what this nonsense means, let's meet to discuss the new requirements." That worked.
I basically don't use AI for coding at all. When I have tried it, it's just half working garbage and trying to describe what I want in natural language is just miserable. It feels like trying to communicate via smoke signals.
I'll be a classical engineer until they fire me and then go do something else. So far, that's working. We've had multiple rounds of large layoffs in the last year and somehow I'm still here.
Error messages were the "slop" of the pre-LLM era. This is where an LLM shines, filling in the gaps where software engineering was neglected.
As for writing code, I don't let it generate anything that I couldn't have written myself, or anything that I can't keep in my brain at once. Otherwise I get really nervous about committing.
The job of a software engineer does and always has relied upon taking responsibility for the quality of one's work. Whether it's auto-complete or a fancier auto-complete, the responsibility should rest on your shoulders.
I have been doing this work long enough to know how to increase human productivity. It’s not bullshit like frameworks or AI. The secret is smaller code and faster executing applications and the kind of people who prefer simple versus easy.
Some senior people who were in the AI pilot, have been using this for a while, and are very into it claim that it can open PRs autonomously with minimal input or supervision (with a ton of MD files and skills, in repos with clear architecture standards). I haven't been able to replicate this yet.
I'm objectively happy to have access to this tool; it feels like a cheat code sometimes. I can research things in the codebase so fast, or update tests and glue code so quickly, that my life is objectively better. If the change is small or a simple bugfix, it can truly do it autonomously, quicker than me. It does make me lazier though; sometimes it's just easier to fire up Claude than to focus and do it by myself.
I'm careful not to overuse it, mostly so as not to hit the monthly cap, so that I can "keep it" in case something urgent or complex comes my way. Also, I still like to do things by hand, just because I still want to learn and maintain my skills. I feel that I'm not learning anything by using Claude; that's a real thing.
In the end I feel it's a powerful tool that is here to stay, and I would be upset if I no longer had access to it; it's very good. I recently subscribed and use it in my free time, just because it's a very fun technology to play with. But it's a tool. I'm paid because I take responsibility for delivering my work on time, working, tested, with code on par with the org's quality standards. Whether I do it by hand or with Claude is irrelevant. If I can do it faster, it likely means I will receive more work to do. Somebody still has to operate Claude, and it's not going to be non-technical people, for sure.
I genuinely think that if anyone still believes today that this technology is only hype or a slop machine, they are in denial or haven't tried to use a recent frontier model with the correct setup (mostly, giving the agent a way to autonomously validate its changes).
On the one hand, past some threshold of criticality/complexity, you can’t push AI code unreviewed; on the other, you can’t relegate your best senior engineers to doing nothing but reviewing code.
It doesn’t just not scale, it makes their lives miserable
So then, what’s the best approach?
I think over time that threshold I mentioned will get higher and higher, but at the moment the ratio of code that needs to be reviewed to reviewers is a little high
I have also done the agentic thing and built a full CLI tool via back-and-forth engagement with Claude and that worked great - I didn't write a single line of code. Because the CLI tool was calling an API, I could ask Claude to run the requests it was generating and adjust based on the result - errors, bad requests etc, and it would fairly rapidly fix and coalesce on a working solution.
After I was done though, I reckon that if instead of this I had just done the work myself I would have had a much smaller, more reliable project. Less error handling, no unit tests, no documentation sure, but it would have worked and worked better - I wouldn't need to iterate off the API responses because I would have started with a better contract-based approach. But all of that would have been hard, would have required more 'slow thinking'. So... I didn't really draw a clean conclusion from the effort.
Continuing to experiment, not giving up on anything yet.
But in the end, the thing that pisses me off was a manager who used it to write tickets. If the product owner doesn't give a shit about the product enough to think and write about what they want, you'll never be successful as a developer.
Otherwise it's pretty cool stuff.
One of the things that has helped the most is all the documentation I wrote inside the repository before I started using AI. It was intended for consumption by other engineers, but I think Cursor has consumed it more than any human. I've even managed to make improvements not by having AI update it, but asking AI "What unanswered questions do you have based on reading the documentation?" It has helped me fill in gaps and add clarity.
Another thing I've gotten a ton of value with is having it author diagrams. I've had it create diagrams with both the mermaid syntax and AWSDAC (Diagram-as-Code). I've always found crafting diagrams a painstaking process. I have it make a first pass by analyzing my code + configuration, then make corrections and adjustments by explaining the changes I want.
In my own PRs, I have been in the habit of posting my Cursor Plan document and Transcript so that others can learn from it. I've also encouraged other team members to do the same.
I feel bad for any teams that are being mandated to use a certain amount of AI. It seems to me that the only way to make it work is by having teams experiment with it and figure out how best to use it given their product and the team's capacity. AI is like a pair of Wile E. Coyote rocket skates. It'll get you somewhere fast, but unless you've cleared the road of debris and pointed yourself in exactly the right direction, you're going to careen off a cliff or into a wall.
The other half: I keep fixing everything from other teams using AI, otherwise I destroy my career.
This thread is very eye-opening on how things are going.
Claude Code is the best CLI tool by a mile.
Even at its best it’s wildly inconsistent from session to session. It does things differently every time. Sometimes I get impressed with how it works; then the next day, doing the exact same thing, it flips out and goes nuts trying to do the same thing in a totally different, unworkable way.
You can capture some of these issues in AGENTS.md files or the like, but there’s an endless future supply of them. And it’s even inconsistent about how it “remembers” things. Sometimes it puts in the project local config, sometimes in my personal overall memory files, sometimes instead of using its internal systems, it asks permission to search my home directory for its memory files.
The best way to use it is for throwaway scripts or examples of how to do something. Or new, small projects where you can get away with never reading the code. For anything larger or more important, its inconsistencies make it a net time loser, imo. Sure, let it write an annoying utility function for you, but don’t just let it loose on your code.
When you do use it for new projects, make it plan out its steps in advance. Provide it with a spec full of explicit usage examples of the functionality you want. It’s very literal, so expect it to overindex on your example cases and treat those as most important. Give it a list of specific libraries or tools you want it to use. Tell it to take your spec and plan out its steps in a separate file. Then tell it to implement those steps. That usually works to allow it to build something medium-complex in an hour or two.
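One way to make those usage examples explicit is doctest-style snippets in the spec, which the model (or you) can also run; everything below is hypothetical, just to show the shape:

    """SPEC (sketch): a slugify(text) utility.

    Usage examples the implementation must satisfy:

    >>> slugify("Hello, World!")
    'hello-world'
    >>> slugify("  lots   of   spaces  ")
    'lots-of-spaces'

    Constraints: standard library only; lowercase output; collapse runs
    of non-alphanumeric characters into single hyphens and strip them
    from the ends.
    """

Since the model overindexes on examples anyway, examples that double as tests put that tendency to work.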
When your context is filling up in a session in a particular project, tell it to review its CLAUDE.md file and make sure it matches the current state of the project. This will help the next session start smoothly.
One of the saddest things I’ve found is when a whole team of colleagues gets obsessed with making Claude figure something out. Once it’s in a bad loop, you need to start over, the context is probably poisoned.
Another thing is auditing and code polishing. I asked Claude to polish a working but still rough browser plugin, consisting of two simple JavaScript files. It took ten iterations and a full day of highly intensive work to get the quality I wanted. I would say the result is good, but I could not go through this process very often without going insane. And I do not want to try it on a more complex project, yet.
So, yes, I am using it. For me it's a tool, knowledge resource, puzzle solver, code reviewer and source for inspiration. It's not a robot to write my code.
And never trust it blindly!
- create unit tests and benchmark tests that required lots of boilerplate and fixtures (see the sketch after this list)
- add CI / CD to a few projects that I didn't have motivation to
- freshen up old projects to modern standards (testing, CI / CD, update deps, migrations/deprecations)
- add monitoring / alerting to 2 projects that I had been neglecting. One was a custom DNS config uptime monitor.
- automated backup tools along with scripts for verifying recovery procedure.
- moderate migrations for deprecated APIs and refactors within cli and REST API services
- auditing GCP project resources for billing and security breaches
- frontend, backend and offline tiers for cloud storage management app
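To give a flavor of the first item, the fixture boilerplate in question looks roughly like this pytest sketch (the names are made up; `benchmark` assumes the pytest-benchmark plugin):

    import pytest

    def normalize(rows):
        # Stand-in for the real unit under test.
        return sorted(set(rows))

    @pytest.fixture
    def sample_rows():
        # The repetitive setup an agent is happy to churn out in bulk.
        return [f"row-{i % 100}" for i in range(1000)]

    def test_normalize_dedupes(sample_rows):
        assert len(normalize(sample_rows)) == 100

    def test_normalize_speed(benchmark, sample_rows):
        benchmark(normalize, sample_rows)  # provided by pytest-benchmark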
https://burakku.com/blog/tired-of-ai-coders/
I think the addendum to that is that I've since left.
I run a small lab that does large data analytics and web products for a couple of large clients. I have 5 developers who I manage directly, I write a lot of code myself, and I interact directly with my clients. I have been a web developer long enough to have written code in ColdFusion, PHP, ASP, ASP.NET, Rails, and Node, and JavaScript from Microsoft FrontPage exports to jQuery, to Backbone, Angular, and React, across a lot of different frameworks. I feel this breadth of watching the internet develop in stages has given me a decent, if imperfect, understanding of many of the tradeoffs that can be made in developing for the web.
My work lately is on an analytics / CMS / data management / GIS platform that is used by a couple of our clients and that we had been developing for a couple of years before any AI was used on it at all. It's a React front end built on react-router-7, which can run as SPA or SSR, plus a Node API server.
I had tried AI coding a couple of times over the past few years, both for small toy projects and in my work, and it felt less productive than writing code by hand - until this January, when I tried Claude Code with Opus 4.5. Since then I have written very few features by hand, although I am often actively writing parts of them, or debugging by hand.
I am maybe in a slightly unique position in that part of my job is coming up with tasks for other developers and making sure their code integrates back; I've been doing this for 10+ years. Personally, my success rate with getting someone to write a new feature that will get used is maybe a bit over 50%, and that is maybe generous? Figuring out what to do next in a project that will create value for users is the hard part of my job, whether I am delegating to developers or to an AI, and that hasn't changed.
That being said, I can move through things significantly faster and more consistently using AI, and get them out to clients for testing to see if they are going to work. It's also been great for tasks which I know my developers will groan at if I assign them. In the last couple of months I've been able to:
- Create a new version of our server that is free from years of cruft of the monorepo API we use across all our projects.
- Implement SQLite compatibility for the server (in addition to the original Postgres support).
- Implement local-first sync from scratch for the project.
- Test out a large number of optimization strategies, not all of which worked out, but which would have taken me so much longer and been so much more onerous that the cost-benefit ratio of engaging with them would not have been worth it.
- Implement tons of small features I would have assigned to someone else but which are now less effort to just have the AI do.
I think the biggest plus, though, is the amount of documentation that has accrued in our repo since starting to use these tools. I find AI is pretty great at summarizing different sections of the code, and with a little bit of conversation I can get it more or less exactly how I want it. This has been hugely useful to me on a number of occasions, and it's something I would have always liked to be doing, but on a small team that is always under pressure to create results for our clients, it didn't cross the immediate threshold of the cost-benefit ratio.
In my own use of AI, I keep the bottleneck at my own understanding of the code; it's important to me that I maintain a thorough understanding of the codebase. I could possibly go faster by giving it a longer leash, but that trade-off doesn't seem wise to me at this point: first because I'm already moving so much faster than I was very recently, and second because it doesn't seem very far from the next bottleneck, which is deciding what is the next useful thing to implement. For the most part, I find the AI has me moving in the right direction almost all the time, but I think this is partly because I am already practiced in communicating to programmers what to implement next and I have a deep understanding of the code base, and also because I spend more than half of my time using AI adding context, plans and documentation to the repo.

I have encouraged my team to use these tools, but I am not forcing them down anyone's throat, although it's interesting to give people tasks that I am confident I could finish much quicker and much more to my personal taste than assigning them. The reactions from my team are pretty mixed: one of the strongest contributors doesn't find a lot of gains from it, one has found similar productivity gains to mine, and others are very against it and hate it.
I think one of the things this will change for me is that I can no longer just create the stories for everyone. Learning how to choose what to work on is going to be the most important skill, in my opinion, so over the next couple of months I am going to shift so that everyone on my team has direct client interactions, and I am going to try to move away from writing stories toward having meetings where I help them decide on their own what to work on. Part of the reason I can afford to do this is that I can now get as much or more work done than I was able to with my whole team at this time last year.
That's a big difference in one way, and I am optimistic that the platform I am working on will be a lot better and able to compete with large legacy platforms it wouldn't have been able to compete with in the past. But still, it just tightens the loop of trying new things and getting feedback, and the hardest part of the business is still communicating with clients and building relationships that create value.
At work we use one of the less popular solutions, available both as a plugin for vscode and as a claude code-like terminal tool. The code I work on is mostly Golang and there's some older C++ using a lot of custom libraries. For Golang, the AI is doing pretty good, especially on simple tasks like implementing some REST API, so I would estimate the upper boundary of the productivity gain to be maybe 3x for the trivial code.
Since I'm still responsible for the result, I cannot just YOLO and commit the code, so whenever I get to work on simple things, I'm effectively becoming a code reviewer for the majority of time. That is what probably prevents me from going above 3x productivity; after each code review session I still need a break so I go get coffee or something, so it's still much faster than writing all the code manually, but the mental load is also higher which requires more breaks.
One nontrivial consequence is that the expectations are adapting to the new performance, so it's not like we are getting more free time because we are producing the code faster. Not at all.
For the C++ codebase though, in the rare cases when I need to change something there, it's pretty much business as usual; I won't trust the code it generates, and would rather write what I need manually.
Now, for personal projects, it's a completely different story. For the past few months or so, I haven't written any code for my personal projects manually, except for maybe a few trivial changes. I don't review the generated code either, just making sure that it works as I expect. Since I'm probably too lazy to configure a proper multi-agent workflow, what I found works great for me is: first ask Claude for a plan, then copy-paste the plan to Codex, feed its feedback back to Claude, and repeat until they agree; this process also helps me stay in the loop. Then, when Claude implements the plan and makes a commit, I copy-paste the commit SHA to Codex and ask it to review, and it very often finds real issues that I probably would've missed.
It's hard to estimate the productivity gain of this new process mostly because the majority of the projects I worked on these past few months I would've never started without Claude. But for those I would've started, I think I'm somewhere near 4-5x compared to manually writing the code.
One important point here is that, both at work and at home, it's never a "single prompt" result. I think about the high level design and have an understanding of how things will work before I start talking to the agent. I don't think the current state of technology allows developing things in one shot, and I'm not sure this will change soon.
My overall attitude towards AI code generation is quite positive so far: I think, for me, the joy of having something working so soon, and the fact that it follows my design, outweighs the fact that I did not actually write the code.
One very real consequence is that I miss writing code by hand. I've started going back through older Advent of Code years where I still have some unsolved days, and even solving some LeetCode problems (only interesting ones!), just for the feeling of writing code the way we all did before.
What I've tried so far: - Continue.dev (kind of broken regardless of models)
- Aider (unpleasant for me to use, too much busywork)
- GitHub Copilot (tbh nice plugins and generous quotas, plus the only autocomplete I've tried that's actually good)
- JetBrains AI and Junie (bundled, since I already pay for their IDEs), also nice but the quotas are quite limiting
- local models with Ollama or llama.cpp - conceptually cool but always really limited
- OpenRouter for cloud models - ended up being kind of expensive, and in the end I didn't need the variety of models that much
- Cerebras Code - really generous token limits and amazing speed, but more downtime lately and not as stable, and I realized I need SOTA models
- OpenCode - honestly pretty good
- Codex - also pretty good
Right now: - the Anthropic Max subscription (100 USD a month) has pretty much replaced everything else
- Claude Code, both the CLI and the GUI version: good enough, and it doesn't have *as many* file path issues as OpenCode (e.g. on Windows)
- still using Docker containers for builds, but also running it directly on my system because I'm lazy and stupid, no claws of any sort though
Overall thoughts on development: - even the good models will create untold amounts of slop, unless controlled
- that's why I'm creating a tool called ProjectLint for my own needs, where you write rules in ECMAScript (Go + goja) for what a project requires: stack-agnostic rules about code structure, architecture, utilities that must or must not be used, file lengths and where files belong. In practice that tbh ends up being a shitload of regexes instead of ASTs, but that's good enough - the output is consistent, with a suggestion of what to do for each error; LLMs love that shit
- other than that, Opus 4.6 for everything currently: really nice tool use, good web search for referencing things (no documentation MCPs yet, to keep it light), digging into node_modules or other source code to figure out what's up, and often MULTIPLE parallel code review agents, since just one often isn't enough
- also, you really, really need tests and the ability to stand up a local environment - I used to hate projects that don't have these; now I hate them with an even more burning passion
- I've done more in a few weeks than people usually do in a month - not 10x, but definitely an improvement for any work that has friction in it (I probably have unmedicated ADHD, tbh). That said, the context switching will absolutely burn people out, the ability to write code will atrophy as you get more work thrown at you and outsource more and more of the development to these tools, and if they hike the platform prices, that's going to be painful too
- In plain words: for a while it's going to be great, but long term we're cooked. It's also interesting that if you try to use these tools without a modicum of actual engineering in how you approach them, you will often still get shit results long term, even with good models.

It takes a bit of hand-holding and multiple loops to get things right sometimes, but even with that, it's pretty damn good. I don't usually walk away from it; I actively monitor what it's doing, peek in on the sub-agents, and interject when it goes down a wrong path or writes messy code. But more often than not, it goes like this:
- Point at a GH issue or briefly describe the task
- Either ask it to come up with a plan, or just go straight to implementation
- When done, run *multiple* code review loops with several dedicated code review agents: one for idiomatic Rails code, one for maintainability, one for security, and others as needed
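For reference, each of these reviewers is just a Claude Code sub-agent, i.e. a markdown file with YAML frontmatter under .claude/agents/. A hypothetical maintainability reviewer might look roughly like this (the name, tool list and prompt are made up, and the exact frontmatter fields may vary by version):

    ---
    name: maintainability-reviewer
    description: Reviews completed changes for long methods, duplication, and unclear naming. Use after implementation is done.
    tools: Read, Grep, Glob
    ---
    You are a strict maintainability reviewer. Read the changed files and
    flag methods that are too long, logic that is duplicated, and names
    that don't reveal intent. Suggest concrete refactorings; do not
    rewrite the code yourself.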
These review loops are essential; they clean the code up into something coherent most of the time. It really mirrors how I tend to approach tasks myself: write something quickly that works, make it robust by adding tests, then make it maintainable by refactoring. Just way faster.

I've been using this approach on a side project, and even though it's only nights and weekends, it's probably the most robust, well-tested and polished solo project I've ever built. All those little nice-to-have and good-to-great things that normally fall by the wayside when you only have nights and weekends - all included now.
And the funny thing is, I feel that coding with AI like this gets me into the zone more than hand-coding does. I suspect it's the absence of all those pesky rabbit holes that any non-trivial code base and toolchain throws up, which can so easily pull us from thinking about the problem domain into solving the problems of our tools. Claude deals with all of that almost as a side effect. So while it does its thing, I read through its self-talk while thinking along about the task at hand, intervening if I disagree, but I stay at the higher level of abstraction, more or less. Only when the task is basically done do I dive a level deeper into code organisation, maintainability, security, edge cases, and so on.
Needless to say that very good test coverage is essential to this approach.
Now, I'm very ambivalent about the AI bubble. I believe very firmly that it is one, but for coding specifically it's a paradigm shift, and I hope it's here to stay.
Internally, we have a closed beta for what is basically a hosted Claude Code harness. It's ideal for scheduled jobs or async jobs that benefit from large amounts of context.
At a glance, it seems similar to Uber's Minion concept, although we weren't aware of that until recently. I think a lot of people have converged on the same thing.
Having scheduled roundups of things (what did I post in Slack? what did I PR on GitHub? etc.) is a nice quality-of-life improvement. I also have some daily tasks like "Find a subtle piece of cloud spend that would otherwise go unnoticed", "Investigate an unresolved hotfix from one repo and provide the backstory", and "Find a CI pipeline that has been failing 10 times in a row and suggest a fix".
I work in the platform space, so your mileage may vary, of course. More interesting to me are the second-order effects beyond my own experience:
- Hints of engineering-adjacent roles (i.e. technical support) now feeling empowered to generate large PRs implementing unscoped, ill-defined new internal services, because they don't have the background to know what's "good" or "bad". These sorts of people have always existed, on the edge of technical-adjacent roles, aspiring to become fully fledged developers without an internal support mechanism, but now the barrier is a little lower.
- PR review fatigue: as a Platform Engineer I already get tagged on acres of PRs, but the velocity of PRs has increased, so my inbox is flooded with already-merged PRs (not that it was ever a good signal anyway).
- First hints of technical folk who progressed off the tools now being encouraged to fix those long-standing issues that are simple in their minds, even though reality has shifted a lot around them since. LLMs are generally pretty good at surfacing this once they check how things actually are, but an LLM doesn't "know" what your mental model is when you frame a question.
- Coworkers defaulting to asking LLMs about niche queries instead of asking others. For a few queries I've seen, the answer from the LLM is fine, but it lacks the historical part that makes many things make sense. As an example off the top of my head: websites often have subdomains not for any good present-day reason, but because back in the day you could only have something like 6 XHR connections to a domain, or whatever it was. An LLM probably isn't going to surface that sort of context, which takes a topic from "was this person just a complexity lover?" to "ah, they were working around the constraints of the time".
- Obviously security is a forever battle. I think we're more security minded than most but the reality is that I don't think any of this can be 100% secure as long as it has internet access in any form, even "read only".
- A temptation to churn out side quests. When I first got started I tended to do work after hours, but I've definitely trailed off and am back to normal now. Personally I like shipping stuff more than programming for its own sake, but even then, I think you eventually normalise and the new "speed" starts to feel slow again.
- Privileged users generating and self-merging PRs. We have one project where almost everyone has force-merge rights, and because it's internal-only we've allowed that, paired with automated PR reviews. It works fairly well because we discuss most changes in person before actioning them, but there are now a couple of longstanding users with that same permission contributing from other timezones. Waking up to a changed mental model that hasn't been discussed definitely won't scale, and we're going to need to lock this down.
- Signal degradation in PRs: I've seen a few PRs that provide a whole post-hoc rationalisation of what the PR does and what the problem is. Then you go to the source input and it's someone writing something like "X isn't working? Can you fix it?". As a result, it's really hard to infer intent and capability from a PR. Often the changes are even quite good, but that's not a reflection of the author. To be fair, the alternative might have been that internal user giving up and never communicating that there was an issue, so I can't say this is strictly a negative.
All of the above are things that are actively discussed internally, even when they're not immediately obvious, so I think we're quite healthy in that sense. This stuff is bound to happen regardless; I'm sure most orgs will just paper over it or simply have no mechanism to identify it. I can only imagine what fresh hells exist in Silicon Valley, where I don't think most people are equipped to be good stewards or even to consider basic ethics.
Overall, I'm not really negative or positive. There is definitely value to be found, but I think there will probably be a reckoning: LLMs have temporarily granted a hall pass to go faster than the support structures can keep up with. Revoking it probably looks like going from kicking off work straight from a prompt back to moving tasks into ticket trackers, doing pre-work to figure out the scope of the problem, and so on. Again, Platform BAU has entirely different constraints and concerns than product work.
Actually, I should probably rephrase that a little: I'm mostly positive on pure inference and mostly negative on training costs and the other societal impacts. I don't believe we'll get to everyone running Gas Town/The Wasteland, nor do I think we should aspire to. I like iterating with an agent back and forth locally, and I think heavily automating stuff with no oversight is bound to fail, in the same way that large corporations get bloated and collapse under their own weight.
Honestly, the question may have been aimed more at the programming (generating lines) side, but I've always described programming as a lot like cleaning. You enter the room, figure out the nature of the mess (the interesting part) and come up with your strategy for tackling it, then spend ages sweeping and mopping, which is largely boring. Now you don't have to bother. Thanks, LLMs.
For me personally: beautifully. It saves me a ton of time. Keep in mind, however, that I am an old fart who started out as a physicist, began programming by keying in machine code, and designed electronics to support research, before switching to programming completely after moving to Canada. So I understand how everything works from the very bottom up, and I can tell the good stuff from the bullshit in what I get from AI.
I have no idea, however, how youngsters will train their brains when they don't have any solid foundations. I think there is a danger of collapse, with people having answers to all the questions but zero ability to validate them, and the AI itself degenerating into noise as it increasingly trains on its own output.
Or the AI will eventually have intelligence, motivation and agency.
Either way we are fucked.
Sadly, though, my manager uses Claude for EVERYTHING and is completely careless with it: hallucinations in performance reviews, hallucinations in documentation, trash-tier PRs. He's so gung-ho that some of my peers are now also submitting Claude-written PRs that they haven't even bothered to check build correctly.
So the social aspect is very bad. I'm stuck correcting other people's slop and reading nonsense PRs a few times a week.
Where it consistently fails: anything involving the interaction between systems. If a bug spans a queue producer and its consumer, or the fix requires understanding how a frontend state change propagates through API calls to cache invalidation, the model gives you a confident answer that addresses one layer and quietly ignores the rest. You end up debugging its fix instead of the original issue.
My stack: Claude Code (Opus) for investigation and bug triage in a ~60k LOC codebase, Cursor for greenfield work. Dropped autocomplete entirely after a month - it interrupted my thinking more than it helped.