Turns out we weren't opposed to bad metrics! We were just opposed to being measured! Given the chance to pick our own, we jumped straight to the same nonsense.
I have started using Claude to develop an implementation plan, but instead of making Claude implement it and then have me spend time figuring out what it did, I simply tell it to walk me through implementing it by hand. This means that I actually understand every step of the development process and get to intervene and make different choices at the point of time where it matters. As opposed to the default mode which spits out hundreds of lines of code changes which overloads my brain, this mode of working actually feels like offloading the cognitive burden of keeping track of the implementation plan and letting me focus on both the details and the big picture without losing track of either one. For truly mechanical sub-tasks I can still save time by asking Claude to do them for me.
Unless you don't review every generated line manually, and instead rely on, let's say, UI e2e testing, or perhaps unit testing (that the agents also wrote). I don't know, perhaps we are past the phase of "double check what agents write" and are now in the phase of "ship it. if it breaks, let agents fix it, no manual debugging needed!" ?
I'm so conflicted about this. On the one hand I love the buzz of feeling so productive and working on many different threads. On the other hand my brain gets so fried, and I think this is a big contributor.
the assumption to this workflow is that claude code can complete tasks with little or no oversight.
If the flow looks like review->accept, review->accept, it is manageable.
In my personal experience, claude needs heavy guidance and multiple rounds of feedback before arriving at a mergeable solution (if it does at all).
Interleaving many long running tasks with multiple rounds of feedback does not scale well unfortunately.
I can only remember so much, and at some point I spend more time trying to understand what has been done so far to give accurate feedback than actually giving feedback for the next iteration.
Where I find it incredible - learning new things, I recently started flutter/dart dev - I just ask Claude to tell me about the bits, or explaining things to me, it's truly revolutionary imho, I'm building things in flutter after a week without reading a book or manual. It's like a talking encyclopaedia, or having an expert on tap, do many people use it like this? or am I just out of the loop, I always think of Star Trek when I'm doing it. I architected / designed a new system by asking Claude for alternatives and it gave me an option I'd never considered to a problem, it's amazing for this, after all it's read all the books and manuals in the world, it's just a matter of asking the right questions.
This one's interesting to me. For a lot of my career, the act of writing the PR is the last sanity check that surfaces any weirdness or my own misgivings about my choices. Sometimes there would be code that felt natural when I was writing it and getting the feature working, and maybe that code survived my own personal round of code review... but having to write about it in plain english for the benefit of someone doing review with less context was a useful spot to do some self-reflection.
Mentioning LLM usage as a distinction is like bragging about using a modern compiler instead of writing assembly. Yeah it's faster, but so is everyone else code... Besides, I wouldn't brag about being more productive with LLMS because it's a double edge sword: it's very easy to use them, and nobody is reviewing all the lines of code you are pushing to prod (really, when was the last time you reviewed a PR generated by AI that changed 20+ files and added/removed thousands of lines of code?), so you don't know what's the long game of your changes; they seem to work now but who knows how it will turn out later?
but a chart of commits/contribs is such a lousy metric for productivity.
It's about on par with the ridiculousness of LOC implying code quality.
> I switched the build to SWC, and server restarts dropped to under a second.
What is SWC? The blog assumes I know it. Is it https://swc.rs/ ? or this https://docs.nestjs.com/recipes/swc ?
I've started to use git worktrees to parallelize my work. I spend so much time waiting...why not wait less on 2 things? This is not a solved problem in my setup. I have a hard time managing just two agents and keeping them isolated. But again, I'm the bottleneck. I think I could use 5 agents if my brain were smarter........or if the tools were better.
I am also a PM by day and I'm in Claude Code for PM work almost 90% of my day.
Solving new problems is a thing engineers get to do constantly, whereas building an agent infrastructure is mostly a one-ish time thing. Yes, it evolves, but I worry that once the fun of building an agentic engineering system is done, we’re stuck doing arguably the most tedious job in the SDLC, reviewing code. It’s like if you were a principal researcher who stopped doing research and instead only peer reviewed other people’s papers.
The silver lining is if the feeling of faster progress through these AI tools gives enough satisfaction to replace the missing satisfaction of problem-solving. Different people will derive different levels of contentment from this. For me, it has not been an obvious upgrade in satisfaction. I’m definitely spending less time in flow.
Is that how it works? Do managers claim credit for the work of those below them, despite not doing the work?
I hope they also get penalised when a lowly worker does a bad thing, even if the worker is an LLM silently misinterpreting a vague instruction.
Like thinking about it a pr skill is pretty much an antipattern even telling ai to just create a pr is faster.
I think some vibe coders should let AI teach them some cli tooling
Helped me surface an important distinction on why it doesn't really happen for me. I think there's three parts to it:
1. I work on only one thing at a time, and try to keep chunks meaty
2. I make sure my agents can run a lot longer so every meaty chunk gets the time it deserves, and I'm not babysitting every change in parallel, that would be horrible! (how I do this is what this post focuses on)
3. New small items that keep coming up / bug fixes get their own thread in the middle of the flow when they do come up, so I can fire and forget, come back to it when I have time. This works better for me because I'm not also thinking about these X other bugs that are pending, and I can focus on what I'm currently doing.
What I had to figure out was how to adapt this workflow to my strengths (I love reviewing code and working on one thing at a time, but also get distracted easily). For my trade-offs, it was ideal to offload context to agents whenever a new thing pops up, so I continue focusing on my main task.
The # of PRs might look huge (and they are to me), but I'm focusing on one big chonky thing a day, the others are smaller things, which together mean progress on my product is much faster than it otherwise would be.
This is an honest as someone who is also now doing this.
A colleague has been using Claude for this exact purpose for the past 2-3 months. Left alone, Claude just kept spewing spammy, formulaic, uninteresting summaries. E.g. phrases like "updated migrations" or "updated admin" were frequent occurrences for changes in our Django project. On the other hand, important implementation choices were left undocumented.
Basically, my conclusion was that, for the time being, Claude's summaries aren't worthy for inclusion in our git log. They missed most things that would make the log message useful, and included mostly stuff that Claude could generate on demand at any time. I.e. spam.
Overstating things of course. But paying off technical debt never felt so good. And the expected decrease in forward friction has never been so achievable so quickly.
However, I agree with you that commits are a terrible (or an unreliable) metric; more commits do not necessarily equal higher productivity.
Meanwhile in the real world the expectations shift to normalise the 10x and your boss wants to know why your output isn’t 12x like that of Max
Oh really? I enjoy doing one thing at the time, with focus.
AI, as you're using it OP, isn't make you faster, it is making you work more for the same amount of money. You burn yourself for no reason.
Is that the end game? Well why can’t the agents orchestrate the agents? Agents all the way down?
The whole agent coding scene seems like people selling their soul for very shiny inflatable balloons. Now you have twelve bespoke apps tailored for you that you don’t even care about.
Why do people do this? Why do they outsource something that is meant to have been written by a human, so that another human can actually understand what that first human wanted to do, so why do people outsource that to AI? It just doesn't make sense.
If you have the tokens for it, having a team of agents checking and improving on the work does help a lot and reduces the slop.
What I want from a PR is what's not in the patch, especially the end goal of the PR, or the reasoning for the solution represented by the changes.
> SWC removed the friction of waiting - the dead time between making a change and seeing it.
Not sure how that relates to Claude Code.
> The preview removed the friction of verifying changes - I could quickly see what’s happening.
How Claude is "verifying" UI changes is left very vague in the article.
> The worktree system removed the friction of context-switching - juggling multiple streams of work without them colliding.
Ultimately, there's only one (or two) main branches. All those changes needs to be merged back together again and they needs to be reviewed. Not sure how collisions and conflicts is miraculously solved.
Who are you creating PR descriptions for, exactly? If you consider it "drudgery", how do you think your coworkers will feel having to read pages of generic "AI" text? If reviewing can be considered "drudgery" as well, can we also offload that to "AI"? In which case, why even bother with PRs at all? Why are you still participating in a ceremony that was useful for humans to share knowledge and improve the codebase, when machines don't need any of it?
> My role has changed. I used to derive joy from figuring out a complicated problem, spending hours crafting the perfect UI. [...] What’s become more fun is building the infrastructure that makes the agents effective. Being a manager of a team of ten versus being a solo dev.
Yeah, it's great that you enjoy being a "manager" now. Personally, that is not what I enjoy doing, nor why I joined this industry.
Quick question: do you think your manager role is safe from being automated away? If machines can write code and prose now better than you, couldn't they also manage other machines into producing useful output better than you? So which role is left for you, and would you enjoy doing it if "manager" is not available?
Purely rhetorical, of course, since I don't think the base premise is true, besides the fact that it's ignoring important factors in software development such as quality, reliability, maintainability, etc. This idea that the role of an IC has now shifted into management is amusing. It sounds like a coping mechanism for people to prove that they can still provide value while facing redundancy.
Some says features. Well. Are they used. Are they beneficial in any way for our society or humanity? Or are we junk producing for the sake of producing?