For reasons which it would take a while to unpack, it is often the case that the best (or sometimes only) way to find out what programming actually needs to be done is to program something that's not it, and then replace it. This may need to be done multiple times. Programming is only occasionally the final product; much more often it is the means of working through what is actually needed. This is very difficult for the people who ask for the software to understand, and quite often very difficult for the people doing the programming to understand too.
Most of what is being done, during programming, is working through the problem space in a way which will make it more obvious what your mistakes are, in your understanding of the problem and what a solution would look like. Once you have arrived at that understanding, then there are a variety of ways to make what you need, but that is not the rate-limiting step.
Then I'd wager it's the same for the courses and workshops this guy is selling... an LLM can probably give me at least 75% of the financial insights for not even 0.1% of what this "agile coach" is asking for his workshops and courses.
Maybe the "agile coach LLM" can explain to the "coding LLMs" why they're too expensive, and then the "coding LLMs" can tell the "agile coach LLM" to take the next standby shift then, if he knows so much about code?
And then we actual humans can have a day off and relax at the pool.
People who say that haven't used today's agents enough or haven't looked closely at what they produce. The code they write isn't messy at all. It's more like asking the agent to build a building from floorplans and spec, and it produces everything in the right measurements and right colours and passes all tests. Except then you find out that the walls and beams are made of foam and the art is load-bearing. The entire construction is just wrong, hidden behind a nice exterior. And when you need to add a couple more floors, the agents can't "get through it" and neither can people. The codebase is bricked.
Today's agents are simply not capable enough - without very close and labour-intensive human supervision - to produce code that can last through evolution over any substantial period of time.
I’ve been on 2 failed projects that were entirely AI generated, and it’s not that agents slow down and you can just send more agents to work on the project for longer; it’s that they become completely unable to make any progress whatsoever, and whatever progress they do make is wrong.
The article is definitely written from a "high tech" industry lens. A mid-sized utility might spend $80-$150 million USD on IT capital projects in a year, but $2B on power pole maintenance. Utilities are a strong example, but any large enterprise manufacturing company is spending more on factory upgrades than on programming.
> [...] built a functional replica of approximately 95% of Slack’s core product in fourteen days using LLM agents.
IT and Finance leadership at asset-heavy companies are currently trying to wrap their heads around the economics of their 100+ SaaS contracts, and whether those still make sense with LLM-powered developers. Can they hire developers in house to build the fraction of each tool they actually use, and save on total cost and OpEx?
I work with these companies a lot, and won't weigh in on the right decision. Bottom line: "it depends" on many factors, some of which are not immediately obvious. The article still holds weight regardless of industry, but there is some nuance (talent availability, internal change cost, etc.) that also has to be considered.
The copy doesn’t even remotely come close to what the actual Slack software does in terms of scale, reliability, observability, monitorability, maintainability and, I'm pretty sure, also functionality.
The author only writes about the non-dev work as the difference, which makes it seem like he doesn’t know what he’s talking about at all, or what running an application at that scale actually means.
This "clone" doesn’t get you any closer to an actual Slack copy than a white piece of paper.
I suspect this is most apparent in things like meeting culture. Something happens and all of a sudden there is another recurring meeting on the calendar, with 15 attendees, costing x dollars in wages, that produces no value for the customers because the lesson was already learned.
Or when reacting to an incident of some sort, it's so easy to have a long list of action items that may theoretically improve the situation, but in reality are incredibly expensive for the value they produce (or the risks they reduce). It's too easy to say, we'll totally redesign the system to avoid said problem. And what worries me, is often those very expansive actions, then cause you to overlook realistic but small investments that move the needle more than you would think.
And as a hot topic I also think the costs are an input into taking on tech debt. I know we all hate tech debt with a passion, but honestly, I think of it as a tool that can be wielded responsibly or irresponsibly. But if we don't know what our attention costs, we're going to have difficulty making the responsible choices about when and where to take on this debt. And then if we're not conscious about the debt, when it comes due it stings so much harder to pay down.
Of late, I've come across a lot of ideas from Rory Sutherland, and my conclusion from listening to him is that there are some people who're obsessed with numbers because to them it's a way to find certainty and win arguments. He calls them "Finance People" (him being a Marketing one). Here's an example:
"Finance people don’t really want to make the company money over time. They just thrive on certainty and predictability. They try to make the world resemble their fantasy of perfect certainty, perfect quantification, perfect measurement.
Here’s the problem. A cost is really quantifiable and really visible. And if you cut a cost, it delivers predictable gains almost instantaneously."
> Choosing to spend three weeks on a feature that serves 2% of users is a €60,000 decision.
I'd really want to hire the Oracle of a PM/Analyst who can give me that 2% accurately even 75% of the time, and promise nothing non-linear can come from an exercise.
Unfortunately, even with all the management techniques in the world, there are just some projects that are impossible to care about. There’s simply a significantly lower cap on productivity on these projects.
Consider a team of eight engineers whose mission is to build and maintain an internal developer platform serving one hundred other engineers. This is a common organizational structure, and it is one where the financial logic is rarely examined carefully.
The team costs €87,000 per month. To justify that cost, the platform they build needs to generate at least €87,000 per month in value for the engineers who use it. The most direct way to measure that value is through time saved, since the platform’s purpose is to make other engineers more productive.
At a cost of €130,000 per year, one engineer costs approximately €10,800 per month, or around €65 per working hour. For the platform team to break even, their platform needs to save the hundred engineers they serve a combined total of 1,340 hours per month. That is 13.4 hours per engineer per month, or roughly three hours per week per person.
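The arithmetic in the quoted passage can be reproduced with a short sketch. The salary, team size and headcount come from the quote; `hours_per_month=167` is an assumption (roughly 40 hours a week over 50 weeks), chosen because it lands near the quoted €65 per hour. The function name is made up for illustration.

```python
def platform_break_even(engineer_cost_per_year=130_000, team_size=8,
                        engineers_served=100, hours_per_month=167):
    """Reproduce the break-even arithmetic from the quoted passage.

    hours_per_month is an assumption, not a figure from the article.
    """
    monthly_cost = engineer_cost_per_year / 12      # about EUR 10,800
    hourly_cost = monthly_cost / hours_per_month    # about EUR 65
    team_cost = team_size * monthly_cost            # about EUR 87,000
    hours_needed = team_cost / hourly_cost          # hours to save per month
    return {
        "hourly_cost": hourly_cost,
        "team_cost_per_month": team_cost,
        "hours_needed_per_month": hours_needed,
        "hours_per_engineer_per_month": hours_needed / engineers_served,
    }


result = platform_break_even()
for name, value in result.items():
    print(f"{name}: {value:,.1f}")
```

Note that the salary cancels out: under these assumptions the break-even is simply `team_size * hours_per_month`, i.e. the platform team has to save as many hours as it works. That only holds if every hour is worth the same, regardless of who spends it or on what.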
There's a fungibility assumption which is pervasive here. In most cases, a platform team is not there "to save time". It's there to deal with cross-cutting concerns that would not only be time consuming but could be business threatening, and in some cases you keep more expensive engineers there to ensure that certain critical things are done right.
Too much snake oil for my taste.
It's like min-maxing a Diablo build where you want the quality of the product to be _just_ above the "acceptable" threshold but no higher because that's wasting money. Then, you're free to use all remaining points to spec into revenue.
The direct and indirect financial impacts of technical decisions are indeed hard to measure. But some technical decisions definitely have greater financial impact than others. Even if it's hard to precisely quantify the financial costs/benefits of every decision, it is possible to order them relatively: X is likely to make more money than Y, so we do X first and Y later.
There is a significant amount of chance involved in whether a product/feature will even make money at all. So even good plans with measurably positive expected value could end up losing money.
Just because it's impossible to be 100% certain of the outcome of any decision doesn't mean we should throw the baby out with the bathwater.
I do like this bit though:
> A large codebase also carries maintenance costs that grow over time as the system becomes more complex, more interconnected, and more difficult to change safely. Every engineer added to maintain it increases coordination costs, introduces new dependencies, and adds to the organizational weight that slows decision-making. The asset and the liability exist simultaneously, and for most of the past twenty years, the financial environment masked the liability side of that equation.
And the insight that LLMs are exposing this reality is absolutely true. The funny thing is they are exposing it by accelerating both good and bad engineering practices. Teams with good engineering judgment will move faster than ever with fewer people, and teams with bad engineering judgment will bury themselves in technical debt so fast the wheels will come off.
For me, running an engineering org is primarily about talent acquisition and empowering those ICs with judgment to move quickly. How well systems and teams scale depends on the domain, product, and how it allows you to decouple things. With the right talent and empowerment there are often creative ways to make product and system tradeoffs and iterate quickly to change the shape of ROI. Any mapping to financial metrics is a hugely lossy operation that can't account for such changes. It might work in mature companies that are ossified and in the second half of their lifecycle, but in growing companies I think it's fundamentally misguided and would amount to empowering the wrong people.
There totally is such a cohort. There are plenty of bootstrapped companies or startups that took only an angel round and did not benefit from the low rate environment, in fact they suffered because of the very high price of SWE labor. But those engineering managers exist and are out there right now still building efficiently, quietly growing, passionately serving customers, and keeping a close eye on the bottom line and risks because that’s their livelihood.
More commonly so in larger organizations than in smaller ones.
Here's the problem I see with where this particular article is heading though: the context of these projects is often highly technical, connecting back to the human problem space. Developers sit on the technical end but they also usually have a mental model for how it connects back to the non-technical. A product manager is another addition to compensate for the user connection. Between all of these folks they can only hold so much in their heads about the problem space on a day-to-day basis. And that headspace for the problem is what is critical. Management wants to try a new idea for sales? They need to take it to the team with that problem space to translate it into working code. Even with the assistance of agents, one needs to hold the important patterns in their head. And my company certainly isn't going to vibe code its way through anything regulatory; mistakes there might cost us a ton in fees and bad PR. Hell, I've seen product managers sweat over the possibility of getting a few 1 star reviews on the app store.
Anyway, you still need people with context to break things down and get them out the door; the agents can just assist with the speed of the In Progress stage. And clever teams can figure out how to automate their validation (but they could already do that).
Rockstar developers often seem to be the ones who can parachute in, gain context, make changes, and leave to find another problem space. They get bogged down when they've visited 10 or more problem spaces and then they start getting called back into service. Again the agents don't change any of that, the human involved has a finite capacity for context.
Teams who structure around maintaining context might be best suited for the new world of code.
If your company runs well, it won't hurt you much that you're not doing this. Otherwise this will be your end. And that really hurts, because you lose the economic impact of the product and the jobs.
The argument to always go for the biggest return works OK for the first few years of high growth (though the timeline is probably greatly compressed the more you use AI), but it turns into a kind of quicksand later.
How could they not? When I penciled this out ~18 years ago, I included the amortized cost of all the interviews it took to hire a given engineer as well. It's not rocket surgery, as they say.
Money can be exchanged for goods and services.
Not sure. Because it totally depends on what they do instead. Are they utilizing two hours more every week now doing meaningful work? Or are they just taking things a bit more easy? Very hard to determine and it just makes it harder to reason about the costs and wins in these cases.
Why don't we instead focus our energies on the customer and then work our way backward into the technology? There are a lot of ways to solve problems these days. But first you want to make sure you are solving the right problem. Whether or not your solution represents a "liability" or an "asset" is irrelevant if the customer doesn't even care about it.
Maybe there’s some new paradigm that makes this true. But it doesn’t seem obviously true to me.
Humans make the best code long term when everything orbits a vision of the underlying problem space.
LLMs seem to only consider the deeper problem space when I explicitly flag it for them, otherwise they write “good enough for this situation” type code. And that stack of patches type code is exactly how the code becomes messy and complicated in the first place.
If you've ever been in a meeting with multiple L8s arguing over features, you should be able to estimate how much each hour of that meeting is costing the org.
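A rough sketch of that estimate: sum the attendees' hourly rates and multiply by the duration. The rates and the 50 occurrences a year are placeholders, and `meeting_cost` is a made-up helper; it counts wages only, not the context-switching around the meeting.

```python
def meeting_cost(hourly_rates, duration_hours, occurrences_per_year=1):
    """Wage cost of a meeting: sum of attendee rates times duration.

    hourly_rates holds one loaded hourly rate per attendee
    (hypothetical numbers, not real salary data).
    """
    return sum(hourly_rates) * duration_hours * occurrences_per_year


# Assumed example: 15 attendees at EUR 65/hour, one hour, 50 times a year.
weekly_sync = meeting_cost([65] * 15, duration_hours=1, occurrences_per_year=50)
print(f"Recurring meeting: EUR {weekly_sync:,} per year")
```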
Everyone wanted to copy Meta/Google/Oracle and have internal teams, and to me internal teams have been accountability vacuums.
People want an internal team so they can go "well if we had better tooling!" when instead they should make the best of what they have.
The LLM-agent team argument also misses the core point: the engineering investment (which actually encompasses business decisions, design and much more than just programming) is what got Slack (or any other software product) to where it is now and where it's going in the future. Creating a snapshot of the current status is, while maybe not absolutely trivial, still just a tiny fraction of the progress made over the years.
Whereas WhatsApp with its 30 software engineers was the exception etc.
A chat with friends showed how there are parallels between how LLMs will play out in the short-term future - say the next 5 years - and the whole MapReduce mess. Back when Hadoop came along you built operators and these operators communicated through disk. It took years even after Spark was about for the Hadoop userbase as a whole to realise that it is orders of magnitude more efficient to only communicate through disk when two operators are not colocatable on the same machine, and that most operators in most pipelines can be fused together.
So for a while LLMs will be in the Hadoop phase where they are acting like junior devs and making more islands that communicate in bigger bloated codebases, and then there might be a realisation in about 2030 that actually the LLMs could have been used to clean up and streamline and fuse software and approach the WhatsApp style of business impact.
"Most organizations improperly account for engineering teams and incorrectly consider both code and team growth to be assets when in fact they increase complexity..... but LLMs can fix all of this"
Wtf?
Measuring things that actually matter is a great way to improve clarity on a team, you can probably just stop reading this article at the halfway point.
EDIT:
Specifically this paragraph is insane
"The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves."
We do proxy measurements because having exact data is hard because there is more to any feature than just code.
A feature is not only code, it is also customer training and marketing. A feature might be perfectly viable from a code perspective but then utterly fail in adoption for reasons beyond the Product Owner's control.
From what I saw in the comments, the author is selling his consultancy/coaching, and people who have any real world experience are also not buying it.
I do agree with his thesis in the middle, about how the ZIRP decade and the cultures that were born from that period were outrageous and cannot survive the current era. It's a brave new world, and it's not because of AI. It's because there's just not enough money flowing anymore, and what little is left is sucked up by the big boys (AI).
What experience is this guy basing this on? My guess is absolutely none at all.
Maybe this will be the case in the future, but as of right now if I cut 10 agents loose for 10 days on one of our repos at work and tell them to clean it up but keep the tests passing, we’d be drowning in support tickets.
Tests don’t cover all observable behavior. Every single production bug we’ve had made it through the test suite.
Also this guy only had a vague idea of how platform engineering teams work in large organizations.
Platform teams are the engineering org’s immune system. They’re how we fight back against the tech debt accumulated by the relentless march of features of the week.
If anything, the extra code people are cranking out with AI makes them more necessary.
Cost of delay: calculating the cost of delaying by a few weeks in terms of lost revenue (you aren't shipping whatever it is you are building), total life value of the product (your feature won't be delivering value forever), extra cost in staffing. You can slap a number on it. It doesn't have to be a very accurate number. But it will give you a handle on being mindful that you are delaying the moment where revenue is made and taking on team cost at the cost of other stuff on your backlog.
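A back-of-the-envelope version of that calculation, as a sketch only. It assumes the feature stops earning at a fixed point (so every week of delay is a week of revenue lost for good) and that the team keeps costing money while the delay lasts. All parameter names and figures are invented for illustration.

```python
def cost_of_delay(delay_weeks, weekly_revenue, weekly_team_cost,
                  remaining_life_weeks):
    """Rough cost of shipping `delay_weeks` later than planned.

    Assumes the feature stops earning at a fixed end date, so delay
    shortens its earning life rather than shifting it.
    """
    # Revenue can only be lost for weeks the feature would have been live.
    lost_weeks = min(delay_weeks, remaining_life_weeks)
    lost_revenue = lost_weeks * weekly_revenue
    # The team is still being paid during the delay.
    extra_staffing = delay_weeks * weekly_team_cost
    return lost_revenue + extra_staffing


# Hypothetical numbers: 3 weeks late, EUR 10k/week revenue, EUR 20k/week team.
print(cost_of_delay(3, 10_000, 20_000, remaining_life_weeks=104))
```

The number does not have to be accurate to be useful; comparing it across backlog items is what matters.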
Option value: calculating the payoff for some feature you add to your software as having a non linear payoff. It costs you n when it doesn't work out and might deliver 10*n in value if it does. Lean 1.0 would have you stay focused and toss out the option for that potential 10x payoff. But if you do a bit of math, there probably is a lot of low hanging fruit that you might want to think about picking because it has a low cost and a potential high payoff. In the same way variability is a good thing because it gives you the option to do something with it later. A little bit of overengineering can buy you a lot of option value. Whereas having tunnel vision and only doing what was asked might opt you out of all that extra value.
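The "bit of math" can be sketched as a plain expected-value calculation. This is a simplification, assuming a single all-or-nothing outcome: you always pay the cost, and you collect the payoff with some guessed probability. The function names and numbers are illustrative.

```python
def expected_value(cost, payoff, p_success):
    """Expected net value of a bet: pay `cost`, win `payoff` with p_success."""
    return p_success * payoff - cost


def break_even_probability(cost, payoff):
    """Minimum success probability at which the bet stops losing money."""
    return cost / payoff


n = 10_000  # hypothetical cost of the small extra feature
print(expected_value(cost=n, payoff=10 * n, p_success=0.2))
print(break_even_probability(cost=n, payoff=10 * n))
```

With a 10x payoff, the bet is worth taking at anything above a one-in-ten chance of success, which is why cheap options with a large upside are easy to undervalue.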
A bad estimation is better than no estimation: even if you are off by 3x, at least you'll have a number and you can learn and adapt over time. Getting wildly varying estimates from different people means you have very different ideas about what is being estimated. Do your estimates in time. Because that allows you to slap a dollar value on that time and do some cost calculations. How many product owners do you know that actually do that or even know how to do that?
Don't run teams at 100% capacity. Work piles up in queues and causes delays when teams are pushed hard. The more work you pile on the worse it gets. Worse, teams start cutting corners and take on technical debt in order to clear the queue faster. Any manufacturing plant manager knows not to plan for more than 90% capacity. It doesn't work. You just end up with a lot of unfinished work blocking other work. Most software managers will happily go to 110%. This causes more issues than it solves. Whenever you hear some manager talking about crunch time, they've messed up their planning.
Stretching a team like that will just cause cycle times to increase. Also, see cost of delay. Queues aren't actually free. If you have a lot of work in progress with interdependencies, any issues will cause your plans to derail and cause costly delays. It's actually very risky if you think about it like that. If you've ever been on a team that seemingly doesn't get anything done anymore, this might be what is going on.
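One way to see why cycle times blow up is the textbook single-server queue (M/M/1), where the average time an item spends in the system is the bare work time divided by (1 - utilization). Real teams are not M/M/1 queues, so treat this as an illustration of the shape of the curve, not a prediction.

```python
def relative_cycle_time(utilization):
    """Average cycle time as a multiple of the bare work time (M/M/1 model).

    utilization is the fraction of capacity that is planned, 0 <= u < 1.
    At or above 1.0 the queue never drains, so the model has no answer.
    """
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1); the queue grows "
                         "without bound at 100% or more")
    return 1 / (1 - utilization)


for u in (0.5, 0.8, 0.9, 0.95):
    print(f"{u:.0%} utilization -> {relative_cycle_time(u):.1f}x cycle time")
```

In this model, going from 80% to 90% planned capacity doubles the average wait, and planning at 110% has no steady state at all.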
I like this back of the envelope math; it's hard to argue with.
I used to be a salaried software engineer in a big multinational. None of us had any notion of cost. We were doing stuff that we were paid to do. It probably cost millions. Most decision making did not have $ values on them. I've since been in a few startups. One where we got funded and subsequently ran out of money without ever bringing in meaningful revenue. And another one that I helped bootstrap where I'm getting paid (a little) out of revenue we make. There's a very direct connection between stuff I do and money coming in.
"The obvious objection is that code produced at that speed becomes unmanageable, a liability in itself. That is a reasonable concern, but it largely applies when agents produce code that humans then maintain. Agentic platforms are being iterated upon quickly, and for established patterns and non-business-critical code, which is the majority of what most engineering organizations actually maintain, detailed human familiarity with the codebase matters less than it once did. A messy codebase is still cheaper to send ten agents through than to staff a team around. And even if the agents need ten days to reason through an unfamiliar system, that is still faster and cheaper than most development teams operating today. The liability argument holds in a human-to-human or agent-to-human world. In an agent-to-agent world, it largely dissolves."
LLMs are not conscious, which means that left to their own devices they will drift. I think the single most important issue when working with LLMs is that they write text without a layer that is aware of what's actually being written. That state can be present in humans as well, for example in sleepwalking.
Everyone who's tried to completely vibe code a somewhat larger project knows that you only get to a certain level of complexity before the model stops being able to reason about the code effectively. It starts to guess why something is not working and cannot get out of that state until guided by a human.
That is not a new state in the field. I believe all programmers have at some point in their careers come across code written by developers needing to get past a hard deadline, resulting in a codebase that cannot effectively be modified.
I think a certain subset of programming projects could possibly be vibe coded, in the sense that code can be merged without human understanding. But they have to be very straightforward CRUD apps. In almost everything else you will get stopped by slop.
I suspect that the future of our profession will shift from writing code to reading code and applying continuous judgement on architecture, working together with LLMs. It's also worth keeping in mind that you cannot assign responsibility to an LLM, and most human organizations require that to work.
I keep seeing this assumption that "unmanageable" caps out at "kinda hard to reason about", and anyone with experience in large codebases can tell you that's not so. There are software components I own today which require me to routinely explain to junior engineers (and indeed to my own instances of Claude) why their PR is unsound and I won't let them merge it no matter how many tests they add.
Citation needed. A human engineer can grok a lot in 10 days, and an agent can spend a lot of tokens in 10 days.
I guess his students get to relearn that on their own.
Also, any post that talks about building software and then contains the suggestion that "cost per unit" is an efficiency metric needs to come to the red courtesy phone; Taylorism would like to have a chat about times gone by.
In many companies there are 3 to 5 other people per developer (QA, agile masters, PO, PM, BA, marketing, sales, customer support etc.). The costs aren't driven just by the developer salaries.
A CEO can cost as much as 10 developers, sometimes more.