The solution usually isn't "better people." It's engaging people on the same goals and making sure each of them knows how their part fits with the others. It's also recognizing when hard stuff is worth doing. Yeah, you've got a module with 15 years of tech debt that you didn't create, and that no-one on the team is confident touching anymore. Unlike acne, it won't get better if you don't pick at it. Build out what that tech debt is costing the company and the risk it creates. Balance that against other goals, and find a plan that pays it down at the right time and the right speed.
I also think they tend to be the older ones among us who have seen what happens when it all goes wrong, and the stack comes tumbling down, and so want to make sure you don't end up in that position again. This covers all areas of IT, including cyber and DR, not just software.
When I have moved between places, I always try to ensure we have a clear set of guidelines in my initial 90-day plan, but it all comes back to the team.
It's been 50/50: some teams are desperate for any change, and others will do everything possible to destroy what you're trying to do. Or you have a leader above who has no idea and goes with the quickest/cheapest option.
The trick is to work this out VERY quickly!
However, on the subject of things going really wrong: I assume most have followed the UK Post Office saga around the software bug(s) that sent people to prison and led to suicides, etc. https://en.wikipedia.org/wiki/British_Post_Office_scandal
I am pretty sure there would have been a small group (or at least one) of tech people in there who knew all of this and tried to get it fixed, but were blocked at every level. No idea - but suspect.
"But what about using a message queue.."
"Candidate did not use microservices.."
"Lacks knowledge of graph databases.." (you know, because I took a training last week ergo it must be the solution).
* Conway's law causing multiple different data science toolchains, different philosophies on model training, data handling, schema and protocol, data retention policies, etc.
* Coming up with tech solutions to try to mitigate the impact of multiple silos insisting on doing things their own way while also insisting that other silos do it their way because they need to access other silos' data.
And the reason standardization won't happen: the feudal lords of each of those branches of the hierarchy strongly believe their way is the only way that can meet their business/tech needs. As someone who gets to see all of those approaches - most of their approaches are both valid and flawed and often not in the way their leaders think. A few are "it's not going to work" levels of flawed as a result of an architect or leadership lacking operating experience.
So yeah, it might look like technical problems on the surface, but it's really people problems.
"The First Law of Consulting: In spite of what your client may tell you, there’s always a problem.
The Second Law of Consulting: No matter how it looks at first, it’s always a people problem." [0]
Everything he wrote is worth the time to read.
[0] Weinberg, Gerald. "The Secrets of Consulting: A Guide to Giving and Getting Advice Successfully", 1986
The old system assigned work cases out in a plain round robin system - Person 1 got Case 1, Person 2 got Case 2, etc, regardless of what people already had on their plate.
The new system looked at a number of factors and assigned a new case to people who had the least amount of overall work in their queue. So if Person 1 had 2 cases and Person 2 had 10, then Person 1 was getting the next case.
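To make the difference between the two assignment rules concrete, here is a minimal sketch. The function names and the shape of the inputs are my own assumptions for illustration, and the real system reportedly looked at "a number of factors" rather than just a queue count.

```python
from itertools import cycle


def assign_round_robin(workers, cases):
    """Old rule as described: fixed rotation, ignoring current workload."""
    assignments = {w: [] for w in workers}
    rotation = cycle(workers)
    for case in cases:
        assignments[next(rotation)].append(case)
    return assignments


def assign_least_loaded(queue_sizes, cases):
    """New rule as described: each case goes to the smallest queue.

    queue_sizes maps worker -> number of open cases (hypothetical input).
    Ties go to the worker listed first.
    """
    load = dict(queue_sizes)  # copy so the caller's data is not mutated
    assignments = {w: [] for w in load}
    for case in cases:
        worker = min(load, key=load.get)
        assignments[worker].append(case)
        load[worker] += 1
    return assignments
```

With `{"Person 1": 2, "Person 2": 10}`, the second function keeps handing cases to Person 1 until the queues even out. Note that it trusts whatever queue size it is given.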
Management in one division came to us after a while and said the method of assigning cases was broken, and cases were not being assigned out "fairly." They wanted us to implement the old system's round-robin assignment method in the new system.
After some investigation I determined that workers had figured out ways to game the system in order to seem busier than they actually were and therefore receive fewer new cases. As a result, efficient workers who were actually doing their jobs were getting punished with new cases while inefficient workers were getting rewarded.
I, another analyst from that division, and my management laid out a very clear case that if employees were not properly handling their cases, and not being monitored on their progress (by all the new monitoring tools the new system provided) then changing the method of distributing cases wouldn't fix the underlying problem.
We were overruled and forced to implement the technical solution to the human problem.
They have no idea what's going on technically. But they know where the money is and the words that have to be spoken to certain people to get and defend that money. I have been handed a problem that was estimated to cost $6M and solved it with a text message, in the meeting. Shoulda taken the money. I have also had a project poached from me, watched the new team burn $35M and come out the other end with nothing but bruised egos.
The sponsors with the budget are definitely folks who prioritize politics over everything else. They generally have bachelor's or master's degrees, rarely doctorates. You look at their career and wonder how they got there. Their goal is not mission success. Their goal is the next job. They've been dressing for the next job their whole career. The financial folks are afraid of them, or at least very wary.
To play Devil's Advocate here, there's a big, big difference between JavaScript-ecosystem-style framework/library/fad-of-the-month where you are nagged on a daily basis that your libraries and tools are out of date, to building everything in Go on the back of the standard library, deploying to some LTS distribution.
The benefits of technical stability for product agility are real. Yes, it may mean your codebase is in a subset of C++ and is the domain of a 50+-year-old, genuinely-Senior Engineer who's been writing C++ for more than thirty years, who you will need to sit and negotiate with to make product changes. C'est la vie. Calling that tech debt, in and of itself from the outside, is not really fair; it's ageist.
There are precisely two people who can determine whether or not there is technical debt: (a) the lead IC responsible for the codebase, according to their innate understanding of the state of the codebase, (b) the manager who is responsible for the IC who both (1) observes that release agility is at risk and (2) is willing to invest in non-functional work to get future increases to release agility.
That describes so many projects that I've seen, over the years.
One of my first programming projects was maintaining a 100KLoC+ FORTRAN IV email program, circa 1975.
No comments, no subroutines, short, inscrutable variables, stepped on by dozens of junior programmers, and the big breadwinner for the company.
Joy.
It was probably the single biggest motivation for my uptight coding style, since. I never want to do to others, what was done to me[0].
[0] https://littlegreenviper.com/miscellany/leaving-a-legacy/
This article has a stink of self importance that rubs me the wrong way.
But even in the case of magically fixing people problems - for example, if you are working on a solo project - you will still have technical debt because you will still have lack of knowledge. An abstraction that leaks. A test that doesn't cover all the edge cases. A "simple" function that was not indeed that simple.
The mistake you want to avoid at all costs is believing you don't have a knowledge gap. You will always have a knowledge gap. So plan accordingly, make sure you're ready when you will finally discover that gap.
Outdated may sometimes be a euphemism for one of the above but usually when I see it in a discussion it just means "old" or "out of fashion" instead.
The irony is that this is a classic engineer's take on the root cause of technical debt. Engineers are happy to be heads-down building. But when you get to a team size >1, you actually need to communicate - and ideally not just through a kanban board.
Then the author suggests that senior leadership without a tech background will usually need to be persuaded by a value proposition - the numbers.
I'm seeing these as the same thing - the risks of specific tech debt just need to be understood before it gets addressed. Senior leaders with a development background might be better predictors of the relationship between tech debt and its impact on company finances. Non-technical leaders just require an extra translation step to understand the relationship.
Then considering that some level of risk is tolerated, and some risk is consciously taken on to achieve things, both might ultimately choose to ignore some tech debt while addressing other bits.
I used to be a "stay out of politics" developer. After a few years in the industry and a move to a PM role, I have had the benefit of being a bit more detached. What I noticed was that intra-developer politics are sometimes way more entrenched and stubborn than other areas of the business.
Sure, business divisions have infighting and politics but at the end of the day those are tempered by the market. It's far harder to market test Ruby Versus Java in a reasonable manner, especially when you have proponents in both camps singing the praises of their favored technology in a quasi-religious manner. And yes, I have also seen the "Why would I learn anything new, <Technology X> works for me, why would I take the effort to learn a new thing" attitudes in a large number of coworkers, even the younger Gen-Z ones.
>Why does technical debt exist? Because requirements weren't properly clarified before work began.
I hate this line of thinking and the expectations that come along with this style of work. The idea that developers need to be spoon fed requirements and only then can they start working because they fundamentally lack an understanding of the desired business outcome and their work output is so valuable that it can’t evolve as their understanding of the problem evolves _is problematic_. To be clear I’m not blaming developers but the style of work that often goes by names like waterfall, agile, SAFE, agile 2.0, transformation, etc. is all hot garbage.
> The code was calcified because the developers were also. Personality types who dislike change tend not to design their code with future change in mind.
Reasons vary widely. Code can also get calcified because people lack the vision, tech skills, or time/budget to update it. On the opposite side of the spectrum, personality types who do like change sometimes rip everything out and start from scratch, effectively destroying well-written code, which is no better.
> Why does technical debt exist? Because requirements weren't properly clarified before work began.
Not necessarily: it can also exist because code wasn't well written to begin with, libraries weren't updated to work with OS updates, feature-creep, etc.
> In my opinion, anyone above senior engineer level needs to know how to collaborate cross-functionally, regardless of whether they choose a technical or management track.
Collaboration is a skill everyone needs, and the ability to explain things to people at other levels shouldn't be limited to senior engineers. Even junior devs would do well to be able to explain things to higher-ups.
My point is, we have often discovered technical solutions for things that used to be regarded as people problems.
So maybe a lot of things are just problems, which may be solvable through either technical or people means.
1. Solving tech debt is not going to get you promotions and visibility; as the article rightly said, there is no visible difference
2. It's going to accrue continuously
3. There is no dedicated role that owns the tech debt, so it's not really anyone's explicit responsibility as part of their job
You list the options, e.g. A: combine codebases, B: leave as is. Get comments, look at it logically, and come to an agreed decision.
You have the reason and tradeoffs documented.
The discussion may prompt other wider discussions etc. More senior people may spot patterns. E.g. let's not pay down tech debt X because Y is coming.
Culturally though, if no one cares and most people are happy to manually work through the bugs and hate even discussing it, then it may be a bad fit for a developer who can't stand working like that.
There are lots of good reasons tech debt exists, and it's worrying that this person seems to think that they all boil down to "I don't know how but someone, somewhere, fucked up"
https://www.amazon.com/Peopleware-Productive-Projects-Tom-De...
Just assume the other person knows, and avoid one extra people problem.
I've been on both sides. Having to beg a manager to get permission to fix a thing that I thought needed fixing. And now I'm on both sides where as a CTO it's my responsibility to make sure the company delivers working products to customers that are competitive enough that we actually stand a chance to make money. And I build our product too.
Two realities:
1) Broken stuff can actually slow down a lot of critical feature development. My attitude as a CTO is that making hard things easier is when things can move forward the fastest. Unblocking progress by addressing the hardest things is valuable. Not all technical debt is created equally. There's a difference between messing with whatever subjective esthetics I might have and shit getting delayed because technical problems are complicating our lives.
2) We're a small company. And the idiot that caused the technical debt is usually me. That's not because I'm bad at what I do but I simply don't get it right 100% of the time. Any product that survives long enough will have issues. And my company is nearly six years old now. The challenge is not that there are issues but prioritizing and dealing with them in a sane way.
How I deal with this is very simple. I want to work on new stuff that adds value whenever I can. I'm happy when I can do that and it has a high impact. Whenever some technical debt issue is derailing my plans, I get frustrated and annoyed. And then I sit down and analyze what the worst/hardest thing is that is causing that. And then I fix that. It's ultimately my call. But I can't be doing this all the time.
One important CTO level job is to keep the company ready for strategic goals and make sure we are ready for likely future changes. So I look at blocking issues from the point of view of the type of change that they block that I know I will need to do soon. This is hard, nobody will tell me what this is. It's my job to find out and know. But getting this right is the difference between failing or succeeding as a technology company.
Another perspective here is that barring any technical moat, a well funded VC-funded team could probably re-create whatever you do in no time at all. If your tech is blocking you from moving ahead, it can be sobering to consider how long it would take a team unburdened by technical debt to catch up with you and do it better. Because, if the answer is "it wouldn't be that hard" you should probably start thinking about abandoning whatever you are trying to fix and maybe rebuilding it better. Because eventually somebody else might do that and beat you. Sometimes deleting code is better than fixing it.
But in seriousness it's management failure to build up debt like that. Either self management, middle management or out of touch management. There's a reason that good managers are needed. And unfortunately most management is dealing with people and/or real-world, not a fixed in stone RFC or list of vendor requirements from legal.
It's actually quite awful to work in such an environment.
Leaders who know that it's a people problem and who have read the Jerry Weinberg book know both sides of the problem.
> Most technical problems are people problems
Certainly explains Microsoft Teams and Windows 11.
[note there is no /s -- it's 100% a people problem, because the wrong people are steering the ship]
It comes as no surprise that a worker unit who makes this conscious decision might have problems interfacing with a Homo sapiens unit.
If however, we were more technical about things during the entirety of evolution, we would exclusively have technical problems now.
So maybe it is good to start taking the technical angle.
hence the explosion in communication channels & those channels breaking down.
you don't have communication problems if say max 3 engineers are working on a product line.
Management claims to want to understand and fix the problem, and their "fixes" reveal the real problems. Fix 1 - schedule a lot of group meetings for twice a week. After week 1, management drops off and fails to show up anymore for most of them. The meetings go off track. The answer? More meetings!
We now have that meeting daily. And have even less attendance.
Fix 2 - we don't know what people are doing, let's create dashboards. A slapdash, highly incorrect and problematic dashboard is created. It doesn't matter, because none of the managers ever checks the dashboard. The big boss hears we are still behind, and commandeers a random product person to be his admin assistant and has her maintain several spreadsheets in semi-secret tracking everyone's progress.
This semi-secret spreadsheet becomes non-secret and people find a million and one problems with it (not surprising, as the commandeered admin assistant née product person was pulling the data from all sorts of random areas with little direction and little coordination with others). We then have the spreadsheet war of various managers having their own spreadsheets.
Fix 3 - we are going to have The Source of Truth for product intake and ongoing development, with a number of characteristics (and these are generally not terrible characteristics). These are handed off to a couple of junior people with no experience to implement with zero feedback. The net result is we still don't have a Source of Truth, but more of an xkcd situation where we now have 4 or 5 sources of truth strung together with scripts, duct tape, bandaids and prayer.
This continues on and on over years. Ideas are put forth, some good, some bad, some indifferent, but none of them matter because the leaders lack the ability to follow up or demonstrate even basic understanding of what our group actually does.
It is truly soul crushing, but in this jobs environment, what are you going to do?
The real cost isn't the time lost - it's decision avoidance. Teams stop touching certain modules. New features get built around the problem instead of through it. You end up with architectural scar tissue that shapes every future decision.
I've seen this play out where a 2-week refactor that everyone knows needs to happen gets deferred for years because nobody can attach a dollar figure to "we're scared to change this code." Meanwhile every sprint planning becomes a creative exercise in routing around the scary parts.
The tell is when your estimates have a silent "...assuming we don't have to touch X" attached to them.
Calling them 'people problem' is a convenient catch-all that lacks enough nuance to be a useful statement. What constitutes good communication? Are there cross purposes?
> Non-technical people do not intuitively understand the level of effort required or the need for tech debt cleanup; it must be communicated effectively by engineering - in both initial estimates & project updates. Unless leadership has an engineering background, the value of the technical debt work likely needs to be quantified and shown as business value.
The engineer will typically say that the communication needed is technical, but in fact the language that leadership works with is usually non-technical, so the translation into this field is essential. We do not need more engineers, we need engineers who know how to translate the details.
I realise that, here on HN, most will probably take the side of the rational technologist, but this is a self-validating cycle that can identify the issue, but cannot solve it.
IMO, we need more generalists that can speak both languages. I have worked hard to try and be that person, but it turns out that almost no-one wants to hire this cross-discipline communicator, so there's a good chance that I'm wrong about all of this.
That's not a people problem though. That's failure to recognize that a company pays its employees money to make more money, not to have a pretty code base.
Yes, that means communicating the value, but that's not a people problem. That's a skills issue.
"anyone above senior engineer level needs to know how to collaborate cross-functionally"
If you can't collaborate xfn and deal with other people in general, you are not a senior engineer, despite the title inflation.
Someone at some point said: ok we’re going to duplicate code, we’ll have a windows version and a Linux version, and yes it’ll be painful - for a while - but at this stage, it is the better option.
At some point getting shit done might be more important than getting it right.
Whether that is smart or not is a people problem.
Working on large legacy codebase is extremely annoying indeed, but sooner or later, in everyone’s career one has to make those sort of tradeoffs, and when that day comes maybe you’ll forgive those who came before you.
Edit: I want to add this:
Also those tradeoffs are often required because of business problems. It’s difficult to see, 10 years down the road, how shitty the business may have been when those decisions were made. And perhaps, it’s some of those business-driven decisions - like rushing the product out the door no matter what - that kept the company afloat and made it so that you have a job (albeit to fix the mess) today.
* The team used a monorepo for (nearly) all its code. The upshots of this include the ability to enforce contracts between services all in one commit, the ability to make and review cross-cutting changes all in one PR, the increased flexibility in making large-scale architecture changes, and an easier time making automations and tools which work across the system.
* We used Go, which turned out to be a really excellent fit for working within a monorepo and a large-ish codebase. Also, having the Go philosophy to lean back on in a lot of code decisions, which favors a plain and clear style, worked out well (IMO). And it's great for making CLI tools, especially ones which need to concurrently chew through a big data dump.
* Our team was responsible for integrations, and we took as a first principle that synchronous commands to our API would be the rare exception. Being async-first allowed us to cater for a lot of load by spreading it out over time, rather than scaling up instances (and dealing with synchronization/timing/load explosion issues).
* We converted the bulk of our microservices into a stateless monolith. Our scalability did not suffer much, because the final Go container is still just a couple MB, and we can still easily and cheaply scale instances up when we need. But being able to just make and call a function in a domain, rather than making an api and calling another service (and dealing with issues thereof), is so much easier.
* Our team was small - for most of when I was involved, it consisted of 3 developers. It's pretty easy to talk about code stuff and make decisions if you only have to discuss it with 2 other people.
* All of us developers were open to differing ideas, and generally speaking the person who cared the most about something could go and try it. If it didn't work, there would be no love lost in replacing it later.
* We had a relatively simple architecture that was enforced generally but not stringently. What I mean by that is that issues could be identified in code review, but the issue would be a suggestion and not a blocker. Either the person changes it and it's fine, or they don't, in which case you could go and change it later if you still really cared about it.
* We benefited from having some early high-impact wins in terms of productivity improvements, and we used a lot of the spare sprint time to tackle ongoing tech debt, rather than accelerate feature work (but not totally, the business gets some wins too).
* Big tech debt endeavors were discussed and planned in advance with the whole team, and we made diligent little chips at these problems for months. Once an issue was chipped away enough to not be painful anymore, then it didn't get worked on (getting microservices into the monolith, for example, died down as an issue once we refactored most of them).
* Tech debt items were prioritized by a ranked vote made by everyone, using a tool I built: https://github.com/liampulles/go-condorcet. This did well to ensure that everyone got the opportunity to have something they cared about get tackled. Oftentimes our votes were very similar, which means we avoided needless arguments when we actually agreed, and recognized a common understanding. I think this contributed to continued engagement from the team on the whole enterprise.
* Our tech stack was boring and reliable, which was basically Postgres, Redis, and NATS. Though NATS did present a few issues in getting the config right (and indeed, it's the least boring piece). We also used Kubernetes, which is far from boring, but we benefited from having a few people who really understood it well.
* We built a release tool CLI, and built reasonably good general error alerting for our system (based on logs mostly, but with some sentry and infra alerts as well), that made releasing things become easy. This generally increased productivity, but also meant that more releases were small releases, and were easier to revert if there were issues.
* We had a fantastic PM, who really partnered with us on the enterprise and worked hard to make our project actually Agile, even though the rest of the business was not technical.
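For anyone unfamiliar with the ranked vote mentioned in the list above, here is a rough sketch of the basic Condorcet idea: an item wins if it beats every other item in head-to-head comparisons across all ballots. This is not the go-condorcet implementation, which I have not reproduced here. The function name, the ballot format, and the choice to return `None` when there is no outright winner are all assumptions made for illustration.

```python
from itertools import combinations


def condorcet_winner(ballots):
    """Return the item that beats every other item head-to-head, or None.

    Each ballot is a list of items ordered from most to least preferred.
    Assumes every ballot ranks the same set of items.
    """
    candidates = ballots[0]
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        # Count the ballots that rank a above b; the rest rank b above a.
        a_over_b = sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))
        b_over_a = len(ballots) - a_over_b
        if a_over_b > b_over_a:
            wins[a] += 1
        elif b_over_a > a_over_b:
            wins[b] += 1
    for c in candidates:
        if wins[c] == len(candidates) - 1:
            return c
    return None  # no Condorcet winner (a tie or a preference cycle)
```

A real tool needs some rule for the `None` case, since three voters can easily produce a cycle (A over B, B over C, C over A).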
This might be true. But I hate it. I think I should quit software engineering.
This is why I laugh when I hear someone say tech is a meritocracy. It is if you consider manipulation, exploitation, subterfuge, sabotage, and backstabbing to be of merit; otherwise, there is no meritocracy out here in the real world, not so long as any given individual of power can destroy your career or livelihood over hurt feelings.
As much as I’d love everything to be a technical problem to solve, that’s just not reality at the moment. We gotta listen to people beyond our silos and find a way to get them to our side in things if we want to progress forward on something. I’m doing that right now in a company stuck firmly in the 1990s, and it sucks.
"Death solves all problems, no man, no problem."
This article hits on a pet peeve of mine.
Many companies and individuals can benefit from better processes, communication skills, and training.
And also people who proclaim "Most technical problems are people problems" and "It's not what you know, it's who you know" are disproportionately those who are attempting to influence others that "My skillset is more valuable than your skillset." The people who believe the opposite are heads-down building.
The truth is that nearly all problems are multifactorial and involve long chains of causality. They can be patched at multiple points.
And so while there are standard stories about "If you do the 5 Why's and trace back causality, the issue becomes a deeper human problem," you can nearly always do something else and find an alternative technical solution.
The standard story goes: "This thing failed because this other thing crashed, because this thing was misconfigured, because the deployment script was run incorrectly, because no-one trained Bob how to use it." See, the human problem is the deepest one, right?
But you can find an alternate technical fix: why was it possible to run the deployment script incorrectly?
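As a sketch of what that kind of technical fix might look like: have the script refuse requests that should never run, instead of relying on Bob's training. Everything here is hypothetical (the environment names, the `confirmed` flag, the function itself); it only illustrates the idea of making the wrong invocation fail early.

```python
ALLOWED_ENVIRONMENTS = {"staging", "production"}  # hypothetical environment names


def validate_deploy_request(environment, version, confirmed=False):
    """Raise ValueError for requests we never want to run; otherwise return a plan."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    if not version:
        raise ValueError("a version must be given explicitly")
    if environment == "production" and not confirmed:
        raise ValueError("production deploys require confirmed=True")
    return {"environment": environment, "version": version}
```

A guard like this does not remove the need for training, but it narrows the set of mistakes that are possible at all.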
Or you can ping-pong it back into a technical problem: he wasn't trained because everyone is stressed with no time because things keep breaking because of bad architecture and no CI. So actually the technical problem is deepest.
But no, because that only happened because the CEO hired the wrong CTO because he didn't know anyone who could evaluate it properly....
...which only happened because they didn't use <startup that helps you evaluate engineers> (technical problem)
...which only happened because said startup didn't have good enough marketing (human problem)
...which only happened because they were too slow to build from their own tech debt and didn't have the money (technical problem...)
And so on. Ping, pong.
The article says: we had too much tech debt because humans weren't trained enough.
One can also say: we had too much tech debt because we didn't have good enough linters and clone detectors to keep out the common problems, and also we had made some poor technical choices that required hiring a much larger team to begin with.
If you have a pet problem, you can always convince yourself that it's responsible for all woes. That just keeps you from seeing the holistic view and finding the best solution.
> Most technical problems are really people problems. Think about it. Why does technical debt exist? Because requirements weren't properly clarified before work began. Because a salesperson promised an unrealistic deadline to a customer. Because a developer chose an outdated technology because it was comfortable. Because management was too reactive and cancelled a project mid-flight. Because someone's ego wouldn't let them see a better way of doing things.
I mean true, technical debt is a people problem. Why does it exist? Because there are not enough people in the team. Because they are not skilled enough. Because the developer promised they'd finish up the task before Christmas but failed to deliver.
I don't really like marketing, but they serve an important function: they convert code to money. Code itself isn't worth anything, only marketed code is worth something. That's why it's so hard to refactor.
Also there's this unofficial law in programming: code that is easy to refactor at some point is going to be replaced by code that is hard to refactor. Sometimes people misidentify what exactly is code debt and convert blocks of code that aren't code debt into exactly code debt which is later impossible to remove, because they thought they knew better.