E.g, "we need to come up with a way to implement X". Person A gives their idea, person B gives another idea and so on until everybody shared their thoughts. Then someone would say "I think what person C said makes the most sense" and everybody would agree and that was it. 30 minutes to hear everybody out, 3 minutes to discuss who will do it and when and the meeting was over.
I think the biggest testament to this code base was that when junior members joined the team, they were able to follow the existing code for adding new features. It was that easy to navigate and understand the big picture of.
It was in C, but the process was expected to run for months without crashing or running out of memory; if it had an abnormal exit, you (the owner, as set in the code) would get an email with the core backtrace. If it had a memory leak, ops would be on you quickly. Ops was considered a primary driver of requirements to dev, and absolutely everything could be reloaded into a running server without restarting it. There was a TCP control port where a TCL interpreter inside the service was exposed, and generally you wrote the (very simple, CS 101 style) TCL commands to manage the server. It was a "No Threads Kernel", scaled to dozens or hundreds of physical machines communicating over a very well managed network, mostly 1 process per core, with 1 core reserved for the OS. The 200 or so unix developers (as we were called) had a common understanding of how the framework worked, and if you were just writing app code it was basically impossible to write slow services. We had technical writers who would interview the developers and write books that could be handed to outside developers and lead to a successful integration with no developer time spent.
The NTK was primarily for sending msgs over the network - we had a principle to never write to disk (disks were pretty slow in the 1990s), so everything was just a server that received network messages, sent out other messages in response, assembled the replies/timeouts, and sent back a msg to the caller. All done over persistent connections established by the infrastructure; the applications just registered callbacks for msg type 'X', which would present you with the caller information, the msg as a buffer, and potentially a "user word" which was a way to keep server state around between calls.
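The real framework was C, but a rough conceptual sketch of that callback-and-dispatch model, with entirely hypothetical names, might look something like this:

```python
# Hypothetical sketch of the NTK-style callback model (the real thing was C):
# the framework owns the select loop and persistent connections; app code just
# registers a handler per msg type and answers messages with more messages.

class Kernel:
    def __init__(self):
        self.handlers = {}                      # msg type -> callback

    def register(self, msg_type, callback):
        self.handlers[msg_type] = callback

    def dispatch(self, msg_type, caller, buf, user_word=None):
        # Invoked by the framework when a message arrives on a persistent link.
        return self.handlers[msg_type](caller, buf, user_word)

def handle_quote_request(caller, buf, user_word):
    # Never touch disk: just compute a reply (possibly after fanning out more
    # messages and assembling replies/timeouts) and send it back to the caller.
    state = user_word or {"requests_seen": 0}   # the "user word" keeps state
    state["requests_seen"] += 1
    return ("QUOTE_REPLY", caller, buf.upper(), state)

kernel = Kernel()
kernel.register("QUOTE_REQUEST", handle_quote_request)
print(kernel.dispatch("QUOTE_REQUEST", caller="client-1", buf="msft"))
```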
The layering, from the main select loop thru the different layers of TCP connection, SSL handling (if used; not 100% of traffic was SSL in the 90s), thru persistent link handling, application msg handling, timers, memory allocation, etc., was done with such art that I felt it to be a thing of beauty.
https://github.com/cjwl/cocotron
I was looking for a way to port my native Mac Cocoa apps to Windows. I had already been disappointed by the aimless sprawl of GNUstep.
This one-person project implemented all the essential APIs for both Foundation and AppKit. Reading the code was a revelation: can it really be this simple and at the same time this effortlessly modular for cross-platform support?
I contributed a few missing classes, and successfully used Cocotron for some complex custom GUI apps that needed the dual-platform support.
Cocotron showed me that one person with a vision can build something that will rival or even outgun large teams at the big tech companies. But focus is essential. Architecture astronauts usually never get down from their high orbits to ship something.
This was ~10y ago, so my memory might not serve me well. A bit of context:
- proprietary service, written in Python, maaany KLOC,
- hundreds of engineers worked on it,
- before this framework, writing integration tests was difficult -- there was a base framework, but the tests had no structure, and everyone rolled their own complicated way of wiring things -- very convoluted and flaky.
The new integration tests framework was built by a recently joined senior engineer. TBF, it's wrong to call it a framework in the xUnit sense. This guy built a set of business components that you could connect & combine in a sound way to build your integration test. Doesn't sound like much, but it significantly simplified writing integration tests (it still had rough edges, but it was a 10x improvement). It's rare to see chaos tamed in such an elegant way.
What this guy did:
- built on top of the existing integration tests framework (didn't roll out something from scratch),
- defined clear semantics for the test components (roughly the kind of contract sketched after this list),
- built the initial set of the test components,
- held strong ownership over the code -- through code review he ensured that new components followed the semantics, and that each test component was covered by its own test (yep, tests for the test doubles, you don't see that very often).
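I don't have the real code, but the flavor of it -- business-level components with one small, uniform contract, composed into a test -- was roughly like this (Python, all names hypothetical):

```python
# Hypothetical sketch: every test component obeys the same tiny contract, so
# components can be wired together instead of everyone hand-rolling plumbing.

class Component:
    def setup(self): pass
    def verify(self): pass
    def teardown(self): pass

class FakeBillingService(Component):
    def setup(self):
        self.invoices = []
    def verify(self):
        # The "semantics" the reviewer enforced: components check themselves.
        assert all("amount" in inv for inv in self.invoices)
    def teardown(self):
        self.invoices.clear()

class IntegrationTest:
    def __init__(self, *components):
        self.components = components
    def run(self, body):
        for c in self.components:
            c.setup()
        try:
            body(*self.components)
            for c in self.components:
                c.verify()
        finally:
            for c in reversed(self.components):
                c.teardown()

# A test becomes a short composition of components plus a body.
IntegrationTest(FakeBillingService()).run(
    lambda billing: billing.invoices.append({"amount": 42})
)
```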
Did it work well long-term? Unfortunately, no. He stayed a relatively short time (<2y). His framework deteriorated under the new ownership.
Travis, if you are reading this and you recognized yourself, thank you for your work!
The docco, the culture, the clarity and simplicity of design from first principles. The coherency of code across kernel, user-space, and accompanying material. The vibe.
You may know NetBSD (if at all) as the BSD that wants to be "portable" and may dismiss it against FreeBSD's focus-claim of "performance" and OpenBSD's focus-claim of "security". Interestingly, trying to remain wildly portable requires a cleanliness inside that is very soothing.
We got a small team of competent people, with domain experts to peer code with the devs.
It was wonderful. We could test, document and clean up. Having people who knew the trade and users at hand removed second guessing.
The result was so good we found bugs even in competitors' implementations.
We also got 5x the performance of the system it was replacing, plus more features.
It was written by one engineer, and then later refactored by a team to look like this (and many other files): https://github.com/apache/couchdb/blob/main/src/chttpd/src/c...
It's an interesting exercise to see how something grows and changes as it transitions from inspiration to real-world usage.
A large system that was originally written by only two super-productive engineers (I mean real engineers, both with PhDs in an area of Engineering). And a comparably capable and essential IT person.
The reasons for the super-productivity include one of the developers choosing great technology and using it really well, to build a foundation with "force multiplier" effects, and the other developer able to build out bulk with that, while understanding the application domain.
Another reason was understanding and being pretty fully in control of the code base, so that, as needs grew and changed, over years, someone could figure out how to do whatever was needed.
One of the costs was that most things had to be built from scratch. Over time that also proved to be an advantage, because whenever they needed (put loosely) a "framework" to something it couldn't do, they effectively owned the framework, and could make dramatic changes.
When I said "costs", I meant things like: many times they needed to build from scratch a component that would be off-the-shelf in some other ecosystem. So if someone looked closely at how time was sometimes spent, without really understanding it or knowing how it panned out, it would look like a cost that could be optimized away. But if they looked at the bigger picture, they'd see a few people consistently, again and again, accomplishing what you'd think would take a lot more people to do.
It helped that the first programmer also became the director for that area of business, and made sure that smart engineering kept happening.
Someone might look for a reason this couldn't work, and think of bus factor. What I think helped there was that the work involved one of those niche languages that attract way more super programmers than there are jobs. ("Gosh, if only we had access to a secret hiring pool of super programmers capable of picking up where the other person left off, and a way to get them to talk with us...")
It was easy to imagine a competitor with 100 developers, not able to keep up, and at many points getting stuck with a problem that none of them were able to solve.
I consider the core Postgres codebase to be the gold standard in development even though it's in a language I do not prefer to write in if given the choice.
Shout out to the pgrx folks. You're awesome! https://github.com/pgcentralfoundation/pgrx
The Free Pascal RTL seems opaque in comparison. Its reliance on an archaic help-file build system keeps contributors away. Thus it's poorly documented at best.
* Creating a mutable snapshot of the entire codebase takes a second or two.
* Builds are perfectly reproducible, and happen on build clusters. Entire C++ servers with hundreds of thousands of lines of code can be built from scratch in a minute or two tops.
* The build config language is really simple and concise.
* Code search across the entire codebase is instant.
* File history loads in an instant.
* Line-by-line blame loads in a few seconds.
* Nearly all files in supported languages have instant symbol lookup.
* There's a consistent style enforced by a shared culture, auto-linters, and presubmits.
* Shortcuts for deep-linking to a file/version/line make sharing code easy-peasy.
* A ton of presubmit checks ensure uniform code/test quality.
* Code reviews are required, and so is pairing tests with code changes.
I have a Laravel project that I have maintained for a customer for seven years. The app is straightforward and allows users to create portals that list files and metadata, such as expiration dates and tags.
Every other year, they ask me to add a new batch of features or update the UI to reflect the business's branding. As the app is so small, I have the opportunity to review every part of the app and refactor or completely rewrite parts I am not happy with.
It is a joy to work on and I always welcome new requests.
Q: is it actually the code that you loved, or simply the tooling that exists?
(and if it's tooling, why can't that type of tooling be replicated for other codebases outside of google?)
The code wasn't simple, at all. It took active training of new arrivals for them to understand it. But it was very well thought out, with very few warts given the complexity, and extremely easy to extend (that was the main requirement, given constant changes in APIs and clients).
We had an API, with multiple concurrent versions, that transformed requests into an intermediate model on which our business logic operated; we then targeted external APIs (dozens of them, some REST, some SOAP, some under NDAs, some also with multiple versions), whose responses were turned back into the intermediate model, with more business logic on our end, and a final response through our API. Each transaction got its context serialized so we could effectively have an improvised "async/await"-like flow in what was (trigger warning) C++03 code.
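The real thing was C++03 and far hairier; as a loose Python sketch of the shape (all names made up), with the per-transaction context serialized so the "await" on an external API is just a stored continuation:

```python
# Conceptual sketch only: normalize any API version into one intermediate
# model, serialize the transaction context, fire the external request, and
# resume from the serialized context when the reply (or timeout) comes back.

import json

def to_intermediate(api_version, request):
    # Every supported API version maps onto the same internal model.
    return {"customer": request["cust_id"], "items": request["lines"]}

def on_request(api_version, request):
    model = to_intermediate(api_version, request)
    ctx = json.dumps({"model": model, "stage": "awaiting_supplier"})
    send_to_external("supplier_api", model, ctx)       # the improvised "await"

def on_external_reply(reply, serialized_ctx):
    ctx = json.loads(serialized_ctx)                   # resume where we left off
    ctx["model"]["quote"] = reply["price"]
    print({"status": "ok", "quote": ctx["model"]["quote"]})  # final API response

def send_to_external(target, model, serialized_ctx):
    # Stand-in for the real messaging layer: reply immediately for the demo.
    on_external_reply({"price": 99}, serialized_ctx)

on_request("v2", {"cust_id": 7, "lines": ["a", "b"]})
```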
The person who engineered it didn't have formal CS background.
Qualitatively, I experience this in a few ways:
* Codebase quality improves over time, even as codebase and team size rapidly increase
* Everything is easy to find. Sub-packages are well-organised. Files are easy to search for
* Scaling is now essentially solved and engineers can put 90% of their time into feature-focused work instead of load concerns
I think there are a few reasons for this:
* We have standard patterns for our common use cases
* Our hiring bar is high and everyone is expected to improve code quality over time
* Critical engineering decisions have been consistently well-made. For example, we are very happy to have chosen our current DB architecture, avoided GraphQL and used Rust for some performance-critical areas
* A TypeScript monorepo means code quality spreads across web/mobile/backend
* Doing good migrations has become a core competency. Old systems get migrated out and replaced by better, newer ones
* GCP makes infra easy
* All the standard best practices: code review, appropriate unit testing, feature flagging, ...
Of course, there are still some holes. We have one or two dark forest features that will eventually need refactoring/rebuilding; testing needs a little more work. But overall, I'm confident these things will get fixed and the trajectory is very good.
I've legitimately left jobs over bad code. We're talking about code that did nothing in reality. The best codebases have been ones where I've been able to lead the direction. I get to know exactly how things work. Right now I'm privileged to have a job where I essentially created the initial framework.
Plus I'm fully remote, life is pretty good.
The best work codebases are ones where I may disagree with a lot of the style, but the lack of enforcement at the level of classes and functions gives us creative freedom, builds mutual respect, and enables ego-less, pragmatic, tradeoff-driven discussions about things which are actually important. If these discussions end with a majority but not universal agreement, then it's fine, the minority is not offended, and we happily continue with our work.
That might also be because we're all pragmatists and somewhat cynical of shiny things, or the latest fad.
A lot of effort goes into language design and tooling to enable continuous migration of code. Rather than re-writing entire repos, existing code is continuously upgraded through semi- & fully-automated code-mods. Every day thousands of patches are landed to upgrade APIs with new type safety, security restrictions, deprecations and other code maintenance.
Most other company repos I worked on had major re-writes outside of the mainline until one day there was a sudden and often troublesome switch-over to the new code.
Code is constantly changing, and unless you have a continuous process for landing and testing those changes, you are going to suffer when you try to apply the changes needed to address the accumulated tech-debt.
My conclusion: You know the claim "any medication that really has an effect must also have side effects". I would like to adapt that for code: Any code that does a lot of useful and complex things must be an arcane, barely maintainable mess that can only be understood by deep study.
About once a year roughly, for the last couple years, the opportunity has arisen to greenfield a Go micro-service with pretty loose deadlines.
Each time I have come into it with more knowledge about what went well and what I wasn't particularly happy with the last time. Each one has been better than the last.
I've been building software professionally for twenty years, and these micro-services have been one of the few projects in that time that have had clear unified vision and time to build with constant adjustments in the name of code quality.
It was by far the most impressive piece of software engineering I've ever had the privilege of perusing.
The other code base I really liked was QRes, ITA Software's airline reservation system written in Common Lisp. I contracted at Google a few years ago doing maintenance on this code base before it was shuttered for good. Aside from being just high-quality Lisp code, it had its own testing language that allowed new tests to be written in just a few lines -- tests that sent XML queries with generated data to the running system and checked for specific results at specific XPaths. Because the system was Lisp, implementing and extending this testing language was relatively easy. Using it even more so. Truly the only system I was happy to write tests for, rather than seeing it as a necessary chore.
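The real DSL was Lisp; purely as an illustration of the idea -- a test being little more than a query plus XPath expectations -- a Python analogue with made-up names could look like:

```python
# Illustrative only: a "test" is data -- one XML query plus (path, expected)
# pairs -- and a tiny engine sends the query and checks the response.

import xml.etree.ElementTree as ET

def run_case(send_query, case):
    response = ET.fromstring(send_query(case["query"]))
    for path, expected in case["expect"]:
        node = response.find(path)          # ElementTree's limited XPath subset
        assert node is not None and node.text == expected, (path, expected)

def fake_reservation_system(query_xml):
    # Stand-in for the running system under test.
    return "<reply><status>CONFIRMED</status><fare>123.00</fare></reply>"

run_case(fake_reservation_system, {
    "query": "<book><flight>XX123</flight></book>",
    "expect": [("status", "CONFIRMED"), ("fare", "123.00")],
})
```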
Everything was working against it. No RTOS. Subversion. GCC 5-point-something (I think?).
It was an incredible mass of preprocessor crimes. I'm talking about #including the same file 10 times and redefining some macros each time to make the file do something different.
It used a stackless coroutine library called Protothread, which itself is a preprocessor felony.
And yet? It was brilliant. Compilation was lightning quick. F5, lean back in your chair, and boom, you're running your code. I understand that this kind of thing is normal for web/backend/etc folks, but I yearn for the days of sub-15 second firmware compile times.
It was easy to flip a couple flags and compile it to a Win32-based simulator. Preprocessor felonies are felonies, but when you stick to a small handful of agreed-upon felonies, you can actually reap the benefits of some very sophisticated metaprogramming, while staying close enough to the hardware that it's easy to look at a datasheet and understand exactly what your code is doing.
https://git.einval.com/cgi-bin/gitweb.cgi?p=abcde.git;a=summ...
https://git.einval.com/cgi-bin/gitweb.cgi?p=abcde.git;a=blob...
...it's just so damned... like... it _is_ 5000 lines of bash, but given the constraints and the problem domain, it's an incredible (and well-organized) feat.
If I ever question "How would I do something like XXX in shell?", I can usually find some clues, inspirations, or answers in the `abcde` codebase.
It was the smallest, and most bizarre interleaved task-structure I'd seen fit into a $0.17 micro-controller. I kind of admired that the code itself formed the delays for other areas of the multiple tasks, and essentially achieved near perfect resource utilization for actual work.
Never saw anything that efficient again for several decades. =3
As for codebases I've written code for, the best one strove for simplicity, and was driven by very strong engineers who actively considered code hygiene (in the broadest possible sense) a first-class citizen.
And it's only had a couple of vulnerabilities, in nearly 30 years of being on the Internet. That's not quite like DJB code, but it's darn close.
My understanding is that it was written by a very small number of people who were experts in the field, as part of the standards process way back in the early 90s.
What made it great: the company is 7 years old, and I've worked there basically since inception (I joined half a year late).
The codebase had some serious problems, due to migrating build systems, rapid team growth, a lack of proper communication while developing important infrastructure tools, and a lack of foresight at the start about a crucial system-wide requirement that only became clear 3 years in.
Aaand… due to a unique sequence of events, half a year ago we had to start everything from scratch! We could not use a line of code from the old repo!
This brought miracles out! We started designing the right things right. People started talking to each other on the global scale, without the code limiting them!
The old code not only had problems on the engineering level, it also held people and team responsibilities to suboptimal interface boundaries! This is the biggest magic! After the rewrite started, all the people on the team landed exactly where they wanted, and their areas of responsibility became clear and highly efficient. Some people started shining like I could not imagine, because their organizational and communication skills had been limited by a huge suboptimal codebase.
Also, it is very empowering to design a system when your whole team has what amounts to real experience from 5 years in the future. And everyone turned out to care about code simplicity and quality; I finally feel at home.
I could not wish for a better developer experience. The sexiest C++ coding time in my life :)
When we have the opportunity to be in this context, keeping in mind what bothered us in the codebases we worked with in the past, we can force ourselves not to reproduce the same errors: the unmaintained unit and integration tests, the lack of refactoring, other developers using fancy technologies instead of simpler concepts, more for the chance to play with them than out of real need.
And also, I guess, because we are more aware that the code is a reflection of the company we want to have, and that "the simpler, the better" is a key point when we need to debug.
It is the only framework I have read top to bottom.
Also the FreeBSD kernel, if you want to see a C code base that's quite beautiful (for C).
I’m making a tool to convert data schemas to SQL via a UI for lay-users. Just like https://react-querybuilder.js.org/ which is basically a UI based SQL generator. For work.
Except my version extends the idea much further, blending Excel like functionality with functions that can act on Fields and Rules.
What makes it good?
For one, it's a from-scratch project with almost no third-party libraries.
For two, I fortunately chose a recursive data structure to represent the data schema, and that has really worked out well. Early on I tried 4 other approaches to represent the data, but went back to the recursive one, feeling it was the best choice.
Furthermore, I'm using React, but it heavily leverages reducers. Specifically, Groups have a reducer, Rules have a reducer, and Fields have a reducer. The reducers are linked in a chain top to bottom, where changes on a Field change a Rule, and changes on a Rule change a Group. It's been extremely clean to work with.
Because the base data schema is recursive (Groups contain groups contain groups), most of the functions that manipulate the schema are recursive. There is a real elegance to the code, each recursive function has a very obvious base case, and a very obvious recursive path.
And for the final outcome, walking the query data structure and spitting out the equivalent SQL is also recursive, and feels elegant, coming in at under 40 lines.
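The project itself uses React (and, given the mention of generics, presumably TypeScript), but the recursive walk is easy to show in a few lines of Python -- structure and names here are my own invention, not the actual code:

```python
# Groups contain rules and nested groups; SQL generation mirrors that recursion,
# with an obvious base case (a rule) and an obvious recursive case (a group).

def rule_to_sql(rule):
    return f"{rule['field']} {rule['op']} {rule['value']!r}"

def group_to_sql(group):
    parts = []
    for child in group["children"]:
        if "children" in child:                  # nested group: recursive case
            parts.append(f"({group_to_sql(child)})")
        else:                                    # plain rule: base case
            parts.append(rule_to_sql(child))
    return f" {group['combinator']} ".join(parts)

query = {"combinator": "AND", "children": [
    {"field": "status", "op": "=", "value": "active"},
    {"combinator": "OR", "children": [
        {"field": "age", "op": ">", "value": 30},
        {"field": "tag", "op": "=", "value": "vip"},
    ]},
]}
print("SELECT * FROM my_table WHERE " + group_to_sql(query))
# -> SELECT * FROM my_table WHERE status = 'active' AND (age > 30 OR tag = 'vip')
```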
Literally as I’ve been writing this codebase, everything somehow perfectly fell into place. I was marvelling near the end that it felt like I chose all the best possible logic paths to build this somehow.
I'm hoping to get the okay from work to open source it (fully open source). The only cruft in the project is the types I'm using; the interface of the code could be improved with generics.
For this reason I despise most modern [web] projects, which have a weak start, immediately drop into “services” and “components”, do one action per source file per 30-50 lines of code, which are mostly imports and boilerplate, and have hundreds of these files. You can never tell what it does because it does almost nothing except formalities.
I also noted a tendency to use wrong paradigms for a language. E.g. it may have no closures (imagine that in 202x) so they use events as continuations for asynchronicity, which results in a mess. Or it isn’t immutable/functional, but they pretend it is, which results in fragility.
The best projects are both close to their business and written in a paradigm of the language used.
Was there someone enforcing good practices top down?
Natural time pressure is the best bs cleaner, imo. You write effing code, maybe have a few hours a week to refactor strange parts. With no time pressure, a project naturally becomes massaged by all members into the "likeable" form of their age.
Being ringside for the fork of Sun's OpenSolaris, and watching absolute geniuses steward a thing like that into a useful product and a build process was foundational for my understanding of how and why process is important, as well as how to get things fixed in the real world, while not muddying the idea of the pure design principles. A balance can exist!
* software was relatively new, and well made. age is probably the main reason why it was good "at the time".
* there were good practices in place, yes, led by an opinionated team lead, which felt like it was the main cause for such a project to work well under medical standards.
I feel like I will never find another job like this one again.
No, but more seriously, I've found that familiarity with the codebase is more important than having it be perfectly engineered. Once you're really familiar with the codebase, you know where dragons be, and you can make changes more easily. And God (PM) forbid, if you ever find yourself with some extra free time you might even reduce the size of dragons over time.
This brings me to my final point. Any codebase that I really enjoyed working with was the one that was constantly evolving. I don't mean rewriting everything from scratch every few months, but as long as I have permission (and time) to refactor the things that have been bothering me for months as patterns emerge, I'm a happy bee.
It's not only extremely well written in general (with your only chance of 'really' learning the engine being to go through it, read comments, and the like), but it also defies the wisdom 'everybody knows': that premature optimization is the root of all evil, and that we're supposed to just benchmark things, see where the bottleneck is, and then work on optimizing those spots.
Unreal doesn't do this. There are countless thousands of micro-optimizations everywhere. For one striking example, there's even a fairly substantial system in place that caches the conversion between a quaternion and an Euler rotation. This would never, in a zillion years, be even close to a bottleneck. But with thousands of these little micro-optimizations everywhere, you get a final system that just runs dramatically better than any comparable engine.
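Unreal's actual cache is C++ and considerably more involved; just to illustrate the general idea of memoizing a conversion that would "never be the bottleneck", a toy Python version:

```python
# Toy illustration of caching quaternion -> Euler conversions; the point is the
# pattern (memoize a frequently repeated conversion), not Unreal's actual code.

import math
from functools import lru_cache

@lru_cache(maxsize=4096)
def quat_to_euler(x, y, z, w):
    # Standard quaternion -> (roll, pitch, yaw) conversion, cached per input.
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2 * (w * y - z * x))))
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return roll, pitch, yaw

print(quat_to_euler(0.0, 0.0, 0.0, 1.0))   # identity rotation -> (0.0, 0.0, 0.0)
```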
In more general terms, they've also taken advantage of the 'idiomatic flexibility' that C++ offers to create a sort of Unreal C++ dialect that is also just lovely to use and feels much closer to something like C# in terms of luxuries: no manual memory management, plus garbage collection, reflection (!!), and so on. The downsides are that compile times are horrible (even though a cached compile might only take 10 or 15 seconds, it feels like forever when trying to work out one specific issue) and C++ intellisense, especially in a preprocessor-heavy environment, is pretty mehhhh.
I built a few systems: the DOM, a build system that could cross-compile to a completely new target phone while the phone executives were at lunch with our sales team (that's a great demo!), and an optimizing build step in Perl that rewrote the C++ to minimize bytes, cycles and joules of energy.
The rendering engine was built by a genius. It was the most efficient and accurate vector graphics renderer I have ever seen.
But of course no one could understand how this system worked.
I mean, Perl that transpiles C++ into more optimized C++ just is abusive. I was not popular with the senior devs.
The main coders including me were coop students who worked furiously at all hours for the fun of it.
But of course the product had no revenue. The wiser devs probably knew this and didn’t care.
Nevertheless, we eventually were let go during the dot.com bust and the remaining main engineers had to throw the code out and try to rewrite it in Java. That didn’t work because in that era Java was too slow to fit on a phone. Or maybe the market was dead.
I learnt a solid lesson.
What’s amazing technically is not an amazing code base.
What is an amazing code base isn’t an amazing product.
What is an amazing product isn’t an amazing business.
Literally every product I have ever seen that is profitable is not a good code base. But if it pays me to code, I love it.
I worked on Safari, and while I had to make changes to WebKit very infrequently, the quality of the code, tooling, and team were excellent.
I haven't seen that level of engineering excellence anywhere I've worked since -- it was consistent and pervasive. No one had to enforce that culture from the top because everyone bought into the vision.
[0] https://gitsell.dev/u/bitofbreeze/r/bitofbreeze/git-sell
The Mac GUSI BSD socket library is worth a mention too. I built a multithreaded Quark XPress XTension with it that used PowerPlant for UI. Quark even wanted to interview me.
Added to: a pre-Boost cross platform library for Mac (Classic) and Windows NT. It did much of what Boost did way back when.
I think a lot of it was just that the restrictions built into the way development happens required a high level of discipline, care, and planning. Also, requirements were pretty tightly coupled to sensor platform capabilities, which are known well in advance and don't change unexpectedly, so waterfall development actually works and you don't have to deal with the chaos of not really knowing what will and won't work and customers constantly changing their minds.
The code base was overwhelmingly C++, with some Fortran, and a lot of it was very old. It was all developed on airgapped networks, and the difficulty of bringing in external dependencies meant there largely were not any. All of the library functionality we required was mostly developed in-house, so we had extremely well-documented and stable functions available to do damn near anything you could want, with a good chance that whoever first wrote it was still there. All development had always been tied to a ticketing system of some sort that included all of the original discussion, design documents, and so on. That kind of thing adds process overhead upfront, but it means that new developers can forever simply read the history and learn exactly why everything works the way it works.
The system itself was very Unixy. In production, it was meant to be run as a server with many instances striped across high-performance compute nodes, but it did not have to be run that way. Every individual product flow could also be built as its own transient executable, so that working on a single component could easily be done locally. You didn't have to rebuild the world or spin up your own server. Performance requirements were enough that we had our own customized networking stack and filesystem in production, but nothing depended on this for function, so you could still develop and test on plain Linux with ext4.
The culture was terrific. We were part of one of the big five defense contractors, but an acquisition and this program was largely still staffed by employees of the original company that had been acquired. We were de facto exempted from most of the bullshit any other program had to deal with. I don't know if that was part of the original terms of being acquired or just a consequence of having so many long-time developers that couldn't afford to be lost if you subjected them to nonsense. This was the kind of project that people intentionally stayed on and retired from because the experience was so much better than any other project you could get assigned to.
Ironically, it had none of the characteristics that high-performing companies often tout. You work in private. The rest of the company, including your own management chain, doesn't even know what you're working on. You'll never get any recognition or publicity. The pay is mediocre. We weren't attracting the best and brightest from all of the world. You had to be American, have a top-secret clearance, and be geographically close enough to the facility to get there every day, so this was a pretty constrained hiring pool. I still worked with some of the smartest people and best engineers I've ever known. The upside of this kind of environment is you have no mercenaries or publicity hounds. Everyone who sticks around is a person who really loves and cares about what they're working on, and a lot of people did stick around. The sanity and organization of the code was heavily facilitated by having a whole lot of people working on it who'd been working on it for 30+ years.
The worst codebase by far was Outlook for Mac. It had code going back to Entourage, and was never properly cleaned up.
The code was fairly well organized and more importantly worked out of the box with a Makefile and IDE integration (GoLand). All it took was `git clone` and opening GoLand to get started.
For C (maybe it's C++), fluentbit seemed pretty straightforward (I don't have much experience in C though).
It is super easy to find what I am looking for.
What makes a project objectively good (from subjective experience) is a combination of code, design, documentation, and often the humans involved.
Other compilers don't even come close.
Most of the engineers were above average.
Everything was planned and documented carefully in advance, down to individual requirements. We even had some nice-to-have, optional requirements. After literally weeks of just documents and iterating on the architecture, the code was written easily in a week, basically in auto mode - if we'd had AI agents back then, they could have written the implementation.
During the implementation we realised there were some technical shortcomings (e.g. the number of lights supported by the engine) and clients wanted some changes, so we had to update our bible of documentation and make a few changes. Some were small, some required swapping out the rendering engine completely, but we managed to keep everything to the same interfaces we planned at the beginning, because the architecture was rock solid and we had thought carefully about extensibility.
It may have cost more in engineering time than if we'd thrown requirements on a kanban board and iterated on them - but then we would have ended up with an arguably worse architecture and no documentation.
He learned from the best.
There were also lots of Ruby libraries back in the heyday of the language that were very clean for what they were. I've rarely seen programs structured so well since - some of this is essentially a forced practice of the language: if you don't do it, you'll suffer, as there are no types, LSPs and whatnot to otherwise save you. But still, it's worth calling out, as I think back occasionally with fond memories of being unimpeded anywhere I needed to go.
Tooling-wise, it's got to be noted, as others have, that the Google monorepo was well integrated, but it wasn't what I'd describe as "best" at all. I had a fairly broad time at Google, having worked in most of the sub-orgs over the years, and in doing so saw a lot of variance even with the chosen integrations. I saw teams who regularly built from cherry-picks nowhere near head; I ended up owning stuff that always built off of main branches (which most Googlers don't even know exist), set up before my time to avoid running into otherwise hardline constraints around the single-version policy that couldn't be adhered to for specific product reasons (hi cloud). Exactly how much and which parts of Buganizer teams used varied hugely; which service management tools they used varied; which database systems, etc.

The single patch system, while also fairly well integrated, combined with code review practices that often stymied medium-scale refactors (large refactors within a single product area). For large-scale changes that touch every single top-level directory, though, the tooling existed and worked really well, if a bit rough around the edges - it pushes out hundreds or even thousands of patches and automates pinging owners and informing you of failures, etc. In many ways a lot of smaller systems with far less tooling, and far, far less integration, were more productive - I've never seen such a bad way of _organizing work to be done_, or such poor management of bug volume, as I have at Google in various teams/areas, despite all the tooling. Any delivery that involved client programs (desktop programs, mobile programs, etc.) was always pretty remarkably awful, and only very, very small teams were on the hook for making any of this better. They'd always plaster their new thing with "don't you dare make another or use the deprecated thing", meanwhile they never had enough time to make the new thing ready - a common trope in the company, but it really highlights that there are edges outside of the "pure SaaS" side of the business which were pretty bad.

The best part of the monorepo and monotooling culture for the SaaS side is the security properties of it all, which were outstanding, and are among the reasons why I still trust Google as the holder of some main accounts. I hope they never start to lose those properties, given the scale and impact they could have.
That is such ego. There is almost certainly someone in the group, or behind the group, enabling this greatness to take place.
Every one had good and bad features though. One or two were OS-sized and I think a codebase that compiles and links to 85GB of output for 20+ devices without being a total disaster inside is harder to do than a neat small python module or whatever.
GOOD FEATURES:
Maintenance of the build and tests: I worked on tools that helped builds go faster, so I saw a lot of codebases where people were not maintaining the build, partly because nobody had that as a responsibility. There was bad management of dependencies, leading to build failures, poor performance, and incorrect build output. Android would be a counter-example: I don't know if people like developing in it, but it was always hard to accelerate because the maintainers fixed performance problems regularly, leaving our tools with little to improve.
Using appropriate languages. Writing everything in C++ was a fad at one time. All projects work better, port better, have faster build times, are easier to test etc if they use memory safe "build once" languages to a maximum (e.g. java) and unsafe ones (e.g. C/C++ which have to be rebuilt and tested for each device/os) to a minimum. IMO Android beat Symbian amongst other reasons because it wasn't all C/C++ and that meant a lot of code didn't have to be rebuilt for different devices. This made builds faster and fast builds lead to better quality because of a short dev-test cycle.
Use of faster compilers over "better" compilers. Ultimate code performance and quality depends on a fast development cycle more (IMO) than on having the most optimizing compiler. GCC versus the older ARM compilers for example. Now the ARM compiler is based on LLVM and I know that happened indirectly from a suggestion I made to someone who then made it to ARM who then did it.
The setup and build of one codebase I worked on was as easy as one could expect; the build errored out if you tried to use the wrong tools, so you never ended up debugging weird failures because of an incorrect tool in your PATH somewhere. I made this feature happen :-D. With big codebases the tools could be included in the version control system, so you knew you had the right compiler, right object dumper, etc. This is another strength of Android, and yet I was in a project for Symbian to do the opposite, because of some utter bonehead who had never touched a build in his life and was trying to make a name for himself with his slimy bosses as a "doer" and "reformer."
Codebases (especially big ones) benefit a lot from some kind of codesearch/index where you could find out where some function/class/variable was defined and what version of the source base it was introduced in.
BAD FEATURES:
Exclusively owned code - we need to know who understands code best and who is best to review it, but I don't think anyone should have totally exclusive control. It was a nightmare for me at one job - trying to get another team to make some needed change (like fixing their stupid makefiles to work properly in parallel). We (the build team) should have been able to do it ourselves - maybe including them in the PR. Sometimes ownership is entirely theoretical - nobody who wrote it is still employed, nobody among the notional owners understands it, and none of them want to approach it within 100 metres in case it blows up and becomes their problem. I simply had to approach such code - no choice - but I kept having to send diffs to people who didn't want to bother to look at them. It was a case of pushing wet spaghetti and took forever to do very simple changes.
Insufficient tests that run infrequently. What else is there to say?
Complicated code with no "why" or "what this is for" type comments. The kind of thing you trawl around in for weeks and cannot make head nor tail of what is going on overall.
Code with so much dependency injection and general SOLID that you have to bounce all over the place to understand a very simple action.
Code where writing tests is an enormous ballache. In one Go codebase, the reason was that someone decided the standard Go practice of running an array of test data through a kind of "test engine" was the only way anyone should be allowed to write tests. Hence you had to do lots of weird things to turn your test cases into data. Generally we use a kind of "religious" approach to try to get consistency out of a group of people, but then take it much too far.
Codebases without automated reformatting - so everyone wastes time arguing about line spacing or camel-case names or whatever in their PRs.
The best code bases had these things in common:
1. Consistency: they had clear patterns that were followed. It doesn't matter what the pattern was; some were strictly object-oriented, some were purely functional, some used RESTful API, and others leveraged gRPC or GraphQL. The important thing was that it was all consistent across the codebase so even if you were looking at a part you'd never seen before you could still orient yourself and reason about it quickly.
2. Ownership: there was always a single individual person who was ultimately responsible for each section of code. That doesn't mean they wrote all the code themselves; they were the final arbiter on the rare occasion a truly contentious conflict arose. They also had the unilateral power to make changes within their section of the code if they felt it was needed. This was always a rarely exercised power but they could, if they had to, push a change through. There could be many such people spread out across the codebase but for each discrete part the number was always one.
3. Clear Boundaries: it was clear what each part of the codebase's purpose was, and the boundaries were rigidly enforced. Things were never tightly coupled across these boundaries. Business logic was always isolated from things like serialization/deserialization, each system was forced to maintain its own models of the world that it cared about.
The worst code bases had these things in common:
1. Lack of Eng Representation: Product controlling development schedules and insisting everything is high priority and needs to be done right now! Project managers who always wonder, "yes, but what if 9 women could make a baby in a month? Why don't we try adding more resources? Can we just try?" Business types who see software as a cost center, not a profit driver. This can also happen if your engineering leadership didn't come up through software engineering but rather QA or IT, or started out in academia, or is just plain an MBA type with no eng background.
2. "Time saving" Tech: We don't need a database schema, we can just go NoSQL and have JSON blobs, it will save so much time! We can share models between the front end and back end, it will save so much time! Don't write SQL, use this ORM, it will save so much time! Don't think about DevOps, just use this all in one hosting solution, it will save so much time! What about this low code/no code solution? It will save so much time!
3. Misaligned Incentives: This could be because they were contractors with no stake in the company or this could be because it was a large company and there was no realistic way they'd ever be fired. Either way there were no consequences for writing bad code.
On top of that, the read-heavy data went through several layers of cache that were very easy to understand (process, system, global - called L1, L2, L3, like a CPU's caches). The front-end had a "Konami code" that let you see the backend's rendering hierarchy while viewing it from the frontend, along with the caching state of each level of the hierarchy being rendered and all the CSS/JS handlers affecting the element, not unlike the current "inspect" view of most browsers.
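A minimal sketch of that read path (names and structure assumed, not the actual system): check the process-local cache, then the machine-level one, then the global one, and only then fall through to the source of truth.

```python
# Hypothetical layered read path: L1 (process), L2 (system), L3 (global), then DB.

def make_layered_get(l1, l2, l3, db):
    def get(key):
        for layer in (l1, l2, l3):
            if key in layer:
                return layer[key]        # served from the first layer that has it
        value = db[key]                  # miss everywhere: hit the source of truth
        for layer in (l1, l2, l3):
            layer[key] = value           # populate each layer on the way back out
        return value
    return get

get = make_layered_get({}, {}, {}, db={"page:home": "<html>…</html>"})
print(get("page:home"))   # first call: loaded from the DB and cached
print(get("page:home"))   # second call: served straight from the L1 process cache
```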
What made this all so good? A very strong foundation that was opinionated, but which was consistent. This was ~ 2012-2014 - they had Flask as an underlying framework, and much was built out on top of Python decorators.
Besides the day-to-day code - they had a bespoke packaging system, auto-scaler, and a number of other services that "just worked" - it was very different from any other startup I've worked for.
Where I work now I get to work in Haskell. And that’s been a pleasure. Recently rewrote over 1k lines. Deployed to production with one reviewer. No issues.
Just good test coverage and a great type system with no-nonsense code. Refactoring in Haskell is a heady drug.
If this was C++ or even Python there would have been much more intense reviews and probably some fail-forward patches to fix things we may have missed.
But I dunno. Maybe not. Maybe I’m just that good. /s
LOL. Obviously. Mine! :)