> Agentic search avoids those failure modes. There's no embedding pipeline or centralized index to maintain as thousands of engineers commit new code. Each developer's instance works from the live codebase.
The frame of "the way a software engineer would" and the conclusion seem at odds. I'd love to be schooled otherwise?
I use autocomplete/LSPs all the time and they're useful. That's an index? Why wouldn't Claude be able to use one? Also a "software engineer" remembers the codebase - that's definitely a RAG. I have a lot of muscle memory to find the file I need through an auto-completed CMD+P.
It doesn't need to particularly be real-time across thousands of engineers -- just the branch I'm on.
It's rare that I'd navigate a codebase by first-principles traversal. That would usually be in a new codebase, and in those cases it's definitely not what I'd call an optimal experience.
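For what it's worth, the kind of lightweight, per-branch index the parent is describing doesn't require an embedding pipeline at all. A minimal sketch, using Python's stdlib `ast` module to build a symbol-to-location map over a working tree (a toy illustration of the idea, not anything Claude Code actually ships):

```python
import ast
from pathlib import Path

def build_symbol_index(root: str) -> dict[str, list[tuple[str, int]]]:
    """Map every function/class name under `root` to (file, line) pairs."""
    index: dict[str, list[tuple[str, int]]] = {}
    for path in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text())
        except SyntaxError:
            continue  # skip files that don't parse on this branch
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                index.setdefault(node.name, []).append((str(path), node.lineno))
    return index

# Usage - the CMD+P-style jump the parent describes:
#   index = build_symbol_index("src/")
#   index.get("parse_config")  # jump straight to the definition(s)
```

Rebuilding this on checkout keeps it consistent with "just the branch I'm on" - no centralized infrastructure involved.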
Claude's initial approach was really poor. One has to wonder how many times Claude Code's output has to be modified/reviewed for improvement, or whether it is possible at all to produce good code with it.
Edited: Generalization: Claude can fix a localized, identifiable poor decision (e.g., "only reading first 40 lines") because the fault is discrete and traceable to one piece of code.
But real software quality problems often arise from many small, individually reasonable decisions that collectively produce bad outcomes. No single one is obviously "the fault." In that scenario, a tool that generates low-quality building blocks piecemeal may never converge on good code, because each piece seems fine in isolation.
Simple - it eats up to 35% of the five-hour usage limit on the first prompt, even on small projects, and then there's a 5-minute timeout for you to respond quickly, or the caches go bust and you'll pay another 12-15% on the next prompt.
I tried defining CLAUDE.md (or AGENTS.md), skills, and plugins, but I'm not getting the effectiveness others claim. With the LSP plugin, for example, CC doesn't use the LSP's symbol renaming and instead edits files one by one, slowly; or it doesn't invoke a skill even when I explicitly ask it to remember to invoke it whenever the prompt contains a specific cue.
Am I using it wrong? Is there a robust example harness I can copy?
What a strange comment for them to make. Why wouldn't I expect CC to work well with those languages? What languages would I associate it with? Python and JavaScript?
- runs the failing test | grep "x|failing" | tail -10
- runs the test again to get the why-it's-failing message | tail -10
- runs the test again because tail -10 cut off the message

Every time. What developer does things like this?!
I have a skill telling it not to do that: save the output of whatever test you run to a file, then read from the file using whatever commands you want. It ignores the skill.
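For reference, the run-once-read-many pattern the skill asks for is tiny to express. A sketch (the test command and log path here are stand-ins so the snippet runs anywhere; in practice it would be `pytest` or whatever the project uses):

```python
import os
import subprocess
import sys
import tempfile

LOG = os.path.join(tempfile.gettempdir(), "test.log")  # stand-in log path

# Stand-in for the real test command, so this sketch is self-contained:
cmd = [sys.executable, "-c", "print('1 passed'); print('FAILED test_x: boom')"]

# Run the suite ONCE, capturing stdout and stderr to a file...
with open(LOG, "w") as f:
    subprocess.run(cmd, stdout=f, stderr=subprocess.STDOUT)

# ...then read from the file as many times as needed - no reruns, no truncation.
failures = [line for line in open(LOG) if "FAILED" in line]
print(failures)
```

One test run, arbitrarily many greps/tails against the saved output.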
Same for debugging. Something is failing; instead of debugging the given issue and looking at the results to see why it's failing, it reads the code and tries to deduce the cause. The first trace it finds that looks suspicious? "THAT'S IT, I FOUND IT. But let me reconsider." And after 15 minutes it produces a summary that is wrong. Put a debug point, look at it, then make your decisions. It has a skill for debugging that is phrased to do exactly that! No. I've never seen a human do things like this either.
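The "put a debug point, look at it" step is one line in most runtimes. A Python sketch with a toy failure (my invented example, not from the thread):

```python
def lookup(cache, key):
    return cache[key]  # the failing call we want to understand

caught = None
try:
    lookup({}, "user:42")  # hypothetical failing input
except KeyError as err:
    caught = err
    # The "debug point": inspect live frames at the failure site instead of
    # re-reading source and guessing. Commented out so this runs non-interactively:
    # import pdb; pdb.post_mortem()
print(type(caught).__name__)
```

Dropping into `pdb.post_mortem()` shows the actual state that caused the failure, which is exactly the evidence that code-reading-only deduction skips.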
It's maddening. It's as if, puts on tinfoil hat, it's designed to waste your tokens, while eventually accomplishing its task.
So, if I've read this post correctly, that means that CC is navigating my codebase today by sending lots of it up to a model, and building an understanding. Is that correct? Did I misunderstand it?
I kinda suspected there was more local inference going on somehow -- partly because the iteration times are fairly fast.
You need a code dependency graph: https://github.com/roboticforce/remembrallmcp Ask "what breaks if I change this?"
It saves 98% of token usage and 95% of tool calls.
It runs as an MCP server and supports 8 languages.
It just works - you need to try it.
The article really does not align with current sentiment. Everyone with a choice has mostly moved on to Codex (of course, in this world all it takes is a model or harness update to turn things around).
CC is great at a lot of things, but it repeatedly misses reading crucial parts of the codebase, hallucinates about work that was done, and has a bunch of other issues.
I mean: If there was something you could add to the prompt to consistently increase performance why isn't it in the system prompt already?
If it's all about clarifying a couple of local idiosyncrasies, shouldn't it be able to quickly get them by looking through the repo?
Does anyone have an example of a CLAUDE.md that really makes a difference for them?
In general, this article would have profited massively from examples of good applications of those patterns.
Are there any much more detailed walkthroughs of how it works and how it decides the tools to use and the grep to use etc and what the conversations actually look like?
In the UI you see just enough to know it’s doing something but you don’t really see the jumps it’s making offscreen.
The important distinction: CLAUDE.md will not shape how the model understands your architecture. Rather, it will prevent certain kinds of regressions from happening. "Never create a user without calling the workspace provision step" is the right constraint. "This is how our entire system works" is not - the model learns that from the codebase.
The mistake is writing constraints based on an architecture constructed with slop. The sequence is important here.
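To make that concrete, a hypothetical CLAUDE.md fragment in that spirit (the project details are invented for illustration):

```markdown
# CLAUDE.md

## Constraints (things the codebase can't teach you)
- Never create a user without calling the workspace provision step first.
- All money amounts are integer cents; never introduce floats.
- Run `make migrate` for schema changes; never invoke the migration tool directly.

<!-- Deliberately absent: any "how our entire system works" narrative. -->
```

Each line is a guardrail against a known regression, not architecture lore.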
A post like this should be providing people with some reassurance about Claude's ability to understand code at a large scale. It's mostly fluff.
Edit: so I did some googling to dig around for thoughts on LSP performance and integration. The author of Bun has a tweet saying that they are a big drag on performance for no real gain, and virtually all of the replies agree. Anyone else have any experience/thoughts?
Meanwhile we are still waiting for these statements to come true:
https://eu.36kr.com/en/p/3648851352018565
https://www.businessinsider.com/anthropic-ceo-ai-90-percent-...
https://www.reddit.com/r/Anthropic/comments/1nemhxb/futurism...
https://medium.com/@coders.stop/dario-amodei-said-90-of-code...
https://www.youtube.com/shorts/0j1HqEEDThc
Accountability, anyone?