I have found it is far better at understanding big, sprawling codebases - and, with prodding, determining the root causes of bugs in them - than it is at writing anything, even in simple codebases.
Recently I asked an AI to compare and contrast two implementations of the same API written in different languages to find differences, and it found some very subtle things and impressed me. It got a lot wrong, but that was because one of the implementations had lots of comments that it took at face value. I then wrote a rough spec of what the API should do, and it compared both implementations against the spec and found more problems. It was a learning experience for me in writing specs, too.
I repeated the exercise of comparing two implementations to track down a nasty one-line bug in an Objective-C -> Swift port. I wasn't familiar with the codebase, nor did I remember much about those languages, so it was a big boon, and I didn't have to track down the people who owned the code until I was fairly sure the bug had been found.
Also recently I asked an AI to compare two sets of parquet files, and it did sensible things like downloading just parts of them and inspecting the metadata, and it ended up recommending that I change some settings when authoring one of the sets to dramatically improve compression. It needed an Esc and some prodding at the halfway point, but it still got there. It was great to watch.
And finally, I asked an AI a detailed question about database internals and vectorising predicates. It started talking about 'filter masks' and then, in the middle of the explanation, inserted an image to illustrate - of 'filter masks' in the PPE sense. Hilariously wrong!
I tried using Context7 MCP with the SolidWorks docs, but the results were not satisfactory. I ended up crawling SW's HTML documentation, painstakingly translating it to markdown, and organizing it to optimize greppability [1]. I then created a Claude skill to instruct CC to consult the docs before writing code [2]. It is still stubborn and sometimes does not obey my instructions, but it did improve the quality of the code: Claude used to need 5 to 10 rounds of debugging before getting code to compile; now it gets there in 1 to 2 rounds.
[1] https://github.com/pedropaulovc/offline-solidworks-api-docs
[2] https://github.com/pedropaulovc/harmonic-analyzer/blob/main/...
The examples are at best loosely related to the points they're supposed to illustrate.
It's honestly so bad that I cynically suspect this post was created solely as a way to promote click3 in the first bullet, with 4 more bullets generated to make it a "whole" post.
> For example, the other day, it completely forgot about a database connection URL I had given it and started spitting someone else's database URL in the same session.
Something similar happened to me.
Our team managed multiple instances for our develop environment.
I was working with the instance named <product>-develop-2, and explicitly told Claude Code to use that one.

    $ aws ec2 describe-instances ...
    <product>-develop      # shared develop instance
    <product>-develop-2    # a development instance where a developer can do anything
    <product>-develop-3    # another development instance
    <product>-staging
    <product>-production
Claude used the correct instance for a while, and wrote multiple Python one-off scripts to operate on it.
But at some point, for no apparent reason, it switched the target to the shared one, <product>-develop.
I should have checked the code more carefully, but I didn't pay enough attention to the first few lines of the dozens of lines of code where all the config was written, because it always looked the same and I was mostly focused on the main function.

    import boto3
    import os
    ...
    AWS_REGION = xxx
    AWS_PROJECT = yyy
    EC2_INSTANCE = <product>-develop  # <- at some point this changed without any reason
    S3_BUCKET = zzz
    ...
    def main():  # <- all my attention is here
        # ~100 lines of code
As a result, it modified the shared instance and caused a bit of confusion for my team.
Luckily it wasn't a big issue.
But I was scared of what would have happened if it had targeted production, and now I pay the most attention to the config part rather than the main logic.

All this is awfully painful to manage with current frameworks and SDKs - somehow a weird mix of over-engineered stuff that still misses the actual point of making things traceable and easily changeable once they get complex (my unpopular personal opinion, sorry). So I built something out of my own need and started offering it (quite successfully so far) to family & friends to get a handle on it. Have a look: https://llm-flow-designer.com
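Going back to the develop-2 story: a cheap defense against that kind of silent target switch is to make each generated one-off script refuse to run when the configured instance drifts from the intended one. A minimal sketch (the instance names are the placeholders from the comment, not real ones):

```python
# Fail-fast guard for generated one-off scripts.
# EXPECTED_INSTANCE is the placeholder name from the story above.
EXPECTED_INSTANCE = "<product>-develop-2"

def check_target(configured: str) -> str:
    """Abort before touching anything if the config drifted to another instance."""
    if configured != EXPECTED_INSTANCE:
        raise RuntimeError(
            f"refusing to run against {configured!r}; "
            f"expected {EXPECTED_INSTANCE!r}"
        )
    return configured

# Each script's main() would call this first, e.g.:
#   instance = check_target(EC2_INSTANCE)
```

The point is that the check lives next to the config block the reviewer tends to skim past, so a silent edit there blows up immediately instead of modifying the shared instance.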
...I've been negative on LLM use recently. I've somewhat mentally decided an LLM is a Google search that tries to make you feel good (like you're collaborating with other people), and if you strip that away, you get essentially a (admittedly decent) Wikipedia search on a topic. The data correlation can give new insights, but I'm struggling to see how an LLM creates anything _new_. If the LLM is fed its own correlated data, it gets confused after a while (e.g. context poisoning or whatever).
So if I strip away the platitudes, isn't an LLM just, to most people, a Wikipedia search that gets confused after a while - and, to researchers, a research assistant that might lie to you (also because the context gets confused) after a while?