The code Claude produces looks almost exactly like the code I would write myself; it's like it's reading my mind. That's a game changer, because it means I can actually maintain what Claude writes.
With Claude Code there are no surprises: I can pretty much guess what its code will look like 90% to 95% of the time, but it writes it a lot faster than I could. This is an amazing innovation.
Gemini is quite impressive as well. Nano banana in particular is very useful for graphic design.
I haven't tried Gemini for coding yet, but TBH Claude Code does such a great job that if I could code any faster, I would get decision fatigue. I don't like rushing architecture or UX decisions; I like to sit on certain decisions for a day or two before starting implementation. Once you start in a particular direction it's hard to undo, and you may double down on the mistake due to the sunk cost fallacy. I try hard to avoid that.
> it's not just a website you go to like Google, it's a little spirit/ghost that "lives" on your computer
> it's not just about the image generation itself, it's about the joint capability coming from text generation
Three years ago this wouldn't have gotten any reaction from me, but now that "it's not just X, it's Y" sentence structure is ruined for me.
If you think every Electron app out there re-inventing application UX from scratch is bad, wait until LLMs are generating their own custom UX for every single action, for every user, on every device. What does Command-W do in this app? It's literally impossible to predict; try it and see!
I’m also sold on his take on "vibe coding" leading to ephemeral software; the idea of spinning up a custom, one-off tokenizer or app just to debug a single issue, and then deleting it, feels like a real shift.
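To make that concrete, the sort of one-off script I imagine being spun up might look like the sketch below: a naive regex splitter standing in for a real tokenizer, hacked together to eyeball one suspicious string and then deleted. Everything in it is invented for illustration.

```python
# throwaway_tokenize.py - one-off debugging helper, meant to be deleted afterwards.
# Naive regex splitter standing in for a real tokenizer, just to see how a
# suspicious string breaks apart.
import re
import sys

def rough_tokens(text: str) -> list[str]:
    # Keep runs of word characters and individual punctuation marks.
    return re.findall(r"\w+|[^\w\s]", text)

if __name__ == "__main__":
    sample = sys.argv[1] if len(sys.argv) > 1 else "user@example.com logged in at 09:41"
    tokens = rough_tokens(sample)
    print(f"{len(tokens)} tokens: {tokens}")
```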
Karpathy hints that one major capability unlock is UI generation, so instead of interacting through text, the AI can present different interfaces depending on the kind of problem. That seems like a severely underexplored problem domain. Who are the key figures innovating in this space so far?
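One way I picture it (purely a sketch, not anything I know to be shipping today): have the model return a small declarative UI spec instead of prose, and let a thin client render it. The JSON schema below is invented for illustration.

```python
# Sketch: the model returns a declarative UI spec instead of prose, and a thin
# client renders it. The JSON schema here is invented for illustration only.
import json

# Pretend this came back from the model for a "compare two CSV files" request.
ui_spec = json.loads("""
{
  "title": "Compare CSVs",
  "widgets": [
    {"type": "file_picker", "id": "left",  "label": "Baseline file"},
    {"type": "file_picker", "id": "right", "label": "Candidate file"},
    {"type": "button",      "id": "run",   "label": "Diff"}
  ]
}
""")

def render(spec: dict) -> None:
    # A real client would map widget types onto native controls; here we just
    # pretty-print the structure to show the idea.
    print(spec["title"])
    for widget in spec["widgets"]:
        print(f"  [{widget['type']}] {widget['label']} (id={widget['id']})")

render(ui_spec)
```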
In the most recent Demis interview, he suggests that one of the key problems that must be solved is online / continuous learning.
Aside from that, another major issue is probably reducing hallucinations and increasing reliability. Ideally you should be able to deploy an LLM on a problem domain and have it reach out to you to figure out what to do whenever it encounters an unexpected scenario, while for standard problems it should function reliably 100% of the time.
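Concretely, I imagine something shaped like the sketch below: act automatically on routine cases, and escalate anything the model isn't confident about to a human instead of letting it guess. The classify function and confidence threshold are made up for illustration, not any real API.

```python
# Hypothetical escalation wrapper: act automatically on routine cases, hand
# anything unexpected back to a human rather than guessing.
from dataclasses import dataclass

@dataclass
class ModelResult:
    answer: str
    confidence: float  # assume a verifier or self-consistency check produces this

def classify(ticket: str) -> ModelResult:
    # Stand-in for a real LLM call; a production version would hit an API and
    # derive confidence from a verifier model or repeated sampling.
    return ModelResult(answer="refund_approved", confidence=0.62)

def handle(ticket: str, threshold: float = 0.9) -> str:
    result = classify(ticket)
    if result.confidence < threshold:
        # Unexpected or ambiguous scenario: escalate instead of hallucinating.
        return f"ESCALATE to a human: low confidence ({result.confidence:.2f}) on {ticket!r}"
    return result.answer

print(handle("Customer says they were double charged"))
```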
The idea of jaggedness seems useful for advancing epistemology. If we could identify the domains with useful data that we fail to extract, we could fill those holes and eventually become a general intelligence ourselves. The task may be as hard as making a list of your own blind spots, but now we have an alien intelligence with an outside perspective. While we make AI less jagged, it might return the favor.
If we keep inventing different kinds of intelligence, the sum of the splats may eventually become well rounded.
What is he referring to here? Is nano banana not just an image-gen model? Is it because it's LLM-based rather than diffusion-based?
> LLMs are emerging as a new kind of intelligence, simultaneously a lot smarter than I expected and a lot dumber than I expected
Isn't this concerning? How can we know which one we get? In the realm of code it's easier to tell when mistakes are being made.
> regular people benefit a lot more from LLMs compared to professionals, corporations and governments
We thought this would happen with things like AppleScript, VB, and visual programming. But instead, AI is currently used as a smarter search engine. The issue is that this is also the area where it hallucinates the most. What do you think is the solution?
Whereas we just got incremental progress with gpt-5 instead, and it was very underwhelming. (Plus like five other issues at launch, but that's a separate story ;)
I'm not sure if o4-mini would have made a good default gpt though. (Most use is conversational and its language is very awkward.) So they could have just called it gpt-5 pro or something, and put it on the $20 tier. I don't know.
This would be a 100 kLOC legacy project written in C++, Python, and jQuery-era JavaScript circa 2010. The original devs have long since left. I would rather avoid C++ as much as possible.
I've been a GitHub Copilot (in VS Code) user since June 2021 and still use it heavily, but the "more powerful IntelliSense" approach is limiting on legacy projects.
Presumably I need to provide more context on larger projects.
I can get pretty far with just ChatGPT Plus and feeding it bits and pieces of the project, but that seems like using the wrong tool.
Codex seems better for building things, but I'm not sure about grokking existing ones.
Would Cursor be more suitable for just dumping in the whole project (all languages, basically 4 different subprojects) and then selectively choosing what to include in queries?
- benchmarks don't mean a lot for the frontier stuff, but they can be interesting within the same series of models (smaller vs. larger). reminds me of comparing clock speeds between CPUs.
- the app layer can fill the gaps to squeeze out the most for a use case, but there is still no one-size-fits-all solution.
- often the discourse here, or the perspective of people building, seems disconnected from the average user. a lot of the discussion in the post is irrelevant for the vast majority of users. e.g. as cool as a TUI can be, it is not an interface most users would gravitate towards.
- while not directly related, the other modalities are more exciting, and they come from applying the techniques for handling text to other media forms, or in combination with them.
Big media agencies that claim to use AI rely on strong creative teams who fine-tune prompts and spend weeks doing so. Even then, they don’t fully trust AI to slice long videos into shorter clips for social media.
Heavy administrative functions like HR or Finance still don’t get approval to expose any of their data to LLMs.
What I’m trying to say is that we are still in the early stages of LLM development, and as promising as this looks, it’s still far from delivering the real value that is often claimed.