by jumploops
5 subcomments
- Interesting: the new model's prompt (~10KB) is about half the size of the previous one (~23KB)[0][1].
SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors (per their internal refactor benchmark: 33.9% -> 51.3%).
As someone who recently used Codex CLI (`gpt-5-high`) to do a relatively large refactor (multiple internal libs to dedicated packages), I kept running into bugs introduced when the model would delete a file and then rewrite it (missing crucial details). My approach would have been to just copy the file over and then make package-specific changes, so maybe better tool calling is at play here.
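Concretely, the copy-first approach would look something like this (paths made up for illustration); moving the file wholesale first means nothing can silently vanish in a rewrite:

    # move (or copy) the file intact before touching its contents
    mkdir -p packages/mylib/src
    git mv internal/mylib/index.ts packages/mylib/src/index.ts
    git commit -m "refactor: move mylib into its own package"
    # then make the package-specific edits as a separate, reviewable commit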
Additionally, they claim the new model is more steerable (both with AGENTS.md and generally). In my experience, Codex CLI w/gpt-5 is already a lot more steerable than Claude Code, but any improvements are welcome!
[0]https://github.com/openai/codex/blob/main/codex-rs/core/gpt_...
[1]https://github.com/openai/codex/blob/main/codex-rs/core/prom...
by preaching5271
3 subcomments
- I've been a hardcore claude-4-sonnet + Cursor fan for a long time, but in the last 2 months my usage went through the roof. I started with the basic Cursor subscription, then upgraded to Pro, until I hit usage limits again. Then I started using my own Claude API key, but I was still paying ~$70 / 5 days, which is not that sustainable for me.
But since grok-code-fast-1 landed, I've been using it daily with Cursor and it's fantastic: fast and cheap (free so far). I've also been using GPT-5 lately through the official Codex VSCode extension, and it blows my mind. Last night I used gpt-5-medium to help me heavily refactor a react-native app, improving its structure and overall performance, something that would've taken me at least 2 days. Now I'm testing out gpt-5-medium-codex; I asked it to restructure the entire app routing, and it makes a lot of tool calls, understands, executes commands, it's very organized.
Overall my stack from now on is Cursor + grok-code-fast-1 for daily use, and Codex/GPT when I need the brainz. Worth noting that I abused gpt-5-medium all day long yesterday and never hit any kind of limit (I just used my ChatGPT Plus account), for which I thank the OpenAI team.
by robotswantdata
4 subcomments
- Codex CLI in the IDE just works; very impressed with the quality. If you tried it a while back and didn't like it, try it again via the VSCode extension; generous usage is included with Plus.
Ditched my Claude Code Max sub for the ChatGPT Pro $200 plan. So much faster, and I haven't hit any limits yet.
by twalichiewicz
1 subcomment
- It's been interesting reading this thread and seeing that others have also switched to using Codex over Claude Code. I kept running into a huge issue with Claude Code creating mock implementations and general fakery when it was overwhelmed. I spent so much time tuning my input prompt just to keep it from making things worse that I eventually switched.
Granted, it's not an apples-to-apples comparison since Codex has the advantage of working in a fully scaffolded codebase where it only has to paint by numbers, but my overall experience has been significantly better since switching.
- My observation over the past 2 weeks is that Claude Code is getting dramatically worse, with super low usage quotas, while OpenAI Codex is getting great and has a very generous usage quota in comparison.
For people that have not tried it in say ~1 month, give Codex CLI a try.
by epolanski
1 subcomment
- Question, how do I get the equivalent of Claude's "normal mode" in Codex CLI?
It is super annoying that it either vibe codes, just editing and using tools, or it has a plan mode, but there's no in-between where it asks me whether it's fine to do A or B.
I don't understand why it lacks such a capability; why in the world would I want to choose between having to copy-paste the edits or auto-accepting them by default...
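For what it's worth, the Rust-based Codex CLI does seem to ship an approval setting between those two extremes; the sketch below is from memory, so the flag and key names may be off - check `codex --help`:

    # ask before applying edits/running commands, instead of read-only or full-auto
    codex --ask-for-approval on-request
    # or persist it in ~/.codex/config.toml (key name assumed):
    #   approval_policy = "on-request"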
by stopachka
1 subcomment
- Very impressive. I've been working on a shared background presence animation, and have been testing out Claude and Codex. (By shared presence, I mean imagine a page's background changing based on where everyone's cursor is)
Both were struggling yesterday, with Claude being a bit ahead. Their biggest problems came with being "creative" (their solutions were pretty "stock"), and they had trouble making the simulation.
Tried the same problem on Codex today. The design it came up with still felt a bit lackluster, but it did _a lot_ better on the simulation.
- I literally tried out Codex for the first time this weekend, and the results were ... weird. It'll be interesting to see if it does things differently. (It was a super simple prompt, standing up a Rails app in Docker Compose with a home page and Devise; it hard-coded each file's contents inside bootstrap.sh instead of actually creating the files to begin with.)
by supermatt
1 subcomment
- I thought I would give this AI-assisted coding another go, so I subscribed to ChatGPT to try out this new Codex, but it just seems soooooo slooooow.
I just don't see how this can be considered productive; I am waiting 20 minutes staring at it "thinking" while it does trivial tasks on a virtually bare repo. I guess for async agents it's not such a big deal if they are slow as molasses, as you can run dozens of them, but you need a structured codebase for that, and I am already hours in and haven't even gotten a skeleton.
I have read through all the docs, watched the video. It would be so much quicker just to write this code myself. What am I doing wrong? Is it just super slow because they are over capacity or is this just the current state of the art?
- I bought ChatGPT last month and I think OpenAI is doing things right now, mostly in the experience: for example, it has a better voice mode than Claude's, and I like their new model names better than the confusing ones they used to have; it simplified the whole thing. It's also better as a general assistant; by comparison, Claude is not very good for non-code things. And OpenAI keeps releasing tools and seems more reliable in their tooling.
- Only a 1.7% upgrade on SWE-bench compared to GPT-5, but 33.9% vs. 51.3% on their internal code refactoring benchmark. This seems like an Opus 4.1-like upgrade, which is nice to see, and means they're serious about Codex.
by simianwords
0 subcomments
- The code review thing might be my favorite UX for AI-based development. It largely stays out of your way and provides good comments.
I'm imagining it being able to navigate the codebase and modify tests - like adding new cases, or breaking the tests by changing a few lines of code. That would actually verify whether the tests were making real assertions and being useful.
Thorough reviewing like this probably benefits me the most - more than AI-assisted development.
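That is essentially mutation testing. A crude sketch of the idea, with a made-up file and mutation (the sed expression is illustrative only):

    # flip one line of production code, rerun the suite, and expect a failure
    sed -i.bak 's/return total/return total + 1/' src/calc.py
    pytest -q && echo "suite still green: these tests assert nothing useful"
    mv src/calc.py.bak src/calc.py   # restore the original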
- I’ve been using Claude Code ($20/month) for about two weeks now, and with one of the token usage monitors I can handle most of what I need. I also have the $20/month ChatGPT plan. I know I could try Codex CLI, but I’ve been hesitant since I’ve seen people suddenly hit the limit and get locked out for a week. The problem is that there’s no way to check usage. So I’m wondering if this update improves token usage management, or if there’s now a way to actually see our usage so we don’t end up tripping the limit without any warning.
- Does the "caching containers for Codex Cloud" mean I have some chance of being able to reuse build artifacts between tasks? My Rust project takes around 20 minutes to set up from scratch in a new Codex environment, which seems extremely expensive.
- First time I was able to install Codex with no NPM errors. Gonna give it a shot, seems a lot slower than Claude Code but I'm only using basic Pro with ChatGPT vs. Max 100.
by ryandetzel
0 subcomments
- I don't know. Something that takes CC about a minute is now on minute 15 with this new model...
by georgeofjungle7
0 subcomments
- Cool upgrade, but I wonder how this plays with existing tools like Copilot and Cursor. Everyone's moving toward "AI pair programmer in every IDE," and it feels like the competition is less about raw model quality now and more about integration + workflow lock-in. Codex getting everywhere (terminal, GitHub, phone) sounds powerful.
- Oh, since when is Codex CLI included as part of a ChatGPT plan? I'm 99% sure that wasn't the case before. Time to try it for real.
- It's interesting to see this quote:
`for the bottom 10% of user turns sorted by model-generated tokens (including hidden reasoning and final output), GPT‑5-Codex uses 93.7% fewer tokens than GPT‑5`
It sounds like it handles simple tasks with far fewer wasted tokens (93.7% fewer means a turn that used 1,000 tokens now uses about 63), which is impressive to me. Today's coding agents tend to pretend they're working hard by generating lots of unnecessary code. I hope it's true.
- I've considered swapping to Claude since the last update made talking to GPT absolutely terrible. I heavily make use of being able to put in PRs on mobile by working with Codex, and if it weren't for this I'd probably have switched. Excited to see the updates.
by ianbutler
1 subcomment
- I just want the codex models in the API, I won’t touch them until then.
And before someone says it, I do happen to have my own Codex-like environment, complete with development containers, browser, GitHub integration, etc.
And I'm happy to pay a mint for access to the best models.
by codybontecou
1 subcomment
- Do they have a GitHub Action to run in GitHub, similar to Claude's?
- One major improvement I have seen today, even before I saw the announcement, is that the model is far more reliable at using the Task Completion interface to communicate what stage of the prompt is being implemented. Previously this was only shown sparingly (especially in the first few weeks), and when it was, it didn't properly tick off tasks, simply jumping from the first to completion at the end. Now this works very reliably, and I do like this improvement, but if I didn't know better, I would have suspected it was merely the result of a system prompt change; considering that GPT-5's adherence is very solid in my experience, this should have been fixable without a tuned model. Nevertheless, I like this improvement (arguably a fix of a previously broken feature).
Beyond that, purely anecdotally and subjectively, this model does seem to do extensive refactors with semi-precise step-by-step guidance a bit faster (comparing GPT-5 Thinking (Medium) and GPT-5 Codex (Medium)), though adherence to prompts seems roughly equivalent between the two as of now. In any case, I really feel they should consider a more nuanced naming convention.
New Claude Sonnet 3.7 was a bit of a blunder, but overall, Anthropic has their marketing in tight order compared to OpenAI. Claude Code, Sonnet, Opus: those are great, clear, differentiating names.
Codex meanwhile can mean anything from a service for code reviews with Github integration to a series of dedicated models going back to 2021.
Also, while I do enjoy the ChatGPT app integration for quick on-the-go work (made easier with a Clicks keyboard), I am getting more annoyed by the drift between Codex in VSCode, the Codex website, and Codex in the ChatGPT mobile app. The website has a very helpful Ask button, which can also be used to launch subtasks via prompts written by the model, but no such button is present in the VSCode plugin, despite subtasks being something you can launch from the VSCode plugin if you have used Ask via the website first. Meanwhile, the iOS app has no Ask button and no subtask support, and neither the app nor the VSCode plugin shows remote work done beyond abbreviations, whereas the web page does show everything. Then there are the differences between local and remote via VSCode and the CLI, ...
To people not using Codex, this must sound insane and barely understandable, but it seems that is the outcome of spreading yourself across so many surfaces: CLI, dedicated models, VSCode plugin, mobile app, code review, web page. Some, like Anthropic, only work on one or two; others, like Augment, three; but no one else does that much, for better and worse.
I like using Codex, but it is a mess with such massive potential; it needs a dedicated team lead whose only focus is to untangle this mess before adding more features. Alternatively, maybe interview a few power users on their actual day-to-day experience, those who aren't just in one part of Codex but are using multiple or all of them. There is a lot of insight to be gained from someone who has an overview of the entire product stack, I think. Sending out a questionnaire to top users would be a good start; I'd definitely answer.
- Is this model just super slow for anyone else?
by simianwords
1 subcomment
- OpenAI is starting its new era of specialized models. I guess they gave up on the monolithic model approach.
by arthurcolle
0 subcomments
- Agent-1
- Do they demand biometrics to use it?
by king_magic
1 subcomment
- Doesn't seem ready for prime-time. I'll be impressed when it actually installs.
    npm ERR! code 1
    npm ERR! path /usr/local/lib/node_modules/@openai/codex/node_modules/@vscode/ripgrep
    npm ERR! command failed
    npm ERR! command sh -c node ./lib/postinstall.js
    npm ERR! /usr/local/lib/node_modules/@openai/codex/node_modules/@vscode/ripgrep/lib/download.js:199
    npm ERR! zipFile?.close();
- Has anyone hit any programming usage limits with the ChatGPT 5 Pro account?
by incomingpain
1 subcomment
- Still waiting on Codex CLI to support LM Studio.
- [flagged]
- Take off the guardrails and let humanity thrive.
It is inevitable.