by bayesianbot
9 subcomments
- I've been extremely impressed (and actually had quite a good time) with GPT-5 and Codex so far. It seems to handle long context well, does a great job researching the code, never leaves things half-done (with long tasks it may leave some steps for later, but it never does 50% of a step and then randomly mocks a function like Gemini used to), and gives me good suggestions if I'm trying to do something I shouldn't. And the Codex CLI also seems to be getting constant, meaningful updates.
- This should probably be merged with the other GPT-5-Codex thread at https://news.ycombinator.com/item?id=45252301 since nobody in this thread is talking about the system card addendum.
- My problem with all of the codex/gpt-based offerings is _still_ that they think for way too long. After using Claude 4 models through Cursor Max/ampcode, I feel much more effective given their speed. Ironically, Claude Code feels just as slow as codex/gpt (even with my company routing through AWS Bedrock). It only makes me feel more strongly that the consumer modes have perverse incentives.
by jumploops
2 subcomments
- Interesting, the new model uses a different prompt in Codex CLI that's ~half the size (10KB vs. 23KB) of the previous prompt[0][1].
SWE-bench performance is similar to normal gpt-5, so it seems the main delta with `gpt-5-codex` is on code refactors (via internal refactor benchmark 33.9% -> 51.3%).
As someone who recently used Codex CLI (`gpt-5-high`) to do a relatively large refactor (multiple internal libs to dedicated packages), I kept running into bugs introduced when the model would delete a file and then rewrite it (missing crucial details). My approach would have been to just copy the file over and then make package-specific changes, so maybe better tool calling is at play here.
Additionally, they claim the new model is more steerable (both with AGENTS.md and generally).
In my experience, Codex CLI w/gpt-5 is already a lot more steerable than Claude Code, but any improvements are welcome!
[0]https://github.com/openai/codex/blob/main/codex-rs/core/gpt_...
[1]https://github.com/openai/codex/blob/main/codex-rs/core/prom...
(comment reposted from other thread)
- I've had great results with Codex, though I found ChatGPT 5 was giving much better results than the existing model, so I ended up using that directly instead. Very excited to have the model upgraded in Codex itself.
The main issues with Codex now seem to be the very poor stability (it seems to be down almost 50% of the time) and lack of custom containers. Hoping those get solved soon, particularly the stability.
I also wonder where the price will end up, it currently seems unsustainably cheap.
by 8cvor6j844qw_d6
2 subcomments
- Can anyone share their thoughts on Claude Code vs Codex?
I've just started out trying out Claude Code and am not sure how Codex compares on React projects.
From my initial usage, it seems Claude Code's planning mode is superior to its normal mode, and giving it an overall direction to proceed rather than just stating a desired feature seems to produce better results. It also does better if a large task is split into very small sub-tasks.
- It would be nice if this model were good enough to update their TypeScript SDK (+ agents library) to use, or at least support, zod v4 - they still use v3.
Had to spend quite a long time figuring out a dependency error...
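For what it's worth, one workaround for this kind of dual-major clash (a minimal sketch, assuming npm; the `zod-v4` alias name and version ranges are hypothetical) is to keep plain `zod` on v3 for the SDK and pull in v4 under an npm alias for your own code:

    {
      "dependencies": {
        "zod": "^3.25.0",
        "zod-v4": "npm:zod@^4.0.0"
      }
    }

Your app code can then `import { z } from "zod-v4"` while the SDK keeps resolving `zod` to v3. It doesn't make the SDK accept v4 schemas, but it at least sidesteps the install-time conflict.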
- Can someone explain what this all means? Has Codex just been updated to use GPT-5? Or is this just extra info?
by withinboredom
3 subcomments
- Codex always appears to use spaces, even when the project uses tabs (e.g., a Go file). It's so annoying.
- Codex with GPT-5-High is extremely good. Like many, I was a bit "meh" about the GPT-5 release; however, once I started using it with Codex, it became clear there was a substantial improvement in a capability I wasn't really paying attention to, which is tool calling. Or more specifically, when to call a tool. Ask GPT-5-High a question about your codebase and watch the things it looks for, and the things it searches for (if you use --search). It has very good taste in how to navigate and solve a problem.
- Direct link to the pdf
https://cdn.openai.com/pdf/97cc5669-7a25-4e63-b15f-5fd5bdc4d...
by WhitneyLand
1 subcomment
- Apparently today is the first release with MCP support.
Updates (v0.36)
https://github.com/openai/codex/releases
by anshumankmr
0 subcomments
- So is this a new model or just a different checkpoint for coding?
by hereme888
2 subcomments
- Codex just ate up my remaining turns for the day for a clearly defined patch that should have taken just a few actions. Anyone else experienced that?
- Does OpenAI demand biometrics to use GPT-5-Codex?
by sergiotapia
2 subcomments
- I signed up for OpenAI, verified my identity, added my credit card, and bought $10 of credits.
But when I installed Codex and tried to make a simple code bugfix, I got rate limited nearly immediately. As in, after 3 "steps" the agent took.
Are you meant to only use Codex with their $200 "unlimited" plans? Thanks!
- Is this available to use now in Codex? Should I see a new /model?
by darkteflon
1 subcomment
- Does Codex have token-hiding (cf Anthropic’s “subagents”)?
I was tempted to give Codex a try but a colleague was stung by their pricing. Apparently if you go over your Pro plan allocation, they just quietly and automatically start billing you per-token?
- is this model just acting super slow with you guys too?
- I think it would be cool to see *nix "emulation" integrated into coding AIs. I don't think it's necessary to run these agents inside of a container, as most people are doing right now. That's a lot of overhead.
by tschellenbach
2 subcomments
- Is it already supported in Cursor? I don't see it just yet.
by bionhoward
1 subcomment
- Meh, what's the point if it's got no privacy? Which companies want to let OpenAI read your codebase? Cursor keeps winning because of privacy mode, IMHO; there is no level of capability that outweighs privacy mode.