Their communication is exceptional, too. Eric Traut (of Pyright fame) is all over the issues and PRs.
> Since then, the Responses API has evolved to support a special /responses/compact endpoint that performs compaction more efficiently. It returns a list of items that can be used in place of the previous input to continue the conversation while freeing up the context window. This list includes a special type=compaction item with an opaque encrypted_content item that preserves the model's latent understanding of the original conversation. Now, Codex automatically uses this endpoint to compact the conversation when the auto_compact_limit is exceeded.
This helps preserve context over many turns, but it can also mean some context is lost between two related user turns.
A strategy that's helped me here is having the model write progress updates (along with general plans/specs/debug notes/etc.) to markdown files, which act as a sort of "snapshot" that carries across many context windows.
I use headless codex exec a lot, but I struggle with its built-in telemetry support, which is insufficient for debugging and optimization.
So I made codex-plus (https://github.com/aperoc/codex-plus) for myself; it provides a CLI entry point that mirrors the codex exec interface but is implemented on top of the TypeScript SDK (@openai/codex-sdk).
After each run it exports the full session log to a remote OpenTelemetry collector, which can then be debugged and optimized through codex-plus-log-viewer.
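Not how codex-plus is actually wired (the repo has the real thing), but roughly the shape of the idea: wrap each run in an OpenTelemetry span and ship it to an OTLP collector. The runCodex stub, the span attributes, and the collector URL below are placeholders of mine, not part of the actual tool.

```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { trace } from "@opentelemetry/api";

// Export spans to a remote OTLP collector (URL is a placeholder).
const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: "http://localhost:4318/v1/traces" }),
});
sdk.start();

const tracer = trace.getTracer("codex-plus-sketch");

// Hypothetical stand-in for a run through @openai/codex-sdk.
async function runCodex(prompt: string): Promise<string> {
  return `(codex output for: ${prompt})`;
}

async function runWithTelemetry(prompt: string): Promise<string> {
  // Record the run as a span so the collector gets timing plus attributes.
  return tracer.startActiveSpan("codex.run", async (span) => {
    try {
      span.setAttribute("codex.prompt", prompt);
      const output = await runCodex(prompt);
      span.setAttribute("codex.output_chars", output.length);
      return output;
    } finally {
      span.end();
    }
  });
}

async function main() {
  console.log(await runWithTelemetry("summarize the repo"));
  await sdk.shutdown(); // flush pending spans to the collector
}

main();
```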
However, I decided to try the Codex CLI after hearing they rebuilt it from the ground up in Rust (instead of JS; not implying Rust == better). Its performance is quite literally insane, and its UX is completely seamless. They even added small nice-to-haves like ctrl+left/right to skip your cursor to word boundaries.
If you haven't, I genuinely think you should give it a try; you'll be very surprised. I saw Theo (YC's Ping Labs) talk about how OpenAI shouldn't have wasted their time optimizing the CLI and should have made a better model instead, or something like that. After using it, I highly disagree.
Call the model. If it asks for a tool, run the tool and call again (with the new result appended). Otherwise, done.
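A minimal sketch of that loop, using the OpenAI Node SDK's chat-completions tool calling as a stand-in. The single read_file tool, the model name, and the runTool helper are placeholders of mine, not what Codex actually ships.

```typescript
import OpenAI from "openai";
import { readFile } from "node:fs/promises";

const client = new OpenAI();

// A single illustrative tool; a real agent's tool set is richer.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "read_file",
      description: "Read a file from the workspace",
      parameters: {
        type: "object",
        properties: { path: { type: "string" } },
        required: ["path"],
      },
    },
  },
];

// Hypothetical local tool runner.
async function runTool(name: string, args: string): Promise<string> {
  if (name === "read_file") return readFile(JSON.parse(args).path, "utf8");
  throw new Error(`unknown tool: ${name}`);
}

async function agentLoop(prompt: string): Promise<string | null> {
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
    { role: "user", content: prompt },
  ];

  while (true) {
    // Call the model.
    const completion = await client.chat.completions.create({
      model: "gpt-5.2-codex", // placeholder model name
      messages,
      tools,
    });
    const msg = completion.choices[0].message;
    messages.push(msg);

    // No tool calls: done.
    if (!msg.tool_calls?.length) return msg.content;

    // It asked for a tool: run it, append the result, and call again.
    for (const call of msg.tool_calls) {
      if (call.type !== "function") continue;
      messages.push({
        role: "tool",
        tool_call_id: call.id,
        content: await runTool(call.function.name, call.function.arguments),
      });
    }
  }
}
```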
Why wouldn't they just expose the files directly? Having the model ask for them as regular files sounds a bit odd.
The "Listen to article" media player at the top of the post -- was super quick to load on mobile but took two attempts and a page refresh to load on desktop.
If I want to listen as well as read the article ... the media player scrolls out of view along with the article title as we scroll down ..leaving us with no way to control (pause/play) the audio if needed.
There are no playback controls other than pause and speed selector. So we cannot seek or skip forward/backward if we miss a sentence. the time display on the media player is also minimal. Wish these were a more accessible standardized feature set available on demand and not limited by what the web designer of each site decides.
I asked "Claude on Chrome" extension to fix the media player to the top. It took 2 attempts to get it right. (It was using Haiku by default -- may be a larger model was needed for this task). I think there is scope to create a standard library for such client side tweaks to web pages -- sort of like greasemonkey user scripts but at a slightly higher level of abstraction with natural language prompts.
Or am I not understanding this right?
Claude Code is very effective. Opus is a solid model, and Claude very reliably solves problems; it's generally efficient and doesn't get stuck in weird loops or go off on insane tangents too often. You can be very, very efficient with Claude Code.
Gemini CLI is quite good. If you set `--model gemini-3-pro-preview` it is quite usable, but the Flash model is absolute trash. Overall gemini-3-pro-preview is 'smarter' than Opus, but the tooling here is not as good as Claude Code's, so it tends to get stuck in loops, think for 5 minutes, or do weird, extreme stuff. When Gemini is on point it is very, very good, but it is inconsistent and messes up so often that it's not as productive to use as Claude.
Codex is trash. It is slow, tends to fail to solve problems, gets stuck in weird places, and sometimes has to puzzle over something simple for 15 minutes. The Codex models are poor, forcing the 5.2 model is expensive, and even then the tooling is incredibly bad and tends to just fail a lot. I check in to see if Codex is any good from time to time, and every time it is laughably bad compared to the other two.