
Hacker News

GPT-5.1 for Developers

108 points by tedsanders

by dweekly

A few hours of playing around and I'm suitably impressed.
Claude 4.5 Sonnet definitely struggles with Swift 6.2 Concurrency semantics and has several times gotten itself stuck rather badly. Additionally Claude Code has developed a number of bugs, including rapidly re-scrolling the terminal buffer, pegging local CPU to 100%, and consuming vast amounts of RAM. Codex CLI was woefully behind a few months ago and, despite overly conservative out-of-the-box sandbox settings, has quite caught up to Claude Code. (Gemini CLI is an altogether embarrassing experience, but Google did just put a solid PM behind it and 3.0 Pro should be out this month if we're lucky.)
Codex with 5.1 high thoughtfully pawed through the documentation and source code and, with a little help pulling down parts of the Swift Book, correctly resolved the issue.
I remember that getting the thread manager right was one of the harder parts of my operating systems course as a computer science undergrad; testing threaded programs has always been a challenge. It's a strange circle-of-life moment to realize that what was hard for undergrads also serves as a benchmark for coding agents!

by __jl__

The prompt caching change is awesome for any agent. Claude is far behind here, with higher costs for caching and manual caching checkpoints. It certainly depends on your application, but prompt caching is also ignored in a lot of cost comparisons.

by Tankenstein

This is the first time since GPT 4.1 that I think I can upgrade our main agent model. Any noticeable amount of reasoning has been too slow for us, since the model is having a real-time conversation with the user. GPT-5 with "minimal" reasoning performs terribly; it's significantly dumber than GPT 4.1 in long, multi-turn conversations with tools.
This time, I just dropped it in and at first glance it seems to work well. I'll probably upgrade over the weekend if I see a boost in performance somewhere after tuning the prompts.

by sunaookami

Man, these names are so confusing, and now reasoning_effort "minimal" has been renamed to "none"? And the error message says only "medium" is supported?? Also, the docs don't say whether gpt-5.1-chat-latest is included in the "free" offer (when prompt sharing is turned on). The popup says gpt-5.1 is included but not gpt-5.1-chat, even though gpt-5-chat-latest is included. Why is it even called "chat" when its official name is "Instant"? And what even IS the difference between gpt-5.1 and gpt-5.1-chat if both support reasoning_effort??

by miohtama

> On coding, we’ve worked closely with startups like Cursor, Cognition, Augment Code, Factory, and Warp to improve GPT‑5.1’s coding personality, steerability, and code quality.
Why no GitHub?

by gedy

The "apply_patch" addition is nice, as I have been struggling to get any AI API to return correct diffs.

by kevinkatzke

This got only a single comment and 34 points in 3 hours. Crazy how the dynamics have changed around model releases in just a single year.