- 4o (can search the web, use Canvas, evaluate Python server-side, generate images, but has no chain of thought)
- o3-mini (web search, CoT, canvas, but no image generation)
- o1 (CoT, maybe better than o3, but no canvas or web search and also no images)
- Deep Research (very powerful, but I have only 10 attempts per month, so I end up using roughly zero)
- 4.5 (better in creative writing, and probably warmer sound thanks to being vinyl based and using analog tube amplifiers, but slower and request limited, and I don't even know which of the other features it supports)
- 4o "with scheduled tasks" (why on earth is that a model and not a tool that the other models can use!?)
Why do I have to figure all of this out myself?
              SWE-bench  Aider  Cost ($/M out)  Speed (tok/s)  Cutoff
Claude 3.7    70%        65%    $15             77             8/24
Gemini 2.5    64%        69%    $10             200            1/25
GPT-4.1       55%        53%    $8              169            6/24
DeepSeek R1   49%        57%    $2.2            22             7/24
Grok 3 Beta   ?          53%    $15             ?              11/24
I'm not sure this is really an apples-to-apples comparison, as it may involve different test scaffolding and levels of "thinking". Tokens-per-second numbers are from here: https://artificialanalysis.ai/models/gpt-4o-chatgpt-03-25/pr... and I'm assuming 4.1 is the speed of 4o, given that the "latency" graph in the article puts them at the same latency.
Is it available in Cursor yet?
- telling the model to be persistent (+20%)
- don't self-inject/parse tool calls (+2%)
- prompted planning (+4%)
- JSON BAD - use XML or arxiv 2406.13121 (GDM format)
- put instructions + user query at TOP -and- BOTTOM - bottom-only is VERY BAD
- no evidence that ALL CAPS or Bribes or Tips or threats to grandma work
source: https://cookbook.openai.com/examples/gpt4-1_prompting_guide#...
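Put together, a prompt skeleton following those tips might look something like this; the tag names and wording are my own, not from the cookbook:

```python
# Hypothetical prompt skeleton: persistence reminder, explicit planning,
# XML-style delimiters instead of JSON, and instructions repeated at both
# the top and the bottom of the context (bottom-only being the bad case).

def build_prompt(instructions: str, context_docs: list[str], user_query: str) -> str:
    docs = "\n".join(
        f"<doc id='{i}'>\n{doc}\n</doc>" for i, doc in enumerate(context_docs)
    )
    return f"""<instructions>
{instructions}
You are an agent: keep going until the task is fully resolved before yielding
back to the user. Plan extensively before each tool call, and reflect on the
outcome of previous calls.
</instructions>

<context>
{docs}
</context>

<user_query>
{user_query}
</user_query>

<instructions_reminder>
{instructions}
</instructions_reminder>"""
```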
My takeaways:
- This is the first model from OpenAI that feels relatively agentic to me (o3-mini sucks at tool use, 4o just sucks). It seems to be able to piece together several tools to reach the desired goal and follows a roughly coherent plan.
- There is still more work to do here. Despite OpenAI's cookbook[0] and some prompt engineering on my side, GPT-4.1 stops too quickly to ask questions, getting into a quite useless "convo mode". Its tool calls fail way too often as well, in my opinion.
- It's also able to handle significantly less complexity than Claude, resulting in some comical failures. Where Claude would create server endpoints, frontend components and routes and connect the two, GPT-4.1 creates simplistic UI that calls a mock API despite explicit instructions. When prompted to fix it, it went haywire and couldn't handle the multiple scopes involved in that test app.
- With that said, within all these parameters, it's much less unnerving than Claude and it sticks to the request, as long as the request is not too complex.
My conclusion: I like it, and I totally see where it shines: narrow, targeted work, alongside Claude 3.7 for creative work and Gemini 2.5 Pro for deep, complex tasks. GPT-4.1 does feel like a smaller model compared to these last two, but maybe I just need to use it for longer.
0: https://cookbook.openai.com/examples/gpt4-1_prompting_guide
> Qodo tested GPT‑4.1 head-to-head against Claude Sonnet 3.7 on generating high-quality code reviews from GitHub pull requests. Across 200 real-world pull requests with the same prompts and conditions, they found that GPT‑4.1 produced the better suggestion in 55% of cases. Notably, they found that GPT‑4.1 excels at both precision (knowing when not to make suggestions) and comprehensiveness (providing thorough analysis when warranted).
1. To win consumer growth, they have continued to benefit from hyper-viral moments; lately that was image generation in 4o, which was likely technically possible long before it launched.
2. For enterprise workloads and large API use, they seem to have focused less lately, but the pricing of 4.1 is clearly an answer to Gemini, which has been winning on ultra-high volume and consistency.
3. For full frontier benchmarks, they pushed out 4.5 to stay SOTA and attract the best researchers.
4. On top of all that, they had to, and did, quickly answer the reasoning promise and the DeepSeek threat with faster and cheaper o-series models.
They are still winning many of these battles but history highlights how hard multi front warfare is, at least for teams of humans.
I think it did very well - it's clearly good at instruction following.
Total token cost: 11,758 input, 2,743 output = 4.546 cents.
Same experiment run with GPT-4.1 mini: https://gist.github.com/simonw/325e6e5e63d449cc5394e92b8f2a3... (0.8802 cents)
And GPT-4.1 nano: https://gist.github.com/simonw/1d19f034edf285a788245b7b08734... (0.2018 cents)
I found from my experience with Gemini models that after ~200k tokens the quality drops and the model basically doesn't keep track of things. But I don't have any numbers or a systematic study of this behavior.
I think all providers who announce increased max token limit should address that. Because I don't think it is useful to just say that max allowed tokens are 1M when you basically cannot use anything near that in practice.
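Something like the following needle-in-a-haystack probe is the kind of systematic check I'd want providers to publish alongside a context-length claim. It's a minimal sketch using the standard chat completions API; the filler text, needle, and context sizes are made up for illustration:

```python
# Hide one fact in progressively longer filler contexts and check whether
# the model can still retrieve it as the prompt approaches the 1M window.
from openai import OpenAI

client = OpenAI()
NEEDLE = "The maintenance password for server X7 is 'calico-stapler-42'."
FILLER = "The sky was a uniform grey and nothing of note happened. "

for target_chars in (50_000, 500_000, 2_000_000):  # very roughly 12k to 500k tokens
    haystack = (FILLER * (target_chars // len(FILLER) + 1))[:target_chars]
    midpoint = len(haystack) // 2
    document = haystack[:midpoint] + NEEDLE + haystack[midpoint:]

    resp = client.chat.completions.create(
        model="gpt-4.1",
        messages=[
            {"role": "system", "content": "Answer using only the provided document."},
            {"role": "user", "content": document + "\n\nWhat is the maintenance password for server X7?"},
        ],
    )
    answer = resp.choices[0].message.content or ""
    print(target_chars, "calico-stapler-42" in answer)
```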
I probably spend $100 a month on AI coding, and it's great at small, straightforward tasks.
Drop it into a larger codebase and it'll get confused due to context limits, even if the same tool built that codebase in the first place.
Then again, the way things are rapidly improving I suspect I can wait 6 months and they'll have a model that can do what I want.
As opposed to Gemini 2.5 Pro having cutoff of Jan 2025.
Honestly this feels underwhelming and surprising. Especially if you're coding with frameworks with breaking changes, this can hurt you.
I don't understand why the comparison in the announcement talks so much about comparing with 4o's coding abilities to 4.1. Wouldn't the relevant comparison be to o3-mini-high?
4.1 costs a lot more than o3-mini-high, so this seems like a pertinent thing for them to have addressed here. Maybe I am misunderstanding the relationship between the models?
It seems like OpenAI keeps changing its plans. Deprecating GPT-4.5 less than 2 months after introducing it also seems unlikely to be the original plan. Changing plans isn't necessarily a bad thing, but I wonder why.
Did they not expect this model to turn out as well as it did?
[1] https://x.com/sama/status/1889755723078443244
[2] https://github.com/openai/openai-cookbook/blob/6a47d53c967a0...
• GPT-4.1-mini: balances performance, speed & cost
• GPT-4.1-nano: prioritizes throughput & low cost with streamlined capabilities
All share a 1 million-token context window (vs 128k on 4o and 200k on o1/o3-mini), excelling in instruction following, tool calls & coding.
Benchmarks vs prior models:
• AIME ’24: 48.1% vs 13.1% (~3.7× gain)
• MMLU: 90.2% vs 85.7% (+4.5 pp)
• Video‑MME: 72.0% vs 65.3% (+6.7 pp)
• SWE‑bench Verified: 54.6% vs 33.2% (+21.4 pp)
{"error":
{"message":"Quasar and Optimus were stealth models, and
revealed on April 14th as early testing versions of GPT 4.1.
Check it out: https://openrouter.ai/openai/gpt-4.1","code":404}
For me, it was jaw dropping. Perhaps he didn't mean it the way it sounded, but seemed like a major shift to me.
Getting better at code is something you can verify automatically, same for diff formats and custom response formats. Instruction following is also either automatically verifiable, or can be verified via LLM as a judge.
I strongly suspect that this model is a GPT-4.5 (or GPT-5???) distill, with the traditional pretrain -> SFT -> RLHF pipeline augmented with an RLVR stage, as described in Lambert et al[1], and a bunch of boring technical infrastructure improvements sprinkled on top.
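To make that concrete, here's a toy sketch of a verifiable reward for code patches; the harness, scoring weights, and use of git apply/pytest are my own illustration, not anything OpenAI has described:

```python
# Toy "verifiable reward" for an RLVR-style stage: score a model's patch by
# whether it applies cleanly and whether the project's tests pass afterwards.
import shutil
import subprocess
import tempfile

def reward(repo_dir: str, patch: str) -> float:
    workdir = tempfile.mkdtemp()
    shutil.copytree(repo_dir, workdir, dirs_exist_ok=True)

    # 1) Diff-format check: does the patch even apply?
    apply = subprocess.run(
        ["git", "apply", "-"],
        input=patch, text=True, cwd=workdir, capture_output=True,
    )
    if apply.returncode != 0:
        return 0.0

    # 2) Functional check: do the unit tests pass?
    tests = subprocess.run(
        ["python", "-m", "pytest", "-q"], cwd=workdir, capture_output=True,
    )
    # Partial credit for a well-formed patch, full credit for green tests.
    return 1.0 if tests.returncode == 0 else 0.3
```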
>Note that GPT‑4.1 will only be available via the API. In ChatGPT, many of the improvements in instruction following, coding, and intelligence have been gradually incorporated into the latest version
If anyone here doesn't know, OpenAI does offer the ChatGPT model version in the API as chatgpt-4o-latest, but it's a poor fit for businesses because it's continuously updated, so you can't rely on it staying stable. That's part of why OpenAI made GPT-4.1.
> You're eligible for free daily usage on traffic shared with OpenAI through April 30, 2025.
> Up to 1 million tokens per day across gpt-4.5-preview, gpt-4.1, gpt-4o and o1
> Up to 10 million tokens per day across gpt-4.1-mini, gpt-4.1-nano, gpt-4o-mini, o1-mini and o3-mini
> Usage beyond these limits, as well as usage for other models, will be billed at standard rates. Some limitations apply.
I just found this option in https://platform.openai.com/settings/organization/data-contr...
Is this just something I haven't noticed before? Or is this new?
The graphs presented don't even show a clear winner across all categories. The one with the biggest "number", GPT-4.5, isn't even the best in most categories; it's actually like 3rd in a lot of them.
This is quite confusing as a user.
Otherwise big fan of OAI products thus far. I keep paying $20/mo, they keep improving across the board.
> GPT‑4.5 Preview will be turned off in three months, on July 14, 2025
Not all systems upgrade every few months. A major question is when we reach step-improvements in performance warranting a re-eval, redesign of prompts, etc.
There's a small bleeding edge, and a much larger number of followers.
Tool use ability feels better than gemini-2.5-pro-exp [2], which sometimes struggles with JSON schema understanding (a sketch of the kind of tool schema I mean follows the links below).
Llama 4 has surprising agentic capabilities, better than both of them [3], but isn't as intelligent as the others.
[1] https://github.com/rusiaaman/chat.md/blob/main/samples/4.1/t...
[2] https://github.com/rusiaaman/chat.md/blob/main/samples/gemin...
[3] https://github.com/rusiaaman/chat.md/blob/main/samples/llama...
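For reference, this is roughly the shape of tool definition involved; the weather tool is made up, in the standard OpenAI-style function-calling format:

```python
# Example tool definition in the OpenAI-style function-calling format.
# The model must produce arguments that validate against the JSON Schema
# under "parameters" -- this is where weaker models tend to slip up.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for illustration
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
                "additionalProperties": False,
            },
        },
    }
]
```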
It would be incredible to be able to feed an entire codebase into a model and say "add this feature" or "we're having a bug where X is happening, tell me why", but then you are limited by the output token length
As others have pointed out too, the more tokens you use, the less accuracy you get and the more it gets confused, I've noticed this too
We are a ways away yet from being able to input an entire codebase, and have it give you back an updated version of that codebase.
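A quick way to sanity-check whether a codebase even fits in the advertised window is to count tokens locally. This sketch assumes GPT-4.1 uses the same o200k_base tokenizer family as 4o (tiktoken may not know the 4.1 model names yet) and a hypothetical my_project directory:

```python
# Rough estimate of how many tokens a repo would consume as context.
import pathlib
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # assumed encoding for 4o/4.1-era models
total = 0
for path in pathlib.Path("my_project").rglob("*.py"):  # hypothetical repo path
    total += len(enc.encode(path.read_text(errors="ignore")))

print(f"~{total:,} tokens of Python source")
# Remember to leave headroom for instructions and the model's output.
print("Fits in 1M window" if total < 1_000_000 else "Too big for a single request")
```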
- It's basically GPT4o level on average.
- More optimized for coding, but slightly inferior in other areas.
It seems to be a better model than 4o for coding tasks, but I'm not sure if it will replace the current leaders -- Gemini 2.5 Pro, o3-mini / o1, Claude 3.7/3.5.
Sam acknowledged this a few months ago, but with another release not really bringing any clarity, this is getting ridiculous now.
https://platform.openai.com/docs/models/gpt-4.1
As far as I can tell there's no way to discover the details of a model via the API right now.
Given the announced adoption of MCP and MCP's ability to perform model selection for Sampling based on a ranking for speed and intelligence, it would be great to have a model discovery endpoint that came with all the details on that page.
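The existing models list endpoint only returns minimal metadata, which illustrates the gap; with the current OpenAI Python SDK:

```python
# The existing /v1/models endpoint: you get IDs and ownership, but no
# context window, pricing, or capability details.
from openai import OpenAI

client = OpenAI()
for model in client.models.list():
    print(model.id, model.created, model.owned_by)
    # No fields here for max context, max output tokens, modalities, or price --
    # exactly the metadata an MCP-style model selector would want.
```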
Lies, damn lies and statistics ;-)
why would they deprecate when it's the better model? too expensive?
> We will also begin deprecating GPT‑4.5 Preview in the API, as GPT‑4.1 offers improved or similar performance on many key capabilities at much lower cost and latency. GPT‑4.5 Preview will be turned off in three months
Here's something I just don't understand: how can GPT-4.5 be worse than 4.1? Or is the only bad thing OpenAI's naming ability?
First it was that the models stopped putting in effort and felt lazy: tell one to do something and it will tell you to do it yourself. Now it's the opposite, and the models go ham changing everything they see; instead of changing one line, SOTA models would rather rewrite the whole project and still not fix the issue.
Two years back I totally thought these models were amazing. I would always test out the newest models and get hyped up about them. For every problem I had, I thought that if I just prompted it differently I could get it to solve it. Often I spent hours prompting, starting new chats, adding more context. Now I realize it's kind of useless, and it's better to just accept the models where they are rather than try to make them a one-stop shop or stretch their capabilities.
I think this release I won't even test it out; I'm not interested anymore. I'll probably just continue using DeepSeek free and Gemini free. I canceled my OpenAI subscription about 6 months ago, and canceled Claude after the 3.7 disappointment.
Wait, wouldn’t this be a decent test for reasoning ?
Every patch changes things, and there’s massive complexity with the various interactions between items, uniques, runes, and more.
https://platform.openai.com/docs/models/gpt-4.1
https://platform.openai.com/docs/models/gpt-4.1-mini
https://platform.openai.com/docs/models/gpt-4.1-nano
But the price is what matters.
- Broad Knowledge: 25.1
- Coder: Larger Problems: 25.1
- Coder: Line focused: 25.1
The lack of availability in ChatGPT is disappointing, and they're playing on ambiguity here. They are framing this as if it were unnecessary to release 4.1 on ChatGPT, since 4o is apparently great, while simultaneously showing how much better 4.1 is relative to GPT-4o.
One wager is that the inference cost is significantly higher for 4.1 than for 4o, and that they expect most ChatGPT users not to notice a marginal difference in output quality. API users, however, will notice. Alternatively, 4o might have been aggressively tuned to be conversational while 4.1 is more "neutral"? I wonder.
@sama: underrated tweet
Source: https://x.com/stevenheidel/status/1911833398588719274
4.1 is 26.6% better at coding than 4.5. Got it. Also…see the em dash
Prices below are per 1M tokens.
gpt-4.1
- Input: $2.00
- Cached Input: $0.50
- Output: $8.00
gpt-4.1-mini
- Input: $0.40
- Cached Input: $0.10
- Output: $1.60
gpt-4.1-nano
- Input: $0.10
- Cached Input: $0.025
- Output: $0.40
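As a worked example of what those rates mean in practice (it matches the 4.546-cent figure quoted earlier in the thread), assuming no cached input:

```python
# Cost of a single request at the published per-1M-token rates.
PRICES = {  # USD per 1M tokens (input, output)
    "gpt-4.1": (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "gpt-4.1-nano": (0.10, 0.40),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# The example from earlier in the thread: 11,758 input + 2,743 output tokens.
print(f"{cost_usd('gpt-4.1', 11_758, 2_743) * 100:.3f} cents")  # ~4.546 cents
```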
I don't understand the constant complaining about naming conventions. The number system differentiates the models based on capability; any other method would not do that. After ten models with random names like "gemini" or "nebula" you would have no idea which is which. It's a low-IQ take. You don't name new versions of software as if they were completely different software.
Also, yesterday, using v0, I replicated a full Next.js UI copying a major SaaS player. No backend integration, but the design and UX were stunning, and better than I could do if I tried. I have 15 years of backend experience at FAANG. Software will get automated, and it already is; people just haven't figured it out yet.
Gemini was drastically cheaper for image/video analysis, I'll have to see how 4.1 mini and nano compare.
All the solutions are already available on the internet on which various models are trained, albeit in various ratios.
Any variance could likely be due to the mix of the data.
GPT-4.1, GPT-4.1 mini, GPT-4.1 nano
I'll start with:
800 bn MoE (probably 120 bn activated), 200 bn MoE (33 bn activated), and 7 bn parameters for nano.
*Then fix all your prompts over the next two weeks.
> Qodo tested GPT‑4.1 head-to-head against other leading models [...] they found that GPT‑4.1 produced the better suggestion in 55% of cases
The linked blog post goes 404: https://www.qodo.ai/blog/benchmarked-gpt-4-1/
- Coding accuracy improved dramatically
- Handles 1M-token context reliably
- Much stronger instruction following
Well, that didn't last long.