https://genai-showdown.specr.net/image-editing
Conclusions
- OpenAI has always had some of the strongest prompt understanding alongside the weakest image fidelity. This update goes some way towards addressing this weakness.
- It's leagues better than gpt-image-1 at making localized edits without altering the entire image's aesthetic, doubling the previous score from 4/12 to 8/12, and it's the only model that legitimately passed the Giraffe prompt.
- It's one of the most steerable models, with a 90% compliance rate.
Updates to GenAI Showdown
- Added outtakes sections to each model's detailed report in the Text-to-Image category, showcasing notable failures and unexpected behaviors.
- New models have been added including REVE and Flux.2 Dev (a new locally hostable model).
- Finally got around to implementing a weighted scoring mechanism which considers pass/fail, quality, and compliance for a more holistic model evaluation (click pass/fail icon to toggle between scoring methods).
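As a rough illustration of how a weighted score combining pass/fail, quality, and compliance could work (the weights and field names below are hypothetical, not the site's actual formula):

```python
def weighted_score(passed: bool, quality: float, compliance: float,
                   w_pass: float = 0.5, w_quality: float = 0.25,
                   w_compliance: float = 0.25) -> float:
    """Blend a binary pass/fail with quality and compliance ratings (both 0-1)
    into a single holistic score. Weights here are purely illustrative."""
    return (w_pass * (1.0 if passed else 0.0)
            + w_quality * quality
            + w_compliance * compliance)

# A passing edit with high but imperfect quality/compliance:
print(weighted_score(True, 0.9, 0.8))  # 0.925
```

The appeal of a scheme like this is that a model that technically passes but with low quality no longer ties with one that passes cleanly.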
If you just want to compare gpt-image-1, gpt-image-1.5, and NB Pro at the same time:
https://genai-showdown.specr.net/image-editing?models=o4,nbp...
One curious case demoed in the docs is the grid use case. Nano Banana Pro can also generate grids, but NBP's adherence to the prompt collapses above 4x4 (there's only a finite number of output tokens to correspond to each subimage), so I'm surprised OpenAI led with a 6x6 example, albeit with a test prompt that isn't very nuanced.
Question: with copyright and authorship effectively dead with respect to AI, how do I protect (at least) new content?
Anecdotal: I had a hobby of taking photos in a quite rare style, and lived in a place that gets photographed a lot. When I asked GPT to generate a picture of that area in that style, it returned a heavily modified but recognizable copy of a photo I had published years ago.
Noticed it captured a Mega Man Legends vibe...
https://x.com/AgentifySH/status/2001037332770615302
And here it generated a texture map from a 3D character:
https://x.com/AgentifySH/status/2001038516067672390/photo/1
However, I'm not sure these are true, accurate UV maps, as I don't have the 3D models themselves.
But I tried this in Nano Banana when it first came out, and it couldn't do it.
I like this benchmark because it's based on user votes, so overfitting is not as easy (after all, if users prefer your result, you've won).
- Gemini/Nano did a pretty average job, only applying some grey to some of the panels. I tried a few different examples and got similar output.
- GPT did a great job and themed the whole app and made it look great. I think I'd still need a designer to finesse some things though.
We're seeing AI get better at both creative tasks (images) and operational tasks (clicking through websites).
For anyone building AI agents: the security model is still the hard part. Prompt injection remains unsolved even with dedicated security LLMs.
- The latency is still too high: Nano Banana comes in under 10 seconds, while GPT Image 1.5 takes around 25 seconds.
- The quality is higher, but it's not a jump like the one from previous Google models to Nano Banana Pro. Nano Banana Pro is still at least as good or better, in my opinion.
POST "https://api.openai.com/v1/responses": 500 Internal Server Error {
"message": "An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_******************* in your message.",
"type": "server_error",
"param": null,
"code": "server_error"
}
POST "https://api.openai.com/v1/responses": 400 Bad Request {
"message": "Invalid value: 'blah'. Supported values are: 'gpt-image-1' and 'gpt-image-1-mini'.",
"type": "invalid_request_error",
"param": "tools[0].model",
"code": "invalid_value"
}

Some models are very strong at sharp details and localized edits, but they can break global lighting consistency — shadows, reflections, or overall scene illumination drift in subtle ways. GPT-Image seems to trade a bit of micro-detail for better global coherence, especially in lighting, which makes composites feel more believable even if they’re not pixel-perfect.
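For what it's worth, the `server_error` 500 shown earlier is the retryable kind, while the 400 `invalid_value` will never succeed on retry. A minimal retry sketch, written against a generic `send` callable rather than any specific client library (the callable and its `(status, body)` return shape are assumptions for illustration):

```python
import time

def post_with_retry(send, max_retries=3, backoff=1.0):
    """Call send() -> (status_code, body). Retry 5xx responses with
    exponential backoff; return 2xx/4xx immediately, since retrying a
    client error like invalid_value won't help."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status < 500:           # success or client error: done
            return status, body
        if attempt < max_retries:  # transient server error: back off
            time.sleep(backoff * (2 ** attempt))
    return status, body

# Example with a fake sender: one transient 500, then success.
responses = iter([(500, {"code": "server_error"}),
                  (200, {"output": "ok"})])
status, body = post_with_retry(lambda: next(responses), backoff=0)
print(status)  # 200
```

Including the `req_…` request ID from the error body when contacting support, as the message suggests, is worth wiring into whatever logging surrounds this.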
It’s hard to capture this in benchmarks, but for real-world editing workflows it ends up mattering more than I initially expected.
[1]: https://chatgpt.com/share/6941c96c-c160-8005-bea6-c809e58591...
They even linked to their Image Playground, where it's also not available.
I updated my local playground to support it and I'm just handling the 404 on the model gracefully
What angle is there for second tier models? Could the future for OpenAI be providing a cheaper option when you don't need the best? It seems like that segment would also be dominated by the leading models.
I would imagine the future shakes out as: first class hosted models, hosted uncensored models, local models.
So, let's simulate that future. Since no one trusts your talent in coding, art, or writing, you wouldn't care to do any of them. But the economy is built on products and services whose value is based on how much human talent and effort is required to produce them.
So the value of these services and products goes down as demand and trust go down. No one knows or cares who the good programmer on the team is, who the great thinker and writer is, and who is a modern Picasso.
So the motivation disappears for humans. There are no achievements to target, and no way to impress others with your talent. This should lead to a uniform workforce without much difference in talent: pretty much a robot army.
That's still dangerously bad for the use-case they're proposing. We don't need better looking but completely wrong infographics.
Still fails. Every photo of a man with half gray hair will have the other half black.
> In the style of a 1970s book sci-fi novel cover: A spacer walks towards the frame. In the background his spaceship crashed on an icy remote planet. The sky behind is dark and full of stars.
Nano banana pro via gemini did really well, although still way too detailed, and it then made a mess of different decades when I asked it to follow up: https://gemini.google.com/share/1902c11fd755
It's therefore really disappointing that GPT-Image 1.5 did this:
https://chatgpt.com/share/6941ed28-ed80-8000-b817-b174daa922...
Completely generic, not at all like a book cover; it completely ignored that part of the prompt while focusing on the other elements.
Did it get the other details right? Sure, maybe even better, but the important part it just ignored completely.
And it's doing even worse when I try to get it to correct the mistake. It's just repeating the same thing with more "weathering".
Impressive stuff, though, since you can give it a base image plus a prompt.
Not even one. And no one on the team said anything?
Come on Sam, do better.
Where is the image given along with the prompt? If I didn't miss it: it would have been nice to show the attached image.
I really hope everyone is starting to get disillusioned with OpenAI. They're just charging you more and more for what? Shitty images that are easy to sniff out?
In that case, I have a startup for you to invest in. It's a bridge-selling app.
Aren't we plagued enough by all the fake bullshit out there?
Ffs!
/rant
Sorry gotta be honest and blunt every one of those times...
Two women walking in single file
Although it tried very hard and had them staggered slightly