- Nano-Banana can produce some astonishing results. I maintain a comparison website for state-of-the-art image models with a strong focus on prompt adherence across a wide variety of text-to-image prompts.
I recently finished putting together an Editing Comparison Showdown counterpart, where the focus is still adherence but the test is the ability to make localized edits to existing images using pure text prompts. It currently compares six multimodal models, including Nano-Banana, Kontext Max, and Qwen 20b.
https://genai-showdown.specr.net/image-editing
Gemini 2.5 Flash leads with a score of 7 out of 12, but Kontext comes in at 5 out of 12, which is especially surprising considering you can run its Dev variant locally.
- Amazing model. The only limit is your imagination, and it's just $0.04/image.
Since the page doesn't mention it, this is the Google Gemini Image Generation model (minimal API sketch below):
https://ai.google.dev/gemini-api/docs/image-generation
Good collection of examples. Really weird, though, to choose an inappropriate-for-work one as the second example.
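For reference, a minimal image-generation call to that API via the google-genai Python SDK looks roughly like this (the pattern follows the linked docs; the prompt itself is just an illustration):

```python
# pip install google-genai pillow
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents="A photorealistic close-up of a tiny banana on a circuit board",
)

# Responses can interleave text and image parts; save any image parts.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("out.png")
```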
- This is the first time I really don't understand how people are getting good results. On https://aistudio.google.com with Nano Banana selected (gemini-2.5-flash-image-preview), I get garbage results. I'll upload a character reference photo and a scene and ask Gemini to place the character in the scene (roughly the workflow sketched below). What it then does is simply cut and paste the character into the scene, even if the two are completely different in style, colors, etc.
I get far better results using ChatGPT, for example. Of course, the character seldom looks anything like the reference, but it looks better than what I could do in Paint in two minutes.
Am I using the wrong model, somehow??
by voidUpdate
4 subcomments
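For what it's worth, the workflow described above maps onto an API call like this sketch (again assuming the google-genai SDK; file names and prompt wording are placeholders). Spelling out "match the lighting and style" sometimes helps avoid the cut-and-paste look, though evidently not reliably:

```python
from io import BytesIO

from google import genai
from PIL import Image

client = genai.Client()

# Placeholder inputs: a character reference and a target scene.
character = Image.open("character_reference.png")
scene = Image.open("scene.png")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=[
        character,
        scene,
        "Place the character from the first image into the scene from the"
        " second image. Match the scene's lighting, color grading, and art"
        " style; do not paste the character in unchanged.",
    ],
)

# Save the composited result.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save("composite.png")
```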
- Well, it's good to see they're also showcasing examples where the model really fails:
- The second one in case 2 doesn't look anything like the reference map
- The face in case 5 changes completely despite the model being instructed to not do that
- Case 8 ignores the provided pose reference
- Case 9 changes the car positions
- Case 16 labels the tricuspid in the wrong place and I have no idea what a "mittic" is
- Case 27 shows the usual "models can't do text" though I'm not holding that against it too much
- Same with case 29, and the text that is readable doesn't relate to the parts of the image it references
- Case 33 just generated a generic football ground
- Case 37 has nonsensical labels ("Define Jawline" attached to the eye)
- Case 58 has the usual "models don't understand what a wireframe is", but again I'm not holding that against it too much
Super nice to see how honest they are about the capabilities!
- I recently released a Python package for easily generating images with Nano Banana: https://github.com/minimaxir/gemimg
Through that testing, one prompt-engineering trend was consistent but controversial: both a) LLM-style prompt engineering with Markdown-formatted lists and b) old-school image-quality syntactic sugar such as "award-winning" and "DSLR camera" are extremely effective with Gemini 2.5 Flash Image, likely because its text encoder and larger training dataset can now more accurately discriminate which specific traits are present in an award-winning image and which aren't. I've tried generations both with and without those tricks, and the tricks definitely have an impact. Google's developer documentation encourages the latter.
However, taking advantage of the 32k context window (compared to 512 tokens for most other models) can make things interesting; see the sketch below. It's possible to render HTML as an image (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...), and providing highly nuanced JSON can allow for consistent generations (https://github.com/minimaxir/gemimg/blob/main/docs/notebooks...).
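As a toy illustration of those two tricks together (the subject and keywords below are invented, not taken from gemimg's docs), the Markdown-list style plus quality sugar looks like:

```python
from google import genai

client = genai.Client()

# LLM-style Markdown spec plus old-school quality "syntactic sugar".
prompt = """\
Create a single photographic image of a living room.

## Subject
- A mid-century modern living room at golden hour
- A tabby cat asleep on a leather lounge chair

## Style
- Award-winning interior photography, DSLR camera, 35mm lens
- Soft natural window light, shallow depth of field
"""

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",
    contents=prompt,
)
# Extract image parts as in the earlier sketch.
```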
- Unfortunately NSFW in parts. It might be insensitive to circulate the top URL in most US tech workplaces. For those venues, maybe you want to pick out isolated examples instead.
(Example: Half of Case 1 is an anime/manga maid-uniform woman lifting up the front of her skirt, and leaning back, to expose the crotch of her underwear. That's the most questionable one I noticed, and it's one of the first things a visitor to the top URL sees.)
- Personally, I'm underwhelmed by this model. I feel like these examples are cherry-picked. Here are some fails I've had:
- Given a face shot in direct sunlight with severe shadows, it would not remove the shadows
- Given an old black and white photo, it would not render the image in vibrant color as if taken with a modern DSLR camera. It will colorize the photo, but only with washed out, tinted colors
- When trying to reproduce the 3x3 grid of hair styles, it repeatedly created a 2x3 grid. When it finally made a 3x3 grid, one of the nine models was Black instead of Caucasian.
- It is unable to integrate real images into fabricated imagery. For example, when given an image of a tutu and asked to create an image of a dolphin flying over clouds wearing the tutu, the result looked like a crude Photoshop snip-and-paste job.
- This is amazing. Not that long ago, even getting a model to reliably output the same character multiple times was a real challenge. Now we’re seeing this level of composition and consistency. The pace of progress in generative models is wild.
Huge thanks to the author (and the many contributors) for gathering so many examples; seeing them is incredibly useful for understanding what the tool makes possible.
by mitthrowaway2
10 subcomments
- I've come to realize that I liked believing that there was something special about the human mental ability to use our mind's eye and visual imagination to picture something, such as how we would look with a different hairstyle. It's uncomfortable seeing that skill reproduced by machinery at the same level as my own imagination, or even better. It makes me feel like my ability to use my imagination is no more remarkable than my ability to hold a coat off the ground like a coat hook would.
- Nano banana is great. Been using it for creating coloring books based on photos for my son and friends' kids: https://github.com/dbish/bespoke-books-ai-example
Does a pretty good job (most of the time) of sticking to the black and white coloring book style while still bringing in enough detail to recognize the original photo in the output.
by foobarbecue
1 subcomment
- Man, I hate this. It all looks so good, and it's all so incorrect. Take the heart diagram, for example. Lots of words that sort of sound cardiac but aren't ("ventricar," "mittic"), and some labels that ARE cardiac, but are in the wrong place. The scenes generated from topo maps look convincing, but they don't actually follow the topography correctly. I'm not looking forward to when search and rescue people start using this and plan routes that go off cliffs. Most people I know are too gullible to understand that this is a bullshit generator. This stuff is lethal and I'm very worried it will accelerate the rate at which the populace is getting stupider.
by rimmontrieu
3 subcomments
- Impressive examples, but with GenAI it always comes down to the fact that you have to cherry-pick the best result after many failed attempts. Right now, it feels like they're pushing the narrative that ExpectedOutput = LLM(Prompt, Input) when it's actually ExpectedOutput = LLM(Prompt, Input) * Takes, where Takes can vary from 1 to 100 or more.
- I have two friends who are excellent professional graphic artists and I hesitate to send them this.
by twaldecker
2 subcomments
- One thing it couldn't do is a transparent background. The model just paints the checkerboard pattern into the background pixels; it's not real alpha-channel transparency, and you can even see artifacts in the pattern. (A post-processing workaround is sketched below.)
by mustaphah
1 subcomment
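Since the model only paints the transparency pattern into the pixels, one workaround (an assumption, not a documented feature) is to prompt for a solid chroma-key background and strip it afterwards with Pillow; edges will still fringe, so a real pipeline would also despill and feather the matte:

```python
from PIL import Image

# Assumes the image was generated on a roughly uniform pure-green background.
img = Image.open("generated.png").convert("RGBA")

# Zero the alpha of any strongly green pixel; keep everything else opaque.
keyed = [
    (r, g, b, 0) if (g > 200 and r < 100 and b < 100) else (r, g, b, a)
    for (r, g, b, a) in img.getdata()
]
img.putdata(keyed)
img.save("transparent.png")
```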
- In a side-by-side comparison with GPT-4o [1], they are pretty much on par.
[1] https://github.com/JimmyLv/awesome-nano-banana
by throwaway2037
9 subcomments
- Does anyone else cringe when they see so many examples of sexualised young women? Literally, Case 1/B has a woman lifting up her skirt to reveal her underwear. For an otherwise very impressive model, this kind of immature content spoils the PR. Sheesh. I guess that confirms it: I am an old grumpy man! I count 26 examples with young women and 9 with men. The only thing missing was "Lenna": https://en.wikipedia.org/wiki/Lenna
- While I think most of the examples are incredible...
...the technical graphics (especially the text) are generally wrong. Case 16 is an annotated heart, and the anatomy is nonsensical. Case 28, with the tallest buildings, has decent images but the wrong names, locations, and years.
- I'm furnishing a new apartment, and Nano Banana has been super useful for placing furniture I want to purchase in rooms so I can judge whether things will work for us. Take a picture of the room, feed Nano Banana that picture plus the product picture, and ask it to place the item in the right location. It can even imagine the room at night, or add lamps with the lights on. Super useful!
by aussiegreenie
0 subcomments
- I am not very good with graphics. Yesterday, I used Nano Banana to create an image for the front cover of a report. It took about 5 minutes. Normally, I would have spent at least an hour and still would not have gotten as good an image.
The Gemini models save me about an hour a day.
- I wish open-source models would go this route of quality. Instead, every single release since and including Flux Dev has had some of the worst AI look I've seen so far. Sure, these models might produce fewer mangled bodies, but in terms of actual aesthetics they lag behind even SD1.5 while needing >10x the parameters.
- This is gonna be a golden age for creative prototyping and memes, and absolutely horrible for information quality and trustworthiness of content.
- In case 5: Photos of Yourself in Different Eras
The output just looks like a clearly different person. It's difficult to productionize things that are this inconsistent.
- So it seems like image-generation/deepfake proliferation is pretty much inevitable. I imagine we can't trust any image anymore (e.g., for identity-verification purposes) unless it is taken in person or otherwise notarized somehow. Is there a way (NFT-ish?) to "tag"/sign an image to say it was taken by an actual camera?
- After looking at Cases 4, 9, 23, 33, and 61, I think it might be well suited to taking in several wide-angle pictures or photospheres from inside a residence and outputting a corresponding floor-plan schematic.
If anyone has examples, guides, or anything that would save me from pouring unnecessary funds into API credits just to figure out how to feed it for this kind of task, I'd really appreciate it.
- In the AI image generation scene, is there anything solid yet in the way of generating vector illustrations for apps?
by smusamashah
0 subcomments
- It can print code output as an image as well. I don't think it will work for complex logic, though. https://x.com/smusamashah/status/1961081534661685392
- Cute. I love the “Not backed by [Y]” badge in one of the source images, sweet irony of being on HN’s front page.
Has anybody ever connected a 3D printer to such a machine’s output? Some of the action figures should definitely be 3D-printed.
- I haven't tried it, but I've seen really good results. Is there some innovation going on under the hood that we don't know about? Is the technology the same as that of similar models? I can't find technical info on the internet.
- Does the Nano Banana naming imply the existence of Regular Banana or even Mega Banana?
- GitHub should be ashamed that the back button no longer works and jumps to the top of the page.
They should have learned what to do in SPA 101.
- The ability to keep the output pretty faithful to an input image is a clear sign of its improved abilities.
- These actually look awesome; I wonder whether it can create nice isometric graphics for games.
- I'm pretty sure these are cherry-picked out of many generation attempts. I tried a few basic things, and it flat-out refused to do many of them, like turning a cartoon illustration into a real-world photographic portrait: it kept wanting to create a Pixar-style image. Then, when I used an AI-generated portrait as an example, it refused with an error saying it wouldn't modify real-world people...
I then tried to generate some multi-angle product shots from a single photo of an object, and it just refused to do the whole left/right/front/back thing, instead producing things like a left view, a front view, another left view, and a weird half-back/half-side combination.
Very frustrating.
- Some examples are mind-blowing. I wonder whether it can generate web/app designs.
by darepublic
0 subcomments
- Am I wrong to think they have Google Photos to thank for this?
by ChrisArchitect
2 subcomments
- sigh
So many little details are off even when the instructions are clear and the details are right there. The Brad Pitt jeans? The result isn't the same style and is missing obvious details that should just carry over.
In another one, the prompt ended by asking for output in a 16:9 ratio; the image isn't in that ratio.
The results are visually something, but they still need so much review. Can't trust the model; can't trust people lazily using it. Someone mentioned something about 'net negative'.
- Computer graphics playing in my head and I like it! I don't support Technicolor parfaits and those snobby little petit fours that sit there uneaten, and my position on that is common knowledge to everyone in Oceania.
- Bytedance's Seedream seems to be giving it a run for its money:
https://www.youtube.com/watch?v=EdEn3aWHpO8
- Has AI generation of chest hair finally been solved? I think this is the first time I’ve seen a remotely realistic looking result.
by HeartStrings
0 subcomments
- Nano Banana is actually a world model. It generates an entire world, then shows you the frame you need.
- Wow. RIP Midjourney.
by moralestapia
3 subcomments
- Wow, just amazing.
Is this model open? Open weights at least? Can you use it commercially?
- Best post ever for me. Thanks for sharing.
- The #1 most frustrating part of image models, for me, has always been their inability to keep the relevant details. Ask to change a hairstyle and you'd get a subtly different person...
...guess that's solved now... overnight. Mind-blowing.
- While these are incredibly good, it's sad to think about the unfathomable amount of abuse, spam, disinformation, manipulation, and who knows what other negatives these advancements are going to cause. It was one thing when you could spot an AI image, but from now on it will be increasingly futile to even try.
Almost all "human" interaction online will be subject to doubt soon enough.
Hard to be cheerful when the technology will be a net negative overall, even if it benefits some.
by jacobjjacob
2 subcomments
- Does the first example really need to be some softcore weeb p*rn?
by flysonic10
0 subcomments
- I added some of these examples to my Nanna Banana image generator: https://nannabanana.ai