by minimaxir
3 subcomments
- Whisk itself (https://labs.google/fx/tools/whisk) was released a few months ago under the radar as a demo for Imagen 3 and it's actually fun to play with and surprisingly robust given its particular implementation.
It uses a prompt transmutation trick (convert the uploaded images into a textual description; can verify by viewing the description of the uploaded image) and the strength of Imagen 3's actually modern text encoder to be able to adhere to those long transmuted descriptions for Subject/Scene/Style.
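A rough sketch of the transmutation idea described above, for illustration only. Everything here is hypothetical: `describe_image` is a stub standing in for whatever captioning model Whisk actually uses, and the way the three descriptions are joined into one prompt is an assumption, not Whisk's real format.

```python
from dataclasses import dataclass


@dataclass
class Descriptions:
    """Text descriptions derived from the three uploaded images."""
    subject: str
    scene: str
    style: str


def describe_image(image_path: str, role: str) -> str:
    # Hypothetical stand-in for an image-captioning model call.
    # A real system would return a long, detailed description here.
    return f"<detailed {role} description of {image_path}>"


def build_prompt(d: Descriptions) -> str:
    # Assumed layout: one labeled section per slot. The text-to-image
    # model only ever sees this text, never the original pixels.
    return f"Subject: {d.subject}\nScene: {d.scene}\nStyle: {d.style}"


def transmute(subject_img: str, scene_img: str, style_img: str) -> str:
    d = Descriptions(
        subject=describe_image(subject_img, "subject"),
        scene=describe_image(scene_img, "scene"),
        style=describe_image(style_img, "style"),
    )
    return build_prompt(d)


prompt = transmute("cat.png", "beach.png", "watercolor.png")
print(prompt)
```

Because only the text survives, this would also explain why the output can drift from the source picture: any detail the description misses is lost.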
by delichon
12 subcomments
- I think I would buy "yes" shares in a Polymarket event that predicts a motion picture created by a single person grossing more than $100M by 2027.
- This is amazing. I wouldn't have thought that something as computationally expensive as generating 8-second videos would be available outside of a paid API anytime soon.
- I am not really technical in this domain, but why is everything text-to-X?
Wouldn't it be possible to draw a rough sketch of a terrain, drop a picture of the character, draw a 3D spline for the walk path, while having a traditional keyframe style editor, and give certain points some keyframe actions (like character A turns on his flashlight at frame 60) - in short, something that allows minute creative control just like current tools do?
- I burned through $48 in GCP credit making 12x 8-second videos in Veo2. Beware...
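For scale, the figures in that comment work out as follows. This is a quick back-of-the-envelope sketch; it assumes the $48 went only to those 12 clips and that each clip was exactly 8 seconds, neither of which the comment confirms.

```python
# Figures taken from the comment above; the even split is an assumption.
total_cost_usd = 48.0
num_videos = 12
seconds_per_video = 8

cost_per_video = total_cost_usd / num_videos
cost_per_second = cost_per_video / seconds_per_video

print(f"${cost_per_video:.2f} per video")    # $4.00 per video
print(f"${cost_per_second:.2f} per second")  # $0.50 per second
```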
- Brave to make ads with the Ghibli style. Would have thought that's burned by now.
by ninininino
1 subcomment
- As usual with Gen AI the curated demo itself displays misunderstanding and failure to meet the prompt. In the "Glacial Cavern" demo, the "candy figures" are not within the ice walls but are in the foreground/center of the scene.
These things are great (I am not being sarcastic, I mean it when I say great) if and only if you don't actually care about all of your requirements being met, but if exactness matters they are mind-bogglingly frustrating because you'll get so close to what you want but some important detail is wrong.
- There's also Google Vids, also using Veo 2 under the hood. Product confusion :) https://workspace.google.com/products/vids/
- Content moderation is incredibly frustrating — it might even be the key reason why Veo2 and even Gemini could ultimately fail.
I just want to make some fun videos where my kid plays a superhero, but it keeps failing.
- This is semi-relevant, and I do love how technically amazing this all is, but here's a massive caveat from someone who's been dabbling hard in this space (images + video): I cannot emphasize enough how draining text-2-<whatever> is. Even when a result comes out that's kind of cool, I feel nothing because it wasn't really me who did it.
I would say 97% of the time, the results are not what I want (and of course that's the case, it's just textual input), so I change the text slightly, and a whole new thing comes out that is once again incorrect, and then I sit there for 5 minutes while some new slop churns out of the slop factory. All of this back and forth drains not only my wallet/credits, but my patience and my soul. I really don't know how these "tools" are ever supposed to help creatives, short of generating short-form ad content that few people really want to work on anyway. So far the only products spawning from these tools are TikTok/general internet spam companies.
The closest thing that I've bumped into that actually feels like it empowers artists is https://github.com/Acly/krita-ai-diffusion, which plugs into Krita and uses a combination of img2img with masking and txt2img. It's a slightly more rewarding feedback loop.
by byearthithatius
0 subcomment
- Very impressive release compared to what was possible even a single year ago. It feels like we are in a great state right now with respect to ML where all the big companies are competing and pushing each other to make the tech better. This is rare nowadays in America (or in general).
- Pretty disappointed with content moderation on Veo2. Here are the steps I did:
1. Took a picture of myself and asked for a description of the person in the image.
2. Used Imagegen to create a cartoon version using that description.
3. Tried to use veo-2.0-generate-001 to generate a video of the person in the image (holding a coffee cup in the original image) drinking coffee and having a conversation.
Video generation is blocked by content moderation.
by snappyleads
0 subcomment
- I've been waiting a long time for this. How long before we get to the 30 sec to 1 min milestone for video generation? Why is it capped: is it a hardware limitation or a software one?
- I wonder what takes more compute power: this or a blender render farm?
- Is there a tool to generate AI videos that doesn't change the original picture so much?
Whisk redraws the entire thing and it barely resembles the source picture.
- Two notes:
- is there a sly dig in there at Meta? Ice cream melting ... blue-suited hand
- the Ghibli style feels controversial
- The UI on this product page does not make any sense to me. The three prompt workflows don’t stack in any obvious way, then seemingly combine on any submission to the main prompt area?
They generate independent images.
Gemini’s web interface is also way behind ChatGPT and Claude. The mobile app is even worse.
This is despite having the champion 2.5 Pro model in its pocket.
It seems that web product resources are not being adequately allocated to the AI group(s).
by wewewedxfgdf
5 subcomments
- 1: Press release about amazing AI development.
2: "Try it now!" the release always says.
3: I go try it.
4: Doesn't work. In this case, I give it a prompt to make a video and literally nothing happens; it goes back to the prompt. In the case of the breathtakingly astonishing Gemini 2.5 Coding, I attach a source code file to the prompt and get "file type not supported".
That's the pattern - I've come to expect it and was not disappointed with Google Gemini 2.5 coding nor with this video thing they are promoting here.
by anonzzzies
0 subcomment
- I have Advanced but no Veo2 model; is it a controlled rollout or something again?
- evil technology
- [flagged]
- [flagged]
by transformi
0 subcomment
- [flagged]
by strangattractor
3 subcomments
- Google is the new Microsoft in the sense that they can embrace, extend, and extinguish their competition. No matter what xAI or OpenAI or "anything"AI tries to build, Google will eventually copy and destroy them at scale. AI (or A1 as our Secretary of Education calls it) is interesting because it is more difficult to protect the IP other than as trade secrets.