I'm not a game developer myself, but some of my favorite games carry a deep sense of intentionality. There is typically not a single item misplaced in a FromSoftware game (or, more recently, Lies of P); almost every object is placed deliberately.
Games that lack this intentionality often feel dead by contrast. You run into moments that break immersion, or that pull you out of whatever the developer is trying to convey.
It's difficult for me to imagine world models getting to a place where this sort of intentionality is captured. The best frontier LLMs constantly fail at this in writing, and even in code, and the surface area of those mediums often feels "smaller" than the interaction profile of a video game.
It's also not clear how these world models could be used modularly by humans hoping to develop intentional experiences. I don't know much about their usage, but LLMs are somewhat modular: they produce text, which humans and other LLMs can then work on. Is the same true for the video output here?
All this to say: I'm impressed with these world models, but as with LLMs and writing, it's not really clear what we are building towards. The ability to create less satisfying, less humane experiences faster? Perhaps the most immediate benefit is letting robotic systems simulate actions, conjuring a world and imagining the consequences.
In general, I have the feeling that we are hurtling towards a world with less intentionality behind the things we experience. Everything becomes more impersonal, noisier.
Everyone is right to be skeptical of this coming from a 2.8B model. Weights or it didn't happen.
> 720p, 1-min video generation with 6-DoF camera control
As nl said,
> The model is out here: https://huggingface.co/Efficient-Large-Model/SANA-Video_2B_7...
README says "intended for research use only"
Code license is Apache 2.0
The model license (nvidia open...) says:
> Models are commercially usable.
> You are free to create and distribute Derivative Models
(As usual: model output is unrestricted, and also unprotectable absent human authoring.)

Also, will this run on an RTX 4090 with 24GB of memory?
Thank you!
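A rough back-of-envelope on the 4090 question, assuming bf16 weights and the 2.8B parameter count cited upthread (a sketch, not a measurement):

    # Back-of-envelope VRAM estimate; real usage also depends on resolution,
    # frame count, the text encoder, the VAE, and the attention backend.
    params = 2.8e9        # parameter count cited upthread
    bytes_per_param = 2   # bf16/fp16 weights

    weights_gib = params * bytes_per_param / 1024**3
    print(f"weights alone: ~{weights_gib:.1f} GiB")  # ~5.2 GiB

Even doubling that for activations and the auxiliary models leaves headroom on a 24GB card, though long 720p rollouts may still want CPU offloading or tiled VAE decoding.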
There’s no doubt these models are technically impressive, but what does one do with them?
> A dedicated 17B long-video refiner sharpens texture, motion, and late-window quality on top of the long-rollout backbone.
I can't say I'm looking forward to an AI video future.
Seedance 2.0 and Kling 3 are regarded as the best closed-source video models we have. I subscribe to a few AI video subreddits, and the consensus at the moment is that they are good for anything but long-form videos with humans.
No surprise that we're very good at spotting even the most subtle differences when looking at other people.
EDIT> don't ask how I came up with this quote
It's honestly impressive, on the surface. The visuals are gorgeous... but it's still empty. What makes a "World" a world is precisely its coherence. It's not about how it looks but rather how it works. The plants in an ecosystem are a certain way because of everything from the available resources all the way down to forces like gravity; it doesn't just "look" that way. To echo Konrad Lorenz, a fish doesn't merely swim in the water; the fish IS an efficient representation of the water it lives in. In these "worlds" nothing is happening. There is minimal, superficial coherence, no logic, nothing.
The ultimate liminal spaces.