> The world is unpredictable. If you try to build a generative model that predicts every detail of the future, it will fail. JEPA is not generative AI. It is a system that learns to represent videos really well. The key is to learn an abstract representation of the world and make predictions in that abstract space, ignoring the details you can’t predict. That’s what JEPA does. It learns the underlying rules of the world from observation, like a baby learning about gravity. This is the foundation for common sense, and it’s the key to building truly intelligent systems that can reason and plan in the real world. The most exciting work so far on this is coming from academia, not the big industrial labs stuck in the LLM world.
I think anything less than that is just a parlor trick.