Meta’s V-JEPA 2 model teaches AI to understand its surroundings


Meta on Wednesday unveiled its new V-JEPA 2 AI model, a “world model” that is designed to help AI agents understand the world around them.

V-JEPA 2 is an extension of the V-JEPA model that Meta released last year, which was trained on over one million hours of video. This training data is meant to help robots and other AI agents operate in the physical world by understanding and predicting how concepts like gravity will affect what happens next in a sequence.

These are the kinds of common-sense connections that small children and animals make as their brains develop. When you play fetch with a dog, for example, the dog will (hopefully) understand that bouncing a ball on the ground will cause it to rebound upward, and that it should run toward where it expects the ball to land, not where the ball is at that precise moment.

Meta offers examples such as a robot confronted with the point of view of holding a plate and a spatula while walking toward a stove with cooked eggs. The AI can predict that a very likely next action would be to use the spatula to move the eggs to the plate.

According to Meta, V-JEPA 2 is 30x faster than Nvidia’s Cosmos model, which also aims to improve AI’s understanding of the physical world. However, Meta may be evaluating its own models using different benchmarks than Nvidia does.

“We believe world models will usher a new era for robotics, enabling real world AI agents to help with chores and physical tasks without needing astronomical amounts of robotic training data,” explained Meta’s Chief AI Scientist Yann LeCun in a video.