PerfectPhysics Challenge

A benchmark and challenge for physical reasoning and video prediction

Benchmark Description

PerfectPhysics is an evaluation suite that tests how well world models and video generation models adhere to well-known physical constants and laws.

The benchmark includes curated physics experiments ranging from motion physics (e.g. gravity, projectile motion) to fluid dynamics (e.g. viscosity, fluid motion). Each scenario provides initial context frames and requires models to generate the future frames. Core physical quantities such as gravitational acceleration, viscosity, and friction are then estimated from the generated videos. Each challenge focuses on a different physics experiment and is evaluated on a different set of physical constants.

Submissions will be evaluated using an internal pipeline which differs depending on the experiment. For example, motion physics experiments will typically use a SAM-based segmentation mask to determine depth-calibrated position for accurate gravity estimation. The goal is to measure whether the video generation models respect physical constraints (such as free-fall gravitational acceleration, friction, viscosity, etc.) across the generated frames.
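As a rough illustration of the position-extraction step, the sketch below reduces a per-frame binary segmentation mask to a single object position. It is not the internal pipeline: the function names are hypothetical, the mask is assumed to be already available as a NumPy array (for example from a SAM-style segmenter), and a single constant `meters_per_pixel` scale stands in for the depth calibration, which is not described here.

```python
import numpy as np

def mask_centroid(mask):
    """Return the (row, col) centroid of a boolean mask, or None if the mask is empty."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None  # object not found in this frame
    return float(rows.mean()), float(cols.mean())

def vertical_position_m(mask, meters_per_pixel):
    """Convert the centroid row to a vertical position in meters.

    Assumption: one fixed scale factor is valid for the whole frame.
    Image rows grow downward, so the returned value increases as the object falls.
    """
    centroid = mask_centroid(mask)
    if centroid is None:
        return None
    return centroid[0] * meters_per_pixel

# Toy example: a 3x2 pixel object inside a 5x5 frame.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 2:4] = True
centroid = mask_centroid(mask)
```

Applying this to every generated frame would give a position time series that downstream estimation can work from.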

Current Challenge

Challenge 1: Free Fall

Our first challenge is focused on free fall physics. The videos depict various objects being released from rest and falling under the influence of gravity.

Videos will be evaluated by extracting object locations and estimating acceleration due to gravity.
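One common way to estimate gravitational acceleration from tracked positions is to fit the constant-acceleration model y(t) = y0 + v0·t + ½·a·t² and read off a. The sketch below does this with a least-squares quadratic fit. It is an illustration only, not necessarily the method used for scoring; the function name is hypothetical, and positions are assumed to be in meters, measured downward, and sampled at a constant frame rate.

```python
import numpy as np

def estimate_gravity(positions_m, fps):
    """Estimate constant acceleration (m/s^2) from per-frame positions.

    positions_m: positions in meters, one per frame, increasing downward.
    fps: frame rate of the video, used to turn frame indices into seconds.
    """
    y = np.asarray(positions_m, dtype=float)
    if y.size < 3:
        raise ValueError("need at least 3 frames to fit a quadratic")
    t = np.arange(y.size) / fps
    # np.polyfit returns coefficients from the highest power down: [a/2, v0, y0].
    half_a, _v0, _y0 = np.polyfit(t, y, 2)
    return 2.0 * half_a

# Synthetic check: an object released from rest under g = 9.81 m/s^2 at 30 fps.
t = np.arange(30) / 30.0
g_hat = estimate_gravity(0.5 * 9.81 * t**2, fps=30)
```

Because time is derived from the frame index and the frame rate, an incorrect frame rate scales the estimate quadratically: halving the assumed fps divides the recovered acceleration by four.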

How to Participate

1. Obtain the evaluation set: Download the PerfectPhysics input videos for the current week from HuggingFace.

2. Run your model: Generate future frames from the provided context frames. Depending on your model's input size, you may change the resolution of the input video before inference. However, the output video you submit must have the same aspect ratio and resolution as the provided input video, because our evaluations are calibrated to them. For best results, we recommend zero-padding and downsampling/upsampling rather than cropping.

3. Package outputs: Create a single compressed file containing your generated videos.

4. Submit: Upload your file via the submission form, which asks for your team name, affiliation (optional), contact email, and the frame rate of your generated videos.
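The zero-padding recommendation in step 2 can be sketched as follows. This is a minimal example under stated assumptions: the helper names are hypothetical, frames are NumPy arrays of shape (height, width, channels), and the resize itself is left to whichever image library you already use. The idea is to scale the frame to fit the model's input size without distorting it, pad the remainder with zeros, and later strip the padding so the output can be resized back to the original resolution.

```python
import numpy as np

def letterbox_geometry(src_h, src_w, dst_h, dst_w):
    """Return (new_h, new_w, top, left) for fitting src inside dst without cropping."""
    scale = min(dst_h / src_h, dst_w / src_w)
    new_h, new_w = round(src_h * scale), round(src_w * scale)
    top, left = (dst_h - new_h) // 2, (dst_w - new_w) // 2
    return new_h, new_w, top, left

def zero_pad(frame, dst_h, dst_w, top, left):
    """Place an already-resized frame on a zero canvas of size (dst_h, dst_w)."""
    h, w = frame.shape[:2]
    canvas = np.zeros((dst_h, dst_w) + frame.shape[2:], dtype=frame.dtype)
    canvas[top:top + h, left:left + w] = frame
    return canvas

def remove_padding(frame, top, left, h, w):
    """Cut the content region back out of a padded frame."""
    return frame[top:top + h, left:left + w]

# Example: a 1280x720 input and a model that expects 512x512 frames.
new_h, new_w, top, left = letterbox_geometry(720, 1280, 512, 512)
resized = np.ones((new_h, new_w, 3), dtype=np.uint8)  # stand-in for a resized frame
padded = zero_pad(resized, 512, 512, top, left)
restored = remove_padding(padded, top, left, new_h, new_w)
```

After removing the padding, each generated frame would still need to be upsampled to the original input resolution before packaging.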

Results

We evaluate existing world models and Image2Video models on the full benchmark, which contains more videos than the individual weekly challenges. The results are summarized below:

Video2Video models

| Model | Free-Fall Gravity (m/s^2) | Parabolic Motion Gravity (m/s^2) | Honey Viscosity (Pa·s) | Glycerine Viscosity (Pa·s) |
| --- | --- | --- | --- | --- |
| Ground Truth | 9.81 | 9.81 | 13.61 | 1.44 |
| Cosmos-Predict2.5 | 3.992 ± 3.622 | 4.760 ± 4.653 | 1.995 ± 1.192 | 1.264 ± 2.787 |
| Cosmos-Predict2 | Coming Soon | Coming Soon | Coming Soon | Coming Soon |

Image2Video models

| Model | Free-Fall Gravity (m/s^2) | Parabolic Motion Gravity (m/s^2) | Honey Viscosity (Pa·s) | Glycerine Viscosity (Pa·s) |
| --- | --- | --- | --- | --- |
| Ground Truth | 9.81 | 9.81 | 13.61 | 1.44 |
| Wan 2.2 | -0.078 ± 1.097 | 0.386 ± 2.572 | Coming Soon | Coming Soon |
| Hunyuan World | Coming Soon | Coming Soon | Coming Soon | Coming Soon |

Community Leaderboard

| Team | Model | Free-Fall Gravity | Parabolic Motion Gravity | Honey Viscosity | Glycerine Viscosity |
| --- | --- | --- | --- | --- | --- |

Submission and Support

Use the link below to submit your results. After submission, our evaluation server will validate your files and compute metrics. You will receive a confirmation email with a tracking ID.

Open Submission Form

Contact Us

Questions about the benchmark or submission process? Reach out and we will get back to you promptly.

contactworldbench@gmail.com