AutoGaze Official Demo

Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing

📄 Paper       🌐 Project Website

Settings

Gazing Ratio: 0.01 to 1.35
Task Loss Requirement: 0 to 1.5

What file formats are supported?

The app supports common video formats (MP4, AVI, MOV, etc.) and image formats (JPG, PNG, etc.).

What is the Gazing Ratio?

The gazing ratio explicitly controls how many patches the model looks at per frame. Higher values mean more patches are selected. The range extends past 1.0 because of multi-scale gazing: if all patches at all scales are selected, the ratio can reach 1.35.
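
As a rough illustration of the arithmetic, the sketch below turns a gazing ratio into a per-frame patch budget. It assumes the ratio is measured against the patch count of a single base-scale grid; the demo text does not state the denominator, so this is an assumption. The helper name `patch_budget` and the 14 x 14 grid are hypothetical and are not taken from the AutoGaze code.

```python
def patch_budget(gazing_ratio: float, base_patches: int) -> int:
    """Hypothetical helper: patches to gaze at per frame for a given ratio.

    Assumes the ratio is relative to `base_patches`, the patch count of one
    base-scale grid (an assumption, not confirmed by the demo).
    """
    if not 0.0 < gazing_ratio <= 1.35:
        raise ValueError("gazing_ratio is expected to lie in (0, 1.35]")
    # Always gaze at at least one patch.
    return max(1, round(gazing_ratio * base_patches))


base = 14 * 14  # hypothetical 14 x 14 base grid, 196 patches
print(patch_budget(0.15, base))  # budget at a 15% ratio
print(patch_budget(1.35, base))  # budget when every scale is fully selected
```

Under this assumption, a ratio above 1.0 simply means the budget exceeds the base grid's patch count, which is only possible because patches at other scales also count.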

What is Task Loss Requirement?

This threshold determines when the model stops gazing at a frame, based on the reconstruction loss predicted from the patches gazed at so far. A lower threshold means more gazing; a higher threshold means less.
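
The stopping rule can be sketched as a simple loop. This is a minimal sketch, not the demo's implementation: it assumes the model yields one predicted reconstruction loss after each gazed patch and that the requirement is "met" once that loss is at or below the threshold. The function name and the loss values are hypothetical.

```python
from typing import Sequence


def patches_until_loss_met(predicted_losses: Sequence[float],
                           loss_requirement: float) -> int:
    """Count the patches gazed at before the loss requirement is met.

    predicted_losses[i] is the (hypothetical) predicted reconstruction loss
    after gazing at i + 1 patches. "Met" is assumed to mean loss <= threshold.
    """
    for count, loss in enumerate(predicted_losses, start=1):
        if loss <= loss_requirement:
            return count
    # Requirement never met: every available patch was gazed at.
    return len(predicted_losses)


losses = [1.2, 0.9, 0.6, 0.4, 0.3]  # made-up values for illustration
print(patches_until_loss_met(losses, 0.7))   # higher threshold, stops earlier
print(patches_until_loss_met(losses, 0.35))  # lower threshold, gazes longer
```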

How do Gazing Ratio and Task Loss interact?

These two parameters separately control the number of gazed patches in an image/video. This demo will take the stricter of the two requirements when determining how many patches to gaze at. For example, if the gazing ratio suggests gazing at 15% of patches, but the task loss requirement is met after only 7% of patches, then only 7% of patches will be gazed at. To only use one of the two parameters, set the other to its maximum value.
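
The example above amounts to taking the smaller of two patch counts. The sketch below reproduces it under the assumption that "stricter" means "fewer patches", as the 15% versus 7% example suggests; the function name and the frame size of 100 patches are hypothetical.

```python
def combined_patch_count(ratio_budget: int, loss_stop_count: int) -> int:
    """Hypothetical helper: apply the stricter (smaller) of the two limits.

    ratio_budget:    patches allowed by the gazing ratio.
    loss_stop_count: patches gazed at when the task loss requirement is met.
    """
    return min(ratio_budget, loss_stop_count)


total_patches = 100                      # made-up frame size for illustration
ratio_budget = 15 * total_patches // 100  # gazing ratio suggests 15%
loss_stop_count = 7                       # loss requirement met after 7%
print(combined_patch_count(ratio_budget, loss_stop_count))  # 7
```

If the loss requirement were instead met only after 20 patches, the gazing ratio would be the binding limit and 15 patches would be gazed at.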
