Abnormal Inference Time and Repetitive Summary with Efficient-Large-Model/LongVILA-R1-7B on Specific Video Chunk

When using LongVILA-R1 for video summarization, I encountered an issue where one video chunk took an abnormally long time to process and produced a very long summary with significant repetition.

Model: Efficient-Large-Model/LongVILA-R1-7B

Input Prompt: "Concise summary of the video."

Video: A video containing a traffic accident scene.

Length: 80 seconds

Resolution: 1920x1080

FPS: 20

Chunking Settings:

Chunk Size: 10 seconds


This produced 8 video chunks, each 10 seconds long.
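For reference, the chunk layout implied by these settings can be sketched as below. This is only an illustration of the arithmetic, not the actual chunking code I ran; the function name `chunk_bounds` is hypothetical, and it assumes fixed-length, non-overlapping chunks numbered from 1 in playback order.

```python
def chunk_bounds(duration_s, chunk_s, fps):
    """Return (start_s, end_s, n_frames) for each fixed-length chunk.

    Assumes non-overlapping chunks; the last chunk may be shorter
    if the duration is not a multiple of the chunk size.
    """
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        n_frames = round((end - start) * fps)
        bounds.append((start, end, n_frames))
        start = end
    return bounds


# Settings from this report: 80 s video, 10 s chunks, 20 FPS.
chunks = chunk_bounds(80, 10, 20)
for index, (start, end, n_frames) in enumerate(chunks, start=1):
    print(f"Chunk {index}: {start:.0f}-{end:.0f} s, {n_frames} source frames")
```

Under those assumptions, Chunk 3 would cover seconds 20 to 30 of the video, and every chunk holds the same number of source frames, so the input size alone would not explain why one chunk is so much slower.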

**Observed Behavior:**

1. **Chunk 3** had an exceptionally high inference time of 523.76 seconds, whereas the other chunks averaged around 6 seconds.

2. The summary generated for Chunk 3 was excessively long and contained many repeated sentences, so it was not the concise summary the prompt asked for.

This suggests that the model may get stuck in a repetitive generation loop on certain video content, which would explain both the long inference time and the failure to produce a concise summary.
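To quantify the repetition rather than judge it by eye, a rough check like the following could be run on each chunk's summary. It is a minimal sketch using only the standard library: the helper name `repetition_ratio` and the sample text are hypothetical, and the sentence splitting is naive (it splits on whitespace after `.`, `!`, or `?`).

```python
import re
from collections import Counter


def repetition_ratio(text):
    """Return (fraction of sentences that are duplicates, repeated sentences).

    Sentences are compared case-insensitively after stripping whitespace.
    """
    sentences = [
        s.strip().lower()
        for s in re.split(r"(?<=[.!?])\s+", text)
        if s.strip()
    ]
    if not sentences:
        return 0.0, []
    counts = Counter(sentences)
    repeated = [(s, n) for s, n in counts.most_common() if n > 1]
    duplicate_count = len(sentences) - len(counts)
    return duplicate_count / len(sentences), repeated


# Hypothetical text for illustration only, not the real Chunk 3 output.
sample = "A car hits a truck. Traffic stops. " + "The car is damaged. " * 4
ratio, repeated = repetition_ratio(sample)
print(f"duplicate sentence ratio: {ratio:.2f}")
print(repeated)
```

A ratio near zero is expected for a normal summary, while a looping generation should score much higher.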

## Per-Chunk Inference Log
Chunk 1: VLM inference time = 6.69 seconds

Chunk 2: VLM inference time = 5.09 seconds

Chunk 3: VLM inference time = **523.76 seconds**

Chunk 4: VLM inference time = 6.52 seconds

Chunk 5: VLM inference time = 6.52 seconds

Chunk 6: VLM inference time = 5.29 seconds

Chunk 7: VLM inference time = 6.43 seconds

Chunk 8: VLM inference time = 4.35 seconds
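The outlier in the log above can also be flagged automatically. The snippet below is a small sketch that compares each timing against the median of all chunks; the function name `slow_chunks` and the threshold factor of 10 are arbitrary assumptions chosen for illustration.

```python
from statistics import median

# Per-chunk inference times in seconds, copied from the log above.
timings = {
    1: 6.69,
    2: 5.09,
    3: 523.76,
    4: 6.52,
    5: 6.52,
    6: 5.29,
    7: 6.43,
    8: 4.35,
}


def slow_chunks(timings_s, factor=10.0):
    """Return chunk ids whose time exceeds `factor` times the median time."""
    baseline = median(timings_s.values())
    return [chunk for chunk, t in timings_s.items() if t > factor * baseline]


outliers = slow_chunks(timings)
print("median time (s):", median(timings.values()))
print("slow chunks:", outliers)
```

The median is used instead of the mean so that the single very slow chunk does not inflate the baseline it is compared against.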

Please see [summary.txt](https://github.com/user-attachments/files/22002895/summary.txt), which contains the responses for all 8 chunks.

How can this issue be solved, and why is it occurring?
