Precision Sport Year 3 Project System

This document is the complete system description for the Precision Sport Science Project (Year 3), split into System Development and Model Design.

System Development

Entry and Page Structure

Entry file: app.py (Streamlit multipage)
Main pages:
- project_page/Home.py
- project_page/CoLab_project.py
- project_page/Project_2nd.py
- project_page/project_3rd.py (key workflow for this project)

Quick Start

1) Install environment

pip install -r requirements.txt

2) Basic launch (UI only)

streamlit run app.py

3) GPU / inference services (optional)

If you want to use VQA / feature extraction / DVCA backend inference, start the worker services first:

bash lanuch_server.sh

After the UI starts, go to the project_3rd page to upload videos and run VQA and Highlight.

Year 3 Project Page: Key Workflow

File location: project_page/project_3rd.py

Feature overview

Select built-in videos or upload a local mp4
Sidebar shows the current video
Main content tabs:
- Badminton Tactic Analyst
- Badminton Highlight Director

Video sources

Built-in videos: selected from the GAME_INFO dictionary
Upload your own: use st.file_uploader to upload .mp4
- Uploaded video is temporarily stored in tmp/

Main Outputs and Temp Paths

Video temp: tmp/
TrackNet CSV: tmp_video/tracknet_csv/
HitFrame results: tmp_hitframe/hitframe_output/

Backend Interfaces

`project_page/project_3rd.py` -> `module/function_button.py`

VQA APIs

backend.load_VQA_engine
backend.VQA_stream (token-by-token streaming)
backend.uploaded_VQA_stream

Diversity Highlight (DVCA)

backend.Diverse_Video_Clip_Retrieve
backend.Diverse_Video_Clip_Sampling
backend.Diverse_Video_Clip_Concat

System Optimization (1): VQA_stream Logic

What engine.predict_stream does

After backend.VQA_stream(...) assembles clip_paths, it calls:
- engine.predict_stream(clip_paths, prompt, task_id="default")
engine.predict_stream(...) returns a token generator that yields (raw_token, token_id) step by step.
The UI does not wait for full output; it updates the screen as each token arrives.

How the UI renders in real time

In function_button.VQA():
1. raw_generator = self.do_VQA_stream(vid_path, prompt)
2. for raw_token, token_id in raw_generator: iterate token by token
3. Parse tags on the fly: remove <thinking>, </thinking>, <answer>, </answer>
4. Update different blocks based on state (THINKING/ANSWER)
5. After each token, immediately call markdown(... + "▌")

Why it can be real-time

engine.predict_stream streams tokens instead of returning a full string at once.
Streamlit st.write_stream / markdown can refresh the UI inside the loop.

Final full decode

The UI accumulates all token_ids and uses engine.model.t5_tokenizer.decode(...) to decode again.
parse_and_format_response(...) cleans and normalizes the output format.

System Optimization (2): Multi-GPU Inference

This project uses "multiple worker processes + one GPU per worker", with HTTP load sharing in parallel.

Worker startup (GPU binding)

lanuch_server.sh starts multiple FastAPI workers with different CUDA_VISIBLE_DEVICES (e.g., 8001–8005):

CUDA_VISIBLE_DEVICES=1 uvicorn worker_service:app --host 0.0.0.0 --port 8001 --workers 1 &
CUDA_VISIBLE_DEVICES=2 uvicorn worker_service:app --host 0.0.0.0 --port 8002 --workers 1 &
...

Worker endpoints and responsibilities (`worker_service.py`)

/predict: VQA inference (ENGINE.predict(...))
/encode_features: feature extraction (FE.encode_video(...))

In worker_service.py, _DEVICE="cuda:0"; but since each process only sees one GPU (limited by CUDA_VISIBLE_DEVICES), it effectively binds to the assigned GPU.

Distributed VQA inference (`diversity/pipeline/retriever_client.py`)

Distributed flow (VQA / Temporal Grounding example):

Discover available workers
- Probe each worker and collect reachable ones (defaults 8001–8007).
Sharding
- Round-robin distribute questions/clip indices across workers to balance load.
Bucketize by K
- Group samples per worker by K so each batch has similar K, reducing padding.
Micro-batching
- Split each worker's samples into micro-batches by batch_size_per_worker.
Async parallelism + flow control
- Send requests asynchronously; each worker caps concurrent batches with max_in_flight to avoid GPU spikes.
Retry and collect results
- Failed batches retry (with backoff); collect all results and restore original order by index.

Distributed feature extraction (`diversity/pipeline/feature_extract_client.py`)

Distributed flow:

Discover available workers (same as VQA).
Round-robin split video paths to each worker.
Each worker forms micro-batches with batch_size_per_worker and sends them.
Collect all features, restore order by index, and compose [N, D].

System Demo

Model Design: VQA Model Architecture

The core goal is to align natural language queries to semantic events and time segments in badminton match videos, especially fine-grained actions like stroke / rally.

The model uses a three-stage training strategy (Stage 0–2), progressively building action-aware visual representations -> vision-language alignment -> semantic temporal localization reasoning.

Overall Architecture Overview

The model consists of three main modules:

Visual Encoder:
- Converts video frame sequences into high-dimensional visual token representations.
- Serves as the visual backbone for all downstream modules.
Q-former:
- Uses learnable query tokens to extract key semantic information from visual features.
- Effectively compresses long videos before passing to the LLM.
LLM:
- Performs high-level semantic reasoning and natural language answer generation.
- Outputs event descriptions and implicit temporal localization results.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
demo_notebook		demo_notebook
diversity		diversity
engines		engines
image		image
lavis		lavis
module		module
output/results		output/results
project_page		project_page
.gitignore		.gitignore
README.md		README.md
app.py		app.py
backend.py		backend.py
inference_caption.py		inference_caption.py
inference_vqa.py		inference_vqa.py
label.csv		label.csv
lanuch_server.sh		lanuch_server.sh
requirements.txt		requirements.txt
worker_service.py		worker_service.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Precision Sport Year 3 Project System

System Development

Entry and Page Structure

Quick Start

1) Install environment

2) Basic launch (UI only)

3) GPU / inference services (optional)

Year 3 Project Page: Key Workflow

Feature overview

Video sources

Main Outputs and Temp Paths

Backend Interfaces

`project_page/project_3rd.py` -> `module/function_button.py`

System Optimization (1): VQA_stream Logic

What engine.predict_stream does

How the UI renders in real time

Why it can be real-time

Final full decode

System Optimization (2): Multi-GPU Inference

Worker startup (GPU binding)

Worker endpoints and responsibilities (`worker_service.py`)

Distributed VQA inference (`diversity/pipeline/retriever_client.py`)

Distributed feature extraction (`diversity/pipeline/feature_extract_client.py`)

System Demo

Model Design: VQA Model Architecture

Overall Architecture Overview

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Precision Sport Year 3 Project System

System Development

Entry and Page Structure

Quick Start

1) Install environment

2) Basic launch (UI only)

3) GPU / inference services (optional)

Year 3 Project Page: Key Workflow

Feature overview

Video sources

Main Outputs and Temp Paths

Backend Interfaces

project_page/project_3rd.py -> module/function_button.py

System Optimization (1): VQA_stream Logic

What engine.predict_stream does

How the UI renders in real time

Why it can be real-time

Final full decode

System Optimization (2): Multi-GPU Inference

Worker startup (GPU binding)

Worker endpoints and responsibilities (worker_service.py)

Distributed VQA inference (diversity/pipeline/retriever_client.py)

Distributed feature extraction (diversity/pipeline/feature_extract_client.py)

System Demo

Model Design: VQA Model Architecture

Overall Architecture Overview

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`project_page/project_3rd.py` -> `module/function_button.py`

Worker endpoints and responsibilities (`worker_service.py`)

Distributed VQA inference (`diversity/pipeline/retriever_client.py`)

Distributed feature extraction (`diversity/pipeline/feature_extract_client.py`)

Packages