Skip to content

aiden1020/PrecisionSport-system

Repository files navigation

Precision Sport Year 3 Project System

This document is the complete system description for the Precision Sport Science Project (Year 3), split into System Development and Model Design.


System Development

Entry and Page Structure

  • Entry file: app.py (Streamlit multipage)

  • Main pages:

    • project_page/Home.py
    • project_page/CoLab_project.py
    • project_page/Project_2nd.py
    • project_page/project_3rd.py (key workflow for this project)

Quick Start

1) Install environment

pip install -r requirements.txt

2) Basic launch (UI only)

streamlit run app.py

3) GPU / inference services (optional)

If you want to use VQA / feature extraction / DVCA backend inference, start the worker services first:

bash lanuch_server.sh

After the UI starts, go to the project_3rd page to upload videos and run VQA and Highlight.


Year 3 Project Page: Key Workflow

File location: project_page/project_3rd.py

Feature overview

  • Select built-in videos or upload a local mp4

  • Sidebar shows the current video

  • Main content tabs:

    • Badminton Tactic Analyst
    • Badminton Highlight Director

Video sources

  • Built-in videos: selected from the GAME_INFO dictionary

  • Upload your own: use st.file_uploader to upload .mp4

    • Uploaded video is temporarily stored in tmp/

Main Outputs and Temp Paths

  • Video temp: tmp/
  • TrackNet CSV: tmp_video/tracknet_csv/
  • HitFrame results: tmp_hitframe/hitframe_output/

Backend Interfaces

project_page/project_3rd.py -> module/function_button.py

VQA APIs

  • backend.load_VQA_engine
  • backend.VQA_stream (token-by-token streaming)
  • backend.uploaded_VQA_stream

Diversity Highlight (DVCA)

  • backend.Diverse_Video_Clip_Retrieve
  • backend.Diverse_Video_Clip_Sampling
  • backend.Diverse_Video_Clip_Concat

System Optimization (1): VQA_stream Logic

What engine.predict_stream does

  1. After backend.VQA_stream(...) assembles clip_paths, it calls:
    • engine.predict_stream(clip_paths, prompt, task_id="default")
  2. engine.predict_stream(...) returns a token generator that yields (raw_token, token_id) step by step.
  3. The UI does not wait for full output; it updates the screen as each token arrives.

How the UI renders in real time

  • In function_button.VQA():
    1. raw_generator = self.do_VQA_stream(vid_path, prompt)
    2. for raw_token, token_id in raw_generator: iterate token by token
    3. Parse tags on the fly: remove <thinking>, </thinking>, <answer>, </answer>
    4. Update different blocks based on state (THINKING/ANSWER)
    5. After each token, immediately call markdown(... + "▌")

Why it can be real-time

  • engine.predict_stream streams tokens instead of returning a full string at once.
  • Streamlit st.write_stream / markdown can refresh the UI inside the loop.

Final full decode

  • The UI accumulates all token_ids and uses engine.model.t5_tokenizer.decode(...) to decode again.
  • parse_and_format_response(...) cleans and normalizes the output format.

image

System Optimization (2): Multi-GPU Inference

This project uses "multiple worker processes + one GPU per worker", with HTTP load sharing in parallel. image

Worker startup (GPU binding)

lanuch_server.sh starts multiple FastAPI workers with different CUDA_VISIBLE_DEVICES (e.g., 8001–8005):

CUDA_VISIBLE_DEVICES=1 uvicorn worker_service:app --host 0.0.0.0 --port 8001 --workers 1 &
CUDA_VISIBLE_DEVICES=2 uvicorn worker_service:app --host 0.0.0.0 --port 8002 --workers 1 &
...

Worker endpoints and responsibilities (worker_service.py)

  • /predict: VQA inference (ENGINE.predict(...))
  • /encode_features: feature extraction (FE.encode_video(...))

In worker_service.py, _DEVICE="cuda:0"; but since each process only sees one GPU (limited by CUDA_VISIBLE_DEVICES), it effectively binds to the assigned GPU.

Distributed VQA inference (diversity/pipeline/retriever_client.py)

Distributed flow (VQA / Temporal Grounding example):

  1. Discover available workers
    • Probe each worker and collect reachable ones (defaults 8001–8007).
  2. Sharding
    • Round-robin distribute questions/clip indices across workers to balance load.
  3. Bucketize by K
    • Group samples per worker by K so each batch has similar K, reducing padding.
  4. Micro-batching
    • Split each worker's samples into micro-batches by batch_size_per_worker.
  5. Async parallelism + flow control
    • Send requests asynchronously; each worker caps concurrent batches with max_in_flight to avoid GPU spikes.
  6. Retry and collect results
    • Failed batches retry (with backoff); collect all results and restore original order by index.

image

Distributed feature extraction (diversity/pipeline/feature_extract_client.py)

Distributed flow:

  1. Discover available workers (same as VQA).
  2. Round-robin split video paths to each worker.
  3. Each worker forms micro-batches with batch_size_per_worker and sends them.
  4. Collect all features, restore order by index, and compose [N, D].

System Demo

IMAGE ALT TEXT HERE IMAGE ALT TEXT HERE


Model Design: VQA Model Architecture

The core goal is to align natural language queries to semantic events and time segments in badminton match videos, especially fine-grained actions like stroke / rally.

The model uses a three-stage training strategy (Stage 0–2), progressively building action-aware visual representations -> vision-language alignment -> semantic temporal localization reasoning. image


Overall Architecture Overview

The model consists of three main modules:

  1. Visual Encoder:

    • Converts video frame sequences into high-dimensional visual token representations.
    • Serves as the visual backbone for all downstream modules.
  2. Q-former:

    • Uses learnable query tokens to extract key semantic information from visual features.
    • Effectively compresses long videos before passing to the LLM.
  3. LLM:

    • Performs high-level semantic reasoning and natural language answer generation.
    • Outputs event descriptions and implicit temporal localization results.

About

This repo is for NSTC precision sport project

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages