
JOyagdol/View-Aware-Graph


View-Aware Graph

View-Aware Graph extracts a structured, viewpoint-aware graph JSON from a single image using a Vision-Language Model.

Input Image
  -> VLM
  -> Structured View-Aware Graph JSON
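The VLM step in this flow sits behind a provider-neutral request/response interface. The repository's actual class names are not shown here, so the following is only a minimal sketch under that assumption. VLMRequest, VLMResponse, VLMProvider, EchoProvider and run_extraction are all hypothetical names. The point is that the pipeline depends on a small Protocol, so an Ollama backend or a canned test double can be swapped in.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class VLMRequest:
    """Hypothetical provider-neutral request: one image plus the extraction prompt."""
    image_path: Path
    prompt: str
    model: str
    timeout_seconds: float = 600.0  # illustrative default, not the repo's value


@dataclass(frozen=True)
class VLMResponse:
    """Hypothetical response: raw provider text plus free-form metadata."""
    raw_text: str
    model: str
    metadata: dict[str, str] = field(default_factory=dict)


class VLMProvider(Protocol):
    def generate(self, request: VLMRequest) -> VLMResponse: ...


class EchoProvider:
    """Test double: returns a canned reply without calling any model."""

    def __init__(self, canned: str) -> None:
        self.canned = canned

    def generate(self, request: VLMRequest) -> VLMResponse:
        return VLMResponse(raw_text=self.canned, model=request.model)


def run_extraction(provider: VLMProvider, request: VLMRequest) -> str:
    # Only the Protocol is referenced here, so parsing and validation code
    # never needs to know which backend produced the text.
    return provider.generate(request).raw_text
```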

This repository owns only the image-to-graph extraction stage. Downstream Cypher generation, Neo4j retrieval, graph similarity matching, localization, and mapping are intentionally out of scope here.

What It Produces

The target output is schema-valid JSON containing:

  • source image metadata
  • viewpoint assumptions
  • observed scene nodes
  • view-aware spatial relations
  • confidence values
  • visual evidence text
  • uncertainty and occlusion notes when available

The current output contract is schemas/view_aware_graph.schema.json.
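The authoritative field names live in the schema file and are not reproduced in this README. The example below therefore uses hypothetical keys (source_image, viewpoint, nodes, relations, confidence, evidence, occlusion) to show how the listed parts might fit together. check_graph is a small stdlib stand-in for the repository's JSON Schema validation, not its implementation. It illustrates the kind of readable error messages described under Current Capabilities.

```python
from typing import Any

# Hypothetical example graph; real key names are defined by the schema file.
EXAMPLE_GRAPH: dict[str, Any] = {
    "source_image": {"path": "SmartCityLab_Lobby_GT.jpg"},
    "viewpoint": {"assumption": "camera at standing height, facing the desk"},
    "nodes": [
        {"id": "n1", "label": "reception_desk", "confidence": 0.9,
         "evidence": "long counter in the center of the frame"},
        {"id": "n2", "label": "sofa", "confidence": 0.7,
         "evidence": "grey couch at the left edge",
         "occlusion": "partially hidden by a pillar"},
    ],
    "relations": [
        {"subject": "n2", "predicate": "left_of", "object": "n1", "confidence": 0.8},
    ],
}

REQUIRED_TOP_LEVEL = ("source_image", "viewpoint", "nodes", "relations")


def check_graph(graph: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means these light checks passed."""
    errors = [f"missing top-level key: {key}" for key in REQUIRED_TOP_LEVEL if key not in graph]
    node_ids = {node.get("id") for node in graph.get("nodes", [])}
    for i, node in enumerate(graph.get("nodes", [])):
        conf = node.get("confidence")
        if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
            errors.append(f"nodes[{i}].confidence must be a number in [0, 1], got {conf!r}")
    for i, rel in enumerate(graph.get("relations", [])):
        # Every relation endpoint must point at a node declared in "nodes".
        for end in ("subject", "object"):
            if rel.get(end) not in node_ids:
                errors.append(f"relations[{i}].{end} refers to unknown node {rel.get(end)!r}")
    return errors
```

A full JSON Schema validator covers far more (types, enums, nested required fields). This sketch only shows the shape of the contract and the style of error reporting.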

Current Capabilities

  • Versioned extraction prompt in prompts/view_graph_extraction.md
  • Provider-neutral VLM request/response interface
  • Local Ollama adapter for Qwen vision models
  • CLI workflow through view-aware-graph extract
  • JSON parsing from VLM/Ollama responses
  • JSON Schema validation with readable errors
  • Conservative repair for common relation vocabulary drift
  • PNG graph overlay and GT-free summary commands for inspection
  • Unit tests for parsing, validation, adapter behavior, and CLI workflow
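Two of these capabilities, JSON parsing from VLM replies and conservative relation repair, are easy to sketch. VLMs often wrap JSON in a fenced block or add chatter around it. A careful parser pulls out the first JSON object and ignores the rest. A conservative repair step maps only known synonyms onto the allowed vocabulary and leaves everything else for validation to reject. The relation vocabulary and synonym table below are hypothetical, and extract_json and repair_relation are illustrative names rather than the repository's functions.

```python
import json
import re
from typing import Any, Optional

# A fenced block: three backticks, an optional "json" tag, then the payload.
_FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def extract_json(raw_text: str) -> Any:
    """Parse the first JSON object in a VLM reply, tolerating fences and chatter."""
    match = _FENCE_RE.search(raw_text)
    candidate = match.group(1) if match else raw_text
    start = candidate.find("{")
    if start == -1:
        raise ValueError("no JSON object found in VLM response")
    # raw_decode stops after the first complete JSON value and ignores trailing text.
    value, _ = json.JSONDecoder().raw_decode(candidate[start:])
    return value


# Hypothetical vocabulary and synonym table; the repository's real rules may differ.
ALLOWED_RELATIONS = {"left_of", "right_of", "in_front_of", "behind", "on", "near"}
RELATION_SYNONYMS = {
    "to the left of": "left_of",
    "to the right of": "right_of",
    "next to": "near",
    "close to": "near",
    "on top of": "on",
}


def repair_relation(predicate: str) -> Optional[str]:
    """Map common vocabulary drift onto the allowed set; return None if unsure."""
    words = predicate.strip().lower().replace("-", " ").replace("_", " ").split()
    snake = "_".join(words)
    if snake in ALLOWED_RELATIONS:
        return snake
    # Only exact synonym hits are repaired; anything else stays unrepaired.
    return RELATION_SYNONYMS.get(" ".join(words))
```

Returning None instead of guessing keeps the repair conservative. An unknown predicate such as "supports" then surfaces as a validation error rather than being silently rewritten.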

Not Included

This repository does not implement:

  • Cypher query generation
  • Neo4j world graph retrieval
  • graph similarity matching
  • localization or mapping
  • model fine-tuning
  • private image or dataset publication

Fine-tuning may be considered later only after enough curated image/GT-graph pairs exist.

Setup

Create and activate the conda environment:

conda env create -f environment.yml
conda activate view-aware-graph

If the environment already exists:

conda env update -f environment.yml --prune
conda activate view-aware-graph

Install the package in editable mode:

pip install -e .[dev]

Run checks:

ruff check .
mypy
pytest

Local Ollama Run

Set the local adapter environment:

export VLM_PROVIDER=ollama
export VLM_MODEL=qwen2.5vl:7b
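The adapter is selected through VLM_* environment variables, and the 32B run below also sets VLM_TIMEOUT_SECONDS. The README does not describe how these variables interact with configs/default.toml, so the sketch below reads only the environment. VLMSettings, load_settings, and the 600-second fallback timeout are assumptions for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class VLMSettings:
    provider: str
    model: str
    timeout_seconds: float


def load_settings(env: Optional[Mapping[str, str]] = None) -> VLMSettings:
    """Read VLM_* variables; the fallback values here are illustrative assumptions."""
    env = os.environ if env is None else env
    provider = env.get("VLM_PROVIDER", "ollama")
    model = env.get("VLM_MODEL")
    if not model:
        raise ValueError("VLM_MODEL must be set, e.g. qwen2.5vl:7b")
    timeout = float(env.get("VLM_TIMEOUT_SECONDS", "600"))
    if timeout <= 0:
        raise ValueError("VLM_TIMEOUT_SECONDS must be positive")
    return VLMSettings(provider=provider, model=model, timeout_seconds=timeout)
```

Passing an explicit mapping instead of reading os.environ directly keeps the function easy to unit-test.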

Run extraction on the SmartCityLab lobby GT image:

view-aware-graph extract \
  --image data/raw/smartcitylab_lobby/SmartCityLab_Lobby_GT.jpg \
  --config configs/default.toml \
  --output data/processed/view_graphs/smartcitylab_lobby_gt_cli.json \
  --run-id smartcitylab_lobby_gt_cli \
  --verbose
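Under the hood, a local Ollama adapter has to send the image and prompt to the Ollama server. The sketch below builds a request for Ollama's /api/generate endpoint, which accepts base64-encoded images, "stream": false for a single JSON reply, and "format": "json" to constrain the output. Whether this repository's adapter uses /api/generate or /api/chat, and whether it sets "format", is not stated here, so treat this as an illustration. build_ollama_request and send are hypothetical helpers, and the default URL assumes Ollama on its standard local port.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ollama_request(image_bytes: bytes, prompt: str, model: str) -> urllib.request.Request:
    """Build (but do not send) a non-streaming /api/generate request with one image."""
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,   # one JSON object instead of a token stream
        "format": "json",  # ask Ollama to constrain the output to valid JSON
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout_seconds: float) -> str:
    """Send the request; Ollama returns the generated text in the "response" field."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as resp:
        return json.loads(resp.read())["response"]
```

With a running Ollama server, calling send(build_ollama_request(Path(image).read_bytes(), prompt, "qwen2.5vl:7b"), 600) would return the raw model text. That text still needs JSON extraction and schema validation afterwards.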

For the 32B comparison run, switch the model and raise the request timeout:

export VLM_MODEL=qwen2.5vl:32b
export VLM_TIMEOUT_SECONDS=1800

view-aware-graph extract \
  --image data/raw/smartcitylab_lobby/SmartCityLab_Lobby_GT.jpg \
  --config configs/default.toml \
  --output data/processed/view_graphs/smartcitylab_lobby_gt_qwen2_5vl_32b_cli.json \
  --run-id smartcitylab_lobby_gt_qwen2_5vl_32b_cli \
  --verbose

Raw VLM responses and processed graph outputs are written to local directories that Git ignores:

data/interim/vlm_raw/
data/processed/view_graphs/

By default, the CLI also writes a PNG graph overlay and a JSON report next to the graph JSON:

data/processed/view_graphs/<run_id>.png
data/processed/view_graphs/<run_id>.report.json
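The layout above places the overlay and report in the same folder as the graph JSON, named by run ID. A minimal sketch of that path derivation, assuming the run ID rather than the graph file name determines the sibling names (OutputPaths and sibling_outputs are hypothetical):

```python
from pathlib import Path
from typing import NamedTuple


class OutputPaths(NamedTuple):
    graph_json: Path
    overlay_png: Path
    report_json: Path


def sibling_outputs(graph_json: Path, run_id: str) -> OutputPaths:
    """Place the overlay and report next to the graph JSON, named by run_id."""
    folder = graph_json.parent
    return OutputPaths(
        graph_json=graph_json,
        overlay_png=folder / f"{run_id}.png",
        report_json=folder / f"{run_id}.report.json",
    )
```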

Summarize an existing graph without another VLM call:

view-aware-graph summarize \
  --graph data/processed/view_graphs/smartcitylab_lobby_gt_qwen2_5vl_32b_cli.json \
  --report data/processed/view_graphs/smartcitylab_lobby_gt_qwen2_5vl_32b_cli.report.json
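A GT-free summary can only use statistics computed from the predicted graph itself, since no reference graph is available. The fields of the actual report are not documented here, so the sketch below is a hypothetical example of such statistics. It reuses the assumed keys nodes, relations, predicate, and confidence.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Optional


def _mean(values: list[float]) -> Optional[float]:
    return round(sum(values) / len(values), 3) if values else None


def summarize_graph(graph: dict[str, Any]) -> dict[str, Any]:
    """Statistics that need no ground truth: counts, predicate histogram, confidences."""
    nodes = graph.get("nodes", [])
    relations = graph.get("relations", [])
    node_ids = {n.get("id") for n in nodes}
    # Relations whose endpoints do not match any declared node usually signal VLM drift.
    dangling = sum(
        1 for r in relations
        if r.get("subject") not in node_ids or r.get("object") not in node_ids
    )
    return {
        "node_count": len(nodes),
        "relation_count": len(relations),
        "predicates": dict(Counter(r.get("predicate") for r in relations)),
        "mean_node_confidence": _mean([n["confidence"] for n in nodes if "confidence" in n]),
        "mean_relation_confidence": _mean([r["confidence"] for r in relations if "confidence" in r]),
        "dangling_relations": dangling,
    }


def summarize_file(graph_path: Path, report_path: Path) -> dict[str, Any]:
    """Read a graph JSON, write its summary as JSON, and return the summary."""
    graph = json.loads(graph_path.read_text(encoding="utf-8"))
    report = summarize_graph(graph)
    report_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report
```

Because nothing here calls the VLM, the summary can be recomputed cheaply whenever the graph JSON changes.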

Documentation

Data Policy

Local raw images, raw VLM responses, generated graph outputs, and Korean local documentation are ignored by Git by default. Only small, safe examples should be committed intentionally.
