View-Aware Graph extracts a structured, viewpoint-aware graph JSON from a single image using a Vision-Language Model.
Input Image
-> VLM
-> Structured View-Aware Graph JSON
This repository owns only the image-to-graph extraction stage. Downstream Cypher generation, Neo4j retrieval, graph similarity matching, localization, and mapping are intentionally out of scope here.
The target output is schema-valid JSON containing:
- source image metadata
- viewpoint assumptions
- observed scene nodes
- view-aware spatial relations
- confidence values
- visual evidence text
- uncertainty and occlusion notes when available
The current output contract is schemas/view_aware_graph.schema.json.
- Versioned extraction prompt in
prompts/view_graph_extraction.md - Provider-neutral VLM request/response interface
- Local Ollama adapter for Qwen vision models
- CLI workflow through
view-aware-graph extract - JSON parsing from VLM/Ollama responses
- JSON Schema validation with readable errors
- Conservative repair for common relation vocabulary drift
- PNG graph overlay and GT-free summary commands for inspection
- Unit tests for parsing, validation, adapter behavior, and CLI workflow
This repository does not implement:
- Cypher query generation
- Neo4j world graph retrieval
- graph similarity matching
- localization or mapping
- model fine-tuning
- private image or dataset publication
Fine-tuning may be considered later only after enough curated image/GT-graph pairs exist.
Create and activate the conda environment:
conda env create -f environment.yml
conda activate view-aware-graphIf the environment already exists:
conda env update -f environment.yml --prune
conda activate view-aware-graphInstall the package in editable mode:
pip install -e .[dev]Run checks:
ruff check .
mypy
pytestSet the local adapter environment:
export VLM_PROVIDER=ollama
export VLM_MODEL=qwen2.5vl:7bRun the GT lobby image:
view-aware-graph extract \
--image data/raw/smartcitylab_lobby/SmartCityLab_Lobby_GT.jpg \
--config configs/default.toml \
--output data/processed/view_graphs/smartcitylab_lobby_gt_cli.json \
--run-id smartcitylab_lobby_gt_cli \
--verboseFor the 32B comparison run:
export VLM_MODEL=qwen2.5vl:32b
export VLM_TIMEOUT_SECONDS=1800
view-aware-graph extract \
--image data/raw/smartcitylab_lobby/SmartCityLab_Lobby_GT.jpg \
--config configs/default.toml \
--output data/processed/view_graphs/smartcitylab_lobby_gt_qwen2_5vl_32b_cli.json \
--run-id smartcitylab_lobby_gt_qwen2_5vl_32b_cli \
--verboseGenerated raw and processed outputs are written under ignored local directories:
data/interim/vlm_raw/
data/processed/view_graphs/
The CLI also writes a PNG graph overlay next to the graph JSON by default:
data/processed/view_graphs/<run_id>.png
data/processed/view_graphs/<run_id>.report.json
Summarize an existing graph without another VLM call:
view-aware-graph summarize \
--graph data/processed/view_graphs/smartcitylab_lobby_gt_qwen2_5vl_32b_cli.json \
--report data/processed/view_graphs/smartcitylab_lobby_gt_qwen2_5vl_32b_cli.report.json- Project rules:
docs/en/CONVENTIONS.md - Graph scheme:
docs/en/VIEW_AWARE_GRAPH_SCHEME.md - Annotation guide:
docs/en/ANNOTATION_GUIDE.md - CLI workflow:
docs/en/CLI_WORKFLOW.md - Metric plan:
docs/en/METRIC_PLAN.md - VLM interface:
docs/en/VLM_INTERFACE.md - Model candidates:
docs/en/VLM_MODEL_CANDIDATES.md - Evaluation rubric:
docs/en/VLM_EVALUATION_RUBRIC.md - Current tasks:
docs/en/TODO.md
Local raw images, raw VLM responses, generated graph outputs, and Korean local documentation are ignored by Git by default. Only small, safe examples should be committed intentionally.