SmartMatch: Multi-Agent Mood Board Generation via Graph-Augmented Retrieval and Multimodal Synthesis

Team 04

Qingyang Wang · Nandini Kodali · Caroline Delva · Xinzhou Li

Georgetown University — DSAN 6725: Applied Generative AI for Developers — Spring 2026

Abstract

Visual content selection is a recurring challenge for creatives and marketers, who must identify images that match not just a topic but a specific emotional tone, aesthetic intent, and compositional style. Conventional image search often falls short on such queries because abstract or emotionally rich language does not map naturally onto the visual feature spaces that retrieval models operate in.

SmartMatch is a multi-agent system that generates cohesive nine-image mood boards from free-form natural language. The pipeline comprises five stages: a Visual Concept Grounding Agent that uses Claude to decompose user intent into structured visual descriptors; a Hybrid Retrieval system combining SigLIP-2 visual embeddings with per-field OpenAI text embeddings over 25,000 Unsplash images; a Graph RAG Agent that builds a knowledge graph over the corpus and performs candidate deduplication, expansion, and reranking; a Multimodal Verification and Coherence Agent that selects a visually consistent final set; and a Justification Agent that produces natural-language explanations for each image alongside a board-level narrative. When retrieval scores fall below a threshold, the system falls back to gpt-image-1 with Claude-driven diverse prompt synthesis.

LLM-as-judge evaluation across 50 diverse queries yields an overall mean score of 3.45 / 5.0, with relevance at 4.18, coherence at 3.06, and aesthetics at 3.16.

Pipeline

User Input (text + optional images)
  → [1] Input Guardrail
  → [2] Visual Concept Grounding Agent (Claude)
       → visual_description, scene, mood, style, lighting, color_palette, intent

  Branch A (uploaded images)
       → Generation Agent (gpt-image-1, editing mode)

  Branch B (text-only)
       → Hybrid Retrieval (SigLIP-2 × 0.3 + Field Text × 0.7, top-20)
       → Graph RAG: deduplicate → expand → rerank
       → score ≥ 0.5?
           YES → Multimodal Verification → Coherence Agent
           NO  → Generation Agent (diverse prompt synthesis)

  → Justification Agent (per-image + board summary)
  → Output Guardrail
  → Mood Board UI (like/dislike · chat refinement · download)
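
The Branch B scoring and routing step can be sketched as follows. This is a minimal illustration, not the project's code: the 0.3 / 0.7 weights, top-20 cutoff, and 0.5 threshold come from the diagram above, while the function names and the choice to route on the best candidate's raw hybrid score are assumptions.

```python
from typing import Dict, List, Tuple

VISUAL_WEIGHT = 0.3      # SigLIP-2 weight, from the diagram
TEXT_WEIGHT = 0.7        # field-text weight, from the diagram
TOP_K = 20               # candidates passed on to Graph RAG
ROUTE_THRESHOLD = 0.5    # below this, fall back to generation


def hybrid_scores(
    visual: Dict[str, float], text: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Blend per-image similarities and return the top-k, best first.

    Hypothetical helper: inputs map image id -> cosine similarity.
    An image missing from one modality contributes 0.0 for it.
    """
    ids = set(visual) | set(text)
    blended = {
        i: VISUAL_WEIGHT * visual.get(i, 0.0) + TEXT_WEIGHT * text.get(i, 0.0)
        for i in ids
    }
    ranked = sorted(blended.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:TOP_K]


def route(ranked: List[Tuple[str, float]]) -> str:
    """Pick a branch from the raw (pre-Graph RAG) hybrid score.

    Assumption: the score compared to the threshold is the best
    candidate's score; the project may aggregate differently.
    """
    best = ranked[0][1] if ranked else 0.0
    return "retrieval" if best >= ROUTE_THRESHOLD else "generation"
```

Routing on the score before reranking matches the design described under Key Contributions: a graph-based rerank can raise scores, so the fallback decision is made on the unmodified retrieval signal.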

System Components

Component	Description	Technology
Visual Concept Grounding	Converts abstract text into structured visual descriptors	Claude (Haiku)
Hybrid Retrieval	Cosine similarity over SigLIP-2 embeddings + per-field text embeddings	SigLIP-2 + OpenAI + FAISS
Graph RAG	Knowledge graph over 25,000 images; dedup, expand, rerank	FAISS + NetworkX
Multimodal Verification	Filters candidates against query mood/palette/intent	Claude Vision
Coherence Agent	Selects final 9 images balancing consistency and diversity	Claude
Generation Agent	Synthesizes images with diverse prompt synthesis + quality retry	gpt-image-1
Justification Agent	Per-image explanations + board narrative	Claude
Multi-turn Refinement	Chat interface: like/dislike signals + natural-language feedback steer retrieval across turns	MemoryManager + Streamlit
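
As a rough illustration of the Graph RAG row (dedup, expand, rerank), the sketch below uses a plain dictionary as a stand-in for the NetworkX graph. Everything here is an assumption made for illustration: the function names, the near-duplicate cutoff `dup_weight`, the `per_node` expansion limit, and the `alpha` connectivity bonus are hypothetical and are not taken from the project.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]   # node -> {neighbor: edge weight in [0, 1]}
Scored = List[Tuple[str, float]]      # (image id, score)


def deduplicate(cands: Scored, graph: Graph, dup_weight: float = 0.95) -> Scored:
    """Keep the best-scoring image from each group of near-duplicates."""
    kept: Scored = []
    for img, score in sorted(cands, key=lambda c: c[1], reverse=True):
        edges = graph.get(img, {})
        if all(edges.get(k, 0.0) < dup_weight for k, _ in kept):
            kept.append((img, score))
    return kept


def expand(cands: Scored, graph: Graph, per_node: int = 2,
           dup_weight: float = 0.95) -> Scored:
    """Add each candidate's strongest non-duplicate neighbors."""
    scores = dict(cands)
    seeds = set(scores)
    for img, score in cands:
        neighbors = sorted(graph.get(img, {}).items(),
                           key=lambda kv: kv[1], reverse=True)
        added = 0
        for nb, weight in neighbors:
            if added == per_node:
                break
            if nb in seeds or weight >= dup_weight:
                continue
            # A neighbor inherits a discounted score from its parent.
            scores[nb] = max(scores.get(nb, 0.0), score * weight)
            added += 1
    return list(scores.items())


def rerank(cands: Scored, graph: Graph, alpha: float = 0.5) -> Scored:
    """Boost images that are well connected to the rest of the set."""
    ids = [i for i, _ in cands]
    out: Scored = []
    for img, score in cands:
        others = [o for o in ids if o != img]
        edges = graph.get(img, {})
        conn = sum(edges.get(o, 0.0) for o in others) / len(others) if others else 0.0
        out.append((img, score + alpha * conn))
    return sorted(out, key=lambda c: c[1], reverse=True)
```

The three functions are meant to be chained as `rerank(expand(deduplicate(cands, g), g), g)`, mirroring the order in the pipeline diagram.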

Running Locally

Prerequisites

Python 3.10+
API keys: Anthropic, OpenAI, HuggingFace

1. Install dependencies

pip install -r requirements.txt
playwright install chromium

2. Set up environment variables

cp .env.example .env

Fill in .env:

ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
HF_TOKEN=hf_...
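
A quick way to confirm the three keys are visible before launching is a check like the one below. It is a small sketch, not part of the repository: `missing_keys` is a hypothetical helper, and it only inspects the process environment, so it assumes the values from .env have already been exported or loaded.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "HF_TOKEN")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required keys that are unset or blank."""
    env = os.environ if env is None else env
    return [k for k in REQUIRED_KEYS if not env.get(k, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    print("All keys set." if not missing else f"Missing: {', '.join(missing)}")
```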

3. Download data files

Large data files are hosted on HuggingFace and auto-downloaded on first startup if HF_TOKEN is set. To download manually:

python download_data.py

Downloads into src/data/ (~1.3 GB total, based on the sizes listed below):

File	Size
`embeddings/image_embeddings.npy`	154 MB
`embeddings/field_embeddings.npz`	922 MB
`graph/image_graph.pkl`	~200 MB
`processed/dataset_clean.csv`	—
`processed/description_grounding_outputs.json`	—
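
To verify that the download completed, the files in the table can be checked with a short script. This is a sketch under the assumption that the paths in the table are relative to src/data/; `missing_data_files` is a hypothetical helper and is not part of download_data.py.

```python
from pathlib import Path
from typing import List

# Relative paths copied from the table above.
EXPECTED_FILES = (
    "embeddings/image_embeddings.npy",
    "embeddings/field_embeddings.npz",
    "graph/image_graph.pkl",
    "processed/dataset_clean.csv",
    "processed/description_grounding_outputs.json",
)


def missing_data_files(data_dir: str = "src/data") -> List[str]:
    """Return the expected files that are not present under data_dir."""
    root = Path(data_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All data files present.")
```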

4. Run the app

streamlit run src/app.py

Open http://localhost:8501 and describe a mood, feeling, or idea to generate a mood board.

Deploying to HuggingFace Spaces

Live demo: huggingface.co/spaces/NandiniKodali/smartmatch

First-time setup

1. Create a Space (SDK: Streamlit, Hardware: CPU Basic).
2. Add secrets in Space Settings → Variables and Secrets:
   - ANTHROPIC_API_KEY
   - OPENAI_API_KEY
   - HF_TOKEN
3. Add the remote:

git remote add space https://huggingface.co/spaces/NandiniKodali/smartmatch

Redeploy after changes

HF Spaces rejects binary/large files via standard git push. Use a clean orphan branch:

git checkout --orphan space-deploy
git add -A
git rm --cached -r outputs/ ProjectPaper/ spring-2025/ spring-2026/ "src/agents/moodboard_layout/rendered_templates/"
git commit -m "deploy: your message here"
git push space space-deploy:main --force
git checkout -f main
git branch -D space-deploy

On first startup the Space auto-downloads embedding files (~1 min). Subsequent loads are fast.

Project Structure

src/
├── app.py                          # Streamlit UI + multi-turn chat
├── api/server.py                   # FastAPI server
├── pipeline/
│   ├── state.py                    # Pydantic models (GroundingOutput, ImageResult, MoodBoardBundle)
│   ├── logger.py                   # Structured JSONL pipeline logger
│   └── run_pipeline.py             # Pipeline runner
├── agents/
│   ├── orchestrator/               # Coordinates all pipeline stages
│   ├── guardrails/                 # Input / output safety checks
│   ├── qwen_visual_grounding/      # Visual Concept Grounding Agent + Justification Agent
│   ├── siglip_image_retrieval/     # SigLIP-2 visual embedding + FAISS retrieval
│   ├── field_text_retrieval/       # Per-field text embedding + hybrid scoring
│   ├── graph_rag/                  # Graph RAG: dedup → expand → rerank
│   ├── multimodal_verification/    # Claude Vision candidate filtering
│   ├── coherence/                  # Final 9-image coherence selection
│   ├── generation/                 # gpt-image-1 generation + diverse prompt synthesis
│   ├── moodboard_layout/           # HTML template
│   ├── memory/memory_manager.py    # Cross-turn grounding state
│   └── content_router/             # Branch A / B routing logic
├── data/
│   ├── data_prep/                  # Graph construction, embedding generation, data cleaning
│   ├── embeddings/                 # image_embeddings.npy, field_embeddings.npz (HF-hosted)
│   ├── graph/                      # image_graph.pkl (HF-hosted)
│   └── processed/                  # dataset_clean.csv, description_grounding_outputs.json
├── evaluation/
│   ├── llm_judge.py                # LLM-as-judge scorer (50-query evaluation)
│   └── ablation.py                 # Ablation study runner
└── tools/                          # Diagnostics, patch scripts, comparison utilities
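
The Pydantic models in pipeline/state.py are not reproduced in this README. As a rough picture of what the grounding stage hands to retrieval, the sketch below models the descriptor fields listed in the pipeline diagram with a standard-library dataclass. The class name `GroundingSketch`, the field types, and the `field_texts` helper are assumptions for illustration; the real `GroundingOutput` may differ.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class GroundingSketch:
    """Illustrative stand-in for GroundingOutput (field types are assumed)."""

    visual_description: str
    scene: str
    mood: str
    style: str
    lighting: str
    color_palette: List[str] = field(default_factory=list)
    intent: str = ""

    def field_texts(self) -> Dict[str, str]:
        """Return one text per non-empty field, e.g. for per-field embedding."""
        out: Dict[str, str] = {}
        for name, value in asdict(self).items():
            text = ", ".join(value) if isinstance(value, list) else value
            if text:
                out[name] = text
        return out
```

Keeping each descriptor as a separate text is what makes per-field scoring possible: a query's mood can be compared against image mood descriptions independently of, say, lighting.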

Key Contributions

Visual Concept Grounding — Claude converts abstract user text into structured visual descriptors (mood, palette, lighting, intent), directly addressing SigLIP-2's weakness on non-literal language.
Graph RAG over image corpus — A weighted knowledge graph (750K edges, avg degree 30) enables connectivity-based reranking that improves coherence beyond flat similarity search.
Routing with pre-Graph RAG score — Fallback to generation is triggered by the raw hybrid score, preventing artificially inflated post-reranking scores from masking low retrieval quality.
Diverse prompt synthesis + quality retry — Claude generates visually distinct prompts before generation; images scoring below threshold are individually re-prompted without discarding the full batch.
Multi-turn refinement loop — Like/dislike signals and natural-language chat feedback are folded into subsequent grounding calls, steering retrieval toward the user's aesthetic intent across turns.
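
The per-image quality retry described above can be sketched as a small loop. This is an illustration under stated assumptions, not the Generation Agent's code: `generate`, `score`, and `revise` are hypothetical callables standing in for the image model, the quality scorer, and the prompt rewriter, and the `threshold` and `max_retries` defaults are invented for the example.

```python
from typing import Callable, List, Tuple, TypeVar

Image = TypeVar("Image")


def generate_with_retry(
    prompts: List[str],
    generate: Callable[[str], Image],
    score: Callable[[Image], float],
    revise: Callable[[str, int], str],
    threshold: float = 0.6,
    max_retries: int = 2,
) -> List[Tuple[Image, float]]:
    """Generate one image per prompt, retrying only the weak ones.

    Each low-scoring image is re-prompted on its own, so images that
    already pass the threshold are never regenerated.
    """
    results: List[Tuple[Image, float]] = []
    for prompt in prompts:
        image = generate(prompt)
        quality = score(image)
        for attempt in range(1, max_retries + 1):
            if quality >= threshold:
                break
            candidate = generate(revise(prompt, attempt))
            candidate_quality = score(candidate)
            # Keep whichever version scored best so far.
            if candidate_quality > quality:
                image, quality = candidate, candidate_quality
        results.append((image, quality))
    return results
```

Retrying per image rather than per batch keeps the number of generation calls proportional to the number of failures, which matters when each call is slow and billed.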

Deliverables

Item	File
Paper	`Deliverables/FinalPaper.pdf`
Slides	`Deliverables/SmartMatch-slides.pdf`
Poster	`Deliverables/Poster.pdf`
Live demo	huggingface.co/spaces/NandiniKodali/smartmatch
