ArtSeek

Official code for "ArtSeek: Deep artwork understanding via multimodal in-context reasoning and late interaction retrieval".

Summary

ArtSeek is a multimodal system for understanding artworks built on three components:

Multimodal Retrieval — Retrieves relevant information from 5M+ multimodal fragments of Wikipedia's visual arts section via a late interaction mechanism built on ColQwen2.
Late Interaction Classification Network (LICN) — Predicts artwork attributes (artist, genre, style) by combining multimodal retrieval with a multi-head classifier.
In-Context Reasoning — A Qwen2.5-VL-32B MLLM reasons over retrieved fragments and predicted attributes to answer open-ended questions about artworks.
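Late interaction scores a query against a fragment by matching each query token embedding to its most similar fragment token embedding, then summing those maxima. This is the ColBERT-style MaxSim operator that ColQwen2 builds on. The plain-Python sketch below illustrates only the scoring rule on toy vectors. It is not ArtSeek's retriever, and the function names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction (MaxSim) score: take each query token's best match
    among the document tokens, then sum over the query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def rank(query_tokens: List[Vector], docs: Dict[str, List[Vector]]) -> List[str]:
    """Return document ids sorted by descending MaxSim score."""
    return sorted(docs, key=lambda k: maxsim_score(query_tokens, docs[k]), reverse=True)


# Toy 2-d token embeddings (hypothetical values, already L2-normalised).
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "fragment_a": [[1.0, 0.0], [0.6, 0.8]],  # score 1.0 + 0.8 = 1.8
    "fragment_b": [[0.0, 1.0], [0.0, 1.0]],  # score 0.0 + 1.0 = 1.0
}
print(rank(query, docs))  # ['fragment_a', 'fragment_b']
```

Because every query token is free to match a different fragment token, a fragment can score well by covering different parts of the query with different patches or words. A single pooled vector cannot express this.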

Installation

Note

Developed on: 8 CPUs, 1× NVIDIA A100 (CUDA 12.6), 128 GB RAM.

uv sync
uv pip install flash-attn --no-build-isolation
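After these two commands, you can confirm that the compiled flash-attn package is visible to the interpreter. The check looks modules up without importing them. This is a convenience sketch and not part of the repository. `flash_attn` is the package's import name, `torch` is assumed to be among the dependencies installed by `uv sync`, and the file name `check_env.py` is hypothetical.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found (not imported)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Run inside the project environment, e.g. `uv run python check_env.py`.
    for name, ok in check_modules(["torch", "flash_attn"]).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```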

Qdrant Setup

ArtSeek uses a Qdrant vector store. On HPC systems, build Qdrant from source:

# Requires Rust 1.87.0+, LLVM/Clang (e.g. via Spack)
spack load gcc
spack load llvm
export LLVM_ROOT=$(spack location -i llvm)
export LIBCLANG_PATH=$LLVM_ROOT/lib

# Install Protobuf locally
mkdir -p ./proto_bin
curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v25.1/protoc-25.1-linux-x86_64.zip
unzip -o protoc-25.1-linux-x86_64.zip -d ./proto_bin
export PROTOC=$(pwd)/proto_bin/bin/protoc
export PATH=$(pwd)/proto_bin/bin:$PATH

# Build Qdrant
git clone https://github.com/qdrant/qdrant.git
cd qdrant && git checkout v1.17.0
RUSTFLAGS="-C target-cpu=native" cargo build --release --bin qdrant
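Cargo places the built binary at `target/release/qdrant`. By default, Qdrant serves its REST API on port 6333. The standard-library sketch below is not part of ArtSeek. It checks that the server is up and lists its collections, assuming the default host and port and the response shape of Qdrant's `GET /collections` endpoint.

```python
import json
import urllib.request
from typing import List, Optional


def qdrant_collections(base_url: str = "http://localhost:6333",
                       timeout: float = 5.0) -> Optional[List[str]]:
    """Return the collection names reported by Qdrant's REST API,
    or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/collections", timeout=timeout) as resp:
            payload = json.load(resp)
    except OSError:  # covers URLError, refused connections and timeouts
        return None
    # Assumed response shape: {"result": {"collections": [{"name": ...}, ...]}, ...}
    return [c["name"] for c in payload["result"]["collections"]]


if __name__ == "__main__":
    names = qdrant_collections()
    if names is None:
        print("Qdrant is not reachable; start it with ./target/release/qdrant")
    else:
        print("collections:", names)
```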

Try ArtSeek

Note

Running the full pipeline requires the Qdrant server with the wikifragments-visual-arts-embeds collection loaded. The store requires ~250 GB of disk space.
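Run a quick pre-flight check before downloading and ingesting anything. The sketch below compares the free space on a target filesystem against the ~250 GB figure above. The path is a placeholder for wherever Qdrant will keep its storage.

```python
import shutil

GB = 10 ** 9  # decimal gigabytes, matching the "~250 GB" figure above


def has_free_space(path: str, required_gb: float = 250) -> bool:
    """True if the filesystem containing `path` has at least `required_gb` GB free."""
    return shutil.disk_usage(path).free >= required_gb * GB


if __name__ == "__main__":
    # "." is a placeholder: point this at the directory Qdrant will store data in.
    print(has_free_space("."))
```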

1. Download datasets and models

Download the embeddings dataset from Hugging Face:

from datasets import load_dataset
ds = load_dataset("cilabuniba/wikifragments-visual-arts-embeds", num_proc=4)

Download the LICN classifier:

hf download cilabuniba/artseek-licn

2. Build the Qdrant store

Start your Qdrant server, then ingest the embeddings. We provide launches/make_qdrant_store.sh to reproduce our setup (4 GPUs, 8 CPUs per process):

python -m artseek.data.main make-qdrant-store --process-idx 0 --num-proc 1

Increase --num-proc and run multiple processes in parallel (index 0 to N-1) to speed up ingestion (~2.5 hours with 4 processes).
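The reference launcher is launches/make_qdrant_store.sh. As a local alternative, the Python sketch below starts one ingestion process for each index from 0 to N-1, using only the two flags shown above. Unlike a cluster launch script, it does not assign GPUs or CPU cores to each process.

```python
import subprocess
import sys
from typing import List


def ingestion_commands(num_proc: int) -> List[List[str]]:
    """One make-qdrant-store command per process index 0..num_proc-1."""
    return [
        [sys.executable, "-m", "artseek.data.main", "make-qdrant-store",
         "--process-idx", str(i), "--num-proc", str(num_proc)]
        for i in range(num_proc)
    ]


def launch_all(num_proc: int) -> int:
    """Start every process in parallel, wait for all, return how many failed."""
    procs = [subprocess.Popen(cmd) for cmd in ingestion_commands(num_proc)]
    return sum(p.wait() != 0 for p in procs)


if __name__ == "__main__":
    # Print the commands; call launch_all(4) to actually run them.
    for cmd in ingestion_commands(4):
        print("python " + " ".join(cmd[1:]))
```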

3. Add the Qdrant index

python -m artseek.data.main add-qdrant-index

⏱ This step takes approximately 8 hours.
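This step runs for hours, so you may prefer to poll the collection rather than watch logs. The sketch below assumes that Qdrant's `GET /collections/{name}` returns a `result.status` field that reads "green" once optimization and indexing have finished. The polling interval and limit are arbitrary choices.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_collection_info(name: str, base_url: str = "http://localhost:6333") -> dict:
    """GET /collections/{name} from Qdrant's REST API and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/collections/{name}", timeout=10) as resp:
        return json.load(resp)


def wait_until_green(fetch: Callable[[], dict], poll_seconds: float = 300.0,
                     max_polls: int = 200) -> bool:
    """Poll until the collection status is "green" (no optimization running).

    Returns False if max_polls checks pass without reaching green.
    """
    for attempt in range(max_polls):
        if fetch()["result"]["status"] == "green":
            return True
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return False


# Usage (requires the running server):
# wait_until_green(lambda: fetch_collection_info("wikifragments-visual-arts-embeds"))
```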

4. Configure the model path

In artseek/method/generate/pipe.py, set licn_pretrained_path to the local snapshot path of the downloaded artseek-licn model:

MODEL = Qwen2_5_VLRAGModel(
    retriever_pretrained_model_name_or_path=get_models_dir() / "colqwen2-v1.0",
    retriever_collection_name="wikifragments-visual-arts-embeds",
    model_pretrained_model_name_or_path="Qwen/Qwen2.5-VL-32B-Instruct-AWQ",
    licn_pretrained_path="data_/hf/hub/models--cilabuniba--artseek-licn/snapshots/<your-snapshot-hash>",
)
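You can look up `<your-snapshot-hash>` programmatically instead of copying it by hand. In the Hugging Face hub cache, `models--{org}--{name}/refs/main` stores the commit hash, and the files live under `snapshots/{hash}/`. Alternatively, `huggingface_hub.snapshot_download(repo_id="cilabuniba/artseek-licn")` returns the local snapshot directory directly. The helper below is a hypothetical convenience and is not part of ArtSeek.

```python
from pathlib import Path


def resolve_snapshot(hub_dir, repo_id: str, revision: str = "main") -> Path:
    """Return the local snapshot directory of a cached Hugging Face model repo.

    Follows the hub cache layout: models--{org}--{name}/refs/{revision}
    holds the commit hash, and the files live under snapshots/{hash}/.
    """
    repo_dir = Path(hub_dir) / ("models--" + repo_id.replace("/", "--"))
    commit = (repo_dir / "refs" / revision).read_text().strip()
    snapshot = repo_dir / "snapshots" / commit
    if not snapshot.is_dir():
        raise FileNotFoundError(snapshot)
    return snapshot


# Usage, with the cache location from the example above:
# licn_path = resolve_snapshot("data_/hf/hub", "cilabuniba/artseek-licn")
```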

5. Run the notebook

Open try.ipynb to interact with ArtSeek. You can toggle classification and retrieval via the classify and retrieve parameters in build_graph.

Train the LICN Module

accelerate launch -m artseek.method.classify.train_li_classification_network train \
  --config-path models/configs/classify/li_classification_network_tft.yaml

Evaluation

Retrieval

Run artseek/method/retrieve/eval.py. Create a separate data store for each configuration you want to evaluate.
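The README does not list the metrics that eval.py reports. For orientation, the sketch below computes two standard retrieval metrics, Recall@k and mean reciprocal rank (MRR), from ranked result ids. The input format (dicts keyed by query id) is hypothetical and may differ from eval.py's.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant ids that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result (0.0 if none is retrieved)."""
    for position, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0


def evaluate(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]],
             k: int = 10) -> Dict[str, float]:
    """Average Recall@k and MRR over every query that has relevance labels."""
    n = len(qrels)
    return {
        f"recall@{k}": sum(recall_at_k(runs.get(q, []), rel, k) for q, rel in qrels.items()) / n,
        "mrr": sum(reciprocal_rank(runs.get(q, []), rel) for q, rel in qrels.items()) / n,
    }


runs = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
qrels = {"q1": {"d1"}, "q2": {"d4", "d5"}}
print(evaluate(runs, qrels, k=2))  # recall@2 = 0.5, mrr = (1/2 + 1/3) / 2
```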

Classification

accelerate launch -m artseek.method.classify.train_li_classification_network test \
  --config-path models/configs/classify/li_classification_network_tft.yaml
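Because the LICN predicts several attributes at once, results are naturally reported per head. The sketch below computes plain accuracy separately for artist, genre and style. The record format and the choice of accuracy are assumptions, since the test command may report other metrics. The artwork labels are toy data.

```python
from typing import Dict, List

HEADS = ("artist", "genre", "style")  # attributes named in the Summary


def per_head_accuracy(preds: List[Dict[str, str]],
                      golds: List[Dict[str, str]]) -> Dict[str, float]:
    """Accuracy of each head over paired prediction / ground-truth records."""
    if not golds or len(preds) != len(golds):
        raise ValueError("need the same, non-zero number of predictions and labels")
    return {
        head: sum(p[head] == g[head] for p, g in zip(preds, golds)) / len(golds)
        for head in HEADS
    }


preds = [
    {"artist": "Claude Monet", "genre": "landscape", "style": "Impressionism"},
    {"artist": "Edgar Degas", "genre": "genre painting", "style": "Realism"},
]
golds = [
    {"artist": "Claude Monet", "genre": "landscape", "style": "Impressionism"},
    {"artist": "Pierre-Auguste Renoir", "genre": "genre painting", "style": "Impressionism"},
]
print(per_head_accuracy(preds, golds))  # {'artist': 0.5, 'genre': 1.0, 'style': 0.5}
```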

Text Generation

# Run inference
python -m artseek.method.generate.test inference \
  --config-path models/configs/generate/artpedia_short.yaml

# Compute NLP metrics
python -m artseek.method.generate.eval pred-message-to-str \
  --config-path models/configs/generate/artpedia_short.yaml

Note: consider disabling SPICE for large datasets such as PaintingForm, since it is slow to compute. See data/README.md for dataset setup instructions (ArtPedia, PaintingForm, SemArt v2.0).

Reproducibility

The sections below document the original data pipeline for full reproducibility. These steps are not needed to try ArtSeek — all datasets and models are available on Hugging Face.

Neo4j (ArtGraph)

Install Neo4j Community Edition 4.4.47.
Extract: tar -xzf neo4j-community-4.4.47-unix.tar.gz
Download the ArtGraph dump (artgraph2.0.dump).
Load the graph: ./neo4j-community-4.4.47/bin/neo4j-admin load --from=artgraph2.0.dump --database=neo4j --force
Download the APOC JAR (v4.4.0.24) and place it in the plugins/ folder.
Enable APOC procedures in conf/neo4j.conf.
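APOC is commonly enabled on Neo4j 4.4 by allowing `apoc.*` procedures to run unrestricted. The sketch below adds that setting to neo4j.conf only if no uncommented line already sets it. Treat it as an assumed minimum; depending on which APOC procedures the ArtGraph loader calls, you may need other settings.

```python
from pathlib import Path
from typing import List

# Assumed minimum for APOC on Neo4j 4.4: let apoc.* procedures run unrestricted.
APOC_SETTINGS = {"dbms.security.procedures.unrestricted": "apoc.*"}


def enable_apoc(conf_path) -> List[str]:
    """Append missing APOC settings to neo4j.conf and return the keys added.

    A key counts as present only if an uncommented line assigns it; its
    existing value is left untouched.
    """
    path = Path(conf_path)
    lines = path.read_text().splitlines()
    present = {
        line.split("=", 1)[0].strip()
        for line in lines
        if "=" in line and not line.lstrip().startswith("#")
    }
    added = [key for key in APOC_SETTINGS if key not in present]
    if added:
        lines += [f"{key}={APOC_SETTINGS[key]}" for key in added]
        path.write_text("\n".join(lines) + "\n")
    return added


# Usage: enable_apoc("neo4j-community-4.4.47/conf/neo4j.conf"), then restart Neo4j.
```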

Data Pipeline (in order)

All steps are run via artseek.data.main:

download-wikiart-images — Downloads WikiArt images (target: 116,475 images). Repeat until complete.
make-artgraph-dataset — Builds the multitask classification dataset from Neo4j. Requires ArtGraph running.
define-valid-labels-artgraph-dataset — Defines evaluation labels.
get-visual-arts-dataset-pages — Recursively collects Wikipedia pages under "Visual arts" (depth=5).
WikiExtractor (run from wikiextractor/ directory):
```
python WikiExtractorNew.py --json -s --lists --links \
    ../data/dumps/enwiki-latest-pages-articles.xml.bz2 -o text_en
```
This is a modified WikiExtractor that preserves image URLs and captions as <a> tags.
download-and-save-images-wikipedia — Downloads images from extracted Wikipedia pages.
create-wikifragments-dataset — Builds an HF dataset of Wikipedia paragraphs with attached images.
create-wikifragments-visual-arts-full-dataset — Filters to visual arts pages and builds fragment images.
colqwen-embed-new — Embeds fragments using ColQwen2 multi-vector representations (parallelizable).
make-qdrant-store — Ingests embeddings into Qdrant.
add-qdrant-index — Adds the HNSW index for efficient retrieval.
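The ordered steps above can be scripted as a checklist. The hypothetical runner below only collects the commands by default (`dry_run=True`). WikiExtractor is not a subcommand of artseek.data.main, so the runner splits the steps around it. Steps that accept extra flags, such as make-qdrant-store's --process-idx and --num-proc, are issued without arguments, which assumes their defaults are suitable. download-wikiart-images may still need re-running until all images are present.

```python
import subprocess
import sys
from typing import List

# Subcommands of artseek.data.main, in the order listed above.
BEFORE_WIKIEXTRACTOR = [
    "download-wikiart-images",
    "make-artgraph-dataset",  # needs the ArtGraph Neo4j instance running
    "define-valid-labels-artgraph-dataset",
    "get-visual-arts-dataset-pages",
]
AFTER_WIKIEXTRACTOR = [
    "download-and-save-images-wikipedia",
    "create-wikifragments-dataset",
    "create-wikifragments-visual-arts-full-dataset",
    "colqwen-embed-new",
    "make-qdrant-store",
    "add-qdrant-index",
]


def command(step: str) -> List[str]:
    return [sys.executable, "-m", "artseek.data.main", step]


def run_steps(steps: List[str], dry_run: bool = True) -> List[List[str]]:
    """Collect the commands for `steps`; unless dry_run, run them in order,
    stopping at the first failure (check=True raises CalledProcessError)."""
    cmds = [command(s) for s in steps]
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds


if __name__ == "__main__":
    for cmd in run_steps(BEFORE_WIKIEXTRACTOR):
        print("python " + " ".join(cmd[1:]))
    print("# run WikiExtractorNew.py from wikiextractor/ here")
    for cmd in run_steps(AFTER_WIKIEXTRACTOR):
        print("python " + " ".join(cmd[1:]))
```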
