prism-annotator

A CLI tool for automatic PRISM annotation of medical/clinical texts using LLMs.

PRISM (Problem-oriented, Real-time, Informatics-based, Structured, Medical record) defines a schema for annotating medical entities (diseases, symptoms, anatomical parts, tests, medications, etc.) and their relations (temporal, spatial, causal) in clinical text.

prism-annotator uses the TANL (Translation between Augmented Natural Languages; Paolini et al., ICLR 2021) inline annotation format with LLMs to extract structured PRISM annotations from free-text medical documents.

Installation

pip install prism-annotator

Or with uv:

uv add prism-annotator

Quick Start

1. Scaffold a new project

prism init my-project --language en
cd my-project

This creates:

config.yaml — extraction configuration
prompts/ — system prompts and few-shot examples (customise these)
data/ — place your input texts here
.env.example — API key template

2. Add your input data

Place .txt files in data/, or point config.yaml at a CSV file:

data:
  input_path: "data/"          # directory of .txt files
  # input_path: "data/notes.csv"  # or a CSV file
  # text_column: "text"           # CSV column name
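
The two input modes above could be read roughly as follows. This is a minimal sketch, not the tool's actual loader: `load_documents` is a hypothetical helper, and using the file stem or the CSV row index as the document ID is an assumption.

```python
import csv
from pathlib import Path


def load_documents(input_path, text_column="text"):
    """Return (doc_id, text) pairs from a directory of .txt files or a CSV file.

    Hypothetical helper for illustration; prism-annotator's own loader may differ.
    """
    path = Path(input_path)
    if path.is_dir():
        # One document per .txt file, read in a stable (sorted) order.
        return [
            (p.stem, p.read_text(encoding="utf-8"))
            for p in sorted(path.glob("*.txt"))
        ]
    if path.suffix.lower() == ".csv":
        with path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.DictReader(f))
        # The row index serves as the document ID here (an assumption).
        return [(str(i), row[text_column]) for i, row in enumerate(rows)]
    raise ValueError(f"Unsupported input_path: {input_path}")
```

Calling `load_documents("data/")` would read every `.txt` file in the directory, while `load_documents("data/notes.csv", text_column="text")` would read one document per CSV row.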

3. Add few-shot examples

Edit prompts/entity_examples.yaml with domain-specific examples:

- input: "Chest CT showed ground-glass opacity in the right lung."
  output: "[Chest CT | t-test(+)] showed [ground-glass opacity | d(+)] in the [right lung | a]."
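
To make the inline format concrete, the sketch below parses a `[text | label]` string like the example output above into plain text plus entity spans. It assumes non-nested brackets and treats the label as an opaque string; `parse_tanl` is a hypothetical name, not part of the prism-annotator API.

```python
import re

# Matches "[surface text | label]"; nested brackets are not handled in this sketch.
TANL_SPAN = re.compile(r"\[([^\[\]|]+?)\s*\|\s*([^\[\]|]+?)\s*\]")


def parse_tanl(annotated):
    """Return (plain_text, entities); entity offsets index into plain_text."""
    plain_parts = []
    entities = []
    cursor = 0      # position in the annotated string
    plain_len = 0   # length of the plain text rebuilt so far
    for m in TANL_SPAN.finditer(annotated):
        # Copy the unannotated text that precedes this span.
        before = annotated[cursor:m.start()]
        plain_parts.append(before)
        plain_len += len(before)
        text, label = m.group(1), m.group(2)
        entities.append({
            "text": text,
            "label": label,  # kept verbatim, e.g. "d(+)"
            "start": plain_len,
            "end": plain_len + len(text),
        })
        plain_parts.append(text)
        plain_len += len(text)
        cursor = m.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), entities


plain, ents = parse_tanl(
    "[Chest CT | t-test(+)] showed [ground-glass opacity | d(+)] in the [right lung | a]."
)
for e in ents:
    print(e["label"], repr(plain[e["start"]:e["end"]]))
```

Stripping the markup this way recovers the original input sentence, so each entity's offsets can be checked against the source text.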

4. Set your API key and run

export OPENROUTER_API_KEY=sk-...
prism extract --config config.yaml

CLI Commands

prism extract       Run entity or relation extraction
prism merge         Merge entity + relation results
prism visualise     Generate interactive HTML viewer
prism to-xml        Convert to PRISM inline XML
prism validate      Validate results against PRISM schema
prism init          Scaffold a new project

Multi-phase pipeline

PRISM annotation runs in three phases:

# Phase 1: Entity extraction
prism extract --config entity.yaml

# Phase 2a: Medical relation extraction
prism extract --config medical_rel.yaml --entity-results output/entity/results.json

# Phase 2b: Temporal relation extraction
prism extract --config time_rel.yaml --entity-results output/entity/results.json

# Phase 3: Merge
prism merge --entity output/entity/results.json \
            --medical-relation output/medical/results.json \
            --time-relation output/time/results.json \
            -o output/merged

# Generate viewer
prism visualise output/merged
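
The same sequence can be scripted. The sketch below only assembles the command lines shown above, using the config file names and output paths from this example; `pipeline_commands` and `run_pipeline` are hypothetical helpers, and running them requires the `prism` CLI to be installed.

```python
import subprocess


def pipeline_commands(out_root="output"):
    """Build the phase commands in order, using only flags shown in this README."""
    entity_results = f"{out_root}/entity/results.json"
    return [
        # Phase 1: entity extraction
        ["prism", "extract", "--config", "entity.yaml"],
        # Phase 2a and 2b: both relation phases read the Phase 1 entity results
        ["prism", "extract", "--config", "medical_rel.yaml",
         "--entity-results", entity_results],
        ["prism", "extract", "--config", "time_rel.yaml",
         "--entity-results", entity_results],
        # Phase 3: merge entity and relation results
        ["prism", "merge",
         "--entity", entity_results,
         "--medical-relation", f"{out_root}/medical/results.json",
         "--time-relation", f"{out_root}/time/results.json",
         "-o", f"{out_root}/merged"],
        # Viewer generation
        ["prism", "visualise", f"{out_root}/merged"],
    ]


def run_pipeline(out_root="output"):
    """Run each command in order, stopping at the first failure."""
    for cmd in pipeline_commands(out_root):
        subprocess.run(cmd, check=True)
```

Because `check=True` raises on a non-zero exit code, a failed entity extraction stops the script before the relation phases try to read its results.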

Configuration

See config.yaml generated by prism init for all options. Key settings:

Field	Type	Description
`data.input_path`	Path	Directory of `.txt` files or a `.csv` file
`data.text_column`	String	CSV column containing document text
`model.model_id`	String	LLM model ID (OpenRouter format)
`model.base_url`	URL	API endpoint (OpenRouter, OpenAI, etc.)
`prompts.language`	`ja`/`en`	Language for built-in system prompts
`prompts.prompts_dir`	Path	Custom prompts directory

Custom Prompts

The prompt fallback chain:

1. prompts_dir in config (if set)
2. prompts/ in working directory
3. Built-in defaults (Japanese or English)

Each phase has a system prompt (.md) and few-shot examples (.yaml):

Phase	System prompt	Examples
Entity	`entity_system.md`	`entity_examples.yaml`
Medical relation	`medical_relation_system.md`	`medical_relation_examples.yaml`
Time relation	`time_relation_system.md`	`time_relation_examples.yaml`
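
The fallback chain can be pictured as a lookup over candidate directories. This is a sketch under stated assumptions: `resolve_prompt` and its `builtin_dir` parameter are hypothetical, and it assumes the fallback is applied per file, which the tool may or may not do.

```python
from pathlib import Path


def resolve_prompt(filename, prompts_dir=None, cwd=".", builtin_dir=None):
    """Return the first existing copy of `filename` along the fallback chain.

    Hypothetical helper; per-file fallback is an assumption.
    """
    candidates = []
    if prompts_dir:
        # 1. prompts_dir from the config, if set
        candidates.append(Path(prompts_dir) / filename)
    # 2. prompts/ in the working directory
    candidates.append(Path(cwd) / "prompts" / filename)
    if builtin_dir:
        # 3. built-in defaults shipped with the package (location assumed)
        candidates.append(Path(builtin_dir) / filename)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"{filename} not found; tried: {tried}")
```

For example, `resolve_prompt("entity_system.md", prompts_dir="my_prompts")` would return the file under `my_prompts/` if present and otherwise fall through to `prompts/` in the working directory.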

Output Formats

JSON (results.json) — primary structured output
XML (results.xml) — PRISM inline-annotated XML
HTML (viewer.html) — interactive browser-based viewer
Statistics (stats.json) — entity/relation distribution

PRISM Schema (v8)

The schema defines 13 entity types and 10 relation types. See the PRISM Annotation Guidelines v8 for the full specification.

The PRISM annotation scheme was originally proposed in the following works:

Shuntaro Yada, Ayami Joh, Ribeka Tanaka, Fei Cheng, Eiji Aramaki, and Sadao Kurohashi. 2020. Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases. In Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC), pages 4565–4572, Marseille, France. European Language Resources Association. [ACL Anthology]

矢田竣太郎, 田中リベカ, Fei Cheng, 荒牧英治, 黒橋禎夫. 2022. 汎用的な臨床医学テキストアノテーション仕様およびガイドラインの策定：重篤肺疾患ドメインに着目して. 自然言語処理, 29(4), pp. 1165–1197. [J-STAGE]

Note that our "PRISM" acronym (i.e. Problem-oriented, Real-time, Informatics-based, Structured, Medical record) derives from a research funding scheme called PRISM (Public/Private R&D Investment Strategic Expansion PrograM), which originally supported the research project above.

The TANL inline annotation format used by this tool is adapted from:

Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, and Stefano Soatto. 2021. Structured Prediction as Translation between Augmented Natural Languages. In Proceedings of the Ninth International Conference on Learning Representations (ICLR). [OpenReview]

Supported LLM Providers

Any OpenAI-compatible API endpoint works. Configure in config.yaml:

model:
  model_id: "anthropic/claude-sonnet-4"   # OpenRouter
  base_url: "https://openrouter.ai/api/v1"
  api_key_env: "OPENROUTER_API_KEY"

model:
  model_id: "gpt-4o"                        # OpenAI direct
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"
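
As an illustration of how these three fields fit together, the sketch below builds (but does not send) a request for an OpenAI-compatible chat completions endpoint using only the standard library. `build_chat_request` is a hypothetical helper; how prism-annotator itself issues requests is not shown here.

```python
import json
import os
import urllib.request


def build_chat_request(model_cfg, messages):
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint.

    `model_cfg` mirrors the `model:` section of config.yaml. Hypothetical helper.
    """
    # api_key_env names the environment variable that holds the key.
    api_key = os.environ.get(model_cfg["api_key_env"])
    if not api_key:
        raise RuntimeError(f"Set {model_cfg['api_key_env']} before running")
    url = model_cfg["base_url"].rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model_cfg["model_id"], "messages": messages})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Switching providers then only changes the three config values; the request shape stays the same. Passing the result to `urllib.request.urlopen` would send it.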

Citation

If you use this tool, please cite the original PRISM annotation works:

@inproceedings{yada-etal-2020-towards,
    title = "Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases",
    author = "Yada, Shuntaro and Joh, Ayami and Tanaka, Ribeka and Cheng, Fei and Aramaki, Eiji and Kurohashi, Sadao",
    booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
    month = may,
    year = "2020",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2020.lrec-1.561/",
    pages = "4565--4572",
    isbn = "979-10-95546-34-4",
}

@article{yada-etal-2022-prism,
    title = "汎用的な臨床医学テキストアノテーション仕様およびガイドラインの策定：重篤肺疾患ドメインに着目して",
    author = "矢田, 竣太郎 and 田中, リベカ and Cheng, Fei and 荒牧, 英治 and 黒橋, 禎夫",
    journal = "自然言語処理",
    volume = "29",
    number = "4",
    pages = "1165--1197",
    year = "2022",
    url = "https://www.jstage.jst.go.jp/article/jnlp/29/4/29_1165/_article/-char/ja/",
}

See also CITATION.cff for machine-readable citation metadata.

Changelog

0.2.1

Fix: viewer entity highlights now align to the original source text instead of using TANL-stripped offsets, which caused the first character(s) of entities to be excluded or misaligned.

0.2.0

Initial release.

Licence

MIT
