A CLI tool for automatic PRISM annotation of medical/clinical texts using LLMs.
PRISM (Problem-oriented, Real-time, Informatics-based, Structured, Medical record) defines a schema for annotating medical entities (diseases, symptoms, anatomical parts, tests, medications, etc.) and their relations (temporal, spatial, causal) in clinical text.
prism-annotator prompts LLMs to emit the TANL (Translation between Augmented Natural Languages; Paolini et al., ICLR 2021) inline annotation format, from which it extracts structured PRISM annotations for free-text medical documents.
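To make the inline format concrete, the sketch below parses a TANL-style string into plain text plus entity spans. It is not prism-annotator's own parser: the `[surface | label]` syntax is inferred from the entity example in this README, and the names `parse_tanl`, `Entity`, and `TANL_SPAN` are hypothetical. Nested brackets and relation fields are not handled.

```python
import re
from typing import NamedTuple

# Assumed span syntax "[surface text | label]", inferred from the README example.
TANL_SPAN = re.compile(r"\[([^\[\]|]+?)\s*\|\s*([^\[\]|]+?)\s*\]")


class Entity(NamedTuple):
    text: str
    label: str
    start: int  # character offset in the plain (annotation-free) text
    end: int


def parse_tanl(annotated: str) -> tuple[str, list[Entity]]:
    """Return the plain text and the entities found in a TANL-style string."""
    plain_parts: list[str] = []
    entities: list[Entity] = []
    cursor = 0  # position in the annotated string
    length = 0  # length of the plain text built so far
    for match in TANL_SPAN.finditer(annotated):
        before = annotated[cursor:match.start()]
        plain_parts.append(before)
        length += len(before)
        surface, label = match.group(1), match.group(2)
        entities.append(Entity(surface, label, length, length + len(surface)))
        plain_parts.append(surface)
        length += len(surface)
        cursor = match.end()
    plain_parts.append(annotated[cursor:])
    return "".join(plain_parts), entities


if __name__ == "__main__":
    text, found = parse_tanl(
        "[Chest CT | t-test(+)] showed [ground-glass opacity | d(+)] "
        "in the [right lung | a]."
    )
    print(text)
    for entity in found:
        print(entity)
```

The offsets refer to the annotation-free text, so they match the source document only when the model reproduces the input verbatim.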
Install with pip:

    pip install prism-annotator

Or with uv:

    uv add prism-annotator

Scaffold a new project:

    prism init my-project --language en
    cd my-project

This creates:

- config.yaml: extraction configuration
- prompts/: system prompts and few-shot examples (customise these)
- data/: place your input texts here
- .env.example: API key template
Place .txt files in data/, or point config.yaml at a CSV file:

    data:
      input_path: "data/"             # directory of .txt files
      # input_path: "data/notes.csv"  # or a CSV file
      # text_column: "text"           # CSV column name

Edit prompts/entity_examples.yaml with domain-specific examples:

    - input: "Chest CT showed ground-glass opacity in the right lung."
      output: "[Chest CT | t-test(+)] showed [ground-glass opacity | d(+)] in the [right lung | a]."

Set your API key and run the extraction:

    export OPENROUTER_API_KEY=sk-...
    prism extract --config config.yaml

Available commands:

    prism extract    Run entity or relation extraction
    prism merge      Merge entity + relation results
    prism visualise  Generate interactive HTML viewer
    prism to-xml     Convert to PRISM inline XML
    prism validate   Validate results against PRISM schema
    prism init       Scaffold a new project
PRISM annotation runs in three phases:

    # Phase 1: Entity extraction
    prism extract --config entity.yaml

    # Phase 2a: Medical relation extraction
    prism extract --config medical_rel.yaml --entity-results output/entity/results.json

    # Phase 2b: Temporal relation extraction
    prism extract --config time_rel.yaml --entity-results output/entity/results.json

    # Phase 3: Merge
    prism merge --entity output/entity/results.json \
      --medical-relation output/medical/results.json \
      --time-relation output/time/results.json \
      -o output/merged

    # Generate viewer
    prism visualise output/merged

See the config.yaml generated by prism init for all options. Key settings:
| Setting | Type | Description |
|---|---|---|
| data.input_path | Path | Directory of .txt files or a .csv file |
| data.text_column | String | CSV column containing document text |
| model.model_id | String | LLM model ID (OpenRouter format) |
| model.base_url | URL | API endpoint (OpenRouter, OpenAI, etc.) |
| prompts.language | ja/en | Language for built-in system prompts |
| prompts.prompts_dir | Path | Custom prompts directory |
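The two data settings can be read as a simple dispatch on the input path. The following is a minimal sketch of that reading, not the tool's actual loader: `load_documents` is a hypothetical helper, and the choice of document IDs (file stem for .txt files, row index for CSV rows) is an assumption made for illustration.

```python
import csv
from pathlib import Path


def load_documents(input_path: str, text_column: str = "text") -> dict[str, str]:
    """Map a document ID to its text for a directory of .txt files or a CSV file."""
    path = Path(input_path)
    if path.is_dir():
        # One document per .txt file; the file stem serves as the ID (assumption).
        return {
            p.stem: p.read_text(encoding="utf-8")
            for p in sorted(path.glob("*.txt"))
        }
    if path.suffix.lower() == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            if text_column not in (reader.fieldnames or []):
                raise KeyError(f"column {text_column!r} not found in {path}")
            # The row index serves as the ID (assumption: no ID column is specified).
            return {str(i): row[text_column] for i, row in enumerate(reader)}
    raise ValueError(f"{input_path!r} is neither a directory nor a .csv file")
```

Calling `load_documents("data/")` would read every .txt file in the directory, while `load_documents("data/notes.csv", "text")` would read one document per CSV row.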
The prompt fallback chain:
1. prompts_dir in config (if set)
2. prompts/ in working directory
3. Built-in defaults (Japanese or English)
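The fallback chain amounts to a first-match lookup over three locations. The sketch below illustrates the idea under stated assumptions: `resolve_prompt` and its `builtin_dir` parameter are hypothetical, and the lookup is assumed to happen per file rather than per directory, which the source does not specify.

```python
from pathlib import Path
from typing import Optional


def resolve_prompt(
    filename: str,
    prompts_dir: Optional[str],
    working_dir: str,
    builtin_dir: str,
) -> Path:
    """Return the first existing copy of `filename` along the fallback chain."""
    candidates = []
    if prompts_dir:  # 1. prompts_dir from config, if set
        candidates.append(Path(prompts_dir) / filename)
    # 2. prompts/ in the working directory
    candidates.append(Path(working_dir) / "prompts" / filename)
    # 3. built-in defaults (hypothetical location of the packaged prompts)
    candidates.append(Path(builtin_dir) / filename)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"{filename} not found in: {tried}")
```

With a per-file lookup like this, a project could override only entity_system.md and still pick up the remaining prompts from the defaults.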
Each phase has a system prompt (.md) and few-shot examples (.yaml):
| Phase | System prompt | Examples |
|---|---|---|
| Entity | entity_system.md | entity_examples.yaml |
| Medical relation | medical_relation_system.md | medical_relation_examples.yaml |
| Time relation | time_relation_system.md | time_relation_examples.yaml |
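When customising few-shot examples, a quick sanity check is that each annotated output still reproduces its input once the annotations are removed. The sketch below shows one way to do this for entity examples. It assumes the `[surface | label]` syntax from the entity example in this README, takes examples as already-loaded dictionaries with `input` and `output` keys (loading the YAML file is left out), and uses hypothetical helper names.

```python
import re

# Assumed span syntax "[surface | label]", taken from the README's entity example.
SPAN = re.compile(r"\[([^\[\]|]+?)\s*\|[^\[\]]*\]")


def strip_annotations(annotated: str) -> str:
    """Replace every "[surface | label]" span with its surface text."""
    return SPAN.sub(lambda match: match.group(1), annotated)


def find_mismatched_examples(examples: list[dict[str, str]]) -> list[int]:
    """Return indices of examples whose output does not reproduce the input."""
    return [
        index
        for index, example in enumerate(examples)
        if strip_annotations(example["output"]) != example["input"]
    ]
```

An example whose output paraphrases or drops part of the input would be reported by index, so it can be fixed before it is shown to the model.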
- JSON (results.json): primary structured output
- XML (results.xml): PRISM inline-annotated XML
- HTML (viewer.html): interactive browser-based viewer
- Statistics (stats.json): entity/relation distribution
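As an illustration of what an entity/relation distribution involves, the sketch below counts labels over a list of annotated documents. The record layout is entirely hypothetical (the `entities`, `relations`, `label`, and `type` keys are placeholders, not the documented results.json schema), so treat it as a pattern rather than a reader for the tool's real output.

```python
from collections import Counter


def label_distribution(documents: list[dict]) -> dict[str, dict[str, int]]:
    """Count entity labels and relation types across documents."""
    entity_counts: Counter = Counter()
    relation_counts: Counter = Counter()
    for doc in documents:
        # Key names below are placeholders for illustration only.
        entity_counts.update(e["label"] for e in doc.get("entities", []))
        relation_counts.update(r["type"] for r in doc.get("relations", []))
    return {"entities": dict(entity_counts), "relations": dict(relation_counts)}
```

Adapting it to real output would only require changing the key names to match the actual JSON structure.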
PRISM defines 13 entity types and 10 relation types. See the PRISM Annotation Guidelines v8 for the full specification.
The PRISM annotation scheme was originally proposed in the following works:
Shuntaro Yada, Ayami Joh, Ribeka Tanaka, Fei Cheng, Eiji Aramaki, and Sadao Kurohashi. 2020. Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases. In Proceedings of the Twelfth Language Resources and Evaluation Conference (LREC), pages 4565–4572, Marseille, France. European Language Resources Association. [ACL Anthology]
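Shuntaro Yada, Ribeka Tanaka, Fei Cheng, Eiji Aramaki, and Sadao Kurohashi. 2022. 汎用的な臨床医学テキストアノテーション仕様およびガイドラインの策定:重篤肺疾患ドメインに着目して (in Japanese; roughly, "Developing a general-purpose clinical text annotation specification and guidelines: focusing on the critical lung disease domain"). 自然言語処理 (Journal of Natural Language Processing), 29(4), pp. 1165–1197. [J-STAGE]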
Note that our "PRISM" acronym (Problem-oriented, Real-time, Informatics-based, Structured, Medical record) derives from the name of a research funding scheme, PRISM (Public/Private R&D Investment Strategic Expansion PrograM), which originally supported the research project above.
The TANL inline annotation format used by this tool is adapted from:
Giovanni Paolini, Ben Athiwaratkun, Jason Krone, Jie Ma, Alessandro Achille, Rishita Anubhai, Cicero Nogueira dos Santos, Bing Xiang, and Stefano Soatto. 2021. Structured Prediction as Translation between Augmented Natural Languages. In Proceedings of the Ninth International Conference on Learning Representations (ICLR). [OpenReview]
Any OpenAI-compatible API endpoint works. Configure in config.yaml:

    model:
      model_id: "anthropic/claude-sonnet-4"  # OpenRouter
      base_url: "https://openrouter.ai/api/v1"
      api_key_env: "OPENROUTER_API_KEY"

Or, to call OpenAI directly:

    model:
      model_id: "gpt-4o"  # OpenAI direct
      base_url: "https://api.openai.com/v1"
      api_key_env: "OPENAI_API_KEY"

If you use this tool, please cite the original PRISM annotation works:

    @inproceedings{yada-etal-2020-towards,
      title = "Towards a Versatile Medical-Annotation Guideline Feasible Without Heavy Medical Knowledge: Starting From Critical Lung Diseases",
      author = "Yada, Shuntaro and Joh, Ayami and Tanaka, Ribeka and Cheng, Fei and Aramaki, Eiji and Kurohashi, Sadao",
      booktitle = "Proceedings of the Twelfth Language Resources and Evaluation Conference",
      month = may,
      year = "2020",
      address = "Marseille, France",
      publisher = "European Language Resources Association",
      url = "https://aclanthology.org/2020.lrec-1.561/",
      pages = "4565--4572",
      isbn = "979-10-95546-34-4",
    }

    @article{yada-etal-2022-prism,
      title = "汎用的な臨床医学テキストアノテーション仕様およびガイドラインの策定:重篤肺疾患ドメインに着目して",
      author = "矢田, 竣太郎 and 田中, リベカ and Cheng, Fei and 荒牧, 英治 and 黒橋, 禎夫",
      journal = "自然言語処理",
      volume = "29",
      number = "4",
      pages = "1165--1197",
      year = "2022",
      url = "https://www.jstage.jst.go.jp/article/jnlp/29/4/29_1165/_article/-char/ja/",
    }

See also CITATION.cff for machine-readable citation metadata.
- Fix: viewer entity highlights now align to the original source text instead of using TANL-stripped offsets, which caused the first character(s) of entities to be excluded or misaligned.
- Initial release.
MIT