This folder contains the code used to preprocess clinical notes and extract structured information using Azure OpenAI.

- main.py: Lightweight entrypoint for the note extraction workflow.
- client.py: Azure OpenAI client setup and authentication helper.
- templates.py: JSON-like extraction templates used by info_extractn.
- utils.py: Data cleanup and post-processing utility functions.
- preprocessing-notes.ipynb: Notebook workflow that imports the extraction utilities and processes notes.

main.py
- Exposes info_extractn() for calling the OpenAI extraction pipeline.
- Re-exports the shared helpers and templates.

client.py
- Configures openai.AzureOpenAI with Azure credentials and endpoint.
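
As a rough illustration of what client.py sets up, the sketch below builds an openai.AzureOpenAI client that authenticates through DefaultAzureCredential. It is not the repository's code: the helper names load_azure_settings and make_client, the choice of environment variables, and the default API version are all assumptions made for this example.

```python
import os


def load_azure_settings(env=None):
    """Collect keyword arguments for AzureOpenAI from environment variables.

    The variable names and the fallback API version are assumptions
    for this sketch, not values taken from client.py.
    """
    env = os.environ if env is None else env
    endpoint = env.get("AZURE_OPENAI_ENDPOINT")
    if not endpoint:
        raise RuntimeError("AZURE_OPENAI_ENDPOINT is not set")
    return {
        "azure_endpoint": endpoint.rstrip("/"),
        "api_version": env.get("OPENAI_API_VERSION", "2024-02-01"),
    }


def make_client(settings):
    """Build an AzureOpenAI client that authenticates via DefaultAzureCredential."""
    # Imported lazily so load_azure_settings works without the SDKs installed.
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider
    from openai import AzureOpenAI

    # Token scope used for Azure OpenAI (Cognitive Services) resources.
    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    return AzureOpenAI(azure_ad_token_provider=token_provider, **settings)
```

With the endpoint exported in the shell, make_client(load_azure_settings()) would return a client without any API key appearing in the code.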

templates.py
- Stores template and template_biomarker definitions used for JSON extraction.
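
To make the idea of a JSON-like template concrete, here is a minimal sketch of how a template could drive parsing of a model response. The field names in example_template and the helper parse_extraction are hypothetical; the actual contents of template and template_biomarker are not shown in this README and may be structured differently.

```python
import json

# Hypothetical template: field name -> instruction for the model.
# The real definitions live in templates.py and may look quite different.
example_template = {
    "diagnosis": "Primary diagnosis stated in the note, or null",
    "visit_date": "Date of the visit in YYYY-MM-DD format, or null",
}


def parse_extraction(raw_response, template):
    """Parse a JSON response and keep exactly the fields the template names."""
    data = json.loads(raw_response)
    # Missing fields become None; fields not in the template are dropped.
    return {field: data.get(field) for field in template}
```

Keeping the output keyed by the template means every extracted record has the same columns, which simplifies building a DataFrame afterwards.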

utils.py
- Provides note formatting and cleaning helpers:
  - format_notes
  - extract_suvr_values
  - replace_missing
  - check_qc
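
The signatures of these helpers are not documented here. Purely as an illustration of the kind of cleanup involved, the sketch below pulls SUVR numbers out of free text with a regular expression. The function extract_suvr_sketch is a hypothetical stand-in, not the extract_suvr_values implementation in utils.py, and it assumes each value is written as a number shortly after the token "SUVR".

```python
import re

# Matches "SUVR 1.23", "SUVR: 1.23", "SUVR = 1.23" and "SUVR of 1.23",
# ignoring case. This pattern is an assumption about how notes are written.
_SUVR_PATTERN = re.compile(r"SUVR\s*(?:of|is|=|:)?\s*(\d+(?:\.\d+)?)", re.IGNORECASE)


def extract_suvr_sketch(text):
    """Return every SUVR-like number found in a note as a float."""
    return [float(value) for value in _SUVR_PATTERN.findall(text)]
```

A note with no match yields an empty list, so the caller can decide how to represent missing values.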

To run the workflow:
- Open preprocessing-notes.ipynb.
- Import the extraction functions from main.py.
- Load the raw notes CSV and call format_notes().
- Run info_extractn(text, template) or info_extractn(text, template_biomarker).
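
The extraction step above can be sketched as a loop over the loaded rows. The sketch takes the extractor as a parameter so it runs without Azure access; in the notebook, info_extractn and template from main.py would be passed in. The helper run_extraction, the stand-in fake_extract, and the column name note_text are assumptions for this example. The format_notes() call is left out because its signature is not shown here.

```python
import csv
import io


def run_extraction(rows, extract, template, text_column="note_text"):
    """Call extract(text, template) for each row and collect the results.

    extract is expected to behave like info_extractn(text, template);
    the column name "note_text" is a placeholder.
    """
    results = []
    for row in rows:
        text = (row.get(text_column) or "").strip()
        if not text:
            continue  # skip empty notes instead of sending them to the model
        results.append(extract(text, template))
    return results


def fake_extract(text, template):
    """Stand-in for info_extractn: returns empty fields plus the note length."""
    return {**{field: None for field in template}, "n_chars": len(text)}


# Tiny in-memory CSV in place of the raw notes file.
sample_csv = io.StringIO("note_id,note_text\n1,Patient seen for follow-up.\n2,\n")
rows = list(csv.DictReader(sample_csv))
results = run_extraction(rows, fake_extract, {"diagnosis": "..."})
```

Passing the extractor in as an argument also makes it easy to switch between template and template_biomarker runs without changing the loop.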

- The project uses Azure OpenAI with DefaultAzureCredential().
- templates.py contains the extraction schema definitions.
- utils.py includes final data normalization helpers for the extracted DataFrame.
- Python files have been syntax-checked with python3 -m py_compile.
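
The same syntax check can be repeated from Python with the standard-library py_compile module. The helper name syntax_check and the shape of its report are choices made for this sketch.

```python
import py_compile


def syntax_check(paths):
    """Compile each file and return a {path: error message or None} mapping."""
    report = {}
    for path in paths:
        try:
            # doraise=True raises PyCompileError instead of printing the error.
            py_compile.compile(str(path), doraise=True)
            report[str(path)] = None
        except py_compile.PyCompileError as exc:
            report[str(path)] = exc.msg
    return report
```

For example, syntax_check(["main.py", "client.py", "templates.py", "utils.py"]) run from this folder returns None for every file that compiles cleanly. This only catches syntax errors; it does not import the modules or exercise the Azure calls.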