diff --git a/README.md b/README.md index 149a3f9..0ac43c9 100644 --- a/README.md +++ b/README.md @@ -6,6 +6,20 @@ You can download the latest SQLite release of the [China Biographical Database]( Check [**latest.json**](https://github.com/cbdb-project/cbdb_sqlite/blob/master/latest.json) for the current release date, filename, SHA-256 checksum, and direct download URL. +## Post-processing (optional) + +The raw database export does not include convenience views or the denormalised `ADDRESSES` table. +Use the scripts in [`scripts/`](./scripts/) to add them, or run the one-click Colab notebook: + +| What you want | How to get it | +|---------------|---------------| +| Everything in one click | Open [`scripts/setup_cbdb.ipynb`](./scripts/setup_cbdb.ipynb) in Google Colab | +| Foreign key constraints | `python scripts/add_foreign_keys.py --db latest.db` | +| 18 convenience views | `bash scripts/create_views.sh latest.db` | +| `ADDRESSES` hierarchy table | `python scripts/create_addresses_table.py --db latest.db` | + +See [`scripts/README.md`](./scripts/README.md) for full documentation. + ## Data Limitations * The ZZZ releases are now deprecated in favor of views. Use [`create_views.sh`](./scripts/create_views.sh) to create views in the SQLite file. diff --git a/scripts/README.md b/scripts/README.md index 985b15c..c883027 100644 --- a/scripts/README.md +++ b/scripts/README.md @@ -2,26 +2,76 @@ [中文文档](./README.zh.md) -This directory contains the helper scripts used to download, normalise, and compare CBDB SQLite releases. The scripts themselves live in the project root so they can be executed directly without adjusting `PATH`; refer to their relative locations when running the commands below. +This directory contains helper scripts for downloading, post-processing, and comparing CBDB SQLite releases. 
## Available Scripts -- `process_cbdb_dbs.sh`: end-to-end workflow that downloads the latest and historical SQLite dumps, unpacks them, applies the normalisation helpers, vacuums the databases, and generates a schema/data summary comparison. -- `compare_db_tables.py`: compares two SQLite databases table-by-table, emitting a report of schema and data discrepancies. +### One-stop notebook + +- **`setup_cbdb.ipynb`** — Google Colab notebook that runs the full setup pipeline in one click: + downloads the latest database, adds foreign keys, creates views, and builds the `ADDRESSES` table. + Upload to [Google Colab](https://colab.research.google.com/) and click **Runtime → Run all**. + Each step can be toggled on or off via boolean flags in the *Configuration* cell. + +### Individual scripts + +| Script | Description | +|--------|-------------| +| `add_foreign_keys.py` | Fetches `foreign_keys_regen.csv` from GitHub and recreates SQLite tables with proper `FOREIGN KEY` constraints. Skips tables that already have FK constraints (idempotent). | +| `create_views.sh` | Creates 18 convenience SQL views (e.g. `View_PeopleData`, `View_EntryData`, `View_PostingOfficeData`). | +| `create_addresses_table.py` | Builds the `ADDRESSES` table by resolving the full administrative hierarchy for each address across time, preserving gaps in the data. | +| `compare_db_tables.py` | Compares two SQLite databases table-by-table, emitting row-count and schema discrepancies. | +| `process_cbdb_dbs.sh` | End-to-end workflow: downloads the latest and a historical SQLite dump, unpacks them, vacuums both, and runs `compare_db_tables.py`. | ## Prerequisites -The scripts expect the following command line tools: +### For the Colab notebook (`setup_cbdb.ipynb`) + +No local installation needed — just upload to Google Colab. 
+ +### For running scripts locally + +| Tool | Required by | +|------|-------------| +| `python3` | `add_foreign_keys.py`, `create_addresses_table.py`, `compare_db_tables.py` | +| `sqlite3` CLI | `create_views.sh` | +| `bash` | `create_views.sh`, `process_cbdb_dbs.sh` | +| `wget`, `7z` | `process_cbdb_dbs.sh` | + +`process_cbdb_dbs.sh` checks for missing tools at startup and exits early if any are absent. + +## Usage + +### Add foreign keys + +```bash +python scripts/add_foreign_keys.py --db latest.db +``` + +Pass `--csv-url URL` to use a different branch of `foreign_keys_regen.csv`. + +### Create views + +```bash +bash scripts/create_views.sh latest.db +``` + +### Build ADDRESSES table + +```bash +python scripts/create_addresses_table.py --db latest.db +``` + +### Compare two releases -- `wget` -- `7z` -- `sqlite3` -- `python3` +```bash +python scripts/compare_db_tables.py old.db new.db +``` -Install the tools before running the scripts. `process_cbdb_dbs.sh` will perform a sanity check and exit early if any are missing. +### Download and compare historical releases -## Usage Notes +```bash +bash scripts/process_cbdb_dbs.sh +``` -- Run the shell script from the repository root: `./process_cbdb_dbs.sh`. -- Both Python utilities accept `--help` for detailed argument listings. -- Intermediate downloads are written to a temporary directory and cleaned up automatically; resulting databases are created alongside the scripts. +Intermediate downloads are written to a temporary directory and cleaned up automatically. 
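The comparison step above is described only at a high level (row-count and schema discrepancies). As a rough illustration of what a table-by-table row-count diff can look like, here is a minimal sketch using only the standard `sqlite3` module. It is not the implementation in `compare_db_tables.py`; the helper names `table_row_counts` and `diff_row_counts` and the in-memory demo tables are hypothetical.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table in the database."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type='table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


def diff_row_counts(old_conn, new_conn):
    """Return {table: (old_count, new_count)} for tables whose counts differ.

    A count of None means the table is absent from that database.
    """
    old, new = table_row_counts(old_conn), table_row_counts(new_conn)
    report = {}
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before != after:
            report[name] = (before, after)
    return report


# Demo on two throwaway in-memory databases (hypothetical tables A, B, C).
old_db = sqlite3.connect(":memory:")
old_db.executescript(
    "CREATE TABLE A (x); INSERT INTO A VALUES (1), (2);"
    "CREATE TABLE B (x); INSERT INTO B VALUES (1);"
)
new_db = sqlite3.connect(":memory:")
new_db.executescript(
    "CREATE TABLE A (x); INSERT INTO A VALUES (1), (2), (3);"
    "CREATE TABLE C (x);"
)

report = diff_row_counts(old_db, new_db)
for table, (before, after) in report.items():
    print(f"{table}: {before} -> {after}")
```

With real files, the two connections would instead come from `sqlite3.connect("old.db")` and `sqlite3.connect("new.db")`.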
diff --git a/scripts/README.zh.md b/scripts/README.zh.md index eb5cd7a..4577387 100644 --- a/scripts/README.zh.md +++ b/scripts/README.zh.md @@ -1,25 +1,74 @@ # CBDB 脚本说明 -此目录包含用于 CBDB 项目的辅助脚本,脚本本体保留在仓库根目录,便于直接执行。运行时请根据下述说明使用相对路径调用对应文件。 +此目录包含用于下载、后处理及比较 CBDB SQLite 发布版本的辅助脚本。 ## 脚本一览 -- `process_cbdb_dbs.sh`:完整流程脚本,负责下载最新与历史版 SQLite 数据库、解压、运行规范化工具、执行 `VACUUM`,并生成数据库差异报告。 -- `compare_db_tables.py`:逐表对比两个 SQLite 数据库的结构与数据,输出差异摘要。 +### 一键 Notebook + +- **`setup_cbdb.ipynb`** — Google Colab Notebook,一键完成完整配置流程:下载最新数据库、添加外键、创建视图、生成 `ADDRESSES` 表。 + 上传至 [Google Colab](https://colab.research.google.com/) 后点击 **Runtime → Run all** 即可运行。 + 每个步骤均可在 *Configuration* 单元格中通过布尔变量单独开关。 + +### 独立脚本 + +| 脚本 | 说明 | +|------|------| +| `add_foreign_keys.py` | 从 GitHub 读取 `foreign_keys_regen.csv`,将缺少外键的 SQLite 表重建并补充 `FOREIGN KEY` 约束。已有外键的表会自动跳过(幂等操作)。 | +| `create_views.sh` | 创建 18 个便于查询的 SQL 视图(如 `View_PeopleData`、`View_EntryData`、`View_PostingOfficeData` 等)。 | +| `create_addresses_table.py` | 通过解析地址在各时间段内的行政区划层级关系,构建 `ADDRESSES` 表,并保留数据中的空缺时段。 | +| `compare_db_tables.py` | 逐表对比两个 SQLite 数据库的行数与结构,输出差异摘要。 | +| `process_cbdb_dbs.sh` | 完整流程脚本:下载最新版和某一历史版 SQLite 数据库,解压后执行 `VACUUM`,并调用 `compare_db_tables.py` 生成对比报告。 | ## 运行前提 -请确认已安装以下命令行工具: +### Colab Notebook(`setup_cbdb.ipynb`) + +无需本地安装,直接上传至 Google Colab 使用。 + +### 本地运行脚本 + +| 工具 | 所需脚本 | +|------|----------| +| `python3` | `add_foreign_keys.py`、`create_addresses_table.py`、`compare_db_tables.py` | +| `sqlite3` CLI | `create_views.sh` | +| `bash` | `create_views.sh`、`process_cbdb_dbs.sh` | +| `wget`、`7z` | `process_cbdb_dbs.sh` | + +`process_cbdb_dbs.sh` 启动时会检查依赖,缺少工具时会直接报错退出。 + +## 使用方法 + +### 添加外键 + +```bash +python scripts/add_foreign_keys.py --db latest.db +``` + +可通过 `--csv-url URL` 指定其他分支的 `foreign_keys_regen.csv`。 + +### 创建视图 + +```bash +bash scripts/create_views.sh latest.db +``` + +### 生成 ADDRESSES 表 + +```bash +python scripts/create_addresses_table.py --db latest.db +``` + +### 比较两个发布版本 -- `wget` -- `7z` -- `sqlite3` -- 
`python3` +```bash +python scripts/compare_db_tables.py old.db new.db +``` -`process_cbdb_dbs.sh` 会在启动时检查依赖,缺失工具时会直接报错退出。 +### 下载历史版本并对比 -## 使用提示 +```bash +bash scripts/process_cbdb_dbs.sh +``` -- 从仓库根目录执行:`./process_cbdb_dbs.sh`。 -- 两个 Python 工具均可通过 `--help` 查看详细参数。 -- 脚本会创建临时目录存放下载文件并在结束时清理,生成的数据库位于脚本所在目录。 +下载文件会写入临时目录,脚本结束后自动清理。 diff --git a/scripts/add_foreign_keys.py b/scripts/add_foreign_keys.py new file mode 100644 index 0000000..3b5d3fa --- /dev/null +++ b/scripts/add_foreign_keys.py @@ -0,0 +1,234 @@ +#!/usr/bin/env python3 +""" +Add foreign key constraints to a CBDB SQLite database based on foreign_keys_regen.csv. + +Reads the FK definitions from a CSV URL, groups them by table, and recreates +each affected table with proper FOREIGN KEY constraints appended to the schema. +Tables that already have FK constraints are skipped (idempotent). + +Usage: + python add_foreign_keys.py [--db DB_PATH] [--csv-url URL] +""" + +from __future__ import annotations + +import argparse +import csv +import io +import logging +import re +import sqlite3 +import urllib.request +from collections import defaultdict +from pathlib import Path +from typing import Dict, List, Optional, Tuple + +CSV_URL = ( + "https://raw.githubusercontent.com/cbdb-project/cbdb-user-mdb-tests" + "/main/reports/foreign_keys_regen.csv" +) + +logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s") +logger = logging.getLogger(__name__) + +# (child_col, parent_table, parent_col) +FKDef = Tuple[str, str, str] + + +def fetch_csv(url: str) -> str: + logger.info("Fetching FK definitions from %s", url) + with urllib.request.urlopen(url) as response: + return response.read().decode("utf-8-sig") # strip BOM if present + + +def parse_foreign_keys(csv_content: str) -> Dict[str, List[FKDef]]: + """ + Parse foreign_keys_regen.csv and return {table_name: [(col, ref_table, ref_col), ...]}. + + Table names are normalised to UPPER CASE. 
Duplicate (table, col, ref_table, ref_col) + combinations are deduplicated while preserving order. + """ + fk_map: Dict[str, List[FKDef]] = defaultdict(list) + seen: set = set() + + reader = csv.DictReader(io.StringIO(csv_content)) + for row in reader: + table = row["AccessTblNm"].strip().upper() + col = row["AccessFldNm"].strip() + ref_table = row["ForeignKey"].strip().upper() + ref_col = row["ForeignKeyBaseField"].strip() + + key = (table, col, ref_table, ref_col) + if key in seen: + continue + seen.add(key) + fk_map[table].append((col, ref_table, ref_col)) + + return dict(fk_map) + + +def _has_foreign_keys(conn: sqlite3.Connection, table: str) -> bool: + return bool(conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()) + + +def _get_create_sql(conn: sqlite3.Connection, table: str) -> Optional[str]: + row = conn.execute( + "SELECT sql FROM sqlite_master WHERE type='table' AND name=?", (table,) + ).fetchone() + return row[0] if row else None + + +def _build_create_with_fks(create_sql: str, tmp_name: str, fk_defs: List[FKDef]) -> str: + """ + Append FOREIGN KEY constraints to an existing CREATE TABLE statement + and rename the table to tmp_name. + + Uses paren-depth tracking so nested expressions inside CHECK or DEFAULT + clauses do not confuse the outer closing-paren search. + """ + depth = 0 + close_pos = -1 + for i, ch in enumerate(create_sql): + if ch == "(": + depth += 1 + elif ch == ")": + depth -= 1 + if depth == 0: + close_pos = i + break + + if close_pos == -1: + raise ValueError("Could not locate closing paren in CREATE TABLE SQL.") + + body = create_sql[:close_pos].rstrip().rstrip(",") + tail = create_sql[close_pos + 1:] + + fk_clauses = [] + for col, ref_table, ref_col in fk_defs: + fk_clauses.append( + f' FOREIGN KEY ("{col}") REFERENCES "{ref_table}" ("{ref_col}")' + ) + + new_sql = body + ",\n" + ",\n".join(fk_clauses) + "\n)" + tail + + # Replace the table name; handle double-quoted, backtick, bracket, or bare identifiers. 
+ new_sql = re.sub( + r'(?i)(CREATE\s+TABLE\s+)(?:"[^"]*"|`[^`]*`|\[[^\]]*\]|[^\s(]+)', + rf'\1"{tmp_name}"', + new_sql, + count=1, + ) + return new_sql + + +def _recreate_with_fks( + conn: sqlite3.Connection, table: str, fk_defs: List[FKDef] +) -> bool: + """ + Recreate *table* with FOREIGN KEY constraints appended. Returns True on success. + Foreign-key enforcement is disabled for the duration of the operation. + """ + create_sql = _get_create_sql(conn, table) + if not create_sql: + logger.warning(" %s: not found in sqlite_master, skipping.", table) + return False + + if "VIRTUAL" in create_sql.upper(): + logger.info(" %s: virtual table, skipping.", table) + return False + + tmp = f"_fk_rebuild_{table}" + try: + new_create = _build_create_with_fks(create_sql, tmp, fk_defs) + except ValueError as exc: + logger.error(" %s: could not build new CREATE TABLE - %s", table, exc) + return False + + col_list = ", ".join( + f'"{row[1]}"' + for row in conn.execute(f'PRAGMA table_info("{table}")').fetchall() + ) + + conn.execute("PRAGMA foreign_keys = OFF") + try: + conn.execute(f'DROP TABLE IF EXISTS "{tmp}"') + conn.execute(new_create) + conn.execute(f'INSERT INTO "{tmp}" SELECT {col_list} FROM "{table}"') + conn.execute(f'DROP TABLE "{table}"') + conn.execute(f'ALTER TABLE "{tmp}" RENAME TO "{table}"') + conn.commit() + fk_summary = ", ".join(f"{col}->{ref_t}.{ref_c}" for col, ref_t, ref_c in fk_defs) + logger.info(" ✓ %s (%d FKs: %s)", table, len(fk_defs), fk_summary) + return True + except Exception as exc: + conn.rollback() + try: + conn.execute(f'DROP TABLE IF EXISTS "{tmp}"') + except Exception: + pass + logger.error(" ✗ %s: %s", table, exc) + return False + finally: + conn.execute("PRAGMA foreign_keys = ON") + + +def add_foreign_keys(db_path: str | Path, csv_url: str = CSV_URL) -> None: + """ + Add FOREIGN KEY constraints to all applicable tables in *db_path* based on + foreign_keys_regen.csv. Tables that already have FK constraints are skipped. 
+ """ + content = fetch_csv(csv_url) + fk_map = parse_foreign_keys(content) + logger.info("CSV parsed: FK definitions found for %d tables.", len(fk_map)) + + conn = sqlite3.connect(str(db_path)) + try: + # Build a case-insensitive lookup from uppercase name → actual DB name. + db_table_lookup: Dict[str, str] = { + row[0].upper(): row[0] + for row in conn.execute( + "SELECT name FROM sqlite_master WHERE type='table' AND name NOT LIKE 'sqlite_%'" + ).fetchall() + } + + updated = skipped = missing = 0 + for upper_name, fk_defs in fk_map.items(): + actual = db_table_lookup.get(upper_name) + if actual is None: + logger.debug(" %s: not in database, skipping.", upper_name) + missing += 1 + continue + if _has_foreign_keys(conn, actual): + skipped += 1 + continue + if _recreate_with_fks(conn, actual, fk_defs): + updated += 1 + + logger.info( + "Finished: %d tables updated, %d already had FKs, %d not in database.", + updated, + skipped, + missing, + ) + finally: + conn.close() + + +if __name__ == "__main__": + parser = argparse.ArgumentParser( + description="Add foreign key constraints to a CBDB SQLite database." 
+ ) + parser.add_argument( + "--db", + default="latest.db", + type=Path, + help="Path to the SQLite database (default: latest.db).", + ) + parser.add_argument( + "--csv-url", + default=CSV_URL, + metavar="URL", + help="URL of foreign_keys_regen.csv (default: main branch on GitHub).", + ) + args = parser.parse_args() + add_foreign_keys(args.db, args.csv_url) diff --git a/scripts/setup_cbdb.ipynb b/scripts/setup_cbdb.ipynb new file mode 100644 index 0000000..2faaff7 --- /dev/null +++ b/scripts/setup_cbdb.ipynb @@ -0,0 +1,242 @@ +{ + "nbformat": 4, + "nbformat_minor": 5, + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.10.0" + }, + "colab": { + "provenance": [] + } + }, + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "id": "md-title", + "source": [ + "# CBDB SQLite Setup\n", + "\n", + "This notebook downloads the latest CBDB SQLite database and applies optional post-processing steps:\n", + "\n", + "| Step | What it does |\n", + "|------|--------------|\n", + "| **Add Foreign Keys** | Reads `foreign_keys_regen.csv` from GitHub and recreates tables with proper `FOREIGN KEY` constraints |\n", + "| **Create Views** | Creates 18 convenience SQL views (e.g. `View_PeopleData`, `View_EntryData`) |\n", + "| **Create Addresses** | Builds the `ADDRESSES` hierarchy table from `ADDR_CODES` / `ADDR_BELONGS_DATA` |\n", + "\n", + "**Usage:** Edit the *Configuration* cell below, then click **Runtime → Run all**." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "id": "md-setup", + "source": [ + "## 0 · Setup\n", + "Clone the CBDB SQLite repository to make the helper scripts available." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "id": "cell-setup", + "outputs": [], + "source": "import os, sys, subprocess\n\nREPO_URL = \"https://github.com/cbdb-project/cbdb_sqlite.git\"\nREPO_DIR = \"/content/cbdb_sqlite\"\n\nif not os.path.exists(REPO_DIR):\n subprocess.run([\"git\", \"clone\", \"--depth\", \"1\", REPO_URL, REPO_DIR], check=True)\nelse:\n subprocess.run([\"git\", \"-C\", REPO_DIR, \"pull\", \"--ff-only\"], check=True)\n\nSCRIPTS_DIR = os.path.join(REPO_DIR, \"scripts\")\nif SCRIPTS_DIR not in sys.path:\n sys.path.insert(0, SCRIPTS_DIR)\n\nprint(f\"Repository ready at {REPO_DIR}\")" + }, + { + "cell_type": "markdown", + "metadata": {}, + "id": "md-config", + "source": [ + "## 1 · Configuration\n", + "Set each flag to `True` or `False` to control which steps run." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "id": "cell-config", + "outputs": [], + "source": "# ── What to run ────────────────────────────────────────────────────────────────\nADD_FOREIGN_KEYS = True # add FK constraints from foreign_keys_regen.csv\nCREATE_VIEWS = True # create 18 convenience SQL views\nCREATE_ADDRESSES = True # build ADDRESSES hierarchy table\nDOWNLOAD_RESULT = True # download the processed .db file (Colab only)\n\n# ── Output path ────────────────────────────────────────────────────────────────\nDB_PATH = \"/content/cbdb_latest.db\"\n\n# ── Source URLs (no need to change) ────────────────────────────────────────────\nLATEST_JSON_URL = (\n \"https://raw.githubusercontent.com/cbdb-project/cbdb_sqlite/master/latest.json\"\n)\nFK_CSV_URL = (\n \"https://raw.githubusercontent.com/cbdb-project/cbdb-user-mdb-tests\"\n \"/main/reports/foreign_keys_regen.csv\"\n)" + }, + { + "cell_type": "markdown", + "metadata": {}, + "id": "md-download", + "source": [ + "## 2 · Download & Extract Database" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "id": "cell-download", + 
"outputs": [], + "source": [ + "import json, urllib.request, zipfile, shutil, os\n", + "\n", + "print(\"Fetching latest.json...\")\n", + "with urllib.request.urlopen(LATEST_JSON_URL) as r:\n", + " info = json.loads(r.read().decode())\n", + "\n", + "print(f\" File : {info['sqlite_filename']}\")\n", + "print(f\" Generated : {info['generated_at_utc']}\")\n", + "print(f\" URL : {info['download_url']}\")\n", + "\n", + "if os.path.exists(DB_PATH):\n", + " print(f\"\\nDatabase already exists at {DB_PATH}. Delete it to re-download.\")\n", + "else:\n", + " zip_path = \"/content/_cbdb_download.zip\"\n", + " print(\"\\nDownloading...\")\n", + "\n", + " def _progress(count, block, total):\n", + " if total > 0:\n", + " pct = min(count * block * 100 / total, 100)\n", + " print(f\"\\r {pct:5.1f}%\", end=\"\", flush=True)\n", + "\n", + " urllib.request.urlretrieve(info[\"download_url\"], zip_path, _progress)\n", + " print() # newline after progress bar\n", + "\n", + " print(\"Extracting...\")\n", + " with zipfile.ZipFile(zip_path, \"r\") as z:\n", + " db_members = [\n", + " m for m in z.namelist()\n", + " if m.lower().endswith((\".db\", \".sqlite3\", \".sqlite\"))\n", + " ]\n", + " if not db_members:\n", + " raise FileNotFoundError(\"No database file found in the downloaded archive.\")\n", + " z.extract(db_members[0], \"/content/\")\n", + " shutil.move(f\"/content/{db_members[0]}\", DB_PATH)\n", + " os.remove(zip_path)\n", + "\n", + " size_mb = os.path.getsize(DB_PATH) / 1024 / 1024\n", + " print(f\"Saved to {DB_PATH} ({size_mb:.1f} MB)\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "id": "md-fk", + "source": [ + "## 3 · Add Foreign Keys\n", + "Fetches `foreign_keys_regen.csv` from GitHub, parses the FK definitions,\n", + "and recreates each table that is missing its `FOREIGN KEY` constraints.\n", + "Tables that already have FK constraints are skipped (idempotent)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "id": "cell-fk", + "outputs": [], + "source": [ + "if ADD_FOREIGN_KEYS:\n", + " import add_foreign_keys\n", + " add_foreign_keys.add_foreign_keys(DB_PATH, FK_CSV_URL)\n", + "else:\n", + " print(\"Skipped (ADD_FOREIGN_KEYS = False).\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "id": "md-views", + "source": [ + "## 4 · Create Views\n", + "Runs `create_views.sh` to add 18 convenience SQL views such as\n", + "`View_PeopleData`, `View_EntryData`, `View_PostingOfficeData`, etc." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "id": "cell-views", + "outputs": [], + "source": [ + "if CREATE_VIEWS:\n", + " import subprocess, os\n", + " script = os.path.join(REPO_DIR, \"scripts\", \"create_views.sh\")\n", + " result = subprocess.run(\n", + " [\"bash\", script, DB_PATH],\n", + " capture_output=True, text=True\n", + " )\n", + " if result.returncode != 0:\n", + " print(result.stderr)\n", + " raise RuntimeError(\"create_views.sh failed — see output above.\")\n", + " lines = result.stdout.splitlines()\n", + " created = sum(1 for l in lines if l.startswith(\"Creating view\"))\n", + " print(f\"Created {created} views.\")\n", + " for line in lines:\n", + " if any(kw in line for kw in (\"ERROR\", \"✗\", \"All sanity checks\")):\n", + " print(line)\n", + "else:\n", + " print(\"Skipped (CREATE_VIEWS = False).\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "id": "md-addresses", + "source": [ + "## 5 · Create Addresses Table\n", + "Builds the `ADDRESSES` table by resolving the full administrative hierarchy\n", + "for each address across time, preserving gaps in the data." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "id": "cell-addresses", + "outputs": [], + "source": [ + "if CREATE_ADDRESSES:\n", + " from create_addresses_table import AddressHierarchyBuilder\n", + " with AddressHierarchyBuilder(DB_PATH) as builder:\n", + " builder.run()\n", + "else:\n", + " print(\"Skipped (CREATE_ADDRESSES = False).\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "id": "md-download-result", + "source": [ + "## 6 · Download Result\n", + "Triggers a browser download of the processed database.\n", + "Only works inside Google Colab; outside Colab the file path is printed instead." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "id": "cell-download-result", + "outputs": [], + "source": [ + "if DOWNLOAD_RESULT:\n", + " try:\n", + " from google.colab import files\n", + " print(f\"Starting download of {DB_PATH}...\")\n", + " files.download(DB_PATH)\n", + " except ImportError:\n", + " print(f\"Not running in Colab. Your database is at: {DB_PATH}\")\n", + "else:\n", + " print(\"Skipped (DOWNLOAD_RESULT = False).\")" + ] + } + ] +} \ No newline at end of file
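
`add_foreign_keys.py` above rebuilds each table with foreign-key enforcement switched off, so rows that already violate a new constraint are copied across without error. One way to check the result afterwards is SQLite's `PRAGMA foreign_key_list` and `PRAGMA foreign_key_check`. The sketch below reproduces the create-copy-drop-rename pattern on a throwaway in-memory database; the `PARENT`/`CHILD` tables and the `rebuild_with_fk` helper are hypothetical stand-ins for real CBDB tables, and this is an illustration rather than part of the PR.

```python
import sqlite3


def rebuild_with_fk(conn, table, tmp, new_create_sql):
    """Rebuild `table` via a temporary table; `new_create_sql` must create `tmp`."""
    cols = ", ".join(
        f'"{row[1]}"' for row in conn.execute(f'PRAGMA table_info("{table}")')
    )
    # Enforcement is off during the copy, so existing bad rows are not rejected.
    conn.execute("PRAGMA foreign_keys = OFF")
    try:
        conn.execute(new_create_sql)
        conn.execute(f'INSERT INTO "{tmp}" SELECT {cols} FROM "{table}"')
        conn.execute(f'DROP TABLE "{table}"')
        conn.execute(f'ALTER TABLE "{tmp}" RENAME TO "{table}"')
        conn.commit()
    finally:
        conn.execute("PRAGMA foreign_keys = ON")


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PARENT (id INTEGER PRIMARY KEY);
    CREATE TABLE CHILD (id INTEGER PRIMARY KEY, parent_id INTEGER);
    INSERT INTO PARENT VALUES (1);
    INSERT INTO CHILD VALUES (10, 1), (11, 99);  -- 99 has no parent row
    """
)

rebuild_with_fk(
    conn,
    "CHILD",
    "_tmp_CHILD",
    'CREATE TABLE "_tmp_CHILD" (id INTEGER PRIMARY KEY, parent_id INTEGER, '
    'FOREIGN KEY ("parent_id") REFERENCES "PARENT" ("id"))',
)

# Each foreign_key_list row: (id, seq, table, from, to, on_update, on_delete, match)
fks = conn.execute('PRAGMA foreign_key_list("CHILD")').fetchall()
# Each foreign_key_check row: (table, rowid, referenced_table, fk_index)
violations = conn.execute("PRAGMA foreign_key_check").fetchall()

print("constraints:", [(fk[3], fk[2], fk[4]) for fk in fks])
print("violating rows:", [(v[0], v[1]) for v in violations])
```

Running `PRAGMA foreign_key_check` against the processed database is a cheap way to list rows that the newly declared constraints would reject once enforcement is turned on.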