From 466a88e75173ff77a460eeea224776c81d580aa5 Mon Sep 17 00:00:00 2001 From: "Sung Yun (CODE SIGNING KEY)" Date: Thu, 28 May 2026 14:41:00 -0400 Subject: [PATCH] add security model --- AGENTS.md | 30 ++++++ SECURITY-THREAT-MODEL.md | 228 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 258 insertions(+) create mode 100644 AGENTS.md create mode 100644 SECURITY-THREAT-MODEL.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000000..2606a579b9 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,30 @@ + + +# Apache Iceberg Python — Agent Instructions + +This file provides repository-specific guidance for automated agents working +in this repository. + +## Security Model + +When assessing potential vulnerabilities or calibrating automated security +findings, use [`SECURITY-THREAT-MODEL.md`](SECURITY-THREAT-MODEL.md) as the +authoritative detailed description of this repository's security boundaries, +trust assumptions, and non-boundaries. diff --git a/SECURITY-THREAT-MODEL.md b/SECURITY-THREAT-MODEL.md new file mode 100644 index 0000000000..ac4024d955 --- /dev/null +++ b/SECURITY-THREAT-MODEL.md @@ -0,0 +1,228 @@ + + +# Apache Iceberg Python Security Threat Model + +This document describes the detailed security threat model for Apache +Iceberg Python. It is intended for maintainers and automated security triage. + +## Purpose + +Apache Iceberg Python is primarily a client library and implementation of the +Iceberg table format and catalog interactions for Python applications and +services. It is commonly embedded in larger systems that provide their own +authentication, authorization, and credential management. Because of that +deployment model, many bug classes that look security-relevant in the abstract +are not actually security vulnerabilities in Iceberg Python itself. + +This model is intended to answer: + +- what Iceberg Python generally treats as a security vulnerability +- what Iceberg Python generally treats as correctness, hardening, or + deployment work +- which boundaries are primarily owned by Iceberg Python versus the + surrounding catalog, application, or service +- which issue classes should be downgraded by default by scanners + +## Scope + +This model is scoped to the Apache Iceberg Python repository itself: + +- table format and metadata handling +- catalog and REST catalog clients +- transport, credential, and configuration handling implemented in this repo +- helper tooling shipped in this repo + +It is not a general threat model for every Python service that embeds Iceberg +Python. + +In particular, it does not attempt to define the complete security model for: + +- applications or services that embed Iceberg Python +- storage-level authorization enforced outside Iceberg Python + +## Security Goals + +Iceberg Python should: + +- avoid exposing secrets or delegated credentials to principals that were not + already trusted with them +- avoid creating new unauthorized capabilities in Iceberg Python-owned + components +- avoid violating trust boundaries that Iceberg Python itself owns, such as + leaking auth, transport, or credential-bearing state across catalog or + client boundaries in the same process + +Iceberg Python does not aim to be the primary enforcement point for: + +- user-to-user authorization inside the embedding application +- storage-level authorization +- service-side credential scoping performed by an external catalog + +## Roles + +### Operator + +The operator configures the surrounding catalog, application, service, and +storage integration around Iceberg Python. This role is trusted to choose +endpoints, warehouses, storage integrations, and credentials. + +### Catalog control plane + +The catalog control plane resolves tables and supplies metadata, locations, +configuration, and delegated credentials to Iceberg Python. It may be +implemented by a REST catalog server or another catalog implementation. +Iceberg Python assumes this control plane is trusted and outside its primary +security boundary. + +### REST catalog client + +The REST catalog client consumes catalog-provided metadata, configuration, and +credentials. Client-side bugs in routing, caching, or reuse may still be +security-relevant if they leak credential-bearing state across boundaries that +the Iceberg Python client is expected to preserve. + +### Embedding application + +Applications and services embedding Iceberg Python are responsible for their +own user-facing authorization boundaries unless Iceberg Python explicitly +documents otherwise. + +### Table writer or maintainer + +This role may already have legitimate power to write or replace table +metadata, write or delete files, choose paths under an allowed warehouse or +table location, and invoke destructive maintenance operations. If a report +only shows a new way to achieve the same effect this role can already cause +legitimately, it is usually not a security issue in Iceberg Python. + +## Trust Boundaries + +### Boundary 1: operator-trusted configuration + +The following are generally treated as trusted operator or deployment inputs: + +- catalog properties +- endpoint configuration +- warehouse and storage roots +- transport wiring and credential configuration + +If a report depends on the attacker controlling those values directly, it is +usually not a vulnerability in Iceberg Python itself. + +### Boundary 2: catalog-supplied metadata + +Iceberg Python often accepts metadata locations, table properties, namespace +properties, and related control-plane information from a catalog. By default, +Iceberg Python treats those sources as trusted. + +This means a malicious catalog supplying incorrect or malicious metadata is +usually not an Iceberg Python vulnerability by itself. + +### Boundary 3: REST catalog-supplied configuration and delegated storage access + +In REST deployments, Iceberg Python may also accept service endpoints, +configuration, and delegated storage access from the REST catalog server. By +default, those are treated as trusted control-plane inputs unless Iceberg +Python explicitly documents a stronger guarantee. + +This means a malicious REST catalog server sending dangerous endpoints is +usually not an Iceberg Python vulnerability by itself. It also means many +credential-selection bugs are often correctness or specification issues rather +than security boundary failures. + +The major exception is secret exposure. If Iceberg Python surfaces credentials +or secrets to a new audience that was not already trusted with them, that is +security-relevant. + +### Boundary 4: storage-level authorization + +Object store permissions are enforced by the storage provider and the +credentials the surrounding deployment chooses to hand to Iceberg Python. +Iceberg Python is not the root authority for bucket- or object-level +authorization. + +## In-Scope Security Vulnerabilities + +The following categories are generally security-relevant in Iceberg Python +when the report is credible and reproducible. + +### 1. Secret or credential disclosure to a new audience + +Examples include: + +- catalog or storage credentials exposed through a user-visible surface +- one catalog's credentials or auth state leaking into another catalog or + client + +### 2. Iceberg Python-owned trust-boundary violations + +Security issues exist when Iceberg Python itself is expected to separate +catalogs, clients, or principals and fails to do so. + +Examples include: + +- process-global auth or transport state crossing catalog instances +- secret-bearing state from one principal reused for another principal within + an Iceberg Python-owned boundary + +## Usually Out of Scope or Non-Security by Default + +These categories may still be real bugs worth fixing, but they are not usually +security vulnerabilities in Iceberg Python itself. + +### 1. Correctness bugs + +Examples include incorrect metadata handling, ambiguous matching semantics, +and logic bugs that do not create a new trust-boundary violation. + +### 2. Parser hardening and malformed-input robustness + +Malformed-input crashes, runtime exceptions, and memory amplification are +usually treated as robustness or hardening work rather than security issues in +Iceberg Python itself. + +### 3. Malicious catalog or external service scenarios + +Reports that require a malicious catalog or other external control-plane +service are usually outside Iceberg Python's primary security boundary. + +### 4. Equivalent-harm reports + +If the actor already has a legitimate capability that can cause the same harm, +the new path is usually not a security issue. + +## Scanner Calibration Rules + +A scanner targeting Iceberg Python should treat a finding as higher-confidence +only if it plausibly shows one of the following: + +- exposure of a secret or delegated credential to a new audience +- creation of a new unauthorized capability in an Iceberg Python-owned + component +- violation of an Iceberg Python-owned trust boundary rather than a + surrounding catalog, application, service, or operator boundary + +A finding should be downgraded or rejected by default if it instead depends +primarily on: + +- malformed-input robustness or denial-of-service behavior +- a malicious catalog or external service +- a principal that already has equivalent power through legitimate write or + maintenance capabilities