tjirab/tff

TFF: Transformation Fitness Functions

Configurable fitness functions engine and linter for transformation projects.

TFF allows you to enforce architectural layout boundaries, layer structure policies, schema contracts, and code formatting rules across data pipelines. It ships with dedicated plugins for SQLMesh and dbt and outputs clean, color-coded lint reports to the terminal.

[Screenshots: tff health report, tff lint report, tff info output, CTE fingerprinting demo]

Documentation

Setup and usage details differ depending on your pipeline engine (SQLMesh or dbt). Refer to the guide for the adapter you use.


Quick Installation

Install the adapter matching your pipeline tool:

πŸ“ For SQLMesh projects:

# With uv:
uv add tff-sqlmesh

# Or pip:
pip install tff-sqlmesh

For dbt projects:

# With uv:
uv add tff-dbt

# Or pip:
pip install tff-dbt

CLI Usage Guide

Once installed, use the unified tff CLI to run linting and health checks.

tff [command] [options]

Subcommands

  • lint: Run all enabled fitness checks and format lint reports.
  • health: Calculate and report overall project fitness health scores.
  • info: Show diagnostic information about the project environment, configuration files, and adapter versions.
  • help: Print help information for the CLI or specific subcommands.

Common Options

For detailed option explanations, run tff help <command> or tff <command> --help.

tff lint

  • --project PATH: Path to the project root directory (default: current directory).
  • --config PATH: Path to fitness_functions.yaml relative to project root (default: fitness_functions.yaml).
  • --provider {auto,dbt,sqlmesh}: Pipeline engine provider (default: auto-detected).
  • --checks CHECKS: Comma-separated list of specific checks to run (default: all enabled).
  • --fail-level {error,warning}: Exit non-zero when findings at or above this severity exist (default: error).
  • --group-by {connascence,model}: How to group violations in the report (default: model).
  • --dialect DIALECT: SQL dialect of models (dbt only; auto-inferred by default).

tff health

  • --project PATH, --config PATH, --provider {auto,dbt,sqlmesh}, --dialect DIALECT: (Same as above)
  • --fail-under SCORE: Exit non-zero when overall health score (0.0 - 100.0) is below this threshold (default: 0.0).
  • --scope PATH_PREFIX [...]: Restrict the health report to models whose path starts with one of the given prefixes (e.g. models/sources or models/marts/marketing). Multiple prefixes can be provided.
  • --group-by {connascence,domain}: How to group the detailed health breakdown. connascence (default) groups by connascence category; domain groups by path segment under models/ (e.g. models/sources, models/marts/marketing).
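
The --scope and --fail-under semantics described above can be sketched in a few lines of Python. This is an illustration only, not TFF's implementation: the function names in_scope and passes_fail_under are hypothetical, and matching on whole path segments (so that models/marts does not match models/marts_old) is an assumption that the option description does not confirm.

```python
from pathlib import PurePosixPath


def in_scope(model_path: str, prefixes: list[str]) -> bool:
    """Return True if model_path sits at or under any of the given prefixes.

    Assumption: prefixes are compared on whole path segments, so the
    prefix "models/marts" does not match "models/marts_old/x.sql".
    """
    if not prefixes:
        return True  # no --scope given: every model is in scope
    path = PurePosixPath(model_path)
    for raw_prefix in prefixes:
        prefix = PurePosixPath(raw_prefix)
        if path == prefix or prefix in path.parents:
            return True
    return False


def passes_fail_under(score: float, fail_under: float = 0.0) -> bool:
    """Documented rule: the run fails only when the score is below the threshold."""
    return score >= fail_under
```

With the default threshold of 0.0, passes_fail_under is true for every score in the documented 0.0 to 100.0 range, which matches the documented behaviour of never failing unless --fail-under is set.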

tff info

  • --project PATH: Path to the project root directory (default: current directory).
  • --config PATH: Path to fitness_functions.yaml relative to project root (default: fitness_functions.yaml).
  • --provider {auto,dbt,sqlmesh}: Pipeline engine provider (default: auto-detected).

Quick Start Examples

Run linting on the current project:

tff lint

Show project health report and require a score of at least 80% to pass:

tff health --fail-under 80

Show health scores only for the models/marts/marketing domain:

tff health --scope models/marts/marketing

Group health breakdown by domain instead of connascence category:

tff health --group-by domain

Combine domain scoping and grouping:

tff health --scope models/marts --group-by domain

Show configuration, adapter versions, and provider files for the current project:

tff info

Get detailed help for the lint subcommand:

tff help lint
# or
tff lint --help
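
For CI use, the lint and health commands above could be combined into a single gate. The following is a minimal sketch, assuming tff is on the PATH; the helper names build_commands and run_gate are hypothetical, and only flags documented in this README are used.

```python
import subprocess
import sys


def build_commands(min_score: float = 80.0, fail_level: str = "error") -> list[list[str]]:
    """Build the tff invocations for a CI gate from documented flags only."""
    return [
        ["tff", "lint", "--fail-level", fail_level],
        ["tff", "health", "--fail-under", str(min_score)],
    ]


def run_gate(commands: list[list[str]]) -> int:
    """Run every command and return the first non-zero exit code, or 0."""
    first_failure = 0
    for cmd in commands:
        # stdout/stderr are inherited so the tool's reports stay visible in CI logs.
        result = subprocess.run(cmd)
        if result.returncode != 0 and first_failure == 0:
            first_failure = result.returncode
    return first_failure
```

A CI step could then call sys.exit(run_gate(build_commands(min_score=80.0))). Running both commands before exiting means a lint failure does not hide the health report.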

Core Features

TFF runs two categories of quality guardrails (for full configuration details, see the Rules & Checks Reference):

1. Architectural Checks

  • Layer integrity: Enforce the configured layer order, e.g. prevent models in the marts layer from depending directly on raw layers.
  • Custom exclusions: Enforce custom domain isolation boundaries (e.g., prevent marts/finance from depending on marts/marketing).
  • Schema contracts: Ensure matching structures between model schemas (e.g., source tables and target core columns).
  • Dependency graph: Track DAG metrics and fail if model fan-in or fan-out exceeds defined thresholds.
  • Materialization depth: Prevent deep nesting of views that degrades query performance.
  • Duplicate CTEs: Detect duplicate complex transformation logic in CTEs across different models (Connascence of Algorithm).
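
To make the layer and dependency-graph checks above concrete, here is a small sketch over a hand-written dependency map. It is not TFF's implementation. The names layer_violations and fan_out and the example models are hypothetical, and the rule that a model may read only from its own layer or a lower one is an assumption about how the layer order is applied.

```python
from collections import Counter

# Bottom-to-top layer order, mirroring a layers.order setting.
LAYER_ORDER = ["staging", "core", "marts"]


def layer_violations(deps: dict[str, list[str]], layer_of: dict[str, str]) -> list[tuple[str, str]]:
    """Return (model, dependency) pairs where the dependency sits in a higher layer.

    Assumption: a model may read from its own layer or any layer below it.
    """
    rank = {name: index for index, name in enumerate(LAYER_ORDER)}
    violations = []
    for model, parents in deps.items():
        for parent in parents:
            if rank[layer_of[parent]] > rank[layer_of[model]]:
                violations.append((model, parent))
    return violations


def fan_out(deps: dict[str, list[str]]) -> Counter:
    """Count how many models read from each dependency (its fan-out)."""
    return Counter(parent for parents in deps.values() for parent in parents)


# Hypothetical project: each model maps to the models it selects from.
deps = {
    "stg_orders": [],
    "core_orders": ["stg_orders"],
    "mart_sales": ["core_orders", "stg_orders"],
    "stg_customers": ["mart_sales"],  # staging reading from marts
}
layer_of = {
    "stg_orders": "staging",
    "stg_customers": "staging",
    "core_orders": "core",
    "mart_sales": "marts",
}

print(layer_violations(deps, layer_of))
print(fan_out(deps))
```

Fan-in is the mirror image: the length of each model's own dependency list. A threshold check would compare these counts against configured limits.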

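2. Linter Rules

  • Rule set: Model-level SQL, naming, typing, and metadata rules, each toggled under the rules key of fitness_functions.yaml (e.g. ban_select_star, sql_complexity, column_types).
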
Shared Configuration

All adapters use a shared fitness_functions.yaml config file located in the root of your project:

contract_groups_path: linter_contract_groups.json
exclusions_path: linter_exclusions.json

layers:
  order: [staging, core, marts]  # Configured bottom-to-top hierarchy

checks:
  layer_integrity: { enabled: true }
  custom_exclusions: { enabled: true }
  schema_contracts: { enabled: true }
  dependency_graph:
    enabled: true
    fan_out_warn: 15
    fan_out_fail: 25
    fan_in_warn: 10
  duplicate_ctes:
    enabled: true
    severity: warning
    min_ast_nodes: 12

rules:
  ban_select_star:
    enabled: true
  no_positional_group_by_or_order_by:
    enabled: true
  environment_agnostic_references:
    enabled: true
    banned_environments: [prod, dev, staging, uat, qa]
  classification_macros:
    enabled: true
    skip_layers: [staging]
    columns:
      product_type: "@product_type\\b"
  sql_complexity:
    enabled: true
    thresholds:
      decision_points: [15, 25]
      cte_count: [8, 12]
      join_count: [8, 12]
      line_count: [250, 400]
  mart_naming:
    enabled: true
    layer_name: marts
    rule: prefix_with_subdirectory
  column_names:
    enabled: true
    replacements:
      api_request: api_call
  column_types:
    enabled: true
    rules:
      - name: id_is_text
        pattern: "_id$"
        data_type: text
  metadata:
    owner: true
    description: true
    grain: true
    unique_values: true
    not_null: true
  filename_equals_modelname:
    enabled: true
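
Two of the settings above lend themselves to a short illustration: the column_types rule (a name pattern paired with a required data type) and the two-element threshold lists under sql_complexity. The sketch below is not TFF's code. The function names are hypothetical, and reading a threshold pair as [warn, fail], with a finding raised only when the value exceeds a bound, is an assumption.

```python
import re


def column_type_violations(columns: dict[str, str], rules: list[dict[str, str]]) -> list[str]:
    """Report columns whose name matches a rule pattern but whose type differs."""
    problems = []
    for name, data_type in columns.items():
        for rule in rules:
            if re.search(rule["pattern"], name) and data_type.lower() != rule["data_type"]:
                problems.append(
                    f"{name}: expected {rule['data_type']}, got {data_type} ({rule['name']})"
                )
    return problems


def classify(value: int, thresholds: list[int]) -> str:
    """Assumption: thresholds is [warn, fail]; exceeding a bound triggers it."""
    warn, fail = thresholds
    if value > fail:
        return "fail"
    if value > warn:
        return "warning"
    return "ok"


# Values copied from the example configuration above.
rules = [{"name": "id_is_text", "pattern": "_id$", "data_type": "text"}]
columns = {"customer_id": "integer", "order_id": "text", "amount": "numeric"}

print(column_type_violations(columns, rules))
print(classify(9, [8, 12]))
```

Under these assumptions, a model with 9 joins would draw a warning against join_count: [8, 12], and customer_id would be flagged because it ends in _id but is not typed as text.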

Further Reading & Learning Resources

To learn more about the architectural concepts behind fitness functions and connascence, check out these resources:

  • Connascence.io: A guide to software coupling metrics (connascence of name, type, meaning, algorithm, etc.), which inspired the classification and structure of the linter report findings.
  • Evolutionary Architecture: The homepage for Building Evolutionary Architectures, which introduces the concept of architectural fitness functions to guide design changes over time.
