diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
new file mode 100644
index 0000000..e3b8137
--- /dev/null
+++ b/.github/copilot-instructions.md
@@ -0,0 +1,184 @@
+# GitHub Copilot Instructions for geocoder
+
+## Repository Overview
+
+This is the **geocoder** container, part of the DeGAUSS (Decentralized Geomarker Assessment for Multi-Site Studies) project. It geocodes US street addresses to latitude/longitude coordinates using a custom Ruby-based geocoding library and a SQLite database built from the 2021 TIGER/Line Street Range Address files.
+
+## Architecture
+
+The project uses a multi-language architecture:
+
+- **R**: Primary entrypoint (`entrypoint.R`) for data processing, CSV I/O, and workflow orchestration
+- **Ruby**: Geocoding engine (`geocode.rb` + `lib/geocoder/us/`) using the custom Geocoder::US gem
+- **Docker**: Containerization with the `rocker/r-ver` base image
+- **SQLite**: Address database (`geocoder.db`) with spatial data
+
+### Key Components
+
+1. **entrypoint.R**: Main R script that:
+   - Reads CSV files with address column
+   - Cleans and validates addresses
+   - Calls Ruby geocoder for each address
+   - Filters results based on score/precision thresholds
+   - Outputs geocoded CSV with matched coordinates and metadata
+
+2. **geocode.rb**: Ruby wrapper that:
+   - Accepts address string as command-line argument
+   - Queries SQLite database via Geocoder::US gem
+   - Returns JSON results to stdout
+
+3. **lib/geocoder/us/**: Custom Ruby gem with:
+   - `database.rb`: SQLite database interface
+   - `address.rb`: Address parsing and normalization
+   - `metaphone.rb`: Phonetic matching for street names
+   - Other supporting modules
+
+## Code Style and Conventions
+
+### R Code
+- Use `dplyr` pipe syntax (`%>%`) for data transformations
+- Suppress library loading messages with `withr::with_message_sink("/dev/null", library(...))`
+- Use `cli::cli_alert_info()` for user-facing messages
+- Follow tidyverse style conventions
+- Use `mappp::mappp()` for parallel geocoding with caching
+- Always include `show_col_types = FALSE` when using `readr::read_csv()`
+
+### Ruby Code
+- Follow standard Ruby conventions (snake_case, 2-space indentation)
+- Use `require` for dependencies at top of files
+- Return results as JSON when interfacing with R
+- The Geocoder::US module uses a SQLite database connection that should be thread-safe
+
+### Address Handling
+- Input addresses must have a column named `address`
+- ZIP codes must be 5 digits (not ZIP+4)
+- Address cleaning removes special characters and normalizes spacing
+- Three types of "bad" addresses are flagged:
+  - `po_box`: PO Box addresses
+  - `cincy_inst_foster_addr`: Cincinnati institutional/foster addresses
+  - `non_address_text`: Blank, "foreign", "verify", or "unknown"
+
+## Development Workflow
+
+### Building the Container
+```bash
+docker build -t geocoder .
+```
+
+The Dockerfile:
+1. Starts from `rocker/r-ver:4.4.3` base image
+2. Installs system dependencies (SQLite, Ruby, build tools)
+3. Downloads geocoder database from S3
+4. Builds and installs Ruby Geocoder-US gem
+5. Installs R packages via renv
+6. Sets entrypoint to `/app/entrypoint.R`
+
+### Testing
+Run the container with test data:
+```bash
+docker run --rm -v "${PWD}/test":/tmp geocoder my_address_file.csv
+docker run --rm -v "${PWD}/test":/tmp geocoder my_address_file.csv 0.6
+docker run --rm -v "${PWD}/test":/tmp geocoder my_address_file.csv all
+```
+
+Test files are in the `test/` directory.
+
+### Making Changes
+
+When modifying:
+- **R code**: Update `entrypoint.R` and ensure renv dependencies are current
+- **Ruby code**: Update files in `lib/geocoder/us/` and rebuild gem with `make -f Makefile.ruby`
+- **Dependencies**: Update `renv.lock` for R packages, or `gemspec` for Ruby gems
+- **Docker**: Update `Dockerfile` and rebuild container
+
+## Geocoding Output
+
+The geocoder adds these columns:
+- `matched_street`, `matched_city`, `matched_state`, `matched_zip`: Matched address components
+- `precision`: Method of geocode (`range`, `street`, `intersection`, `zip`, `city`)
+- `score`: Match quality (0-1, higher is better)
+- `lat`, `lon`: Coordinates (NA for low-quality geocodes)
+- `geocode_result`: Summary (`geocoded`, `imprecise_geocode`, `po_box`, `cincy_inst_foster_addr`, `non_address_text`)
+
+### Quality Filtering
+- Default score threshold: 0.5
+- Imprecise geocodes (intersection/zip/city or low scores) return NA for coordinates
+- Use `all` argument to return all geocodes regardless of quality
+
+## Key Technical Concepts
+
+### Geocoding Flow
+1. Address cleaning and validation in R
+2. Parallel geocoding with caching (mappp package)
+3. Ruby subprocess calls for each address
+4. SQLite database queries with fuzzy matching
+5. Result ranking by precision and score
+6. Quality filtering based on threshold
+
+### Database Structure
+The SQLite database (`geocoder.db`) contains:
+- Street range address data from Census TIGER/Line files
+- Spatial geometry for coordinate calculation
+- Indexed by ZIP code for efficient querying
+
+### Score and Precision
+- **Score**: Text similarity between input and matched address (Levenshtein-based)
+- **Precision**: Geocoding method quality (range > street > intersection > zip > city)
+- Both factors determine if coordinates are returned
+
+## Dependencies Management
+
+### R Packages (renv)
+- Managed via `renv.lock`
+- Restored during Docker build: `renv::restore()`
+- Key dependencies: dplyr, readr, mappp, cli, dht (DeGAUSS helper tools)
+
+### Ruby Gems
+- Defined in `gemspec` file
+- Built during Docker build: `make -f Makefile.ruby install`
+- Key gems: sqlite3, json, Text
+
+### System Dependencies
+- SQLite3 with development headers
+- Ruby with build tools (flex, bison)
+- SSL/SSH libraries for R packages
+
+## Best Practices
+
+1. **Minimal changes**: This is a stable, production container - avoid unnecessary modifications
+2. **Test thoroughly**: Always test with sample CSV files after changes
+3. **Preserve compatibility**: Maintain backward compatibility with existing output format
+4. **Document changes**: Update README.md if user-facing behavior changes
+5. **Version carefully**: Follow semantic versioning for releases
+6. **Cache-friendly**: The geocoding uses caching - ensure changes don't break cache keys
+7. **Thread safety**: Ruby geocoder may be called in parallel - maintain thread safety
+
+## Common Tasks
+
+### Adding a new geocode result type
+1. Update classification logic in `entrypoint.R`
+2. Add new factor level to `geocode_result` column
+3. Update README.md to document new result type
+
+### Modifying address parsing
+1. Update `lib/geocoder/us/address.rb`
+2. Rebuild gem: `make -f Makefile.ruby install`
+3. Test with edge cases
+
+### Adjusting quality thresholds
+1. Modify filtering logic in `entrypoint.R` (lines 109-126)
+2. Consider impact on geocode success rate
+3. Update documentation if defaults change
+
+### Updating geocoder database
+1. Replace S3 URL in Dockerfile (line 15)
+2. Ensure database format compatibility
+3. Test thoroughly with real addresses
+
+## DeGAUSS Ecosystem
+
+This container is part of the DeGAUSS ecosystem:
+- Uses `dht` package (DeGAUSS helper tools) for common functions
+- Follows DeGAUSS naming conventions for output files
+- Integrates with other DeGAUSS geomarker containers
+- See https://degauss.org for ecosystem documentation
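The "Quality Filtering" and "Score and Precision" sections of the instructions file above describe a two-part rule: a geocode keeps its coordinates only if its precision is precise enough and its score clears the threshold. A minimal Python sketch of that rule follows. It is an illustration, not code from the repository: the names `classify_geocode` and `PRECISE_METHODS` are hypothetical, and the strict `<` comparison against the threshold is an assumption.

```python
# Precision methods treated as precise enough to keep coordinates.
# Assumption: the instructions list intersection/zip/city as imprecise,
# which leaves range and street as the precise methods.
PRECISE_METHODS = {"range", "street"}


def classify_geocode(precision: str, score: float, threshold: float = 0.5) -> str:
    """Return 'geocoded' or 'imprecise_geocode' for one match (hypothetical helper)."""
    if precision not in PRECISE_METHODS or score < threshold:
        return "imprecise_geocode"
    return "geocoded"


if __name__ == "__main__":
    # A few hand-picked cases: precise and high score, precise but low score,
    # and a perfect score with an imprecise method.
    for precision, score in [("range", 0.9), ("street", 0.4), ("zip", 1.0)]:
        print(precision, score, classify_geocode(precision, score))
```

Under this sketch a ZIP-centroid match is always imprecise, however high its score, which matches the statement that both factors determine whether coordinates are returned.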
diff --git a/.gitignore b/.gitignore
index 410bf44..b95a070 100644
--- a/.gitignore
+++ b/.gitignore
@@ -20,3 +20,16 @@ test/d_for_geocoding.rds
 test/geocoding_cache/
 test/tmp*
 /.Rprofile
+
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+*.egg-info/
+dist/
+build/
+.pytest_cache/
+.coverage
+htmlcov/
diff --git a/PYTHON_MIGRATION.md b/PYTHON_MIGRATION.md
new file mode 100644
index 0000000..cdb0652
--- /dev/null
+++ b/PYTHON_MIGRATION.md
@@ -0,0 +1,149 @@
+# Python Migration - Geocoder
+
+This directory contains the initial Python implementation of the geocoder, migrating from the Ruby + R stack to a pure Python + DuckDB solution.
+
+## Status: 🚧 Work in Progress
+
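+This is the initial scaffolding for the Python migration. Address parsing, phonetic matching, and the DuckDB connection layer are in place, but database queries and scoring are not yet implemented, so no coordinates are produced.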
+
+## Completed
+
+- ✅ Python package structure (`geocoder_us/`)
+- ✅ Constants module (`constants.py`) - ~1000 lines ported from Ruby
+  - Directional prefixes/suffixes (North, South, etc.)
+  - Street type qualifiers
+  - Prefix and suffix street types with canonical abbreviations
+  - US state and territory names
+- ✅ Preprocessing module (`preprocessing.py`) - Address cleaning and validation
+  - `clean_address()` - Normalize whitespace and special characters
+  - `address_is_po_box()` - Detect PO Box addresses
+  - `address_is_institutional()` - Flag institutional addresses
+  - `address_is_nonaddress()` - Detect placeholder text
+- ✅ Main entrypoint (`entrypoint.py`) - CLI interface
+  - Argument parsing (filename, score_threshold)
+  - CSV I/O with pandas
+  - Address preprocessing pipeline
+  - Output file naming (matches original format)
+  - Summary reporting with tabulate
+- ✅ Requirements file (`requirements.txt`) - Python dependencies
+- ✅ **Address parsing module (`address.py`)** - 298 lines
+  - Complete address component parser (number, street, city, state, ZIP)
+  - Street and city tokenization for fuzzy matching
+  - Abbreviation expansion
+  - PO Box and intersection detection
+- ✅ **Metaphone module (`metaphone.py`)** - 190 lines
+  - Phonetic encoding algorithm
+  - Similarity scoring
+  - Support for external metaphone libraries
+- ✅ **Database module (`database.py`)** - 210 lines
+  - DuckDB connection management
+  - Spatial and fuzzystrsim extension loading
+  - Thread-safe query execution
+  - Query method stubs (ready for schema)
+- ✅ **Integrated entrypoint (`entrypoint.py`)** - Updated
+  - Parallel geocoding with joblib (n_jobs=-1)
+  - Result caching with joblib Memory
+  - Address parsing integration
+  - Score threshold filtering
+  - Result classification (geocoded, imprecise_geocode, po_box, etc.)
+
+## TODO
+
+### Phase 1: Core Geocoding Engine
+- [x] `database.py` - DuckDB interface with spatial extension
+  - [x] Set up DuckDB connection
+  - [x] Load extensions (spatial + fuzzystrsim)
+  - [x] Thread-safe query execution
+  - [ ] Query street range data (pending database migration)
+  - [ ] Implement scoring logic (pending schema)
+- [x] `address.py` - Address parsing
+  - [x] Port regex patterns from Ruby
+  - [x] Parse street number, name, city, state, ZIP
+  - [x] Handle edge cases (PO boxes, intersections)
+  - [x] Street and city tokenization for matching
+- [x] `metaphone.py` - Phonetic matching
+  - [x] Implement metaphone algorithm
+  - [x] Phonetic similarity scoring
+  - [x] Support for external metaphone libraries
+
+### Phase 2: Database Migration
+- [ ] Convert SQLite database to DuckDB format
+- [ ] Migrate WKB geometries to DuckDB spatial types
+- [ ] Test database queries and performance
+- [ ] Add spatial indexes
+
+### Phase 3: Integration
+- [x] Implement parallel geocoding with joblib
+- [x] Add result caching (joblib Memory)
+- [x] Implement score/precision filtering
+- [x] Match output format exactly with Ruby version
+- [x] Update entrypoint.py to use Phase 1 modules
+- [ ] Full integration test with migrated database
+
+### Phase 4: Testing & Validation
+- [ ] Unit tests for all modules
+- [ ] Integration tests with test CSV file
+- [ ] Validate geocoding accuracy vs Ruby version
+- [ ] Performance benchmarking
+
+### Phase 5: Docker
+- [ ] Create new Dockerfile with Python base image
+- [ ] Remove Ruby and R dependencies
+- [ ] Test container build and execution
+- [ ] Update documentation
+
+## Architecture
+
+### Current (Ruby + R)
+```
+Docker → entrypoint.R → geocode.rb → Ruby Geocoder → SQLite (with C extensions)
+```
+
+### Target (Python)
+```
+Docker → entrypoint.py → geocoder_us/ → DuckDB (with spatial extension)
+```
+
+## Usage (when complete)
+
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Geocode addresses
+python entrypoint.py my_addresses.csv          # Default threshold 0.5
+python entrypoint.py my_addresses.csv 0.6      # Custom threshold
+python entrypoint.py my_addresses.csv all      # All results
+```
+
+## Testing Current Implementation
+
+The entrypoint can be run now but will return placeholder geocoding results:
+
+```bash
+python entrypoint.py test/my_address_file.csv
+```
+
+Output will show:
+- File reading and validation
+- Address preprocessing statistics
+- Placeholder geocoding message
+- Output file generation
+- Summary table (placeholder classifications; `lat`/`lon` stay empty until the database is migrated)
+
+## Development Notes
+
+- The `constants.py` module is a direct port of Ruby `constants.rb` (~670 lines)
+- The `TwoWayMap` class provides bidirectional lookup like Ruby's `Map` class
+- Address preprocessing functions match the logic from the `dht` R package
+- CLI interface matches the original R entrypoint arguments and output format
+
+## Next Steps
+
+To continue the migration:
+
+1. Convert the SQLite database to DuckDB format (Phase 2)
+2. Implement the street range queries and scoring logic in `database.py`
+3. Replace the placeholder results in `entrypoint.py` with real database lookups
+4. Test with small address samples before full database migration
+5. Validate results match the Ruby implementation exactly
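The development notes in PYTHON_MIGRATION.md mention a `TwoWayMap` class that provides bidirectional lookup. The class itself is not part of this excerpt, so the following is only a sketch of what such a structure might look like. The method names `get` and `inverse`, the case-insensitive keys, and the first-entry-wins rule for duplicate abbreviations are all assumptions, not the PR's actual API.

```python
from typing import Dict, Optional


class TwoWayMap:
    """Bidirectional string lookup (hypothetical sketch, not the PR's class)."""

    def __init__(self, pairs: Dict[str, str]) -> None:
        self._forward: Dict[str, str] = {}
        self._reverse: Dict[str, str] = {}
        for full, abbr in pairs.items():
            self._forward[full.lower()] = abbr
            # Keep the first full name seen for each abbreviation.
            self._reverse.setdefault(abbr.lower(), full)

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        """Look up a full name and return its abbreviation."""
        return self._forward.get(key.lower(), default)

    def inverse(self, key: str, default: Optional[str] = None) -> Optional[str]:
        """Look up an abbreviation and return its full name."""
        return self._reverse.get(key.lower(), default)

    def __contains__(self, key: str) -> bool:
        k = key.lower()
        return k in self._forward or k in self._reverse


if __name__ == "__main__":
    directional = TwoWayMap({"North": "N", "South": "S"})
    print(directional.get("north"))   # full name -> abbreviation
    print(directional.inverse("s"))   # abbreviation -> full name
```

Keeping two plain dictionaries makes both directions constant-time lookups, at the cost of storing each pair twice.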
diff --git a/entrypoint.py b/entrypoint.py
new file mode 100644
index 0000000..c5646a0
--- /dev/null
+++ b/entrypoint.py
@@ -0,0 +1,437 @@
+#!/usr/bin/env python3
+"""
+Geocoder entrypoint - Main CLI for geocoding US addresses.
+
+This script reads a CSV file with an 'address' column, geocodes the addresses,
+and writes the results to a new CSV file with geocoding metadata.
+
+Usage:
+    python entrypoint.py <filename> [score_threshold]
+    
+Arguments:
+    filename: Path to input CSV file (must contain 'address' column)
+    score_threshold: Minimum geocoding score (0.0-1.0) or 'all' (default: 0.5)
+
+Example:
+    python entrypoint.py my_addresses.csv 0.6
+"""
+
+import argparse
+import sys
+from pathlib import Path
+from typing import Any, Dict, Union
+
+import pandas as pd
+from joblib import Memory, Parallel, delayed
+from tabulate import tabulate
+
+from geocoder_us import __version__
+from geocoder_us.preprocessing import (
+    clean_address,
+    address_is_po_box,
+    address_is_institutional,
+    address_is_nonaddress
+)
+from geocoder_us.address import Address
+from geocoder_us.database import GeocoderDatabase
+
+
+def parse_arguments() -> argparse.Namespace:
+    """Parse command-line arguments."""
+    parser = argparse.ArgumentParser(
+        description="Geocode US street addresses using DuckDB",
+        formatter_class=argparse.RawDescriptionHelpFormatter,
+        epilog="""
+Examples:
+  %(prog)s addresses.csv              # Use default threshold (0.5)
+  %(prog)s addresses.csv 0.6          # Use 0.6 threshold
+  %(prog)s addresses.csv all          # Return all geocodes
+        """
+    )
+    
+    parser.add_argument(
+        "filename",
+        type=str,
+        help="Input CSV file with 'address' column"
+    )
+    
+    parser.add_argument(
+        "score_threshold",
+        type=str,
+        nargs="?",
+        default="0.5",
+        help="Minimum score threshold (0.0-1.0) or 'all' (default: 0.5)"
+    )
+    
+    parser.add_argument(
+        "--version",
+        action="version",
+        version=f"%(prog)s {__version__}"
+    )
+    
+    return parser.parse_args()
+
+
+def validate_score_threshold(threshold_str: str) -> Union[float, str]:
+    """
+    Validate and convert score threshold argument.
+    
+    Args:
+        threshold_str: Threshold string from command line
+        
+    Returns:
+        Float value between 0 and 1, or "all"
+        
+    Raises:
+        ValueError: If threshold is invalid
+    """
+    if threshold_str.lower() == "all":
+        return "all"
+    
+    try:
+        threshold = float(threshold_str)
+    except ValueError as e:
+        raise ValueError(f"Invalid score threshold '{threshold_str}': {e}") from e
+    if not 0.0 <= threshold <= 1.0:
+        raise ValueError("Score threshold must be between 0.0 and 1.0")
+    return threshold
+
+
+def read_input_file(filename: str) -> pd.DataFrame:
+    """
+    Read and validate input CSV file.
+    
+    Args:
+        filename: Path to CSV file
+        
+    Returns:
+        DataFrame with address data
+        
+    Raises:
+        FileNotFoundError: If file doesn't exist
+        ValueError: If 'address' column is missing
+    """
+    filepath = Path(filename)
+    if not filepath.exists():
+        raise FileNotFoundError(f"File not found: {filename}")
+    
+    print(f"Reading input file: {filename}")
+    df = pd.read_csv(filepath)
+    
+    if "address" not in df.columns:
+        raise ValueError("Input file must contain an 'address' column")
+    
+    print(f"Loaded {len(df)} addresses")
+    return df
+
+
+def preprocess_addresses(df: pd.DataFrame) -> pd.DataFrame:
+    """
+    Clean and flag addresses before geocoding.
+    
+    Args:
+        df: DataFrame with 'address' column
+        
+    Returns:
+        DataFrame with additional preprocessing columns
+    """
+    print("Preprocessing addresses...")
+    
+    # Clean addresses
+    df["address"] = df["address"].fillna("").astype(str).apply(clean_address)
+    
+    # Flag bad addresses
+    df["po_box"] = df["address"].apply(address_is_po_box)
+    df["cincy_inst_foster_addr"] = df["address"].apply(address_is_institutional)
+    df["non_address_text"] = df["address"].apply(address_is_nonaddress)
+    
+    # Count flagged addresses
+    n_po_box = df["po_box"].sum()
+    n_institutional = df["cincy_inst_foster_addr"].sum()
+    n_nonaddress = df["non_address_text"].sum()
+    
+    print(f"  PO Box addresses: {n_po_box}")
+    print(f"  Institutional addresses: {n_institutional}")
+    print(f"  Non-address text: {n_nonaddress}")
+    
+    return df
+
+
+def geocode_addresses(df: pd.DataFrame, score_threshold: Union[float, str]) -> pd.DataFrame:
+    """
+    Geocode addresses using parallel processing with caching.
+    
+    Args:
+        df: DataFrame with preprocessed addresses
+        score_threshold: Minimum score or "all"
+        
+    Returns:
+        DataFrame with geocoding results
+    """
+    print("Geocoding...")
+    
+    # Filter addresses to geocode (exclude flagged ones unless threshold is "all")
+    if score_threshold == "all":
+        addresses_to_geocode = df["address"].tolist()
+        indices_to_geocode = df.index.tolist()
+    else:
+        mask = ~(df["po_box"] | df["cincy_inst_foster_addr"] | df["non_address_text"])
+        addresses_to_geocode = df.loc[mask, "address"].tolist()
+        indices_to_geocode = df.loc[mask].index.tolist()
+    
+    if not addresses_to_geocode:
+        print("  No addresses to geocode after filtering")
+        # Add empty geocoding columns
+        df = _add_empty_geocode_columns(df)
+        return df
+    
+    print(f"  Processing {len(addresses_to_geocode)} addresses...")
+    
+    # Set up caching
+    cache_dir = "./.geocoding_cache"
+    memory = Memory(cache_dir, verbose=0)
+    
+    # Cached geocoding function
+    @memory.cache
+    def geocode_single_address(address_str: str) -> Dict[str, Any]:
+        """
+        Geocode a single address with caching.
+        
+        Args:
+            address_str: Address string
+            
+        Returns:
+            Dictionary with geocoding results
+        """
+        try:
+            # Parse the address
+            addr = Address(address_str)
+            
+            # TODO: Query database once it's migrated
+            # For now, return parsed address components
+            return {
+                'matched_street': addr.street[0] if addr.street else None,
+                'matched_city': addr.city[0] if addr.city else None,
+                'matched_state': addr.state if addr.state else None,
+                'matched_zip': addr.zip if addr.zip else None,
+                'precision': 'street' if addr.number else 'city',  # Placeholder
+                'score': 0.8 if addr.number and addr.street else 0.5,  # Placeholder
+                'lat': None,  # Requires database
+                'lon': None,  # Requires database
+                'geocode_result': 'parsed',  # Placeholder
+            }
+        except Exception as e:
+            print(f"    Error geocoding '{address_str}': {e}")
+            return {
+                'matched_street': None,
+                'matched_city': None,
+                'matched_state': None,
+                'matched_zip': None,
+                'precision': None,
+                'score': None,
+                'lat': None,
+                'lon': None,
+                'geocode_result': 'error',
+            }
+    
+    # Parallel geocoding with progress
+    print("  Geocoding in parallel with caching...")
+    results = Parallel(n_jobs=-1, verbose=1)(
+        delayed(geocode_single_address)(addr) 
+        for addr in addresses_to_geocode
+    )
+    
+    # Convert results to DataFrame
+    results_df = pd.DataFrame(results, index=indices_to_geocode)
+    
+    # Merge with original DataFrame
+    for col in results_df.columns:
+        if col not in df.columns:
+            df[col] = None
+        df.loc[results_df.index, col] = results_df[col]
+    
+    # Fill missing values for addresses that weren't geocoded
+    df = _add_empty_geocode_columns(df)
+    
+    print("  Geocoding complete!")
+    return df
+
+
+def _add_empty_geocode_columns(df: pd.DataFrame) -> pd.DataFrame:
+    """
+    Add empty geocoding columns if they don't exist.
+    
+    Args:
+        df: DataFrame
+        
+    Returns:
+        DataFrame with geocoding columns
+    """
+    geocode_cols = {
+        'matched_street': None,
+        'matched_city': None,
+        'matched_state': None,
+        'matched_zip': None,
+        'precision': None,
+        'score': None,
+        'lat': None,
+        'lon': None,
+        'geocode_result': 'not_geocoded',
+    }
+    
+    for col, default in geocode_cols.items():
+        if col not in df.columns:
+            df[col] = default
+        elif default is not None:  # fillna(None) raises ValueError in pandas
+            df[col] = df[col].fillna(default)
+    
+    return df
+
+
+def write_output_file(df: pd.DataFrame, input_filename: str, score_threshold: Union[float, str]) -> str:
+    """
+    Write geocoded results to output file.
+    
+    Args:
+        df: DataFrame with geocoding results
+        input_filename: Original input filename
+        score_threshold: Score threshold used
+        
+    Returns:
+        Output filename
+    """
+    input_path = Path(input_filename)
+    stem = input_path.stem
+    suffix = input_path.suffix
+    
+    # Apply score threshold filtering if not "all"
+    if score_threshold != "all":
+        df = apply_score_threshold(df, float(score_threshold))
+    
+    # Format output filename: input_geocoder_v4.0.0_score_threshold_0_5.csv (dots in the threshold become underscores)
+    threshold_str = str(score_threshold).replace(".", "_")
+    output_filename = f"{stem}_geocoder_v{__version__}_score_threshold_{threshold_str}{suffix}"
+    
+    df.to_csv(output_filename, index=False)
+    print(f"Output written to: {output_filename}")
+    
+    return output_filename
+
+
+def apply_score_threshold(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
+    """
+    Apply score threshold to filter geocoding results.
+    
+    Sets lat/lon to None for low-scoring or imprecise geocodes,
+    and updates geocode_result accordingly.
+    
+    Args:
+        df: DataFrame with geocoding results
+        threshold: Minimum acceptable score (0.0-1.0)
+        
+    Returns:
+        DataFrame with filtered results
+    """
+    # Classify geocoding results
+    def classify_result(row):
+        # Check for flagged addresses first
+        if row.get('po_box', False):
+            return 'po_box'
+        if row.get('cincy_inst_foster_addr', False):
+            return 'cincy_inst_foster_addr'
+        if row.get('non_address_text', False):
+            return 'non_address_text'
+        
+        # Check geocoding quality
+        score = row.get('score')
+        precision = row.get('precision')
+        
+        if pd.isna(score) or pd.isna(precision):
+            return 'not_geocoded'
+        
+        # Imprecise if not "street" or "range" precision, or low score
+        if precision not in ['street', 'range'] or score < threshold:
+            return 'imprecise_geocode'
+        
+        return 'geocoded'
+    
+    # Apply classification
+    df['geocode_result'] = df.apply(classify_result, axis=1)
+    
+    # Set coordinates to None for imprecise geocodes
+    mask = df['geocode_result'] == 'imprecise_geocode'
+    df.loc[mask, 'lat'] = None
+    df.loc[mask, 'lon'] = None
+    
+    return df
+
+
+def print_summary(df: pd.DataFrame) -> None:
+    """
+    Print geocoding results summary.
+    
+    Args:
+        df: DataFrame with geocoding results
+    """
+    if "geocode_result" not in df.columns:
+        return
+    
+    print("\nGeocoding Summary:")
+    print("=" * 60)
+    
+    # Count by geocode result
+    summary = df["geocode_result"].value_counts().reset_index()
+    summary.columns = ["geocode_result", "n"]
+    summary["percent"] = (summary["n"] / len(df) * 100).round(1)
+    summary["n (%)"] = summary.apply(lambda x: f"{x['n']} ({x['percent']}%)", axis=1)
+    
+    # Print table
+    table = tabulate(
+        summary[["geocode_result", "n (%)"]],
+        headers=["Result", "Count (%)"],
+        tablefmt="simple",
+        showindex=False
+    )
+    print(table)
+    
+    # Print success rate
+    if "geocoded" in summary["geocode_result"].values:
+        success_row = summary[summary["geocode_result"] == "geocoded"].iloc[0]
+        print(f"\nSuccessfully geocoded: {success_row['n']} of {len(df)} ({success_row['percent']}%)")
+
+
+def main() -> int:
+    """Main entry point."""
+    try:
+        # Parse arguments
+        args = parse_arguments()
+        score_threshold = validate_score_threshold(args.score_threshold)
+        
+        print(f"Geocoder v{__version__}")
+        print(f"Score threshold: {score_threshold}")
+        print("-" * 60)
+        
+        # Read input
+        df = read_input_file(args.filename)
+        
+        # Preprocess
+        df = preprocess_addresses(df)
+        
+        # Geocode
+        df = geocode_addresses(df, score_threshold)
+        
+        # Write output
+        write_output_file(df, args.filename, score_threshold)
+        
+        # Print summary
+        print_summary(df)
+        
+        return 0
+        
+    except Exception as e:
+        print(f"Error: {e}", file=sys.stderr)
+        return 1
+
+
+if __name__ == "__main__":
+    sys.exit(main())
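The output naming in `write_output_file()` above is easy to misread, so here is the same logic isolated as a small standalone function. This is a sketch that mirrors the diff rather than an addition to it: the name `build_output_name` is hypothetical, and the version is passed in explicitly instead of being imported from `geocoder_us`.

```python
from pathlib import Path
from typing import Union


def build_output_name(input_filename: str, version: str,
                      score_threshold: Union[float, str]) -> str:
    """Build the output filename the same way write_output_file() does."""
    path = Path(input_filename)
    # Dots in the threshold become underscores; "all" passes through unchanged.
    threshold_str = str(score_threshold).replace(".", "_")
    # Only the stem and suffix are used, so any directory part is dropped.
    return f"{path.stem}_geocoder_v{version}_score_threshold_{threshold_str}{path.suffix}"


if __name__ == "__main__":
    print(build_output_name("test/my_address_file.csv", "4.0.0", 0.5))
    print(build_output_name("my_address_file.csv", "4.0.0", "all"))
```

Because only `stem` and `suffix` are used, the result is a bare filename, and the CSV is written to the current working directory rather than next to the input file.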
diff --git a/geocoder_us/__init__.py b/geocoder_us/__init__.py
new file mode 100644
index 0000000..5906dbd
--- /dev/null
+++ b/geocoder_us/__init__.py
@@ -0,0 +1,9 @@
+"""
+Geocoder US - Python geocoding library for US addresses.
+
+This package provides geocoding functionality for US street addresses using
+DuckDB with spatial extensions.
+"""
+
+__version__ = "4.0.0"
+__author__ = "DeGAUSS Team"
diff --git a/geocoder_us/address.py b/geocoder_us/address.py
new file mode 100644
index 0000000..12478b6
--- /dev/null
+++ b/geocoder_us/address.py
@@ -0,0 +1,298 @@
+"""
+Address parsing module for US addresses.
+
+This module provides the Address class for parsing and normalizing US street addresses.
+Ported from Ruby Geocoder::US address.rb.
+"""
+
+import re
+from typing import List, Optional, Tuple
+from geocoder_us.constants import (
+    DIRECTIONAL, PREFIX_TYPE, SUFFIX_TYPE, STATE
+)
+
+
+class Address:
+    """
+    Parses and normalizes US street addresses.
+    
+    Takes a raw address string and breaks it into components:
+    - Street number (number, prenum, sufnum)
+    - Street name
+    - City
+    - State
+    - ZIP code (zip, plus4)
+    """
+    
+    # Regex patterns for matching address components
+    PATTERNS = {
+        'number': re.compile(r'^(\d+\W|[a-z]+)?(\d+)([a-z]?)\b', re.IGNORECASE),
+        'street': re.compile(r'(?:\b(?:\d+\w*|[a-z\'-]+)\s*)+', re.IGNORECASE),
+        'city': re.compile(r'(?:\b[a-z\'-]+\s*)+', re.IGNORECASE),
+        'state': re.compile(STATE.regexp.pattern + r'\s*$', re.IGNORECASE),
+        'zip': re.compile(r'(\d{5})(?:-\d{4})?\s*$'),
+        'at': re.compile(r'\s(at|@|and|&)\s', re.IGNORECASE),
+        'po_box': re.compile(r'\b[Pp]*(OST|ost)*\.?\s*[Oo0]*(ffice|FFICE)*\.?\s*[Bb][Oo0][Xx]\b'),
+    }
+    
+    def __init__(self, text: str):
+        """
+        Initialize address parser with raw text.
+        
+        Args:
+            text: Raw address string
+        """
+        if not text or not text.strip():
+            raise ValueError("Address text cannot be empty")
+        
+        self.text = text.strip()
+        self.original_text = self.text
+        
+        # Address components
+        self.prenum: str = ""
+        self.number: str = ""
+        self.sufnum: str = ""
+        self.street: List[str] = []
+        self.city: List[str] = []
+        self.state: str = ""
+        self.full_state: str = ""
+        self.zip: str = ""
+        self.plus4: str = ""
+        
+        # Parse the address
+        self._parse()
+    
+    def _clean(self, text: str) -> str:
+        """
+        Clean address text by removing special characters and normalizing whitespace.
+        
+        Args:
+            text: Raw text to clean
+            
+        Returns:
+            Cleaned text
+        """
+        text = text.strip()
+        # Remove special characters (keep alphanumeric, space, comma, apostrophe, ampersand, at sign, slash, hyphen)
+        text = re.sub(r'[^a-z0-9 ,\'&@/\-]+', '', text, flags=re.IGNORECASE)
+        # Normalize whitespace
+        text = re.sub(r'\s+', ' ', text)
+        return text
+    
+    def _parse(self) -> None:
+        """
+        Parse the address text into components.
+        
+        Parsing order:
+        1. ZIP code (from end)
+        2. State (from end)
+        3. Street number (from beginning)
+        4. Street name (middle)
+        5. City (remaining)
+        """
+        text = self.text.lower()
+        
+        # Parse ZIP code (last occurrence)
+        zip_matches = list(self.PATTERNS['zip'].finditer(text))
+        if zip_matches:
+            match = zip_matches[-1]
+            self.zip = match.group(1)
+            # Extract plus4 if present
+            if '-' in match.group(0):
+                self.plus4 = match.group(0).split('-')[1].strip()
+            # Remove from text
+            text = text[:match.start()] + text[match.end():]
+            text = re.sub(r'\s*,?\s*$', '', text)
+        
+        # Parse state (last occurrence after ZIP removal)
+        state_matches = list(self.PATTERNS['state'].finditer(text))
+        if state_matches:
+            match = state_matches[-1]
+            state_text = match.group(0).strip()
+            self.full_state = state_text
+            # Convert to 2-letter abbreviation
+            self.state = STATE.get_case_insensitive(state_text, state_text)
+            # Remove from text
+            text = text[:match.start()] + text[match.end():]
+            text = re.sub(r'\s*,?\s*$', '', text)
+        
+        # Parse street number (first occurrence)
+        number_match = self.PATTERNS['number'].search(text)
+        if number_match:
+            self.prenum = number_match.group(1) or ""
+            self.number = number_match.group(2) or ""
+            self.sufnum = number_match.group(3) or ""
+            # Clean up
+            self.prenum = self.prenum.strip()
+            self.number = self.number.strip()
+            self.sufnum = self.sufnum.strip()
+            # Remove from text
+            text = text[:number_match.start()] + text[number_match.end():]
+            text = re.sub(r'^\s*,?\s*', '', text)
+        
+        # Parse street names
+        street_matches = self.PATTERNS['street'].findall(text)
+        if street_matches:
+            self.street = [s.strip() for s in street_matches if s.strip()]
+            self.street = self._expand_streets(self.street)
+        
+        # Parse city (remaining text)
+        city_matches = self.PATTERNS['city'].findall(text)
+        if city_matches:
+            # Take the last match as the city
+            city_text = city_matches[-1].strip()
+            if city_text:
+                # text was lowercased above, so no further normalization is needed
+                self.city = [city_text]
+        
+        # Special case: a state given by its full name may also be a city (e.g., "New York")
+        if self.state and self.full_state and self.state.lower() != self.full_state.lower():
+            self.city.append(self.full_state.lower())
+    
+    def _expand_streets(self, streets: List[str]) -> List[str]:
+        """
+        Expand street names by generating variants with abbreviations.
+        
+        Args:
+            streets: List of street name variants
+            
+        Returns:
+            Expanded list with abbreviation variants
+        """
+        if not streets or not streets[0]:
+            return []
+        
+        # Strip and lowercase
+        streets = [s.strip().lower() for s in streets if s]
+        expanded = set(streets)
+        
+        # Add variants with abbreviated street types (whole words only, e.g. not "court" in "courtney")
+        for street in streets:
+            # Try prefix types
+            for full, abbr in PREFIX_TYPE.items():
+                if full.lower() in street:
+                    expanded.add(re.sub(rf'\b{re.escape(full.lower())}\b', abbr.lower(), street))
+            
+            # Try suffix types
+            for full, abbr in SUFFIX_TYPE.items():
+                if full.lower() in street:
+                    expanded.add(re.sub(rf'\b{re.escape(full.lower())}\b', abbr.lower(), street))
+            
+            # Try directionals
+            for full, abbr in DIRECTIONAL.items():
+                if full.lower() in street:
+                    expanded.add(re.sub(rf'\b{re.escape(full.lower())}\b', abbr.lower(), street))
+        
+        return list(expanded)
+    
+    def street_parts(self) -> List[str]:
+        """
+        Generate all possible street name substrings for matching.
+        
+        Returns:
+            List of street name variants for database queries
+        """
+        strings = []
+        
+        for street in self.street:
+            tokens = street.split()
+            # Generate all contiguous substrings
+            for i in range(len(tokens)):
+                for j in range(i, len(tokens)):
+                    substring = ' '.join(tokens[i:j+1])
+                    strings.append(substring)
+        
+        # Remove duplicates while preserving order
+        strings = list(dict.fromkeys(strings))
+        
+        # Filter out pure abbreviations and directionals (optional)
+        # This helps reduce false matches
+        filtered = []
+        for s in strings:
+            # Keep if not just a directional or common abbreviation
+            if len(s) > 2 or s.isdigit():
+                filtered.append(s)
+        
+        return filtered if filtered else strings
+    
+    def city_parts(self) -> List[str]:
+        """
+        Generate all possible city name substrings for matching.
+        
+        Returns:
+            List of city name variants for database queries
+        """
+        strings = []
+        
+        for city in self.city:
+            tokens = city.split()
+            # Generate all contiguous substrings (reverse order for cities)
+            for i in range(len(tokens) - 1, -1, -1):
+                for j in range(i, len(tokens)):
+                    substring = ' '.join(tokens[i:j+1])
+                    strings.append(substring)
+        
+        # Remove duplicates while preserving the order generated above
+        return list(dict.fromkeys(strings))
+    
+    def is_po_box(self) -> bool:
+        """
+        Check if this address is a PO Box.
+        
+        Returns:
+            True if address is a PO Box
+        """
+        return bool(self.PATTERNS['po_box'].search(self.original_text))
+    
+    def is_intersection(self) -> bool:
+        """
+        Check if this address is a street intersection.
+        
+        Returns:
+            True if address appears to be an intersection (contains "at", "&", etc.)
+        """
+        return bool(self.PATTERNS['at'].search(self.original_text))
+    
+    def to_dict(self) -> dict:
+        """
+        Convert address to dictionary representation.
+        
+        Returns:
+            Dictionary with all address components
+        """
+        return {
+            'text': self.original_text,
+            'number': self.number,
+            'prenum': self.prenum,
+            'sufnum': self.sufnum,
+            'street': self.street,
+            'city': self.city,
+            'state': self.state,
+            'zip': self.zip,
+            'plus4': self.plus4,
+            'is_po_box': self.is_po_box(),
+            'is_intersection': self.is_intersection()
+        }
+    
+    def __str__(self) -> str:
+        """String representation of parsed address."""
+        parts = []
+        if self.number:
+            parts.append(f"{self.prenum}{self.number}{self.sufnum}".strip())
+        if self.street:
+            parts.append(self.street[0])
+        if self.city:
+            parts.append(self.city[0])
+        if self.state:
+            parts.append(self.state)
+        if self.zip:
+            zip_part = self.zip
+            if self.plus4:
+                zip_part += f"-{self.plus4}"
+            parts.append(zip_part)
+        
+        return ", ".join(p for p in parts if p)
+    
+    def __repr__(self) -> str:
+        """Developer representation."""
+        return f"Address('{self.original_text}') -> {str(self)}"
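Review note on `street_parts()` and `city_parts()`: both enumerate every contiguous run of tokens and then deduplicate. The sketch below shows that enumeration in isolation, assuming whitespace-separated tokens as in the patch. The helper name `contiguous_parts` is hypothetical and is not part of this change. It deduplicates with `dict.fromkeys`, which keeps first-seen order, whereas `set()` does not guarantee any order.

```python
from typing import List


def contiguous_parts(name: str) -> List[str]:
    """Return every contiguous run of tokens in `name`, keeping first-seen order."""
    tokens = name.split()
    parts = [
        " ".join(tokens[i:j + 1])
        for i in range(len(tokens))
        for j in range(i, len(tokens))
    ]
    # dict.fromkeys drops duplicates but, unlike set(), keeps insertion order
    return list(dict.fromkeys(parts))


if __name__ == "__main__":
    for part in contiguous_parts("martin luther king"):
        print(part)
```

A name with n tokens yields at most n(n+1)/2 candidates, so long street names produce noticeably more database lookups than short ones.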
diff --git a/geocoder_us/constants.py b/geocoder_us/constants.py
new file mode 100644
index 0000000..479e8ec
--- /dev/null
+++ b/geocoder_us/constants.py
@@ -0,0 +1,952 @@
+"""
+Constants for US address parsing and normalization.
+
+This module contains mappings for:
+- Directional prefixes/suffixes (North, South, etc.)
+- Street type qualifiers (Alternate, Business, etc.)
+- Street type prefixes and suffixes with canonical abbreviations
+- US state and territory names and abbreviations
+
+Ported from Ruby Geocoder::US constants.
+"""
+
+import re
+from typing import Dict, Pattern
+
+
+class TwoWayMap(dict):
+    """
+    A mapping in which each key and each value resolves to the value (both "North" and "N" give "N").
+    Adds lowercase keys for case-insensitive lookups and builds a regex pattern matching every term.
+    """
+    
+    def __init__(self, mapping: Dict[str, str]):
+        super().__init__()
+        # Add original mappings
+        for k, v in mapping.items():
+            self[k] = v
+        # Add lowercase versions
+        for k, v in list(self.items()):
+            self[k.lower()] = self.get(k, v)
+            self[v.lower()] = v
+        # Build regex pattern
+        all_terms = list(mapping.keys()) + list(mapping.values())
+        self.regexp = re.compile(
+            r'\b(' + '|'.join(re.escape(term) for term in all_terms) + r')\b',
+            re.IGNORECASE
+        )
+    
+    def get_case_insensitive(self, key: str, default=None):
+        """Get value with case-insensitive lookup."""
+        return self.get(key.lower(), default)
+
+
+# Directional prefixes and suffixes
+# Maps compass directions (English and Spanish) to 1-2 letter abbreviations
+DIRECTIONAL = TwoWayMap({
+    "North": "N",
+    "South": "S",
+    "East": "E",
+    "West": "W",
+    "Northeast": "NE",
+    "Northwest": "NW",
+    "Southeast": "SE",
+    "Southwest": "SW",
+    "Norte": "N",
+    "Sur": "S",
+    "Este": "E",
+    "Oeste": "O",
+    "Noreste": "NE",
+    "Noroeste": "NO",
+    "Sudeste": "SE",
+    "Sudoeste": "SO"
+})
+
+
+# Prefix qualifiers (e.g., "Alternate Main Street")
+PREFIX_QUALIFIER = TwoWayMap({
+    "Alternate": "Alt",
+    "Business": "Bus",
+    "Bypass": "Byp",
+    "Extended": "Exd",
+    "Historic": "Hst",
+    "Loop": "Lp",
+    "Old": "Old",
+    "Private": "Pvt",
+    "Public": "Pub",
+    "Spur": "Spr",
+})
+
+
+# Suffix qualifiers (e.g., "Main Street Extension")
+SUFFIX_QUALIFIER = TwoWayMap({
+    "Access": "Acc",
+    "Alternate": "Alt",
+    "Business": "Bus",
+    "Bypass": "Byp",
+    "Connector": "Con",
+    "Extended": "Exd",
+    "Extension": "Exn",
+    "Loop": "Lp",
+    "Private": "Pvt",
+    "Public": "Pub",
+    "Scenic": "Scn",
+    "Spur": "Spr",
+    "Ramp": "Rmp",
+    "Underpass": "Unp",
+    "Overpass": "Ovp",
+})
+
+
+# Canonical prefix street types from TIGER/Line documentation
+PREFIX_CANONICAL = {
+    "Arcade": "Arc",
+    "Autopista": "Autopista",
+    "Avenida": "Ave",
+    "Avenue": "Ave",
+    "Boulevard": "Blvd",
+    "Bulevar": "Bulevar",
+    "Bureau of Indian Affairs Highway": "BIA Hwy",
+    "Bureau of Indian Affairs Road": "BIA Rd",
+    "Bureau of Indian Affairs Route": "BIA Rte",
+    "Bureau of Land Management Road": "BLM Rd",
+    "Bypass": "Byp",
+    "Calle": "Cll",
+    "Calleja": "Calleja",
+    "Callejón": "Callejón",
+    "Caminito": "Cmt",
+    "Camino": "Cam",
+    "Carretera": "Carr",
+    "Cerrada": "Cer",
+    "Círculo": "Cír",
+    "Commons": "Cmns",
+    "Corte": "Corte",
+    "County Highway": "Co Hwy",
+    "County Lane": "Co Ln",
+    "County Road": "Co Rd",
+    "County Route": "Co Rte",
+    "County State Aid Highway": "Co St Aid Hwy",
+    "County Trunk Highway": "Co Trunk Hwy",
+    "County Trunk Road": "Co Trunk Rd",
+    "Court": "Ct",
+    "Delta Road": "Delta Rd",
+    "District of Columbia Highway": "DC Hwy",
+    "Driveway": "Driveway",
+    "Entrada": "Ent",
+    "Expreso": "Expreso",
+    "Expressway": "Expy",
+    "Farm Road": "Farm Rd",
+    "Farm-to-Market Road": "FM",
+    "Fire Control Road": "Fire Cntrl Rd",
+    "Fire District Road": "Fire Dist Rd",
+    "Fire Lane": "Fire Ln",
+    "Fire Road": "Fire Rd",
+    "Fire Route": "Fire Rte",
+    "Fire Trail": "Fire Trl",
+    "Forest Highway": "Forest Hwy",
+    "Forest Road": "Forest Rd",
+    "Forest Route": "Forest Rte",
+    "Forest Service Road": "FS Rd",
+    "Highway": "Hwy",
+    "Indian Route": "Indian Rte",
+    "Indian Service Route": "Indian Svc Rte",
+    "Interstate Highway": "I-",
+    "Lane": "Ln",
+    "Logging Road": "Logging Rd",
+    "Loop": "Loop",
+    "National Forest Development Road": "Nat For Dev Rd",
+    "Navajo Service Route": "Navajo Svc Rte",
+    "Parish Road": "Parish Rd",
+    "Pasaje": "Pasaje",
+    "Paseo": "Pso",
+    "Passage": "Psge",
+    "Placita": "Pla",
+    "Plaza": "Plz",
+    "Point": "Pt",
+    "Puente": "Puente",
+    "Ranch Road": "Ranch Rd",
+    "Ranch to Market Road": "RM",
+    "Reservation Highway": "Resvn Hwy",
+    "Road": "Rd",
+    "Route": "Rte",
+    "Row": "Row",
+    "Rue": "Rue",
+    "Ruta": "Ruta",
+    "Sector": "Sec",
+    "Sendero": "Sendero",
+    "Service Road": "Svc Rd",
+    "Skyway": "Skwy",
+    "Square": "Sq",
+    "State Forest Service Road": "St FS Rd",
+    "State Highway": "State Hwy",
+    "State Loop": "State Loop",
+    "State Road": "State Rd",
+    "State Route": "State Rte",
+    "State Spur": "State Spur",
+    "State Trunk Highway": "St Trunk Hwy",
+    "Terrace": "Ter",
+    "Town Highway": "Town Hwy",
+    "Town Road": "Town Rd",
+    "Township Highway": "Twp Hwy",
+    "Township Road": "Twp Rd",
+    "Trail": "Trl",
+    "Tribal Road": "Tribal Rd",
+    "Tunnel": "Tunl",
+    "US Forest Service Highway": "USFS Hwy",
+    "US Forest Service Road": "USFS Rd",
+    "US Highway": "US Hwy",
+    "US Route": "US Rte",
+    "Vereda": "Ver",
+    "Via": "Via",
+    "Vista": "Vis",
+}
+
+
+# Alternate prefix street types (USPS accepted variants)
+PREFIX_ALTERNATE = {
+    "Av": "Ave",
+    "Aven": "Ave",
+    "Avenu": "Ave",
+    "Avenue": "Ave",
+    "Avn": "Ave",
+    "Avnue": "Ave",
+    "Boul": "Blvd",
+    "Boulv": "Blvd",
+    "Bypa": "Byp",
+    "Bypas": "Byp",
+    "Byps": "Byp",
+    "Crt": "Ct",
+    "Exp": "Expy",
+    "Expr": "Expy",
+    "Express": "Expy",
+    "Expw": "Expy",
+    "Highwy": "Hwy",
+    "Hiway": "Hwy",
+    "Hiwy": "Hwy",
+    "Hway": "Hwy",
+    "Lanes": "Ln",
+    "Loops": "Loop",
+    "Plza": "Plz",
+    "Sqr": "Sq",
+    "Sqre": "Sq",
+    "Squ": "Sq",
+    "Terr": "Ter",
+    "Tr": "Trl",
+    "Trails": "Trl",
+    "Trls": "Trl",
+    "Tunel": "Tunl",
+    "Tunls": "Tunl",
+    "Tunnels": "Tunl",
+    "Tunnl": "Tunl",
+    "Vdct": "Via",
+    "Viadct": "Via",
+    "Viaduct": "Via",
+    "Vist": "Vis",
+    "Vst": "Vis",
+    "Vsta": "Vis"
+}
+
+
+# Merged prefix types (canonical + alternates)
+PREFIX_TYPE = TwoWayMap({**PREFIX_CANONICAL, **PREFIX_ALTERNATE})
+
+
+# Canonical suffix street types from TIGER/Line documentation
+SUFFIX_CANONICAL = {
+    "Alley": "Aly",
+    "Arcade": "Arc",
+    "Avenida": "Ave",
+    "Avenue": "Ave",
+    "Beltway": "Beltway",
+    "Boulevard": "Blvd",
+    "Bridge": "Brg",
+    "Bypass": "Byp",
+    "Causeway": "Cswy",
+    "Circle": "Cir",
+    "Common": "Cmn",
+    "Commons": "Cmns",
+    "Corners": "Cors",
+    "Court": "Ct",
+    "Courts": "Cts",
+    "Crescent": "Cres",
+    "Crest": "Crst",
+    "Crossing": "Xing",
+    "Cutoff": "Cutoff",
+    "Drive": "Dr",
+    "Driveway": "Driveway",
+    "Esplanade": "Esplanade",
+    "Estates": "Ests",
+    "Expressway": "Expy",
+    "Forest Highway": "Forest Hwy",
+    "Fork": "Frk",
+    "Four-Wheel Drive Trail": "4WD Trl",
+    "Freeway": "Fwy",
+    "Grade": "Grade",
+    "Heights": "Hts",
+    "Highway": "Hwy",
+    "Jeep Trail": "Jeep Trl",
+    "Landing": "Lndg",
+    "Lane": "Ln",
+    "Loop": "Loop",
+    "Motorway": "Mtwy",
+    "Park": "Park",
+    "Parkway": "Pkwy",
+    "Pass": "Pass",
+    "Path": "Path",
+    "Pike": "Pike",
+    "Place": "Pl",
+    "Plaza": "Plz",
+    "Point": "Pt",
+    "Port": "Prt",
+    "Ranch": "Rnch",
+    "Ramp": "Ramp",
+    "Rest": "Rst",
+    "Ridge": "Rdg",
+    "Rise": "Rise",
+    "Road": "Rd",
+    "Route": "Rte",
+    "Row": "Row",
+    "Skyway": "Skwy",
+    "Spring": "Spg",
+    "Square": "Sq",
+    "Station": "Sta",
+    "Street": "St",
+    "Terrace": "Ter",
+    "Throughway": "Trwy",
+    "Trace": "Trce",
+    "Track": "Trak",
+    "Trail": "Trl",
+    "Tunnel": "Tunl",
+    "Turnpike": "Tpke",
+    "Valley": "Vly",
+    "Viaduct": "Via",
+    "View": "Vw",
+    "Village": "Vlg",
+    "Walk": "Walk",
+    "Way": "Way",
+    "Wells": "Wls",
+}
+
+
+# Alternate suffix street types (USPS accepted variants)
+SUFFIX_ALTERNATE = {
+    "Aly": "Aly",
+    "Anex": "Annex",
+    "Annex": "Annex",
+    "Annx": "Annex",
+    "Arc": "Arc",
+    "Av": "Ave",
+    "Aven": "Ave",
+    "Avenu": "Ave",
+    "Avenue": "Ave",
+    "Avn": "Ave",
+    "Avnue": "Ave",
+    "Bayoo": "Bayou",
+    "Bayou": "Bayou",
+    "Bch": "Beach",
+    "Beach": "Beach",
+    "Bend": "Bend",
+    "Bg": "Burg",
+    "Bgs": "Burgs",
+    "Blf": "Bluff",
+    "Blfs": "Bluffs",
+    "Bluf": "Bluff",
+    "Bluff": "Bluff",
+    "Bluffs": "Bluffs",
+    "Blvd": "Blvd",
+    "Bnd": "Bend",
+    "Bot": "Bottom",
+    "Bottm": "Bottom",
+    "Bottom": "Bottom",
+    "Boul": "Blvd",
+    "Boulv": "Blvd",
+    "Br": "Branch",
+    "Branch": "Branch",
+    "Brdge": "Brg",
+    "Brg": "Brg",
+    "Bridge": "Brg",
+    "Brk": "Brook",
+    "Brks": "Brooks",
+    "Brook": "Brook",
+    "Brooks": "Brooks",
+    "Burg": "Burg",
+    "Burgs": "Burgs",
+    "Byp": "Byp",
+    "Bypa": "Byp",
+    "Bypas": "Byp",
+    "Bypass": "Byp",
+    "Byps": "Byp",
+    "Byu": "Bayou",
+    "Camp": "Camp",
+    "Canyn": "Canyon",
+    "Canyon": "Canyon",
+    "Cape": "Cape",
+    "Causeway": "Cswy",
+    "Causwa": "Cswy",
+    "Cen": "Center",
+    "Cent": "Center",
+    "Center": "Center",
+    "Centers": "Centers",
+    "Centr": "Center",
+    "Centre": "Center",
+    "Cir": "Cir",
+    "Circ": "Cir",
+    "Circl": "Cir",
+    "Circle": "Cir",
+    "Circles": "Circles",
+    "Cirs": "Circles",
+    "Ck": "Creek",
+    "Clf": "Cliff",
+    "Clfs": "Cliffs",
+    "Cliff": "Cliff",
+    "Cliffs": "Cliffs",
+    "Clb": "Club",
+    "Club": "Club",
+    "Cmn": "Cmn",
+    "Cmns": "Cmns",
+    "Cmp": "Camp",
+    "Cnter": "Center",
+    "Cntr": "Center",
+    "Cnyn": "Canyon",
+    "Common": "Cmn",
+    "Commons": "Cmns",
+    "Cor": "Corner",
+    "Corner": "Corner",
+    "Corners": "Cors",
+    "Cors": "Cors",
+    "Course": "Course",
+    "Court": "Ct",
+    "Courts": "Cts",
+    "Cove": "Cove",
+    "Coves": "Coves",
+    "Cp": "Camp",
+    "Cpe": "Cape",
+    "Cr": "Creek",
+    "Crcl": "Cir",
+    "Crcle": "Cir",
+    "Crecent": "Cres",
+    "Creek": "Creek",
+    "Cres": "Cres",
+    "Crescent": "Cres",
+    "Crest": "Crst",
+    "Crk": "Creek",
+    "Crossing": "Xing",
+    "Crossroad": "Xrd",
+    "Crossroads": "Xrds",
+    "Crse": "Course",
+    "Crsent": "Cres",
+    "Crsnt": "Cres",
+    "Crssng": "Xing",
+    "Crst": "Crst",
+    "Crt": "Ct",
+    "Cswy": "Cswy",
+    "Ct": "Ct",
+    "Ctr": "Center",
+    "Ctrs": "Centers",
+    "Cts": "Cts",
+    "Curv": "Curve",
+    "Curve": "Curve",
+    "Cv": "Cove",
+    "Cvs": "Coves",
+    "Cyn": "Canyon",
+    "Dale": "Dale",
+    "Dam": "Dam",
+    "Div": "Divide",
+    "Divide": "Divide",
+    "Dl": "Dale",
+    "Dm": "Dam",
+    "Dr": "Dr",
+    "Driv": "Dr",
+    "Drive": "Dr",
+    "Drives": "Drives",
+    "Drs": "Drives",
+    "Drv": "Dr",
+    "Dv": "Divide",
+    "Dvd": "Divide",
+    "Est": "Estate",
+    "Estate": "Estate",
+    "Estates": "Ests",
+    "Ests": "Ests",
+    "Exp": "Expy",
+    "Expr": "Expy",
+    "Express": "Expy",
+    "Expressway": "Expy",
+    "Expw": "Expy",
+    "Expy": "Expy",
+    "Ext": "Extension",
+    "Extension": "Extension",
+    "Extensions": "Extensions",
+    "Extn": "Extension",
+    "Extnsn": "Extension",
+    "Exts": "Extensions",
+    "Fall": "Fall",
+    "Falls": "Falls",
+    "Ferry": "Ferry",
+    "Field": "Field",
+    "Fields": "Fields",
+    "Flat": "Flat",
+    "Flats": "Flats",
+    "Fld": "Field",
+    "Flds": "Fields",
+    "Fls": "Falls",
+    "Flt": "Flat",
+    "Flts": "Flats",
+    "Ford": "Ford",
+    "Fords": "Fords",
+    "Forest": "Forest",
+    "Forests": "Forests",
+    "Forg": "Forge",
+    "Forge": "Forge",
+    "Forges": "Forges",
+    "Fork": "Frk",
+    "Forks": "Forks",
+    "Fort": "Fort",
+    "Frd": "Ford",
+    "Frds": "Fords",
+    "Freeway": "Fwy",
+    "Freewy": "Fwy",
+    "Frg": "Forge",
+    "Frgs": "Forges",
+    "Frk": "Frk",
+    "Frks": "Forks",
+    "Frry": "Ferry",
+    "Frst": "Forest",
+    "Frt": "Fort",
+    "Frway": "Fwy",
+    "Frwy": "Fwy",
+    "Fry": "Ferry",
+    "Ft": "Fort",
+    "Fwy": "Fwy",
+    "Garden": "Garden",
+    "Gardens": "Gardens",
+    "Gardn": "Garden",
+    "Gateway": "Gateway",
+    "Gatewy": "Gateway",
+    "Gatway": "Gateway",
+    "Gdn": "Garden",
+    "Gdns": "Gardens",
+    "Glen": "Glen",
+    "Glens": "Glens",
+    "Gln": "Glen",
+    "Glns": "Glens",
+    "Grden": "Garden",
+    "Grdn": "Garden",
+    "Grdns": "Gardens",
+    "Green": "Green",
+    "Greens": "Greens",
+    "Grn": "Green",
+    "Grns": "Greens",
+    "Grov": "Grove",
+    "Grove": "Grove",
+    "Groves": "Groves",
+    "Grv": "Grove",
+    "Grvs": "Groves",
+    "Gtway": "Gateway",
+    "Gtwy": "Gateway",
+    "Harb": "Harbor",
+    "Harbor": "Harbor",
+    "Harbors": "Harbors",
+    "Harbr": "Harbor",
+    "Haven": "Haven",
+    "Havn": "Haven",
+    "Hbr": "Harbor",
+    "Hbrs": "Harbors",
+    "Heights": "Hts",
+    "Highway": "Hwy",
+    "Highwy": "Hwy",
+    "Hill": "Hill",
+    "Hills": "Hills",
+    "Hiway": "Hwy",
+    "Hiwy": "Hwy",
+    "Hl": "Hill",
+    "Hllw": "Hollow",
+    "Hls": "Hills",
+    "Hollow": "Hollow",
+    "Hollows": "Hollows",
+    "Holw": "Hollow",
+    "Holws": "Hollows",
+    "Hrbor": "Harbor",
+    "Ht": "Hts",
+    "Hts": "Hts",
+    "Hvn": "Haven",
+    "Hway": "Hwy",
+    "Hwy": "Hwy",
+    "Inlet": "Inlet",
+    "Inlt": "Inlet",
+    "Is": "Island",
+    "Island": "Island",
+    "Islands": "Islands",
+    "Isle": "Isle",
+    "Isles": "Isles",
+    "Islnd": "Island",
+    "Islnds": "Islands",
+    "Iss": "Islands",
+    "Jct": "Junction",
+    "Jction": "Junction",
+    "Jctn": "Junction",
+    "Jctns": "Junctions",
+    "Jcts": "Junctions",
+    "Junction": "Junction",
+    "Junctions": "Junctions",
+    "Junctn": "Junction",
+    "Juncton": "Junction",
+    "Key": "Key",
+    "Keys": "Keys",
+    "Knl": "Knoll",
+    "Knls": "Knolls",
+    "Knol": "Knoll",
+    "Knoll": "Knoll",
+    "Knolls": "Knolls",
+    "Ky": "Key",
+    "Kys": "Keys",
+    "Lake": "Lake",
+    "Lakes": "Lakes",
+    "Land": "Land",
+    "Landing": "Lndg",
+    "Lane": "Ln",
+    "Lanes": "Ln",
+    "Lck": "Lock",
+    "Lcks": "Locks",
+    "Ldg": "Lodge",
+    "Ldge": "Lodge",
+    "Lf": "Loaf",
+    "Lgt": "Light",
+    "Lgts": "Lights",
+    "Light": "Light",
+    "Lights": "Lights",
+    "Lk": "Lake",
+    "Lks": "Lakes",
+    "Ln": "Ln",
+    "Lndg": "Lndg",
+    "Lndng": "Lndg",
+    "Loaf": "Loaf",
+    "Lock": "Lock",
+    "Locks": "Locks",
+    "Lodg": "Lodge",
+    "Lodge": "Lodge",
+    "Loop": "Loop",
+    "Loops": "Loop",
+    "Mall": "Mall",
+    "Manor": "Manor",
+    "Manors": "Manors",
+    "Mdw": "Meadow",
+    "Mdws": "Meadows",
+    "Meadow": "Meadow",
+    "Meadows": "Meadows",
+    "Medows": "Meadows",
+    "Mews": "Mews",
+    "Mill": "Mill",
+    "Mills": "Mills",
+    "Mission": "Mission",
+    "Missn": "Mission",
+    "Ml": "Mill",
+    "Mls": "Mills",
+    "Mnr": "Manor",
+    "Mnrs": "Manors",
+    "Mnt": "Mount",
+    "Mntain": "Mountain",
+    "Mntn": "Mountain",
+    "Mntns": "Mountains",
+    "Motorway": "Mtwy",
+    "Mount": "Mount",
+    "Mountain": "Mountain",
+    "Mountains": "Mountains",
+    "Mountin": "Mountain",
+    "Msn": "Mission",
+    "Mssn": "Mission",
+    "Mt": "Mount",
+    "Mtin": "Mountain",
+    "Mtn": "Mountain",
+    "Mtns": "Mountains",
+    "Mtwy": "Mtwy",
+    "Nck": "Neck",
+    "Neck": "Neck",
+    "Opas": "Overpass",
+    "Orch": "Orchard",
+    "Orchard": "Orchard",
+    "Orchrd": "Orchard",
+    "Oval": "Oval",
+    "Overpass": "Overpass",
+    "Ovl": "Oval",
+    "Park": "Park",
+    "Parks": "Parks",
+    "Parkway": "Pkwy",
+    "Parkways": "Parkways",
+    "Parkwy": "Pkwy",
+    "Pass": "Pass",
+    "Passage": "Psge",
+    "Path": "Path",
+    "Paths": "Path",
+    "Pike": "Pike",
+    "Pikes": "Pike",
+    "Pine": "Pine",
+    "Pines": "Pines",
+    "Pk": "Park",
+    "Pkway": "Pkwy",
+    "Pkwy": "Pkwy",
+    "Pkwys": "Parkways",
+    "Pky": "Pkwy",
+    "Pl": "Pl",
+    "Place": "Pl",
+    "Plain": "Plain",
+    "Plains": "Plains",
+    "Plaza": "Plz",
+    "Pln": "Plain",
+    "Plns": "Plains",
+    "Plz": "Plz",
+    "Plza": "Plz",
+    "Pne": "Pine",
+    "Pnes": "Pines",
+    "Point": "Pt",
+    "Points": "Points",
+    "Port": "Prt",
+    "Ports": "Ports",
+    "Pr": "Prairie",
+    "Prairie": "Prairie",
+    "Prk": "Park",
+    "Prr": "Prairie",
+    "Prt": "Prt",
+    "Prts": "Ports",
+    "Psge": "Psge",
+    "Pt": "Pt",
+    "Pts": "Points",
+    "Rad": "Radial",
+    "Radial": "Radial",
+    "Radiel": "Radial",
+    "Radl": "Radial",
+    "Ramp": "Ramp",
+    "Ranch": "Rnch",
+    "Ranches": "Ranches",
+    "Rapid": "Rapid",
+    "Rapids": "Rapids",
+    "Rd": "Rd",
+    "Rdg": "Rdg",
+    "Rdge": "Rdg",
+    "Rdgs": "Ridges",
+    "Rds": "Roads",
+    "Rest": "Rst",
+    "Ridge": "Rdg",
+    "Ridges": "Ridges",
+    "Rise": "Rise",
+    "Riv": "River",
+    "River": "River",
+    "Rivr": "River",
+    "Rnch": "Rnch",
+    "Rnchs": "Ranches",
+    "Road": "Rd",
+    "Roads": "Roads",
+    "Route": "Rte",
+    "Row": "Row",
+    "Rpd": "Rapid",
+    "Rpds": "Rapids",
+    "Rst": "Rst",
+    "Rte": "Rte",
+    "Rue": "Rue",
+    "Run": "Run",
+    "Rvr": "River",
+    "Shl": "Shoal",
+    "Shls": "Shoals",
+    "Shoal": "Shoal",
+    "Shoals": "Shoals",
+    "Shoar": "Shore",
+    "Shoars": "Shores",
+    "Shore": "Shore",
+    "Shores": "Shores",
+    "Shr": "Shore",
+    "Shrs": "Shores",
+    "Skwy": "Skwy",
+    "Skyway": "Skwy",
+    "Smt": "Summit",
+    "Spg": "Spg",
+    "Spgs": "Springs",
+    "Spng": "Spg",
+    "Spngs": "Springs",
+    "Spring": "Spg",
+    "Springs": "Springs",
+    "Sprng": "Spg",
+    "Sprngs": "Springs",
+    "Spur": "Spur",
+    "Spurs": "Spur",
+    "Sq": "Sq",
+    "Sqr": "Sq",
+    "Sqre": "Sq",
+    "Sqrs": "Squares",
+    "Sqs": "Squares",
+    "Squ": "Sq",
+    "Square": "Sq",
+    "Squares": "Squares",
+    "St": "St",
+    "Sta": "Sta",
+    "Station": "Sta",
+    "Statn": "Sta",
+    "Stn": "Sta",
+    "Str": "St",
+    "Stra": "Stra",
+    "Strav": "Stra",
+    "Strave": "Stra",
+    "Straven": "Stra",
+    "Stravenue": "Stra",
+    "Stravn": "Stra",
+    "Stream": "Stream",
+    "Street": "St",
+    "Streets": "Streets",
+    "Streme": "Stream",
+    "Strm": "Stream",
+    "Strt": "St",
+    "Strvn": "Stra",
+    "Strvnue": "Stra",
+    "Sts": "Streets",
+    "Sumit": "Summit",
+    "Sumitt": "Summit",
+    "Summit": "Summit",
+    "Smt": "Summit",
+    "Ter": "Ter",
+    "Terr": "Ter",
+    "Terrace": "Ter",
+    "Throughway": "Trwy",
+    "Tpke": "Tpke",
+    "Tr": "Trl",
+    "Trace": "Trce",
+    "Traces": "Trce",
+    "Track": "Trak",
+    "Tracks": "Trak",
+    "Trafficway": "Trfy",
+    "Trail": "Trl",
+    "Trailer": "Trlr",
+    "Trails": "Trl",
+    "Trak": "Trak",
+    "Trce": "Trce",
+    "Trfy": "Trfy",
+    "Trk": "Trak",
+    "Trks": "Trak",
+    "Trl": "Trl",
+    "Trlr": "Trlr",
+    "Trlrs": "Trlrs",
+    "Trls": "Trl",
+    "Trnpk": "Tpke",
+    "Trwy": "Trwy",
+    "Tunel": "Tunl",
+    "Tunl": "Tunl",
+    "Tunls": "Tunl",
+    "Tunnel": "Tunl",
+    "Tunnels": "Tunl",
+    "Tunnl": "Tunl",
+    "Turnpike": "Tpke",
+    "Turnpk": "Tpke",
+    "Un": "Union",
+    "Underpass": "Unp",
+    "Union": "Union",
+    "Unions": "Unions",
+    "Unp": "Unp",
+    "Uns": "Unions",
+    "Upas": "Unp",
+    "Valley": "Vly",
+    "Valleys": "Valleys",
+    "Vally": "Vly",
+    "Vdct": "Via",
+    "Via": "Via",
+    "Viadct": "Via",
+    "Viaduct": "Via",
+    "View": "Vw",
+    "Views": "Views",
+    "Vill": "Vlg",
+    "Villag": "Vlg",
+    "Village": "Vlg",
+    "Villages": "Villages",
+    "Ville": "Ville",
+    "Villg": "Vlg",
+    "Villiage": "Vlg",
+    "Vist": "Vis",
+    "Vista": "Vis",
+    "Vl": "Ville",
+    "Vlg": "Vlg",
+    "Vlgs": "Villages",
+    "Vlly": "Vly",
+    "Vly": "Vly",
+    "Vlys": "Valleys",
+    "Vst": "Vis",
+    "Vsta": "Vis",
+    "Vw": "Vw",
+    "Vws": "Views",
+    "Walk": "Walk",
+    "Walks": "Walks",
+    "Wall": "Wall",
+    "Way": "Way",
+    "Ways": "Ways",
+    "Well": "Well",
+    "Wells": "Wls",
+    "Wl": "Well",
+    "Wls": "Wls",
+    "Wy": "Way",
+    "Xing": "Xing",
+    "Xrd": "Xrd",
+    "Xrds": "Xrds",
+}
+
+
+# Merged suffix types (canonical + alternates)
+SUFFIX_TYPE = TwoWayMap({**SUFFIX_CANONICAL, **SUFFIX_ALTERNATE})
+
+
+# US States and territories
+STATE = TwoWayMap({
+    "Alabama": "AL",
+    "Alaska": "AK",
+    "American Samoa": "AS",
+    "Arizona": "AZ",
+    "Arkansas": "AR",
+    "California": "CA",
+    "Colorado": "CO",
+    "Connecticut": "CT",
+    "Delaware": "DE",
+    "District of Columbia": "DC",
+    "Federated States of Micronesia": "FM",
+    "Florida": "FL",
+    "Georgia": "GA",
+    "Guam": "GU",
+    "Hawaii": "HI",
+    "Idaho": "ID",
+    "Illinois": "IL",
+    "Indiana": "IN",
+    "Iowa": "IA",
+    "Kansas": "KS",
+    "Kentucky": "KY",
+    "Louisiana": "LA",
+    "Maine": "ME",
+    "Marshall Islands": "MH",
+    "Maryland": "MD",
+    "Massachusetts": "MA",
+    "Michigan": "MI",
+    "Minnesota": "MN",
+    "Mississippi": "MS",
+    "Missouri": "MO",
+    "Montana": "MT",
+    "Nebraska": "NE",
+    "Nevada": "NV",
+    "New Hampshire": "NH",
+    "New Jersey": "NJ",
+    "New Mexico": "NM",
+    "New York": "NY",
+    "North Carolina": "NC",
+    "North Dakota": "ND",
+    "Northern Mariana Islands": "MP",
+    "Ohio": "OH",
+    "Oklahoma": "OK",
+    "Oregon": "OR",
+    "Palau": "PW",
+    "Pennsylvania": "PA",
+    "Puerto Rico": "PR",
+    "Rhode Island": "RI",
+    "South Carolina": "SC",
+    "South Dakota": "SD",
+    "Tennessee": "TN",
+    "Texas": "TX",
+    "Utah": "UT",
+    "Vermont": "VT",
+    "Virgin Islands": "VI",
+    "Virginia": "VA",
+    "Washington": "WA",
+    "West Virginia": "WV",
+    "Wisconsin": "WI",
+    "Wyoming": "WY"
+})
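Review note on `TwoWayMap`: the class combines a case-insensitive lookup table with a word-boundary regex over all names and abbreviations. The sketch below illustrates those two ideas with plain functions. `build_lookup` and `build_pattern` are hypothetical names, not part of this change, and the longest-first ordering of regex alternatives is an assumption added here for robustness rather than something the patch does.

```python
import re
from typing import Dict


def build_lookup(mapping: Dict[str, str]) -> Dict[str, str]:
    """Map lowercase names and lowercase abbreviations to the abbreviation."""
    lookup: Dict[str, str] = {}
    for name, abbr in mapping.items():
        lookup[name.lower()] = abbr
        lookup[abbr.lower()] = abbr
    return lookup


def build_pattern(mapping: Dict[str, str]) -> "re.Pattern[str]":
    """Compile a case-insensitive, whole-word regex over all names and abbreviations."""
    # Longest terms first, so multi-word names are tried before their shorter parts
    terms = sorted(set(mapping) | set(mapping.values()), key=len, reverse=True)
    body = "|".join(re.escape(term) for term in terms)
    return re.compile(r"\b(" + body + r")\b", re.IGNORECASE)


if __name__ == "__main__":
    states = {"Virginia": "VA", "West Virginia": "WV"}
    lookup = build_lookup(states)
    pattern = build_pattern(states)
    for found in pattern.findall("Charleston, West Virginia 25301"):
        print(found, "->", lookup[found.lower()])
```

The word boundaries keep a term such as "Virginia" from matching inside a longer word like "Virginian".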
diff --git a/geocoder_us/database.py b/geocoder_us/database.py
new file mode 100644
index 0000000..10ec8c9
--- /dev/null
+++ b/geocoder_us/database.py
@@ -0,0 +1,249 @@
+"""
+Database interface for geocoding with DuckDB.
+
+This module provides the database layer for querying street address data
+using DuckDB with spatial extensions.
+"""
+
+import duckdb
+from typing import List, Dict, Optional, Any
+from pathlib import Path
+import threading
+
+
+class GeocoderDatabase:
+    """
+    Interface to DuckDB geocoding database with spatial support.
+    
+    This class manages connections to the geocoder database and provides
+    methods for querying street range data, places, and features.
+    """
+    
+    # Scoring weights for address matching
+    STREET_WEIGHT = 3.0
+    NUMBER_WEIGHT = 2.0
+    PARITY_WEIGHT = 1.25
+    CITY_WEIGHT = 1.0
+    
+    def __init__(self, db_path: str, threadsafe: bool = True):
+        """
+        Initialize database connection.
+        
+        Args:
+            db_path: Path to DuckDB database file
+            threadsafe: Whether to use thread-safe access
+        """
+        self.db_path = db_path
+        self.threadsafe = threadsafe
+        self._lock = threading.Lock() if threadsafe else None
+        self._conn: Optional[duckdb.DuckDBPyConnection] = None
+        
+        # Initialize connection
+        self._connect()
+    
+    def _connect(self) -> None:
+        """
+        Establish connection to database and load extensions.
+        """
+        if not Path(self.db_path).exists():
+            raise FileNotFoundError(f"Database not found: {self.db_path}")
+        
+        # Create connection
+        self._conn = duckdb.connect(self.db_path, read_only=True)
+        
+        # Load spatial extension
+        try:
+            self._conn.execute("INSTALL spatial;")
+            self._conn.execute("LOAD spatial;")
+        except Exception as e:
+            print(f"Warning: Could not load spatial extension: {e}")
+        
+        # Best-effort load of fuzzystrsim; levenshtein() itself is a DuckDB built-in
+        try:
+            self._conn.execute("INSTALL fuzzystrsim;")
+            self._conn.execute("LOAD fuzzystrsim;")
+        except Exception as e:
+            print(f"Warning: Could not load fuzzystrsim extension: {e}")
+    
+    def _execute(self, query: str, params: Optional[tuple] = None) -> List[Dict[str, Any]]:
+        """
+        Execute query with optional parameters.
+        
+        Args:
+            query: SQL query string
+            params: Query parameters
+            
+        Returns:
+            List of result rows as dictionaries
+        """
+        if self.threadsafe and self._lock:
+            with self._lock:
+                return self._execute_query(query, params)
+        else:
+            return self._execute_query(query, params)
+    
+    def _execute_query(self, query: str, params: Optional[tuple]) -> List[Dict[str, Any]]:
+        """
+        Internal query execution.
+        
+        Args:
+            query: SQL query
+            params: Parameters
+            
+        Returns:
+            Query results
+        """
+        if not self._conn:
+            self._connect()
+        
+        # Execute query
+        if params:
+            result = self._conn.execute(query, params)
+        else:
+            result = self._conn.execute(query)
+        
+        # Fetch all results
+        rows = result.fetchall()
+        columns = [desc[0] for desc in result.description] if result.description else []
+        
+        # Convert to list of dictionaries
+        return [dict(zip(columns, row)) for row in rows]
+    
+    def places_by_zip(self, city: str, zip_code: str) -> List[Dict[str, Any]]:
+        """
+        Query places by ZIP code.
+        
+        Args:
+            city: City name
+            zip_code: 5-digit ZIP code
+            
+        Returns:
+            List of matching places with Levenshtein distance scores
+        """
+        # Provisional query; revisit once the database schema is finalized
+        query = """
+            SELECT *, levenshtein(?, city) AS city_score
+            FROM place
+            WHERE zip = ?
+            ORDER BY priority DESC
+        """
+        return self._execute(query, (city, zip_code))
+    
+    def places_by_city(self, city: str, city_tokens: List[str], state: Optional[str] = None) -> List[Dict[str, Any]]:
+        """
+        Query places by city name with metaphone matching.
+        
+        Args:
+            city: City name
+            city_tokens: City name tokens for metaphone matching
+            state: Optional state filter
+            
+        Returns:
+            List of matching places
+        """
+        # TODO: Implement with metaphone matching once schema is ready
+        # This will use DuckDB's ability to create custom functions or
+        # use the metaphone results from Python
+        raise NotImplementedError("places_by_city is not yet implemented")
+    
+    def features_by_street(self, street: str, street_tokens: List[str]) -> List[Dict[str, Any]]:
+        """
+        Query features (street segments) by street name.
+        
+        Args:
+            street: Street name
+            street_tokens: Street name tokens for matching
+            
+        Returns:
+            List of matching features with Levenshtein scores
+        """
+        # TODO: Implement once database schema is finalized
+        # This will query the feature table with metaphone-based matching
+        raise NotImplementedError("features_by_street is not yet implemented")
+    
+    def features_by_street_and_zip(self, street: str, street_tokens: List[str], 
+                                   zip_codes: List[str]) -> List[Dict[str, Any]]:
+        """
+        Query features by street name and ZIP codes.
+        
+        Args:
+            street: Street name
+            street_tokens: Street tokens
+            zip_codes: List of ZIP codes to filter
+            
+        Returns:
+            Matching features
+        """
+        # TODO: Implement with ZIP filter
+        raise NotImplementedError("features_by_street_and_zip is not yet implemented")
+    
+    def ranges_by_feature(self, feature_ids: List[int], number: str, 
+                         prenum: Optional[str] = None) -> List[Dict[str, Any]]:
+        """
+        Query address ranges for given features.
+        
+        Args:
+            feature_ids: Feature IDs to query
+            number: Street number
+            prenum: Optional prefix number
+            
+        Returns:
+            Matching ranges sorted by address number proximity
+        """
+        # TODO: Implement range queries
+        raise NotImplementedError("ranges_by_feature is not yet implemented")
+    
+    def geocode_address(self, address: str) -> List[Dict[str, Any]]:
+        """
+        Main geocoding method (placeholder).
+        
+        This will be the primary interface for geocoding an address string.
+        
+        Args:
+            address: Address string to geocode
+            
+        Returns:
+            List of geocoding results with scores
+        """
+        # TODO: Implement full geocoding pipeline:
+        # 1. Parse address
+        # 2. Query places by ZIP/city
+        # 3. Query features by street
+        # 4. Query ranges for address number
+        # 5. Calculate coordinates
+        # 6. Rank results by score
+        
+        raise NotImplementedError("Full geocoding pipeline not yet implemented")
+    
+    def close(self) -> None:
+        """Close database connection."""
+        if self._conn:
+            self._conn.close()
+            self._conn = None
+    
+    def __enter__(self):
+        """Context manager entry."""
+        return self
+    
+    def __exit__(self, exc_type, exc_val, exc_tb):
+        """Context manager exit."""
+        self.close()
+    
+    def __del__(self):
+        """Cleanup on deletion."""
+        self.close()
+
+
+# Convenience function for creating database instance
+def connect_database(db_path: str = "/opt/geocoder.db", threadsafe: bool = True) -> GeocoderDatabase:
+    """
+    Create a database connection.
+    
+    Args:
+        db_path: Path to DuckDB database
+        threadsafe: Enable thread-safe access
+        
+    Returns:
+        GeocoderDatabase instance
+    """
+    return GeocoderDatabase(db_path, threadsafe)
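Review note on `places_by_zip` and `_execute_query`: the sketch below runs the same query text and the same `description`-to-dict conversion against an in-memory database, so the shape of the returned rows can be checked without `geocoder.db`. It swaps in the standard-library `sqlite3` module for DuckDB and registers a pure-Python `levenshtein` function. The three-column `place` table and its sample rows are assumptions made for illustration, not the final schema.

```python
import sqlite3
from typing import Any, Dict, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def rows_as_dicts(cursor: sqlite3.Cursor) -> List[Dict[str, Any]]:
    """Pair the column names in cursor.description with each fetched row."""
    columns = [desc[0] for desc in cursor.description] if cursor.description else []
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
# Stand-in for the SQL-level levenshtein() that the real query relies on.
conn.create_function("levenshtein", 2, levenshtein)
# Hypothetical minimal schema: only the columns the query references.
conn.execute("CREATE TABLE place (zip TEXT, city TEXT, priority INTEGER)")
conn.executemany(
    "INSERT INTO place VALUES (?, ?, ?)",
    [("45229", "Cincinnati", 1), ("45229", "Avondale", 0), ("62701", "Springfield", 1)],
)

query = """
    SELECT *, levenshtein(?, city) AS city_score
    FROM place
    WHERE zip = ?
    ORDER BY priority DESC
"""
# A misspelled city still scores close to the stored spelling.
places = rows_as_dicts(conn.execute(query, ("Cincinatti", "45229")))
conn.close()
```

Note that the query orders by `priority` only; callers that want the closest city name first would still need to sort on `city_score` themselves.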
diff --git a/geocoder_us/metaphone.py b/geocoder_us/metaphone.py
new file mode 100644
index 0000000..bb2c711
--- /dev/null
+++ b/geocoder_us/metaphone.py
@@ -0,0 +1,195 @@
+"""
+Metaphone phonetic matching for address geocoding.
+
+This module provides phonetic matching functionality using the Metaphone algorithm,
+which helps match street names despite spelling variations.
+"""
+
+from typing import Optional
+import re
+
+
+# Simple Metaphone implementation based on the Ruby version
+# This is a simplified version - can be replaced with python-metaphone library
+class Metaphone:
+    """
+    Metaphone phonetic algorithm for fuzzy string matching.
+    
+    This implementation follows the standard Metaphone rules for converting
+    words into phonetic codes that sound similar.
+    """
+    
+    # Metaphone rules (pattern, replacement), applied in order to lowercased input.
+    RULES = [
+        # Collapse doubled consonants (this class includes 'c' and 'g')
+        (re.compile(r'([bcdfghjklmnpqrstvwxyz])\1+'), r'\1'),
+        
+        # Initial patterns (replacements are uppercase so later rules never re-match them)
+        (re.compile(r'^ae'), 'E'),
+        (re.compile(r'^[gkp]n'), 'N'),
+        (re.compile(r'^wr'), 'R'),
+        (re.compile(r'^x'), 'S'),
+        (re.compile(r'^wh'), 'W'),
+        
+        # Terminal patterns
+        (re.compile(r'mb$'), 'M'),
+        
+        # Middle patterns
+        (re.compile(r'(?!^)sch'), 'SK'),
+        (re.compile(r'th'), '0'),
+        (re.compile(r't?ch|sh'), 'X'),
+        (re.compile(r'c(?=ia)'), 'X'),
+        (re.compile(r'[st](?=i[ao])'), 'X'),
+        (re.compile(r's?c(?=[iey])'), 'S'),
+        (re.compile(r'[cq]'), 'K'),
+        (re.compile(r'dg(?=[iey])'), 'J'),
+        (re.compile(r'd'), 'T'),
+        (re.compile(r'g(?=h[^aeiou])'), ''),
+        (re.compile(r'gn(ed)?'), 'N'),
+        (re.compile(r'([^g]|^)g(?=[iey])'), r'\1J'),
+        (re.compile(r'g+'), 'K'),
+        (re.compile(r'ph'), 'F'),
+        (re.compile(r'([aeiou])h(?=\b|[^aeiou])'), r'\1'),
+        (re.compile(r'[wy](?![aeiou])'), ''),
+        (re.compile(r'z'), 'S'),
+        (re.compile(r'v'), 'F'),
+        (re.compile(r'(?!^)[aeiou]+'), ''),
+    ]
+    
+    @classmethod
+    def encode(cls, text: str, max_length: int = 0) -> str:
+        """
+        Convert text to Metaphone phonetic code.
+        
+        Args:
+            text: Input text to encode
+            max_length: Maximum length of output (0 = unlimited)
+            
+        Returns:
+            Metaphone code
+        """
+        if not text:
+            return ""
+        
+        # Normalize: lowercase and remove non-alphabetic characters
+        text = re.sub(r'[^a-z]', '', text.lower())
+        
+        if not text:
+            return ""
+        
+        # Apply Metaphone rules
+        for pattern, replacement in cls.RULES:
+            text = pattern.sub(replacement, text)
+        
+        # Uppercase result
+        result = text.upper()
+        
+        # Limit length if requested
+        if max_length > 0:
+            result = result[:max_length]
+        
+        return result
+    
+    @classmethod
+    def encode_multiple(cls, text: str, max_length: int = 0) -> str:
+        """
+        Encode multiple words separated by spaces.
+        
+        Args:
+            text: Space-separated words
+            max_length: Maximum length per word
+            
+        Returns:
+            Space-separated Metaphone codes
+        """
+        if not text:
+            return ""
+        
+        words = text.strip().split()
+        codes = [cls.encode(word, max_length) for word in words]
+        return ' '.join(code for code in codes if code)
+
+
+def metaphone(text: str, max_length: int = 5) -> str:
+    """
+    Convenience function for metaphone encoding.
+    
+    Args:
+        text: Text to encode
+        max_length: Maximum length of code (default: 5)
+        
+    Returns:
+        Metaphone phonetic code
+    """
+    return Metaphone.encode(text, max_length)
+
+
+def metaphone_match(text1: str, text2: str, max_length: int = 5) -> bool:
+    """
+    Check if two texts match phonetically.
+    
+    Args:
+        text1: First text
+        text2: Second text
+        max_length: Maximum code length for comparison
+        
+    Returns:
+        True if metaphone codes match
+    """
+    code1 = metaphone(text1, max_length)
+    code2 = metaphone(text2, max_length)
+    return code1 == code2 and len(code1) > 0
+
+
+def metaphone_similarity(text1: str, text2: str, max_length: int = 5) -> float:
+    """
+    Calculate phonetic similarity between two texts.
+    
+    Args:
+        text1: First text
+        text2: Second text
+        max_length: Maximum code length
+        
+    Returns:
+        Similarity score between 0.0 and 1.0
+    """
+    code1 = metaphone(text1, max_length)
+    code2 = metaphone(text2, max_length)
+    
+    if not code1 or not code2:
+        return 0.0
+    
+    if code1 == code2:
+        return 1.0
+    
+    # Calculate character-level similarity
+    matches = sum(c1 == c2 for c1, c2 in zip(code1, code2))
+    max_len = max(len(code1), len(code2))
+    
+    return matches / max_len if max_len > 0 else 0.0
+
+
+# For compatibility with external metaphone libraries
+try:
+    from metaphone import doublemetaphone
+    
+    def metaphone_double(text: str) -> tuple:
+        """
+        Use Double Metaphone if available (more accurate).
+        
+        Args:
+            text: Text to encode
+            
+        Returns:
+            Tuple of (primary code, secondary code)
+        """
+        return doublemetaphone(text)
+    
+    HAS_DOUBLE_METAPHONE = True
+except ImportError:
+    HAS_DOUBLE_METAPHONE = False
+    
+    def metaphone_double(text: str) -> tuple:
+        """Fallback if doublemetaphone not available."""
+        code = metaphone(text)
+        return (code, code)
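Review note on `metaphone_similarity`: the score compares codes position by position, so it is sensitive to alignment. The sketch below isolates that arithmetic in a hypothetical helper, `positional_similarity`, which mirrors the formula in the diff. The code strings are hand-written stand-ins, not actual encoder output.

```python
def positional_similarity(code1: str, code2: str) -> float:
    """Share of aligned positions that agree, relative to the longer code."""
    if not code1 or not code2:
        return 0.0
    matches = sum(c1 == c2 for c1, c2 in zip(code1, code2))
    return matches / max(len(code1), len(code2))


# Codes that differ only in the last character: 3 of 4 positions agree.
same_prefix = positional_similarity("MNTR", "MNTK")

# One extra leading character shifts every later position out of alignment,
# so no aligned pair matches even though "SMN" is a suffix of "MSMN".
shifted = positional_similarity("SMN", "MSMN")

print(same_prefix, shifted)
```

If shifted codes should still count as close, an edit-distance-based score over the two codes would be one alternative to consider.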
diff --git a/geocoder_us/preprocessing.py b/geocoder_us/preprocessing.py
new file mode 100644
index 0000000..2a1d8a9
--- /dev/null
+++ b/geocoder_us/preprocessing.py
@@ -0,0 +1,99 @@
+"""
+Address preprocessing utilities.
+
+This module provides functions for cleaning and validating addresses,
+ported from the dht R package functionality.
+"""
+
+import re
+from typing import Optional
+
+
+def clean_address(address: str) -> str:
+    """
+    Clean an address string by normalizing whitespace and removing
+    special characters.
+    
+    Args:
+        address: Raw address string
+        
+    Returns:
+        Cleaned address string
+    """
+    if not address or not isinstance(address, str):
+        return ""
+    
+    # Strip leading/trailing whitespace
+    cleaned = address.strip()
+    
+    # Normalize internal whitespace
+    cleaned = re.sub(r'\s+', ' ', cleaned)
+    
+    # Remove special characters but keep basic punctuation
+    cleaned = re.sub(r'[^a-zA-Z0-9 ,.\-#&@/]', '', cleaned)
+    
+    return cleaned
+
+
+def address_is_po_box(address: str) -> bool:
+    """
+    Check if an address is a PO Box.
+    
+    Args:
+        address: Address string to check
+        
+    Returns:
+        True if address appears to be a PO Box
+    """
+    if not address:
+        return False
+    
+    # Matches PO Box, P.O. Box, P O Box, etc.; every prefix is optional, so a bare "Box" word matches too
+    po_box_pattern = r'\b[Pp]*(OST|ost)*\.?\s*[Oo0]*(ffice|FFICE)*\.?\s*[Bb][Oo0][Xx]\b'
+    return bool(re.search(po_box_pattern, address))
+
+
+def address_is_institutional(address: str) -> bool:
+    """
+    Check if an address is a known Cincinnati institutional address.
+    
+    This is specific to the Cincinnati area institutional/foster addresses
+    that should not be geocoded to protect privacy.
+    
+    Args:
+        address: Address string to check
+        
+    Returns:
+        True if address is flagged as institutional
+    """
+    if not address:
+        return False
+    
+    # Cincinnati Children's Hospital Medical Center
+    if "3333 BURNET" in address.upper():
+        return True
+    
+    # Add other institutional addresses as needed
+    return False
+
+
+def address_is_nonaddress(address: str) -> bool:
+    """
+    Check if the address field contains non-address text.
+    
+    Args:
+        address: Address string to check
+        
+    Returns:
+        True if field is blank or contains placeholder text
+    """
+    if not address or not address.strip():
+        return True
+    
+    # Check for common placeholder values
+    non_address_values = {
+        "foreign", "verify", "unknown", "na", "n/a", "none",
+        "not applicable", "missing"
+    }
+    
+    return address.lower().strip() in non_address_values
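Review note on `address_is_po_box`: because every part of the pattern before `Box` is optional, street names containing the word "Box" are flagged as well. The sketch below runs the pattern from the diff next to a stricter variant on a few sample strings. `STRICT` is a hypothetical alternative written for this note, not part of the PR or of the dht package, and the sample addresses are made up.

```python
import re

# Pattern copied from address_is_po_box in the diff.
PORTED = re.compile(
    r'\b[Pp]*(OST|ost)*\.?\s*[Oo0]*(ffice|FFICE)*\.?\s*[Bb][Oo0][Xx]\b'
)

# Hypothetical stricter variant (an assumption for illustration): require a
# "P"/"Post" and an "O"/"Office" prefix before "Box".
STRICT = re.compile(r'\bP(?:ost)?\.?\s*O(?:ffice)?\.?\s*Box\b', re.IGNORECASE)

samples = [
    "PO Box 123, Anytown, CA 90210",
    "P.O. Box 9",
    "Post Office Box 44",
    "12 Box Elder Ln, Dayton, OH",  # a street address, not a PO Box
]

ported_hits = [bool(PORTED.search(s)) for s in samples]
strict_hits = [bool(STRICT.search(s)) for s in samples]

for sample, ported, strict in zip(samples, ported_hits, strict_hits):
    print(f"{sample!r}: ported={ported} strict={strict}")
```

Whether the broader match is acceptable depends on how closely the port is meant to track the R behavior. The stricter form would also miss inputs such as "Box 12" that the ported pattern catches.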
diff --git a/requirements.txt b/requirements.txt
new file mode 100644
index 0000000..fa9cf73
--- /dev/null
+++ b/requirements.txt
@@ -0,0 +1,22 @@
+# Python Geocoder Requirements
+# For US address geocoding with DuckDB
+
+# Core dependencies
+pandas>=2.0.0
+duckdb>=0.9.0
+tabulate>=0.9.0
+
+# For address parsing and string matching
+python-Levenshtein>=0.21.0
+metaphone>=0.6
+
+# CLI and utilities
+click>=8.1.0  # Alternative to argparse if needed
+rich>=13.0.0  # For better console output
+
+# Parallel processing and caching
+joblib>=1.3.0
+
+# Testing (optional, for development)
+pytest>=7.4.0
+pytest-cov>=4.1.0
diff --git a/test_integration.py b/test_integration.py
new file mode 100644
index 0000000..b079b78
--- /dev/null
+++ b/test_integration.py
@@ -0,0 +1,88 @@
+#!/usr/bin/env python3
+"""
+Integration test for the geocoder pipeline.
+
+Tests the full workflow from CSV input to geocoded output.
+"""
+
+import sys
+import os
+import pandas as pd
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parent))
+
+# Create test CSV
+test_data = pd.DataFrame({
+    'id': [1, 2, 3, 4, 5],
+    'address': [
+        '123 Main St, Springfield, IL 62701',
+        '1600 Pennsylvania Ave, Washington, DC 20500',
+        'PO Box 123, Anytown, CA 90210',
+        '3333 BURNET AVE CINCINNATI, OH 45229',
+        'unknown'
+    ]
+})
+
+test_file = '/tmp/test_addresses.csv'
+test_data.to_csv(test_file, index=False)
+print(f"Created test file: {test_file}")
+print(f"Test data:\n{test_data}\n")
+
+# Test the entrypoint
+print("=" * 60)
+print("Testing geocoder entrypoint")
+print("=" * 60)
+
+# Import and run
+from entrypoint import (
+    read_input_file,
+    preprocess_addresses,
+    geocode_addresses,
+    write_output_file,
+    print_summary
+)
+
+try:
+    # Read input
+    df = read_input_file(test_file)
+    print(f"\n✓ Read {len(df)} addresses")
+    
+    # Preprocess
+    df = preprocess_addresses(df)
+    print("\n✓ Preprocessed addresses")
+    print("  Flagged addresses:")
+    print(f"    PO Box: {df['po_box'].sum()}")
+    print(f"    Institutional: {df['cincy_inst_foster_addr'].sum()}")
+    print(f"    Non-address: {df['non_address_text'].sum()}")
+    
+    # Geocode
+    df = geocode_addresses(df, score_threshold=0.5)
+    print("\n✓ Geocoded addresses")
+    
+    # Check results
+    print("\nResults preview:")
+    cols_to_show = ['address', 'matched_street', 'matched_city', 'matched_state', 'score', 'geocode_result']
+    print(df[cols_to_show].to_string())
+    
+    # Write output
+    output_file = write_output_file(df, test_file, 0.5)
+    print(f"\n✓ Wrote output file: {output_file}")
+    
+    # Print summary
+    print_summary(df)
+    
+    # Clean up
+    os.remove(test_file)
+    if os.path.exists(output_file):
+        os.remove(output_file)
+    
+    print("\n" + "=" * 60)
+    print("✓ Integration test passed!")
+    print("=" * 60)
+    
+except Exception as e:
+    print(f"\n✗ Integration test failed: {e}")
+    import traceback
+    traceback.print_exc()
+    sys.exit(1)
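Review note on the test fixture: the script writes to a fixed `/tmp` path and removes it only on the success path, so a failed run leaves the file behind. The sketch below shows one way to scope the fixture to a temporary directory instead. It uses the standard-library `csv` module rather than pandas to stay dependency-free, and the two sample rows are placeholders.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    {"id": 1, "address": "123 Main St, Springfield, IL 62701"},
    {"id": 2, "address": "PO Box 123, Anytown, CA 90210"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    test_file = Path(tmp_dir) / "test_addresses.csv"
    with test_file.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "address"])
        writer.writeheader()
        writer.writerows(rows)

    # The pipeline calls under test would run here, reading test_file.
    with test_file.open(newline="") as handle:
        loaded = list(csv.DictReader(handle))

# Leaving the with-block removes the directory whether or not the body raised.
cleaned_up = not test_file.exists()
print(len(loaded), cleaned_up)
```

Any output file the pipeline writes next to the input would be removed along with the directory, which makes the explicit `os.remove` calls unnecessary.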
diff --git a/test_modules.py b/test_modules.py
new file mode 100644
index 0000000..d04f6f5
--- /dev/null
+++ b/test_modules.py
@@ -0,0 +1,78 @@
+#!/usr/bin/env python3
+"""
+Quick test script for geocoder_us modules.
+
+Tests address parsing, metaphone encoding, and basic functionality.
+"""
+
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parent))
+
+from geocoder_us.address import Address
+from geocoder_us.metaphone import metaphone, metaphone_similarity
+from geocoder_us import constants
+
+print("=" * 60)
+print("Testing geocoder_us modules")
+print("=" * 60)
+
+# Test 1: Address Parsing
+print("\n1. Address Parsing Test")
+print("-" * 40)
+test_addresses = [
+    "1600 Pennsylvania Ave Washington DC 20500",
+    "3333 BURNET AVE CINCINNATI, OH 45229",
+    "123 Main St, Springfield, IL 62701",
+    "PO Box 123, Anytown, CA 90210"
+]
+
+for addr_str in test_addresses:
+    try:
+        addr = Address(addr_str)
+        print(f"\nInput:  {addr_str}")
+        print(f"Parsed: {addr}")
+        print(f"  Number: {addr.number}")
+        print(f"  Street: {addr.street[:2]}")
+        print(f"  City: {addr.city}")
+        print(f"  State: {addr.state}")
+        print(f"  ZIP: {addr.zip}")
+        print(f"  PO Box: {addr.is_po_box()}")
+    except Exception as e:
+        print(f"Error parsing '{addr_str}': {e}")
+
+# Test 2: Metaphone Encoding
+print("\n\n2. Metaphone Encoding Test")
+print("-" * 40)
+test_words = [
+    ("Main", "Maine"),
+    ("Street", "Streat"),
+    ("Avenue", "Avenu"),
+    ("Washington", "Washinton")
+]
+
+for word1, word2 in test_words:
+    code1 = metaphone(word1)
+    code2 = metaphone(word2)
+    similarity = metaphone_similarity(word1, word2)
+    print(f"{word1:15} -> {code1:10} | {word2:15} -> {code2:10} | Sim: {similarity:.2f}")
+
+# Test 3: Constants
+print("\n\n3. Constants Test")
+print("-" * 40)
+print(f"States loaded: {len(constants.STATE)} entries")
+print(f"Street suffixes: {len(constants.SUFFIX_TYPE)} entries")
+print(f"Sample state lookup: 'Ohio' -> '{constants.STATE.get('Ohio', 'NOT FOUND')}'")
+print(f"Sample state lookup: 'CA' -> '{constants.STATE.get('CA', 'NOT FOUND')}'")
+
+# Test 4: Street Parts Generation
+print("\n\n4. Street Parts Generation Test")
+print("-" * 40)
+addr = Address("123 North Main Street Springfield IL")
+parts = addr.street_parts()
+print(f"Address: {addr.original_text}")
+print(f"Street parts ({len(parts)}): {parts[:5]}")
+
+print("\n" + "=" * 60)
+print("All tests completed!")
+print("=" * 60)