Notion Processing AI Data Pipeline

An intelligent data pipeline that extracts, classifies, and summarizes documents from Notion databases using AI/LLM technology.

Features

📥 Document Extraction: Extract raw documents from Notion databases via API
🏷️ AI Classification: Use LLM to classify documents into project/knowledge categories with sub-categories
📊 Weekly Summaries: Generate comprehensive weekly reports of processed documents with mindset analysis
📈 Interactive Dashboard: Streamlit-based dashboard for visualizing weekly summaries and trends
🗄️ Supabase Storage: Store all data and processing results in Supabase (PostgreSQL)
🔄 Pipeline Orchestration: Modular pipeline design for flexible processing
📈 Statistics & Monitoring: Track processing status and statistics

Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Notion API    │───▶│   Extraction    │───▶│   Supabase      │
│   (Documents)   │    │   (Raw Data)    │    │   (Storage)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │ Classification  │    │   Statistics    │
                       │   (LLM/AI)      │    │   & Monitoring  │
                       └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │   Summarization │
                       │ (Weekly Reports)│
                       └─────────────────┘

Document Classification

Main Categories

PROJECT: Documents related to project work, tasks, features, bugs, planning
KNOWLEDGE: Documents containing knowledge, documentation, tutorials

Sub-Categories

Project Sub-Categories

feature_request: Requests for new features or functionality
bug_report: Reports of bugs or issues
planning: Project planning, roadmaps, timelines
research: Research findings, analysis, investigations

Knowledge Sub-Categories

tutorial: Step-by-step guides, how-to documents
reference: Reference materials, documentation, specifications
best_practice: Best practices, guidelines, standards
case_study: Case studies, examples, success stories
documentation: Technical documentation, API docs, etc.
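The two-level taxonomy above can be modeled as a pair of enums plus a lookup of which sub-categories belong to which main category. The sketch below is illustrative only: `DocumentType`, `SubCategory`, `ALLOWED`, and `is_valid_pair` are assumed names, not necessarily the identifiers used in `notion_processing/models.py`.

```python
from enum import Enum


class DocumentType(str, Enum):
    """Main categories (names are assumptions for illustration)."""
    PROJECT = "project"
    KNOWLEDGE = "knowledge"


class SubCategory(str, Enum):
    """Sub-categories as listed in this README."""
    FEATURE_REQUEST = "feature_request"
    BUG_REPORT = "bug_report"
    PLANNING = "planning"
    RESEARCH = "research"
    TUTORIAL = "tutorial"
    REFERENCE = "reference"
    BEST_PRACTICE = "best_practice"
    CASE_STUDY = "case_study"
    DOCUMENTATION = "documentation"


# Which sub-categories are valid under each main category.
ALLOWED = {
    DocumentType.PROJECT: {
        SubCategory.FEATURE_REQUEST,
        SubCategory.BUG_REPORT,
        SubCategory.PLANNING,
        SubCategory.RESEARCH,
    },
    DocumentType.KNOWLEDGE: {
        SubCategory.TUTORIAL,
        SubCategory.REFERENCE,
        SubCategory.BEST_PRACTICE,
        SubCategory.CASE_STUDY,
        SubCategory.DOCUMENTATION,
    },
}


def is_valid_pair(doc_type, sub_category):
    """Return True if sub_category belongs to doc_type."""
    return sub_category in ALLOWED[doc_type]
```

A check like this is useful for rejecting LLM output that pairs a main category with a sub-category from the other group.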

Mindset Analysis

Beyond document classification, the system performs mindset analysis: it looks across your documents for signals about your thinking patterns, interests, and current focus.

Features

🧠 Content Analysis: Analyzes document content to understand your interests and focus areas
📈 Pattern Recognition: Identifies recurring themes and thinking patterns across your documents
🎯 Mindset Indicators: Detects mindset characteristics like learning focus, project orientation, or research tendencies
📊 AI-Powered Insights: Uses LLM to generate human-like insights about your cognitive patterns

Mindset Analysis Methods

1. Content-Based Analysis

Analyzes the actual text content of your documents
Identifies themes, topics, and writing patterns
Provides insights into your current interests and focus areas

2. Pattern Recognition

Examines document types and categories for patterns
Identifies dominant thinking modes (learning, project management, research, etc.)
Tracks changes in focus over time
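As a rough illustration of the pattern-recognition idea, the sketch below counts the sub-categories seen in one week and reports the most common "thinking mode". The mapping `MODE_BY_SUB_CATEGORY` and the function `dominant_mode` are hypothetical; the README does not describe how the project actually derives modes.

```python
from collections import Counter

# Hypothetical mapping from sub-category to a thinking-mode label.
MODE_BY_SUB_CATEGORY = {
    "tutorial": "learning",
    "reference": "learning",
    "documentation": "learning",
    "best_practice": "learning",
    "case_study": "learning",
    "planning": "project management",
    "feature_request": "project management",
    "bug_report": "project management",
    "research": "research",
}


def dominant_mode(sub_categories):
    """Return (mode, share) for the most common mode, or (None, 0.0) if empty."""
    modes = Counter(MODE_BY_SUB_CATEGORY.get(s, "other") for s in sub_categories)
    if not modes:
        return None, 0.0
    mode, count = modes.most_common(1)[0]
    return mode, count / sum(modes.values())


week = ["tutorial", "reference", "planning", "research", "tutorial"]
print(dominant_mode(week))
```

Comparing the result week over week is one simple way to track changes in focus over time.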

3. AI-Generated Insights

Uses OpenAI's GPT models to generate natural language insights
Provides context-aware analysis of your mindset
Offers personalized recommendations based on your patterns

Usage Examples

from notion_processing.summarizer import WeeklySummarizer

# Initialize summarizer
summarizer = WeeklySummarizer(api_key="your_openai_api_key")

# Get detailed mindset insights
insights = summarizer.get_mindset_insights()
print(f"Mindset indicators: {insights['mindset_indicators']}")

# Generate AI-powered weekly summary with mindset focus
summary = summarizer.run_weekly_summary()
print(f"AI Summary: {summary.summary_text}")
print(f"Key Insights: {summary.key_insights}")

Mindset Indicators

The system can identify various mindset characteristics:

Learning Focus: High concentration on educational content and skill development
Project Management: Active engagement with planning and execution tasks
Research Orientation: Analytical thinking and investigation patterns
Personal Reflection: Strong self-awareness and introspective content
Creative Thinking: Innovation-focused and idea-generation patterns

Quick Start

1. Prerequisites

Python 3.12+
Supabase account and project
Notion API token
OpenAI API key
uv package manager

2. Installation

# Clone the repository
git clone <repository-url>
cd notion_processing

# Install dependencies with uv
uv sync

# Copy environment file
cp env.example .env

3. Database Setup

Option A: Supabase (Recommended)

1. Create a Supabase project:
   - Go to https://supabase.com
   - Sign up or log in and create a new project
   - Wait for the project to be ready
2. Get your connection string:
   - Go to Project Settings > Database
   - Copy the 'Connection string' > 'URI'
   - It should look like: postgresql://postgres:[YOUR-PASSWORD]@db.[YOUR-PROJECT-REF].supabase.co:5432/postgres
3. Test your connection:
   ```
   python migrate_to_supabase.py --test
   ```
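Before testing against the live database, it can help to confirm that the connection string has the expected shape. The following is a minimal sketch using only the standard library; `describe_database_url` is a hypothetical helper, it does not open a connection, and it is not what `migrate_to_supabase.py --test` does internally (that behavior is not documented here).

```python
from urllib.parse import urlparse


def describe_database_url(url):
    """Split a PostgreSQL URL into its parts without connecting to anything."""
    parts = urlparse(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
        "has_password": parts.password is not None,
    }


# Example values are placeholders, not real credentials.
info = describe_database_url(
    "postgresql://postgres:secret@db.abcdefgh.supabase.co:5432/postgres"
)
print(info)
```

Note that the bracketed placeholders such as `[YOUR-PASSWORD]` must be replaced with real values first; a URL that still contains square brackets may fail to parse.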

Option B: Local PostgreSQL (Development)

If you prefer to use local PostgreSQL for development:

# Uncomment the services in docker-compose.yml
# Start PostgreSQL and pgAdmin
docker-compose up -d

# The database will be available at:
# - PostgreSQL: localhost:5432
# - pgAdmin: http://localhost:8080 (admin@example.com / admin)

4. Configuration

Edit the .env file with your credentials:

# Notion API Configuration
NOTION_TOKEN=your_notion_integration_token_here
NOTION_DATABASE_ID=your_notion_database_id_here

# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Database Configuration
# For Supabase (recommended):
DATABASE_URL=postgresql://postgres:[YOUR-PASSWORD]@db.[YOUR-PROJECT-REF].supabase.co:5432/postgres

# For local PostgreSQL (development):
# DATABASE_URL=postgresql://notion_user:notion_password@localhost:5432/notion_processing
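A quick way to catch a missing credential before running the pipeline is to check the environment up front. This is a sketch under assumptions: `missing_settings` is a hypothetical helper, and it reads the process environment directly, so the variables from `.env` must already be exported or loaded by whatever mechanism the project uses.

```python
import os

# Variables this README marks as required.
REQUIRED = ("NOTION_TOKEN", "NOTION_DATABASE_ID", "OPENAI_API_KEY", "DATABASE_URL")


def missing_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


def llm_model(env=None):
    """Return the configured model, falling back to the documented default."""
    env = os.environ if env is None else env
    return env.get("LLM_MODEL", "gpt-4")


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings present; model:", llm_model())
```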

5. Setup Database Tables

uv run python -m notion_processing.cli setup

6. Run the Pipeline

# Run complete pipeline
uv run python -m notion_processing.cli run

# Or run individual steps
uv run python -m notion_processing.cli extract --limit 10
uv run python -m notion_processing.cli classify
uv run python -m notion_processing.cli summarize

7. Setup Authentication (Optional)

The dashboard now includes Supabase authentication for secure access:

# Test authentication setup
python test_auth.py

# Configure authentication (see AUTHENTICATION_SETUP.md for details)
# 1. Set up Supabase project
# 2. Configure .streamlit/secrets.toml
# 3. Test the setup

8. View Dashboard

# Generate sample data for testing (optional)
make sample-data

# Start the interactive dashboard
make dashboard

# Or run directly with streamlit
uv run streamlit run streamlit_app.py

The dashboard will be available at `http://localhost:8501`

> **Note**: If authentication is enabled, you'll need to log in or create an account to access the dashboard.

> **Note**: If you don't have any weekly summaries yet, you can generate sample data using `make sample-data` to test the dashboard functionality.

9. Run Mindset Analysis

```bash
# Run the mindset analysis example
uv run python example_mindset_analysis.py
```

This will generate both detailed mindset insights and AI-powered weekly summaries focused on understanding your thinking patterns and interests.

## Dashboard

The interactive Streamlit dashboard provides comprehensive visualization and analysis of your weekly summaries.

### Features

- **📊 Overview Metrics**: Total weeks, documents, and averages
- **📈 Trend Analysis**: Charts showing document types and sub-categories over time
- **📋 Weekly Details**: Detailed view of each week's summary with:
  - Summary text and key insights
  - **📄 Document List**: View all documents used to create each summary
  - Document type and sub-category breakdowns
  - Interactive charts and visualizations
- **📋 Raw Data Table**: Exportable data table with all summary information
- **🔐 Authentication**: Secure login system with Supabase
- **📅 Date Filtering**: Filter summaries by date range

### Document List Feature

The dashboard now includes a comprehensive document list for each weekly summary:

- **Document Titles**: See the actual titles of all documents processed
- **Creation Dates**: View when each document was created
- **Last Edited**: Track when documents were last modified
- **Direct Links**: Click to open documents directly in Notion
- **Document Count**: See exactly how many documents contributed to each summary

This feature helps you understand exactly which documents influenced each weekly summary and provides transparency into the summarization process.

### Dashboard Sections

1. **Overview**: Key metrics and statistics
2. **Trends**: Interactive charts for document types, sub-categories, and total documents
3. **Weekly Details**: Detailed breakdown of selected weekly summaries
4. **Raw Data**: Tabular view with export functionality

### Running the Dashboard

```bash
# Using Makefile (recommended)
make dashboard

# Direct streamlit command
uv run streamlit run streamlit_app.py

# With custom port
uv run streamlit run streamlit_app.py --server.port 8502
```

Migration from Local PostgreSQL to Supabase

If you're migrating from a local PostgreSQL setup to Supabase:

Quick Migration

1. Run the migration helper:
   ```
   python migrate_to_supabase.py
   ```
2. Follow the step-by-step instructions provided by the migration script.
3. Test your connection:
   ```
   python migrate_to_supabase.py --test
   ```

Manual Migration Steps

Create a Supabase project at https://supabase.com
Get your connection string from Project Settings > Database > Connection string > URI

Update your .env file:

# Replace your local DATABASE_URL with Supabase URL
DATABASE_URL=postgresql://postgres:[YOUR-PASSWORD]@db.[YOUR-PROJECT-REF].supabase.co:5432/postgres

Test the connection:

uv run python -m notion_processing.cli setup

Run your application; the tables are created automatically.

Benefits of Supabase

No local database setup required
Automatic backups and scaling
Built-in authentication and real-time features
Free tier available
Production-ready infrastructure

Usage

CLI Commands

# Show available commands
uv run python -m notion_processing.cli --help

# Run complete pipeline
uv run python -m notion_processing.cli run [--limit N] [--date YYYY-MM-DD]

# Extract documents only
uv run python -m notion_processing.cli extract [--limit N]

# Classify documents only
uv run python -m notion_processing.cli classify

# Generate weekly summary only
uv run python -m notion_processing.cli summarize [--date YYYY-MM-DD]

# Show processing statistics
uv run python -m notion_processing.cli stats

# Setup database tables
uv run python -m notion_processing.cli setup

# Show current configuration
uv run python -m notion_processing.cli config

# Run interactive dashboard
make dashboard
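The `--date YYYY-MM-DD` option selects the week to process. How the CLI maps a date to a week is not documented in this README; the sketch below assumes a Monday-to-Sunday week, and `week_bounds` is a hypothetical helper rather than a function from the package.

```python
from datetime import date, timedelta


def week_bounds(day):
    """Return (monday, sunday) of the week containing `day` (assumed convention)."""
    monday = day - timedelta(days=day.weekday())  # weekday(): Monday is 0
    return monday, monday + timedelta(days=6)


# Parse the same format the CLI accepts and find the surrounding week.
start, end = week_bounds(date.fromisoformat("2024-01-10"))
print(start.isoformat(), end.isoformat())
```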

Programmatic Usage

from notion_processing.pipeline import NotionProcessingPipeline

# Initialize pipeline
pipeline = NotionProcessingPipeline()

# Setup database
pipeline.setup_database()

# Run complete pipeline
result = pipeline.run_full_pipeline(limit=10)

# Run individual steps
extracted_count = pipeline.run_extraction_only(limit=10)
classified_count = pipeline.run_classification_only()
summary = pipeline.run_summary_only()

# Get statistics
pipeline.display_processing_stats()

Database Schema

Tables

notion_documents: Raw documents from Notion
document_classifications: AI classification results
weekly_summaries: Generated weekly reports
processing_records: Processing status tracking

Key Fields

Document tracking with Notion IDs
Classification confidence scores
Processing timestamps
Error handling and retry logic
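To make the key fields more concrete, here is a sketch of what one classification row might carry. Every field name below is an assumption made for illustration (the actual column names live in `notion_processing/database.py`), as is the choice to keep confidence in the range 0 to 1.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ClassificationRecord:
    """Hypothetical in-memory shape of a document_classifications row."""
    notion_id: str            # Notion ID used to track the document
    document_type: str        # e.g. "project" or "knowledge"
    sub_category: str         # e.g. "planning"
    confidence: float         # classification confidence, assumed 0.0 to 1.0
    classified_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    error: Optional[str] = None  # last error message, if processing failed

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


record = ClassificationRecord("abc123", "project", "planning", 0.92)
print(record)
```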

Configuration

Environment Variables

| Variable | Description | Required |
|----------|-------------|----------|
| `NOTION_TOKEN` | Notion integration token | Yes |
| `NOTION_DATABASE_ID` | Notion database ID | Yes |
| `OPENAI_API_KEY` | OpenAI API key | Yes |
| `DATABASE_URL` | PostgreSQL connection URL | Yes |
| `LLM_MODEL` | LLM model for classification/summarization | No (default: gpt-4) |

Notion Setup

Create a Notion integration at https://www.notion.so/my-integrations
Share your database with the integration
Get the database ID from the URL: https://notion.so/workspace/{database_id}?v=...
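Copying the ID out of the URL by hand is error-prone, so a small helper can do it. This sketch assumes the ID is the trailing 32 hexadecimal characters of the last path segment, which matches the URL pattern shown above; `database_id_from_url` is a hypothetical function, not part of the package.

```python
import re
from urllib.parse import urlparse


def database_id_from_url(url):
    """Extract the 32-character database ID from a Notion database URL."""
    # Drop the query string (?v=...) and keep only the last path segment.
    segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    candidate = segment[-32:]
    if not re.fullmatch(r"[0-9a-fA-F]{32}", candidate):
        raise ValueError(f"no database ID found in {url!r}")
    return candidate.lower()


# The ID below is a made-up placeholder.
print(database_id_from_url(
    "https://notion.so/workspace/0123456789abcdef0123456789abcdef?v=42"
))
```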

Development

Project Structure

notion_processing/
├── notion_processing/
│   ├── __init__.py
│   ├── models.py          # Data models and enums
│   ├── database.py        # Database configuration and models
│   ├── extractor.py       # Notion document extraction
│   ├── classifier.py      # LLM-based classification
│   ├── summarizer.py      # Weekly summary generation
│   ├── pipeline.py        # Main pipeline orchestrator
│   └── cli.py             # Command-line interface
├── tests/                 # Test suite
├── main.py                # Entry point
├── pyproject.toml         # Dependencies and project config
├── docker-compose.yml     # Database setup
├── env.example            # Environment variables template
├── Makefile               # Development commands
└── README.md              # This file

Running Tests

# Install development dependencies
uv sync --extra dev

# Run tests
uv run pytest tests/ -v

# Run with coverage
uv run pytest tests/ --cov=notion_processing --cov-report=html

Code Quality

# Format code
uv run black notion_processing/ tests/
uv run isort notion_processing/ tests/

# Type checking
uv run mypy notion_processing/

# Linting
uv run flake8 notion_processing/ tests/

Using Makefile

# Show all available commands
make help

# Complete development setup
make dev-setup

# Run quality checks
make quality

# Run tests
make test

# Format code
make format

UV Package Management

This project uses uv for fast Python package management:

# Add a new dependency
uv add package_name

# Add a development dependency
uv add --dev package_name

# Remove a dependency
uv remove package_name

# Update all dependencies
uv lock --upgrade

# Sync dependencies
uv sync

Monitoring and Logging

The pipeline uses structured logging with structlog:

JSON logging for production
Console logging for development
Error tracking and debugging information
Processing statistics and metrics
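The split between JSON output in production and readable console output in development can be illustrated as follows. The project itself uses structlog; this sketch deliberately swaps in the standard library `logging` module so that it runs without extra dependencies, and `JsonFormatter` and `make_handler` are hypothetical names, not the project's configuration.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname.lower(),
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Extra key/value context passed via logger.info(..., extra={"context": {...}}).
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def make_handler(production):
    """JSON lines in production, a plain readable format in development."""
    handler = logging.StreamHandler()
    if production:
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    return handler


logger = logging.getLogger("pipeline")
logger.addHandler(make_handler(production=True))
logger.setLevel(logging.INFO)
logger.info("extracted %d documents", 10, extra={"context": {"step": "extract"}})
```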

Error Handling

Graceful handling of API rate limits
Retry logic for transient failures
Detailed error logging and reporting
Processing status tracking
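Retry logic for transient failures is commonly written as exponential backoff. The sketch below shows that general technique; `TransientError` and `call_with_retry` are hypothetical names, and the attempt count and delays are illustrative defaults rather than the values this pipeline uses.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a rate limit)."""


def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait 1x, 2x, 4x, ... the base delay before trying again.
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function easy to test, since a test can record the delays instead of actually waiting.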

Performance Considerations

Batch processing for efficiency
Rate limiting for API calls
Database connection pooling
Content length limits for LLM calls
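Two of these points, batch processing and content length limits, are simple enough to sketch. Both helpers below are hypothetical and simplified: `truncate` cuts by character count, whereas a real limit for LLM calls is more likely expressed in tokens, and the batch size is an arbitrary example.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def truncate(text, max_chars):
    """Cut text to at most max_chars characters before sending it to the LLM."""
    return text if len(text) <= max_chars else text[:max_chars]


documents = ["first document body", "second", "third", "fourth", "fifth"]
for batch in batched(documents, 2):
    print([truncate(doc, 10) for doc in batch])
```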

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

License

[Add your license here]

Support

For issues and questions:

Check the documentation
Review existing issues
Create a new issue with detailed information
