An intelligent data pipeline that extracts, classifies, and summarizes documents from Notion databases using AI/LLM technology.
- Document Extraction: Extract raw documents from Notion databases via API
- AI Classification: Use an LLM to classify documents into project/knowledge categories with sub-categories
- Weekly Summaries: Generate comprehensive weekly reports of processed documents with mindset analysis
- Interactive Dashboard: Streamlit-based dashboard for visualizing weekly summaries and trends
- Supabase Storage: Store all data and processing results in Supabase (PostgreSQL)
- Pipeline Orchestration: Modular pipeline design for flexible processing
- Statistics & Monitoring: Track processing status and statistics
```
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Notion API    │───▶│   Extraction    │───▶│    Supabase     │
│   (Documents)   │    │   (Raw Data)    │    │    (Storage)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │ Classification  │    │   Statistics    │
                       │    (LLM/AI)     │    │  & Monitoring   │
                       └─────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌─────────────────┐
                       │  Summarization  │
                       │ (Weekly Reports)│
                       └─────────────────┘
```
- PROJECT: Documents related to project work, tasks, features, bugs, planning
- KNOWLEDGE: Documents containing knowledge, documentation, tutorials
PROJECT sub-categories:

- `feature_request`: Requests for new features or functionality
- `bug_report`: Reports of bugs or issues
- `planning`: Project planning, roadmaps, timelines
- `research`: Research findings, analysis, investigations

KNOWLEDGE sub-categories:

- `tutorial`: Step-by-step guides, how-to documents
- `reference`: Reference materials, documentation, specifications
- `best_practice`: Best practices, guidelines, standards
- `case_study`: Case studies, examples, success stories
- `documentation`: Technical documentation, API docs, etc.
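The two-level category scheme above can be modelled with plain enums. The following is a minimal sketch, not the project's actual `models.py`: the class names `DocumentType` and `SubCategory`, the `SUB_CATEGORIES` mapping, and the helper `is_valid_pair` are all hypothetical, and the grouping of sub-categories under each type simply follows the order in which they are listed here.

```python
from enum import Enum


class DocumentType(str, Enum):
    """Top-level categories named in this README."""

    PROJECT = "project"
    KNOWLEDGE = "knowledge"


class SubCategory(str, Enum):
    """Sub-category identifiers as listed in this README."""

    FEATURE_REQUEST = "feature_request"
    BUG_REPORT = "bug_report"
    PLANNING = "planning"
    RESEARCH = "research"
    TUTORIAL = "tutorial"
    REFERENCE = "reference"
    BEST_PRACTICE = "best_practice"
    CASE_STUDY = "case_study"
    DOCUMENTATION = "documentation"


# Assumed grouping: which sub-categories belong to which document type.
SUB_CATEGORIES = {
    DocumentType.PROJECT: (
        SubCategory.FEATURE_REQUEST,
        SubCategory.BUG_REPORT,
        SubCategory.PLANNING,
        SubCategory.RESEARCH,
    ),
    DocumentType.KNOWLEDGE: (
        SubCategory.TUTORIAL,
        SubCategory.REFERENCE,
        SubCategory.BEST_PRACTICE,
        SubCategory.CASE_STUDY,
        SubCategory.DOCUMENTATION,
    ),
}


def is_valid_pair(doc_type: DocumentType, sub_category: SubCategory) -> bool:
    """Check that a sub-category is allowed under the given document type."""
    return sub_category in SUB_CATEGORIES[doc_type]
```

A check like `is_valid_pair` is useful for rejecting inconsistent LLM output, for example a `KNOWLEDGE` document labelled `planning`.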
The system now includes advanced mindset analysis capabilities that go beyond simple document classification to provide insights into your thinking patterns, interests, and mental state.
- Content Analysis: Analyzes document content to understand your interests and focus areas
- Pattern Recognition: Identifies recurring themes and thinking patterns across your documents
- Mindset Indicators: Detects mindset characteristics like learning focus, project orientation, or research tendencies
- AI-Powered Insights: Uses an LLM to generate human-like insights about your cognitive patterns
- Analyzes the actual text content of your documents
- Identifies themes, topics, and writing patterns
- Provides insights into your current interests and focus areas
- Examines document types and categories for patterns
- Identifies dominant thinking modes (learning, project management, research, etc.)
- Tracks changes in focus over time
- Uses OpenAI's GPT models to generate natural language insights
- Provides context-aware analysis of your mindset
- Offers personalized recommendations based on your patterns
```python
from notion_processing.summarizer import WeeklySummarizer

# Initialize summarizer
summarizer = WeeklySummarizer(api_key="your_openai_api_key")

# Get detailed mindset insights
insights = summarizer.get_mindset_insights()
print(f"Mindset indicators: {insights['mindset_indicators']}")

# Generate AI-powered weekly summary with mindset focus
summary = summarizer.run_weekly_summary()
print(f"AI Summary: {summary.summary_text}")
print(f"Key Insights: {summary.key_insights}")
```

The system can identify various mindset characteristics:
- Learning Focus: High concentration on educational content and skill development
- Project Management: Active engagement with planning and execution tasks
- Research Orientation: Analytical thinking and investigation patterns
- Personal Reflection: Strong self-awareness and introspective content
- Creative Thinking: Innovation-focused and idea-generation patterns
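One simple way to derive indicators like these from classification results is to count sub-categories and report each mindset's share. The sketch below is only an illustration of that idea, not how the summarizer actually works: the mapping `MINDSET_BY_SUB_CATEGORY`, the label names, and the function `mindset_indicators` are all hypothetical.

```python
from collections import Counter

# Hypothetical mapping from sub-category to a mindset label (an assumption,
# not taken from the project's code).
MINDSET_BY_SUB_CATEGORY = {
    "tutorial": "learning_focus",
    "best_practice": "learning_focus",
    "planning": "project_management",
    "feature_request": "project_management",
    "bug_report": "project_management",
    "research": "research_orientation",
    "case_study": "research_orientation",
}


def mindset_indicators(sub_categories: list[str]) -> dict[str, float]:
    """Return each mindset label's share of the documents that map to a label."""
    counts = Counter(
        MINDSET_BY_SUB_CATEGORY[name]
        for name in sub_categories
        if name in MINDSET_BY_SUB_CATEGORY
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders labels from most to least frequent.
    return {label: count / total for label, count in counts.most_common()}
```

The first key of the returned dictionary is the dominant mode for the period; comparing results week over week gives a rough view of how focus shifts.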
- Python 3.12+
- Supabase account and project
- Notion API token
- OpenAI API key
- uv package manager
```bash
# Clone the repository
git clone <repository-url>
cd notion_processing

# Install dependencies with uv
uv sync

# Copy environment file
cp env.example .env
```

- Create a Supabase project:
  - Go to https://supabase.com
  - Sign up or log in and create a new project
  - Wait for the project to be ready

- Get your connection string:
  - Go to Project Settings > Database
  - Copy the 'Connection string' > 'URI'
  - It should look like: `postgresql://postgres:[YOUR-PASSWORD]@db.[YOUR-PROJECT-REF].supabase.co:5432/postgres`

- Test your connection:

  ```bash
  python migrate_to_supabase.py --test
  ```
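Before running the connection test, it can help to sanity-check the connection string itself, since a leftover placeholder is a common mistake. The following is a minimal sketch using only the standard library; the function name `check_database_url` is hypothetical and this is not what `migrate_to_supabase.py --test` does internally.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> list[str]:
    """Return a list of problems found in a PostgreSQL connection URI."""
    # Square brackets mean a template placeholder was left in place. Check
    # this first: urlsplit() can reject bracketed hosts as invalid IPv6.
    if "[" in url or "]" in url:
        return ["a placeholder such as [YOUR-PASSWORD] has not been replaced"]

    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.username:
        problems.append("missing user")
    if not parts.password:
        problems.append("missing password")
    if not parts.path.strip("/"):
        problems.append("missing database name")
    return problems
```

An empty list means the string has the expected shape; it does not prove that the credentials are valid or that the database is reachable.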
If you prefer to use local PostgreSQL for development:
```bash
# Uncomment the services in docker-compose.yml
# Start PostgreSQL and pgAdmin
docker-compose up -d

# The database will be available at:
# - PostgreSQL: localhost:5432
# - pgAdmin: http://localhost:8080 (admin@example.com / admin)
```

Edit the `.env` file with your credentials:
```bash
# Notion API Configuration
NOTION_TOKEN=your_notion_integration_token_here
NOTION_DATABASE_ID=your_notion_database_id_here

# OpenAI API Configuration
OPENAI_API_KEY=your_openai_api_key_here

# Database Configuration
# For Supabase (recommended):
DATABASE_URL=postgresql://postgres:[YOUR-PASSWORD]@db.[YOUR-PROJECT-REF].supabase.co:5432/postgres

# For local PostgreSQL (development):
# DATABASE_URL=postgresql://notion_user:notion_password@localhost:5432/notion_processing
```

```bash
# Set up database tables
uv run python -m notion_processing.cli setup
```

```bash
# Run complete pipeline
uv run python -m notion_processing.cli run

# Or run individual steps
uv run python -m notion_processing.cli extract --limit 10
uv run python -m notion_processing.cli classify
uv run python -m notion_processing.cli summarize
```

The dashboard now includes Supabase authentication for secure access:
```bash
# Test authentication setup
python test_auth.py

# Configure authentication (see AUTHENTICATION_SETUP.md for details)
# 1. Set up Supabase project
# 2. Configure .streamlit/secrets.toml
# 3. Test the setup
```

```bash
# Generate sample data for testing (optional)
make sample-data

# Start the interactive dashboard
make dashboard

# Or run directly with streamlit
uv run streamlit run streamlit_app.py
```

### 9. Run Mindset Analysis
```bash
# Run the mindset analysis example
uv run python example_mindset_analysis.py
```

This generates both detailed mindset insights and AI-powered weekly summaries focused on your thinking patterns and interests.

The dashboard will be available at `http://localhost:8501`
> **Note**: If authentication is enabled, you'll need to log in or create an account to access the dashboard.
> **Note**: If you don't have any weekly summaries yet, you can generate sample data using `make sample-data` to test the dashboard functionality.
## Dashboard
The interactive Streamlit dashboard provides comprehensive visualization and analysis of your weekly summaries.
### Features
- **Overview Metrics**: Total weeks, documents, and averages
- **Trend Analysis**: Charts showing document types and sub-categories over time
- **Weekly Details**: Detailed view of each week's summary with:
  - Summary text and key insights
  - **Document List**: View all documents used to create each summary
  - Document type and sub-category breakdowns
  - Interactive charts and visualizations
- **Raw Data Table**: Exportable data table with all summary information
- **Authentication**: Secure login system with Supabase
- **Date Filtering**: Filter summaries by date range
### Document List Feature
The dashboard now includes a comprehensive document list for each weekly summary:
- **Document Titles**: See the actual titles of all documents processed
- **Creation Dates**: View when each document was created
- **Last Edited**: Track when documents were last modified
- **Direct Links**: Click to open documents directly in Notion
- **Document Count**: See exactly how many documents contributed to each summary
This feature helps you understand exactly which documents influenced each weekly summary and provides transparency into the summarization process.
### Dashboard Sections
1. **Overview**: Key metrics and statistics
2. **Trends**: Interactive charts for document types, sub-categories, and total documents
3. **Weekly Details**: Detailed breakdown of selected weekly summaries
4. **Raw Data**: Tabular view with export functionality
### Running the Dashboard
```bash
# Using Makefile (recommended)
make dashboard
# Direct streamlit command
uv run streamlit run streamlit_app.py
# With custom port
uv run streamlit run streamlit_app.py --server.port 8502
```

If you're migrating from a local PostgreSQL setup to Supabase:

- Run the migration helper:

  ```bash
  python migrate_to_supabase.py
  ```

- Follow the step-by-step instructions provided by the migration script

- Test your connection:

  ```bash
  python migrate_to_supabase.py --test
  ```

Alternatively, migrate manually:

- Create a Supabase project at https://supabase.com

- Get your connection string from Project Settings > Database > Connection string > URI

- Update your `.env` file:

  ```bash
  # Replace your local DATABASE_URL with the Supabase URL
  DATABASE_URL=postgresql://postgres:[YOUR-PASSWORD]@db.[YOUR-PROJECT-REF].supabase.co:5432/postgres
  ```

- Test the connection:

  ```bash
  uv run python -m notion_processing.cli setup
  ```

- Run your application; tables will be created automatically

Benefits of Supabase:

- No local database setup required
- Automatic backups and scaling
- Built-in authentication and real-time features
- Free tier available
- Production-ready infrastructure

```bash
# Show available commands
uv run python -m notion_processing.cli --help

# Run complete pipeline
uv run python -m notion_processing.cli run [--limit N] [--date YYYY-MM-DD]

# Extract documents only
uv run python -m notion_processing.cli extract [--limit N]

# Classify documents only
uv run python -m notion_processing.cli classify

# Generate weekly summary only
uv run python -m notion_processing.cli summarize [--date YYYY-MM-DD]

# Show processing statistics
uv run python -m notion_processing.cli stats

# Setup database tables
uv run python -m notion_processing.cli setup

# Show current configuration
uv run python -m notion_processing.cli config

# Run interactive dashboard
make dashboard
```

```python
from notion_processing.pipeline import NotionProcessingPipeline

# Initialize pipeline
pipeline = NotionProcessingPipeline()

# Setup database
pipeline.setup_database()

# Run complete pipeline
result = pipeline.run_full_pipeline(limit=10)

# Run individual steps
extracted_count = pipeline.run_extraction_only(limit=10)
classified_count = pipeline.run_classification_only()
summary = pipeline.run_summary_only()

# Get statistics
pipeline.display_processing_stats()
```

- notion_documents: Raw documents from Notion
- document_classifications: AI classification results
- weekly_summaries: Generated weekly reports
- processing_records: Processing status tracking
- Document tracking with Notion IDs
- Classification confidence scores
- Processing timestamps
- Error handling and retry logic
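Weekly reports and the `--date YYYY-MM-DD` option both need a rule for deciding which week a given date falls into. The sketch below shows one possible rule; the Monday-to-Sunday convention and the function name `week_bounds` are assumptions for illustration and may differ from what the summarizer does.

```python
from datetime import date, timedelta


def week_bounds(day_text: str) -> tuple[date, date]:
    """Return the (Monday, Sunday) pair of the week containing an ISO date."""
    day = date.fromisoformat(day_text)
    # weekday() is 0 for Monday, so subtracting it lands on the week's Monday.
    start = day - timedelta(days=day.weekday())
    end = start + timedelta(days=6)
    return start, end
```

Documents whose timestamps fall between the two returned dates, inclusive, would then be grouped into the same weekly summary.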
| Variable | Description | Required |
|---|---|---|
| `NOTION_TOKEN` | Notion integration token | Yes |
| `NOTION_DATABASE_ID` | Notion database ID | Yes |
| `OPENAI_API_KEY` | OpenAI API key | Yes |
| `DATABASE_URL` | PostgreSQL connection URL | Yes |
| `LLM_MODEL` | LLM model for classification/summarization | No (default: `gpt-4`) |

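A startup check that fails fast on missing variables saves a confusing error later in the pipeline. The following is a minimal sketch of such a check based on the table above; the function name `load_settings` and the constants are hypothetical, and the project's own configuration loading may work differently.

```python
import os

# Variable names and the default are taken from the table in this README.
REQUIRED_VARS = ("NOTION_TOKEN", "NOTION_DATABASE_ID", "OPENAI_API_KEY", "DATABASE_URL")
DEFAULTS = {"LLM_MODEL": "gpt-4"}


def load_settings(env=None) -> dict[str, str]:
    """Collect settings from the environment, raising if a required one is unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings = {name: env[name] for name in REQUIRED_VARS}
    # Optional variables fall back to their documented defaults.
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name) or default
    return settings
```

Passing a plain dictionary as `env` makes the function easy to exercise in tests without touching the real environment.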
- Create a Notion integration at https://www.notion.so/my-integrations
- Share your database with the integration
- Get the database ID from the URL: `https://notion.so/workspace/{database_id}?v=...`
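Copying the ID out of the URL by hand is error-prone, so a small helper can do it. This sketch assumes the ID is the 32-character hexadecimal string at the end of the last path segment, optionally written with dashes or preceded by a page title; the function name `database_id_from_url` is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Assumption: a database ID is 32 hexadecimal characters once dashes are removed.
_ID_PATTERN = re.compile(r"([0-9a-f]{32})$", re.IGNORECASE)


def database_id_from_url(url: str) -> str:
    """Extract the database ID from a Notion database URL."""
    # urlsplit() separates the query string (?v=...) from the path.
    last_segment = urlsplit(url).path.rstrip("/").split("/")[-1]
    compact = last_segment.replace("-", "")
    match = _ID_PATTERN.search(compact)
    if match is None:
        raise ValueError(f"no database ID found in URL: {url!r}")
    return match.group(1).lower()
```

The result can be used as the value of `NOTION_DATABASE_ID` in `.env`.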
```
notion_processing/
├── notion_processing/
│   ├── __init__.py
│   ├── models.py          # Data models and enums
│   ├── database.py        # Database configuration and models
│   ├── extractor.py       # Notion document extraction
│   ├── classifier.py      # LLM-based classification
│   ├── summarizer.py      # Weekly summary generation
│   ├── pipeline.py        # Main pipeline orchestrator
│   └── cli.py             # Command-line interface
├── tests/                 # Test suite
├── main.py                # Entry point
├── pyproject.toml         # Dependencies and project config
├── docker-compose.yml     # Database setup
├── env.example            # Environment variables template
├── Makefile               # Development commands
└── README.md              # This file
```

```bash
# Install development dependencies
uv sync --extra dev

# Run tests
uv run pytest tests/ -v

# Run with coverage
uv run pytest tests/ --cov=notion_processing --cov-report=html
```

```bash
# Format code
uv run black notion_processing/ tests/
uv run isort notion_processing/ tests/

# Type checking
uv run mypy notion_processing/

# Linting
uv run flake8 notion_processing/ tests/
```

```bash
# Show all available commands
make help

# Complete development setup
make dev-setup

# Run quality checks
make quality

# Run tests
make test

# Format code
make format
```

This project uses uv for fast Python package management:

```bash
# Add a new dependency
uv add package_name

# Add a development dependency
uv add --dev package_name

# Remove a dependency
uv remove package_name

# Update all dependencies
uv lock --upgrade

# Sync dependencies
uv sync
```

The pipeline uses structured logging with structlog:

- JSON logging for production
- Console logging for development
- Error tracking and debugging information
- Processing statistics and metrics
- Graceful handling of API rate limits
- Retry logic for transient failures
- Detailed error logging and reporting
- Processing status tracking
- Batch processing for efficiency
- Rate limiting for API calls
- Database connection pooling
- Content length limits for LLM calls
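The retry, batching, and content-limit points above can be illustrated with a few small helpers. This is a sketch under stated assumptions rather than the pipeline's implementation: the names `chunked`, `truncate`, and `with_retry` are hypothetical, exponential backoff is one common choice of retry policy, and real code should catch only the specific transient errors raised by the API client.

```python
import time
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items (batch processing)."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def truncate(text: str, max_chars: int) -> str:
    """Cap content length before sending it to an LLM."""
    return text if len(text) <= max_chars else text[:max_chars]


def with_retry(call, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `call`, retrying failures with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:  # Sketch only: narrow this to transient errors.
            if attempt == attempts - 1:
                raise
            # Delays grow as base_delay * 1, 2, 4, ...
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so that tests can substitute a recording function instead of actually waiting.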
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
[Add your license here]
For issues and questions:
- Check the documentation
- Review existing issues
- Create a new issue with detailed information