
✅ Stage 0: Project Setup & Infrastructure

✅ Development Environment Setup

Initialise GitHub repository
Set up branch protection
Resolve naming issues and override branch protection for admins
Create dev/prod branches
Set up local development environment
Add initial documentation

✅ Go Project Structure

✅ Production Infrastructure Setup

✅ Stage 1: Core Setup & Basic Crawling

✅ Core API Implementation

Initialise Go project structure and dependencies
Set up basic API endpoints
Set up environment variables and configs
Implement basic health checks and monitoring
Add basic error monitoring with Sentry
Set up endpoint performance tracking
Add graceful shutdown handling
Implement configuration validation

✅ Enhance Crawler Results

Set up Colly crawler configuration
Implement concurrent crawling logic
Add basic error handling
Add rate limiting (fixed client IP detection)
Add retry logic
Handle different response types/errors
Implement cache validation checks
Add crawler-specific error tracking
Set up crawler performance monitoring

✅ Set up Turso for storing results

✅ Stage 2: Multi-domain Support & Job Queue Architecture

✅ Job Queue Architecture

Design job and task data structures
Implement persistent job storage in database
Create worker pool for concurrent URL processing
Add job management API (create, start, cancel, status)
Implement database retry logic for job operations to handle transient errors
Enhance error reporting and monitoring

✅ Sitemap Integration

Implement sitemap.xml parser
Add URL filtering based on path patterns
Handle sitemap index files
Process multiple sitemaps
Implement robust URL normalisation in sitemap processing
Add improved error handling for malformed URLs
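As an illustration of what URL normalisation for sitemap entries might involve, here is a minimal sketch using Go's standard `net/url` package. The specific rules (lower-casing the host, dropping fragments, defaulting an empty path to `/`) and the name `normaliseURL` are assumptions, not a description of Hover's rules.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normaliseURL applies a few conservative rules so equivalent URLs compare
// equal. Relative or scheme-less input is rejected as malformed.
func normaliseURL(raw string) (string, error) {
	u, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("not an absolute URL: %q", raw)
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)
	u.Fragment = "" // fragments never reach the server
	if u.Path == "" {
		u.Path = "/"
	}
	return u.String(), nil
}

func main() {
	for _, raw := range []string{" HTTPS://Example.com#top ", "example.com/page"} {
		out, err := normaliseURL(raw)
		fmt.Println(out, err)
	}
}
```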

✅ Link Discovery & Crawling

Extract links from crawled pages
Filter links to stay within target domain
Basic link discovery logic
Queue discovered links for processing
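The domain filter in this step can be sketched as a host comparison. The function name `sameDomain` is hypothetical, and treating `www.` as equivalent to the bare domain while excluding other subdomains is an assumption made for the example.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// sameDomain reports whether link points at the target domain.
// Assumption: "www." is ignored on both sides; other subdomains do not match.
func sameDomain(link, target string) bool {
	u, err := url.Parse(link)
	if err != nil {
		return false
	}
	host := strings.TrimPrefix(strings.ToLower(u.Hostname()), "www.")
	want := strings.TrimPrefix(strings.ToLower(target), "www.")
	return host != "" && host == want
}

func main() {
	links := []string{
		"https://www.example.com/about",
		"https://cdn.example.com/app.js",
		"https://other.org/",
	}
	for _, l := range links {
		fmt.Println(l, sameDomain(l, "example.com"))
	}
}
```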

✅ Job Management API

Create job endpoints (create/list/get/cancel)
Add progress calculation and reporting
Store recent crawled pages in job history
Implement multi-domain support
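Progress reporting for a job could be computed from task counts along these lines. The `jobCounts` fields and the choice to count failed and skipped tasks as finished are assumptions for illustration only.

```go
package main

import "fmt"

// jobCounts is a hypothetical summary of a job's tasks.
type jobCounts struct {
	Total, Completed, Failed, Skipped int
}

// progress returns the percentage of tasks that have finished in any state.
// A job with no tasks reports 0 and avoids dividing by zero.
func progress(c jobCounts) float64 {
	if c.Total == 0 {
		return 0
	}
	done := c.Completed + c.Failed + c.Skipped
	return float64(done) / float64(c.Total) * 100
}

func main() {
	fmt.Printf("%.1f%%\n", progress(jobCounts{Total: 200, Completed: 40, Failed: 5, Skipped: 5}))
}
```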

✅ Stage 3: PostgreSQL Migration & Performance Optimisation

✅ Fly.io Production Setup

Set up production environment on Fly.io
Deploy and test rate limiting in production
Configure auto-scaling rules
Set up production logging
Implement monitoring alerts
Configure backup strategies (Supabase handles automatically)

✅ Performance Optimisation

Implement caching layer
Optimise database queries
Configure rate limiting with proper client IP detection
Add performance monitoring
Decided to switch to PostgreSQL at this point

✅ PostgreSQL Migration

✅ PostgreSQL Setup and Infrastructure

Set up PostgreSQL on Fly.io
- Create database instance
- Configure connection settings
- Configure security settings

✅ Database Layer Replacement

Implement PostgreSQL schema
- Convert SQLite schema to PostgreSQL syntax
- Add proper indexes
- Implement connection pooling
Replace database access layer
- Update db package to use PostgreSQL
- Add health checks and monitoring
- Implement efficient error handling

✅ Task Queue and Worker Redesign

Implement PostgreSQL-based task queue
- Use row-level locking with SELECT FOR UPDATE SKIP LOCKED
- Optimise for concurrent access
- Plan task prioritisation implementation (docs created)
Redesign worker pool
- Create single global worker pool
- Implement optimised task acquisition

✅ URL Processing Improvements

Enhanced sitemap processing
- Implement robust URL normalisation
- Add support for relative URLs in sitemaps
- Improve error handling for malformed URLs
Improve URL validation
- Better handling of URL variations
- Consistent URL formatting throughout the codebase
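Support for relative URLs in sitemaps amounts to resolving each entry against the sitemap's own URL. A minimal sketch with the standard `net/url` package follows; the helper name `resolveLoc` is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveLoc turns a sitemap entry into an absolute URL, using the
// sitemap's URL as the base. Absolute entries are returned unchanged.
func resolveLoc(sitemapURL, loc string) (string, error) {
	base, err := url.Parse(sitemapURL)
	if err != nil {
		return "", err
	}
	abs, err := base.Parse(strings.TrimSpace(loc))
	if err != nil {
		return "", err
	}
	return abs.String(), nil
}

func main() {
	for _, loc := range []string{"/blog/post", "pricing", "https://example.com/about"} {
		out, err := resolveLoc("https://example.com/sitemap.xml", loc)
		fmt.Println(out, err)
	}
}
```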

✅ Code Refactoring

✅ Code Cleanup

✅ Final Transition

Update core endpoints to use new implementation
Remove SQLite-specific code
Clean up dependencies and imports
Update configuration and documentation

🟡 Stage 4: Core Authentication & MVP Interface

✅ Implement Supabase Authentication

Configure Supabase Auth settings
Implement JWT validation middleware in Go
Add social login providers configuration (Google, Facebook, Slack, GitHub, Microsoft, Figma, LinkedIn + Email)
Set up user session handling and token validation
Implement comprehensive auth error handling
Create user registration with auto-organisation creation
Configure custom domain authentication (hover.auth.goodnative.co)
Implement account linking for multiple auth providers per user (handled by Supabase Auth via auth.identities table)

✅ Connect user data to PostgreSQL

Design user data schema with Row Level Security
Implement user profile storage
Add user preferences handling
Configure PostgreSQL policies for data access
Create database operations for users and organisations

✅ Simple Organisation Sharing

Organisation model implemented:

Auto-create organisation when user signs up
Create shared access to all jobs/tasks/reports within organisation

✅ API-First Architecture Development (Completed v0.4.2)

Comprehensive RESTful API Infrastructure
- Standardised response format with request IDs and consistent error handling
- Interface-agnostic RESTful endpoints (/v1/* structure)
- Comprehensive middleware stack (CORS, logging, rate limiting)
- Proper HTTP status codes and structured error responses
Multi-Interface Authentication Foundations
- JWT-based authentication with Supabase integration
- Authentication middleware for protected endpoints

✅ MVP Interface Development (Completed v0.5.3)

🟡 Template + Data Binding Implementation (Completed v0.5.5)

✅ Task prioritisation & URL processing

✅ Recurring Job Scheduling (Completed v0.18.0)

🟡 Webflow App Integration (Completed v0.23.0)

✅ Slack Integration (Completed v0.20.0)

Slack Application Development
- OAuth flow for installing GNH Slack app to workspaces
- Bot tokens stored securely in Supabase Vault
- Auto-linking users to Slack workspaces via database triggers
- Supabase Slack OIDC support for user authentication
Notification Delivery
- Job completion notifications via Slack DMs
- Error notifications when jobs fail
- API endpoints for workspace management and user preferences

✅ Google Analytics 4 Integration (Completed)

🎯 STAGE 5: MVP LAUNCH PREPARATION (Current)

5.0 Finalise outstanding actions above

GA
Account settings / management (settings page operational — billing awaits Paddle in 5.2)

5.1: Webflow Job Triggering & Polish

5.2: Payment Infrastructure

5.3: Branding & UI Cleanup

5.4: Marketing Page

5.5: Webflow Marketplace Submission

Full details in Webflow Marketplace

Marketplace Preparation
- Complete Webflow App listing (description, screenshots, demo video)
- Prepare support documentation and setup guide
- Create terms of service and privacy policy
Submission & Approval
- Submit app to Webflow marketplace for review
- Address feedback and make required changes
- Obtain marketplace approval

5.6: Pre-Launch Polish & Testing

5.7: Launch & First Customers

Soft Launch
- Make app available to first 10 users
- Monitor system performance and error rates
- Provide responsive support to early adopters
Iterative Improvements
- Gather user feedback on critical issues
- Address bugs and usability problems
- Track key metrics (signup rate, job success, retention)

⚪ Stage 6: Post-MVP Enhancements

🔴 WordPress Integration

WordPress Plugin Development
- Create basic WordPress plugin for Hover
- Plugin configuration interface for domain settings
- Display crawl results and statistics in WordPress admin
- Trigger manual crawls from WordPress dashboard
WordPress.org Submission
- Prepare plugin listing and screenshots
- Submit plugin to WordPress plugin directory
- Address review feedback and obtain approval

🔴 Shopify Integration

Shopify App Development
- OAuth integration with Shopify
- Embedded app interface for store owners
- Display site health metrics in Shopify admin
- Automatic crawl triggers on theme publish
Shopify App Store Submission
- Complete app listing with demo and screenshots
- Submit to Shopify App Store for review
- Address feedback and obtain approval

Slack enhancements

Slash commands (/crawl sitedomain.com)
Threading with progress updates
Interactive message actions

🔴 Multi-Platform Authentication Architecture

🔴 Platform SDK Development

⚪ Stage 7: Scale & Advanced Features

🔴 Supabase Platform Integration

🔴 API & Integration Enhancements

🔴 Infrastructure & Operations

⚪ Stage 7: Feature Refinement & Launch Preparation

🔴 Security & Compliance

Core app functionality
- Path inclusion/exclusion rules
- Domain blocklist/allowlist for crawler (prevent crawling specific domains)
Enhanced Authentication
- Test and refine multi-provider account linking
- Member invitation system for organisations
Audit & Security Features
- Secure admin endpoints properly with system_role authentication (internal/api/admin.go:11,25)
- GDPR compliance features (data export, deletion audit trails)

🔴 Launch & Marketing

Marketing Infrastructure
- Simple Webflow marketing page with product explanation
- Basic navigation structure and call-to-action
- User documentation and help resources
Launch Preparation
- Complete marketplace submission process
- Set up support channels and user onboarding
- Implement usage analytics and tracking

🔴 Data Archiving & Retention

Implement two-tier data storage strategy
- Use Supabase Storage for "hot" data (recent logs, debug files)
- Implement Cloudflare R2 for "cold" storage of historical HTML page captures
- Create automated Go job to handle data lifecycle (e.g., move files older than 30 days to R2)
- Update database to track storage location (hot/cold) for each archived file

🟡 Alpha Data Retention

Retention policy for alpha
- Auto-delete crawler logs and stored HTML older than 90 days

🔴 Content Storage & Change Tracking

Implement semantic hashing for change detection (see Implementation Plan)
- Add content_hash and html_storage_path columns to tasks table
- Add latest_content_hash column to pages table
- Implement HTML parsing and canonical content extraction in Go worker
- Store HTML in Supabase Storage only when semantic hash changes
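The change-detection idea can be illustrated with a hash over canonicalised content. This sketch assumes the visible text has already been extracted from the HTML (which would need an HTML parser, not shown) and canonicalises only by collapsing whitespace. The names `contentHash` and `shouldStore` are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// contentHash hashes text after collapsing all runs of whitespace, so
// formatting-only changes do not produce a new hash.
func contentHash(text string) string {
	canonical := strings.Join(strings.Fields(text), " ")
	sum := sha256.Sum256([]byte(canonical))
	return hex.EncodeToString(sum[:])
}

// shouldStore reports whether the page differs from the last stored
// version, along with the hash to record.
func shouldStore(latestHash, text string) (bool, string) {
	h := contentHash(text)
	return h != latestHash, h
}

func main() {
	_, first := shouldStore("", "Hello   world\n")
	changed, _ := shouldStore(first, "Hello world")
	fmt.Println(changed)
}
```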

✅ Code Quality & Maintenance (Completed)

🔴 Robots.txt Compliance Auditing

Track and audit robots.txt filtering decisions
- Add optional logging table for blocked URLs during job processing
- Record URL, path, matching disallow pattern, and job context
- Create admin endpoint to review filtering decisions
- Add metrics for blocked vs allowed URL ratios per domain
- Enable/disable audit logging per job for performance
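An audit record for a blocked URL could carry the fields listed above. The sketch below is illustrative only: the `blockedURL` struct and `matchDisallow` helper are hypothetical, and matching is simplified to plain path prefixes without the `*` and `$` wildcards that real robots.txt rules may use.

```go
package main

import (
	"fmt"
	"strings"
)

// blockedURL is a hypothetical audit row for one filtering decision.
type blockedURL struct {
	JobID   string
	URL     string
	Path    string
	Pattern string // the disallow rule that matched
}

// matchDisallow returns the first disallow prefix that matches path.
func matchDisallow(path string, disallow []string) (string, bool) {
	for _, p := range disallow {
		if p != "" && strings.HasPrefix(path, p) {
			return p, true
		}
	}
	return "", false
}

func main() {
	rules := []string{"/admin/", "/cart"}
	path := "/admin/users"
	if pattern, blocked := matchDisallow(path, rules); blocked {
		rec := blockedURL{JobID: "job-1", URL: "https://example.com" + path, Path: path, Pattern: pattern}
		fmt.Printf("%+v\n", rec)
	}
}
```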
