- Initialise GitHub repository
- Set up branch protection
- Resolve naming issues and override branch protection for admins
- Create dev/prod branches
- Set up local development environment
- Add initial documentation
- Initialise Go project
- Set up dependency management
- Create project structure
- Add basic configs
- Set up testing framework
- Set up dev/prod environments
- Configure environment variables
- Set up secrets management
- Create Dockerfile and container setup
- Configure Fly.io
- Set up Fly.io account and project
- Configure deployment settings
- Set up environment variables in Fly.io
- Create deployment workflow
- Add health check endpoint monitoring
- Test production deployment
- Initial Sentry.io connection
- Initialise Go project structure and dependencies
- Set up basic API endpoints
- Set up environment variables and configs
- Implement basic health checks and monitoring
- Add basic error monitoring with Sentry
- Set up endpoint performance tracking
- Add graceful shutdown handling
- Implement configuration validation
- Set up Colly crawler configuration
- Implement concurrent crawling logic
- Add basic error handling
- Add rate limiting (fixed client IP detection)
- Add retry logic
- Handle different response types/errors
- Implement cache validation checks
- Add crawler-specific error tracking
- Set up crawler performance monitoring
- Design database schema
- Set up Turso connection and config
- Implement data models and queries
- Add basic error handling
- Add retry logic
- Add database performance monitoring
- Set up query error tracking
- Design job and task data structures
- Implement persistent job storage in database
- Create worker pool for concurrent URL processing
- Add job management API (create, start, cancel, status)
- Implement database retry logic for job operations to handle transient errors
- Enhance error reporting and monitoring
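As an illustration of the worker pool item above, here is a small sketch of a fixed-size pool that processes URLs concurrently. The `RunPool` function and `Result` type are hypothetical names, and the real implementation reads jobs from persistent storage rather than a slice.

```go
package main

import "sync"

// Result pairs a URL with the outcome of processing it.
type Result struct {
	URL string
	Err error
}

// RunPool processes urls with a fixed number of workers and returns one
// Result per URL. Result order is not guaranteed.
func RunPool(urls []string, workers int, process func(string) error) []Result {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan Result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				results <- Result{URL: u, Err: process(u)}
			}
		}()
	}

	// Feed the queue, then signal that no more work is coming.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()

	// Close results once every worker has drained the queue.
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make([]Result, 0, len(urls))
	for r := range results {
		out = append(out, r)
	}
	return out
}
```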
- Implement sitemap.xml parser
- Add URL filtering based on path patterns
- Handle sitemap index files
- Process multiple sitemaps
- Implement robust URL normalisation in sitemap processing
- Add improved error handling for malformed URLs
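The sitemap items above can be sketched with the standard `encoding/xml` package. This is an assumption-laden illustration: `ParseSitemap`, `sitemapDoc`, and `locEntry` are hypothetical names, and one struct is used for both `<urlset>` and `<sitemapindex>` documents for brevity.

```go
package main

import (
	"encoding/xml"
	"strings"
)

// locEntry captures the <loc> child of a <url> or <sitemap> element.
type locEntry struct {
	Loc string `xml:"loc"`
}

// sitemapDoc accepts either a <urlset> or a <sitemapindex> root element.
type sitemapDoc struct {
	URLs     []locEntry `xml:"url"`
	Sitemaps []locEntry `xml:"sitemap"`
}

// ParseSitemap returns the page URLs and any nested sitemap URLs found in data.
func ParseSitemap(data []byte) (pages, children []string, err error) {
	var doc sitemapDoc
	if err = xml.Unmarshal(data, &doc); err != nil {
		return nil, nil, err
	}
	for _, u := range doc.URLs {
		if loc := strings.TrimSpace(u.Loc); loc != "" {
			pages = append(pages, loc)
		}
	}
	for _, s := range doc.Sitemaps {
		if loc := strings.TrimSpace(s.Loc); loc != "" {
			children = append(children, loc)
		}
	}
	return pages, children, nil
}
```

A caller handling a sitemap index would fetch each entry in `children` and call `ParseSitemap` again, which covers the "process multiple sitemaps" item.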
- Extract links from crawled pages
- Filter links to stay within target domain
- Basic link discovery logic
- Queue discovered links for processing
- Create job endpoints (create/list/get/cancel)
- Add progress calculation and reporting
- Store recent crawled pages in job history
- Implement multi-domain support
- Set up production environment on Fly.io
- Deploy and test rate limiting in production
- Configure auto-scaling rules
- Set up production logging
- Implement monitoring alerts
- Configure backup strategies (Supabase handles automatically)
- Implement caching layer
- Optimise database queries
- Configure rate limiting with proper client IP detection
- Add performance monitoring
- Decided to switch to PostgreSQL at this point
- Set up PostgreSQL on Fly.io
- Create database instance
- Configure connection settings
- Configure security settings
- Implement PostgreSQL schema
- Convert SQLite schema to PostgreSQL syntax
- Add proper indexes
- Implement connection pooling
- Replace database access layer
- Update db package to use PostgreSQL
- Add health checks and monitoring
- Implement efficient error handling
- Implement PostgreSQL-based task queue
- Use row-level locking with SELECT FOR UPDATE SKIP LOCKED
- Optimise for concurrent access
- Plan task prioritisation implementation (docs created)
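For the row-level locking item above, a single-statement claim query is one common way to use `SELECT ... FOR UPDATE SKIP LOCKED`. The sketch below is illustrative only: the `tasks` columns (`status`, `started_at`, `created_at`, `url`) and the `ClaimNextTask` function are assumptions, not the project's schema or API.

```go
package main

import (
	"context"
	"database/sql"
	"errors"
)

// claimTaskSQL picks the oldest pending task that no other worker has locked,
// marks it running, and returns it, all in a single statement.
// Table and column names are assumed for illustration.
const claimTaskSQL = `
UPDATE tasks
SET status = 'running', started_at = NOW()
WHERE id = (
    SELECT id FROM tasks
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, url`

// ClaimNextTask returns ok=false when the queue has no claimable task.
func ClaimNextTask(ctx context.Context, db *sql.DB) (id, url string, ok bool, err error) {
	err = db.QueryRowContext(ctx, claimTaskSQL).Scan(&id, &url)
	if errors.Is(err, sql.ErrNoRows) {
		return "", "", false, nil
	}
	if err != nil {
		return "", "", false, err
	}
	return id, url, true, nil
}
```

Because locked rows are skipped rather than waited on, concurrent workers each receive a different task instead of blocking behind one another.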
- Redesign worker pool
- Create single global worker pool
- Implement optimised task acquisition
- Enhanced sitemap processing
- Implement robust URL normalisation
- Add support for relative URLs in sitemaps
- Improve error handling for malformed URLs
- Improve URL validation
- Better handling of URL variations
- Consistent URL formatting throughout the codebase
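One possible shape for the URL normalisation described above, using only `net/url`. The specific rules here (lowercase host, drop fragment, trim trailing slash except on the root) are assumptions for illustration, and `NormaliseURL` is a hypothetical name.

```go
package main

import (
	"net/url"
	"strings"
)

// NormaliseURL resolves raw against base (so relative URLs work), lowercases
// the scheme and host, drops the fragment, and trims a trailing slash from
// every path except the root.
func NormaliseURL(base, raw string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(strings.TrimSpace(raw))
	if err != nil {
		return "", err
	}
	u := b.ResolveReference(ref)
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)
	u.Fragment = ""
	u.RawFragment = ""
	if u.Path == "" {
		u.Path = "/"
	}
	if len(u.Path) > 1 {
		u.Path = strings.TrimSuffix(u.Path, "/")
	}
	return u.String(), nil
}
```

Running every discovered or sitemap-provided URL through one function like this is what keeps formatting consistent across the codebase.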
- Eliminate duplicate code
- Move database operations to a unified interface
- Consolidate similar functions into single implementations
- Move functions to appropriate packages
- Remove global state
- Implement proper dependency injection
- Replace global DB instance with passed parameters
- Improve transaction management with DbQueue
- Standardise naming conventions
- Use consistent function names across packages
- Clarify responsibilities between packages
- Remove redundant worker pool creation
- Eliminate duplicate worker pools in API handlers
- Ensure single global worker pool is used consistently
- Simplify middleware stack
- Reduce excessive transaction monitoring
- Optimise Sentry integrations
- Remove unnecessary wrapping functions
- Clean up API endpoints
- Document endpoints to consolidate or remove
- Plan endpoint implementation simplification
- Standardise error handling approach
- Implementation plan completed in docs/plans/api-cleanup.md
- Fix metrics collection (plan created)
- Document metrics to expose
- Plan for unused metrics tracking removal
- Identify relevant PostgreSQL metrics to add
- Remove depth functionality
- Remove `depth` column from `tasks` table
- Remove `max_depth` column from `jobs` table
- Update `EnqueueURLs` function to remove depth parameter
- Update type definitions to remove depth fields
- Remove depth-related logic from link discovery process
- Update documentation to remove depth references
- Update core endpoints to use new implementation
- Remove SQLite-specific code
- Clean up dependencies and imports
- Update configuration and documentation
- Configure Supabase Auth settings
- Implement JWT validation middleware in Go
- Add social login providers configuration (Google, Facebook, Slack, GitHub, Microsoft, Figma, LinkedIn + Email)
- Set up user session handling and token validation
- Implement comprehensive auth error handling
- Create user registration with auto-organisation creation
- Configure custom domain authentication (hover.auth.goodnative.co)
- Implement account linking for multiple auth providers per user (handled by Supabase Auth via auth.identities table)
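The middleware item above might be structured along these lines. This sketch deliberately leaves out the JWT verification itself: `TokenValidator` is a hypothetical hook where signature and claim checks against the Supabase-issued token would go, and `RequireAuth` and `bearerToken` are illustrative names.

```go
package main

import (
	"net/http"
	"strings"
)

// TokenValidator is a hypothetical hook; a real one would verify the JWT
// signature, expiry, and audience.
type TokenValidator func(token string) error

// bearerToken extracts the token from an "Authorization: Bearer <token>" header.
func bearerToken(header string) (string, bool) {
	const prefix = "Bearer "
	if len(header) <= len(prefix) || !strings.EqualFold(header[:len(prefix)], prefix) {
		return "", false
	}
	token := strings.TrimSpace(header[len(prefix):])
	return token, token != ""
}

// RequireAuth rejects requests without a valid bearer token with 401.
func RequireAuth(validate TokenValidator, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		token, ok := bearerToken(r.Header.Get("Authorization"))
		if !ok || validate(token) != nil {
			http.Error(w, "unauthorised", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```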
- Design user data schema with Row Level Security
- Implement user profile storage
- Add user preferences handling
- Configure PostgreSQL policies for data access
- Create database operations for users and organisations
Organisation model implemented:
- Auto-create organisation when user signs up
- Create shared access to all jobs/tasks/reports within organisation
- Comprehensive RESTful API Infrastructure
- Standardised response format with request IDs and consistent error handling
- Interface-agnostic RESTful endpoints (`/v1/*` structure)
- Comprehensive middleware stack (CORS, logging, rate limiting)
- Proper HTTP status codes and structured error responses
- Multi-Interface Authentication Foundations
- JWT-based authentication with Supabase integration
- Authentication middleware for protected endpoints
- Dashboard Demonstration Infrastructure
- Working vanilla JavaScript dashboard with modern UI design
- API integration for job statistics and progress tracking (`/v1/dashboard/stats`, `/v1/jobs`)
- Stable production deployment without Web Components dependencies
- Responsive design with professional styling and user experience
- Template + Data Binding Foundation
- Architecture documentation for template-based integration approach
- Attribute-based event handling system (`gnh-action`, `gnh-data-*`)
- Event delegation framework for extensible functionality
- Demonstration of template approach in production dashboard
- Core Data Binding Library
- Basic attribute-based event handling (`gnh-action="refresh-dashboard"`)
- JavaScript library for `data-gnh-bind` attribute processing
- Template engine for `data-gnh-template` repeated content
- Authentication integration with conditional element display (`data-gnh-auth`)
- Form handling with `data-gnh-form` and validation (`data-gnh-validate`)
- Style and attribute binding (`data-gnh-bind-style`, `data-gnh-bind-attr`)
- Enhanced Job Management
- Real-time job progress updates via data binding
- Job creation forms with template-based validation
- Error handling and user feedback systems
- Advanced filtering and search capabilities
- User Experience Features
- Account settings and profile management templates
- Notification system integration
- Stop duplicate domain crawls running concurrently, close old job
- When creating a job, check if there's an active job for this user
- If so, cancel the old job
- Task Prioritisation
- Prioritisation by page hierarchy and importance
- Implement link priority ordering for header links (1st: 1.000, 2nd: 0.990, etc.)
- Apply priority ordering logic to all discovered page links
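The header link ordering above (1st: 1.000, 2nd: 0.990, etc.) can be expressed as a small function. This sketch assumes the sequence keeps falling by 0.010 per position and bottoms out at zero; the roadmap only gives the first two values, so the rest of the curve and the `LinkPriority` name are assumptions.

```go
package main

// LinkPriority returns the priority for the link at the given zero-based
// position: 1.000 for the first link, 0.990 for the second, and so on.
// Integer thousandths are used so the step does not accumulate float error.
func LinkPriority(position int) float64 {
	if position < 0 {
		position = 0
	}
	thousandths := 1000 - 10*position
	if thousandths < 0 {
		thousandths = 0 // assumed floor once the sequence runs out
	}
	return float64(thousandths) / 1000
}
```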
- Robots.txt Compliance
- Parse and honour robots.txt crawl-delay directives
- Filter URLs against Disallow/Allow patterns before enqueueing
- Cache robots.txt rules at job level to prevent repeated fetches
- Fail manual URL creation if robots.txt cannot be checked
- Filter dynamically discovered links against robots rules
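A simplified view of the Disallow/Allow filtering above: the longest matching rule wins, and Allow wins a tie. This sketch only handles plain path prefixes and ignores the `*` and `$` wildcards, so it is a starting point rather than a full matcher. `RobotsRules` and `Allowed` are hypothetical names.

```go
package main

import "strings"

// RobotsRules holds the Allow and Disallow path prefixes for one user agent.
type RobotsRules struct {
	Allow    []string
	Disallow []string
}

// longestPrefix returns the length of the longest rule that prefixes path,
// or -1 when no rule matches. Empty rules are ignored.
func longestPrefix(rules []string, path string) int {
	best := -1
	for _, p := range rules {
		if p != "" && strings.HasPrefix(path, p) && len(p) > best {
			best = len(p)
		}
	}
	return best
}

// Allowed reports whether path may be crawled. The most specific (longest)
// matching rule wins; Allow wins ties; no matching rule means allowed.
func (r RobotsRules) Allowed(path string) bool {
	return longestPrefix(r.Allow, path) >= longestPrefix(r.Disallow, path)
}
```

The same check can be applied both before enqueueing sitemap URLs and when filtering dynamically discovered links, with the parsed rules cached per job.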
- URL Processing Enhancements
- Filter out links that are hidden via inline `style` attributes
- Remove anchor links from link discovery
- Support compressed sitemaps (.xml.gz and other formats)
- If sitemap can't be found, set up job with / page and start as normal, finding links through pages
- Only store source_url if page was found ON a page and redirect_url if it's a redirect AND it doesn't match the domain/path of the task
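As a sketch of the hidden-link filter above, the function below inspects an inline `style` attribute value for declarations that hide an element. Which properties the crawler actually checks is not stated here, so treating `display: none` and `visibility: hidden` as the hiding rules is an assumption, as is the `HiddenByInlineStyle` name.

```go
package main

import "strings"

// HiddenByInlineStyle reports whether a style attribute value contains
// "display: none" or "visibility: hidden" (case-insensitive, with or
// without !important).
func HiddenByInlineStyle(style string) bool {
	for _, decl := range strings.Split(style, ";") {
		prop, val, ok := strings.Cut(decl, ":")
		if !ok {
			continue
		}
		prop = strings.ToLower(strings.TrimSpace(prop))
		val = strings.ToLower(strings.TrimSpace(val))
		val = strings.TrimSpace(strings.TrimSuffix(val, "!important"))
		if (prop == "display" && val == "none") || (prop == "visibility" && val == "hidden") {
			return true
		}
	}
	return false
}
```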
- Consider impact of the Go v1.25 release and plan updates
- Blocking Avoidance
- Series of tweaks to reduce blocking
- Scheduler System Implementation
- Database schema with schedulers table and scheduler_id foreign key
- Support for 6, 12, 24, and 48-hour intervals
- Background service polls for ready schedules every 30 seconds
- Jobs created from schedulers marked with source_type='scheduler'
- Scheduler management API endpoints (create, update, delete, list)
- Dashboard UI for managing schedules (enable/disable, view jobs, delete)
- Schedule dropdown in job creation modal for optional recurring schedules
- Comprehensive error handling with structured logging
- Input validation and rollback logic for failed operations
- Webflow OAuth Connection
- Register as Webflow developer and create App
- OAuth flow with HMAC-signed state for CSRF protection
- Token storage in Supabase Vault with automatic cleanup
- User identity display via `authorized_user:read` scope
- Dashboard UI showing connection status and username
- Shared OAuth utilities extracted from Slack integration
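The HMAC-signed state mentioned above could be built roughly as follows. This is a minimal sketch using HMAC-SHA256 from the standard library: the `payload.signature` layout and the `SignState` and `VerifyState` names are assumptions, and a production version would typically also embed an expiry in the payload.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// SignState appends a hex-encoded HMAC-SHA256 signature to payload.
func SignState(secret []byte, payload string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(payload))
	return payload + "." + hex.EncodeToString(mac.Sum(nil))
}

// VerifyState checks the signature on the OAuth callback's state value and
// returns the original payload when it is valid.
func VerifyState(secret []byte, state string) (string, bool) {
	i := strings.LastIndex(state, ".")
	if i < 0 {
		return "", false
	}
	payload, sig := state[:i], state[i+1:]
	got, err := hex.DecodeString(sig)
	if err != nil {
		return "", false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(payload))
	// hmac.Equal compares in constant time to avoid timing leaks.
	if !hmac.Equal(got, mac.Sum(nil)) {
		return "", false
	}
	return payload, true
}
```

Since only the server knows the secret, a forged or altered `state` parameter fails verification, which is what provides the CSRF protection.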
- Webflow Site Selection
- List user's accessible Webflow sites via `/v2/sites` endpoint
- Site picker UI in dashboard connections panel with search/pagination
- Per-site settings stored in `webflow_site_settings` table
- Connection management endpoints (list/get/delete)
- Manual Job Triggering (Completed v0.24.0)
- Jobs automatically triggered when schedule or auto-publish enabled
- Jobs can be triggered via scheduler or webhooks
- Show last crawl status (via general job list)
- Scheduling Configuration
- Connect Webflow sites to existing scheduler system
- Schedule dropdown for recurring cache warming (None/6h/12h/24h/48h)
- Per-site schedule management in dashboard
- Automatic scheduler creation/update/deletion based on interval selection
- Run on Publish (Webhooks)
- "Auto-crawl on publish" toggle in site configuration
- Register `site_publish` webhook with Webflow API (per-site control)
- Webhook endpoint to receive publish events (org-scoped and legacy token-based)
- Webhook signature verification (NOTE: Webflow v2 doesn't provide signatures yet)
- Trigger cache warming job on verified publish events with auto_publish validation
- Platform-org mapping for workspace-based webhook resolution
- Slack Application Development
- OAuth flow for installing GNH Slack app to workspaces
- Bot tokens stored securely in Supabase Vault
- Auto-linking users to Slack workspaces via database triggers
- Supabase Slack OIDC support for user authentication
- Notification Delivery
- Job completion notifications via Slack DMs
- Error notifications when jobs fail
- API endpoints for workspace management and user preferences
- OAuth Connection Setup (Steps 1-3)
- Google OAuth 2.0 configuration and credentials
- OAuth flow implementation with state token CSRF protection
- Account and property selection functionality
- Token storage in Supabase Vault with refresh logic
- Database schema for `user_ga_connections` table
- Dashboard UI for connecting/disconnecting GA4 properties
- Analytics Data Retrieval (Step 4)
- Implement GA4 Data API client (`analyticsdata/v1beta`)
- Fetch recent visitor/view data for each page path
- Query metric: `screenPageViews` only
- Support for 7, 28, and 180-day lookback periods
- Scheduled background sync service (opt-in per domain, no sync by default)
- Token refresh mechanism for expired access tokens
- Pages Table Integration (Step 5)
- Add analytics columns to `page_analytics` table:
  - `page_views_7d` - Page views (last 7 days)
  - `page_views_28d` - Page views (last 28 days)
  - `page_views_180d` - Page views (last 180 days)
  - `fetched_at` - Timestamp of last GA sync
- Atomic upsert logic to merge GA data with existing page records
- Task Prioritisation Enhancement (Step 6)
- Incorporate page view data into task priority calculation
- Prioritise high-traffic pages for earlier cache warming
- Automatically enabled when domain has linked GA account
- Data Export Integration (Step 7)
- Include page view metrics in CSV/JSON/Excel exports
- Add columns: Views (7d), Views (28d), Views (180d)
- Dashboard displays page view metrics alongside performance data
- GA
- Account settings / management (settings page operational — billing awaits Paddle in 5.2)
- Trigger immediate job when schedule or auto-publish enabled
- Extension Development
- Build Webflow Designer Extension using Designer Extension SDK
- Implement site health metrics display (broken links, slow pages)
- Add job management interface (view status, trigger crawls)
- Configuration panel for schedule and webhook settings
- Integration & Testing
- Connect extension to GNH API via OAuth
- Test extension in Webflow Designer workspace
- Handle error states and loading indicators
- Paddle Integration
- Set up Paddle account and configuration
- Implement subscription webhooks and payment flow
- Create subscription plans and checkout process
- Subscription Management
- Link subscriptions to organisations
- Handle subscription updates and plan changes
- Add subscription status checks
- Usage Tracking & Quotas
- Implement usage counters and basic limits
- Set up usage reporting functionality
- Implement organisation-level usage quotas
- Visual Design System
- Define colour palette, typography, spacing scales
- Create reusable CSS variables and utility classes
- Design logo and favicon assets
- Dashboard Redesign & Polish
- Ensure responsive layout is at the core of everything
- Optimise elements for dashboard vs. Webflow designer App
- Improve nav bar, settings & notifications layout
- Improve layout consistency and visual hierarchy
- Refine job cards, status indicators, and data tables
- Add loading states, empty states, and transitions
- Error States & Messaging
- Design clear error messages and recovery suggestions
- Improve validation feedback for forms
- Create consistent notification system
- Onboarding Flow
- Quick start flow - Crawl domain & create account
- Marketing page
- Webflow App + auth Webflow, set schedule, publish
- Welcome screen for new users - tick box/dismiss cards
- Quick start guide or tooltip tour
- Crawl domain, create a schedule
- Explain plans & update if required
- View results, export slow and error pages
- Integrate steps GA, Slack, Webflow
- Marketing Infrastructure
- Simple Webflow marketing page with product explanation
- Basic navigation structure and call-to-action
- Quick crawl & account creation
- User documentation and help resources
- Landing pages
- Cache warmer - make your site load faster
- Load speed - find slow pages
- Broken links - find the important ones
- Integrations - Slack, Webflow, Google Analytics
- Pricing page with subscription tiers
Full details in Webflow Marketplace
- Marketplace Preparation
- Complete Webflow App listing (description, screenshots, demo video)
- Prepare support documentation and setup guide
- Create terms of service and privacy policy
- Submission & Approval
- Submit app to Webflow marketplace for review
- Address feedback and make required changes
- Obtain marketplace approval
- Alpha Testing
- Internal testing with team members
- Beta testing with 3-5 friendly Webflow users
- Collect feedback and address critical issues
- Security & Compliance
- Final security audit of authentication flows
- Review RLS policies and data isolation
- Confirm GDPR/privacy compliance basics
- Responsive Design Cleanup
- Audit all pages/layouts at mobile (<480px), tablet (480-960px), and desktop (960px+) breakpoints
- Fix dashboard, settings, job details, and nav for small screens
- Test integration panels (Webflow sites grid, member lists, GA properties) at all breakpoints
- Ensure forms, modals, and toast notifications work on touch devices
- Soft Launch
- Make app available to first 10 users
- Monitor system performance and error rates
- Provide responsive support to early adopters
- Iterative Improvements
- Gather user feedback on critical issues
- Address bugs and usability problems
- Track key metrics (signup rate, job success, retention)
- WordPress Plugin Development
- Create basic WordPress plugin for Hover
- Plugin configuration interface for domain settings
- Display crawl results and statistics in WordPress admin
- Trigger manual crawls from WordPress dashboard
- WordPress.org Submission
- Prepare plugin listing and screenshots
- Submit plugin to WordPress plugin directory
- Address review feedback and obtain approval
- Shopify App Development
- OAuth integration with Shopify
- Embedded app interface for store owners
- Display site health metrics in Shopify admin
- Automatic crawl triggers on theme publish
- Shopify App Store Submission
- Complete app listing with demo and screenshots
- Submit to Shopify App Store for review
- Address feedback and obtain approval
- Slash commands (`/crawl sitedomain.com`)
- Threading with progress updates
- Interactive message actions
- Organisation-Based Data Model (Completed v0.19.0)
- Implement many-to-many user-organisation relationships
- Create organisation context switching logic
- Implement data isolation between organisations
- Add store/site entity linked to single organisation
- Platform Authentication Adapters
- Shopify OAuth and session management
- WordPress API key integration
- Map platform stores/sites to BB organisations
- Progressive account creation for platform users
- Unified User System
- Single BB user accessible via multiple platforms
- Platform context determines visible organisation
- Shadow accounts for store staff (auto-created on action)
- Account claiming and upgrade flows
- Core JavaScript SDK
- Extract data-binding system into standalone library
- Create platform-agnostic API client
- Implement organisation context management
- Add platform-specific authentication handlers
- Platform Adapters
- Shopify app bridge integration
- WordPress plugin integration helpers
- Platform-specific UI component adapters
- Event handling for platform contexts
- Real-time Features (See SUPABASE-REALTIME.md) - 60% COMPLETE
- Real-time notification badge updates via Postgres Changes subscription (v0.20.1)
- Real-time dashboard job list updates via WebSocket subscriptions (v0.20.1)
- Real-time job detail progress updates with per-job subscriptions (v0.20.1)
- Real-time dashboard stats without page refresh (requires API endpoint changes)
- Live presence indicators for multi-user organisations
- Database Optimisation
- Move CPU-intensive analytics queries to PostgreSQL functions
- Optimise task acquisition with database-side logic
- Enhance Row Level Security policies for multi-tenant usage
- Consolidate database connection settings into single configuration location and make them configurable via environment variables (internal/db/db.go:113-115)
- Backend Simplification via Supabase (See supabase-simplification.md)
- Phase 1: Migrate stuck job cleanup to pg_cron
  - Create `run_job_cleanup()` PostgreSQL function
  - Schedule with `cron.schedule('job-cleanup', '* * * * *', ...)`
  - Remove `CleanupStuckJobs()` from Go worker monitors (~100 lines)
- Phase 2: Migrate notification delivery to Edge Functions
  - Create `deliver-notification` Edge Function
  - Update `notify_job_status_change()` trigger to call via pg_net
  - Remove Go notification listener and Slack delivery code (~451 lines)
  - Remove `slack-go/slack` dependency
- File Storage & Edge Functions
- Store crawler logs, sitemap caches, and error reports in Supabase Storage
- Create Edge Functions for webhook handling and scheduled tasks
- Handle Webflow publish events via Edge Functions
- Add managed Postgres proxy in front of edge/serverless workloads to shield the primary pool
- API Client Libraries
- Enhance core JavaScript client with advanced authentication
- Create interface-specific adapters
- Document API with OpenAPI specification
- Webhook System
- Implement webhook subscription for `site_publish` events
- Verify webhook signatures using `x-webflow-signature` headers
- Create webhook system for job completion notifications
- API Key Management
- Create API key system for integrations
- Implement scoped permissions for different interfaces
- 1Password Secrets Management - Implementation Plan
- Set up 1Password vault structure for Hover
- Configure flyctl shell plugin for local development
- Implement 1Password Service Account for GitHub Actions CI/CD
- Migrate secrets from GitHub Secrets to 1Password
- Database Management
- Set up backup schedule and automated recovery testing
- Implement data retention policies
- Create comprehensive database health monitoring
- Implement burst-protected connection classes (separate Supabase roles/DSNs for batch vs interactive traffic)
- Introduce read replica routing with lag monitoring and primary fallbacks
- Add tenant-level pool quotas with schema/role isolation to enforce fairness
- Scheduling & Automation
- Create configuration UI for scheduling options (completed v0.18.0)
- Implement recurring job scheduler for 6/12/24/48 hour intervals (completed v0.18.0)
- Background service checks for ready schedules every 30 seconds (completed v0.18.0)
- Automatic cache warming based on Webflow publish events
- Monitoring & Reporting
- Fix completion percentage to reflect actual completed vs skipped tasks (not always 100%) (internal/db/db.go:404)
- Publish OTEL metrics for connection pool saturation and wire Grafana alerts
- Incident runbook and escalation checklist
- Minimal status page for alpha
- Frontend Architecture Consideration
- Evaluate Vue/Svelte framework migration if dashboard exceeds 8000 LOC or team scaling requires modern framework (current: 4000 LOC vanilla JS with custom data binding, no build process - consider migration only if actual pain points emerge)
- Core app functionality
- Path inclusion/exclusion rules
- Domain blocklist/allowlist for crawler (prevent crawling specific domains)
- Enhanced Authentication
- Test and refine multi-provider account linking
- Member invitation system for organisations
- Audit & Security Features
- Secure admin endpoints properly with system_role authentication (internal/api/admin.go:11,25)
- GDPR compliance features (data export, deletion audit trails)
- Marketing Infrastructure
- Simple Webflow marketing page with product explanation
- Basic navigation structure and call-to-action
- User documentation and help resources
- Launch Preparation
- Complete marketplace submission process
- Set up support channels and user onboarding
- Implement usage analytics and tracking
- Implement two-tier data storage strategy
- Use Supabase Storage for "hot" data (recent logs, debug files)
- Implement Cloudflare R2 for "cold" storage of historical HTML page captures
- Create automated Go job to handle data lifecycle (e.g., move files > 30 days to R2)
- Update database to track storage location (hot/cold) for each archived file
- Retention policy for alpha
- Auto-delete crawler logs and stored HTML older than 90 days
- Implement Semantic Hashing for change detection - Implementation Plan
- Add `content_hash` and `html_storage_path` columns to `tasks` table
- Add `latest_content_hash` column to `pages` table
- Implement HTML parsing and canonical content extraction in Go worker
- Store HTML in Supabase Storage only when semantic hash changes
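The change-detection idea above comes down to hashing a canonical form of the page content and storing HTML only when that hash differs from the last one recorded. The sketch below assumes the text has already been extracted from the HTML and uses whitespace collapsing as the only canonicalisation step; `ContentHash` and `ShouldStore` are hypothetical names and the real extraction rules are not specified here.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// ContentHash returns a SHA-256 hex digest of text with runs of whitespace
// collapsed, so formatting-only changes do not alter the hash.
func ContentHash(text string) string {
	canonical := strings.Join(strings.Fields(text), " ")
	sum := sha256.Sum256([]byte(canonical))
	return hex.EncodeToString(sum[:])
}

// ShouldStore returns the new hash and whether it differs from latestHash,
// i.e. whether the HTML needs to be written to storage.
func ShouldStore(latestHash, text string) (string, bool) {
	h := ContentHash(text)
	return h, h != latestHash
}
```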
- Increase Test Coverage - Implementation Plan
- Set up Supabase test branch database infrastructure
- Add testify testing framework
- Create simplified test plan (Phase 1: 80-115 lines)
- Implement Phase 1 tests (GetJob, CreateJob, CancelJob, ProcessSitemapFallback)
- Implement integration tests (EnqueueJobURLs)
- Implement unit tests with mocks (CrawlerInterface refactoring)
- Enable Codecov reporting and Test Analytics
- Set up CI/CD with Supabase pooler URLs for IPv4 compatibility
- Fix test environment loading to use .env.test file
- Reorganise testing documentation into modular structure
- Fix critical test issues from expert review (P0/P1 priorities)
- Implement sqlmock tests for database operations
- Create comprehensive mock infrastructure (MockDB, DSN helpers)
- Implement Comprehensive API Testing - ✅ COMPLETED
- Code Quality Improvement - core quality gates now enforced in CI
- Phase 1: Automated formatting and ineffectual assignments cleanup
- Phase 2: Refactor high-complexity functions (processTask, processNextTask completed)
- Add golangci-lint to CI/CD pipeline with Go 1.25 compatibility
- Improve Go Report Card score from C to A
- Track and audit robots.txt filtering decisions
- Add optional logging table for blocked URLs during job processing
- Record URL, path, matching disallow pattern, and job context
- Create admin endpoint to review filtering decisions
- Add metrics for blocked vs allowed URL ratios per domain
- Enable/disable audit logging per job for performance