As AI companies increasingly rely on large-scale web scraping to train their models, traditional mechanisms for controlling data access (such as robots.txt) are no longer reliable. Many modern crawlers ignore or evade these rules, leaving website owners without meaningful ways to protect their content or understand how it is being used. BotBouncer addresses this gap by introducing an active, user-controlled system for managing and monitoring bot traffic. Our project lets website owners generate custom access policies through a simple interface, then enforces these policies at the network edge by blocking unauthorized bots before they reach the site. We also provide a real-time analytics pipeline that reveals which bots visited, what they attempted to access, and whether they complied with the rules. Together, these tools restore data agency to users and offer a clearer picture of how automated agents interact with the web, making BotBouncer a practical and human-centered step toward modern data governance.
- A platform for managing, previewing, and publishing robots.txt rules, with analytics for observing how bots interact with them.
- Intended for testing and experimenting with bot blocking, rule composition, and how different user-agents are affected by a robots.txt configuration.
- Loads a live robots.txt (or a custom set of rules) and parses rules into a simple UI.
- Lets you add / remove rules, preview which paths are disallowed for specific user-agents, and publish rule changes to the configured publish endpoint.
- Polished UI built with Vite + React + TypeScript and Tailwind CSS for styling.
- Frontend is client-only; real publishing and analytics functionality is handled by external APIs hosted behind AWS API Gateway.
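The parse-and-preview behaviour described above can be sketched as follows. This is a minimal illustration, not the code in frontend/src/App.tsx: the names `RuleGroup`, `parseRobotsTxt`, and `isDisallowed` are hypothetical, and matching is simplified to plain path prefixes (no `*` or `$` wildcards), with the longest matching rule winning and Allow winning ties.

```typescript
// Hypothetical types and helpers for illustration; not taken from the BotBouncer source.
interface RuleGroup {
  userAgents: string[]; // lowercased product tokens, or "*"
  disallow: string[];
  allow: string[];
}

// Parse robots.txt text into groups of user-agents and their path rules.
function parseRobotsTxt(text: string): RuleGroup[] {
  const groups: RuleGroup[] = [];
  let current: RuleGroup | null = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // strip comments
    const idx = line.indexOf(":");
    if (!line || idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise start a new one.
      if (!current || !lastWasAgent) {
        current = { userAgents: [], disallow: [], allow: [] };
        groups.push(current);
      }
      if (value) current.userAgents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (current && value && (field === "disallow" || field === "allow")) {
        current[field].push(value);
      }
      lastWasAgent = false;
    }
  }
  return groups;
}

// Preview: is `path` disallowed for `userAgent` under these groups?
// Simplified prefix matching only; an assumption of this sketch.
function isDisallowed(groups: RuleGroup[], userAgent: string, path: string): boolean {
  const ua = userAgent.toLowerCase();
  const specific = groups.filter((g) =>
    g.userAgents.some((a) => a !== "*" && ua.includes(a))
  );
  // Fall back to the wildcard group only when no specific group matches.
  const applicable =
    specific.length > 0 ? specific : groups.filter((g) => g.userAgents.includes("*"));
  let bestAllow = -1;
  let bestDisallow = -1;
  for (const g of applicable) {
    for (const p of g.allow) {
      if (path.startsWith(p)) bestAllow = Math.max(bestAllow, p.length);
    }
    for (const p of g.disallow) {
      if (path.startsWith(p)) bestDisallow = Math.max(bestDisallow, p.length);
    }
  }
  return bestDisallow > bestAllow;
}
```

A preview for a given user-agent would then call `isDisallowed(parseRobotsTxt(text), "GPTBot/1.0", "/blog")` for each path of interest.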
Where to look
- Main app logic: frontend/src/App.tsx
- Styling: frontend/src/index.css (Tailwind)
- Example env template: frontend/.env.example
Quick start (frontend)
- Install deps: cd frontend && npm install
- Provide env values by copying frontend/.env.example to a real env file such as frontend/.env.local (Vite only loads real env files; .env.example is a template).
- Run dev server: npm run dev
- Open the app (Vite default: http://localhost:5173)
Environment variables (keys only)
- VITE_LIVE_ROBOTS_URL
- VITE_ANALYTICS_API_URL
- VITE_PUBLISH_API_URL
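One way the frontend might validate these keys at startup is sketched below. The `AppConfig` shape and the `readConfig` helper are hypothetical and not part of the repository; only the three key names come from the list above.

```typescript
// Hypothetical config shape and helper; only the VITE_* key names come from this README.
interface AppConfig {
  liveRobotsUrl: string;
  analyticsApiUrl: string;
  publishApiUrl: string;
}

const REQUIRED_KEYS = [
  "VITE_LIVE_ROBOTS_URL",
  "VITE_ANALYTICS_API_URL",
  "VITE_PUBLISH_API_URL",
] as const;

// Fail fast with a readable message if any required key is missing or empty.
function readConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing env values: ${missing.join(", ")}`);
  }
  return {
    liveRobotsUrl: env.VITE_LIVE_ROBOTS_URL as string,
    analyticsApiUrl: env.VITE_ANALYTICS_API_URL as string,
    publishApiUrl: env.VITE_PUBLISH_API_URL as string,
  };
}
```

Inside a Vite app this would typically be called as `readConfig(import.meta.env)`, since Vite exposes variables prefixed with `VITE_` on that object.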
Below is the system architecture for the project.
CloudFront → S3 logs → SQS → Lambda → DynamoDB; frontend reads robots.txt and calls analytics/publish endpoints (API Gateway).
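As a rough illustration of the Lambda step in this pipeline, the sketch below turns the text of a log file into per-request records that could then be written to DynamoDB. It assumes CloudFront standard access logs (tab-separated rows preceded by a `#Fields:` header, with percent-encoded user-agent values) that have already been fetched from S3 and decompressed. The `BotVisit` shape and function names are hypothetical, and the SQS and DynamoDB wiring is deliberately left out.

```typescript
// Hypothetical record shape; the actual DynamoDB schema is not described in this README.
interface BotVisit {
  timestamp: string;
  userAgent: string;
  path: string;
  status: number;
}

// Decode percent-encoded values, falling back to the raw text if decoding fails.
function safeDecode(value: string): string {
  try {
    return decodeURIComponent(value);
  } catch {
    return value;
  }
}

// Parse log text using the "#Fields:" header to locate columns by name,
// so the sketch does not hardcode column positions.
function parseCloudFrontLog(content: string): BotVisit[] {
  let fields: string[] = [];
  const visits: BotVisit[] = [];
  for (const line of content.split(/\r?\n/)) {
    if (line.startsWith("#Fields:")) {
      fields = line.slice("#Fields:".length).trim().split(/\s+/);
      continue;
    }
    // Skip blanks, other comment lines, and rows seen before any header.
    if (!line || line.startsWith("#") || fields.length === 0) continue;
    const cols = line.split("\t");
    const get = (name: string): string => cols[fields.indexOf(name)] ?? "";
    visits.push({
      timestamp: `${get("date")}T${get("time")}Z`,
      userAgent: safeDecode(get("cs(User-Agent)")),
      path: get("cs-uri-stem"),
      status: Number(get("sc-status")),
    });
  }
  return visits;
}
```

Each resulting `BotVisit` carries the user-agent and requested path, which is the information needed to report which bots visited and what they attempted to access.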
