🌐 Read in: English | Português
Farol transforms technical public contract documents from São Paulo's government into accessible, analyzable information using AI-powered analysis, automated risk detection, and full-text search.
Public contracts in Brazil involve billions of taxpayer reais annually, but accessing and understanding this data is difficult for citizens. Contract documents are technical, scattered, and lack contextual analysis.
Farol's mission: Bridge the transparency gap by making public procurement data accessible, searchable, and understandable through AI-powered analysis and anomaly detection.
Automatic generation of plain-language summaries from complex contract documents using LLMs (OpenAI/Anthropic).
Automated risk scoring based on 8 criteria:
- Price outliers
- Amendment frequency
- Single-bid contracts
- Emergency procurement
- Supplier risk flags
- Execution delays
- Value concentration
- Historical patterns
Fast, PostgreSQL-backed search across all contracts, suppliers, and agencies with relevance ranking.
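As a rough illustration of how PostgreSQL-backed search with relevance ranking can work, the sketch below builds a parameterized query around PostgreSQL's built-in `websearch_to_tsquery` and `ts_rank` functions. The `contracts` table, the `search_vector` column, and the `buildContractSearch` helper are hypothetical names used for this example only; Farol's actual schema and query may differ.

```typescript
// Minimal sketch, assuming a `contracts` table with a precomputed
// tsvector column named `search_vector` (both names are hypothetical).
interface SearchQuery {
  text: string;
  values: [string, number, number];
}

function buildContractSearch(term: string, page = 1, limit = 20): SearchQuery {
  // Convert a 1-based page number into a row offset.
  const offset = (Math.max(page, 1) - 1) * limit;
  const text = `
    SELECT id, title,
           ts_rank(search_vector, websearch_to_tsquery('portuguese', $1)) AS rank
    FROM contracts
    WHERE search_vector @@ websearch_to_tsquery('portuguese', $1)
    ORDER BY rank DESC
    LIMIT $2 OFFSET $3`;
  // The search term is passed as a bound parameter, never interpolated.
  return { text, values: [term.trim(), limit, offset] };
}
```

The returned `text` and `values` can be handed to any PostgreSQL client that accepts parameterized queries.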
Visualize spending trends, top suppliers, agency activity, and risk distributions.
Open API for programmatic access to all contract data, summaries, and analytics.
- Node.js >= 20
- pnpm >= 9
- PostgreSQL >= 14
- Docker (optional, for local DB)
- Clone the repository

      git clone https://github.com/luansievers/farol.git
      cd farol

- Install dependencies

      pnpm install

- Configure environment

      cp packages/api/.env.example packages/api/.env
      # Edit packages/api/.env with your settings

  Required environment variables:

  - `DATABASE_URL` - PostgreSQL connection string
  - `AI_PROVIDER` - "openai" or "anthropic"
  - `OPENAI_API_KEY` or `ANTHROPIC_API_KEY`
  - `STORAGE_*` - S3/MinIO configuration

- Set up the database

      pnpm db:migrate   # Run migrations
      pnpm db:seed      # Seed initial data (optional)

- Start development servers

      pnpm dev:all      # API (port 3000) + Web (port 5173)
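Before starting the servers, it can help to check that the required environment variables are present. The sketch below is one way to do that in plain TypeScript. The `findMissingEnv` helper is hypothetical and not part of Farol, and the specific `STORAGE_*` names are taken from the deployment example later in this README.

```typescript
type Env = Record<string, string | undefined>;

// STORAGE_* names as listed in the deployment environment example.
const STORAGE_KEYS = [
  "STORAGE_ENDPOINT",
  "STORAGE_BUCKET",
  "STORAGE_ACCESS_KEY",
  "STORAGE_SECRET_KEY",
];

// Returns the names of required variables that are missing or invalid.
function findMissingEnv(env: Env): string[] {
  const missing: string[] = [];
  if (!env.DATABASE_URL) missing.push("DATABASE_URL");

  const provider = env.AI_PROVIDER;
  if (provider !== "openai" && provider !== "anthropic") {
    missing.push("AI_PROVIDER");
  } else {
    // Only the key for the selected provider is required.
    const keyName = provider === "openai" ? "OPENAI_API_KEY" : "ANTHROPIC_API_KEY";
    if (!env[keyName]) missing.push(keyName);
  }

  for (const key of STORAGE_KEYS) {
    if (!env[key]) missing.push(key);
  }
  return missing;
}
```

Calling `findMissingEnv(process.env)` at startup and exiting when the result is non-empty gives a clear error instead of a failure deep inside the pipeline.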
docker-compose up -d # Starts PostgreSQL + MinIO
pnpm install
pnpm db:migrate
pnpm dev:all

┌────────────────────────┐
│     Public Web UI      │
│     (React + Vite)     │  ← User-facing interface
└──────────┬─────────────┘
           │
┌──────────▼─────────────┐
│    REST API (Hono)     │  ← /api/contracts, /api/search
│     Zod + OpenAPI      │    /api/suppliers, /api/agencies
└──────────┬─────────────┘
           │
┌──────────▼─────────────┐
│  Data Pipeline (ETL)   │  ← crawler → detail → parser
│    8-stage workflow    │    → summary → classify → anomaly
└──────────┬─────────────┘
           │
    ┌──────┼──────┐
    │      │      │
┌───▼──┐ ┌─▼──┐  ┌▼───┐
│  DB  │ │ S3 │  │LLM │
│Prisma│ │/MIN│  │APIs│  ← PostgreSQL, MinIO, OpenAI/Anthropic
└──────┘ └────┘  └────┘

packages/
├── api/                        # Hono backend + Prisma ORM
│   ├── src/
│   │   ├── modules/            # Feature modules
│   │   │   ├── api/            # REST endpoints
│   │   │   ├── crawler/        # PNCP data fetching
│   │   │   ├── parser/         # PDF text extraction
│   │   │   ├── summary/        # AI summarization
│   │   │   ├── classification/ # Categorization
│   │   │   ├── anomalies/      # Risk scoring
│   │   │   ├── database/       # Prisma client
│   │   │   ├── storage/        # S3/MinIO
│   │   │   └── ai/             # LLM utilities
│   │   └── generated/          # Prisma types
│   └── prisma/
│       └── schema.prisma
├── web/                        # React + TanStack Router/Query
│   └── src/
│       ├── routes/             # File-based routing
│       ├── components/         # UI components (shadcn/ui)
│       ├── hooks/              # TanStack Query hooks
│       └── lib/                # Utilities
└── shared/                     # Shared TypeScript types
    └── src/
        ├── dtos/               # Data Transfer Objects
        └── enums/              # Shared enums

| Endpoint | Method | Description |
|---|---|---|
| `/api/contracts` | GET | List contracts with pagination/filters |
| `/api/contracts/:id` | GET | Get contract details + amendments |
| `/api/contracts/search` | GET | Full-text search |
| `/api/suppliers` | GET | List suppliers with stats |
| `/api/suppliers/:id` | GET | Supplier profile + contracts |
| `/api/agencies` | GET | List government agencies |
| `/api/agencies/:id` | GET | Agency profile + contracts |
| `/api/stats` | GET | Platform-wide statistics |

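For programmatic access, a small helper can assemble the query string for the contracts listing endpoint. This is a minimal sketch: `buildContractsUrl` and `ContractListParams` are hypothetical names, and the only parameters assumed are the `page`, `limit`, and `status` values that appear in this README's example request.

```typescript
interface ContractListParams {
  page?: number;
  limit?: number;
  status?: string;
}

// Builds the full URL for GET /api/contracts, skipping undefined parameters.
function buildContractsUrl(baseUrl: string, params: ContractListParams = {}): string {
  const url = new URL("/api/contracts", baseUrl);
  for (const [key, value] of Object.entries(params)) {
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}
```

For example, `buildContractsUrl("http://localhost:3000", { page: 1, limit: 20, status: "active" })` returns `http://localhost:3000/api/contracts?page=1&limit=20&status=active`, which can be passed straight to `fetch`.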
curl "http://localhost:3000/api/contracts?page=1&limit=20&status=active"

{
  "data": [
    {
      "id": "abc123",
      "number": "001/2024",
      "title": "Serviços de TI",
      "value": 500000.00,
      "supplier": { "id": "xyz", "name": "Tech Corp" },
      "agency": { "id": "def", "name": "PMSP" },
      "summary": "Contract for IT infrastructure services...",
      "anomalyScore": 65,
      "riskLevel": "HIGH"
    }
  ],
  "pagination": {
    "page": 1,
    "limit": 20,
    "total": 1542
  }
}

| Criterion | Description | Weight |
|---|---|---|
| Price Outlier | Value 2+ standard deviations above category average | High |
| Amendment Frequency | More than 3 amendments (limit is 25% of value per law) | High |
| Single Bidder | Only one supplier participated in bidding | Medium |
| Emergency Procurement | Contract used emergency justification | Medium |
| Supplier Risk | Supplier has history of penalties/cancellations | High |
| Execution Delay | Contract execution delayed beyond 30 days | Low |
| Value Concentration | Supplier receives >10% of agency's total contracts | Medium |
| Historical Pattern | Deviation from agency's typical spending patterns | Low |
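To make the Price Outlier criterion concrete, the sketch below flags a contract whose value lies at least two standard deviations above the mean of its category. It is an illustration only: the function names are hypothetical, and the use of the population standard deviation is an assumption, since this README does not specify how Farol computes it.

```typescript
function mean(values: number[]): number {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Population standard deviation (an assumption; a sample estimate is equally plausible).
function standardDeviation(values: number[]): number {
  const m = mean(values);
  const variance = values.reduce((sum, v) => sum + (v - m) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

// True when `value` is `threshold` or more standard deviations ABOVE the category mean.
function isPriceOutlier(value: number, categoryValues: number[], threshold = 2): boolean {
  if (categoryValues.length < 2) return false; // not enough data to compare
  const sd = standardDeviation(categoryValues);
  if (sd === 0) return false; // all values identical, no spread to measure
  return (value - mean(categoryValues)) / sd >= threshold;
}
```

Only values above the mean are flagged, matching the criterion's wording; unusually cheap contracts are not treated as outliers in this sketch.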
Score ranges:
- 0-30: Low risk (green)
- 31-60: Medium risk (yellow)
- 61-100: High risk (red)
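The sketch below shows one way the eight weighted criteria could be combined into a 0-100 score and mapped onto the ranges above. The High/Medium/Low labels come from the criteria table, but the numeric points (3/2/1), the normalization, the criterion keys, and the `LOW`/`HIGH` label spellings are assumptions made for illustration; Farol's actual scoring formula is not described in this README.

```typescript
// Weight labels from the criteria table; the numeric points are hypothetical.
const WEIGHT_POINTS = { High: 3, Medium: 2, Low: 1 } as const;

const CRITERIA = {
  priceOutlier: "High",
  amendmentFrequency: "High",
  singleBidder: "Medium",
  emergencyProcurement: "Medium",
  supplierRisk: "High",
  executionDelay: "Low",
  valueConcentration: "Medium",
  historicalPattern: "Low",
} as const;

type CriterionName = keyof typeof CRITERIA;
type RiskLevel = "LOW" | "MEDIUM" | "HIGH";

// Score = points of triggered criteria as a percentage of all available points.
function anomalyScore(triggered: CriterionName[]): number {
  const names = Object.keys(CRITERIA) as CriterionName[];
  const total = names.reduce((sum, n) => sum + WEIGHT_POINTS[CRITERIA[n]], 0);
  const unique = Array.from(new Set(triggered)); // ignore duplicate entries
  const hit = unique.reduce((sum, n) => sum + WEIGHT_POINTS[CRITERIA[n]], 0);
  return Math.round((hit / total) * 100);
}

// Maps a score to the documented ranges: 0-30 low, 31-60 medium, 61-100 high.
function riskLevelFor(score: number): RiskLevel {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  if (score <= 30) return "LOW";
  if (score <= 60) return "MEDIUM";
  return "HIGH";
}
```

Under these assumed weights, a contract that triggers only the price outlier and single-bidder criteria scores 29 and maps to `LOW`.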
Interactive API docs available at: http://localhost:3000/doc (Swagger UI)
1. crawler → Fetch contract list from PNCP API
2. detail → Fetch detailed contract data
3. parser → Extract text from PDF documents (OCR via tesseract.js)
4. summary → Generate AI summaries
5. classify → Categorize contracts
6. anomaly → Calculate anomaly scores
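Because each stage consumes the output of the previous one, the stages have to run in this order. The sketch below illustrates that sequencing with a hypothetical `runPipeline` helper; the handler functions are placeholders supplied by the caller, not Farol's real modules.

```typescript
type Stage = "crawler" | "detail" | "parser" | "summary" | "classify" | "anomaly";

// Stage order as listed above.
const STAGE_ORDER: Stage[] = ["crawler", "detail", "parser", "summary", "classify", "anomaly"];

type StageHandler = () => Promise<void>;

// Runs stages sequentially, optionally resuming from a later stage.
// Returns the list of stages that completed.
async function runPipeline(
  handlers: Record<Stage, StageHandler>,
  from: Stage = "crawler",
): Promise<Stage[]> {
  const completed: Stage[] = [];
  for (const stage of STAGE_ORDER.slice(STAGE_ORDER.indexOf(from))) {
    await handlers[stage](); // a rejected handler stops all later stages
    completed.push(stage);
  }
  return completed;
}
```

Resuming with `runPipeline(handlers, "parser")` skips the two fetch stages, which is useful after a failure partway through a run.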
# Fetch contracts from PNCP
pnpm crawler # Fetch new contracts
pnpm crawler:week # Fetch last 7 days
pnpm crawler:month # Fetch last 30 days
# Fetch contract details
pnpm detail # Fetch details for pending contracts
pnpm detail:batch # Process in batches
pnpm detail:stats # Show processing statistics
pnpm detail:reset # Reset processing status
# Parse PDFs
pnpm parser # Parse pending PDFs
pnpm parser:batch # Process in batches
pnpm parser:stats # Show parsing statistics
pnpm parser:reset # Reset parsing status
# Generate summaries
pnpm summary # Generate summaries
pnpm summary:batch # Process in batches
pnpm summary:stats # Show summary statistics
pnpm summary:reset # Reset summary status
pnpm summary:regen # Regenerate existing summaries
# Classify contracts
pnpm classify # Classify pending contracts
pnpm classify:batch # Process in batches
pnpm classify:stats # Show classification statistics
pnpm classify:reset # Reset classification status
pnpm classify:reclassify # Reclassify all contracts
# Calculate anomalies
pnpm anomaly # Calculate scores
pnpm anomaly:batch # Process in batches
pnpm anomaly:stats # Show anomaly statistics
pnpm anomaly:reset # Reset scores
pnpm anomaly:recalculate # Recalculate all scores
pnpm anomaly:single <id> # Calculate for single contract

# Run full pipeline automatically
pnpm auto-update # One-time full update
pnpm auto-update:start # Start continuous updates
pnpm auto-update:stats # Show update statistics

# Development
pnpm dev:all # Start API + web in parallel
pnpm dev # API only (http://localhost:3000)
pnpm dev:web # Web only (http://localhost:5173)
# Build & Quality
pnpm build # Build all packages (shared → api → web)
pnpm test # Run vitest tests
pnpm lint # Lint all packages
pnpm typecheck # Type check all packages
# Database
pnpm db:generate # Generate Prisma client (run after schema changes)
pnpm db:migrate # Create/run migrations
pnpm db:studio # Open Prisma Studio UI
pnpm db:seed # Seed database
pnpm db:reset # Reset database (⚠️ deletes all data)

Path Aliases:
- API: `@/*` → `./src/*`, `@modules/*` → `./src/modules/*`
- Web: `@/*` → `./src/*`
- Both: `@farol/shared` → shared package
Module Structure (API):
modules/
└── feature-name/
    ├── controllers/    # HTTP handlers
    ├── services/       # Business logic
    ├── dto/
    │   ├── request/    # Input DTOs
    │   └── response/   # Output DTOs
    └── utils/          # Helper functions

Component Structure (Web):
src/
├── routes/             # TanStack Router (file-based)
├── components/         # UI components
│   ├── ui/             # shadcn/ui primitives
│   └── feature-name/   # Feature components
├── hooks/
│   └── queries/        # TanStack Query hooks
└── lib/
    ├── validations/    # Zod schemas
    └── api.ts          # API client

- Language: All code (functions, variables, comments, messages) in English
- DTOs: Separate request/response directories per module
- Validation: Zod schemas for API, class-validator for internal
- Database: Always run `pnpm db:generate` after Prisma schema changes
- Type Safety: Strict TypeScript, no `any`
- Naming: Descriptive, imperative for functions (`getUserById`, not `user`)
Contributions welcome! Areas of interest:
- 🔍 Data Sources: Integrate CEIS, TCU, CNPJ data
- 📊 Analytics: Add new anomaly detection criteria
- 🎨 UI/UX: Improve visualizations and user experience
- 🧪 Testing: Increase test coverage
- 📖 Documentation: Improve guides and API docs
- 🌐 i18n: Internationalization support
- Fork & clone the repository
- Create a branch: `git checkout -b feature/my-feature`
- Make changes: Follow coding standards
- Test: Run `pnpm test`, `pnpm lint`, `pnpm typecheck`
- Commit: Use clear, imperative messages

      feat: add supplier network analysis
      fix: correct anomaly score calculation
      docs: update API documentation

- Push & PR: Submit pull request with description

<type>: <subject>
[optional body]
Types: feat, fix, docs, style, refactor, test, chore
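The commit format above is simple enough to check mechanically, for example in a pre-commit hook. The sketch below is one possible check; `isValidCommitMessage` is a hypothetical helper and not an existing Farol script.

```typescript
// Allowed types, exactly as listed above.
const COMMIT_TYPES = ["feat", "fix", "docs", "style", "refactor", "test", "chore"] as const;

// Matches "<type>: <subject>" where the subject starts with a non-space character.
const COMMIT_SUBJECT = new RegExp(`^(${COMMIT_TYPES.join("|")}): \\S.*$`);

// Validates only the first line; the optional body is ignored.
function isValidCommitMessage(message: string): boolean {
  const [subjectLine = ""] = message.split("\n");
  return COMMIT_SUBJECT.test(subjectLine);
}
```

With this check, `feat: add supplier network analysis` passes, while `feature: add analysis` and `fix:missing space` are rejected.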
- Code follows project conventions
- Tests added/updated
- Documentation updated
- No type errors (`pnpm typecheck`)
- No linting errors (`pnpm lint`)
- Builds successfully (`pnpm build`)
- Integrate CEIS (Cadastro de Empresas Inidôneas)
- Connect TCU open data APIs
- Add CNPJ corporate network analysis
- Cross-reference supplier sanctions
- Supplier network visualization
- Temporal trend analysis
- Price prediction models
- Comparative benchmarking
- Public API with rate limiting
- Data export (CSV, JSON, Excel)
- Email alerts for flagged contracts
- User-submitted anomaly reports
- Auditor dashboard with advanced filters
- Batch analysis tools
- White-label deployment option
- Integration with official oversight systems
- Framework: Hono - Ultrafast web framework that runs on Node.js and edge runtimes
- Database: PostgreSQL + Prisma ORM
- Validation: Zod with OpenAPI generation
- Storage: S3-compatible (MinIO for local dev)
- AI: OpenAI GPT-4 / Anthropic Claude for summaries
- Language: TypeScript 5.9
- Framework: React 19
- Routing: TanStack Router
- State: TanStack Query
- UI: shadcn/ui (Radix + Tailwind CSS)
- Build: Vite 6
- Monorepo: pnpm workspaces
- Testing: Vitest
- Linting: ESLint + Prettier
- CI/CD: GitHub Actions (planned)
- Containers: Docker + Docker Compose
Frontend (Vercel):
vercel deploy --prod

Backend (Railway):
- Connect GitHub repository
- Set environment variables
- Deploy from `packages/api`

fly deploy --config packages/api/fly.toml

# Build
docker build -t farol-api -f packages/api/Dockerfile .
docker build -t farol-web -f packages/web/Dockerfile .
# Run
docker-compose up -d

# Database
DATABASE_URL=postgresql://user:pass@host:5432/farol
# AI Provider
AI_PROVIDER=openai
OPENAI_API_KEY=sk-...
# Storage
STORAGE_PROVIDER=s3
STORAGE_ENDPOINT=https://s3.amazonaws.com
STORAGE_BUCKET=farol-documents
STORAGE_ACCESS_KEY=...
STORAGE_SECRET_KEY=...
# Security
JWT_SECRET=...
API_RATE_LIMIT=100

Q: How is user privacy handled?
A: All data is public information from government sources (PNCP). We don't collect personal data from users.

Q: What license is Farol under?
A: GPL-3.0. You can use, modify, and distribute it freely, but derivative works you distribute must also be released under GPL-3.0 with their source code.

Q: How often is data updated?
A: Daily incremental updates, with a full refresh weekly. Use `pnpm auto-update:start` for continuous updates.

Q: Can I use Farol for commercial purposes?
A: Yes, under GPL-3.0 terms. If you distribute a modified version, you must release its source code under GPL-3.0.

Q: Can I self-host Farol?
A: Yes! See the Deployment section. Requires PostgreSQL + Node.js.

Q: How do I report security issues?
A: Email security@farol.app (planned) or open a private GitHub advisory.

Q: Why does AI summarization use paid APIs?
A: Quality and reliability. We support both OpenAI and Anthropic. Support for local models (Ollama) is planned.

Farol is licensed under GPL-3.0. See LICENSE for full text.
- React: MIT
- Hono: MIT
- Prisma: Apache 2.0
- PostgreSQL: PostgreSQL License
- shadcn/ui: MIT
- TanStack: MIT
Full dependency licenses in node_modules/*/LICENSE.
- PNCP (Plataforma Nacional de Contratações Públicas): Official Brazilian government procurement portal
- Contract data used under open data principles (Lei de Acesso à Informação - LAI)
- Inspired by Operação Serenata de Amor
- Built with support from the Brazilian civic tech community
- API Reference (planned)
- Data Pipeline Guide (planned)
- Deployment Guide (planned)
- PNCP Portal
- Lei 14.133/2021 (Brazilian procurement law)
- TCU Open Data
- Querido Diário - Official gazette monitoring
- Brasil.IO - Brazilian open datasets
- Serenata de Amor - Congressional expense auditing
Built with ❤️ for transparency and civic participation







