A full-stack synthetic data platform with Retrieval-Augmented Generation (RAG) workflows and advanced multi-agent capabilities for querying financial data using natural language.
This platform demonstrates:
- RAG Workflow: Semantic retrieval + SQL generation from natural language
- Multi-Agent System: Orchestrated agents for retrieval, analysis, and enrichment
- Financial Data: 5+ tables with 5000+ rows of financial and portfolio data
- Vector Search: FAISS-based semantic search over database schema
- API Enrichment: Integration with Yahoo Finance and SEC EDGAR
backend/
├── api/ # FastAPI endpoints
├── db/ # Database models and connection
├── rag/ # Vector store, embeddings, SQL generation
├── agents/ # Multi-agent orchestration system
└── utils/ # Data loading utilities
- Retrieval Agent: Semantic search → SQL generation → Query execution
- Analysis Agent: Summarization, insights, and reasoning
- Enrichment Agent: Fetches external data (Yahoo Finance, SEC EDGAR)
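The hand-off between the three agents can be pictured with a small sketch. This is an illustration under stated assumptions, not the code in agents/orchestrator.py: the `Orchestrator` class, the agent call signature, and the keyword-based `needs_enrichment` rule are hypothetical. Only the `agent_flow` status keys mirror the sample API response shown in this README.

```python
from typing import Any, Callable, Dict

# Hypothetical agent signature: receives the shared state, returns updates to it.
Agent = Callable[[Dict[str, Any]], Dict[str, Any]]


class Orchestrator:
    """Runs retrieval, then analysis, then (optionally) enrichment."""

    def __init__(self, retrieval: Agent, analysis: Agent, enrichment: Agent):
        self.retrieval = retrieval
        self.analysis = analysis
        self.enrichment = enrichment

    @staticmethod
    def needs_enrichment(query: str) -> bool:
        # Assumed heuristic: only market-style questions call external APIs.
        keywords = ("stock price", "filing", "market")
        return any(k in query.lower() for k in keywords)

    def run(self, query: str) -> Dict[str, Any]:
        state: Dict[str, Any] = {"query": query, "agent_flow": {}}
        state.update(self.retrieval(state))
        state["agent_flow"]["retrieval"] = "completed"
        state.update(self.analysis(state))
        state["agent_flow"]["analysis"] = "completed"
        if self.needs_enrichment(query):
            state.update(self.enrichment(state))
            state["agent_flow"]["enrichment"] = "completed"
        else:
            state["agent_flow"]["enrichment"] = "skipped"
        return state


# Stub agents so the sketch runs without a database or external APIs.
def stub_retrieval(state: Dict[str, Any]) -> Dict[str, Any]:
    return {"data": [{"total": 1}], "row_count": 1}


def stub_analysis(state: Dict[str, Any]) -> Dict[str, Any]:
    return {"summary": f"{state['row_count']} row(s) returned"}


def stub_enrichment(state: Dict[str, Any]) -> Dict[str, Any]:
    return {"enriched_data": {"source": "stub"}}


orchestrator = Orchestrator(stub_retrieval, stub_analysis, stub_enrichment)
result = orchestrator.run("What are the total liabilities?")
print(result["agent_flow"])
```

Passing the agents in as plain callables keeps each one testable on its own; the real system may share state differently.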
- companies - Company information and metadata
- financial_statements - Income statements, balance sheets, cash flow (5000+ rows)
- portfolio_companies - PE fund portfolio tracking
- performance_metrics - ARR, MRR, churn, CAC, LTV (5000+ rows)
- market_data - Historical stock prices
- query_logs - Query history and debugging
- Python 3.8+ with pip
- Node.js 16+ and npm
- PostgreSQL 12+
Create a PostgreSQL database:
createdb RAG
Update .env with your database credentials (already configured):
DATABASE_URL=postgresql://postgres:1234@localhost:5432/RAG
OPENAI_API_KEY=your_openai_key
SEC_EDGAR_API_KEY=your_sec_edgar_key
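A quick way to check these settings before starting the server is sketched below. It assumes the variables have already been exported into the process environment (how the backend itself reads .env is not shown here), and the split into required and optional keys is a guess. The `load_settings` helper is hypothetical.

```python
import os
from typing import Dict, Mapping

REQUIRED_KEYS = ("DATABASE_URL", "OPENAI_API_KEY")  # assumed to be mandatory
OPTIONAL_KEYS = ("SEC_EDGAR_API_KEY",)              # assumed to be optional


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the settings found in `env`, raising if a required one is missing."""
    missing = [k for k in REQUIRED_KEYS if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    settings = {k: env[k] for k in REQUIRED_KEYS}
    settings.update({k: env[k] for k in OPTIONAL_KEYS if env.get(k)})
    return settings
```

Calling `load_settings()` with no arguments reads from `os.environ`; passing a plain dict makes the check easy to test.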
Install Python dependencies:
cd backend
pip install -r requirements.txt
Initialize the database and load data:
python setup_data.py
This will:
- Create database tables
- Load Excel data (if available)
- Synthesize financial data from Yahoo Finance for 10 companies
- Generate 5000+ performance metrics
- Index database schema into FAISS vector store
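To give a feel for the metric-generation step, here is a minimal sketch using only the standard library. It is not the logic in setup_data.py: the `generate_metrics` function, the value ranges, the column names, and the company and month counts are all assumptions chosen for illustration.

```python
import random
from datetime import date
from typing import Dict, List


def generate_metrics(company_ids: List[int], months: int, seed: int = 0) -> List[Dict]:
    """Generate one row of SaaS-style metrics per company per month."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    rows = []
    for company_id in company_ids:
        mrr = rng.uniform(50_000, 500_000)  # starting monthly recurring revenue
        for m in range(months):
            mrr *= 1 + rng.uniform(-0.02, 0.06)  # random month-over-month drift
            cac = rng.uniform(200, 2_000)
            rows.append({
                "company_id": company_id,
                "period": date(2020 + m // 12, m % 12 + 1, 1),
                "mrr": round(mrr, 2),
                "arr": round(mrr * 12, 2),  # ARR as annualized MRR
                "churn_rate": round(rng.uniform(0.01, 0.08), 4),
                "cac": round(cac, 2),
                "ltv": round(cac * rng.uniform(2.0, 5.0), 2),
            })
    return rows


# 100 hypothetical companies x 60 months gives 6000 rows.
rows = generate_metrics(company_ids=list(range(1, 101)), months=60)
print(len(rows))
```

Seeding the generator keeps the synthetic dataset stable between runs, which helps when comparing query results.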
Start the FastAPI server:
uvicorn api.main:app --reload --host 0.0.0.0 --port 8000
API will be available at: http://localhost:8000
Install dependencies and start development server:
npm install
npm run dev
Frontend will be available at: http://localhost:5173
The platform can answer questions like:
- "What are the total liabilities in Company X?"
- "What's the YoY revenue growth in 2024?"
- "Show me all portfolio companies with ARR over 1M"
- "What is the average churn rate across all companies?"
- "List companies in the technology sector"
- "What's the current stock price for AAPL?"
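To make the translation step concrete, the sketch below shows one SQL query that the ARR question above might map to. The column names (`id`, `name`, `sector`, `company_id`, `period`, `arr`) and the sample rows are invented for illustration, and an in-memory SQLite database stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, sector TEXT);
    CREATE TABLE performance_metrics (company_id INTEGER, period TEXT, arr REAL);
    INSERT INTO companies VALUES (1, 'Acme', 'Technology'), (2, 'Globex', 'Energy');
    INSERT INTO performance_metrics VALUES
        (1, '2024-01', 1500000), (1, '2024-02', 1600000), (2, '2024-01', 400000);
""")

# One plausible translation of "Show me all portfolio companies with ARR over 1M",
# using each company's most recent ARR value.
sql = """
    SELECT c.name, m.arr
    FROM companies AS c
    JOIN performance_metrics AS m ON m.company_id = c.id
    WHERE m.period = (SELECT MAX(period) FROM performance_metrics WHERE company_id = c.id)
      AND m.arr > 1000000
    ORDER BY c.name
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

The generated SQL for the same question can differ from run to run; "ARR over 1M" could equally be read as any period above the threshold rather than the latest one.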
- POST /api/query - Submit natural language query
- GET /api/history - View query history
- GET /api/stats - Platform statistics
- GET /api/health - Health check
- POST /api/index-schema - Re-index database schema
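A client call to the query endpoint might look like the sketch below, using only the standard library. The request body shape is an assumption: the `query` field name is inferred from the sample response, not from a documented request schema. `build_query_request` and `ask` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command in this README


def build_query_request(question: str) -> urllib.request.Request:
    """Build a POST /api/query request; the `query` body field is an assumption."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 30.0) -> dict:
    """Send the question to a running server and decode the JSON reply."""
    with urllib.request.urlopen(build_query_request(question), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `ask("List companies in the technology sector")` would return the decoded response as a dict.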
{
  "success": true,
  "query": "What are the total liabilities?",
  "sql": "SELECT SUM(total_liabilities) FROM financial_statements",
  "answer": "The total liabilities amount to $5.2M",
  "summary": "Query executed successfully...",
  "insights": ["Finding 1", "Finding 2"],
  "data": [...],
  "row_count": 10,
  "relevant_tables": ["financial_statements"],
  "enriched_data": {},
  "execution_time_ms": 245.5,
  "agent_flow": {
    "retrieval": "completed",
    "analysis": "completed",
    "enrichment": "skipped"
  }
}
- FastAPI - Modern web framework
- SQLAlchemy - ORM for PostgreSQL
- OpenAI - GPT-4 for SQL generation and analysis
- FAISS - Vector similarity search
- Pandas - Data manipulation
- yfinance - Yahoo Finance API integration
- React 19 - UI framework
- Vite - Build tool and dev server
- User submits natural language query
- Vector search finds relevant database tables/columns
- GPT-4 generates SQL query with context
- Query executes against PostgreSQL
- Results analyzed and summarized by AI
- External APIs enrich the data when relevant
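The retrieval and prompt-building steps of this workflow can be sketched without any external services. In the sketch below, a toy word-count similarity stands in for the FAISS index and learned embeddings, and no model is called: the code stops at the prompt that would be sent for SQL generation. The function names and the prompt wording are hypothetical; the table descriptions come from this README.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Table descriptions taken from this README; the real index may hold more detail.
SCHEMA_DOCS: Dict[str, str] = {
    "companies": "company information and metadata",
    "financial_statements": "income statements, balance sheets, cash flow",
    "portfolio_companies": "PE fund portfolio tracking",
    "performance_metrics": "ARR, MRR, churn, CAC, LTV",
    "market_data": "historical stock prices",
}


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def relevant_tables(question: str, k: int = 2) -> List[str]:
    """Rank tables by similarity between the question and 'name + description'."""
    q = embed(question)
    scored = {t: cosine(q, embed(f"{t.replace('_', ' ')} {d}")) for t, d in SCHEMA_DOCS.items()}
    ranked = sorted(scored, key=scored.get, reverse=True)
    return [t for t in ranked[:k] if scored[t] > 0]


def build_prompt(question: str, tables: List[str]) -> str:
    """Assemble the context that would be sent to the SQL-generating model."""
    context = "\n".join(f"- {t}: {SCHEMA_DOCS[t]}" for t in tables)
    return f"Tables:\n{context}\n\nWrite a PostgreSQL query for: {question}"


question = "What is the average churn rate across all companies?"
tables = relevant_tables(question)
print(build_prompt(question, tables))
```

Word overlap is a crude proxy: it misses synonyms and inflections ("price" vs. "prices"), which is the gap that embedding-based search is meant to close.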
- Generated SQL displayed
- Execution time tracked
- Relevant tables shown
- Agent reasoning logged
- All queries logged to database
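The debugging fields listed above all appear in the sample API response, so a client can surface them with a few lines of code. The `format_debug_info` helper below is a hypothetical sketch; the field names are taken from the sample response in this README.

```python
from typing import Any, Dict


def format_debug_info(response: Dict[str, Any]) -> str:
    """Render the debugging fields of a /api/query response as plain text."""
    flow = response.get("agent_flow", {})
    lines = [
        f"SQL:     {response.get('sql', '(none)')}",
        f"Tables:  {', '.join(response.get('relevant_tables', [])) or '(none)'}",
        f"Rows:    {response.get('row_count', 0)}",
        f"Time:    {response.get('execution_time_ms', 0.0):.1f} ms",
        "Agents:  " + ", ".join(f"{name}={status}" for name, status in flow.items()),
    ]
    return "\n".join(lines)


sample = {
    "sql": "SELECT SUM(total_liabilities) FROM financial_statements",
    "relevant_tables": ["financial_statements"],
    "row_count": 10,
    "execution_time_ms": 245.5,
    "agent_flow": {"retrieval": "completed", "analysis": "completed", "enrichment": "skipped"},
}
print(format_debug_info(sample))
```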
- Excel Import: Portfolio company data
- Yahoo Finance: Real-time stock data, financials
- SEC EDGAR: Company filings
- Synthetic Data: 5000+ performance metrics
from backend.utils.data_loader import DataLoader
from backend.db.database import SessionLocal

# Open a database session and synthesize two years of data for extra tickers
db = SessionLocal()
loader = DataLoader(db)
loader.synthesize_financial_data(['TSLA', 'NVDA'], num_years=2)

After database changes, re-index the schema:
curl -X POST http://localhost:8000/api/index-schema

.
├── backend/
│ ├── api/main.py # FastAPI application
│ ├── agents/
│ │ ├── retrieval_agent.py
│ │ ├── analysis_agent.py
│ │ ├── enrichment_agent.py
│ │ └── orchestrator.py
│ ├── rag/
│ │ ├── embeddings.py
│ │ ├── vector_store.py
│ │ ├── schema_indexer.py
│ │ └── sql_generator.py
│ ├── db/
│ │ ├── models.py
│ │ └── database.py
│ └── utils/
│ └── data_loader.py
├── src/
│ ├── App.jsx # React UI
│ └── App.css
└── README.md
- Add more sophisticated query planning
- Implement query result caching
- Add support for complex joins across 3+ tables
- Implement policy agent for query validation
- Add real-time data streaming
- Support for multi-turn conversations
- Add authentication and user management
- Deploy to cloud platform
MIT