An end-to-end modern data platform that transforms raw e-commerce data into AI-powered customer intelligence using BigQuery, dbt, Gemini, RAG, Vector Search, and Streamlit.
The platform enables business users to ask natural language questions about customers, churn risk, customer value, and marketing recommendations.
This project demonstrates a complete Analytics Engineering + AI workflow:
- Data ingestion
- Cloud data warehouse modeling
- dbt transformations
- Data quality testing
- Customer intelligence marts
- AI-ready semantic layer
- RAG pipeline
- Conversational analytics interface
Data Sources
Customers | Orders | Products | Campaigns
|
βΌ
BigQuery Raw Layer
|
βΌ
dbt Core
|
βββββββββββββββββββββββββββββββββββ
β β
βΌ βΌ
Staging Models Intermediate Models
stg_customers int_customer_metrics
stg_orders int_customer_engagement
stg_campaigns int_campaign_performance
|
βΌ
Data Marts
customer_intelligence
campaign_intelligence
mart_customer_360
mart_ai_marketing_copilot
|
βΌ
AI / RAG Layer
Gemini Embeddings
|
βΌ
Chroma Vector Database
|
βΌ
RAG Engine
|
βΌ
Gemini LLM
|
βΌ
Streamlit AI Copilot
- Python 3.11
- Google Cloud Platform
- BigQuery
- dbt Core
- dbt BigQuery Adapter
- dbt Tests
- dbt Documentation
- Gemini Embeddings
- LangChain
- Chroma Vector Database
- Retrieval Augmented Generation (RAG)
- Streamlit
E-commerce Analytics
β
βββ dbt/
β |
β βββ models/
β β
β βββ staging/
β βββ intermediate/
β βββ marts/
β |
β βββ ai/
β βββ customer_intelligence.sql
β βββ customer_score.sql
β βββ llm_context_text.sql
β βββ mart_customer_360.sql
β βββ mart_ai_marketing_copilot.sql
β
β
βββ ai_engine/
β
β βββ bigquery_client.py
β βββ embeddings.py
β βββ rag_engine.py
β βββ prompts.py
β
β
βββ streamlit_app/
β
β βββ app.py
β
β
βββ requirements.txt
βββ README.md
βββ .env
E-commerce datasets:
- Customers
- Orders
- Products
- Campaigns
- User activity
Cleans and standardizes raw tables.
Example:
raw_customers
β
stg_customers
Creates reusable business logic:
- Customer engagement
- Purchase behavior
- Campaign metrics
Business-ready datasets:
Features:
- Customer Lifetime Value
- Purchase behavior
- Engagement segment
- Churn indicators
- Customer priority score
dbt creates LLM-ready text:
Example:
Customer 1024 is a premium customer.
High lifetime value.
Low churn risk.
Recommended campaign: Loyalty offer.
Generated using Gemini:
Customer Text
β
Gemini Embedding Model
β
Vector Representation
β
Chroma Database
User asks:
Which customers are likely to churn?
Process:
Question
β
Vector Search
β
Retrieve similar customer profiles
β
Gemini LLM
β
Business recommendation
Example output:
High risk customers:
Customer 2041
Reason:
- Low engagement
- No recent orders
Recommended Action:
Send win-back campaign
Run application:
streamlit run streamlit_app/app.pyUsers can ask:
- Which customers may churn?
- Who are my highest value customers?
- Recommend marketing actions
- Which campaign should I prioritize?
Create environment:
python3.11 -m venv venv
source venv/bin/activateInstall dependencies:
pip install -r requirements.txtTest connection:
dbt debugRun models:
dbt buildGenerate docs:
dbt docs generate
dbt docs servepython -m ai_engine.embeddingsstreamlit run streamlit_app/app.py- BigQuery Vector Search
- Vertex AI deployment
- BigQuery ML churn prediction
- Airflow orchestration
- CI/CD pipeline
- Docker + Cloud Run
- Real-time event streaming
Build a production-style AI analytics platform combining:
Data Engineering + Analytics Engineering + Generative AI
Raw Data β BigQuery β dbt β AI Marts β RAG β AI Copilot