Uma-Obbani/analytics-engineering

🛒 E-Commerce AI Analytics Copilot

An end-to-end modern data platform that transforms raw e-commerce data into AI-powered customer intelligence using BigQuery, dbt, Gemini, RAG, Vector Search, and Streamlit.

The platform enables business users to ask natural language questions about customers, churn risk, customer value, and marketing recommendations.


🚀 Project Overview

This project demonstrates a complete Analytics Engineering + AI workflow:

  • Data ingestion
  • Cloud data warehouse modeling
  • dbt transformations
  • Data quality testing
  • Customer intelligence marts
  • AI-ready semantic layer
  • RAG pipeline
  • Conversational analytics interface

🏗️ Architecture

                 Data Sources

        Customers | Orders | Products | Campaigns

                         |
                         ▼

                  BigQuery Raw Layer

                         |
                         ▼

                     dbt Core

                         |

        ┌─────────────────────────────────┐
        │                                 │
        ▼                                 ▼

   Staging Models                 Intermediate Models

   stg_customers                  int_customer_metrics
   stg_orders                     int_customer_engagement
   stg_campaigns                  int_campaign_performance


                         |
                         ▼

                    Data Marts

             customer_intelligence

             campaign_intelligence

             mart_customer_360

             mart_ai_marketing_copilot


                         |
                         ▼

                  AI / RAG Layer


          Gemini Embeddings

                  |
                  ▼

          Chroma Vector Database

                  |
                  ▼

             RAG Engine

                  |
                  ▼

            Gemini LLM

                  |
                  ▼

          Streamlit AI Copilot


🛠️ Tech Stack

Data Engineering

  • Python 3.11
  • Google Cloud Platform
  • BigQuery

Analytics Engineering

  • dbt Core
  • dbt BigQuery Adapter
  • dbt Tests
  • dbt Documentation

AI Engineering

  • Gemini Embeddings
  • LangChain
  • Chroma Vector Database
  • Retrieval Augmented Generation (RAG)

Application Layer

  • Streamlit

📁 Project Structure

E-commerce Analytics
│
├── dbt/
│   └── models/
│       ├── staging/
│       ├── intermediate/
│       └── marts/
│           └── ai/
│               ├── customer_intelligence.sql
│               ├── customer_score.sql
│               ├── llm_context_text.sql
│               ├── mart_customer_360.sql
│               └── mart_ai_marketing_copilot.sql
│
├── ai_engine/
│   ├── bigquery_client.py
│   ├── embeddings.py
│   ├── rag_engine.py
│   └── prompts.py
│
├── streamlit_app/
│   └── app.py
│
├── requirements.txt
├── README.md
└── .env


🔄 Data Pipeline Flow

1. Raw Data

E-commerce datasets:

  • Customers
  • Orders
  • Products
  • Campaigns
  • User activity

2. dbt Transformation Layer

Staging Layer

Cleans and standardizes raw tables.

Example:

raw_customers

        ↓

stg_customers


Intermediate Layer

Creates reusable business logic:

  • Customer engagement
  • Purchase behavior
  • Campaign metrics

Mart Layer

Business-ready datasets:

Customer Intelligence

Features:

  • Customer Lifetime Value
  • Purchase behavior
  • Engagement segment
  • Churn indicators
  • Customer priority score
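
The features above can be illustrated with a minimal Python sketch. In the project this logic would live in the dbt marts as SQL; the field names, thresholds, and weights below are hypothetical and only show one plausible way to combine customer value and churn indicators into a priority score.

```python
from dataclasses import dataclass


@dataclass
class CustomerMetrics:
    """Hypothetical per-customer inputs; field names are illustrative only."""
    customer_id: int
    lifetime_value: float       # total revenue attributed to the customer
    days_since_last_order: int  # recency, used here as a churn indicator
    engagement_score: float     # 0.0 (inactive) to 1.0 (highly engaged)


def churn_risk(m: CustomerMetrics, inactive_after_days: int = 90) -> str:
    """Label churn risk from recency and engagement (assumed thresholds)."""
    stale = m.days_since_last_order > inactive_after_days
    disengaged = m.engagement_score < 0.3
    if stale and disengaged:
        return "high"
    if stale or disengaged:
        return "medium"
    return "low"


def priority_score(m: CustomerMetrics, max_ltv: float) -> float:
    """Blend normalized value and churn risk into a 0-100 score (assumed weights)."""
    value_part = min(m.lifetime_value / max_ltv, 1.0) if max_ltv > 0 else 0.0
    risk_part = {"low": 0.0, "medium": 0.5, "high": 1.0}[churn_risk(m)]
    return round(100 * (0.6 * value_part + 0.4 * risk_part), 1)
```

With these assumed weights, a valuable customer who is drifting away ranks above an equally valuable customer who is still active, which is the ordering a retention campaign would usually want.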

🤖 AI Layer

AI Context Generation

dbt creates LLM-ready text:

Example:

Customer 1024 is a premium customer.
High lifetime value.
Low churn risk.
Recommended campaign: Loyalty offer.
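
As a rough illustration of how one mart row could be rendered into the text above, here is a small Python sketch. The project generates this text in dbt; the function and the column names (`segment`, `ltv_band`, `churn_risk`, `recommended_campaign`) are assumptions made for the example.

```python
def build_context_text(row: dict) -> str:
    """Render one customer row (hypothetical columns) as LLM-ready text."""
    segment = row["segment"]
    # Pick "a" or "an" so segments such as "at-risk" read naturally.
    article = "an" if segment[0].lower() in "aeiou" else "a"
    lines = [
        f"Customer {row['customer_id']} is {article} {segment} customer.",
        f"{row['ltv_band'].capitalize()} lifetime value.",
        f"{row['churn_risk'].capitalize()} churn risk.",
        f"Recommended campaign: {row['recommended_campaign']}.",
    ]
    return "\n".join(lines)


example_row = {
    "customer_id": 1024,
    "segment": "premium",
    "ltv_band": "high",
    "churn_risk": "low",
    "recommended_campaign": "Loyalty offer",
}
print(build_context_text(example_row))
```

Keeping each customer in one short, self-contained paragraph means a single retrieved document carries everything the LLM needs about that customer.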

Embeddings

Generated using Gemini:

Customer Text

      ↓

Gemini Embedding Model

      ↓

Vector Representation

      ↓

Chroma Database
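
The flow above (text, embedding, vector, store) can be sketched without any external service. The example below is not the project's implementation: `toy_embed` is a hashed bag-of-words stand-in for the Gemini embedding model, and `InMemoryVectorStore` is a stand-in for Chroma. Both names are hypothetical and exist only to make the data flow runnable offline.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class InMemoryVectorStore:
    """Stand-in for a vector database: keeps vectors in a list, cosine search."""

    def __init__(self, embed=toy_embed):
        self._embed = embed
        self._items: list[tuple[str, str, list[float]]] = []

    def add(self, doc_id: str, text: str) -> None:
        """Embed the text once and store it with its id."""
        self._items.append((doc_id, text, self._embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, str, float]]:
        """Return the k stored texts most similar to the query."""
        q = self._embed(query)
        scored = [
            # Vectors are unit length, so the dot product is the cosine similarity.
            (doc_id, text, sum(a * b for a, b in zip(q, vec)))
            for doc_id, text, vec in self._items
        ]
        return sorted(scored, key=lambda item: item[2], reverse=True)[:k]
```

In the real pipeline the embedding call and the store would be replaced by the Gemini and Chroma clients; the shape of the data (id, text, vector) stays the same.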


🔎 RAG Workflow

User asks:

Which customers are likely to churn?

Process:

Question

   ↓

Vector Search

   ↓

Retrieve similar customer profiles

   ↓

Gemini LLM

   ↓

Business recommendation
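
The process above can be sketched as a small function that retrieves profiles, builds a prompt, and hands it to the model. This is an illustration under stated assumptions, not the contents of `rag_engine.py`: `retrieve` and `generate` are hypothetical callables standing in for the vector search and the Gemini call, and the prompt wording is invented for the example.

```python
from typing import Callable, Sequence

# Hypothetical prompt; the project's actual prompts live in ai_engine/prompts.py.
PROMPT_TEMPLATE = """You are a marketing analytics assistant.
Answer the question using only the customer profiles below.

Customer profiles:
{context}

Question: {question}
"""


def build_prompt(question: str, profiles: Sequence[str]) -> str:
    """Join the retrieved profiles and place them ahead of the question."""
    context = "\n---\n".join(profiles)
    return PROMPT_TEMPLATE.format(context=context, question=question)


def answer(
    question: str,
    retrieve: Callable[[str, int], Sequence[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Retrieve the k most similar profiles, then ask the LLM to answer."""
    profiles = retrieve(question, k)
    if not profiles:
        # Avoid calling the LLM with no grounding context.
        return "No matching customer profiles were found."
    return generate(build_prompt(question, profiles))
```

Passing the retriever and the model in as arguments keeps the flow easy to test with fakes before wiring in the real services.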

Example output:

High risk customers:

Customer 2041

Reason:
- Low engagement
- No recent orders

Recommended Action:
Send win-back campaign


💬 Streamlit AI Copilot

Run application:

streamlit run streamlit_app/app.py

Users can ask:

  • Which customers may churn?
  • Who are my highest value customers?
  • Recommend marketing actions
  • Which campaign should I prioritize?
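
A question box like this needs very little Streamlit code. The sketch below is not the project's `app.py`; it assumes a hypothetical `ask` callable that wraps the RAG engine and shows one way to connect it to a text input and a button.

```python
from typing import Callable

EXAMPLE_QUESTIONS = [
    "Which customers may churn?",
    "Who are my highest value customers?",
    "Recommend marketing actions",
    "Which campaign should I prioritize?",
]


def handle_question(question: str, ask: Callable[[str], str]) -> str:
    """Validate the input, then delegate to the (hypothetical) RAG callable."""
    question = question.strip()
    if not question:
        return "Please enter a question."
    return ask(question)


def main(ask: Callable[[str], str]) -> None:
    """Render the page; call this from the script that Streamlit runs."""
    # Imported lazily so handle_question can be tested without Streamlit installed.
    import streamlit as st

    st.title("E-Commerce AI Analytics Copilot")
    st.caption("Try: " + " | ".join(EXAMPLE_QUESTIONS))
    question = st.text_input("Ask a question about your customers")
    if st.button("Ask"):
        st.write(handle_question(question, ask))
```

Keeping the validation in `handle_question` separates the UI from the logic, so the same function can be reused from a notebook or a test.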

⚙️ Setup

Create environment:

python3.11 -m venv venv

source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

dbt Commands

Test connection:

dbt debug

Build and test models (dbt build runs seeds, models, snapshots, and tests in dependency order):

dbt build

Generate docs:

dbt docs generate

dbt docs serve
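
For a repeatable local run, the non-interactive commands above can be chained from Python. This is a hedged sketch: it assumes the dbt project lives in the `dbt/` folder shown in the project structure and passes it with dbt's `--project-dir` flag. `run_dbt_steps` is a hypothetical helper, not part of the repository.

```python
import subprocess
from typing import Callable

# The same steps as above, in order; "dbt docs serve" is left out because it blocks.
DBT_STEPS: list[list[str]] = [
    ["dbt", "debug"],
    ["dbt", "build"],
    ["dbt", "docs", "generate"],
]


def run_dbt_steps(
    project_dir: str = "dbt",
    run: Callable[..., object] = subprocess.run,
) -> list[list[str]]:
    """Run each dbt step in order and stop at the first failure."""
    executed = []
    for step in DBT_STEPS:
        command = [*step, "--project-dir", project_dir]
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
        run(command, check=True)
        executed.append(command)
    return executed
```

The `run` parameter exists so the sequence can be dry-run with a fake runner before dbt is installed or a BigQuery connection is configured.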

Build Vector Store

python -m ai_engine.embeddings

Start AI Copilot

streamlit run streamlit_app/app.py

Future Enhancements

  • BigQuery Vector Search
  • Vertex AI deployment
  • BigQuery ML churn prediction
  • Airflow orchestration
  • CI/CD pipeline
  • Docker + Cloud Run
  • Real-time event streaming

Project Goal

Build a production-style AI analytics platform combining:

Data Engineering + Analytics Engineering + Generative AI

Raw Data → BigQuery → dbt → AI Marts → RAG → AI Copilot
