hhphan/multi-cloud-serverless-rags

multi-cloud-serverless-rags

A production-grade RAG (Retrieval-Augmented Generation) pipeline implemented identically on AWS, Azure, and GCP using a shared adapter pattern. Same core/ logic, three cloud backends, zero code duplication.

Architecture

core/
  interfaces.py   # Embedder, Retriever, Generator ABCs
  chunker.py      # tiktoken 512-token chunks, 50 overlap
  prompt.py       # shared system prompt + build_prompt()
adapters/
  aws/            # BedrockEmbedder, OpenSearchRetriever, BedrockGenerator
  azure/          # AzureFoundryEmbedder, AISearchRetriever, AzureFoundryGenerator
  gcp/            # VertexEmbedder, FirestoreRetriever, GeminiGenerator
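The comment on core/chunker.py describes 512-token chunks with a 50-token overlap. A minimal sketch of that sliding window follows. It works on a plain list of token IDs so it runs without tiktoken, and the function name chunk_tokens is hypothetical rather than taken from the repository.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[int], size: int = 512, overlap: int = 50) -> List[List[int]]:
    """Split token IDs into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # with the defaults, each window starts 462 tokens later
    chunks: List[List[int]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


# With tiktoken installed, token IDs could come from, for example:
#   enc = tiktoken.get_encoding("cl100k_base")   # the encoding choice is an assumption
#   tokens = enc.encode(text)
chunks = chunk_tokens(list(range(1000)))
print([len(c) for c in chunks])  # [512, 512, 76]
```

The final chunk is shorter than 512 tokens whenever the input length does not line up with the step size.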

| Cloud | Ingest | Embed | Vector store | Query | Region |
|---|---|---|---|---|---|
| AWS | Glue Python Shell | Bedrock Titan V2 (1024-dim) | OpenSearch Serverless | Lambda + API GW | ap-southeast-2 |
| Azure | Azure ML Scheduled Job | AI Foundry text-embedding-3-large (3072-dim) | AI Search (Basic) | Azure Functions | australiaeast |
| GCP | Vertex AI Custom Training | text-embedding-004 (768-dim) | Firestore vector search | Cloud Functions 2nd gen | australia-southeast1 |
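To illustrate how one core/ pipeline can drive three backends, here is a rough sketch of the adapter pattern. Only the class names Embedder, Retriever, Generator and the function name build_prompt come from the tree above. The method names, the prompt wording, and the in-memory stand-ins are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import List

# Embedding sizes from the table above; the core logic never inspects them.
EMBED_DIMS = {"aws": 1024, "azure": 3072, "gcp": 768}


class Embedder(ABC):
    @abstractmethod
    def embed(self, text: str) -> List[float]: ...


class Retriever(ABC):
    @abstractmethod
    def retrieve(self, vector: List[float], top_k: int) -> List[str]: ...


class Generator(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...


def build_prompt(question: str, chunks: List[str]) -> str:
    """Hypothetical prompt layout: retrieved context first, question last."""
    context = "\n\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


def answer(question: str, embedder: Embedder, retriever: Retriever,
           generator: Generator, top_k: int = 5) -> str:
    """Cloud-agnostic query path: embed, retrieve, then generate."""
    vector = embedder.embed(question)
    chunks = retriever.retrieve(vector, top_k)
    return generator.generate(build_prompt(question, chunks))


# In-memory stand-ins so the sketch runs without any cloud account.
class FakeEmbedder(Embedder):
    def __init__(self, dim: int) -> None:
        self.dim = dim

    def embed(self, text: str) -> List[float]:
        return [0.0] * self.dim


class FakeRetriever(Retriever):
    def __init__(self, docs: List[str]) -> None:
        self.docs = docs

    def retrieve(self, vector: List[float], top_k: int) -> List[str]:
        return self.docs[:top_k]


class EchoGenerator(Generator):
    def generate(self, prompt: str) -> str:
        return prompt  # a real adapter would call the cloud's chat model here


print(answer("What is RAG?", FakeEmbedder(EMBED_DIMS["gcp"]),
             FakeRetriever(["doc-a", "doc-b", "doc-c"]), EchoGenerator(), top_k=2))
```

Swapping clouds then means passing a different trio of adapter objects to answer(), which stays unchanged.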

Screenshots

  • Azure AI Search: 100 documents indexed in mcrag-docs [screenshot]
  • Streamlit app (local): Azure RAG answering a question [screenshot]
  • Deployed on Hugging Face Spaces: Azure [screenshot]
  • Deployed on Hugging Face Spaces: GCP [screenshot]


Prerequisites

  • Python 3.12+
  • Terraform >= 1.7
  • Cloud CLIs: aws (configured), az (logged in), gcloud (logged in)
  • pip install -r requirements.txt for the Streamlit app

Copy the example environment file before deploying:

cp .env.example .env   # fill in values as you deploy each cloud

AWS Setup

1. Configure credentials

aws configure          # enter Access Key ID, Secret, region ap-southeast-2
# or use a named profile:
aws configure --profile mcrag
export AWS_PROFILE=mcrag

Required IAM permissions for the deploying user/role:

  • AdministratorAccess (easiest for initial deploy), or scoped to: iam:*, lambda:*, apigateway:*, glue:*, s3:*, aoss:*, bedrock:*, cloudwatch:*, events:*

2. Deploy infrastructure

cd aws/terraform
terraform init
terraform apply -var-file=envs/dev.tfvars

Creates: OpenSearch Serverless collection, Glue job, Lambda, API Gateway, CloudWatch alarms.

3. Populate .env

python aws/scripts/update_env.py

Sets OPENSEARCH_ENDPOINT, RAG_API_ENDPOINT, RAG_API_KEY.

4. Run ingest

Ingest runs automatically via EventBridge daily at 02:00 UTC.
To trigger it manually, go to AWS Console → Glue → Jobs → mcrag-* → Run.
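For reference, a daily 02:00 UTC schedule can be written in EventBridge cron syntax as shown in the constant below. Whether the repository's Terraform uses this exact expression is an assumption. The helper next_daily_run is a hypothetical utility for working out when the next ingest should fire.

```python
from datetime import datetime, time, timedelta, timezone

# EventBridge cron fields: minutes hours day-of-month month day-of-week year.
DAILY_0200_UTC_CRON = "cron(0 2 * * ? *)"


def next_daily_run(now: datetime, hour: int = 2) -> datetime:
    """Return the next HH:00 UTC strictly after `now` (must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), time(hour, 0), tzinfo=timezone.utc)
    if candidate <= now_utc:
        candidate += timedelta(days=1)  # today's slot has passed, use tomorrow's
    return candidate


print(next_daily_run(datetime.now(timezone.utc)).isoformat())
```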

5. Query

python scripts/query_cli.py --cloud aws --question "What are recent advances in LLMs?"
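If you want to call the deployed API directly instead of going through query_cli.py, the request could look roughly like the sketch below. API Gateway API keys are sent in the x-api-key header. The JSON payload shape ({"question": ...}) and the function name build_aws_query are assumptions, so check the Lambda handler for the real contract.

```python
import json
import urllib.request


def build_aws_query(endpoint: str, api_key: str, question: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the RAG API."""
    body = json.dumps({"question": question}).encode("utf-8")  # payload shape is assumed
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


# To send it, using the values from .env:
#   req = build_aws_query(os.environ["RAG_API_ENDPOINT"], os.environ["RAG_API_KEY"], "...")
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(resp.read().decode("utf-8"))
```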

AWS .env variables

aws/scripts/update_env.py writes the three deployment values automatically after terraform apply. The rest are defaults you can override in .env.

| Variable | Source | Description |
|---|---|---|
| OPENSEARCH_ENDPOINT | auto (update_env.py) | AOSS collection endpoint |
| RAG_API_ENDPOINT | auto (update_env.py) | API Gateway invoke URL |
| RAG_API_KEY | auto (update_env.py) | API Gateway API key |
| AWS_REGION | manual | AWS region (default: ap-southeast-2) |
| OPENSEARCH_INDEX | manual | Index name (default: rag-docs) |
| GENERATION_MODEL_ID | manual | Bedrock model (default: amazon.nova-pro-v1:0) |
| TOP_K | manual | Number of chunks to retrieve (default: 5) |
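The manual variables fall back to the defaults listed above when they are absent from .env. One way such a lookup could be written is sketched here. The AwsSettings class and load_aws_settings function are hypothetical names, not code from the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AwsSettings:
    region: str
    index: str
    generation_model_id: str
    top_k: int


def load_aws_settings(env: Mapping[str, str] = os.environ) -> AwsSettings:
    """Read the manual AWS variables, applying the documented defaults."""
    return AwsSettings(
        region=env.get("AWS_REGION", "ap-southeast-2"),
        index=env.get("OPENSEARCH_INDEX", "rag-docs"),
        generation_model_id=env.get("GENERATION_MODEL_ID", "amazon.nova-pro-v1:0"),
        top_k=int(env.get("TOP_K", "5")),  # env values are strings, so convert
    )


print(load_aws_settings({"TOP_K": "8"}))
```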

Azure Setup

1. Configure credentials

az login
az account set --subscription <subscription-id>
az account show   # confirm correct subscription

The deploying identity needs the Contributor role on the subscription (or resource group), plus Cognitive Services Contributor for AI Foundry model deployments.

Register required resource providers if not already enabled:

az provider register --namespace Microsoft.MachineLearningServices
az provider register --namespace Microsoft.CognitiveServices
az provider register --namespace Microsoft.Search
az provider register --namespace Microsoft.Web
az provider register --namespace Microsoft.Insights

2. Build the function package

The query function needs Linux wheels (packages with compiled .so extensions). Build them locally (works on Windows too):

python azure/scripts/build_function.py

This cross-compiles manylinux2014_x86_64 wheels and zips the function into azure/build/.
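As an illustration of how a package like this can be assembled on any host OS, the sketch below asks pip for prebuilt manylinux2014_x86_64 wheels and zips the result. It is not the contents of build_function.py. The function names and any paths you pass in are hypothetical.

```python
import sys
import zipfile
from pathlib import Path
from typing import List


def pip_command(requirements: Path, target: Path) -> List[str]:
    """pip invocation that fetches prebuilt Linux wheels into `target`."""
    return [
        sys.executable, "-m", "pip", "install",
        "--requirement", str(requirements),
        "--target", str(target),
        "--platform", "manylinux2014_x86_64",
        "--only-binary=:all:",  # pip requires this (or --no-deps) with --platform
    ]


def zip_directory(source: Path, archive: Path) -> int:
    """Zip every file under `source`; `archive` must live outside `source`."""
    count = 0
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                zf.write(path, path.relative_to(source).as_posix())
                count += 1
    return count


# Typical use (paths are placeholders):
#   subprocess.run(pip_command(Path("requirements.txt"), Path("staging")), check=True)
#   zip_directory(Path("staging"), Path("function.zip"))
```

With --platform set, pip selects wheels already built for that platform instead of compiling on the local machine, which is why the same command works on Windows.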

3. Deploy infrastructure

cd azure/terraform
terraform init
terraform apply

Creates: Resource Group, Storage Account, AI Foundry (text-embedding-3-large + gpt-4o), AI Search (Basic), Azure ML workspace + compute cluster, Azure Function App.

4. Populate .env

python azure/scripts/update_env.py

Sets AZURE_AI_FOUNDRY_ENDPOINT, AZURE_AI_FOUNDRY_KEY, AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_KEY, AZURE_FUNCTION_ENDPOINT, AZURE_FUNCTION_CODE.

5. Create the daily ingest schedule

python azure/scripts/create_schedule.py

Schedules the Azure ML job to run daily at 02:00 UTC, fetching arXiv papers and indexing them into AI Search.

6. Query

python scripts/query_cli.py --cloud azure --question "What are recent advances in LLMs?"
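When calling the Function App directly, a function-level key can be supplied as the code query parameter (Azure Functions also accepts it in the x-functions-key header). The helper below is a small sketch of attaching AZURE_FUNCTION_CODE to AZURE_FUNCTION_ENDPOINT. The function name and the example URL are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_function_code(endpoint: str, code: str) -> str:
    """Return `endpoint` with a single, URL-encoded `code` query parameter."""
    parts = urlsplit(endpoint)
    # Drop any existing code parameter so the key is never duplicated.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "code"]
    query.append(("code", code))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder host and route; function keys often end in "==" and need encoding.
print(with_function_code("https://example.azurewebsites.net/api/query", "abc/def=="))
```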

Azure .env variables

azure/scripts/update_env.py writes all deployment values automatically after terraform apply. The variables marked manual are defaults you can override in .env.

| Variable | Source | Description |
|---|---|---|
| AZURE_AI_FOUNDRY_ENDPOINT | auto (update_env.py) | Cognitive Services endpoint |
| AZURE_AI_FOUNDRY_KEY | auto (update_env.py) | Cognitive Services API key |
| AZURE_SEARCH_ENDPOINT | auto (update_env.py) | AI Search endpoint |
| AZURE_SEARCH_KEY | auto (update_env.py) | AI Search admin key |
| AZURE_FUNCTION_ENDPOINT | auto (update_env.py) | Function App URL |
| AZURE_FUNCTION_CODE | auto (update_env.py) | Function-level auth key |
| AZURE_EMBED_DEPLOYMENT | manual | Embedding deployment name (default: text-embedding-3-large) |
| AZURE_CHAT_DEPLOYMENT | manual | Chat deployment name (default: gpt-4o) |
| AZURE_SEARCH_INDEX | manual | Index name (default: mcrag-docs) |

GCP Setup

1. Authenticate

gcloud auth application-default login
gcloud config set project <your-project-id>

2. Deploy infrastructure

cd gcp/terraform
terraform init
terraform apply -var-file=envs/dev.tfvars

Before running terraform apply, edit gcp/terraform/envs/dev.tfvars to set your project_id.

Creates: Firestore (Native mode) + vector index, Vertex AI staging bucket, Cloud Function 2nd gen, service account + IAM bindings.

3. Populate .env

python gcp/scripts/update_env.py

Sets GCP_PROJECT, GCP_REGION, GCP_FIRESTORE_COLLECTION, GCP_FUNCTION_URL.

4. Run bulk ingest

python gcp/scripts/run_vertex_job.py --max-results 1000

Submits a Vertex AI Custom Training job that fetches arXiv papers, embeds them via text-embedding-004, and writes them to Firestore. Monitor at the GCP Console → Vertex AI → Training.
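The fetch, embed, and write stages described above can be sketched as a small batching loop. This is an illustration only: the callables embed_batch and write stand in for the Vertex AI and Firestore clients, and the batch size and document shape are assumptions.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


def ingest(
    texts: Iterable[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    write: Callable[[Dict[str, object]], None],
    batch_size: int = 16,
) -> int:
    """Embed texts in batches and hand each (text, vector) pair to `write`."""
    written = 0
    for batch in batched(texts, batch_size):
        vectors = embed_batch(batch)  # one embedding call per batch
        for text, vector in zip(batch, vectors):
            write({"text": text, "embedding": vector})
            written += 1
    return written


# Stand-ins: a 768-dim zero vector per text and an in-memory "collection".
store: List[Dict[str, object]] = []
count = ingest((f"abstract {i}" for i in range(40)),
               lambda batch: [[0.0] * 768 for _ in batch], store.append)
print(count, len(store))  # 40 40
```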

5. Query

python scripts/query_cli.py --cloud gcp --question "What are recent advances in LLMs?"

GCP .env variables

gcp/scripts/update_env.py writes all deployment values automatically after terraform apply.

| Variable | Source | Description |
|---|---|---|
| GCP_PROJECT | auto (update_env.py) | GCP project ID |
| GCP_REGION | auto (update_env.py) | Region (default: australia-southeast1) |
| GCP_FIRESTORE_COLLECTION | auto (update_env.py) | Firestore collection (default: mcrag-docs) |
| GCP_EMBED_MODEL | auto (update_env.py) | Embedding model (default: text-embedding-004) |
| GCP_CHAT_MODEL | auto (update_env.py) | Gemini model (default: gemini-2.5-flash) |
| GCP_FUNCTION_URL | auto (update_env.py) | Cloud Function invoke URL |

Hugging Face Spaces

The Streamlit app (app/) is deployed on HF Spaces as a public portfolio demo.

1. Create a Space

  1. Go to huggingface.co/new-space
  2. Name it (e.g. multi-cloud-rag-demo)
  3. Under SDK, select Docker → Streamlit
  4. Set visibility to Public
  5. Click Create Space

2. Configure .env

HF_TOKEN=hf_...                          # Settings → Access Tokens → Write token
HF_SPACE=<username>/multi-cloud-rag-demo  # <username>/<space-name>

3. Install dev dependencies

pip install -r requirements-dev.txt

4. Upload the app

python scripts/upload_to_hf_space.py

Reads HF_TOKEN and HF_SPACE from .env, uploads app/ to the Space repo. HF detects the push and rebuilds automatically (1–2 minutes).
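A rough sketch of what such an upload step might involve is shown below. It is not the repository's script. parse_env_file is a hypothetical stdlib-only reader for simple KEY=value lines, and the upload uses huggingface_hub's HfApi.upload_folder with repo_type="space", imported lazily so the parser can be used without the package installed.

```python
from pathlib import Path
from typing import Dict


def parse_env_file(path: Path) -> Dict[str, str]:
    """Read KEY=value lines, skipping blanks, comments, and trailing ' # ...' notes."""
    values: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip().strip('"').strip("'")
        values[key.strip()] = value
    return values


def upload_app(env_path: Path = Path(".env"), folder: str = "app") -> None:
    """Push `folder` to the Space named by HF_SPACE, authenticating with HF_TOKEN."""
    from huggingface_hub import HfApi  # requires the huggingface_hub package

    env = parse_env_file(env_path)
    HfApi(token=env["HF_TOKEN"]).upload_folder(
        folder_path=folder,
        repo_id=env["HF_SPACE"],
        repo_type="space",
    )
```

This simple parser truncates any value that contains a space followed by #, which is acceptable for tokens and repo IDs but not for arbitrary strings.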

5. Add secrets

Go to your Space → Settings → Variables and secrets and add:

| Secret | Value | Used by |
|---|---|---|
| RAG_API_ENDPOINT | API Gateway invoke URL | pages/1_AWS.py |
| RAG_API_KEY | API Gateway API key | pages/1_AWS.py |
| AZURE_FUNCTION_ENDPOINT | Azure Function URL | pages/2_Azure.py |
| AZURE_FUNCTION_CODE | Azure Function auth key | pages/2_Azure.py |
| GCP_FUNCTION_URL | Cloud Function URL | pages/3_GCP.py |

Click Restart Space after adding secrets.

6. Verify

Open https://huggingface.co/spaces/<username>/multi-cloud-rag-demo:

  • Each cloud page shows a green "Endpoint configured" notice when its secret is set
  • Type a question and press Enter — the app queries the cloud backend and displays the answer with arXiv source IDs
  • Pages without secrets set show a warning and stop gracefully
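The per-page behaviour in the list above can be modelled as a lookup of required secrets. The mapping below mirrors the secrets table in step 5. The function name missing_secrets and the Streamlit calls shown in the comments are an assumed illustration, not the pages' actual code.

```python
from typing import Dict, List, Mapping

PAGE_SECRETS: Dict[str, List[str]] = {
    "pages/1_AWS.py": ["RAG_API_ENDPOINT", "RAG_API_KEY"],
    "pages/2_Azure.py": ["AZURE_FUNCTION_ENDPOINT", "AZURE_FUNCTION_CODE"],
    "pages/3_GCP.py": ["GCP_FUNCTION_URL"],
}


def missing_secrets(page: str, env: Mapping[str, str]) -> List[str]:
    """Return the secrets a page needs that are unset or empty in `env`."""
    return [name for name in PAGE_SECRETS[page] if not env.get(name)]


# Inside a Streamlit page this could drive the notices described above:
#   missing = missing_secrets("pages/1_AWS.py", os.environ)
#   if missing:
#       st.warning(f"Missing secrets: {', '.join(missing)}")
#       st.stop()
#   st.success("Endpoint configured")
print(missing_secrets("pages/2_Azure.py", {"AZURE_FUNCTION_ENDPOINT": "https://example"}))
```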

HF .env variables

| Variable | Description |
|---|---|
| HF_TOKEN | HF write token (hf_...) |
| HF_SPACE | Space repo ID (e.g. <username>/multi-cloud-rag-demo) |

Local development

pip install -r requirements.txt
streamlit run app/Home.py

The app reads from .env automatically. Each cloud page is independent: pages for clouds without .env values set show a warning.

About

Multi-cloud serverless RAG pipeline — ingest arXiv papers into AWS, Azure or GCP and query them in plain English via Hugging Face Spaces
