🎵 Music Streaming Analytics (gRPC Microservices)

A modular gRPC-based microservices system that simulates a music streaming analytics pipeline, similar to real-world platforms like Spotify or YouTube Music. This project demonstrates distributed processing, inter-service communication, and containerized deployment using Docker.

🧩 High-Level Architecture

This system consists of three gRPC services and a client orchestrator:

| Service | Description | Port |
|---|---|---|
| MapReduceService | Aggregates raw streaming data (song plays) into per-song play counts, keyed as `artist - song_id`. | `50051` |
| UserBehaviorService | Analyzes user-level statistics such as total listening time and favorite artist. | `50053` |
| RecommendationService | Uses play counts and user behavior to generate personalized recommendations and trending lists. | `50055` |
| Client | Orchestrates calls to all services, prints analytics results, and saves performance metrics. | — |
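
The name MapReduceService suggests a map, shuffle, and reduce decomposition of the counting job. Below is a minimal pure-Python sketch of that idea. It assumes each play is a dict with hypothetical `artist` and `song_id` fields. The service's real request and response types live in the generated protobuf code and are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shape; the real CSV/protobuf fields may differ.
Record = Dict[str, str]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, int]]:
    """Emit one (key, 1) pair per play, keyed as 'artist - song_id'."""
    for row in records:
        yield f"{row['artist']} - {row['song_id']}", 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a MapReduce shuffle would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped values to get total plays per song."""
    return {key: sum(values) for key, values in groups.items()}


def count_plays(records: Iterable[Record]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    sample = [
        {"artist": "Artist1", "song_id": "SongA"},
        {"artist": "Artist2", "song_id": "SongB"},
        {"artist": "Artist1", "song_id": "SongA"},
    ]
    print(count_plays(sample))  # {'Artist1 - SongA': 2, 'Artist2 - SongB': 1}
```

Keeping the three phases as separate functions makes it easy to move the map step onto several workers later while leaving the reduce step unchanged.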

All services communicate using Protocol Buffers (protobuf) messages defined in grpc/proto/music_service.proto. The generated Python bindings live under grpc/generated/.

⚙️ Pipeline Overview

Client → MapReduceService → UserBehaviorService → RecommendationService → Client

1. The client loads the dataset (data/stream_data.csv).
2. It sends the data to MapReduceService, which:
   - Counts how many times each song was played, keyed as `artist - song_id`.
   - Returns the aggregated play counts and metrics.
3. It sends the same dataset to UserBehaviorService, which:
   - Calculates the total listening time per user.
   - Determines each user’s top artist.
   - Lists the top 5 most active users.
4. It sends the results of both services to RecommendationService, which:
   - Identifies trending songs (the global top 5).
   - Recommends songs that are not by the user’s top artist.
5. The client aggregates all results and writes detailed JSON metrics to results/run_grpc_metrics.json.
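
The sketch below covers the user-behavior and recommendation steps above as plain functions, without the gRPC layer. Several details are assumptions for illustration, not the project's confirmed logic: the row fields (`user_id`, `artist`, `song_id`, and `duration` in seconds), the alphabetical tie-breaking, and the rule of recommending the most-played songs by other artists.

```python
from collections import Counter, defaultdict
from typing import Dict, List

# Hypothetical row shape: user_id, artist, song_id, duration (seconds, as text).
Row = Dict[str, str]


def user_behavior(rows: List[Row], top_n: int = 5) -> dict:
    """Total listening time and top artist per user, plus the most active users."""
    listening: Dict[str, float] = defaultdict(float)
    artist_plays: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        listening[row["user_id"]] += float(row["duration"])
        artist_plays[row["user_id"]][row["artist"]] += 1
    # Highest play count wins; ties are broken alphabetically for determinism.
    top_artist = {
        user: min(counts, key=lambda a: (-counts[a], a))
        for user, counts in artist_plays.items()
    }
    top_users = sorted(listening, key=lambda u: (-listening[u], u))[:top_n]
    return {
        "listening_time": dict(listening),
        "top_artist": top_artist,
        "top_users": top_users,
    }


def recommend(play_counts: Dict[str, int], top_artist: Dict[str, str], k: int = 5) -> dict:
    """Global top-5 trending songs, plus per-user picks by other artists."""
    ranked = sorted(play_counts, key=lambda s: (-play_counts[s], s))
    recommendations = {
        # Keys look like 'artist - song_id'; this split assumes artist names
        # never contain ' - ' themselves.
        user: [s for s in ranked if s.split(" - ", 1)[0] != artist][:k]
        for user, artist in top_artist.items()
    }
    return {"trending": ranked[:5], "recommendations": recommendations}


if __name__ == "__main__":
    rows = [
        {"user_id": "U1", "artist": "Artist1", "song_id": "SongA", "duration": "200"},
        {"user_id": "U1", "artist": "Artist1", "song_id": "SongA", "duration": "180"},
        {"user_id": "U2", "artist": "Artist2", "song_id": "SongB", "duration": "240"},
    ]
    stats = user_behavior(rows)
    print(stats["top_users"])  # ['U1', 'U2']
    counts = {"Artist1 - SongA": 2, "Artist2 - SongB": 1}
    print(recommend(counts, stats["top_artist"])["recommendations"])
    # {'U1': ['Artist2 - SongB'], 'U2': ['Artist1 - SongA']}
```

Because `recommend` takes the play counts and top artists as inputs rather than recomputing them, it mirrors the pipeline's order: MapReduce and user-behavior results are produced first and then handed to the recommendation step.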

🧠 Key Features

| Category | Details |
|---|---|
| Concurrency | Each gRPC service uses `ThreadPoolExecutor` for concurrent processing. |
| Metrics & Timing | Each service records its processing time and outputs JSON metrics. |
| Isolation | Each service runs independently and communicates via defined protobuf schemas. |
| Containerization | All services are Dockerized and orchestrated using `docker-compose`. |
| Extensibility | Can be extended with new analytics or recommendation models. |
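
In gRPC Python, a server is commonly created with `grpc.server(futures.ThreadPoolExecutor(max_workers=N))`, so the pool handles incoming RPCs concurrently. The README does not say whether the services also split their own work across threads. The sketch below illustrates that second use: it chunks the song keys, counts each chunk in the pool, and merges the partial counts. One caveat applies. In CPython, threads share the GIL, so pure-Python counting gains concurrency rather than true parallelism, and `ProcessPoolExecutor` is the usual alternative for CPU-bound work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def count_chunk(keys: Sequence[str]) -> Counter:
    """Count song keys within one chunk of the dataset."""
    return Counter(keys)


def parallel_count(keys: Sequence[str], workers: int = 4) -> Counter:
    """Split the input into chunks, count each in a thread pool, then merge."""
    if not keys:
        return Counter()
    size = -(-len(keys) // workers)  # ceiling division so no row is dropped
    chunks: List[Sequence[str]] = [keys[i:i + size] for i in range(0, len(keys), size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in chunk order; Counter.update adds counts.
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)
    return total
```

For example, `parallel_count(["Artist1 - SongA", "Artist2 - SongB", "Artist1 - SongA"], workers=2)` gives the same counts as a single `Counter` over the whole list. Only the way the work is divided changes.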

🧭 Running Locally (Without Docker)

You can also run all services manually using Python, simulating the distributed setup.

Step 1: Install Dependencies

Ensure you have Python 3.8+ installed. Then run:

pip install -r requirements.txt

This installs required packages such as grpcio, protobuf, and grpcio-tools.

Step 2: Generate gRPC Code from .proto File

If not already generated, create the Python gRPC bindings using:

python generate_proto.py

This script compiles grpc/proto/music_service.proto into music_service_pb2.py (message classes) and music_service_pb2_grpc.py (client stubs and server base classes) under grpc/generated/.
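
The contents of generate_proto.py are not shown here. If it wraps the grpcio-tools compiler, a minimal version could look like the sketch below. The paths come from the folder structure below, and `build_protoc_args` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# Paths assumed from the repository's folder structure.
PROTO_DIR = Path("grpc/proto")
OUT_DIR = Path("grpc/generated")
PROTO_FILE = PROTO_DIR / "music_service.proto"


def build_protoc_args(proto_dir: Path, out_dir: Path, proto_file: Path) -> List[str]:
    """Arguments in the form grpc_tools.protoc.main() expects (argv[0] first)."""
    return [
        "grpc_tools.protoc",
        f"-I{proto_dir}",
        f"--python_out={out_dir}",       # writes music_service_pb2.py
        f"--grpc_python_out={out_dir}",  # writes music_service_pb2_grpc.py
        str(proto_file),
    ]


def main() -> int:
    # Imported here so the argument builder can be used without grpcio-tools.
    from grpc_tools import protoc

    OUT_DIR.mkdir(parents=True, exist_ok=True)
    return protoc.main(build_protoc_args(PROTO_DIR, OUT_DIR, PROTO_FILE))
```

To use it as a script, call `sys.exit(main())` under an `if __name__ == "__main__":` guard. The equivalent one-off command is `python -m grpc_tools.protoc -Igrpc/proto --python_out=grpc/generated --grpc_python_out=grpc/generated grpc/proto/music_service.proto`.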

Step 3: Start All gRPC Services

Each service must be started in a separate terminal to simulate a distributed microservices environment.

🧮 Terminal 1 — MapReduce Service

cd grpc/server
python mapreduce_stream_service.py

Expected Output:

[MapReduce] gRPC server started on port 50051

👥 Terminal 2 — UserBehavior Service

cd grpc/server
python user_behavior_service.py

Expected Output:

[UserBehavior] gRPC server started on port 50053

🎧 Terminal 3 — Recommendation Service

cd grpc/server
python recommendation_service.py

Expected Output:

[Recommendation] gRPC server started on port 50055
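
Before starting the client, you can confirm that all three servers are listening. The minimal stdlib sketch below tries a TCP connection to each port from the service table, assuming the host is `localhost`. A successful connection only shows that something is listening on the port. If the services implement it, the standard gRPC health checking protocol is the more thorough check.

```python
import socket
from typing import Dict

# Ports from the service table; the host is assumed to be localhost.
SERVICES = {"MapReduce": 50051, "UserBehavior": 50053, "Recommendation": 50055}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unresolvable host
        return False


def check_services(host: str = "localhost", services: Dict[str, int] = SERVICES) -> Dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:15} {'up' if up else 'DOWN'}")
```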

Step 4: Run the Client

Once all services are running, open a new terminal and execute:

cd grpc/client
python client.py

This client will:

- Load streaming data from data/stream_data.csv
- Send it sequentially to all three services
- Print detailed analytics to the terminal
- Save the results to results/run_grpc_metrics.json
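
The column names of data/stream_data.csv are not documented here. The sketch below shows one way to write the loading step. It assumes hypothetical columns `user_id`, `artist`, `song_id`, and `duration`, and it fails early if any of them are missing, so a schema mismatch surfaces before any RPC is sent.

```python
import csv
from typing import Dict, Iterable, List

# Hypothetical column names; check the header of data/stream_data.csv.
REQUIRED_COLUMNS = {"user_id", "artist", "song_id", "duration"}


def load_stream_data(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Read CSV rows as dicts and raise if an expected column is missing."""
    reader = csv.DictReader(lines)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)
```

`csv.DictReader` accepts any iterable of lines, so you can pass either an open file or a list of strings. For the real dataset, open the file with `newline=""` as the `csv` module documentation recommends, for example `with open("data/stream_data.csv", newline="") as f: rows = load_stream_data(f)`.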

Step 5: View Results

After completion, open:

results/run_grpc_metrics.json

This file contains complete performance timings and analysis output for all services.

📊 Output Metrics

Each service writes runtime metrics to /tmp/ (inside its container), and the client aggregates them into a final summary JSON.

Example Service Metrics:

{
  "processing_time": 0.342,
  "count_keys": 142,
  "num_users": 55,
  "num_trending": 5
}
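
The sketch below shows how a service could produce metrics in this shape. It times the work with `time.perf_counter()`, merges in service-specific fields, and writes JSON to the system temp directory (`/tmp` on most Linux containers). The file name `mapreduce_metrics.json` and the `describe` callback are hypothetical.

```python
import json
import os
import tempfile
import time
from typing import Any, Callable, Dict, Tuple

# /tmp on most Linux containers; the file name below is a hypothetical example.
METRICS_DIR = tempfile.gettempdir()


def run_with_metrics(
    work: Callable[[], Any],
    describe: Callable[[Any], Dict[str, Any]],
    path: str = os.path.join(METRICS_DIR, "mapreduce_metrics.json"),
) -> Tuple[Any, Dict[str, Any]]:
    """Time `work`, add service-specific fields from `describe`, and save as JSON."""
    start = time.perf_counter()
    result = work()
    elapsed = time.perf_counter() - start
    metrics = {"processing_time": round(elapsed, 3), **describe(result)}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(metrics, f, indent=2)
    return result, metrics
```

For example, a MapReduce-style service might call `run_with_metrics(lambda: counts, lambda c: {"count_keys": len(c)})`. Each service passes its own `describe` function, which is how keys such as `num_users` or `num_trending` would end up in its file.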

Final Aggregated Output (`results/run_grpc_metrics.json`):

{
  "workflow": "Client → MapReduce → UserBehavior → Recommendation → Client",
  "performance": {
    "mapreduce_time": 0.34,
    "userbehavior_time": 0.28,
    "recommendation_time": 0.19,
    "total_workflow_time": 0.81
  },
  "mapreduce_results": {
    "top_songs": {"Artist1 - SongA": 53, "Artist2 - SongB": 41}
  },
  "userbehavior_results": {
    "top_users": ["U1", "U2", "U3"]
  },
  "recommendation_results": {
    "trending_songs": ["Artist2 - SongB", "Artist3 - SongC"]
  }
}
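
The sketch below shows how the client could assemble a summary in the shape shown above. It assumes each stage is a function wrapping one gRPC call (the stubs themselves are omitted). Each stage receives the dataset plus the results of earlier stages, so the recommendation step can read the MapReduce and user-behavior output.

```python
import json
import time
from typing import Any, Callable, Dict, List, Tuple

# A stage receives the raw dataset and the results of earlier stages.
Stage = Callable[[Any, Dict[str, Any]], Any]


def run_workflow(data: Any, stages: List[Tuple[str, str, Stage]]) -> Dict[str, Any]:
    """Run (key, label, stage) entries in order, timing each, and build the summary."""
    performance: Dict[str, float] = {}
    outputs: Dict[str, Any] = {}
    total_start = time.perf_counter()
    for key, _label, stage in stages:
        start = time.perf_counter()
        outputs[key] = stage(data, outputs)
        performance[f"{key}_time"] = round(time.perf_counter() - start, 2)
    performance["total_workflow_time"] = round(time.perf_counter() - total_start, 2)
    labels = " → ".join(label for _key, label, _stage in stages)
    summary: Dict[str, Any] = {"workflow": f"Client → {labels} → Client", "performance": performance}
    summary.update({f"{key}_results": value for key, value in outputs.items()})
    return summary


def save_summary(summary: Dict[str, Any], path: str) -> None:
    """Write the summary as UTF-8 JSON, keeping the arrows readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2, ensure_ascii=False)
```

In a real client, each stage function would call a generated stub from music_service_pb2_grpc. Here any Python callable works, which keeps the timing and aggregation logic testable without the servers running.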

🐳 Docker Setup

Each service has its own Dockerfile, defined in the docker/ folder, and the setup is orchestrated via docker-compose.grpc.yml.

Run the system with Docker Compose:

docker-compose -f docker/docker-compose.grpc.yml up --build

This command starts the following containers:

- grpc-mapreduce (port 50051)
- grpc-userbehavior (port 50053)
- grpc-recommendation (port 50055)

All containers are connected on the shared grpc-network bridge.

Run client locally:

Once the services are running, execute:

cd grpc/client
python client.py
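
When the client runs on the host, it reaches the containers through their published ports on `localhost`. A client running inside `grpc-network` would instead address them by name, since Docker's embedded DNS resolves container and service names on user-defined networks. The small sketch below builds `host:port` targets, the form `grpc.insecure_channel()` accepts. The environment variables `MAPREDUCE_HOST`, `USERBEHAVIOR_HOST`, and `RECOMMENDATION_HOST` are hypothetical and override the `localhost` default.

```python
import os
from typing import Dict, Mapping, Optional

# Default ports from the service table; the *_HOST variable names are hypothetical.
DEFAULT_PORTS = {"mapreduce": 50051, "userbehavior": 50053, "recommendation": 50055}


def service_addresses(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build 'host:port' targets, defaulting to localhost for a locally run client."""
    env = os.environ if env is None else env
    return {
        name: f"{env.get(name.upper() + '_HOST', 'localhost')}:{port}"
        for name, port in DEFAULT_PORTS.items()
    }
```

For example, setting `MAPREDUCE_HOST=grpc-mapreduce` would point the MapReduce channel at that container. This assumes `grpc-mapreduce` is the Compose service or container name on the shared network.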

📁 Folder Structure

cst352-main/
├── data/
│   └── stream_data.csv
├── docker/
│   ├── Dockerfile.grpc.mapreduce
│   ├── Dockerfile.grpc.userbehavior
│   ├── Dockerfile.grpc.recommendation
│   └── docker-compose.grpc.yml
├── grpc/
│   ├── client/
│   │   └── main_client.py
│   ├── server/
│   │   ├── mapreduce_server.py
│   │   ├── userbehavior_server.py
│   │   └── recommendation_server.py
│   ├── proto/
│   │   └── music_service.proto
│   └── generated/
│       ├── music_service_pb2.py
│       └── music_service_pb2_grpc.py
├── services/
├── requirements.txt
└── generate_proto.py

🔍 Evaluation Summary

| Aspect | Rating | Comment |
|---|---|---|
| Architecture | ⭐⭐⭐⭐☆ (4.5/5) | Well-structured microservice pattern with clear gRPC communication. |
| Code Quality | ⭐⭐⭐⭐☆ | Modular, clean, and uses concurrency effectively. |
| Scalability | ⭐⭐⭐⭐☆ | Each service is independent and easily deployable. |
| Error Handling | ⭐⭐⭐☆☆ | Could improve error propagation from gRPC layers. |
| Documentation | ⭐⭐⭐☆☆ | Functional; this README provides the missing overview. |

🚀 Future Improvements

- Add a REST gateway or frontend dashboard for visualization.
- Use persistent storage (e.g., PostgreSQL or Redis) for historical analytics.
- Implement advanced recommendation models using ML libraries.
- Add Prometheus/Grafana for live performance monitoring.

📦 Requirements

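To run locally without Docker, run these commands from the repository root. Start each server in its own terminal, since each one keeps running in the foreground: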

pip install -r requirements.txt
python generate_proto.py
python grpc/server/mapreduce_server.py
python grpc/server/userbehavior_server.py
python grpc/server/recommendation_server.py
python grpc/client/main_client.py

🧾 License

🧑‍💻 Authors

Original repository: nandanjunior/cst352
