A modular gRPC-based microservices system that simulates a music streaming analytics pipeline, similar to real-world platforms like Spotify or YouTube Music. This project demonstrates distributed processing, inter-service communication, and containerized deployment using Docker.
This system consists of three gRPC services and a client orchestrator:
| Service | Description | Port |
|---|---|---|
| MapReduceService | Aggregates raw streaming data (song plays) into per-song counts (like total plays per artist/song). | 50051 |
| UserBehaviorService | Analyzes user-level statistics such as total listening time and favorite artist. | 50053 |
| RecommendationService | Uses play counts and user behavior to generate personalized recommendations and trending lists. | 50055 |
| Client | Orchestrates calls to all services, prints analytics results, and saves performance metrics. | โ |
All services communicate using Protocol Buffers (protobuf) definitions found under grpc/generated/.
Client โ MapReduceService โ UserBehaviorService โ RecommendationService โ Client
- Client loads dataset (
data/stream_data.csv). - Sends the data to MapReduceService, which:
- Counts how many times each song was played (
artist - song_id). - Returns aggregated play counts and metrics.
- Counts how many times each song was played (
- Sends the same dataset to UserBehaviorService, which:
- Calculates total listening time per user.
- Determines each userโs top artist.
- Lists top 5 most active users.
- Sends results to RecommendationService, which:
- Identifies trending songs (global top 5).
- Recommends songs not from a userโs top artist.
- Client aggregates all results and writes detailed JSON metrics to
results/run_grpc_metrics.json.
| Category | Details |
|---|---|
| Concurrency | Each gRPC service uses ThreadPoolExecutor for parallel processing. |
| Metrics & Timing | Each service records processing time and outputs JSON metrics. |
| Isolation | Each service runs independently and communicates via defined protobuf schemas. |
| Containerization | All services are Dockerized and orchestrated using docker-compose. |
| Scalable | Can be easily extended for new analytics or recommendation models. |
You can also run all services manually using Python, simulating the distributed setup.
Ensure you have Python 3.8+ installed. Then run:
pip install -r requirements.txtThis installs required packages such as grpcio, protobuf, and grpcio-tools.
If not already generated, create the Python gRPC bindings using:
python generate_proto.pyThis script compiles music_service.proto into music_service_pb2.py and music_service_pb2_grpc.py under grpc/generated/.
Each service must be started in a separate terminal to simulate a distributed microservices environment.
cd grpc/server
python mapreduce_stream_service.pyExpected Output:
[MapReduce] gRPC server started on port 50051
cd grpc/server
python user_behavior_service.pyExpected Output:
[UserBehavior] gRPC server started on port 50053
cd grpc/server
python recommendation_service.pyExpected Output:
[Recommendation] gRPC server started on port 50055
Once all services are running, open a new terminal and execute:
cd grpc/client
python client.pyThis client will:
- Load streaming data from
data/stream_data.csv - Send it sequentially to all 3 services
- Print detailed analytics to the terminal
- Save the results to
results/run_grpc_metrics.json
After completion, open:
results/run_grpc_metrics.json
This file contains complete performance timings and analysis output for all services.
Each service writes runtime metrics to /tmp/ (inside container) and the client aggregates them into a final summary JSON.
{
"processing_time": 0.342,
"count_keys": 142,
"num_users": 55,
"num_trending": 5
}{
"workflow": "Client โ MapReduce โ UserBehavior โ Recommendation โ Client",
"performance": {
"mapreduce_time": 0.34,
"userbehavior_time": 0.28,
"recommendation_time": 0.19,
"total_workflow_time": 0.81
},
"mapreduce_results": {
"top_songs": {"Artist1 - SongA": 53, "Artist2 - SongB": 41}
},
"userbehavior_results": {
"top_users": ["U1", "U2", "U3"]
},
"recommendation_results": {
"trending_songs": ["Artist2 - SongB", "Artist3 - SongC"]
}
}Each service has its own Dockerfile, defined in the docker/ folder, and the setup is orchestrated via docker-compose.grpc.yml.
docker-compose -f docker/docker-compose.grpc.yml up --buildThis command starts the following containers:
- grpc-mapreduce (port
50051) - grpc-userbehavior (port
50053) - grpc-recommendation (port
50055)
All containers are connected on the shared grpc-network bridge.
Once the services are running, execute:
cd grpc/client
python client.pycst352-main/
โโโ data/
โ โโโ stream_data.csv
โโโ docker/
โ โโโ Dockerfile.grpc.mapreduce
โ โโโ Dockerfile.grpc.userbehavior
โ โโโ Dockerfile.grpc.recommendation
โ โโโ docker-compose.grpc.yml
โโโ grpc/
โ โโโ client/
โ โ โโโ main_client.py
โ โโโ server/
โ โ โโโ mapreduce_server.py
โ โ โโโ userbehavior_server.py
โ โ โโโ recommendation_server.py
โ โโโ proto/
โ โ โโโ music_service.proto
โ โโโ generated/
โ โโโ music_service_pb2.py
โ โโโ music_service_pb2_grpc.py
โโโ services/
โโโ requirements.txt
โโโ generate_proto.py
Each service writes runtime metrics to /tmp/ (inside container) and the client aggregates them into a final summary JSON.
{
"processing_time": 0.342,
"count_keys": 142,
"num_users": 55,
"num_trending": 5
}{
"workflow": "Client โ MapReduce โ UserBehavior โ Recommendation โ Client",
"performance": {
"mapreduce_time": 0.34,
"userbehavior_time": 0.28,
"recommendation_time": 0.19,
"total_workflow_time": 0.81
},
"mapreduce_results": {
"top_songs": {"Artist1 - SongA": 53, "Artist2 - SongB": 41}
},
"userbehavior_results": {
"top_users": ["U1", "U2", "U3"]
},
"recommendation_results": {
"trending_songs": ["Artist2 - SongB", "Artist3 - SongC"]
}
}| Aspect | Rating | Comment |
|---|---|---|
| Architecture | โญโญโญโญโ (4.5/5) | Well-structured microservice pattern with clear gRPC communication. |
| Code Quality | โญโญโญโญโ | Modular, clean, and uses concurrency effectively. |
| Scalability | โญโญโญโญโ | Each service is independent and easily deployable. |
| Error Handling | โญโญโญโ | Could improve error propagation from gRPC layers. |
| Documentation | โญโญโญ | Functional; this README provides the missing overview. |
- Add a REST gateway or frontend dashboard for visualization.
- Use persistent storage (e.g., PostgreSQL or Redis) for historical analytics.
- Implement advanced recommendation models using ML libraries.
- Add Prometheus/Grafana for live performance monitoring.
To run locally without Docker:
pip install -r requirements.txt
python generate_proto.py
python grpc/server/mapreduce_server.py
python grpc/server/userbehavior_server.py
python grpc/server/recommendation_server.py
python grpc/client/main_client.pyThis project is for academic and demonstration purposes under the CST352 course. All rights reserved by the original author(s).
- Original repository:
nandanjunior/cst352