Data Engineering specialization capstone: deployable analytics web service exposing BigQuery-backed satellite classification data and serving in-database ML predictions through a Flask API on Google Cloud Platform.
Analytics Applications Engineering (Data Engineering Specialization Capstone)
Northwestern University M.S. in Data Science, Data Engineering specialization: the culminating specialization deliverable, a production-style client-server analytics application with REST endpoints, cloud data warehouse integration, BigQuery ML inference, automated testing and linting, and CI/CD deployment to Google App Engine.
- Web service design (Flask): HTTP routes for a health check (`/`), read-only analytics export (`GET /get-model`), and CSV upload with a prediction response (`POST /satellite-model`); JSON responses via `app.response_class`
- Cloud data warehouse integration (BigQuery): `google-cloud-bigquery` client; query the existing `sat_analytics` table; load uploaded CSVs into ephemeral tables with schema autodetection, `WRITE_TRUNCATE`, and timestamped table names
- In-database ML inference: `ML.PREDICT` against the pre-trained BigQuery model `sample_model` over the uploaded feature columns (`Aattr` through `F30attr`, with `class` as the label); return prediction rows as JSON
- Packaging and local execution: `gunicorn` entrypoint for App Engine Flex; `run.sh` for local development with `GOOGLE_APPLICATION_CREDENTIALS`
- Software engineering workflow: `Makefile` targets for `venv`, `install`, `pytest` with coverage, and `pylint`; reusable `myrepolib` module pattern for importable library code and unit tests
- CI/CD (CircleCI): dependency cache, `make test`, `make lint`, and an authenticated `gcloud app deploy` on merge
- Multi-service deployment: separate App Engine service configs for `default`, `dev`, and `production` (`app.yaml`, `servicedev.yaml`, `serviceprod.yaml`); `deploy.sh` deploys all three
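The upload-and-predict path in the list above can be sketched with two pure-Python helpers. One builds a timestamped, fully qualified ID for the ephemeral upload table. The other builds the `ML.PREDICT` statement that scores that table. This is a minimal sketch under stated assumptions, not the code in `ml_satellite.py`. The project and dataset names (`my-project`, `sat_dataset`), the `upload_` prefix, the timestamp format, and the helper names are all hypothetical; only `sample_model` comes from the description.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Conservative pattern for one component of project.dataset.table.
# Hyphens are allowed because GCP project IDs may contain them.
_ID_PART = re.compile(r"[A-Za-z0-9_-]+")


def _validate_qualified_id(qualified_id: str) -> str:
    """Reject anything that is not exactly three safe, dot-separated parts."""
    parts = qualified_id.split(".")
    if len(parts) != 3 or not all(_ID_PART.fullmatch(p) for p in parts):
        raise ValueError(f"unsafe or malformed BigQuery ID: {qualified_id!r}")
    return qualified_id


def ephemeral_table_id(
    project: str,
    dataset: str,
    prefix: str = "upload",  # hypothetical prefix for upload tables
    now: Optional[datetime] = None,
) -> str:
    """Build project.dataset.<prefix>_<UTC timestamp> for a one-off upload table."""
    if now is None:
        now = datetime.now(timezone.utc)
    table = f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}"
    return _validate_qualified_id(f"{project}.{dataset}.{table}")


def predict_sql(model_id: str, table_id: str) -> str:
    """Return an ML.PREDICT statement that scores every row of table_id.

    Table and model names cannot be passed as BigQuery query parameters,
    so both are validated before being interpolated into the SQL text.
    """
    model_id = _validate_qualified_id(model_id)
    table_id = _validate_qualified_id(table_id)
    return (
        f"SELECT * FROM ML.PREDICT(MODEL `{model_id}`, "
        f"(SELECT * FROM `{table_id}`))"
    )


if __name__ == "__main__":
    fixed = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
    table = ephemeral_table_id("my-project", "sat_dataset", now=fixed)
    print(table)  # my-project.sat_dataset.upload_20240102_030405
    print(predict_sql("my-project.sat_dataset.sample_model", table))
```

In the service, the table ID would presumably serve as the destination of `Client.load_table_from_file(...)`, configured with `LoadJobConfig(source_format="CSV", autodetect=True, write_disposition="WRITE_TRUNCATE")`. The generated SQL would then be passed to `Client.query(...)`.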
Project domain: a satellite analytics classification use case; `model.json` holds sample feature records aligned with the model's input schema.
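Because `model.json` mirrors the model's input schema, an upload handler could check a CSV header against it before loading anything into BigQuery. The sketch below rests on hypothetical assumptions: `model.json` is a JSON array of flat objects keyed by column name, and `class` is the only non-feature column. The helper names and sample values are placeholders, and the repository may not perform this check at all.

```python
import csv
import io
import json
from typing import List


def expected_columns(model_json_text: str, label: str = "class") -> List[str]:
    """Derive feature column names from the first sample record, minus the label.

    Assumes (hypothetically) a JSON array of flat objects keyed by column name.
    """
    records = json.loads(model_json_text)
    if not records:
        raise ValueError("model.json contains no sample records")
    return [name for name in records[0] if name != label]


def missing_features(csv_text: str, features: List[str]) -> List[str]:
    """Return the feature columns absent from the CSV header, in schema order."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    present = {column.strip() for column in header}
    return [feature for feature in features if feature not in present]


if __name__ == "__main__":
    # Placeholder record: only the Aattr/F30attr/class names come from the README.
    sample = json.dumps([{"Aattr": 0.1, "F30attr": 0.7, "class": "c1"}])
    features = expected_columns(sample)
    print(features)  # ['Aattr', 'F30attr']
    print(missing_features("Aattr,class\n0.2,c1\n", features))  # ['F30attr']
```

Rejecting a malformed upload early (for example with an HTTP 400) avoids creating an ephemeral table that `ML.PREDICT` would likely reject for missing feature columns.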
| Layer | Tools |
|---|---|
| Language | Python 3 |
| Web framework | Flask, gunicorn |
| Cloud platform | Google Cloud Platform (App Engine Flex, BigQuery, BigQuery ML) |
| Data / ML | google-cloud-bigquery, BigQuery ML.PREDICT |
| Testing / quality | pytest, pytest-cov, pylint, nbval |
| CI/CD | CircleCI (google/cloud-sdk image), gcloud app deploy |
| Config / ops | YAML (app.yaml, service variants), Shell (deploy.sh, run.sh), Makefile |
| Library pattern | myrepolib (importable package + tests) |
Analytics-Applications-Engineering/
├── main.py # Flask app and route definitions
├── ml_satellite.py # BigQuery query, CSV load, ML.PREDICT logic
├── myrepolib/
│ └── main.py # Reusable library stubs (requests helper, test hooks)
├── tests/
│ └── test_main.py # Unit tests for myrepolib
├── app.yaml # App Engine default service
├── servicedev.yaml # App Engine dev service
├── serviceprod.yaml # App Engine production service
├── deploy.sh # Multi-service deploy script
├── run.sh # Local run with credentials
├── Makefile # install, test, lint targets
├── requirements.txt
├── model.json # Sample feature records for model input schema
├── .circleci/config.yml # CI test, lint, deploy pipeline
└── README.md
- Organization: thin Flask layer over a BigQuery/ML module; library and test scaffold separated in `myrepolib/`
- Reusable modules: `ml_satellite.query_data()`, `ml_satellite.load_csv()`; `myrepolib.main.myfunc()` and HTTP helper stubs
- Engineering practice: cloud-hosted analytics API, ephemeral upload tables, JSON serialization of warehouse rows, multi-environment YAML configs, automated deploy after a test/lint gate
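Serializing warehouse rows to JSON requires a fallback for types the standard `json` module cannot encode. BigQuery's Python client returns `TIMESTAMP`, `DATETIME`, `DATE`, and `TIME` values as `datetime` objects, `NUMERIC` as `decimal.Decimal`, and `BYTES` as `bytes`. The sketch below converts these with a `default` hook. It is a minimal illustration, not the repository's `query_data()` implementation, and the column names in the example are hypothetical.

```python
import base64
import datetime as dt
import decimal
import json
from typing import Any, Iterable, Mapping


def _json_default(value: Any) -> Any:
    """Convert warehouse value types that json.dumps cannot encode natively."""
    if isinstance(value, (dt.datetime, dt.date, dt.time)):
        return value.isoformat()
    if isinstance(value, decimal.Decimal):
        return str(value)  # a string keeps exact precision; a float would round
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    raise TypeError(f"not JSON serializable: {type(value).__name__}")


def rows_to_json(rows: Iterable[Mapping[str, Any]]) -> str:
    """Serialize an iterable of row mappings as a JSON array of objects."""
    return json.dumps([dict(row) for row in rows], default=_json_default)


if __name__ == "__main__":
    # Hypothetical row shaped like a query result converted to a dict.
    rows = [{"id": 1, "score": decimal.Decimal("0.125"),
             "loaded_at": dt.datetime(2024, 1, 2, 3, 4, 5)}]
    print(rows_to_json(rows))
    # [{"id": 1, "score": "0.125", "loaded_at": "2024-01-02T03:04:05"}]
```

A Flask route could then return `app.response_class(response=body, status=200, mimetype="application/json")`. Here `body` would be `rows_to_json(dict(row.items()) for row in client.query(sql).result())`.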
Course context: Northwestern University, M.S. in Data Science, Data Engineering specialization capstone
Repository: https://github.com/EAName/Analytics-Applications-Engineering