diff --git a/.gitignore b/.gitignore
index ed8ebf583..484d0ac0b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1 +1,21 @@
-__pycache__
\ No newline at end of file
+__pycache__
+.idea
+.claude/
+books_database/state/
+
+# Local-only working files (operator demo scripts, scratch notes). The
+# entire folder is excluded so anything dropped here stays off the
+# remote branch; the team-lead-only checkpoint-4-script.tex lives here.
+local-only/
+
+# Local-only development notes (kept out of the remote repository).
+# All .md files except the root README.md are excluded so that human
+# reviewers see a single concise document rather than scattered notes.
+# The four CP4 deliverable docs are committed (the rubric grades them).
+# The team-lead review .md is also committed; the team lead removes it
+# before merging the PR to master.
+*.md
+!/README.md
+!docs/checkpoint-4-evaluation.md
+!docs/checkpoint-4-architecture.md
+!docs/checkpoint-4-review.md
\ No newline at end of file
diff --git a/README.md b/README.md
index f7f53570f..23a0b361a 100644
--- a/README.md
+++ b/README.md
@@ -1,46 +1,389 @@
-# Distributed Systems @ University of Tartu
+# Distributed Systems Practice
-This repository contains the initial code for the practice sessions of the Distributed Systems course at the University of Tartu.
+A distributed online-bookshop checkout system built across the four
+checkpoints of the *Distributed Systems Practice* course. The current
+build (Checkpoint 4) ships:
-## Getting started
+- A 13-service application stack (orchestrator, three input/fraud
+  backends, an in-memory order queue, three bully-elected order
+  executors, a payment service, and three primary-backup replicated
+  books-database replicas), each containerised and wired via gRPC.
+- OpenTelemetry instrumentation on the four hot services
+  (orchestrator, executor, books-database, payment) exporting traces
+  and metrics over OTLP/HTTP.
+- A pre-provisioned Grafana stack (Grafana + Prometheus + Tempo +
+  Loki via the `grafana/otel-lgtm` all-in-one image) with a 12-panel
+  Checkpoint 4 dashboard.
+- An automated end-to-end test suite covering four scenarios
+  (single clean order, multiple non-conflicting concurrent orders,
+  mixed fraudulent/clean orders, conflicting orders on the same
+  title).
+- An open-loop load harness with three modes (constant, step,
+  spike) and a captured baseline of CSV runs under
+  [load_test/results/](load_test/results/).
-### Overview
+Detailed design and analysis live in the [docs/](docs/) folder; the
+two most relevant entry points are
+[docs/checkpoint-4-architecture.md](docs/checkpoint-4-architecture.md)
+(system diagram, port table, telemetry pipeline)
-The code consists of multiple services. Each service is located in a separate folder. The `frontend` service folder contains a Dockerfile and the code for an example bookstore application. Each backend service folder (e.g. `orchestrator` or `fraud_detection`) contains a Dockerfile, a requirements.txt file and the source code of the service. During the practice sessions, you will implement the missing functionality in these backend services, or extend the backend with new services.
+---
-There is also a `utils` folder that contains some helper code or specifications that are used by multiple services. Check the `utils` folder for more information.
+## Prerequisites
-### Running the code with Docker Compose [recommended]
+| Component | Minimum version | Used for |
+|-----------|-----------------|----------|
+| Docker Desktop / Docker Engine + Compose v2 | 20.10 / 2.20 | Building and running the stack |
+| Python | 3.10 | End-to-end test suite, load harness, helper scripts |
+| PowerShell | 7.x (`pwsh`) | Helper scripts under `scripts/` |
+| Free TCP ports | `3000`, `4317`, `4318`, `8080`, `8081` | Grafana UI, OTLP gRPC, OTLP HTTP, frontend, orchestrator |
-To run the code, you need to clone this repository, make sure you have Docker and Docker Compose installed, and run the following command in the root folder of the repository:
+Install the Python test dependency once:
-```bash
-docker compose up
+```powershell
+python -m pip install pytest
 ```
-This will start the system with the multiple services. Each service will be restarted automatically when you make changes to the code, so you don't have to restart the system manually while developing. If you want to know how the services are started and configured, check the `docker-compose.yaml` file.
+The load harness and the e2e suite are stdlib-only beyond `pytest`;
+no other Python packages are required on the host.
-The checkpoint evaluations will be done using the code that is started with Docker Compose, so make sure that your code works with Docker Compose.
+---
-If, for some reason, changes to the code are not reflected, try to force rebuilding the Docker images with the following command:
+## Quick start
-```bash
-docker compose up --build
+From the repository root:
+
+```powershell
+docker compose up --build -d
+docker compose ps
 ```
-### Run the code locally
+You should see 14 services in the `Up` state:
+
+| Tier | Services |
+|------|----------|
+| Frontend | `frontend` |
+| Orchestrator | `orchestrator` |
+| Checkpoint-2 validation | `transaction_verification`, `fraud_detection`, `suggestions` |
+| Queue + executors | `order_queue`, `order_executor_1`, `order_executor_2`, `order_executor_3` |
+| Payment | `payment_service` |
+| Books database (3-replica primary-backup) | `books_database_1`, `books_database_2`, `books_database_3` |
+| Observability | `observability` (Grafana + Prometheus + Tempo + Loki) |
+
+Allow ~15 seconds after `up` for the stack to settle (executors run a
+bully election, databases elect a primary, OTLP exporters establish
+connections).
+
+---
+
+## Service endpoints
+
+| URL | Purpose |
+|-----|---------|
+| | Bookshop frontend (single-page app) |
+| | Orchestrator REST endpoint (POST JSON) |
+| | Grafana — Checkpoint 4 overview dashboard |
+| | Grafana Explore (Tempo / Prometheus / Loki ad-hoc queries) |
+| `observability:4317` (in-network) | OTLP/gRPC ingest |
+| `observability:4318` (in-network) | OTLP/HTTP ingest |
+
+Grafana is configured for anonymous **Admin** access; no login is
+required.
+
+---
+
+## Manual testing
+
+### From the browser
+
+Open , add a book to the cart, fill in the
+checkout form, and submit. The frontend POSTs the order to the
+orchestrator and renders the response.
+
+### From the command line
-Even though you can run the code locally, it is recommended to use Docker and Docker Compose to run the code. This way you don't have to install any dependencies locally and you can easily run the code on any platform.
+Three prepared payloads live at the repository root for quick
+`curl`-style probing:
-If you want to run the code locally, you need to install the following dependencies:
+| File | Expected outcome |
+|------|------------------|
+| `test_checkout.json` | `Order Approved` — clean card, in-stock book |
+| `test_checkout_fraud.json` | `Order Rejected` — card number ends in `0000` (fraud-detector trip) |
+| `test_checkout_oversold.json` | Order is enqueued and approved; 2PC aborts asynchronously because the requested quantity exceeds stock |
+
+PowerShell example:
+
+```powershell
+Invoke-RestMethod -Method Post `
+  -Uri http://127.0.0.1:8081/checkout `
+  -ContentType "application/json" `
+  -InFile .\test_checkout.json
+```
+
+For the `_oversold.json` case, the HTTP response says approved
+immediately (the order has been enqueued) — open the Grafana
+*"2PC outcomes per minute"* panel or grep executor logs to see the
+asynchronous abort:
+
+```powershell
+docker compose logs order_executor_1 order_executor_2 order_executor_3 |
+  Select-String "2pc_decision"
+```
+
+---
+
+## Automated end-to-end tests
+
+Four scenarios, one verifier script, ~30 seconds wall-clock.
+
+```powershell
+.\scripts\checkpoint4-checks.ps1
+```
+
+The script:
+
+1. Brings up the `observability` container if it is not already
+   running, and leaves it running on exit so the dashboard's
+   in-memory history persists across runs.
+2. Resets the application tier to a clean state (clears
+   `books_database/state/*` seed files; removes and recreates all
+   non-observability containers).
+3. Runs `pytest tests/e2e` against the live stack.
+4. Exits non-zero on test failure.
+
+Switches:
+
+| Flag | Effect |
+|------|--------|
+| `-SkipBuild` | Skip the reset/rebuild step; reuse the running stack as-is. |
+
+The script **does not** tear the stack down at the end; tear down
+manually with `docker compose down` when you are finished.
+
+### The four scenarios
+
+| Test file | Scenario | What it asserts |
+|-----------|----------|-----------------|
+| [tests/e2e/test_01_single_clean_order.py](tests/e2e/test_01_single_clean_order.py) | Single clean order | HTTP 200, status `Order Approved`, `orderId` present |
+| [tests/e2e/test_02_multiple_non_conflicting.py](tests/e2e/test_02_multiple_non_conflicting.py) | 4 concurrent orders for 4 different titles | All approved, all `orderId`s distinct |
+| [tests/e2e/test_03_mixed_fraud.py](tests/e2e/test_03_mixed_fraud.py) | 3 clean + 3 fraud interleaved | Clean → approved; fraud → rejected |
+| [tests/e2e/test_04_conflicting_orders.py](tests/e2e/test_04_conflicting_orders.py) | 8 concurrent orders for the same scarce title | `commits ≤ pre_stock`; exactly `min(stock, 8)` commits and the rest are abort decisions (verified by tailing executor logs) |
+
+### Running pytest directly
+
+If you prefer not to use the helper script (e.g. when iterating on a
+single test):
+
+```powershell
+$env:PYTHONPATH = "."
+python -m pytest tests/e2e/test_04_conflicting_orders.py -v --tb=short
+```
+
+`PYTHONPATH=.` lets the suite import `tests.e2e._common` when invoked
+from the repo root. The session-scoped fixture in
+[tests/e2e/conftest.py](tests/e2e/conftest.py) waits up to 90 s for
+the orchestrator to come ready, so it is safe to run immediately
+after `docker compose up -d`.
+
+---
+
+## Load testing
+
+[load_test/run_load.py](load_test/run_load.py) is an open-loop load
+generator. Open-loop matters: the scheduler dispatches the *target*
+RPS on the wall clock regardless of how slow individual responses
+become, which is the only honest way to expose server back-pressure.
+
+Each run prints a per-stage summary table to stdout and writes a CSV
+to `load_test/results/