cedardb · victorpacyna · May 27, 2026
diff --git a/nasdaq/README.md b/nasdaq/README.md
@@ -1,155 +1,121 @@
 # Replaying the NASDAQ order book
 
-This is an example project live-replaying the complete NASDAQ exchange orders from January 30 2020 with CedarDB.
-For an overview of the dataset, take a look at [our example dataset docs](https://cedardb.com/docs/example_datasets/nasdaq/).
+This example live-replays the complete NASDAQ order stream from January 30, 2020, with CedarDB. For dataset background, see [the NASDAQ example dataset docs](https://cedardb.com/docs/example_datasets/nasdaq/).
 
 What's especially noteworthy here is that CedarDB is not only running the **transactional query workload**,
 inserting thousands of events every 100 ms, but also the **complex analytical queries** which feed the various
-views in the Grafana dashboard.  It's an excellent illustration of the power of Hybrid Transactional/Analytical
+views in the Grafana dashboard. It's an excellent illustration of the power of Hybrid Transactional/Analytical
 Processing (HTAP).
 
-This example consists of separate applications:
+![Grafana](./grafana.png)
 
-1. A parser written in Python that parses NASDAQ's proprietary ITCHv5 protocol into human-readable CSV files.
-2. A C++ client connecting to CedarDB and live-replaying all orders.
-3. A Grafana Dashboard displaying live analytics (pictured below).
+The setup is fully dockerized. The demo stack contains:
 
+1. `parser`: downloads the NASDAQ ITCH dump and converts it into CSV files.
+2. `cedar`: runs CedarDB and stores the parsed data on a Docker volume.
+3. `client`: creates the schema, loads reference and pre-market data, and replays the live market stream in 100 ms batches.
+4. `grafana`: shows live analytics on top of the replay.
+5. `aichat`: optional web UI for natural-language questions over the same database.
 
-![Grafana](./grafana.png)
+In comparison mode, the stack also starts PostgreSQL and replays the same workload into both databases.
 
-In addition to Grafana, you can also issue queries yourself to get insight into the market state.
-This guide will show you how to do both using `docker compose`.
+## Getting started
 
+Prerequisites:
 
-## Getting started
+1. Docker with Compose support.
+2. A stable internet connection to pull the required Docker images and download the NASDAQ dataset on first run.
 
-This guide assumes you already have a cedardb docker image, i.e. have completed [this guide](https://cedardb.com/docs/getting_started/running_docker_image/) up to step two.
+Optional:
 
-### 1. Prepare the data 
-Execute the `prepare.sh` script:
-```shell
-./prepare.sh
-```
-It downloads the raw binary package capture that NASDAQ provides, extracts it and transforms it into CSV files.
-This downloads about 3.3 GB and writes ~16 GB CSV files.
+1. A CedarDB license at `db-config/cedar/license.env`. You can sign up for a trial at https://console.cedardb.com/signup.
+
+The license is needed to create the dedicated `grafana` database user and grant it the read permissions Grafana requires. It also enables database statistics in comparison mode.
+
+If no license is present, `demo.sh` falls back to using the `postgres` admin user for Grafana access because the dedicated `grafana` user cannot be granted the required read permissions.
+
+## Run the demo
 
-You should now have a set of files in the data directory containing the stock exchange events:
+Use `demo.sh` as the entrypoint for the stack:
 
 ```shell
-du -h data/*.csv
+./demo.sh start
 ```
 
-```
-5,3G	data/cancellations.csv
-181M	data/cancellationsPreMarket.csv
-337M	data/executions.csv
-2,7M	data/executionsPreMarket.csv
-7,5M	data/marketMakers.csv
-9,8G	data/orders.csv
-279M	data/ordersPreMarket.csv
-516K	data/stocks.csv
-```
+This starts the normal stack in the background with `docker compose up -d --build`. On the first run, the parser container:
+
+1. downloads the NASDAQ archive, about 3.3 GB compressed,
+2. extracts it,
+3. parses it into roughly 16 GB of CSV data,
+4. stores everything in the Docker volume `data`.
+
+Depending on your connection and machine, the initial download and parsing step can take around 10 to 15 minutes.
+
+After the parser finishes, the client loads the schema and pre-market data, then begins the timed replay. The replay starts 10 minutes after market open (9:30 AM), so the initial database state corresponds to 9:40 AM market time. After the replay has run for 20 minutes, the database state represents 10:00 AM market time.
+
+Useful lifecycle commands:
 
-### 2. Run the application
 ```shell
-docker compose build client
-docker compose up
+./demo.sh stop  # Stop and remove all containers
+./demo.sh clean # Stop and remove all containers, then remove Docker volumes, including the parsed dataset
+./demo.sh pull  # Pull the latest database images
 ```
 
-While the client is running, it replays the live exchange data in 100ms batches, treating the point in time the program was started as 9:30 AM, i.e. the exact instance the market opens.
-In the first minute, the client catches up to the live transaction stream and starts inserting many events.
-Afterward, you should get batches of a couple of thousand events per 100ms.
-So, if you run the client for 30 minutes, the database state will represent the state of the NASDAQ exchange 30 minutes after market open, i.e., 10:00 AM.
+## Access the dashboard
 
+Grafana is exposed on http://localhost:3000.
 
-You can stop the application via `CTRL+C` followed by `docker compose down`
-
-### 3. Connect to Grafana
-You can now browse to Grafana at http://localhost:3000, log in with username `admin` and password `admin`, and view the NASDAQ dashboard.
+Authentication is disabled for the UI, so opening the page is enough. The dashboard is provisioned automatically.
 
 ![Grafana Instructions](./grafana_instructions.png)
 
+## Access the AI chat
+
+The AI chat UI is exposed on http://localhost:8080.
 
-### 4. Query the data
-Alternatively, you can run your own queries.  This requires installation of the `psql` PostgreSQL command line interface.
-Note that, for the `Time:` values to appear, you need to either run `\timing on` from within the session or
-have a `$HOME/.psqlrc` file containing at least the following line: `\timing on`.
+By default, the container starts with:
 
 ```shell
-PGPASSWORD=postgres psql -h localhost -U postgres -d postgres
+OPENROUTER_API_KEY={your_api_key_here}
+LLM_MODEL=anthropic/claude-sonnet-4.5
 ```
 
-Here are some example queries to get you started:
+Set `OPENROUTER_API_KEY` before `./demo.sh start` if you want the chat UI to be functional.
 
-```sql
-postgres=#
-select count(*) from orders;
-  count   
-----------
- 11019259
-(1 row)
+## Query the data
 
-Time: 5.316 ms
-```
+The best way to run ad hoc SQL in this setup is through Grafana Explore.
+
+Open http://localhost:3000/explore, select the provisioned PostgreSQL-compatible data source, and run SQL directly there.
+
+Example queries:
 
 ```sql
-postgres=#
+select count(*) from orders;
 select avg(price) from executions;
-             avg             
------------------------------
- 140.21785151844912886904428
-(1 row)
-
-Time: 15.681 ms
 ```
 
-The following query calculates the new orders created per second averaged over the last 10 seconds.
+The following query calculates new orders per second averaged over the last 10 seconds:
 
 ```sql
-client=#
-select count(*) / 10 as new -- averaged over 10 seconds
-from  orders o
-where prevOrder is null -- == new order
-and o.timestamp > (select max(e.timestamp) from executions e) - 10::bigint * 1000 * 1000 * 1000; -- averaged over 10 seconds
- new  
-------
- 8285
-(1 row)
-
-Time: 32.514 ms
+select count(*) / 10 as new
+from orders o
+where prevOrder is null
+  and o.timestamp > (
+    select max(e.timestamp) from executions e
+  ) - 10::bigint * 1000 * 1000 * 1000;
 ```
 
-You can find some more complex queries in the `sql` subdirectory.
+More analytical queries are available in [`sql/`](./sql).
 
-## Load everything
+If you prefer other database tools such as `psql` or DBeaver, expose the database port (5432) to localhost in `compose.yml` or `comparison.compose.yml`.
 
-Start the Docker image, mounting the `./data` directory containing the CSV data:
+## Comparison mode
 
-```shell
-docker run --rm -p 5432:5432 -e CEDAR_PASSWORD=postgres -v ./data:/data --name cedardb cedardb
-```
+Comparison mode starts CedarDB and PostgreSQL with the same CPU and memory limits, then replays the same workload into both systems.
 
-Connect to CedarDB via the `psql` CLI:
+It requires the `DB_CPU_LIMIT` and `DB_MEM_LIMIT` environment variables to be set:
 
 ```shell
-PGPASSWORD=postgres psql -h localhost -U postgres -d postgres
-```
-
-Using the `psql` client, run the DDL and then directly copy the CSV data:
-
-```sql
-\i client/schema.sql
-copy stocks from '/data/stocks.csv' with(format text, delimiter ';', null '', header true);
-copy marketmakers from '/data/marketMakers.csv' with(format text, delimiter ';', null '', header true);
-copy orders from '/data/ordersPreMarket.csv' with(format text, delimiter ';', null '', header true);
-copy orders from '/data/orders.csv' with(format text, delimiter ';', null '', header true);
-copy executions from '/data/executionsPreMarket.csv' with(format text, delimiter ';', null '', header true);
-copy executions from '/data/executions.csv' with(format text, delimiter ';', null '', header true);
-copy cancellations from '/data/cancellationsPreMarket.csv' with(format text, delimiter ';', null '', header true);
-copy cancellations from '/data/cancellations.csv' with(format text, delimiter ';', null '', header true);
+DB_CPU_LIMIT=4 DB_MEM_LIMIT=8g ./demo.sh --comparison start
 ```
-
-Try running some ad hoc SQL queries.
-
-Please note that this does not maintain the orderbook, which would be maintained by the client.
-
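
The replay timing described in the new "Run the demo" section can be sketched as a small shell helper. This is only an illustration: `market_time` is a hypothetical function that does not exist in the repository, and it assumes what the README text states, namely that the replay begins at 9:40 AM market time and advances at wall-clock speed.

```shell
#!/bin/sh
# Hypothetical helper, not part of the repository.
# Assumption (from the README text): the replay starts at 9:40 AM market
# time and one minute of runtime equals one minute of market time.
market_time() {
  elapsed_min=$1
  # Minutes since midnight: 9:40 AM plus the elapsed replay time.
  total=$((9 * 60 + 40 + elapsed_min))
  printf '%02d:%02d\n' $((total / 60 % 24)) $((total % 60))
}

# Market time represented by the database after 20 minutes of replay.
market_time 20
```

Calling `market_time 0` gives the initial state of the replay, and passing a larger value estimates which point of the trading day a long-running demo has reached.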