update README.md #24
Open: victorpacyna wants to merge 3 commits into main from pacyna/updateReadme
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,155 +1,121 @@ | ||
| # Replaying the NASDAQ order book | ||
|
|
||
| This is an example project live-replaying the complete NASDAQ exchange orders from January 30 2020 with CedarDB. | ||
| For an overview of the dataset, take a look at [our example dataset docs](https://cedardb.com/docs/example_datasets/nasdaq/). | ||
| This example live-replays the complete NASDAQ order stream from January 30, 2020, with CedarDB. For dataset background, see [the NASDAQ example dataset docs](https://cedardb.com/docs/example_datasets/nasdaq/). | ||
|
|
||
| What's especially noteworthy here is that CedarDB is not only running the **transactional query workload**, | ||
| inserting thousands of events every 100 ms, but also the **complex analytical queries** which feed the various | ||
| views in the Grafana dashboard. It's an excellent illustration of the power of Hybrid Transactional/Analytical | ||
| views in the Grafana dashboard. It's an excellent illustration of the power of Hybrid Transactional/Analytical | ||
| Processing (HTAP). | ||
|
|
||
| This example consists of separate applications: | ||
|  | ||
|
|
||
| 1. A parser written in Python that parses NASDAQ's proprietary ITCHv5 protocol into human-readable CSV files. | ||
| 2. A C++ client connecting to CedarDB and live-replaying all orders. | ||
| 3. A Grafana Dashboard displaying live analytics (pictured below). | ||
| The setup is fully dockerized. The demo stack contains: | ||
|
|
||
| 1. `parser`: downloads the NASDAQ ITCH dump and converts it into CSV files. | ||
| 2. `cedar`: runs CedarDB and stores the parsed data on a Docker volume. | ||
| 3. `client`: creates the schema, loads reference and pre-market data, and replays the live market stream in 100 ms batches. | ||
| 4. `grafana`: shows live analytics on top of the replay. | ||
| 5. `aichat`: optional web UI for natural-language questions over the same database. | ||
|
|
||
|  | ||
| In comparison mode, the stack also starts PostgreSQL and replays the same workload into both databases. | ||
|
|
||
| In addition to Grafana, you can also issue queries yourself to get insight into the market state. | ||
| This guide will show you how to do both using `docker compose`. | ||
| ## Getting started | ||
|
|
||
| Prerequisites: | ||
|
|
||
| ## Getting started | ||
| 1. Docker with Compose support. | ||
| 2. A stable internet connection to pull the required Docker images and download the NASDAQ dataset on first run. | ||
|
|
||
| This guide assumes you already have a cedardb docker image, i.e. have completed [this guide](https://cedardb.com/docs/getting_started/running_docker_image/) up to step two. | ||
| Optional: | ||
|
|
||
| ### 1. Prepare the data | ||
| Execute the `prepare.sh` script: | ||
| ```shell | ||
| ./prepare.sh | ||
| ``` | ||
| It downloads the raw binary packet capture that NASDAQ provides, extracts it, and transforms it into CSV files. | ||
| This downloads about 3.3 GB and writes ~16 GB of CSV files. | ||
| 1. A CedarDB license at `db-config/cedar/license.env`. You can sign up for a trial at https://console.cedardb.com/signup. | ||
|
|
||
| The license is needed to create the dedicated `grafana` database user and grant the required user permissions cleanly. It also enables database statistics in comparison mode. | ||
|
|
||
| If no license is present, `demo.sh` falls back to using the `postgres` admin user for Grafana access because the dedicated `grafana` user cannot be granted the required read permissions. | ||
|
|
||
| ## Run the demo | ||
|
|
||
| You should now have a set of files in the data directory containing the stock exchange events: | ||
| Use `demo.sh` as the entrypoint for the stack: | ||
|
|
||
| ```shell | ||
| du -h data/*.csv | ||
| ./demo.sh start | ||
| ``` | ||
|
|
||
| ``` | ||
| 5,3G data/cancellations.csv | ||
| 181M data/cancellationsPreMarket.csv | ||
| 337M data/executions.csv | ||
| 2,7M data/executionsPreMarket.csv | ||
| 7,5M data/marketMakers.csv | ||
| 9,8G data/orders.csv | ||
| 279M data/ordersPreMarket.csv | ||
| 516K data/stocks.csv | ||
| ``` | ||
| This starts the normal stack in the background with `docker compose up -d --build`. On the first run, the parser container: | ||
|
|
||
| 1. downloads the NASDAQ archive, about 3.3 GB compressed, | ||
| 2. extracts it, | ||
| 3. parses it into roughly 16 GB of CSV data, | ||
| 4. stores everything in the Docker volume `data`. | ||
|
|
||
| Depending on your connection and machine, the initial download and parsing step can take around 10 to 15 minutes. | ||
|
|
||
| After the parser finishes, the client loads the schema and pre-market data, then begins the timed replay. The replay starts 10 minutes after market open, so the initial database state corresponds to 9:40 AM market time. After the replay has run for 20 minutes, the database state represents 10:00 AM market time. | ||
|
|
||
| Useful lifecycle commands: | ||
|
|
||
| ### 2. Run the application | ||
| ```shell | ||
| docker compose build client | ||
| docker compose up | ||
| ./demo.sh stop # Stop and remove all containers | ||
| ./demo.sh clean # Stop and remove all containers, then remove Docker volumes, including the parsed dataset | ||
| ./demo.sh pull # Pull the latest database images | ||
| ``` | ||
|
|
||
| While the client is running, it replays the live exchange data in 100ms batches, treating the point in time the program was started as 9:30 AM, i.e. the exact instant the market opens. | ||
| In the first minute, the client catches up to the live transaction stream and starts inserting many events. | ||
| Afterward, you should get batches of a couple of thousand events per 100ms. | ||
| So, if you run the client for 30 minutes, the database state will represent the state of the NASDAQ exchange 30 minutes after market open, i.e., 10:00 AM. | ||
| ## Access the Dashboard | ||
|
|
||
| Grafana is exposed on http://localhost:3000. | ||
|
|
||
| You can stop the application via `CTRL+C` followed by `docker compose down` | ||
|
|
||
| ### 3. Connect to Grafana | ||
| You can now browse to Grafana at http://localhost:3000, log in with username `admin` and password `admin`, and view the NASDAQ dashboard. | ||
| Authentication is disabled for the UI, so opening the page is enough. The dashboard is provisioned automatically. | ||
|
|
||
|  | ||
|
|
||
| ## Access the AI Chat | ||
|
|
||
| The AI chat UI is exposed on http://localhost:8080. | ||
|
|
||
| ### 4. Query the data | ||
| Alternatively, you can run your own queries. This requires installation of the `psql` PostgreSQL command line interface. | ||
| Note that, for the `Time:` values to appear, you need to either run `\timing on` from within the session or | ||
| have a `$HOME/.psqlrc` file containing at least the following line: `\timing on`. | ||
| By default, the container starts with: | ||
|
|
||
| ```shell | ||
| PGPASSWORD=postgres psql -h localhost -U postgres -d postgres | ||
| OPENROUTER_API_KEY={your_api_key_here} | ||
| LLM_MODEL=anthropic/claude-sonnet-4.5 | ||
| ``` | ||
|
|
||
| Here are some example queries to get you started: | ||
| Set `OPENROUTER_API_KEY` before `./demo.sh start` if you want the chat UI to be functional. | ||
|
|
||
| ```sql | ||
| postgres=# | ||
| select count(*) from orders; | ||
| count | ||
| ---------- | ||
| 11019259 | ||
| (1 row) | ||
| ## Query the data | ||
|
|
||
| Time: 5.316 ms | ||
| ``` | ||
| The best way to run ad hoc SQL in this setup is through Grafana Explore. | ||
|
victorpacyna marked this conversation as resolved.
|
||
|
|
||
| Open http://localhost:3000/explore, select the provisioned PostgreSQL-compatible data source, and run SQL directly there. | ||
|
|
||
| Example queries: | ||
|
|
||
| ```sql | ||
| postgres=# | ||
| select count(*) from orders; | ||
| select avg(price) from executions; | ||
| avg | ||
| ----------------------------- | ||
| 140.21785151844912886904428 | ||
| (1 row) | ||
|
|
||
| Time: 15.681 ms | ||
| ``` | ||
|
|
||
| The following query calculates the new orders created per second averaged over the last 10 seconds. | ||
| The following query calculates new orders per second averaged over the last 10 seconds: | ||
|
|
||
| ```sql | ||
| client=# | ||
| select count(*) / 10 as new -- averaged over 10 seconds | ||
| from orders o | ||
| where prevOrder is null -- == new order | ||
| and o.timestamp > (select max(e.timestamp) from executions e) - 10::bigint * 1000 * 1000 * 1000; -- averaged over 10 seconds | ||
| new | ||
| ------ | ||
| 8285 | ||
| (1 row) | ||
|
|
||
| Time: 32.514 ms | ||
| select count(*) / 10 as new | ||
| from orders o | ||
| where prevOrder is null | ||
| and o.timestamp > ( | ||
| select max(e.timestamp) from executions e | ||
| ) - 10::bigint * 1000 * 1000 * 1000; | ||
| ``` | ||
|
|
||
| You can find some more complex queries in the `sql` subdirectory. | ||
| More analytical queries are available in [`sql/`](./sql). | ||
|
|
||
| ## Load everything | ||
| If you prefer other database tools like psql or DBeaver, expose the database port (5432) to localhost in `compose.yml`/`comparison.compose.yml`. | ||
|
|
||
| Start the Docker image, mounting the `./data` directory containing the CSV data: | ||
| ## Comparison mode | ||
|
|
||
| ```shell | ||
| docker run --rm -p 5432:5432 -e CEDAR_PASSWORD=postgres -v ./data:/data --name cedardb cedardb | ||
| ``` | ||
| Comparison mode starts CedarDB and PostgreSQL with the same CPU and memory limits, then replays the same workload into both systems. | ||
|
|
||
| Connect to CedarDB via the `psql` CLI: | ||
| It requires `DB_CPU_LIMIT` and `DB_MEM_LIMIT`: | ||
|
|
||
| ```shell | ||
| PGPASSWORD=postgres psql -h localhost -U postgres -d postgres | ||
| ``` | ||
|
|
||
| Using the `psql` client, run the DDL and then directly copy the CSV data: | ||
|
|
||
| ```sql | ||
| \i client/schema.sql | ||
| copy stocks from '/data/stocks.csv' with(format text, delimiter ';', null '', header true); | ||
| copy marketmakers from '/data/marketMakers.csv' with(format text, delimiter ';', null '', header true); | ||
| copy orders from '/data/ordersPreMarket.csv' with(format text, delimiter ';', null '', header true); | ||
| copy orders from '/data/orders.csv' with(format text, delimiter ';', null '', header true); | ||
| copy executions from '/data/executionsPreMarket.csv' with(format text, delimiter ';', null '', header true); | ||
| copy executions from '/data/executions.csv' with(format text, delimiter ';', null '', header true); | ||
| copy cancellations from '/data/cancellationsPreMarket.csv' with(format text, delimiter ';', null '', header true); | ||
| copy cancellations from '/data/cancellations.csv' with(format text, delimiter ';', null '', header true); | ||
| DB_CPU_LIMIT=4 DB_MEM_LIMIT=8g ./demo.sh --comparison start | ||
| ``` | ||
|
|
||
| Try running some ad hoc SQL queries. | ||
|
|
||
| Please note that this does not maintain the orderbook, which would be maintained by the client. | ||
|
|
||
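
The replay timing described in the updated README text (the replay begins at 9:40 AM market time and then advances in real time) can be sanity-checked with a small shell sketch. The `market_time` function below is a hypothetical helper for illustration only; it is not part of the repository or of `demo.sh`, and it assumes the replay keeps pace with the wall clock exactly.

```shell
# Hypothetical helper (not part of the repository): map minutes of replay
# runtime to NASDAQ market time, assuming the replay starts at 9:40 AM
# and advances at wall-clock speed, as the README text describes.
market_time() {
  elapsed_min=$1
  start_min=$((9 * 60 + 40))          # 9:40 AM as minutes since midnight
  total=$((start_min + elapsed_min))  # current market time in minutes
  printf '%d:%02d\n' $((total / 60)) $((total % 60))
}

market_time 0    # market time at replay start
market_time 20   # market time after 20 minutes of replay
```

With 20 minutes of runtime the function prints `10:00`, which matches the example given in the README text.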