6 changes: 4 additions & 2 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -188,16 +188,18 @@ SLayer uses sqlglot for dialect-aware SQL generation. Databases are supported at
- **DuckDB** — integration tests in `tests/integration/test_integration_duckdb.py` (no Docker, runs in-process)
- **MySQL** — Docker example with `verify.py`
- **ClickHouse** — Docker example with `verify.py`
- **SQL Server** — Docker example with `verify.py` in `examples/sqlserver/` (requires SQL Server 2022; uses `mssql+pyodbc://` driver; `median`/`percentile` unsupported; `corr`/`covar_samp`/`covar_pop` via variance-decomposition formula)

**Tier 2 — code-covered** (unit tests for SQL generation, no live instance verification):
- Snowflake, BigQuery, Redshift, Trino/Presto, Databricks/Spark, MS SQL Server, Oracle
- Snowflake, BigQuery, Redshift, Trino/Presto, Databricks/Spark, Oracle

Dialect mapping lives in `query_engine.py:_dialect_for_type()`. Dialect-specific SQL lives in `generator.py` — mainly `_build_date_trunc` (SQLite branch), `_build_time_offset_expr` (date arithmetic for shifted CTEs), `_build_median`, `_build_percentile`, and `_build_stat_agg` (stddev/var/corr). Calendar-based time shifts use timestamp offset inside DATE_TRUNC with simple equality joins (no per-dialect join logic). All other SQL differences are handled by sqlglot transpilation. When adding a new dialect: add it to `_dialect_for_type`, add a `_build_time_offset_expr` branch if it doesn't use Postgres-style `INTERVAL`, and add parameterized tests in `TestMultiDialectGeneration`.
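
To make the need for a per-dialect `_build_time_offset_expr` branch concrete, here is a minimal sketch that renders the same one-month shift for a Postgres-style dialect and for T-SQL. The function `time_offset_sql` and its parameters are hypothetical and string-based for brevity; the actual code in `generator.py` is not shown in this diff and may look quite different.

```python
def time_offset_sql(dialect: str, column: str, amount: int, unit: str) -> str:
    """Return a SQL snippet that shifts `column` by `amount` units.

    Hypothetical helper for illustration only; not part of SLayer.
    """
    allowed_units = {"day", "week", "month", "quarter", "year"}
    if unit not in allowed_units:
        raise ValueError(f"unsupported unit: {unit!r}")
    if dialect == "tsql":
        # T-SQL has no INTERVAL literal; DATEADD takes (datepart, number, date).
        return f"DATEADD({unit}, {amount}, {column})"
    # Postgres-style dialects accept an INTERVAL literal in date arithmetic.
    return f"{column} + INTERVAL '{amount} {unit}'"


print(time_offset_sql("postgres", "created_at", -1, "month"))
print(time_offset_sql("tsql", "created_at", -1, "month"))
```

The two branches differ only in how the offset is spelled, which is why a dialect that already accepts Postgres-style `INTERVAL` needs no branch of its own.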

**Aggregation caveats:**
- **SQLite**: `median`, `percentile_cont`, `percentile_disc`, `stddev_samp`, `stddev_pop`, `var_samp` (also aliased as `variance`), `var_pop` (also aliased as `variance_pop`), `corr`, `covar_samp`, `covar_pop` are provided via Python aggregate UDFs registered on every new connection (`slayer/sql/sqlite_udfs.py`); SQLite has no native equivalent. Scalar UDFs `ln`, `log10`, `log2`, `exp`, `sqrt`, `pow`, `power` are also registered there; `log2` overrides SQLite ≥3.35's silent-NULL built-in to keep the strict math-domain-error semantics. The 2-arg `log(B, X)` UDF (returns log_B(X) — base first, value second) is registered on **every** SQLite version, including ≥3.35 where it overrides the built-in's silent-NULL behaviour to match Postgres's strict error semantics. Same B-first arg order in both.
- **ClickHouse**: `percentile` emits the parametric `quantile(p)(x)` syntax; `median` uses native `median(x)`. `stddev_samp`/`stddev_pop`/`var_samp`/`var_pop`/`corr` are native (sqlglot transpiles to dialect-appropriate spelling).
- **MySQL**: `median`, `percentile`, `corr`, `covar_samp`, `covar_pop` are not supported — MySQL has no native function and no Python-UDF mechanism. The generator raises `NotImplementedError` at SQL generation time. Use MariaDB or compute client-side. `stddev_samp`/`stddev_pop`/`var_samp`/`var_pop` are native on MySQL.
- **MySQL**: `median` and `percentile` are not supported — raises `NotImplementedError`. `stddev_samp`/`stddev_pop`/`var_samp`/`var_pop` are native. `corr`/`covar_samp`/`covar_pop` use a variance-decomposition formula: `cov(x,y) = (var(x+y) - var(x) - var(y)) / 2`, `corr = cov / (stddev(x) * stddev(y))`.
- **T-SQL (SQL Server)**: `median` and `percentile` are not supported — raises `NotImplementedError` (`PERCENTILE_CONT` is window-only in T-SQL, not a GROUP BY aggregate). `stddev_samp`/`stddev_pop`/`var_samp`/`var_pop` emit as `STDEV`/`STDEVP`/`VAR`/`VARP`. `corr`/`covar_samp`/`covar_pop` use the same variance-decomposition formula as MySQL. `DATETRUNC` is used for date truncation (SQL Server 2022+; week uses `iso_week` for Monday-based truncation). `DATEADD` is used for interval arithmetic (no `INTERVAL` syntax). Type aliases `mssql`/`sqlserver`/`tsql` all map to the T-SQL dialect and generate `mssql+pyodbc://` connection strings.
- **Postgres / DuckDB**: native `PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY x)` (DuckDB via sqlglot's `QUANTILE_CONT` translation). `STDDEV_SAMP`/`STDDEV_POP`/`VAR_SAMP`/`VAR_POP`/`CORR`/`COVAR_SAMP`/`COVAR_POP` are native on both.
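
The variance-decomposition formula used for MySQL and T-SQL follows from `var(x+y) = var(x) + var(y) + 2*cov(x,y)`. The sketch below checks the identity numerically with the standard-library `sqlite3` module: it registers a small sample-variance aggregate (in the spirit of the Python UDFs described in the SQLite bullet) and compares the SQL result against a direct computation. The class `VarSamp`, the table `t`, and the sample rows are assumptions made for this illustration; they are not SLayer's UDFs or test data.

```python
import sqlite3
import statistics


class VarSamp:
    """Sample-variance aggregate for SQLite (illustrative, not SLayer's UDF)."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # SQL aggregates skip NULL inputs.
        if value is not None:
            self.values.append(float(value))

    def finalize(self):
        # Sample variance is undefined for fewer than two rows.
        if len(self.values) < 2:
            return None
        return statistics.variance(self.values)


conn = sqlite3.connect(":memory:")
conn.create_aggregate("var_samp", 1, VarSamp)
conn.execute("CREATE TABLE t (x REAL, y REAL)")
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 5.0), (4.0, 3.0), (5.0, 7.0)]
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# cov(x, y) = (var(x + y) - var(x) - var(y)) / 2
(cov_sql,) = conn.execute(
    "SELECT (var_samp(x + y) - var_samp(x) - var_samp(y)) / 2.0 FROM t"
).fetchone()

# Reference value: the textbook sample covariance computed directly.
n = len(rows)
mean_x = sum(x for x, _ in rows) / n
mean_y = sum(y for _, y in rows) / n
cov_ref = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)

print(cov_sql, cov_ref)
```

Two caveats apply to the identity in general: it only matches a native `COVAR_SAMP` when `x` and `y` are both non-NULL on the same rows, and subtracting large variances can lose floating-point precision. How SLayer's generator deals with either point is not visible in this diff.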

**In-memory SQLite caveat:** `sqlite:///:memory:` (and equivalent URI variants — `sqlite://`, `sqlite:///file::memory:?…`, `mode=memory`) works across `await` calls on a single `SlayerSQLClient` because the client owns a per-instance `StaticPool` engine with `check_same_thread=False`. Two separate `SlayerSQLClient` instances on `:memory:` are isolated from each other. Use a file path or `mode=memory&cache=shared` URI form to share state across clients. File-backed SQLite is unaffected — it routes through the module-level engine cache as before.
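
The isolation described above comes from SQLite itself and can be reproduced with the standard-library `sqlite3` module, without SQLAlchemy or `SlayerSQLClient`. In this sketch the database name `slayer_demo_mem` is an arbitrary placeholder, and the URI is SQLite's own `file:` form, not a SQLAlchemy URL.

```python
import sqlite3

# Two plain :memory: connections each get their own private database.
a = sqlite3.connect(":memory:")
b = sqlite3.connect(":memory:")
a.execute("CREATE TABLE t (id INTEGER)")
tables_in_b = b.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables_in_b)  # b cannot see the table created through a

# A named in-memory database with a shared cache is visible to every
# connection in the same process that opens the same URI.
uri = "file:slayer_demo_mem?mode=memory&cache=shared"
c = sqlite3.connect(uri, uri=True)
d = sqlite3.connect(uri, uri=True)
c.execute("CREATE TABLE t (id INTEGER)")
c.commit()
tables_in_d = d.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables_in_d)
```

The shared-cache database lives only as long as at least one connection to it stays open, so a client that wants to share state this way has to keep a connection alive.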
15 changes: 14 additions & 1 deletion docs/configuration/datasources.md
Expand Up @@ -68,7 +68,20 @@ SQL generation is covered by unit tests, but not verified against live instances
| `trino` / `presto` / `athena` | `trino` or `PyAthena` | `pip install trino` or `pip install PyAthena` |
| `databricks` / `spark` | `databricks-sql-connector` | `pip install databricks-sql-connector` |
| `oracle` | `oracledb` | `pip install oracledb` |
| `mssql` / `sqlserver` / `tsql` | `pyodbc` or `pymssql` | `pip install pyodbc` or `pip install pymssql` |
| `mssql` / `sqlserver` / `tsql` | `pyodbc` (auto-generated strings) or `pymssql` (manual `connection_string` only) | `pip install pyodbc` or `pip install pymssql` |

!!! warning "SQL Server — requires SQL Server 2022+"
    SLayer uses `DATETRUNC` for time-dimension queries, which was introduced in SQL Server 2022 (version 16.0).
    SQL Server 2019 and earlier will return an error on time-dimension queries.
    The Docker example uses `mcr.microsoft.com/mssql/server:2022-latest`.

!!! warning "SQL Server — TrustServerCertificate"
    Auto-generated SQL Server connection strings include `TrustServerCertificate=yes`, which disables
    TLS certificate validation. This is acceptable for local development and Docker environments that use
    self-signed certificates, but **must not be used in production**, because it exposes the database
    connection to man-in-the-middle attacks. For production, supply a `connection_string` field directly
    that omits `TrustServerCertificate`, and make sure the SQL Server instance presents a certificate
    that chains to a CA the client trusts.
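
As a sketch of what a manually supplied `connection_string` could look like, the helper below builds a `mssql+pyodbc://` URL and only adds `TrustServerCertificate=yes` when asked to. The function `build_mssql_url` is hypothetical and not part of SLayer; the driver name and the sample credentials are the ones used by the Docker example in this PR.

```python
from urllib.parse import quote_plus


def build_mssql_url(
    username: str,
    password: str,
    host: str,
    port: int,
    database: str,
    trust_server_certificate: bool = False,
) -> str:
    """Build a mssql+pyodbc URL (hypothetical helper, not part of SLayer)."""
    query = "driver=" + quote_plus("ODBC Driver 18 for SQL Server")
    if trust_server_certificate:
        # Only for local or Docker setups with self-signed certificates.
        query += "&TrustServerCertificate=yes"
    # Percent-encode credentials so characters like '@' do not break the URL.
    return (
        f"mssql+pyodbc://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}?{query}"
    )


dev_url = build_mssql_url(
    "sa", "YourStrong@Passw0rd", "sqlserver", 1433, "slayer_demo",
    trust_server_certificate=True,
)
prod_url = build_mssql_url("sa", "YourStrong@Passw0rd", "sqlserver", 1433, "slayer_demo")
print(dev_url)
print(prod_url)
```

With the flag set, the result is the same URL the example's `docker-compose.yml` passes to `seed.py`; leaving it unset produces a string suitable for an instance with a properly signed certificate.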

!!! note
    Snowflake, BigQuery, ClickHouse, and similar analytical warehouses typically don't have foreign keys, so auto-ingestion won't discover joins. Define joins manually in your model YAML.
7 changes: 4 additions & 3 deletions examples/mysql/verify.py
Expand Up @@ -14,6 +14,7 @@
from verify_common import (
    run_common_checks,
    check_rollup,
    check_corr_covar,
    check_stddev_var,
    check,
    summary,
Expand All @@ -35,9 +36,9 @@
check("4 models without rollup", len(models) == 4)

# MySQL has native STDDEV_SAMP/STDDEV_POP/VAR_SAMP/VAR_POP. DEV-1317 smoke.
# corr / covar_samp / covar_pop are NOT supported on MySQL — SLayer
# raises NotImplementedError there, so we deliberately don't call
# check_corr_covar() from this script. Use MariaDB for those.
check_stddev_var()

# MySQL corr/covar_samp/covar_pop now use a variance-decomposition formula.
check_corr_covar()

summary()
34 changes: 34 additions & 0 deletions examples/seed.py
Expand Up @@ -52,6 +52,38 @@
);
"""

# T-SQL (SQL Server): TEXT is deprecated — use NVARCHAR; TIMESTAMP is a binary
# rowversion type — use DATETIME2 instead.
CREATE_SQL_TSQL = """
CREATE TABLE regions (
    id INTEGER PRIMARY KEY,
    name NVARCHAR(255) NOT NULL
);

CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name NVARCHAR(255) NOT NULL,
    email NVARCHAR(255) NOT NULL,
    region_id INTEGER REFERENCES regions(id)
);

CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name NVARCHAR(255) NOT NULL,
    category NVARCHAR(255) NOT NULL,
    price NUMERIC(10,2) NOT NULL
);

CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),
    product_id INTEGER REFERENCES products(id),
    quantity INTEGER NOT NULL,
    status NVARCHAR(50) NOT NULL,
    created_at DATETIME2 NOT NULL
);
"""

# ClickHouse uses MergeTree engine, no PRIMARY KEY constraint, no REFERENCES
CREATE_SQL_CLICKHOUSE = """
CREATE TABLE regions (
Expand Down Expand Up @@ -88,6 +120,8 @@ def _get_create_sql(connection_string: str) -> str:
    """Return dialect-appropriate CREATE TABLE SQL."""
    if "clickhouse" in connection_string.lower():
        return CREATE_SQL_CLICKHOUSE
    if "mssql" in connection_string.lower() or "sqlserver" in connection_string.lower():
        return CREATE_SQL_TSQL
    return CREATE_SQL_STANDARD


23 changes: 23 additions & 0 deletions examples/sqlserver/CLAUDE.md
@@ -0,0 +1,23 @@
# SQL Server Example

This example uses **SQL Server 2022** (`mcr.microsoft.com/mssql/server:2022-latest`).

## Important: SQL Server 2022 required

`DATETRUNC` was introduced in SQL Server 2022. Earlier versions (2019 and older) do not have
this function and will error on time-dimension queries. The Docker image tag
`mcr.microsoft.com/mssql/server:2022-latest` is the only supported tag for this example.

## ODBC driver dependency

The seed and SLayer containers use a custom `Dockerfile` (in this directory) that installs
`msodbcsql18` from the Microsoft apt repository. The driver version is pinned to 18 because
the `mssql+pyodbc://` connection string names the driver as `ODBC+Driver+18+for+SQL+Server`.

## Running

```bash
cd examples/sqlserver
docker compose up -d
python verify.py
```
34 changes: 34 additions & 0 deletions examples/sqlserver/Dockerfile
@@ -0,0 +1,34 @@
FROM python:3.14-slim-bookworm

WORKDIR /app

# Install msodbcsql18 driver (OS-level dependency for pyodbc)
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl \
        gnupg \
        unixodbc-dev \
    && curl -fsSL https://packages.microsoft.com/keys/microsoft.asc \
        | gpg --dearmor -o /usr/share/keyrings/microsoft-prod.gpg \
    && curl -fsSL https://packages.microsoft.com/config/debian/12/prod.list \
        > /etc/apt/sources.list.d/mssql-release.list \
    && apt-get update \
    && ACCEPT_EULA=Y apt-get install -y --no-install-recommends msodbcsql18 \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY pyproject.toml poetry.lock README.md LICENSE ./
RUN pip install --no-cache-dir poetry && \
    poetry config virtualenvs.create false && \
    poetry install -E all --no-root --no-interaction --no-ansi && \
    pip uninstall -y poetry

# Copy application code and install project
COPY slayer/ slayer/
RUN pip install --no-deps . && \
    useradd --create-home slayer
USER slayer

ENV SLAYER_STORAGE=/data
EXPOSE 5143

CMD ["slayer", "serve", "--host", "0.0.0.0", "--port", "5143", "--storage", "/data"]
31 changes: 31 additions & 0 deletions examples/sqlserver/README.md
@@ -0,0 +1,31 @@
# SLayer + SQL Server Example

Runs SLayer against a SQL Server 2022 database using Docker Compose.

## Prerequisites

- Docker and Docker Compose
- Python 3.11+

## Quick start

```bash
cd examples/sqlserver
docker compose up -d
# Wait ~30 s for SQL Server to be ready and the seed to complete, then:
python verify.py
```
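
Instead of waiting a fixed 30 seconds, a small polling helper can tell when the server has started answering. The sketch below uses only the standard library; `wait_for_http` is a hypothetical helper that is not shipped with this example, and it treats any HTTP response (including an error status) as a sign that the server is up, since this README does not name a health endpoint.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once `url` yields any HTTP response, False on timeout."""
    # Bypass any configured HTTP proxy; the target is a local container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            opener.open(url, timeout=2.0).close()
            return True
        except urllib.error.HTTPError:
            # The server answered (for example with 404), so it is up.
            return True
        except (urllib.error.URLError, http.client.HTTPException, OSError):
            # Nothing is answering yet; retry after a short pause.
            time.sleep(interval)
    return False
```

Calling `wait_for_http("http://localhost:5143/")` before `python verify.py` would replace the manual wait. A successful response only shows that the API server is listening; it does not prove that seeding and ingestion succeeded.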

## What it does

1. Starts a SQL Server 2022 container
2. Creates the `slayer_demo` database
3. Seeds it with the shared e-commerce dataset (regions, customers, products, orders)
4. Ingests models from the seeded database and starts a SLayer API server on port 5143

## Notes

- SQL Server 2022 is required — `DATETRUNC` (used for time-dimension truncation) was added in 2022.
- `median` and `percentile` are not supported on T-SQL; SLayer raises `NotImplementedError` for those.
- `corr`, `covar_samp`, and `covar_pop` use a variance-decomposition formula (no native T-SQL equivalent).
- The `Dockerfile` in this directory builds SLayer on `python:3.14-slim-bookworm` and adds `msodbcsql18` (Microsoft ODBC Driver 18), which pyodbc needs at the OS level.
57 changes: 57 additions & 0 deletions examples/sqlserver/docker-compose.yml
@@ -0,0 +1,57 @@
services:
  sqlserver:
    image: mcr.microsoft.com/mssql/server:2022-latest
    environment:
      ACCEPT_EULA: "Y"
      MSSQL_SA_PASSWORD: "YourStrong@Passw0rd"
      MSSQL_PID: "Developer"
    ports:
      - "1433:1433"
    healthcheck:
      test:
        - "CMD-SHELL"
        - >
          /opt/mssql-tools18/bin/sqlcmd
          -S localhost -U sa -P 'YourStrong@Passw0rd'
          -Q 'SELECT 1' -No || exit 1
      interval: 5s
      timeout: 10s
      retries: 30
      start_period: 30s

  createdb:
    image: mcr.microsoft.com/mssql/server:2022-latest
    command: >
      /opt/mssql-tools18/bin/sqlcmd
      -S sqlserver -U sa -P 'YourStrong@Passw0rd'
      -Q "IF DB_ID(N'slayer_demo') IS NULL CREATE DATABASE slayer_demo;" -No
    depends_on:
      sqlserver:
        condition: service_healthy

  seed:
    build:
      context: ../..
      dockerfile: examples/sqlserver/Dockerfile
    command: >
      python /examples/seed.py
      "mssql+pyodbc://sa:YourStrong%40Passw0rd@sqlserver:1433/slayer_demo?driver=ODBC+Driver+18+for+SQL+Server&TrustServerCertificate=yes"
    volumes:
      - ../seed.py:/examples/seed.py:ro
    depends_on:
      createdb:
        condition: service_completed_successfully

  slayer:
    build:
      context: ../..
      dockerfile: examples/sqlserver/Dockerfile
    command: sh /examples/start.sh
    ports:
      - "5143:5143"
    volumes:
      - ./start.sh:/examples/start.sh:ro
      - ./slayer_data:/data
    depends_on:
      seed:
        condition: service_completed_successfully
2 changes: 2 additions & 0 deletions examples/sqlserver/slayer_data/.gitignore
@@ -0,0 +1,2 @@
*
!.gitignore
21 changes: 21 additions & 0 deletions examples/sqlserver/start.sh
@@ -0,0 +1,21 @@
#!/bin/sh
# Ingest models from SQL Server and start the SLayer API server.

python -c "
from slayer.async_utils import run_sync
from slayer.core.models import DatasourceConfig
from slayer.engine.ingestion import ingest_datasource_idempotent
from slayer.storage.yaml_storage import YAMLStorage

storage = YAMLStorage(base_dir='/data')
ds = DatasourceConfig(
    name='demo', type='mssql',
    host='sqlserver', port=1433,
    database='slayer_demo', username='sa', password='YourStrong@Passw0rd',
)
run_sync(storage.save_datasource(ds))
result = run_sync(ingest_datasource_idempotent(datasource=ds, storage=storage))
print(f'Ingested {len(result.additions)} models')
"

exec slayer serve --host 0.0.0.0 --port 5143 --storage /data
56 changes: 56 additions & 0 deletions examples/sqlserver/verify.py
@@ -0,0 +1,56 @@
"""Verification script for the SQL Server Docker example.

Run after `docker compose up -d`:
python examples/sqlserver/verify.py

SQL Server 2022 supports STDEV/STDEVP/VAR/VARP natively; corr/covar_samp/
covar_pop use a variance-decomposition formula (no native function on T-SQL).
median/percentile are not supported on T-SQL and raise NotImplementedError.
"""

import os
import sys

sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
from verify_common import (
    run_common_checks,
    check_rollup,
    check_stddev_var,
    check_corr_covar,
    check_column_types,
    summary,
)

if __name__ == "__main__":
    models = run_common_checks()
    check_rollup(expect_rollup=True)

    check_column_types(
        model_name="orders",
        expected_types={
            "id": "INT",
            "customer_id": "INT",
            "product_id": "INT",
            "quantity": "INT",
            "status": "TEXT",
            "created_at": "TIMESTAMP",
        },
    )
    check_column_types(
        model_name="products",
        expected_types={
            "id": "INT",
            "name": "TEXT",
            "category": "TEXT",
            "price": "DOUBLE",
        },
    )

    # T-SQL uses STDEV/STDEVP/VAR/VARP (not stddev_samp etc.). This is verified via
    # the SQL generator; the API response is the same regardless of dialect.
    check_stddev_var()

    # corr/covar_samp/covar_pop via variance-decomposition formula.
    check_corr_covar()

    summary()
8 changes: 4 additions & 4 deletions examples/verify_common.py
Expand Up @@ -62,7 +62,7 @@ def check_column_types(model_name, expected_types):
"""Assert /models/{name} returns the expected DataType strings.

expected_types: dict mapping column name to DataType .value string
(e.g. "number", "string", "time", "date"). Columns absent
(e.g. "DOUBLE", "TEXT", "TIMESTAMP", "DATE"). Columns absent
from the dict are ignored — different dialects expose different
column sets, and this helper is a positive-coverage check, not
an exhaustive schema comparison.
Expand Down Expand Up @@ -317,9 +317,9 @@ def check_stddev_var(measure="quantity"):
def check_corr_covar(measure="quantity", other="customer_id"):
"""2-arg stat aggregates: corr, covar_samp, covar_pop.

Do NOT call from MySQL examples — SLayer raises ``NotImplementedError``
for these on MySQL (no native function, no Python-UDF mechanism).
Use MariaDB or compute client-side as a workaround.
Safe to call for all Tier-1 dialects including MySQL and T-SQL (SQL Server):
those use a variance-decomposition formula instead of native functions.
MariaDB and all others use native CORR/COVAR_*.
"""
print("\nCorrelation / covariance:")
