Likely SQLite Dialect Bias: DeepAnalyze-8B SQL Breaks on DuckDB/Postgres/MySQL



DeepAnalyze-8B appears to rely heavily on Text-to-SQL datasets (e.g., Spider/BIRD/SynSQL) whose ground-truth queries are executed on **SQLite**. This likely biases the model toward SQLite execution semantics/idioms, so generated SQL may not transfer cleanly to **DuckDB/PostgreSQL/MySQL** without dialect adaptation.

## What I’m seeing (root cause)

The current SQL execution path in this repo is **SQLite-first**. The code connects via Python `sqlite3`, targets `.sqlite` DB files, and routes benchmark DB paths (SynSQL/Spider/BIRD) to SQLite databases. That means the “ground-truth execution environment” for training/eval here is SQLite, not DuckDB/Postgres/MySQL.

## Evidence from the codebase (SQLite engine wiring)

* Uses `sqlite3.connect(db_file)` for execution
* Wraps execution in `BEGIN TRANSACTION;` + `rollback()` (safe read-only validation)
* Resolves DB file paths as `.../{db_id}/{db_id}.sqlite`
* Selects benchmark DB roots for SynSQL/Spider/BIRD and then builds a `.sqlite` file path


## Why this matters

If the model + tooling loop is optimized around **SQLite execution**, you should expect portability failures when switching to engines like **DuckDB/Postgres/MySQL** (missing functions, different date/time semantics, different binder/parser rules, etc.). In practice, this shows up as “SQL looks fine” but fails at runtime on non-SQLite engines.

## Suggested fix direction (high-level)

Add a **dialect/engine adapter layer** (SQLite vs DuckDB vs Postgres), and make the execution backend explicit (e.g., `--sql-engine duckdb|sqlite|postgres`). Right now the default path implicitly assumes SQLite.


### Citations

**File:** deepanalyze/SkyRL/skyrl-gym/skyrl_gym/tools/sql.py (L18-18)
```python
                conn = sqlite3.connect(db_file)
```

**File:** deepanalyze/SkyRL/skyrl-gym/skyrl_gym/tools/sql.py (L20-24)
```python
                conn.execute("BEGIN TRANSACTION;")
                cursor.execute(sql)
                execution_res = frozenset(cursor.fetchall())
                conn.rollback()
                conn.close()
```

**File:** deepanalyze/SkyRL/skyrl-gym/skyrl_gym/tools/sql.py (L62-62)
```python
            db_file = os.path.join(self.db_path, db_id, db_id + ".sqlite")
```

**File:** deepanalyze/SkyRL/skyrl-gym/skyrl_gym/envs/sql/env.py (L29-50)
```python
        if self.task == "synsql":
            self.db_path = os.path.join(
                self.db_path,
                "SynSQL-2.5M/databases",
            )
        elif self.task == "spider":
            self.db_path = os.path.join(
                self.db_path,
                "spider/database",
            )
        elif self.task == "bird":
            self.db_path = os.path.join(
                self.db_path,
                "bird/train/train_databases",
            )
        else:
            raise NotImplementedError

        self.db_file = os.path.join(self.db_path, self.db_id, self.db_id + ".sqlite")
        # Check for DB file existence
        if not os.path.exists(self.db_file):
            raise FileNotFoundError(f"Database file not found at: {self.db_file}")
```

**File:** deepanalyze/SkyRL/skyrl-gym/skyrl_gym/envs/sql/utils.py (L48-54)
```python
        conn = sqlite3.connect(db_file)
        cursor = conn.cursor()
        conn.execute("BEGIN TRANSACTION;")
        cursor.execute(sql)
        execution_res = frozenset(cursor.fetchall())
        conn.rollback()
        conn.close()
```

**File:** playground/TableQA/tests/eval/wikisql_eval.py (L91-91)
```python
    agg_ops = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
```

**File:** playground/TableQA/tests/eval/wikisql_eval.py (L123-129)
```python
    for i, op in enumerate(agg_ops[1:], 1):
        pattern = rf"{op}\s*\((.*?)\)"
        match = re.search(pattern, select_clause, re.IGNORECASE)
        if match:
            agg = i
            select_col = match.group(1).strip()
            break
```


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Likely SQLite Dialect Bias: DeepAnalyze-8B SQL Breaks on DuckDB/Postgres/MySQL #82

What I’m seeing (root cause)

Evidence from the codebase (SQLite engine wiring)

Why this matters

Suggested fix direction (high-level)

Citations

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Likely SQLite Dialect Bias: DeepAnalyze-8B SQL Breaks on DuckDB/Postgres/MySQL #82

Description

What I’m seeing (root cause)

Evidence from the codebase (SQLite engine wiring)

Why this matters

Suggested fix direction (high-level)

Citations

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions