feat: migrate ClickHouse adapter to utopia-php/query 0.3.x builder#4
feat: migrate ClickHouse adapter to utopia-php/query 0.3.x builder#4lohanidamodar wants to merge 7 commits into
Conversation
Pin `utopia-php/query` to `dev-feat/clickhouse-insert-delete-settings-mv as 0.3.2` to consume the new ClickHouse builder + schema layer (groupByTimeBucket, named-typed bindings, Schema\ClickHouse, Builder\ClickHouse insertFormat/deleteMode). Flip composer to `minimum-stability: dev` with `prefer-stable: true` so the alias resolves. Tracks utopia-php/query#11; flip to `^0.3.2` once tagged.
Replace Query::TYPE_* string constants (removed upstream in 0.3.x) with Method enum cases. Behaviour unchanged; the GroupByTimeBucket case is the new name for the deferred groupByInterval/intervalGroupBy bucket — still a no-op for the Database adapter since it has no time-bucketed aggregation path.
PHPStan 1.x ships php-parser 4.x which cannot parse the asymmetric visibility (`public protected(set)`) used by utopia-php/query 0.3.x's ClickHouse schema layer (Schema\Table\ClickHouse), causing an internal "Lexer::getNextToken returned null" crash on the usage adapter source file. Bump to PHPStan ^2.0 (ships php-parser 5.x, understands PHP 8.4) and capture the pre-existing "mixed array access" findings from json_decode result handling in a baseline so net analysis stays green while the adapter migrates. Wire up a phpstan.neon that runs at level max over src + tests.
Replaces the hand-rolled SQL strings in src/Usage/Adapter/ClickHouse.php
with the utopia-php/query 0.3.x builder + schema layer. The adapter is
now a thin HTTP runtime: every DDL statement, INSERT, SELECT, and DELETE
fragment is produced by Schema\ClickHouse or Builder\ClickHouse.
Schema (DDL via Utopia\Query\Schema\ClickHouse):
- setup() emits events/gauges tables through Table\ClickHouse with typed
columns (string, datetime, bigInteger), Engine::MergeTree, monthly
partitioning, ORDER BY, bloom_filter skip indexes, and SETTINGS.
- createDailyTable() emits the SummingMergeTree daily aggregation
table via Engine::SummingMergeTree('value').
- createDailyMaterializedView() emits the MV via
Schema\ClickHouse::createMaterializedView. The MV body remains a hand
SELECT because subquery-aggregation MV bodies do not round-trip
cleanly through the builder yet (deferred upstream).
Reads (SELECT via Utopia\Query\Builder\ClickHouse):
- A per-call builder is initialised with `useNamedBindings()` plus
`withParamTypes()` so every `?` placeholder rewrites to ClickHouse
`{paramN:Type}` form (DateTime64(3), Int64, String, Nullable(String)).
- find/findFromTable, findAggregatedFromTable, count/countFromTable,
sum/sumFromTable, findDaily (FINAL), sumDaily, sumDailyBatch,
getTotal/getTotalFromEvents/getTotalFromGauges, getTotalBatch,
getTimeSeries/getTimeSeriesFromTable all compose filters via
$builder->filter([Query::...]) and emit aggregations through
$builder->sum()/->count()/selectRaw.
- Time-bucketed aggregation consumes Query::groupByTimeBucket via the
builder's native GROUP BY emission, paired with selectRaw/orderByRaw
for the bucket projection and ordering.
- Cursor pagination keeps the tuple-keyset comparison as a whereRaw
fragment and merges its named bindings into the builder Statement at
HTTP execute time.
Writes (INSERT via Utopia\Query\Builder\ClickHouse::insertFormat):
- addBatch() builds `INSERT INTO t (...) FORMAT JSONEachRow` via the
builder's insertFormat helper; the JSONEachRow payload is streamed in
the HTTP body.
Deletes (DELETE via Utopia\Query\Builder\ClickHouse::delete):
- purge() and purgeDaily() emit lightweight DELETE through the builder.
UsageQuery is removed. Downstream callers must switch from
`UsageQuery::groupByInterval('time', '1h')` to
`Query::groupByTimeBucket('time', '1h')`. The 0.3.x library rejects
non-enumerated intervals through ValidationException at construction.
HTTP transport and configuration setters (validateHost/validatePort,
escapeIdentifier, setNamespace/setTenant/setSharedTables/setDatabase/
setSecure/setTimeout/setCompression/setKeepAlive/setMaxRetries/
setRetryDelay/setAsyncInserts, healthCheck, query/insert/executeWithRetry,
logQuery, getConnectionStats/getQueryLog/clearQueryLog) stay untouched -
the migration is strictly to the SQL-building layer.
LowCardinality wrapping on the country column is dropped (the schema
layer wraps `LowCardinality` inside `Nullable`, but ClickHouse only
accepts `LowCardinality(Nullable(T))`; the column stays a plain
Nullable(String) until the upstream wrapping order is fixed - deferred
upstream).
References utopia-php/query#11.
Adds tests/Usage/Adapter/ClickHouseSqlSnapshotTest.php covering DDL (events table, daily SummingMergeTree, materialized view), the named-typed binding read path, groupByTimeBucket aggregation, INSERT FORMAT JSONEachRow, getTimeSeries shape, FINAL on daily reads, and the lightweight DELETE form. 9 snapshot tests / 30 assertions. Bumps Dockerfile from php:8.3.3-cli-alpine3.19 to php:8.4.21-cli-alpine3.23 and composer.json's php constraint from >=8.0 to >=8.4 so CI matches the utopia-php/query 0.3.x asymmetric-visibility requirement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greptile SummaryMigrates
Confidence Score: 3/5The migration is structurally sound, but getTimeSeries rejects valid bucket intervals that find()+groupByTimeBucket accepts, breaking any caller that previously used those intervals. The INTERVAL_FUNCTIONS constant used by getTimeSeries covers only 1h and 1d, while the new bucketFunctionFor() helper supports seven. Callers migrating from the deleted UsageQuery::groupByInterval to Query::groupByTimeBucket with intervals other than 1h/1d will get a runtime InvalidArgumentException from getTimeSeries despite those intervals working correctly in the find() path. src/Usage/Adapter/ClickHouse.php — the INTERVAL_FUNCTIONS constant and getTimeSeries validation logic. Important Files Changed
Reviews (3): Last reviewed commit: "chore(deps): bump utopia-php/query pin t..." | Re-trigger Greptile |
| "prefer-stable": true, | ||
| "require": { | ||
| "php": ">=8.0", | ||
| "php": ">=8.4", |
There was a problem hiding this comment.
Dev-branch dependency and loosened stability setting
"minimum-stability": "dev" means Composer can resolve any transitive dependency to a dev release. Combined with "utopia-php/query": "dev-feat/clickhouse-insert-delete-settings-mv as 0.3.2", the locked revision is a mutable branch tip — if the branch is force-pushed or rebased after composer.lock is committed, any fresh composer update would silently pull a different commit. The PR description acknowledges this needs flipping to ^0.3.2 once tagged.
There was a problem hiding this comment.
Acknowledged — this is intentional during the rollout and is called out in the PR description. The plan is:
- Merge the upstream
utopia-php/queryPR and tag0.3.2(or whatever the next stable carries the ClickHouse builder + Method enum surface used here). - Flip this PR's
composer.jsonback to"utopia-php/query": "^0.3.2"and restore"minimum-stability": "stable"(droppingprefer-stable) in the same commit, refreshcomposer.lock, and only then merge.
Until the tag exists the dev-branch alias is required so CI can resolve the dependency at all. Leaving this thread open so it stays visible as a pre-merge gate.
…ment import Inline `$this->query(...)` sites already pass merged named bindings explicitly, so the wrapper is dead. Removes the corresponding PHPStan baseline entry.
| @@ -172,49 +166,24 @@ public function setTimeout(int $milliseconds): self | |||
| return $this; | |||
| } | |||
|
|
|||
| /** | |||
| * Enable or disable query logging for debugging. | |||
| * | |||
| * @param bool $enable Whether to enable query logging | |||
| * @return self | |||
| */ | |||
| public function enableQueryLogging(bool $enable = true): self | |||
| { | |||
There was a problem hiding this comment.
Let's keep the docs on public methods
| 'integer' => $table->addColumn($id, ColumnType::BigInteger), | ||
| 'float' => $table->addColumn($id, ColumnType::Float), | ||
| 'boolean' => $table->addColumn($id, ColumnType::Boolean), | ||
| 'datetime' => $table->datetime($id, 3), |
There was a problem hiding this comment.
There should be shorthand methods we can use for integer, float and boolean too
utopia-php/query 0.3.2 is now tagged and published with all ClickHouse builder/schema features this PR depends on. Switch from the temporary dev-branch alias to the stable semver constraint and restore minimum-stability: stable (drop the temporary 'dev' relaxation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| if (!isset(self::INTERVAL_FUNCTIONS[$interval])) { | ||
| throw new InvalidArgumentException("Invalid interval '{$interval}'. Allowed: " . implode(', ', array_keys(self::INTERVAL_FUNCTIONS))); |
There was a problem hiding this comment.
getTimeSeries interval set is narrower than find() + groupByTimeBucket
INTERVAL_FUNCTIONS only covers 1h and 1d, so calling getTimeSeries($metrics, '5m', ...) throws InvalidArgumentException even though find([Query::groupByTimeBucket('time', '5m'), ...]) succeeds. The deleted UsageQuery::VALID_INTERVALS supported seven intervals (1m, 5m, 15m, 1h, 1d, 1w, 1M), and bucketFunctionFor() (added in this PR) supports the same seven. Any caller that migrated from UsageQuery::groupByInterval('time', '5m') to Query::groupByTimeBucket('time', '5m') and uses getTimeSeries will hit a runtime error after this change.
| /** @var array<string, string> Maps interval strings to ClickHouse time bucket functions */ | ||
| private const INTERVAL_FUNCTIONS = [ | ||
| '1h' => 'toStartOfHour', | ||
| '1d' => 'toStartOfDay', | ||
| ]; |
There was a problem hiding this comment.
getTimeSeriesFromTable doesn't use bucketFunctionFor() for unsupported intervals
INTERVAL_FUNCTIONS is used both as the allowlist and as the function-name lookup inside getTimeSeriesFromTable. The new bucketFunctionFor() already covers all seven intervals and would make the behaviour consistent with findAggregatedFromTable. Delegating to bucketFunctionFor() eliminates the divergence.
| /** @var array<string, string> Maps interval strings to ClickHouse time bucket functions */ | |
| private const INTERVAL_FUNCTIONS = [ | |
| '1h' => 'toStartOfHour', | |
| '1d' => 'toStartOfDay', | |
| ]; | |
| /** @var array<string, string> Maps interval strings to ClickHouse time bucket functions */ | |
| private const INTERVAL_FUNCTIONS = [ | |
| '1m' => 'toStartOfMinute', | |
| '5m' => 'toStartOfFiveMinutes', | |
| '15m' => 'toStartOfFifteenMinutes', | |
| '1h' => 'toStartOfHour', | |
| '1d' => 'toStartOfDay', | |
| '1w' => 'toStartOfWeek', | |
| '1M' => 'toStartOfMonth', | |
| ]; |
Summary
Migrates
src/Usage/Adapter/ClickHouse.php(3252 → 2854 lines) to consumeutopia-php/query0.3.x. The adapter is now a thin HTTP runtime — every DDL, INSERT, SELECT, and DELETE goes throughSchema\ClickHouse/Builder\ClickHouse.Depends on utopia-php/query#11 (branch
feat/clickhouse-insert-delete-settings-mv). Once that PR is tagged 0.3.2, flip composer fromdev-feat/clickhouse-insert-delete-settings-mv as 0.3.2to^0.3.2.What changed
Schema (DDL via
Utopia\Query\Schema\ClickHouse)setup()emits events/gauges tables throughTable\ClickHousewith typed columns,Engine::MergeTree, monthly partitioning, ORDER BY, bloom_filter skip indexes, and SETTINGS.createDailyTable()usesEngine::SummingMergeTree('value').createDailyMaterializedView()usesSchema\ClickHouse::createMaterializedView; the MV body remains a hand SELECT because subquery-aggregation MV bodies do not round-trip cleanly through the builder yet (deferred upstream).Reads (SELECT via
Utopia\Query\Builder\ClickHouse)useNamedBindings()+withParamTypes()so every?placeholder rewrites to ClickHouse{paramN:Type}form (DateTime64(3),Int64,String,Nullable(String)).find/findFromTable,findAggregatedFromTable,count/countFromTable,sum/sumFromTable,findDaily(FINAL),sumDaily,sumDailyBatch,getTotal/getTotalFromEvents/getTotalFromGauges,getTotalBatch,getTimeSeries/getTimeSeriesFromTable.Query::groupByTimeBucketvia the builder's native GROUP BY emission, paired withselectRaw/orderByRawfor the bucket projection and ordering.whereRawfragment and merges its named bindings into the builderStatementat HTTP execute time (upstream gap, deferred).Writes (INSERT via
Builder\ClickHouse::insertFormat)addBatch()buildsINSERT INTO t (...) FORMAT JSONEachRow; row payload still streamed in HTTP body.Deletes (DELETE via
Builder\ClickHouse::delete)purge()andpurgeDaily()emit lightweightDELETE FROMthrough the builder.Removed
src/Usage/UsageQuery.php—groupByIntervalreplaced byUtopia\Query\Query::groupByTimeBucket. BREAKING: downstream callers must switch fromUsageQuery::groupByInterval('time', '1h')toQuery::groupByTimeBucket('time', '1h').Adapters:
Methodenum migrationconvertQueriesToDatabaseinAdapter/Database.phpswitched toMethodenum cases. Same for the ClickHouse adapter's filter-method handling.Infra
Dockerfilebumped tophp:8.4.21-cli-alpine3.23.composer.jsonPHP constraint bumped to>=8.4(matches the 0.3.x query lib requirement).phpstanbumped to^2.0for PHP 8.4 compatibility.Test plan
composer lintcleancomposer check(PHPStan max) cleanClickHouseSqlSnapshotTest: 9/9 green (DDL, daily, MV, find with typed bindings, groupByTimeBucket aggregation, INSERT FORMAT, getTimeSeries, FINAL, lightweight DELETE)Still deferred upstream (follow-up PRs)
whereRaw)utopia-php/database)Related
groupByTimeBucket, named-typed bindings)