build: kill PROXYSQLGENAI flag, PROXYSQL40=1 builds all plugins by renecannao · Pull Request #5814 · sysown/proxysql

renecannao · 2026-05-23T18:53:55Z

PROXYSQLGENAI was a leftover from before the genai plugin carve-out. All genai/MCP/AI/RAG code now lives in plugins/genai/ as a runtime-loadable .so. Since the plugin chassis (PROXYSQL40=1) already builds and loads the genai .so, a separate flag serves no purpose.

What changed

Entrypoints (rhel-compliant, suse-compliant, deb-compliant)

Build: pass PROXYSQL40=1 (not PROXYSQLGENAI=1) to make. All three now use the same EXTRA pattern.
Packaging: copy genai .so alongside mysqlx when PROXYSQL40=1 — no separate PROXYSQLGENAI gate.
protobuf-devel install: gate on PROXYSQL40=1 (needed for mysqlx plugin build).

Automated package verification (added to all 3 entrypoints)

Package content check: rpm -qpl / dpkg -c confirms both .so files are present. Fails build if missing.
ELF smoke test: extracts .so files, verifies they're valid ELF 64-bit shared objects, checks they export proxysql_plugin_descriptor_v1. Fails build on mismatch.

Build/test infrastructure

test/tap/tests/Makefile: changed 2 vec.o link gates from PROXYSQLGENAI to PROXYSQL40.
test/deps/cluster_simulator/Makefile: same change.
Updated all references in 14 genai test file comments.

Documentation cleanup

CLAUDE.md, include/cpp.h, lib/proxy_sqlite3_symbols.cpp, src/main.cpp, proxysql.spec, and test infra scripts updated.

Verification

PROXYSQL40=1 make packages now builds and packages both mysqlx and genai .so files.
PROXYSQLGENAI env var is no longer consulted anywhere in the build or packaging pipeline.

Summary by CodeRabbit

New Features
- PostgreSQL DNS cache with background resolver workers; multi-value SET parsing for PostgreSQL.
Bug Fixes
- Fixed large-packet double-free; corrected transaction isolation/read-only variable handling; improved MariaDB collation detection; enhanced SET STATEMENT whitespace tolerance.
Documentation
- Consolidated v4.0 plugin packaging semantics; added plugin integrity validation in packaging flows.
Tests / CI
- Added PgSQL DNS-cache and regression tests; new stmt-execute and collation tests; CI package install verification and louder test failure behavior.

PR #5708 added detection for MariaDB's `SET STATEMENT … FOR …` syntax to keep ProxySQL from invoking `unable_to_parse_set_statement()` and locking the session to a single hostgroup. The detection at `lib/MySQL_Session.cpp` was: if (strncasecmp(nq.c_str(), "SET STATEMENT ", 14) == 0 && strcasestr(nq.c_str(), " FOR ")) { return false; } `strcasestr(nq, " FOR ")` is a literal-substring search requiring a space BOTH before and after the FOR keyword. A customer report (see PR #5708 thread) reproduced the same error 9006 as before the fix by typing the statement across multiple lines at the mysql client prompt — the character after FOR is then a newline, not a space, so the detection misses, the code falls through to the regular SET parser, the parser doesn't know SET STATEMENT, and the hostgroup is locked to the user's default. The next non-default-hostgroup query fails with 9006. Verified with a standalone strcasestr probe: q1 (customer multi-line): strcasestr(" FOR ") matches ? NO q2 (existing TAP single-line): YES The fix uses `dig` (the digest text) for the detection instead of `nq` (the raw query). `dig` is whitespace-normalised by the digest builder, so " FOR " always has a space on both sides regardless of how the client formatted the query. This is consistent with the prefix check 35 lines above, which already uses `dig` for keyword detection. == Test coverage == `test/tap/tests/test_set_statement_for-t.cpp` previously had `plan(5)` and self-skipped on non-MariaDB backends via `skip_all()`. It was also registered ONLY in MySQL-family TAP groups (`legacy-g2`, `mysql84-g2`, etc.). The net effect was that the test self-skipped on every CI run since it was added — never actually executing an assertion in CI. The customer's bug regressed entirely unnoticed. This commit: 1. Removes the runtime MariaDB-detection helper and the `skip_all` call. A misregistered test should fail loudly, not silently skip; correct registration is enforced through the groups.json layer instead. 2. Moves the groups.json registration from the MySQL-only groups to `mariadb10-galera-g2`, which actually provides a MariaDB backend. 3. Adds four new test cases (assertions 6–13) covering the customer whitespace shapes that strcasestr (" FOR ") missed: newline, tab, CRLF, and multiple spaces between FOR and the inner statement. Each case verifies (a) the SET STATEMENT succeeds, and (b) a follow-up plain SELECT succeeds (i.e. no hostgroup lock persists — the customer's actual failure mode). Verified locally on TAP group mariadb10-galera-g2 against MariaDB 10.11.9: 13/13 ok, RC=0. A separate framework-cleanup follow-up should remove `skip_all()` from `test/tap/tap/tap.{h,cpp}` entirely and convert the other four callers (`test_mysqlx_e2e_*`, `test_mysqlx_route_drop_inflight-t`, `test_mysqlx_sigterm_inflight-t`) to either fail loudly or move to appropriately-gated TAP groups.

…ions Stage 2 follow-up to the PR #5708 reproduction debug. The framework helper test/tap/tap/tap.{h,cpp}::skip_all() prints "1..0 # skip ..." and exit(0), which the harness reports as "ignored" — zero assertions emitted, zero failures recorded, no notice that a test never ran. While investigating the customer regression on PR #5708 we discovered that test_set_statement_for-t had been silently skipping in CI since the day it was added: it self-skipped on non-MariaDB backends, but was registered only in MySQL-family TAP groups, so every run skipped. PR #5794 fixes that one test. The principled fix is to remove the mechanism that lets a test silently disappear. This commit converts the remaining four skip_all() callers and deletes the framework function: - test_mysqlx_e2e_routing-t BAIL_OUT if MYSQLX_E2E_PROXYSQL_PORT is not set. The group env file test/tap/groups/mysqlx-e2e/env.sh sets it; missing at runtime means the harness did not source it. - test_mysqlx_e2e_handshake-t BAIL_OUT if MYSQLX_E2E_HOST is not set (same env file, same rationale). - test_mysqlx_route_drop_inflight-t BAIL_OUT if the X-Protocol listener is not reachable. The mysqlx-soak-g1 harness is supposed to start ProxySQL with the mysqlx plugin and provision the route; a missing listener is a setup bug. - test_mysqlx_sigterm_inflight-t Deleted. The binary was a stub whose body was a TODO; the real sigterm scenario is implemented in test/scripts/mysqlx/behavioral_validation.py --scenario sigterm. With skip_all gone the stub had no remaining purpose. Also dropped from groups.json and the Makefile. After this commit, the test/tap codebase has zero call sites of skip_all(). The framework declaration and definition are deleted from tap.h and tap.cpp; the doc references at the bottom of tap.cpp are updated; a comment explains the rationale and points at BAIL_OUT() as the replacement. Verified locally: full TAP debug build succeeds, no undefined-reference errors, lint_groups_json + check_groups --source both green. Depends on PR #5794 (which removed the fifth caller in test_set_statement_for-t.cpp). This PR is based on that branch; merging in order keeps each merge a clean compile.

…verage is cancelled The previous pattern opened a LouisBrunner check_run with status=in_progress at job start and closed it via check_id at job end with if: always(). When the runner is killed mid-job (concurrency:cancel-in-progress, OOM, infra shutdown), the closing step never executes and the check_run is left in in_progress forever, blocking the PR rollup. Collapse to a single LouisBrunner call at the end: pass name: instead of check_id: so the action creates the check_run with a terminal conclusion in one shot. If the job is cancelled before the closing step runs, no check_run is ever created (PR rollup shows "missing" — recoverable via re-run) instead of "stuck" — unrecoverable without GitHub App auth. Refs: stuck check_run 77381366265 on PR #5752 (sha cf0b1d3) from 2026-05-22 12:39 UTC. Same pattern exists across ~163 other CI-* workflows; fixing only the workflow with the observed incident here. Sweep of the remaining files belongs in a separate PR.

test(framework): remove skip_all() — fail loudly on missing preconditions (Stage 2 follow-up to PR #5794)

…value list ParserSQL v1.0.2's PG SET parser had two bugs that were hidden in CI until PR #5760 fixed `test/infra/control/ensure-infras.bash` to actually dispatch `pre-proxysql.sql` hooks (previously only `.bash` hooks ran). With the harness fix in place, `set_parser_algorithm_3-g1` for the first time ran its tests under `pgsql-set_parser_algorithm=3`, surfacing: * `pgsql-set_parameter_validation_test-t` — `SET search_path TO "$user", public` parsed as `search_path = "$user"`, dropping every value after the first comma. * `pgsql-set_statement_test-t` — `SET TIME ZONE 'UTC'` parsed as `time = ZONE`, dropping the actual timezone value entirely. ## Upstream fix ParserSQL v1.0.3 (ProxySQL/ParserSQL PR #38) is the upstream parser fix: * PG `SET TIME ZONE <value>` is now recognised as the PG alias for `SET TimeZone = <value>`. The two-word keyword is detected via case-insensitive identifier lookahead since the tokenizer has no dedicated TK_TIME / TK_ZONE keyword. * PG `parse()` no longer shares MySQL's comma loop. In PG, the comma after the first value is value-list continuation — each subsequent expression is appended as another child of the same NODE_VAR_ASSIGNMENT node, alongside the first RHS. Full upstream test suite: 1223/1223 pass; 0 regressions. ## ProxySQL-side changes * `deps/parsersql/parsersql-1.0.3.tar.gz` — new tarball, replaces 1.0.2. * `deps/parsersql/parsersql` symlink retargeted to `parsersql-1.0.3`. * `lib/Query_Processor_ParserSQL.cpp::walk_set_stmt` — the PG multi-value fix means a single NODE_VAR_ASSIGNMENT can have multiple RHS siblings. The adapter now iterates `target->next_sibling` chain instead of reading a single sibling, so the `map<string, vector<string>>` for `SET search_path TO 'a', 'b', 'c'` carries `["a", "b", "c"]` rather than `["a"]`. Fully backward-compatible for MySQL (always one RHS). * `test/tap/tests/setparser_parsersql_test.cpp` — adds four permanent regression-test groups exercising the library + adapter end-to-end without a live backend: * `mysql_filtered_set` (7 cases) — verifies the filtered-SET variables parse cleanly so the downstream filter logic gets the input it expects. * `mysql_set_testing` (4 cases) — multi-variable comma-separated SET samples from `set_testing-240.csv`. * `pgsql_search_path` (8 cases) — single-value and multi-value shapes from `pgsql-set_parameter_validation_test-t`. * `pgsql_time_zone` (4 cases) — every `SET TIME ZONE ...` shape exercised by `pgsql-set_statement_test-t`. The 4 new assertions for the `INTERVAL '7' HOUR` case capture today's actual ParserSQL output (`timezone = [INTERVAL]`) — the expression parser does not yet consume the full interval modifier chain. The test reflects observable behaviour so it stays green; tightening to capture the full interval value is a future ExpressionParser improvement. ## Test plan - [x] Local rebuild of ProxySQL from the new 1.0.3 tarball: clean. - [x] `./setparser_parsersql_test`: 46 new assertions, all pass. - [x] No regressions in the 256 pre-existing assertions. The same 10 pre-existing failures (sql_mode 23-25, Set1_v2 0-1) — unrelated to this work, present before the upgrade. - [ ] CI on this branch: TAP groups `legacy-g2`, `mysql84-g2`, and `set_parser_algorithm_3-g1` are expected to flip green for the ParserSQL-related failures. Two non-ParserSQL bugs on the algorithm=3 ProxySQL session-dispatch path (`test_filtered_set_statements-t`, `set_testing-t`) remain to be investigated separately — the adapter now returns the correct maps, but downstream dispatch still misbehaves. ## Dependency Should not be merged before ProxySQL/ParserSQL PR #38 is merged and a v1.0.3 tag is cut. The vendored tarball in this commit was built from that PR's branch head; once the upstream tag exists the tarball SHA should be identical to a fresh `git archive --prefix=ParserSQL-1.0.3/ v1.0.3 | gzip` from the upstream tag.

@FoZo

…kends (#5790) Commit 7c6f42b ("fix: MySQL 9.x charset handling and log_last_insert_id test race") changed the version guard in lib/MySQL_Variables.cpp from testing the first character of server_version to using atoi(): - if (charset >= 255 && myconn->mysql->server_version[0] != '8') { + if (charset >= 255 && atoi(myconn->mysql->server_version) < 8) { The new form correctly extends the "supports collation id 255" treatment to MySQL 9.x and later, but it silently broke MariaDB: server_version old check new check ---------------------------- --------- --------- "5.7.x" replace replace "8.0.x" skip skip "9.x.y" replace[*] skip "10.11.13-MariaDB" replace skip <-- BUG "11.x.x-MariaDB" replace skip <-- BUG [*] wrong for MySQL 9.x; that is the case the original commit fixed. MariaDB does not implement utf8mb4_0900_ai_ci (collation id 255), regardless of its numeric major version. With the regression, ProxySQL no longer rewrites the unsupported collation before sending SET NAMES to the MariaDB backend, and the backend rejects it with: ERROR 1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_ci' …breaking clients such as Dovecot that pick collation 255 from libmariadb's charset table at connect time. Fix: also treat MariaDB (detected by the "MariaDB" suffix in server_version) as a backend that cannot accept collation id >= 255, so handle_unknown_charset replacement applies there as well. The regression is present on v3.0 HEAD (and on the v3.1.8 / v4.0.8 tags), but NOT in the v3.0.7 / v3.0.8 release tags themselves; users hitting this in production are on post-v3.0.8 "v3.0-head" CI packages (`3.0.8-NN-gHASH`). Reported-by: @FoZo (#5790) Also-confirmed-by: @akrus Adds reg_test_5790-mariadb_collation_255-t, registered in mariadb10-galera-g1, which: 1. Skips when the backend is not MariaDB. 2. Sets mysql-handle_unknown_charset=1 and mysql-default_collation_connection=utf8mb4_general_ci. 3. Issues `SET NAMES utf8mb4 COLLATE utf8mb4_0900_ai_ci`. 4. Runs a SELECT that forces backend SET NAMES via handler_again___status_CHANGING_CHARSET. 5. Verifies the SELECT succeeds and that the backend's @@collation_connection is the configured default, not utf8mb4_0900_ai_ci. This test fails without the fix (errno 1273 from the backend) and passes with it.

…ehavior PR #5755 (commit e6bef5c) fixed `process_pg_typecast()` in `lib/pgsql_tokenizer.cpp` so that the trailing space after a PG typecast is preserved unless the next non-space character is a modifier `(` or array bracket `[`. The commit message documented this as an intentional behaviour change: "The new lookahead-conditional skip preserves the trailing space unless the very next non-space character is a modifier `(` or array bracket `[`." Three test cases in `tokenizer_payloads/pgsql_regular_tokenizer_digests.hjson` hardcoded expected digests against the OLD (buggy) tokenizer output, where the space-eating bug glued the preceding literal to the following operator. With the fix in place those expectations mismatch, making `pgsql-query_digests_stages_test-t` fail deterministically in the legacy-g4 TAP group on v3.0 HEAD: * `::json -> 'key'` STAGE 1: actual `select ? -> ?` vs old expected `select ?-> ?` * `::jsonb @> ...` STAGE 1-4: actual `select ? @> ?` vs old expected `select ?@> ?` * `::money + ...` STAGE 1: actual `select ? + ?` vs old expected `select ?+ ?` For `->` and `+`, only STAGE 1 changes — STAGE 2-4's existing whitespace-collapse step still glues `?->` and `?+` for those operators, so `s2`/`s3`/`s4` expectations stay as-is. For `@>`, none of the stages collapse the space, so all four expectations move. The new actual digest output is the more correct one — it mirrors the existing MySQL tokenizer's behaviour for the same shape, where `'literal' + 'literal'` already produces `? + ?` at STAGE 1. This commit only updates the fixture; no tokenizer or test source changes. Verified locally against the existing test binary (built from v3.0 HEAD, includes the new tokenizer behaviour): ./pgsql-query_digests_stages_test-t → 734/734 ok, RC=0 The 9 previously-failing assertions (#381, #385, #389-#392, #592, #596, #600) all flip to ok with the updated fixture.

When handler_WCD_SS_MCQ_qpo_LargePacket() is reached via the COM_STMT_EXECUTE path, MySQL_Protocol::get_binds_from_pkt() has already aliased stmt_meta->pkt to pkt->ptr (see the FIXME at MySQL_Protocol.cpp:2873). The subsequent RequestEnd() -> Query_Info::end() free()s stmt_meta->pkt (the fix introduced for bug #796 at MySQL_Session.cpp:455), and then LargePacket's own l_free(pkt->size, pkt->ptr) frees the same buffer a second time. With jemalloc this silently corrupts the arena and surfaces later as the SIGSEGV reported in #5639, with the crashing frame landing on the l_free call site (+0x10e into LargePacket). Under ASAN the double-free aborts deterministically with a backtrace whose two free() chains match the report exactly. Fix: capture whether CurrentQuery.stmt_meta aliases pkt before calling RequestEnd() (which clears stmt_meta), and skip the second free when the alias was present. For the COM_QUERY path through LargePacket, stmt_meta is always NULL, so the previous behaviour is preserved there. Also add reg_test_5639_stmt_execute_max_allowed_packet-t which exercises the STMT_EXECUTE > mysql-max_allowed_packet path and asserts the session and process survive.

DNS_Cache, DNS_Cache_Record, DNS_Resolve_Data, the resolver worker, and the validate_ip / get_connected_peer_ip_from_socket / debug_iplisttostring helpers used to live inside MySQL_Monitor.{hpp,cpp}. Move them to a new DNS_Cache.{hpp,cpp} so the same plumbing can back an independent PgSQL_Monitor cache. DNS_Cache no longer references GloMyMon directly; counters are wired in per-instance via set_counters(). monitor_dns_resolver_thread no longer reads mysql_thread___resolution_family; the desired ai_family is now a field on DNS_Resolve_Data set by the caller. MySQL_Monitor wires both up in its constructor and resolver loop respectively; behavior is identical. No change to the MySQL DNS cache semantics.

Fixes the watchdog asserts under DNS degradation reported in #5768. Symptom: with a PgSQL backend addressed by hostname, an unhealthy resolver made every PQconnectStart() block ~5s on getaddrinfo inside libpq. Under load, all PgSQL_Thread workers spent the bulk of each loop iteration in that synchronous DNS call, atomic_curtime fell behind the watchdog threshold (poll_timeout + 1s), and after restart_on_missing_heartbeats misses the watchdog asserts. MySQL avoids this because mysql-monitor's DNS cache lets MySQL_Connection::connect_start pass an IP directly to libmariadb; PgSQL had no equivalent. This change adds the equivalent for PgSQL, with state fully independent of the MySQL cache: * PgSQL_Monitor gains its own DNS_Cache, force_dns_cache_update flag, and dns_cache_{queried,lookup_success,record_updated} counters. PgSQL_Monitor::dns_lookup / update_dns_cache_from_pgsql_conn / trigger_dns_cache_update mirror the MySQL_Monitor surface. * A new background thread (PgSQL_monitor_dns_cache_pthread) drives PgSQL_Monitor::monitor_dns_cache(), which walks PgHGM->pgsql_servers_to_monitor and feeds the shared DNSResolverWorker pool. Launched alongside PgSQL_monitor_scheduler and lives independently of pgsql-monitor_enabled, matching MySQL. * PgSQL_Connection::connect_start now consults the cache. On a hit it appends `hostaddr=<ip>` to the libpq conninfo while keeping `host=<hostname>`; libpq documents this combo specifically to skip name resolution while preserving TLS hostname verification. On a miss / IP literal / disabled cache we fall back to the previous behavior (libpq does getaddrinfo). * Successful PgSQL connects seed the cache via update_dns_cache_from_pgsql_conn, so the second connect avoids getaddrinfo even before the resolver loop has visited the host. * The pgsql-monitor_local_dns_cache_ttl / _refresh_interval / _resolver_queue_maxsize variables (already accepted by Admin via PgSQL_Thread's variable registry) now actually propagate to pgsql_thread___monitor_local_dns_* globals. The propagation block in PgSQL_Thread::refresh_variables was previously commented out, so the values never reached anywhere. * The runtime stats output (SQL3_GlobalStatus) now exposes the PgSQL_Monitor_dns_cache_* counters alongside the existing PgSQL_Monitor_* counters. DNSResolverWorker is shared infrastructure added to DNS_Cache.hpp/cpp, used here by PgSQL_Monitor. MySQL_Monitor still uses its existing ConsumerThread-based pool; nothing about the MySQL cache changes.

The names were defined in PgSQL_Thread::variables but sat inside a commented-out block in pgsql_thread_variables_names, so Admin's has_variable() returned false and SET pgsql-monitor_local_dns_* was rejected with "Unknown global variable". Move them out of the /* ... */ so the variables actually show up in global_variables and runtime_global_variables and can be tuned at runtime. Verified end-to-end on a smoke run: SET ... LOAD PGSQL VARIABLES TO RUNTIME succeeds, the resolver loop populates the cache, and a client connect through the proxy increments dns_cache_queried / dns_cache_lookup_success (cache hit, libpq skipped getaddrinfo).

Adds pgsql-test_dns_cache-t mirroring the existing MySQL test_dns_cache test, exercising 16 scenarios: * pgsql-monitor_local_dns_cache_* variables propagate to runtime * IP literal does not touch the cache * Resolvable hostname populates dns_cache_record_updated * Client connect through the proxy increments dns_cache_queried and dns_cache_lookup_success on a cache hit * Hostname with leading/trailing whitespace is trimmed before resolution * Unresolvable hostname is queried but produces no record (cache miss bumps dns_cache_queried only) * Removing pgsql_servers rows drops the orphaned cache record * Cache disabled via refresh_interval=0 flatlines all counters * MySQL DNS cache counters do not move in response to pgsql-side activity (independence between the two caches) While writing the test, step 4 (whitespace trim) caught a real bug: PgSQL_Monitor::monitor_dns_cache was inserting hostnames verbatim from the SQLite3_result into the resolver queue, so ' example.org ' was handed to getaddrinfo() unchanged and failed. MySQL_Monitor sidesteps this by reading via SELECT trim(hostname) FROM monitor_internal.*; the pgsql variant reads directly from pgsql_servers_to_monitor so it needs to call trim() in C++. Fix: trim before validate_ip / insert into the hostnames set. The lookup path already trimmed, so the bookkeeper key now matches. Registered the test in groups.json under legacy-g4 and the mysql-* variant g4 groups, matching how the other pgsql-* tests are grouped.

Fixes for actionable findings on PR #5806: DNS_Cache.cpp / hpp: - lookup(): initialize *ip_count = 0 on the disabled-cache fast path so callers don't observe a stale count from a previous lookup (CodeRabbit). - add_if_not_exist(): only bump counter_record_updated_ when an insert actually happened, and return that signal to callers. Previously a no-op call inflated the cache-update stats (CodeRabbit). - get_connected_peer_ip_from_socket(): drop the malloc, switch to a stack sockaddr_storage, and only assign result on AF_INET / AF_INET6 with a successful inet_ntop. Previously, on an unknown address family, we'd copy the uninitialized ip_addr buffer into result (Gemini + CodeRabbit). - monitor_dns_resolver_thread(): cache_ttl is signed int and at the max configured TTL (7*24*3600*1000 ms) `1000 * cache_ttl` overflows before being added to monotonic_time(). Force the multiply into unsigned long long space with 1000ULL (Gemini). - DNS_Cache::IP_ADDR::counter is now mutable so get_next_ip() (const) can __sync_fetch_and_add it without a const_cast (Gemini). PgSQL_Monitor.cpp: - monitor_dns_cache(): refresh pgsql_thread___monitor_local_dns_* before the loop, not just on the first version-bump. Otherwise the resolver runs its first pass against zero-initialized TTL / refresh interval and starts in the wrong enabled/disabled state until a later config bump (CodeRabbit). test/tap/tests/pgsql-test_dns_cache-t.cpp: - Step 7 ("Cache disabled by refresh_interval=0") was inserting an IP literal (0.0.0.0), which bypasses DNS regardless of the cache state. Switched to a resolvable hostname (example.com) so a regression in the cache-disable path would actually move counters and fail the assertion. Added a third assertion on dns_cache_lookup_success for symmetry. Plan adjusted to 17. Deliberately skipped: - CR comment about mysql_thread___resolution_family change requiring a cache flush: pre-existing MySQL behavior, out of scope for this PR. - CR comment about pthread_attr_init / setstacksize return checks in main.cpp: matches the existing un-checked pattern in MySQL_Monitor::run(). No need to diverge here. - Gemini comment about persistent thread pool vs. per-iteration spawn: MySQL_Monitor::monitor_dns_cache() also recreates the resolver threads each loop iteration; the PgSQL variant follows the same pattern. - Gemini comment about 2048-byte stack size: Thread::start takes ss in KB (lib/thread.cpp:46 `pthread_attr_setstacksize(&attr, ss*1024)`), so 2048 here means 2 MB, matching the MySQL resolver pool.

@gemini-code-assist

Address @gemini-code-assist review on PR #5807: the previous patch unconditionally evaluated `atoi(sv)` in the disjunction `(is_mariadb || atoi(sv) < 8)`. If `sv` was NULL, the short-circuit on `is_mariadb` (false) would still reach `atoi(NULL)`, which is undefined behavior per the C standard. In practice `myconn->mysql->server_version` is non-NULL by the time validate_charset() runs (CHANGING_CHARSET is only reachable after the backend handshake has populated server_version), so this is defensive, not a real crash path. Still essentially free, and keeps the guard consistent with the existing NULL check on `strstr(sv, "MariaDB")`. No behavior change for any reachable input.

The same mental defect fixed in lib/MySQL_Variables.cpp:validate_charset() (commit before this one) exists at four other call sites in core. All four use a first-character or 3-byte string-prefix comparison against server_version and consequently misclassify either MariaDB or MySQL 9.x (or both). Two are latent (MariaDB happens to land on the still-working branch); the two in MySQL_Monitor.cpp are an active break on MySQL 9.x Galera/PXC, because `INFORMATION_SCHEMA.GLOBAL_STATUS` was removed in MySQL 8.4 and MySQL 9.x falls into the else branch that targets it. Audited sites and fixes: lib/MySQL_Session.cpp:2607-2615 -- tx_isolation / tx_read_only Pre-fix: `strncmp(sv,"8",1)==0` chose `transaction_isolation` / `transaction_read_only`; everything else got `tx_isolation` / `tx_read_only`. MySQL 9.x fell to the else branch, but `tx_isolation` was removed in MySQL 8.4; on MySQL 9.x the SET fails. MariaDB also fell to the else branch -- coincidentally correct, since MariaDB keeps `tx_isolation` across all versions. Post-fix: use the modern name only for non-MariaDB MySQL >= 8.0 (any major), so MySQL 8.0/8.4/9.x all get `transaction_isolation` and MariaDB 10/11 keeps `tx_isolation`. lib/MySQL_Monitor.cpp:749 (sync MON_GALERA setup) lib/MySQL_Monitor.cpp:2297 (async MON_GALERA dispatch) Pre-fix: `strncmp(sv,"5.7",3)==0 || strncmp(sv,"8",1)==0` picked performance_schema.global_status; everything else used INFORMATION_SCHEMA.GLOBAL_STATUS. MySQL 9.x fell to the else branch and queries INFORMATION_SCHEMA.GLOBAL_STATUS, which was removed in MySQL 8.4 -- monitor query fails on MySQL 9.x Galera/PXC. MariaDB Galera correctly fell to the else branch and uses information_schema, which is where MariaDB keeps GLOBAL_STATUS. Post-fix: use performance_schema for non-MariaDB MySQL >= 5.7 (so MySQL 5.7, 8.0, 8.4, 9.x all share that path); MariaDB Galera and legacy MySQL/PXC < 5.7 retain the information_schema path. Pattern used at all sites (matching the existing fix in validate_charset): const char *sv = ...->mysql->server_version; bool is_mariadb = (sv != NULL && strstr(sv, "MariaDB") != NULL); int sv_major = (sv != NULL) ? atoi(sv) : 0; No dedicated test added for the Monitor / SET TRANSACTION paths in this commit: the Monitor change is exercised by any mariadb10-galera or mysql{8,9}x-* CI group with Galera/PXC backends, and the transaction_isolation change by SET-isolation-level coverage in existing TAP groups. The validate_charset regression test added in the prior commit on this branch still covers the original #5790 path.

A subagent code review on top of the CodeRabbit + Gemini bot reviews found five additional issues that the bots missed. All five are fixed here. 1. LOAD PGSQL SERVERS TO RUNTIME was waking MySQL's resolver, not PgSQL's. PgSQL_HostGroups_Manager::commit() inherited a MySQL_Monitor::trigger_dns_cache_update() call from a copy of the MySQL HGM, so after a LOAD pgsql hostnames had to wait for the refresh interval (default 60 s) before becoming resolvable — defeating the warm-cache-before-traffic intent that motivated issue #5768. Route to PgSQL_Monitor::trigger_dns_cache_update(). Verified directly: with refresh_interval=60000 ms, inserting a pgsql_servers row + LOAD now produces record_updated=1 within ~4 s instead of waiting a full minute. 2. DNSResolverWorker could deadlock shutdown. monitor_dns_resolver_thread isn't noexcept (vector growth, set_thread_name, etc. can throw); if it threw, the std::promise on DNS_Resolve_Data was never satisfied, the producer's future::get() blocked forever, the resolver loop never reached the shutdown check, and pthread_join in main hung. Wrap the call in try/catch and force-satisfy the promise with a failure result on any uncaught exit. 3. Test step 3 wasn't actually exercising the connect-path lookup it claimed. hammer_proxy() routes through the proxy front-end, which in standard test infra resolves through the default hostgroup — typically configured with an IP literal (see e.g. test/infra/docker-pgsql16-single/conf/proxysql/config.sql). IP literals bypass dns_lookup via validate_ip(), so the dns_cache_queried bumps observed came from the resolver loop alone, not from the connect path. Step 3 would have passed even if PgSQL_Connection::connect_start_DNS_lookup() was wired up wrong. Install a pgsql_query_rules row from inside the test that routes testuser to hostgroup 999, so the proxy actually calls into the cache on client connects. Clean up the rule in the test exit path. 4. Use-after-move in the _dns_cache_update debug log. Both PgSQL_Monitor::_dns_cache_update and MySQL_Monitor::_dns_cache_update logged debug_iplisttostring(ip_address) after std::move'ing ip_address into add_if_not_exist(). The moved-from vector is in a "valid but unspecified" state — typically empty — so the log printed "IP:[]" instead of the IPs. Capture the string before the move. Same bug also fixed in the bookkeeper expired-record requeue path inside PgSQL_Monitor::monitor_dns_cache. 5. monitor_dns_cache had inconsistent indentation inside the `if (hostnames.empty()) { ... } else { ... }` block introduced when I wrapped the original goto-based control flow. The braces were correct but the body sat at the wrong tab depth, making the scope visually misleading and prone to mis-edits. Re-indented. Findings deliberately skipped from the review (rationale in commit body): - "thread_local shared_ptr<DNS_Cache> keeps stale cache alive across GloPgMon recreate" — pre-existing MySQL pattern; GloPgMon is never destroyed mid-process today. - "PgSQL_monitor_dns_cache_pthread can deref GloPgMon during a tight shutdown" — shutdown ordering in main.cpp sets GloPgMon->shutdown before joining the thread and deletes GloPgMon only afterwards, matching the MySQL pattern. - "cache only seeded on ASYNC_CONNECT_SUCCESSFUL" — matches MySQL; pre-existing scope choice. - "validate_ip rejects bracketed IPv6" — pre-existing in MySQL; no current configuration path produces bracketed IPv6 in pgsql_servers. - "Test relies on outbound DNS for example.com" — matches the upstream MySQL test_dns_cache test which uses google.com / yahoo.com / amazon.com. Established convention. - "ai_family hard-coded to AF_UNSPEC for PgSQL" — already commented in code; follow-up if/when pgsql-resolution_family is added.

CI-lint-groups-json requires keys in groups.json to be sorted.

Follow-up to the cluster-2 ParserSQL 1.0.3 bump (this same PR), addressing review feedback from coderabbit + gemini. ## 1. PgSQL_Session: accept multi-value SET vectors (coderabbit, Major) The previous commit `5a48004af` taught the adapter to walk every RHS sibling so PG multi-value lists (e.g. `SET search_path TO "$user", public`) come through as a `std::vector<std::string>` of every value. But `lib/PgSQL_Session.cpp:4516` still had `size() > 2` → `unable_to_parse_set_statement()`, which would have caused 3+ value PG SETs to lock the hostgroup instead of being processed. Worse, the downstream value-consumer at the same site used only `it->second.front()`, silently dropping every value past the first for valid 2-value SETs too. Fix: * Drop the upper bound from the cardinality check (keep `< 1` so empty value lists still fail loudly). * Join every value in the vector into `value1` with the comma-separated form PG itself uses for these parameters (`a, b, c`). Single-value SETs hit the loop with `size()==1` and take just `it->second.front()`, preserving the previous behaviour. The validator / hash-compare / tracker / param_status code below the join is unchanged — it now receives the full joined value instead of the truncated first element. ## 2. setparser_parsersql_test: drop INTERVAL test that locked in incomplete behaviour (subagent correctness review, Major) `parsersql_pgsql_time_zone` previously asserted that `SET TIME ZONE INTERVAL '7' HOUR` produces `timezone = "INTERVAL"`, which is the *current* observable output of ParserSQL — the expression parser does not yet consume the `'7' HOUR` modifier chain. Encoding that as expected output would flip the test red when ParserSQL is later improved to capture the full interval expression, conflating an improvement with a regression. Removed the case; left a comment in the fixture explaining the gap so a future contributor adds the case back together with the ExpressionParser fix. ## 3. Diag-output cleanup (gemini, cosmetic) * `%lu` → `%zu` for `size_t` (portability — `size_t` is not always `unsigned long` on 32-bit / Windows). * Extract the per-vector " | "-separated diag-string builder into a single `join_values_for_diag()` helper used by both `TestParse` (now also dumps full vector on mismatch, where it previously printed just `vals[0]`) and `TestParsePgsql`. ## Verification * `./setparser_parsersql_test`: 260/270 ok; the 10 remaining not-ok are unchanged pre-existing failures (`sql_mode 23-25`, `Set1_v2 0-1`), nothing new from this commit. * `pgsql_time_zone` group: 6 assertions (3 cases × 2) — confirms the INTERVAL case is removed cleanly. * `pgsql_search_path` group: 16 assertions still all pass — the multi-value join in PgSQL_Session does not affect the parser- side test (which goes through the adapter, not the session).

The previous commit introduced local variable declarations with initializers (`const char *sv = …`, `bool is_mariadb = …`, etc.) directly inside `case MON_GALERA:` of the `switch` in `init_async()` at lib/MySQL_Monitor.cpp:742. The sibling `case MON_CLOSE_CONNECTION:` / `MON_CONNECT:` / `MON_AWS_AURORA:` labels then jump past these initializations, which GCC rejects: MySQL_Monitor.cpp:779:14: error: jump to case label MySQL_Monitor.cpp:781:14: error: jump to case label MySQL_Monitor.cpp:783:14: error: jump to case label Wrap the `#else` body of the MON_GALERA case in `{ ... }` so the declarations live in their own scope and no sibling case can jump over them. No behavior change. The async path at lib/MySQL_Monitor.cpp:2299 was already inside its own `{ ... }` block (added with the original code, not by this patch), so it compiled fine and needs no change. The `MySQL_Session.cpp` edits live inside `if / else if` chains with their own braces, also unaffected.

fix(ci): prevent zombie IN_PROGRESS checks in CI-unit-tests-asan-coverage

fix(session): SET STATEMENT ... FOR detection tolerates any whitespace (PR #5708 follow-up)

fix: pgsql digest fixtures (typecast spacing) + ParserSQL 1.0.3 bump (PG SET TIME ZONE + multi-value lists)

Independent DNS cache for PgSQL (fixes #5768)

fix(core): validate_charset must replace collation 255 on MariaDB backends (#5790)

fix(session): avoid double-free of stmt_meta pkt in LargePacket (#5639)

validate_ip() treated inet_pton(...) != 0 as success, but inet_pton returns 1 / 0 / -1; a -1 (e.g. unsupported family) would be reported as a valid IP. Compare against == 1 instead. The DNS cache counters (dns_cache_queried / dns_cache_lookup_success / dns_cache_record_updated) were declared as plain unsigned long long on MySQL_Monitor. Resolver workers update them atomically via __sync_fetch_and_add, but p_update_metrics() reads them from a different thread without a barrier, racing the writers. Promote to std::atomic<unsigned long long> and switch DNS_Cache to fetch_add / the atomic pointer type so both ends are consistent. Addresses CodeRabbit review feedback on PR #5806.

GloPgMon->shutdown was a plain bool, written by main.cpp's teardown and read by the monitor / DNS resolver loops on worker threads — a data race that can let the loops spin past shutdown or observe a stale value indefinitely. Promote to std::atomic<bool> and switch the readers/writer to acquire/release ordering. PgSQL_Monitor::dns_lookup() was also returning an empty string when GloPgMon / dns_cache_thread weren't initialized yet, ignoring the return_hostname_if_lookup_fails contract. Move the fallback out of the dns_cache_thread branch so startup / shutdown behave like a normal cache miss for callers that asked for the hostname fallback. Addresses CodeRabbit review feedback on PR #5806.

Two regressions surfaced by CodeRabbit on PR #5806: * admin_exec() return values for the DNS-cache runtime config ('SET pgsql-monitor_local_dns_cache_ttl' / 'LOAD PGSQL VARIABLES TO RUNTIME') were silently dropped, so a failed setup cascaded into misleading downstream assertions. BAIL_OUT immediately. * The pgsql_query_rules INSERT hardcoded username='testuser', but hammer_proxy() connects as cl.pgsql_username. When the infra's test user is anything else the rule never matches, hostgroup 999 is bypassed, and the DNS-cache assertions exercise the wrong path. Build the INSERT with cl.pgsql_username instead.

The admin connection used cl.host, which is the frontend endpoint. In infrastructures where the admin interface is on a different host the test would false-fail on the admin connect, unrelated to collation behavior. Use cl.admin_host. Addresses CodeRabbit review feedback on PR #5807.

The previous commit promoted MySQL_Monitor::dns_cache_{queried, lookup_success,record_updated} to std::atomic<unsigned long long> but missed several call sites that pass them through implicit copy: * lib/MySQL_Thread.cpp sprintf("%llu", GloMyMon->dns_cache_*) — varargs cannot copy std::atomic. * lib/PgSQL_Thread.cpp sprintf("%llu", GloPgMon->dns_cache_*) — same, for the PgSQL counters. * lib/MySQL_Monitor.cpp p_update_counter(..., GloMyMon->dns_cache_*) — works via implicit operator T(), but explicit .load() is clearer and matches the rest of the file. Promote PgSQL_Monitor::dns_cache_* to std::atomic for the same race that motivated the MySQL side (resolver workers increment, stats readers and p_update_metrics read from another thread), and add .load() at every read site so we don't depend on implicit conversion. Verified with a local 'make -j32 debug' build in the integration worktree.

PROXYSQLGENAI was a leftover from before the genai plugin carve-out. All genai/MCP/AI/RAG code now lives in plugins/genai/ as a runtime loadable .so. Since the plugin chassis (PROXYSQL40=1) already builds and loads the genai .so, a separate flag serves no purpose. Entrypoint changes (all three -- rhel, suse, deb): - Build: pass PROXYSQL40=1 (not PROXYSQLGENAI=1) to make. - Packaging: copy genai .so alongside mysqlx when PROXYSQL40=1. - protobuf-devel install: gate on PROXYSQL40=1 (needed for mysqlx). Automated packaging verification added to all entrypoints: rpm -qpl / dpkg -c confirms both .so files are in the package. ELF smoke test: validates .so is a valid shared object exporting proxysql_plugin_descriptor_v1 symbol. Fails build on mismatch. Test Makefiles: PROXYSQLGENAI -> PROXYSQL40 for the vec.o link gate (test/tap/tests/Makefile and test/deps/cluster_simulator/Makefile). Updated remaining PROXYSQLGENAI references to PROXYSQL40 in: CLAUDE.md, include/cpp.h, lib/proxy_sqlite3_symbols.cpp, src/main.cpp (3x comments), proxysql.spec, test/infra/control/start-proxysql-isolated.bash, test/infra/control/run-unit-tests-asan-coverage.bash, and 14 genai test file comments.

coderabbitai · 2026-05-23T18:54:24Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c3ff051a-1b4e-41f3-99e6-dee4bdb91e56

📥 Commits

Reviewing files that changed from the base of the PR and between 9719d4a and ecc60a1.

⛔ Files ignored due to path filters (2)

deps/parsersql/parsersql-1.0.2.tar.gz is excluded by !**/*.gz
deps/parsersql/parsersql-1.0.3.tar.gz is excluded by !**/*.gz

📒 Files selected for processing (83)

.github/workflows/CI-package-arm64-almalinux10-genai.yml
.github/workflows/CI-package-arm64-almalinux10.yml
.github/workflows/CI-package-arm64-almalinux8-genai.yml
.github/workflows/CI-package-arm64-almalinux8.yml
.github/workflows/CI-package-arm64-almalinux9-genai.yml
.github/workflows/CI-package-arm64-almalinux9.yml
.github/workflows/CI-package-arm64-centos10-genai.yml
.github/workflows/CI-package-arm64-centos10.yml
.github/workflows/CI-package-arm64-centos9-genai.yml
.github/workflows/CI-package-arm64-centos9.yml
.github/workflows/CI-package-arm64-debian12-genai.yml
.github/workflows/CI-package-arm64-debian12.yml
.github/workflows/CI-package-arm64-debian13-genai.yml
.github/workflows/CI-package-arm64-debian13.yml
.github/workflows/CI-package-arm64-fedora42-genai.yml
.github/workflows/CI-package-arm64-fedora42.yml
.github/workflows/CI-package-arm64-fedora43-genai.yml
.github/workflows/CI-package-arm64-fedora43.yml
.github/workflows/CI-package-arm64-fedora44-genai.yml
.github/workflows/CI-package-arm64-fedora44.yml
.github/workflows/CI-package-arm64-opensuse15-genai.yml
.github/workflows/CI-package-arm64-opensuse15.yml
.github/workflows/CI-package-arm64-opensuse16-genai.yml
.github/workflows/CI-package-arm64-opensuse16.yml
.github/workflows/CI-package-arm64-ubuntu22-genai.yml
.github/workflows/CI-package-arm64-ubuntu22.yml
.github/workflows/CI-package-arm64-ubuntu24-genai.yml
.github/workflows/CI-package-arm64-ubuntu24.yml
.github/workflows/CI-unit-tests-asan-coverage.yml
.github/workflows/gh-actions-reusable/verify-package-install.yml
CLAUDE.md
deps/parsersql/parsersql
docker/images/proxysql/deb-compliant/entrypoint/entrypoint.bash
docker/images/proxysql/rhel-compliant/entrypoint/entrypoint.bash
docker/images/proxysql/rhel-compliant/rpmmacros/rpmbuild/SPECS/proxysql.spec
docker/images/proxysql/suse-compliant/entrypoint/entrypoint.bash
include/DNS_Cache.hpp
include/MySQL_Monitor.hpp
include/PgSQL_Connection.h
include/PgSQL_Monitor.hpp
include/cpp.h
include/proxysql_structs.h
lib/DNS_Cache.cpp
lib/Makefile
lib/MySQL_Monitor.cpp
lib/MySQL_Session.cpp
lib/MySQL_Thread.cpp
lib/MySQL_Variables.cpp
lib/PgSQL_Connection.cpp
lib/PgSQL_HostGroups_Manager.cpp
lib/PgSQL_Monitor.cpp
lib/PgSQL_Session.cpp
lib/PgSQL_Thread.cpp
lib/Query_Processor_ParserSQL.cpp
lib/proxy_sqlite3_symbols.cpp
src/main.cpp
test/deps/cluster_simulator/Makefile
test/infra/control/run-unit-tests-asan-coverage.bash
test/infra/control/start-proxysql-isolated.bash
test/infra/control/verify-package-install.bash
test/tap/groups/groups.json
test/tap/tap/tap.cpp
test/tap/tap/tap.h
test/tap/tests/Makefile
test/tap/tests/pgsql-test_dns_cache-t.cpp
test/tap/tests/reg_test_5639_stmt_execute_max_allowed_packet-t.cpp
test/tap/tests/reg_test_5790-mariadb_collation_255-t.cpp
test/tap/tests/setparser_parsersql_test.cpp
test/tap/tests/test_mysqlx_e2e_handshake-t.cpp
test/tap/tests/test_mysqlx_e2e_routing-t.cpp
test/tap/tests/test_mysqlx_route_drop_inflight-t.cpp
test/tap/tests/test_mysqlx_sigterm_inflight-t.cpp
test/tap/tests/test_set_statement_for-t.cpp
test/tap/tests/tokenizer_payloads/pgsql_regular_tokenizer_digests.hjson
test/tap/tests/unit/genai_discovery_schema_unit-t.cpp
test/tap/tests/unit/genai_fts_string_unit-t.cpp
test/tap/tests/unit/genai_llm_clients_unit-t.cpp
test/tap/tests/unit/genai_mcp_endpoint_unit-t.cpp
test/tap/tests/unit/genai_mcp_thread_unit-t.cpp
test/tap/tests/unit/genai_mysql_catalog_unit-t.cpp
test/tap/tests/unit/genai_query_handler_unit-t.cpp
test/tap/tests/unit/genai_stats_parsing_unit-t.cpp
test/tap/tests/unit/genai_thread_unit-t.cpp

📝 Walkthrough

Walkthrough

This PR: (1) consolidates v4.0 plugin gating under PROXYSQL40 and adds package-install verification to many CI package jobs (reusable workflow + verifier), (2) introduces a shared DNS cache and resolver workers plus PgSQL monitor integration and a dedicated DNS-cache thread, (3) extends ParserSQL/PG SET parsing to support multi-value RHS, (4) fixes session/collation/version detection and other robustness issues, and (5) updates tests and TAP framework (remove skip_all → BAIL_OUT).

Changes

Plugin Chassis Tier Unification & CI package verification

Layer / File(s)	Summary
CI package install verification steps `.github/workflows/CI-package-arm64-*.yml`	Inserted a post-build step that finds the first produced `.deb`/`.rpm` in `binaries/`, fails if none found, and runs `test/infra/control/verify-package-install.bash` to verify clean-install on target distro.
Reusable workflow + verifier script `.github/workflows/gh-actions-reusable/verify-package-install.yml`, `test/infra/control/verify-package-install.bash`	New reusable workflow and containerized verifier script that locate the package and run an install + basic runtime/ELF/plugin checks.
Docs and flag semantics `CLAUDE.md`, `include/cpp.h`, `lib/proxy_sqlite3_symbols.cpp`	Explain that `PROXYSQL40=1` is the v4.0 plugin chassis and remove standalone PROXYSQLGENAI carve-outs; update header comments and tokenizer hook notes.

Docker entrypoints & packaging smoke tests

Layer / File(s)	Summary
Debian entrypoint `docker/images/proxysql/deb-compliant/entrypoint/entrypoint.bash`	Gate libprotobuf and plugin inclusion on `PROXYSQL40=1`; add .deb plugin presence and ELF+symbol smoke tests.
RHEL entrypoint `docker/images/proxysql/rhel-compliant/entrypoint/entrypoint.bash`, `rpmmacros/.../proxysql.spec`	Gate protobuf-devel and plugin staging on `PROXYSQL40=1`; add RPM contents check and extraction/smoke-test for ELF and exported symbol.
SUSE entrypoint `docker/images/proxysql/suse-compliant/entrypoint/entrypoint.bash`	Same gating and RPM verification/smoke-test behavior for SUSE family images.

Shared DNS cache and resolver infrastructure

Layer / File(s)	Summary
DNS cache API & impl `include/DNS_Cache.hpp`, `lib/DNS_Cache.cpp`, `lib/Makefile`	New DNS_Cache types and implementation: thread-safe hostname→IP records, round-robin selection, add/add_if_not_exist/lookup/remove/clear/empty APIs, resolver worker thread(s) that run getaddrinfo, and inclusion in lib build.
MySQL monitor integration `include/MySQL_Monitor.hpp`, `lib/MySQL_Monitor.cpp`, `lib/MySQL_Thread.cpp`	Move DNS helpers into shared code, wire DNS cache counters, use atomic `.load()` for metrics, and keep Galera query logic version-aware.

PgSQL monitor, connection integration & thread lifecycle

Layer / File(s)	Summary
PgSQL headers & structs `include/PgSQL_Monitor.hpp`, `include/proxysql_structs.h`, `include/PgSQL_Connection.h`	Add DNS cache fields and APIs to PgSQL monitor, thread-local config variables, atomic counters, and `connect_start_DNS_lookup()` declaration.
PgSQL monitor implementation `lib/PgSQL_Monitor.cpp`, `lib/PgSQL_HostGroups_Manager.cpp`	Create dns_cache instance, implement dns_lookup/update/trigger, implement monitor_dns_cache() loop with resolver queue scaling, and trigger logic from hostgroup updates.
PgSQL connection & main thread `lib/PgSQL_Connection.cpp`, `src/main.cpp`	Seed cache on successful PG connection; use cached hostaddr in connect_start(); create/join dedicated DNS-cache pthread with atomic shutdown coordination.
PgSQL thread-local refresh & metrics `lib/PgSQL_Thread.cpp`	Add thread variables for TTL/refresh/queue size and expose PgSQL DNS cache counters in global status via atomic loads.

ParserSQL Multi-Value SET Statement Support

Layer / File(s)	Summary
Query processor multi-value SET parsing `lib/Query_Processor_ParserSQL.cpp`	Collect all RHS siblings for NODE_VAR_ASSIGNMENT into vector values (supports multi-value SET forms).
PgSQL session multi-value handling `lib/PgSQL_Session.cpp`	Allow multi-element value lists and collapse into a comma-separated string for validation/tracking.
SET parser tests `test/tap/tests/setparser_parsersql_test.cpp`	Add fixtures for MySQL/PG multi-value cases and improve diagnostics to show all expected vs actual values.

MySQL session robustness & collation handling

Layer / File(s)	Summary
MySQL_Session fixes `lib/MySQL_Session.cpp`	Avoid double-free in large-packet handler by checking pkt ownership/aliasing; use digest-based detection for SET STATEMENT ... FOR.
Collation/version detection `lib/MySQL_Variables.cpp`	Detect MariaDB via substring and safely parse server major version to decide support for collations with id >=255.
Galera query selection `lib/MySQL_Monitor.cpp`	Make Galera healthcheck query selection version-aware and MariaDB-specific across sync/async paths.

TAP test framework & test-suite updates

Layer / File(s)	Summary
TAP framework changes `test/tap/tap/tap.h`, `test/tap/tap/tap.cpp`	Remove `skip_all()` API; instruct callers to use `BAIL_OUT()` for loud failures.
Test routing & Makefile cleanup `test/tap/groups/groups.json`, `test/tap/tests/Makefile`, `test/deps/cluster_simulator/Makefile`	Register pgsql-test_dns_cache, add new regression mappings, remove deleted TAP binary, and switch build gates from PROXYSQLGENAI → PROXYSQL40.
New/updated tests `test/tap/tests/pgsql-test_dns_cache-t.cpp`, `test/tap/tests/reg_test_5639_`, `test/tap/tests/reg_test_5790_`, many unit tests doc lines	Add comprehensive PgSQL DNS cache TAP test, new regression tests for COM_STMT_EXECUTE and MariaDB collation 255, and update unit test headers to reference PROXYSQL40 where appropriate.
Tokenizer fixture tweaks `test/tap/tests/tokenizer_payloads/pgsql_regular_tokenizer_digests.hjson`	Adjust expected digest formatting to include spaces around certain operators.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

genai: formally delete or document the kept PROXYSQLGENAI build flag (Q9 follow-up) #5728 — same PROXYSQLGENAI → PROXYSQL40 consolidation and docs updates.
MySQL 9.x: charset_unsigned_int-t — SET NAMES … COLLATE utf8mb4_croatian_ci ignored, connection stays on default #5634 — related to collation/version detection and charset handling fixes.
CI: Add package install/start/stop/restart/uninstall/reinstall tests to ARM64 (and x86_64) package workflows #5665 — aligns with adding package-install verification in CI.

Possibly related PRs

sysown/proxysql#5803 — related CI checks-action change.
sysown/proxysql#5805 — related ParserSQL bump and multi-value SET parsing.
sysown/proxysql#5808 — overlapping large-packet double-free fix and regression test.

Suggested labels

Packages, Compiling

🐰 A rhyming rabbit's cheer:
Plugins converge, flags set right,
DNS cached, resolvers light,
Tests assert and builds now try,
Bugs are chased, and sleep is nigh,
Hoppity hop — code runs bright!

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/kill-proxysqlgenai-build-flag

gemini-code-assist

Code Review

This pull request refactors the DNS cache infrastructure into a shared class, enabling independent DNS caching for both MySQL and PostgreSQL monitors. It introduces several critical fixes for MySQL 8.4+ and MariaDB compatibility, specifically regarding Galera healthcheck queries, collation ID handling, and whitespace-tolerant SET STATEMENT parsing. Additionally, it resolves a double-free vulnerability in large packet handling and updates the test framework to favor explicit failures over silent skips. Review feedback identifies opportunities to improve the accuracy of build-time error logging, ensure robust return-value checking for network address conversions, and refine the staggering logic used in the resolver threads.

gemini-code-assist · 2026-05-23T18:56:54Z

+                if nm -D "${plugin_path}" 2>/dev/null | grep -q 'proxysql_plugin_descriptor_v1'; then
+                    echo "  OK   ${plugin} (exports proxysql_plugin_descriptor_v1)"
+                else
+                    echo "  WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2


The log message uses WARN but the logic sets ALL_OK=1, which causes the build to fail later at line 201. This should be labeled as FAIL to accurately reflect that it is a fatal error for the packaging process.

Suggested change

echo " WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

echo " FAIL ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

gemini-code-assist · 2026-05-23T18:56:54Z

+                if nm -D "${plugin}" 2>/dev/null | grep -q 'proxysql_plugin_descriptor_v1'; then
+                    echo "  OK   ${plugin} (exports proxysql_plugin_descriptor_v1)"
+                else
+                    echo "  WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2


The log message uses WARN but the logic sets ALL_OK=1, which causes the build to fail later at line 179. This should be labeled as FAIL to accurately reflect that it is a fatal error for the packaging process.

Suggested change

echo " WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

echo " FAIL ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

gemini-code-assist · 2026-05-23T18:56:54Z

+                if nm -D "${plugin}" 2>/dev/null | grep -q 'proxysql_plugin_descriptor_v1'; then
+                    echo "  OK   ${plugin} (exports proxysql_plugin_descriptor_v1)"
+                else
+                    echo "  WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2


The log message uses WARN but the logic sets ALL_OK=1, which causes the build to fail later at line 173. This should be labeled as FAIL to accurately reflect that it is a fatal error for the packaging process.

Suggested change

echo " WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

echo " FAIL ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

gemini-code-assist · 2026-05-23T18:56:54Z

+			if (p->ai_family == AF_INET) {
+				struct sockaddr_in* ipv4 = (struct sockaddr_in*)p->ai_addr;
+				inet_ntop(p->ai_addr->sa_family, &ipv4->sin_addr, ip_addr, INET_ADDRSTRLEN);
+				ips.push_back(ip_addr);
+			}
+			else {
+				struct sockaddr_in6* ipv6 = (struct sockaddr_in6*)p->ai_addr;
+				inet_ntop(p->ai_addr->sa_family, &ipv6->sin6_addr, ip_addr, INET6_ADDRSTRLEN);
+				ips.push_back(ip_addr);
+			}


The return value of inet_ntop is not checked. If it fails, an uninitialized or stale ip_addr buffer will be pushed into the ips vector. Additionally, the else block should explicitly check for AF_INET6 and use sizeof(ip_addr) for robustness.

for (auto p = res; p != NULL; p = p->ai_next) { if (p->ai_family == AF_INET) { struct sockaddr_in* ipv4 = (struct sockaddr_in*)p->ai_addr; if (inet_ntop(p->ai_family, &ipv4->sin_addr, ip_addr, sizeof(ip_addr))) ips.push_back(ip_addr); } else if (p->ai_family == AF_INET6) { struct sockaddr_in6* ipv6 = (struct sockaddr_in6*)p->ai_addr; if (inet_ntop(p->ai_family, &ipv6->sin6_addr, ip_addr, sizeof(ip_addr))) ips.push_back(ip_addr); } }

gemini-code-assist · 2026-05-23T18:56:54Z

+	rc = pthread_rwlock_unlock(&rwlock_);
+
+	if (item_removed && counter_record_updated_)
+		counter_record_updated_->fetch_add(1, std::memory_order_relaxed);
+
+	assert(rc == 0);


The assertion for the return code of pthread_rwlock_unlock should be placed immediately after the call to ensure it refers to the correct operation.

Suggested change

rc = pthread_rwlock_unlock(&rwlock_);

if (item_removed && counter_record_updated_)

counter_record_updated_->fetch_add(1, std::memory_order_relaxed);

assert(rc == 0);

rc = pthread_rwlock_unlock(&rwlock_);

assert(rc == 0);

if (item_removed && counter_record_updated_)

counter_record_updated_->fetch_add(1, std::memory_order_relaxed);

gemini-code-assist · 2026-05-23T18:56:54Z

+					if (delay_us > 1000000 || delay_us <= 0)
+						delay_us = 10000;


The logic for capping delay_us is inconsistent. If delay_us exceeds 1 second, it is reset to 10ms, which causes a massive drop in the intended staggering delay. It should be capped at 1 second instead.

if (delay_us > 1000000) delay_us = 1000000; else if (delay_us <= 0) delay_us = 10000;

renecannao · 2026-05-23T18:58:00Z

Wrong base — will recreate from clean v3.0

Add verify-package-install.bash that installs the built .deb/.rpm on a clean Docker image of the target distro and runs smoke tests: - proxysql --version catches missing runtime shared library deps - Plugin .so file presence at /usr/lib/proxysql/ (auto-detected) - ELF header validation on installed binaries The script auto-detects whether the package ships plugins by inspecting dpkg -c / rpm -qpl, so it works for both v3.0 (no plugins) and v4.0 genai/mysqlx packages. Wire it into all 28 CI-package-arm64-* workflows (14 genai + 14 base) between the Build and Upload steps. If verification fails, the build job fails and the package never reaches the release draft.

renecannao · 2026-05-24T08:33:51Z

Superseded by #5815 — closing, all changes on fix/kill-proxysqlgenai-build-flag-v2

sonarqubecloud · 2026-05-24T08:45:16Z

❌ The last analysis has failed.

See analysis details on SonarQube Cloud

renecannao and others added 30 commits May 17, 2026 14:34

Merge branch 'v3.0' into fix/5708-followup-multiline-for

3689fed

Merge branch 'v3.0' into fix/ci-asan-zombie-checks

482204d

Merge pull request #5795 from sysown/fix/remove-skip-all

7e984d6

test(framework): remove skip_all() — fail loudly on missing preconditions (Stage 2 follow-up to PR #5794)

Merge branch 'v3.0' into fix/5708-followup-multiline-for

9f9a75f

fixup: sort reg_test_5639_... entry alphabetically in groups.json

d8c32f1

CI-lint-groups-json requires keys in groups.json to be sorted.

Merge pull request #5803 from sysown/fix/ci-asan-zombie-checks

8a92a62

fix(ci): prevent zombie IN_PROGRESS checks in CI-unit-tests-asan-coverage

Merge pull request #5794 from sysown/fix/5708-followup-multiline-for

2ffadaf

fix(session): SET STATEMENT ... FOR detection tolerates any whitespace (PR #5708 follow-up)

Merge pull request #5805 from sysown/fix/parsersql-1.0.3-pg-set-fixes

642b232

fix: pgsql digest fixtures (typecast spacing) + ParserSQL 1.0.3 bump (PG SET TIME ZONE + multi-value lists)

Merge pull request #5806 from sysown/pgsql_dns_cache

fadffdb

Independent DNS cache for PgSQL (fixes #5768)

Merge pull request #5807 from sysown/fix/5790-mariadb-collation-255

af4aa17

fix(core): validate_charset must replace collation 255 on MariaDB backends (#5790)

Merge pull request #5808 from sysown/issue_5639

1d52499

fix(session): avoid double-free of stmt_meta pkt in LargePacket (#5639)

renecannao added 4 commits May 23, 2026 09:13

gemini-code-assist Bot reviewed May 23, 2026

View reviewed changes

renecannao closed this May 23, 2026

renecannao deleted the fix/kill-proxysqlgenai-build-flag branch May 23, 2026 18:58

renecannao reopened this May 24, 2026

renecannao closed this May 24, 2026

	echo " WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2
	echo " FAIL ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

Uh oh!

Conversation

renecannao commented May 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changed

Entrypoints (rhel-compliant, suse-compliant, deb-compliant)

Automated package verification (added to all 3 entrypoints)

Build/test infrastructure

Documentation cleanup

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

renecannao commented May 23, 2026

Uh oh!

renecannao commented May 24, 2026

Uh oh!

sonarqubecloud Bot commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

renecannao commented May 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 23, 2026 •

edited

Loading