Skip to content

build: kill PROXYSQLGENAI flag, PROXYSQL40=1 builds all plugins#5814

Closed
renecannao wants to merge 35 commits into
v3.0from
fix/kill-proxysqlgenai-build-flag
Closed

build: kill PROXYSQLGENAI flag, PROXYSQL40=1 builds all plugins#5814
renecannao wants to merge 35 commits into
v3.0from
fix/kill-proxysqlgenai-build-flag

Conversation

@renecannao

@renecannao renecannao commented May 23, 2026

Copy link
Copy Markdown
Contributor

PROXYSQLGENAI was a leftover from before the genai plugin carve-out. All genai/MCP/AI/RAG code now lives in plugins/genai/ as a runtime-loadable .so. Since the plugin chassis (PROXYSQL40=1) already builds and loads the genai .so, a separate flag serves no purpose.

What changed

Entrypoints (rhel-compliant, suse-compliant, deb-compliant)

  • Build: pass PROXYSQL40=1 (not PROXYSQLGENAI=1) to make. All three now use the same EXTRA pattern.
  • Packaging: copy genai .so alongside mysqlx when PROXYSQL40=1 — no separate PROXYSQLGENAI gate.
  • protobuf-devel install: gate on PROXYSQL40=1 (needed for mysqlx plugin build).

Automated package verification (added to all 3 entrypoints)

  1. Package content check: rpm -qpl / dpkg -c confirms both .so files are present. Fails build if missing.
  2. ELF smoke test: extracts .so files, verifies they're valid ELF 64-bit shared objects, checks they export proxysql_plugin_descriptor_v1. Fails build on mismatch.

Build/test infrastructure

  • test/tap/tests/Makefile: changed 2 vec.o link gates from PROXYSQLGENAI to PROXYSQL40.
  • test/deps/cluster_simulator/Makefile: same change.
  • Updated all references in 14 genai test file comments.

Documentation cleanup

CLAUDE.md, include/cpp.h, lib/proxy_sqlite3_symbols.cpp, src/main.cpp, proxysql.spec, and test infra scripts updated.

Verification

  • PROXYSQL40=1 make packages now builds and packages both mysqlx and genai .so files.
  • PROXYSQLGENAI env var is no longer consulted anywhere in the build or packaging pipeline.

Summary by CodeRabbit

  • New Features

    • PostgreSQL DNS cache with background resolver workers; multi-value SET parsing for PostgreSQL.
  • Bug Fixes

    • Fixed large-packet double-free; corrected transaction isolation/read-only variable handling; improved MariaDB collation detection; enhanced SET STATEMENT whitespace tolerance.
  • Documentation

    • Consolidated v4.0 plugin packaging semantics; added plugin integrity validation in packaging flows.
  • Tests / CI

    • Added PgSQL DNS-cache and regression tests; new stmt-execute and collation tests; CI package install verification and louder test failure behavior.

Review Change Stack

renecannao and others added 30 commits May 17, 2026 14:34
PR #5708 added detection for MariaDB's `SET STATEMENT … FOR …` syntax
to keep ProxySQL from invoking `unable_to_parse_set_statement()` and
locking the session to a single hostgroup. The detection at
`lib/MySQL_Session.cpp` was:

    if (strncasecmp(nq.c_str(), "SET STATEMENT ", 14) == 0
        && strcasestr(nq.c_str(),  " FOR ")) {
        return false;
    }

`strcasestr(nq, " FOR ")` is a literal-substring search requiring a
space BOTH before and after the FOR keyword. A customer report (see
PR #5708 thread) reproduced the same error 9006 as before the fix by
typing the statement across multiple lines at the mysql client
prompt — the character after FOR is then a newline, not a space, so
the detection misses, the code falls through to the regular SET
parser, the parser doesn't know SET STATEMENT, and the hostgroup is
locked to the user's default. The next non-default-hostgroup query
fails with 9006.

Verified with a standalone strcasestr probe:

  q1 (customer multi-line):  strcasestr(" FOR ") matches ? NO
  q2 (existing TAP single-line):                            YES

The fix uses `dig` (the digest text) for the detection instead of
`nq` (the raw query). `dig` is whitespace-normalised by the digest
builder, so " FOR " always has a space on both sides regardless of
how the client formatted the query. This is consistent with the
prefix check 35 lines above, which already uses `dig` for keyword
detection.

== Test coverage ==

`test/tap/tests/test_set_statement_for-t.cpp` previously had `plan(5)`
and self-skipped on non-MariaDB backends via `skip_all()`. It was
also registered ONLY in MySQL-family TAP groups
(`legacy-g2`, `mysql84-g2`, etc.). The net effect was that the test
self-skipped on every CI run since it was added — never actually
executing an assertion in CI. The customer's bug regressed entirely
unnoticed.

This commit:

1. Removes the runtime MariaDB-detection helper and the `skip_all`
   call. A misregistered test should fail loudly, not silently skip;
   correct registration is enforced through the groups.json layer
   instead.
2. Moves the groups.json registration from the MySQL-only groups to
   `mariadb10-galera-g2`, which actually provides a MariaDB backend.
3. Adds four new test cases (assertions 6–13) covering the customer
   whitespace shapes that strcasestr (" FOR ") missed: newline,
   tab, CRLF, and multiple spaces between FOR and the inner
   statement. Each case verifies (a) the SET STATEMENT succeeds, and
   (b) a follow-up plain SELECT succeeds (i.e. no hostgroup lock
   persists — the customer's actual failure mode).

Verified locally on TAP group mariadb10-galera-g2 against MariaDB
10.11.9: 13/13 ok, RC=0.

A separate framework-cleanup follow-up should remove `skip_all()`
from `test/tap/tap/tap.{h,cpp}` entirely and convert the other four
callers (`test_mysqlx_e2e_*`, `test_mysqlx_route_drop_inflight-t`,
`test_mysqlx_sigterm_inflight-t`) to either fail loudly or move to
appropriately-gated TAP groups.
…ions

Stage 2 follow-up to the PR #5708 reproduction debug. The framework
helper test/tap/tap/tap.{h,cpp}::skip_all() prints "1..0 # skip ..."
and exit(0), which the harness reports as "ignored" — zero assertions
emitted, zero failures recorded, no notice that a test never ran.

While investigating the customer regression on PR #5708 we discovered
that test_set_statement_for-t had been silently skipping in CI since
the day it was added: it self-skipped on non-MariaDB backends, but was
registered only in MySQL-family TAP groups, so every run skipped.
PR #5794 fixes that one test. The principled fix is to remove the
mechanism that lets a test silently disappear.

This commit converts the remaining four skip_all() callers and deletes
the framework function:

  - test_mysqlx_e2e_routing-t       BAIL_OUT if MYSQLX_E2E_PROXYSQL_PORT
                                    is not set. The group env file
                                    test/tap/groups/mysqlx-e2e/env.sh
                                    sets it; missing at runtime means
                                    the harness did not source it.
  - test_mysqlx_e2e_handshake-t     BAIL_OUT if MYSQLX_E2E_HOST is not
                                    set (same env file, same rationale).
  - test_mysqlx_route_drop_inflight-t  BAIL_OUT if the X-Protocol
                                    listener is not reachable. The
                                    mysqlx-soak-g1 harness is supposed
                                    to start ProxySQL with the mysqlx
                                    plugin and provision the route; a
                                    missing listener is a setup bug.
  - test_mysqlx_sigterm_inflight-t  Deleted. The binary was a stub
                                    whose body was a TODO; the real
                                    sigterm scenario is implemented in
                                    test/scripts/mysqlx/behavioral_validation.py
                                    --scenario sigterm. With skip_all
                                    gone the stub had no remaining
                                    purpose. Also dropped from
                                    groups.json and the Makefile.

After this commit, the test/tap codebase has zero call sites of
skip_all(). The framework declaration and definition are deleted from
tap.h and tap.cpp; the doc references at the bottom of tap.cpp are
updated; a comment explains the rationale and points at BAIL_OUT() as
the replacement.

Verified locally: full TAP debug build succeeds, no undefined-reference
errors, lint_groups_json + check_groups --source both green.

Depends on PR #5794 (which removed the fifth caller in
test_set_statement_for-t.cpp). This PR is based on that branch; merging
in order keeps each merge a clean compile.
…verage is cancelled

The previous pattern opened a LouisBrunner check_run with status=in_progress
at job start and closed it via check_id at job end with if: always(). When
the runner is killed mid-job (concurrency:cancel-in-progress, OOM, infra
shutdown), the closing step never executes and the check_run is left in
in_progress forever, blocking the PR rollup.

Collapse to a single LouisBrunner call at the end: pass name: instead of
check_id: so the action creates the check_run with a terminal conclusion
in one shot. If the job is cancelled before the closing step runs, no
check_run is ever created (PR rollup shows "missing" — recoverable via
re-run) instead of "stuck" — unrecoverable without GitHub App auth.

Refs: stuck check_run 77381366265 on PR #5752 (sha cf0b1d3) from
2026-05-22 12:39 UTC. Same pattern exists across ~163 other CI-* workflows;
fixing only the workflow with the observed incident here. Sweep of the
remaining files belongs in a separate PR.
test(framework): remove skip_all() — fail loudly on missing preconditions (Stage 2 follow-up to PR #5794)
…value list

ParserSQL v1.0.2's PG SET parser had two bugs that were hidden in CI
until PR #5760 fixed `test/infra/control/ensure-infras.bash` to actually
dispatch `pre-proxysql.sql` hooks (previously only `.bash` hooks ran).
With the harness fix in place, `set_parser_algorithm_3-g1` for the first
time ran its tests under `pgsql-set_parser_algorithm=3`, surfacing:

* `pgsql-set_parameter_validation_test-t` — `SET search_path TO "$user",
  public` parsed as `search_path = "$user"`, dropping every value after
  the first comma.
* `pgsql-set_statement_test-t` — `SET TIME ZONE 'UTC'` parsed as
  `time = ZONE`, dropping the actual timezone value entirely.

## Upstream fix

ParserSQL v1.0.3 (ProxySQL/ParserSQL PR #38) is the upstream parser fix:

* PG `SET TIME ZONE <value>` is now recognised as the PG alias for
  `SET TimeZone = <value>`. The two-word keyword is detected via
  case-insensitive identifier lookahead since the tokenizer has no
  dedicated TK_TIME / TK_ZONE keyword.
* PG `parse()` no longer shares MySQL's comma loop. In PG, the comma
  after the first value is value-list continuation — each subsequent
  expression is appended as another child of the same NODE_VAR_ASSIGNMENT
  node, alongside the first RHS.

Full upstream test suite: 1223/1223 pass; 0 regressions.

## ProxySQL-side changes

* `deps/parsersql/parsersql-1.0.3.tar.gz` — new tarball, replaces 1.0.2.
* `deps/parsersql/parsersql` symlink retargeted to `parsersql-1.0.3`.
* `lib/Query_Processor_ParserSQL.cpp::walk_set_stmt` — the PG multi-value
  fix means a single NODE_VAR_ASSIGNMENT can have multiple RHS siblings.
  The adapter now iterates `target->next_sibling` chain instead of
  reading a single sibling, so the `map<string, vector<string>>` for
  `SET search_path TO 'a', 'b', 'c'` carries `["a", "b", "c"]` rather
  than `["a"]`. Fully backward-compatible for MySQL (always one RHS).
* `test/tap/tests/setparser_parsersql_test.cpp` — adds four permanent
  regression-test groups exercising the library + adapter end-to-end
  without a live backend:
    * `mysql_filtered_set` (7 cases) — verifies the filtered-SET
      variables parse cleanly so the downstream filter logic gets the
      input it expects.
    * `mysql_set_testing` (4 cases) — multi-variable comma-separated
      SET samples from `set_testing-240.csv`.
    * `pgsql_search_path` (8 cases) — single-value and multi-value
      shapes from `pgsql-set_parameter_validation_test-t`.
    * `pgsql_time_zone` (4 cases) — every `SET TIME ZONE ...` shape
      exercised by `pgsql-set_statement_test-t`.

The 4 new assertions for the `INTERVAL '7' HOUR` case capture today's
actual ParserSQL output (`timezone = [INTERVAL]`) — the expression
parser does not yet consume the full interval modifier chain. The test
reflects observable behaviour so it stays green; tightening to capture
the full interval value is a future ExpressionParser improvement.

## Test plan

- [x] Local rebuild of ProxySQL from the new 1.0.3 tarball: clean.
- [x] `./setparser_parsersql_test`: 46 new assertions, all pass.
- [x] No regressions in the 256 pre-existing assertions. The same 10
      pre-existing failures (sql_mode 23-25, Set1_v2 0-1) — unrelated
      to this work, present before the upgrade.
- [ ] CI on this branch: TAP groups `legacy-g2`, `mysql84-g2`, and
      `set_parser_algorithm_3-g1` are expected to flip green for the
      ParserSQL-related failures. Two non-ParserSQL bugs on the
      algorithm=3 ProxySQL session-dispatch path
      (`test_filtered_set_statements-t`, `set_testing-t`) remain to be
      investigated separately — the adapter now returns the correct
      maps, but downstream dispatch still misbehaves.

## Dependency

Should not be merged before ProxySQL/ParserSQL PR #38 is merged and a
v1.0.3 tag is cut. The vendored tarball in this commit was built from
that PR's branch head; once the upstream tag exists the tarball SHA
should be identical to a fresh `git archive --prefix=ParserSQL-1.0.3/
v1.0.3 | gzip` from the upstream tag.
…kends (#5790)

Commit 7c6f42b ("fix: MySQL 9.x charset handling and log_last_insert_id
test race") changed the version guard in lib/MySQL_Variables.cpp from
testing the first character of server_version to using atoi():

    -    if (charset >= 255 && myconn->mysql->server_version[0] != '8') {
    +    if (charset >= 255 && atoi(myconn->mysql->server_version) < 8) {

The new form correctly extends the "supports collation id 255"
treatment to MySQL 9.x and later, but it silently broke MariaDB:

    server_version                  old check    new check
    ----------------------------    ---------    ---------
    "5.7.x"                         replace      replace
    "8.0.x"                         skip         skip
    "9.x.y"                         replace[*]   skip
    "10.11.13-MariaDB"              replace      skip   <-- BUG
    "11.x.x-MariaDB"                replace      skip   <-- BUG
    [*] wrong for MySQL 9.x; that is the case the original commit fixed.

MariaDB does not implement utf8mb4_0900_ai_ci (collation id 255),
regardless of its numeric major version. With the regression, ProxySQL
no longer rewrites the unsupported collation before sending SET NAMES
to the MariaDB backend, and the backend rejects it with:

    ERROR 1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_ci'

…breaking clients such as Dovecot that pick collation 255 from libmariadb's
charset table at connect time.

Fix: also treat MariaDB (detected by the "MariaDB" suffix in
server_version) as a backend that cannot accept collation id >= 255,
so handle_unknown_charset replacement applies there as well.

The regression is present on v3.0 HEAD (and on the v3.1.8 / v4.0.8
tags), but NOT in the v3.0.7 / v3.0.8 release tags themselves; users
hitting this in production are on post-v3.0.8 "v3.0-head" CI packages
(`3.0.8-NN-gHASH`).

Reported-by: @FoZo (#5790)
Also-confirmed-by: @akrus

Adds reg_test_5790-mariadb_collation_255-t, registered in
mariadb10-galera-g1, which:

  1. Skips when the backend is not MariaDB.
  2. Sets mysql-handle_unknown_charset=1 and
     mysql-default_collation_connection=utf8mb4_general_ci.
  3. Issues `SET NAMES utf8mb4 COLLATE utf8mb4_0900_ai_ci`.
  4. Runs a SELECT that forces backend SET NAMES via
     handler_again___status_CHANGING_CHARSET.
  5. Verifies the SELECT succeeds and that the backend's
     @@collation_connection is the configured default, not
     utf8mb4_0900_ai_ci.

This test fails without the fix (errno 1273 from the backend) and passes
with it.
…ehavior

PR #5755 (commit e6bef5c) fixed `process_pg_typecast()` in
`lib/pgsql_tokenizer.cpp` so that the trailing space after a PG
typecast is preserved unless the next non-space character is a
modifier `(` or array bracket `[`.  The commit message documented
this as an intentional behaviour change:

    "The new lookahead-conditional skip preserves the trailing
     space unless the very next non-space character is a modifier
     `(` or array bracket `[`."

Three test cases in `tokenizer_payloads/pgsql_regular_tokenizer_digests.hjson`
hardcoded expected digests against the OLD (buggy) tokenizer output,
where the space-eating bug glued the preceding literal to the
following operator.  With the fix in place those expectations
mismatch, making `pgsql-query_digests_stages_test-t` fail
deterministically in the legacy-g4 TAP group on v3.0 HEAD:

  * `::json  ->  'key'`  STAGE 1: actual `select ? -> ?`   vs old expected `select ?-> ?`
  * `::jsonb @>  ...`    STAGE 1-4: actual `select ? @> ?` vs old expected `select ?@> ?`
  * `::money +  ...`     STAGE 1: actual `select ? + ?`    vs old expected `select ?+ ?`

For `->` and `+`, only STAGE 1 changes — STAGE 2-4's existing
whitespace-collapse step still glues `?->` and `?+` for those
operators, so `s2`/`s3`/`s4` expectations stay as-is.  For `@>`,
none of the stages collapse the space, so all four expectations
move.

The new actual digest output is the more correct one — it mirrors
the existing MySQL tokenizer's behaviour for the same shape, where
`'literal' + 'literal'` already produces `? + ?` at STAGE 1.  This
commit only updates the fixture; no tokenizer or test source
changes.

Verified locally against the existing test binary (built from
v3.0 HEAD, includes the new tokenizer behaviour):

    ./pgsql-query_digests_stages_test-t  →  734/734 ok, RC=0

The 9 previously-failing assertions (#381, #385, #389-#392, #592,
#596, #600) all flip to ok with the updated fixture.
When handler_WCD_SS_MCQ_qpo_LargePacket() is reached via the
COM_STMT_EXECUTE path, MySQL_Protocol::get_binds_from_pkt() has
already aliased stmt_meta->pkt to pkt->ptr (see the FIXME at
MySQL_Protocol.cpp:2873). The subsequent
RequestEnd() -> Query_Info::end() free()s stmt_meta->pkt (the
fix introduced for bug #796 at MySQL_Session.cpp:455), and then
LargePacket's own l_free(pkt->size, pkt->ptr) frees the same
buffer a second time.

With jemalloc this silently corrupts the arena and surfaces later
as the SIGSEGV reported in #5639, with the crashing frame landing
on the l_free call site (+0x10e into LargePacket). Under ASAN the
double-free aborts deterministically with a backtrace whose two
free() chains match the report exactly.

Fix: capture whether CurrentQuery.stmt_meta aliases pkt before
calling RequestEnd() (which clears stmt_meta), and skip the
second free when the alias was present. For the COM_QUERY path
through LargePacket, stmt_meta is always NULL, so the previous
behaviour is preserved there.

Also add reg_test_5639_stmt_execute_max_allowed_packet-t which
exercises the STMT_EXECUTE > mysql-max_allowed_packet path and
asserts the session and process survive.
DNS_Cache, DNS_Cache_Record, DNS_Resolve_Data, the resolver worker, and
the validate_ip / get_connected_peer_ip_from_socket / debug_iplisttostring
helpers used to live inside MySQL_Monitor.{hpp,cpp}.  Move them to a new
DNS_Cache.{hpp,cpp} so the same plumbing can back an independent
PgSQL_Monitor cache.

DNS_Cache no longer references GloMyMon directly; counters are wired in
per-instance via set_counters().  monitor_dns_resolver_thread no longer
reads mysql_thread___resolution_family; the desired ai_family is now a
field on DNS_Resolve_Data set by the caller.  MySQL_Monitor wires both up
in its constructor and resolver loop respectively; behavior is identical.

No change to the MySQL DNS cache semantics.
Fixes the watchdog asserts under DNS degradation reported in #5768.

Symptom: with a PgSQL backend addressed by hostname, an unhealthy resolver
made every PQconnectStart() block ~5s on getaddrinfo inside libpq.  Under
load, all PgSQL_Thread workers spent the bulk of each loop iteration in
that synchronous DNS call, atomic_curtime fell behind the watchdog
threshold (poll_timeout + 1s), and after restart_on_missing_heartbeats
misses the watchdog asserts.  MySQL avoids this because mysql-monitor's
DNS cache lets MySQL_Connection::connect_start pass an IP directly to
libmariadb; PgSQL had no equivalent.

This change adds the equivalent for PgSQL, with state fully independent
of the MySQL cache:
  * PgSQL_Monitor gains its own DNS_Cache, force_dns_cache_update flag,
    and dns_cache_{queried,lookup_success,record_updated} counters.
    PgSQL_Monitor::dns_lookup / update_dns_cache_from_pgsql_conn /
    trigger_dns_cache_update mirror the MySQL_Monitor surface.
  * A new background thread (PgSQL_monitor_dns_cache_pthread) drives
    PgSQL_Monitor::monitor_dns_cache(), which walks
    PgHGM->pgsql_servers_to_monitor and feeds the shared
    DNSResolverWorker pool.  Launched alongside PgSQL_monitor_scheduler
    and lives independently of pgsql-monitor_enabled, matching MySQL.
  * PgSQL_Connection::connect_start now consults the cache.  On a hit
    it appends `hostaddr=<ip>` to the libpq conninfo while keeping
    `host=<hostname>`; libpq documents this combo specifically to skip
    name resolution while preserving TLS hostname verification.  On a
    miss / IP literal / disabled cache we fall back to the previous
    behavior (libpq does getaddrinfo).
  * Successful PgSQL connects seed the cache via
    update_dns_cache_from_pgsql_conn, so the second connect avoids
    getaddrinfo even before the resolver loop has visited the host.
  * The pgsql-monitor_local_dns_cache_ttl / _refresh_interval /
    _resolver_queue_maxsize variables (already accepted by Admin via
    PgSQL_Thread's variable registry) now actually propagate to
    pgsql_thread___monitor_local_dns_* globals.  The propagation block
    in PgSQL_Thread::refresh_variables was previously commented out,
    so the values never reached anywhere.
  * The runtime stats output (SQL3_GlobalStatus) now exposes the
    PgSQL_Monitor_dns_cache_* counters alongside the existing
    PgSQL_Monitor_* counters.

DNSResolverWorker is shared infrastructure added to DNS_Cache.hpp/cpp,
used here by PgSQL_Monitor.  MySQL_Monitor still uses its existing
ConsumerThread-based pool; nothing about the MySQL cache changes.
The names were defined in PgSQL_Thread::variables but sat inside a
commented-out block in pgsql_thread_variables_names, so Admin's
has_variable() returned false and SET pgsql-monitor_local_dns_* was
rejected with "Unknown global variable".  Move them out of the
/* ... */ so the variables actually show up in global_variables and
runtime_global_variables and can be tuned at runtime.

Verified end-to-end on a smoke run: SET ... LOAD PGSQL VARIABLES TO
RUNTIME succeeds, the resolver loop populates the cache, and a client
connect through the proxy increments dns_cache_queried /
dns_cache_lookup_success (cache hit, libpq skipped getaddrinfo).
Adds pgsql-test_dns_cache-t mirroring the existing MySQL test_dns_cache
test, exercising 16 scenarios:

  * pgsql-monitor_local_dns_cache_* variables propagate to runtime
  * IP literal does not touch the cache
  * Resolvable hostname populates dns_cache_record_updated
  * Client connect through the proxy increments dns_cache_queried and
    dns_cache_lookup_success on a cache hit
  * Hostname with leading/trailing whitespace is trimmed before
    resolution
  * Unresolvable hostname is queried but produces no record (cache miss
    bumps dns_cache_queried only)
  * Removing pgsql_servers rows drops the orphaned cache record
  * Cache disabled via refresh_interval=0 flatlines all counters
  * MySQL DNS cache counters do not move in response to pgsql-side
    activity (independence between the two caches)

While writing the test, step 4 (whitespace trim) caught a real bug:
PgSQL_Monitor::monitor_dns_cache was inserting hostnames verbatim from
the SQLite3_result into the resolver queue, so ' example.org ' was
handed to getaddrinfo() unchanged and failed.  MySQL_Monitor sidesteps
this by reading via SELECT trim(hostname) FROM monitor_internal.*; the
pgsql variant reads directly from pgsql_servers_to_monitor so it needs
to call trim() in C++.  Fix: trim before validate_ip / insert into
the hostnames set.  The lookup path already trimmed, so the bookkeeper
key now matches.

Registered the test in groups.json under legacy-g4 and the mysql-*
variant g4 groups, matching how the other pgsql-* tests are grouped.
Fixes for actionable findings on PR #5806:

DNS_Cache.cpp / hpp:
- lookup(): initialize *ip_count = 0 on the disabled-cache fast path so
  callers don't observe a stale count from a previous lookup
  (CodeRabbit).
- add_if_not_exist(): only bump counter_record_updated_ when an insert
  actually happened, and return that signal to callers.  Previously a
  no-op call inflated the cache-update stats (CodeRabbit).
- get_connected_peer_ip_from_socket(): drop the malloc, switch to a
  stack sockaddr_storage, and only assign result on AF_INET / AF_INET6
  with a successful inet_ntop.  Previously, on an unknown address
  family, we'd copy the uninitialized ip_addr buffer into result
  (Gemini + CodeRabbit).
- monitor_dns_resolver_thread(): cache_ttl is signed int and at the max
  configured TTL (7*24*3600*1000 ms) `1000 * cache_ttl` overflows
  before being added to monotonic_time().  Force the multiply into
  unsigned long long space with 1000ULL (Gemini).
- DNS_Cache::IP_ADDR::counter is now mutable so get_next_ip() (const)
  can __sync_fetch_and_add it without a const_cast (Gemini).

PgSQL_Monitor.cpp:
- monitor_dns_cache(): refresh pgsql_thread___monitor_local_dns_*
  before the loop, not just on the first version-bump.  Otherwise the
  resolver runs its first pass against zero-initialized TTL / refresh
  interval and starts in the wrong enabled/disabled state until a
  later config bump (CodeRabbit).

test/tap/tests/pgsql-test_dns_cache-t.cpp:
- Step 7 ("Cache disabled by refresh_interval=0") was inserting an IP
  literal (0.0.0.0), which bypasses DNS regardless of the cache state.
  Switched to a resolvable hostname (example.com) so a regression in
  the cache-disable path would actually move counters and fail the
  assertion.  Added a third assertion on dns_cache_lookup_success for
  symmetry.  Plan adjusted to 17.

Deliberately skipped:
- CR comment about mysql_thread___resolution_family change requiring
  a cache flush: pre-existing MySQL behavior, out of scope for this PR.
- CR comment about pthread_attr_init / setstacksize return checks in
  main.cpp: matches the existing un-checked pattern in
  MySQL_Monitor::run().  No need to diverge here.
- Gemini comment about persistent thread pool vs. per-iteration spawn:
  MySQL_Monitor::monitor_dns_cache() also recreates the resolver
  threads each loop iteration; the PgSQL variant follows the same
  pattern.
- Gemini comment about 2048-byte stack size: Thread::start takes ss
  in KB (lib/thread.cpp:46 `pthread_attr_setstacksize(&attr, ss*1024)`),
  so 2048 here means 2 MB, matching the MySQL resolver pool.
Address @gemini-code-assist review on PR #5807: the previous patch
unconditionally evaluated `atoi(sv)` in the disjunction
`(is_mariadb || atoi(sv) < 8)`. If `sv` was NULL, the short-circuit on
`is_mariadb` (false) would still reach `atoi(NULL)`, which is undefined
behavior per the C standard.

In practice `myconn->mysql->server_version` is non-NULL by the time
validate_charset() runs (CHANGING_CHARSET is only reachable after the
backend handshake has populated server_version), so this is defensive,
not a real crash path. Still essentially free, and keeps the guard
consistent with the existing NULL check on `strstr(sv, "MariaDB")`.

No behavior change for any reachable input.
The same mental defect fixed in lib/MySQL_Variables.cpp:validate_charset()
(commit before this one) exists at four other call sites in core. All
four use a first-character or 3-byte string-prefix comparison against
server_version and consequently misclassify either MariaDB or MySQL 9.x
(or both). Two are latent (MariaDB happens to land on the still-working
branch); the two in MySQL_Monitor.cpp are an active break on MySQL 9.x
Galera/PXC, because `INFORMATION_SCHEMA.GLOBAL_STATUS` was removed in
MySQL 8.4 and MySQL 9.x falls into the else branch that targets it.

Audited sites and fixes:

  lib/MySQL_Session.cpp:2607-2615 -- tx_isolation / tx_read_only
    Pre-fix: `strncmp(sv,"8",1)==0` chose `transaction_isolation` /
    `transaction_read_only`; everything else got `tx_isolation` /
    `tx_read_only`. MySQL 9.x fell to the else branch, but `tx_isolation`
    was removed in MySQL 8.4; on MySQL 9.x the SET fails. MariaDB also
    fell to the else branch -- coincidentally correct, since MariaDB
    keeps `tx_isolation` across all versions.
    Post-fix: use the modern name only for non-MariaDB MySQL >= 8.0
    (any major), so MySQL 8.0/8.4/9.x all get `transaction_isolation`
    and MariaDB 10/11 keeps `tx_isolation`.

  lib/MySQL_Monitor.cpp:749 (sync MON_GALERA setup)
  lib/MySQL_Monitor.cpp:2297 (async MON_GALERA dispatch)
    Pre-fix: `strncmp(sv,"5.7",3)==0 || strncmp(sv,"8",1)==0` picked
    performance_schema.global_status; everything else used
    INFORMATION_SCHEMA.GLOBAL_STATUS. MySQL 9.x fell to the else branch
    and queries INFORMATION_SCHEMA.GLOBAL_STATUS, which was removed in
    MySQL 8.4 -- monitor query fails on MySQL 9.x Galera/PXC. MariaDB
    Galera correctly fell to the else branch and uses
    information_schema, which is where MariaDB keeps GLOBAL_STATUS.
    Post-fix: use performance_schema for non-MariaDB MySQL >= 5.7 (so
    MySQL 5.7, 8.0, 8.4, 9.x all share that path); MariaDB Galera and
    legacy MySQL/PXC < 5.7 retain the information_schema path.

Pattern used at all sites (matching the existing fix in
validate_charset):

    const char *sv = ...->mysql->server_version;
    bool is_mariadb = (sv != NULL && strstr(sv, "MariaDB") != NULL);
    int sv_major = (sv != NULL) ? atoi(sv) : 0;

No dedicated test added for the Monitor / SET TRANSACTION paths in this
commit: the Monitor change is exercised by any mariadb10-galera or
mysql{8,9}x-* CI group with Galera/PXC backends, and the
transaction_isolation change by SET-isolation-level coverage in
existing TAP groups. The validate_charset regression test added in
the prior commit on this branch still covers the original #5790 path.
A subagent code review on top of the CodeRabbit + Gemini bot reviews
found five additional issues that the bots missed.  All five are fixed
here.

1. LOAD PGSQL SERVERS TO RUNTIME was waking MySQL's resolver, not
   PgSQL's.  PgSQL_HostGroups_Manager::commit() inherited a
   MySQL_Monitor::trigger_dns_cache_update() call from a copy of the
   MySQL HGM, so after a LOAD pgsql hostnames had to wait for the
   refresh interval (default 60 s) before becoming resolvable —
   defeating the warm-cache-before-traffic intent that motivated
   issue #5768.  Route to PgSQL_Monitor::trigger_dns_cache_update().
   Verified directly: with refresh_interval=60000 ms, inserting a
   pgsql_servers row + LOAD now produces record_updated=1 within
   ~4 s instead of waiting a full minute.

2. DNSResolverWorker could deadlock shutdown.  monitor_dns_resolver_thread
   isn't noexcept (vector growth, set_thread_name, etc. can throw); if
   it threw, the std::promise on DNS_Resolve_Data was never satisfied,
   the producer's future::get() blocked forever, the resolver loop
   never reached the shutdown check, and pthread_join in main hung.
   Wrap the call in try/catch and force-satisfy the promise with a
   failure result on any uncaught exit.

3. Test step 3 wasn't actually exercising the connect-path lookup it
   claimed.  hammer_proxy() routes through the proxy front-end, which
   in standard test infra resolves through the default hostgroup —
   typically configured with an IP literal (see e.g.
   test/infra/docker-pgsql16-single/conf/proxysql/config.sql).  IP
   literals bypass dns_lookup via validate_ip(), so the
   dns_cache_queried bumps observed came from the resolver loop alone,
   not from the connect path.  Step 3 would have passed even if
   PgSQL_Connection::connect_start_DNS_lookup() was wired up wrong.
   Install a pgsql_query_rules row from inside the test that routes
   testuser to hostgroup 999, so the proxy actually calls into the
   cache on client connects.  Clean up the rule in the test exit path.

4. Use-after-move in the _dns_cache_update debug log.  Both
   PgSQL_Monitor::_dns_cache_update and MySQL_Monitor::_dns_cache_update
   logged debug_iplisttostring(ip_address) after std::move'ing
   ip_address into add_if_not_exist().  The moved-from vector is in a
   "valid but unspecified" state — typically empty — so the log
   printed "IP:[]" instead of the IPs.  Capture the string before the
   move.  Same bug also fixed in the bookkeeper expired-record requeue
   path inside PgSQL_Monitor::monitor_dns_cache.

5. monitor_dns_cache had inconsistent indentation inside the
   `if (hostnames.empty()) { ... } else { ... }` block introduced when
   I wrapped the original goto-based control flow.  The braces were
   correct but the body sat at the wrong tab depth, making the scope
   visually misleading and prone to mis-edits.  Re-indented.

Findings deliberately skipped from the review (rationale in commit
body):

- "thread_local shared_ptr<DNS_Cache> keeps stale cache alive across
  GloPgMon recreate" — pre-existing MySQL pattern; GloPgMon is never
  destroyed mid-process today.
- "PgSQL_monitor_dns_cache_pthread can deref GloPgMon during a tight
  shutdown" — shutdown ordering in main.cpp sets GloPgMon->shutdown
  before joining the thread and deletes GloPgMon only afterwards,
  matching the MySQL pattern.
- "cache only seeded on ASYNC_CONNECT_SUCCESSFUL" — matches MySQL;
  pre-existing scope choice.
- "validate_ip rejects bracketed IPv6" — pre-existing in MySQL; no
  current configuration path produces bracketed IPv6 in pgsql_servers.
- "Test relies on outbound DNS for example.com" — matches the upstream
  MySQL test_dns_cache test which uses google.com / yahoo.com /
  amazon.com.  Established convention.
- "ai_family hard-coded to AF_UNSPEC for PgSQL" — already commented
  in code; follow-up if/when pgsql-resolution_family is added.
CI-lint-groups-json requires keys in groups.json to be sorted.
Follow-up to the cluster-2 ParserSQL 1.0.3 bump (this same PR),
addressing review feedback from coderabbit + gemini.

## 1. PgSQL_Session: accept multi-value SET vectors (coderabbit, Major)

The previous commit `5a48004af` taught the adapter to walk every RHS
sibling so PG multi-value lists (e.g. `SET search_path TO "$user",
public`) come through as a `std::vector<std::string>` of every value.
But `lib/PgSQL_Session.cpp:4516` still had `size() > 2` →
`unable_to_parse_set_statement()`, which would have caused 3+ value
PG SETs to lock the hostgroup instead of being processed.  Worse,
the downstream value-consumer at the same site used only
`it->second.front()`, silently dropping every value past the first
for valid 2-value SETs too.

Fix:
  * Drop the upper bound from the cardinality check (keep `< 1` so
    empty value lists still fail loudly).
  * Join every value in the vector into `value1` with the
    comma-separated form PG itself uses for these parameters
    (`a, b, c`). Single-value SETs hit the loop with `size()==1` and
    take just `it->second.front()`, preserving the previous behaviour.

The validator / hash-compare / tracker / param_status code below the
join is unchanged — it now receives the full joined value instead of
the truncated first element.

## 2. setparser_parsersql_test: drop INTERVAL test that locked in
   incomplete behaviour (subagent correctness review, Major)

`parsersql_pgsql_time_zone` previously asserted that
`SET TIME ZONE INTERVAL '7' HOUR` produces `timezone = "INTERVAL"`,
which is the *current* observable output of ParserSQL — the
expression parser does not yet consume the `'7' HOUR` modifier
chain.  Encoding that as expected output would flip the test red
when ParserSQL is later improved to capture the full interval
expression, conflating an improvement with a regression.

Removed the case; left a comment in the fixture explaining the gap
so a future contributor adds the case back together with the
ExpressionParser fix.

## 3. Diag-output cleanup (gemini, cosmetic)

  * `%lu` → `%zu` for `size_t` (portability — `size_t` is not always
    `unsigned long` on 32-bit / Windows).
  * Extract the per-vector " | "-separated diag-string builder into
    a single `join_values_for_diag()` helper used by both
    `TestParse` (now also dumps full vector on mismatch, where it
    previously printed just `vals[0]`) and `TestParsePgsql`.

## Verification

  * `./setparser_parsersql_test`: 260/270 ok; the 10 remaining
    not-ok are unchanged pre-existing failures (`sql_mode 23-25`,
    `Set1_v2 0-1`), nothing new from this commit.
  * `pgsql_time_zone` group: 6 assertions (3 cases × 2) — confirms
    the INTERVAL case is removed cleanly.
  * `pgsql_search_path` group: 16 assertions still all pass — the
    multi-value join in PgSQL_Session does not affect the parser-
    side test (which goes through the adapter, not the session).
The previous commit introduced local variable declarations with
initializers (`const char *sv = …`, `bool is_mariadb = …`, etc.)
directly inside `case MON_GALERA:` of the `switch` in
`init_async()` at lib/MySQL_Monitor.cpp:742. The sibling
`case MON_CLOSE_CONNECTION:` / `MON_CONNECT:` / `MON_AWS_AURORA:`
labels then jump past these initializations, which GCC rejects:

    MySQL_Monitor.cpp:779:14: error: jump to case label
    MySQL_Monitor.cpp:781:14: error: jump to case label
    MySQL_Monitor.cpp:783:14: error: jump to case label

Wrap the `#else` body of the MON_GALERA case in `{ ... }` so the
declarations live in their own scope and no sibling case can jump
over them. No behavior change.

The async path at lib/MySQL_Monitor.cpp:2299 was already inside its
own `{ ... }` block (added with the original code, not by this
patch), so it compiled fine and needs no change. The
`MySQL_Session.cpp` edits live inside `if / else if` chains with
their own braces, also unaffected.
fix(ci): prevent zombie IN_PROGRESS checks in CI-unit-tests-asan-coverage
fix(session): SET STATEMENT ... FOR detection tolerates any whitespace (PR #5708 follow-up)
fix: pgsql digest fixtures (typecast spacing) + ParserSQL 1.0.3 bump (PG SET TIME ZONE + multi-value lists)
Independent DNS cache for PgSQL (fixes #5768)
fix(core): validate_charset must replace collation 255 on MariaDB backends (#5790)
fix(session): avoid double-free of stmt_meta pkt in LargePacket (#5639)
validate_ip() treated inet_pton(...) != 0 as success, but inet_pton
returns 1 / 0 / -1; a -1 (e.g. unsupported family) would be reported
as a valid IP.  Compare against == 1 instead.

The DNS cache counters (dns_cache_queried / dns_cache_lookup_success /
dns_cache_record_updated) were declared as plain unsigned long long
on MySQL_Monitor.  Resolver workers update them atomically via
__sync_fetch_and_add, but p_update_metrics() reads them from a
different thread without a barrier, racing the writers.  Promote to
std::atomic<unsigned long long> and switch DNS_Cache to fetch_add /
the atomic pointer type so both ends are consistent.

Addresses CodeRabbit review feedback on PR #5806.
GloPgMon->shutdown was a plain bool, written by main.cpp's teardown
and read by the monitor / DNS resolver loops on worker threads — a
data race that can let the loops spin past shutdown or observe a
stale value indefinitely.  Promote to std::atomic<bool> and switch
the readers/writer to acquire/release ordering.

PgSQL_Monitor::dns_lookup() was also returning an empty string when
GloPgMon / dns_cache_thread weren't initialized yet, ignoring the
return_hostname_if_lookup_fails contract.  Move the fallback out of
the dns_cache_thread branch so startup / shutdown behave like a
normal cache miss for callers that asked for the hostname fallback.

Addresses CodeRabbit review feedback on PR #5806.
Two regressions surfaced by CodeRabbit on PR #5806:

* admin_exec() return values for the DNS-cache runtime config
  ('SET pgsql-monitor_local_dns_cache_ttl' / 'LOAD PGSQL VARIABLES
  TO RUNTIME') were silently dropped, so a failed setup cascaded
  into misleading downstream assertions.  BAIL_OUT immediately.

* The pgsql_query_rules INSERT hardcoded username='testuser', but
  hammer_proxy() connects as cl.pgsql_username.  When the infra's
  test user is anything else the rule never matches, hostgroup 999
  is bypassed, and the DNS-cache assertions exercise the wrong
  path.  Build the INSERT with cl.pgsql_username instead.
The admin connection used cl.host, which is the frontend endpoint.
In infrastructures where the admin interface is on a different host
the test would false-fail on the admin connect, unrelated to
collation behavior.  Use cl.admin_host.

Addresses CodeRabbit review feedback on PR #5807.
The previous commit promoted MySQL_Monitor::dns_cache_{queried,
lookup_success,record_updated} to std::atomic<unsigned long long>
but missed several call sites that pass them through implicit copy:

* lib/MySQL_Thread.cpp sprintf("%llu", GloMyMon->dns_cache_*) — varargs
  cannot copy std::atomic.
* lib/PgSQL_Thread.cpp sprintf("%llu", GloPgMon->dns_cache_*) — same,
  for the PgSQL counters.
* lib/MySQL_Monitor.cpp p_update_counter(..., GloMyMon->dns_cache_*) —
  works via implicit operator T(), but explicit .load() is clearer
  and matches the rest of the file.

Promote PgSQL_Monitor::dns_cache_* to std::atomic for the same race
that motivated the MySQL side (resolver workers increment, stats
readers and p_update_metrics read from another thread), and add
.load() at every read site so we don't depend on implicit conversion.

Verified with a local 'make -j32 debug' build in the integration
worktree.
PROXYSQLGENAI was a leftover from before the genai plugin carve-out.
All genai/MCP/AI/RAG code now lives in plugins/genai/ as a runtime
loadable .so.  Since the plugin chassis (PROXYSQL40=1) already
builds and loads the genai .so, a separate flag serves no purpose.

Entrypoint changes (all three -- rhel, suse, deb):
- Build: pass PROXYSQL40=1 (not PROXYSQLGENAI=1) to make.
- Packaging: copy genai .so alongside mysqlx when PROXYSQL40=1.
- protobuf-devel install: gate on PROXYSQL40=1 (needed for mysqlx).

Automated packaging verification added to all entrypoints:
  rpm -qpl / dpkg -c confirms both .so files are in the package.
ELF smoke test: validates .so is a valid shared object exporting
  proxysql_plugin_descriptor_v1 symbol.  Fails build on mismatch.

Test Makefiles: PROXYSQLGENAI -> PROXYSQL40 for the vec.o link gate
(test/tap/tests/Makefile and test/deps/cluster_simulator/Makefile).

Updated remaining PROXYSQLGENAI references to PROXYSQL40 in:
CLAUDE.md, include/cpp.h, lib/proxy_sqlite3_symbols.cpp,
src/main.cpp (3x comments), proxysql.spec,
test/infra/control/start-proxysql-isolated.bash,
test/infra/control/run-unit-tests-asan-coverage.bash,
and 14 genai test file comments.
@coderabbitai

coderabbitai Bot commented May 23, 2026

Copy link
Copy Markdown

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c3ff051a-1b4e-41f3-99e6-dee4bdb91e56

📥 Commits

Reviewing files that changed from the base of the PR and between 9719d4a and ecc60a1.

⛔ Files ignored due to path filters (2)
  • deps/parsersql/parsersql-1.0.2.tar.gz is excluded by !**/*.gz
  • deps/parsersql/parsersql-1.0.3.tar.gz is excluded by !**/*.gz
📒 Files selected for processing (83)
  • .github/workflows/CI-package-arm64-almalinux10-genai.yml
  • .github/workflows/CI-package-arm64-almalinux10.yml
  • .github/workflows/CI-package-arm64-almalinux8-genai.yml
  • .github/workflows/CI-package-arm64-almalinux8.yml
  • .github/workflows/CI-package-arm64-almalinux9-genai.yml
  • .github/workflows/CI-package-arm64-almalinux9.yml
  • .github/workflows/CI-package-arm64-centos10-genai.yml
  • .github/workflows/CI-package-arm64-centos10.yml
  • .github/workflows/CI-package-arm64-centos9-genai.yml
  • .github/workflows/CI-package-arm64-centos9.yml
  • .github/workflows/CI-package-arm64-debian12-genai.yml
  • .github/workflows/CI-package-arm64-debian12.yml
  • .github/workflows/CI-package-arm64-debian13-genai.yml
  • .github/workflows/CI-package-arm64-debian13.yml
  • .github/workflows/CI-package-arm64-fedora42-genai.yml
  • .github/workflows/CI-package-arm64-fedora42.yml
  • .github/workflows/CI-package-arm64-fedora43-genai.yml
  • .github/workflows/CI-package-arm64-fedora43.yml
  • .github/workflows/CI-package-arm64-fedora44-genai.yml
  • .github/workflows/CI-package-arm64-fedora44.yml
  • .github/workflows/CI-package-arm64-opensuse15-genai.yml
  • .github/workflows/CI-package-arm64-opensuse15.yml
  • .github/workflows/CI-package-arm64-opensuse16-genai.yml
  • .github/workflows/CI-package-arm64-opensuse16.yml
  • .github/workflows/CI-package-arm64-ubuntu22-genai.yml
  • .github/workflows/CI-package-arm64-ubuntu22.yml
  • .github/workflows/CI-package-arm64-ubuntu24-genai.yml
  • .github/workflows/CI-package-arm64-ubuntu24.yml
  • .github/workflows/CI-unit-tests-asan-coverage.yml
  • .github/workflows/gh-actions-reusable/verify-package-install.yml
  • CLAUDE.md
  • deps/parsersql/parsersql
  • docker/images/proxysql/deb-compliant/entrypoint/entrypoint.bash
  • docker/images/proxysql/rhel-compliant/entrypoint/entrypoint.bash
  • docker/images/proxysql/rhel-compliant/rpmmacros/rpmbuild/SPECS/proxysql.spec
  • docker/images/proxysql/suse-compliant/entrypoint/entrypoint.bash
  • include/DNS_Cache.hpp
  • include/MySQL_Monitor.hpp
  • include/PgSQL_Connection.h
  • include/PgSQL_Monitor.hpp
  • include/cpp.h
  • include/proxysql_structs.h
  • lib/DNS_Cache.cpp
  • lib/Makefile
  • lib/MySQL_Monitor.cpp
  • lib/MySQL_Session.cpp
  • lib/MySQL_Thread.cpp
  • lib/MySQL_Variables.cpp
  • lib/PgSQL_Connection.cpp
  • lib/PgSQL_HostGroups_Manager.cpp
  • lib/PgSQL_Monitor.cpp
  • lib/PgSQL_Session.cpp
  • lib/PgSQL_Thread.cpp
  • lib/Query_Processor_ParserSQL.cpp
  • lib/proxy_sqlite3_symbols.cpp
  • src/main.cpp
  • test/deps/cluster_simulator/Makefile
  • test/infra/control/run-unit-tests-asan-coverage.bash
  • test/infra/control/start-proxysql-isolated.bash
  • test/infra/control/verify-package-install.bash
  • test/tap/groups/groups.json
  • test/tap/tap/tap.cpp
  • test/tap/tap/tap.h
  • test/tap/tests/Makefile
  • test/tap/tests/pgsql-test_dns_cache-t.cpp
  • test/tap/tests/reg_test_5639_stmt_execute_max_allowed_packet-t.cpp
  • test/tap/tests/reg_test_5790-mariadb_collation_255-t.cpp
  • test/tap/tests/setparser_parsersql_test.cpp
  • test/tap/tests/test_mysqlx_e2e_handshake-t.cpp
  • test/tap/tests/test_mysqlx_e2e_routing-t.cpp
  • test/tap/tests/test_mysqlx_route_drop_inflight-t.cpp
  • test/tap/tests/test_mysqlx_sigterm_inflight-t.cpp
  • test/tap/tests/test_set_statement_for-t.cpp
  • test/tap/tests/tokenizer_payloads/pgsql_regular_tokenizer_digests.hjson
  • test/tap/tests/unit/genai_discovery_schema_unit-t.cpp
  • test/tap/tests/unit/genai_fts_string_unit-t.cpp
  • test/tap/tests/unit/genai_llm_clients_unit-t.cpp
  • test/tap/tests/unit/genai_mcp_endpoint_unit-t.cpp
  • test/tap/tests/unit/genai_mcp_thread_unit-t.cpp
  • test/tap/tests/unit/genai_mysql_catalog_unit-t.cpp
  • test/tap/tests/unit/genai_query_handler_unit-t.cpp
  • test/tap/tests/unit/genai_stats_parsing_unit-t.cpp
  • test/tap/tests/unit/genai_thread_unit-t.cpp

📝 Walkthrough

Walkthrough

This PR: (1) consolidates v4.0 plugin gating under PROXYSQL40 and adds package-install verification to many CI package jobs (reusable workflow + verifier), (2) introduces a shared DNS cache and resolver workers plus PgSQL monitor integration and a dedicated DNS-cache thread, (3) extends ParserSQL/PG SET parsing to support multi-value RHS, (4) fixes session/collation/version detection and other robustness issues, and (5) updates tests and TAP framework (remove skip_all → BAIL_OUT).

Changes

Plugin Chassis Tier Unification & CI package verification

Layer / File(s) Summary
CI package install verification steps
.github/workflows/CI-package-arm64-*.yml
Inserted a post-build step that finds the first produced .deb/.rpm in binaries/, fails if none found, and runs test/infra/control/verify-package-install.bash to verify clean-install on target distro.
Reusable workflow + verifier script
.github/workflows/gh-actions-reusable/verify-package-install.yml, test/infra/control/verify-package-install.bash
New reusable workflow and containerized verifier script that locate the package and run an install + basic runtime/ELF/plugin checks.
Docs and flag semantics
CLAUDE.md, include/cpp.h, lib/proxy_sqlite3_symbols.cpp
Explain that PROXYSQL40=1 is the v4.0 plugin chassis and remove standalone PROXYSQLGENAI carve-outs; update header comments and tokenizer hook notes.

Docker entrypoints & packaging smoke tests

Layer / File(s) Summary
Debian entrypoint
docker/images/proxysql/deb-compliant/entrypoint/entrypoint.bash
Gate libprotobuf and plugin inclusion on PROXYSQL40=1; add .deb plugin presence and ELF+symbol smoke tests.
RHEL entrypoint
docker/images/proxysql/rhel-compliant/entrypoint/entrypoint.bash, rpmmacros/.../proxysql.spec
Gate protobuf-devel and plugin staging on PROXYSQL40=1; add RPM contents check and extraction/smoke-test for ELF and exported symbol.
SUSE entrypoint
docker/images/proxysql/suse-compliant/entrypoint/entrypoint.bash
Same gating and RPM verification/smoke-test behavior for SUSE family images.

Shared DNS cache and resolver infrastructure

Layer / File(s) Summary
DNS cache API & impl
include/DNS_Cache.hpp, lib/DNS_Cache.cpp, lib/Makefile
New DNS_Cache types and implementation: thread-safe hostname→IP records, round-robin selection, add/add_if_not_exist/lookup/remove/clear/empty APIs, resolver worker thread(s) that run getaddrinfo, and inclusion in lib build.
MySQL monitor integration
include/MySQL_Monitor.hpp, lib/MySQL_Monitor.cpp, lib/MySQL_Thread.cpp
Move DNS helpers into shared code, wire DNS cache counters, use atomic .load() for metrics, and keep Galera query logic version-aware.

PgSQL monitor, connection integration & thread lifecycle

Layer / File(s) Summary
PgSQL headers & structs
include/PgSQL_Monitor.hpp, include/proxysql_structs.h, include/PgSQL_Connection.h
Add DNS cache fields and APIs to PgSQL monitor, thread-local config variables, atomic counters, and connect_start_DNS_lookup() declaration.
PgSQL monitor implementation
lib/PgSQL_Monitor.cpp, lib/PgSQL_HostGroups_Manager.cpp
Create dns_cache instance, implement dns_lookup/update/trigger, implement monitor_dns_cache() loop with resolver queue scaling, and trigger logic from hostgroup updates.
PgSQL connection & main thread
lib/PgSQL_Connection.cpp, src/main.cpp
Seed cache on successful PG connection; use cached hostaddr in connect_start(); create/join dedicated DNS-cache pthread with atomic shutdown coordination.
PgSQL thread-local refresh & metrics
lib/PgSQL_Thread.cpp
Add thread variables for TTL/refresh/queue size and expose PgSQL DNS cache counters in global status via atomic loads.

ParserSQL Multi-Value SET Statement Support

Layer / File(s) Summary
Query processor multi-value SET parsing
lib/Query_Processor_ParserSQL.cpp
Collect all RHS siblings for NODE_VAR_ASSIGNMENT into vector values (supports multi-value SET forms).
PgSQL session multi-value handling
lib/PgSQL_Session.cpp
Allow multi-element value lists and collapse into a comma-separated string for validation/tracking.
SET parser tests
test/tap/tests/setparser_parsersql_test.cpp
Add fixtures for MySQL/PG multi-value cases and improve diagnostics to show all expected vs actual values.

MySQL session robustness & collation handling

Layer / File(s) Summary
MySQL_Session fixes
lib/MySQL_Session.cpp
Avoid double-free in large-packet handler by checking pkt ownership/aliasing; use digest-based detection for SET STATEMENT ... FOR.
Collation/version detection
lib/MySQL_Variables.cpp
Detect MariaDB via substring and safely parse server major version to decide support for collations with id >=255.
Galera query selection
lib/MySQL_Monitor.cpp
Make Galera healthcheck query selection version-aware and MariaDB-specific across sync/async paths.

TAP test framework & test-suite updates

Layer / File(s) Summary
TAP framework changes
test/tap/tap/tap.h, test/tap/tap/tap.cpp
Remove skip_all() API; instruct callers to use BAIL_OUT() for loud failures.
Test routing & Makefile cleanup
test/tap/groups/groups.json, test/tap/tests/Makefile, test/deps/cluster_simulator/Makefile
Register pgsql-test_dns_cache, add new regression mappings, remove deleted TAP binary, and switch build gates from PROXYSQLGENAI → PROXYSQL40.
New/updated tests
test/tap/tests/pgsql-test_dns_cache-t.cpp, test/tap/tests/reg_test_5639_*, test/tap/tests/reg_test_5790_*, many unit tests doc lines
Add comprehensive PgSQL DNS cache TAP test, new regression tests for COM_STMT_EXECUTE and MariaDB collation 255, and update unit test headers to reference PROXYSQL40 where appropriate.
Tokenizer fixture tweaks
test/tap/tests/tokenizer_payloads/pgsql_regular_tokenizer_digests.hjson
Adjust expected digest formatting to include spaces around certain operators.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

Suggested labels

Packages, Compiling

🐰 A rhyming rabbit's cheer:
Plugins converge, flags set right,
DNS cached, resolvers light,
Tests assert and builds now try,
Bugs are chased, and sleep is nigh,
Hoppity hop — code runs bright!

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/kill-proxysqlgenai-build-flag

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the DNS cache infrastructure into a shared class, enabling independent DNS caching for both MySQL and PostgreSQL monitors. It introduces several critical fixes for MySQL 8.4+ and MariaDB compatibility, specifically regarding Galera healthcheck queries, collation ID handling, and whitespace-tolerant SET STATEMENT parsing. Additionally, it resolves a double-free vulnerability in large packet handling and updates the test framework to favor explicit failures over silent skips. Review feedback identifies opportunities to improve the accuracy of build-time error logging, ensure robust return-value checking for network address conversions, and refine the staggering logic used in the resolver threads.

if nm -D "${plugin_path}" 2>/dev/null | grep -q 'proxysql_plugin_descriptor_v1'; then
echo " OK ${plugin} (exports proxysql_plugin_descriptor_v1)"
else
echo " WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The log message uses WARN but the logic sets ALL_OK=1, which causes the build to fail later at line 201. This should be labeled as FAIL to accurately reflect that it is a fatal error for the packaging process.

Suggested change
echo " WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2
echo " FAIL ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

if nm -D "${plugin}" 2>/dev/null | grep -q 'proxysql_plugin_descriptor_v1'; then
echo " OK ${plugin} (exports proxysql_plugin_descriptor_v1)"
else
echo " WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The log message uses WARN but the logic sets ALL_OK=1, which causes the build to fail later at line 179. This should be labeled as FAIL to accurately reflect that it is a fatal error for the packaging process.

Suggested change
echo " WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2
echo " FAIL ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

if nm -D "${plugin}" 2>/dev/null | grep -q 'proxysql_plugin_descriptor_v1'; then
echo " OK ${plugin} (exports proxysql_plugin_descriptor_v1)"
else
echo " WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The log message uses WARN but the logic sets ALL_OK=1, which causes the build to fail later at line 173. This should be labeled as FAIL to accurately reflect that it is a fatal error for the packaging process.

Suggested change
echo " WARN ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2
echo " FAIL ${plugin} (no proxysql_plugin_descriptor_v1 symbol)" >&2

Comment thread lib/DNS_Cache.cpp
Comment on lines +98 to +107
if (p->ai_family == AF_INET) {
struct sockaddr_in* ipv4 = (struct sockaddr_in*)p->ai_addr;
inet_ntop(p->ai_addr->sa_family, &ipv4->sin_addr, ip_addr, INET_ADDRSTRLEN);
ips.push_back(ip_addr);
}
else {
struct sockaddr_in6* ipv6 = (struct sockaddr_in6*)p->ai_addr;
inet_ntop(p->ai_addr->sa_family, &ipv6->sin6_addr, ip_addr, INET6_ADDRSTRLEN);
ips.push_back(ip_addr);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The return value of inet_ntop is not checked. If it fails, an uninitialized or stale ip_addr buffer will be pushed into the ips vector. Additionally, the else block should explicitly check for AF_INET6 and use sizeof(ip_addr) for robustness.

		for (auto p = res; p != NULL; p = p->ai_next) {
			if (p->ai_family == AF_INET) {
				struct sockaddr_in* ipv4 = (struct sockaddr_in*)p->ai_addr;
				if (inet_ntop(p->ai_family, &ipv4->sin_addr, ip_addr, sizeof(ip_addr)))
					ips.push_back(ip_addr);
			}
			else if (p->ai_family == AF_INET6) {
				struct sockaddr_in6* ipv6 = (struct sockaddr_in6*)p->ai_addr;
				if (inet_ntop(p->ai_family, &ipv6->sin6_addr, ip_addr, sizeof(ip_addr)))
					ips.push_back(ip_addr);
			}
		}

Comment thread lib/DNS_Cache.cpp
Comment on lines +314 to +319
rc = pthread_rwlock_unlock(&rwlock_);

if (item_removed && counter_record_updated_)
counter_record_updated_->fetch_add(1, std::memory_order_relaxed);

assert(rc == 0);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The assertion for the return code of pthread_rwlock_unlock should be placed immediately after the call to ensure it refers to the correct operation.

Suggested change
rc = pthread_rwlock_unlock(&rwlock_);
if (item_removed && counter_record_updated_)
counter_record_updated_->fetch_add(1, std::memory_order_relaxed);
assert(rc == 0);
rc = pthread_rwlock_unlock(&rwlock_);
assert(rc == 0);
if (item_removed && counter_record_updated_)
counter_record_updated_->fetch_add(1, std::memory_order_relaxed);

Comment thread lib/PgSQL_Monitor.cpp
Comment on lines +2954 to +2955
if (delay_us > 1000000 || delay_us <= 0)
delay_us = 10000;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for capping delay_us is inconsistent. If delay_us exceeds 1 second, it is reset to 10ms, which causes a massive drop in the intended staggering delay. It should be capped at 1 second instead.

					if (delay_us > 1000000)
						delay_us = 1000000;
					else if (delay_us <= 0)
						delay_us = 10000;

@renecannao

Copy link
Copy Markdown
Contributor Author

Wrong base — will recreate from clean v3.0

@renecannao renecannao closed this May 23, 2026
@renecannao renecannao deleted the fix/kill-proxysqlgenai-build-flag branch May 23, 2026 18:58
Add verify-package-install.bash that installs the built .deb/.rpm on a
clean Docker image of the target distro and runs smoke tests:

- proxysql --version catches missing runtime shared library deps
- Plugin .so file presence at /usr/lib/proxysql/ (auto-detected)
- ELF header validation on installed binaries

The script auto-detects whether the package ships plugins by inspecting
dpkg -c / rpm -qpl, so it works for both v3.0 (no plugins) and v4.0
genai/mysqlx packages.

Wire it into all 28 CI-package-arm64-* workflows (14 genai + 14 base)
between the Build and Upload steps. If verification fails, the build
job fails and the package never reaches the release draft.
@renecannao renecannao reopened this May 24, 2026
@renecannao

Copy link
Copy Markdown
Contributor Author

Superseded by #5815 — closing, all changes on fix/kill-proxysqlgenai-build-flag-v2

@renecannao renecannao closed this May 24, 2026
@sonarqubecloud

Copy link
Copy Markdown

❌ The last analysis has failed.

See analysis details on SonarQube Cloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant