Skip to content

Add support for AWS Multi-AZ Instance blue/green deployment#5861

Draft
wazir-ahmed wants to merge 20 commits into
v3.0from
feature/aws-rds-monitor
Draft

Add support for AWS Multi-AZ Instance blue/green deployment#5861
wazir-ahmed wants to merge 20 commits into
v3.0from
feature/aws-rds-monitor

Conversation

@wazir-ahmed

@wazir-ahmed wazir-ahmed commented Jun 23, 2026

Copy link
Copy Markdown
Member

Summary

ProxySQL polls RDS blue/green switchover status and repoints traffic from the old primary ("blue") to the promoted copy ("green") automatically — without manual reconfiguration and without waiting on network DNS propagation.

Design (short version)

  1. The read_only monitor, on a periodic cycle (mysql-monitor_aws_rds_topology_discovery_interval), probes mysql.rds_topology; on a blue/green deployment it auto-generates a runtime entry in runtime_mysql_aws_rds_bgd_hostgroups table if mysql-aws_blue_green_deployment_auto_discovery is enabled.
  2. A BGD monitor worker per writer hostgroup polls the switchover status and runs the FSM:
    • At AVAILABLE, resolve the green IP and keep it warm
    • At POST_PROCESSING, pin blue→green IP, drain blue connections, and handle readers (repoint blue readers to green ones if configured, otherwise shun the unmapped ones)
    • At COMPLETED, purge blue DNS, re-arm shunned readers, and clean up.
  3. Moving the writer between hostgroups is left to the read_only monitor:
    • When AWS makes blue read-only it reads read_only=1 and demotes blue (writer→reader hostgroup)
    • Once the BGD monitor pin makes the blue name resolve to green it reads read_only=0 and promotes it back (writer hostgroup, retained as a reader under writer_is_also_reader).

Sequence Diagram

sequenceDiagram
    participant App as Application
    participant BGD as BGD Monitor
    participant RO as read_only monitor
    participant DNS as DNS Cache
    participant HG as HostGroupManager
    participant RDS as AWS RDS

    Note over BGD,RDS: AVAILABLE (poll every 250ms)
    BGD->>RDS: read mysql.rds_topology
    BGD->>BGD: build blue-green map, resolve green IP and keep it warm
    Note over BGD: if green_writer_hostgroup is configured, add green writer to it

    Note over BGD,RDS: INITIATED / IN_PROGRESS (poll every 100ms)
    BGD->>RDS: keep polling via the green IP directly
    RO->>RDS: read_only check on blue returns read_only=1
    RO->>HG: demote blue writer to the reader HG (writer_is_also_reader)

    Note over BGD,RDS: POST_PROCESSING (green is now the writer)
    BGD->>DNS: pin blue name to green IP (new connections go to green)
    BGD->>HG: drain blue free connections and flag in-flight ones to be killed on next use
    BGD->>HG: shun blue readers that have no green counterpart
    RO->>RDS: read_only check on blue name now resolves to green, returns read_only=0
    RO->>HG: promote blue name back to the writer HG (kept as reader)
    App->>HG: writes and reads now land on green

    Note over BGD,RDS: COMPLETED
    BGD->>HG: re-arm shunned readers (recover on next read)
    BGD->>HG: drain stale connections on the green hostgroups
    BGD->>DNS: purge blue DNS records so names re-resolve to the promoted instances
    BGD->>BGD: clear state, back to baseline polling
Loading

Testing

Validated against a live RDS blue/green deployment (1 writer + 2 readers, auto-discovery path):

  • FSM walks AVAILABLE → INITIATED → IN_PROGRESS → POST_PROCESSING → COMPLETED and repoints to green.
  • read_only monitor takes no action on the deployment's servers during the switchover.
  • After switchover complete, runtime_mysql_servers is in the desired state.

Summary by CodeRabbit

  • New Features
    • Added AWS RDS blue/green deployment workflows, including new BGD hostgroup tables, admin/config support, and monitoring/task automation.
    • Introduced DNS pinning to keep resolution stable during blue/green transitions, with a hostname pin/unpin API and enhanced resolver helpers.
  • Bug Fixes
    • Improved handling of shunned (including BGD) states across runtime/admin views, server selection, and hostgroup reporting.
    • Refined connection reuse/reset behavior by honoring backend “healthy” state, and improved stale/missing topology handling to avoid disruptive state changes.

- Add `mysql_aws_rds_hostgroups` and `runtime_mysql_aws_rds_hostgroups`;
  register in the admin and config DBs and in the disk<->memory sync table set.
- Wire config<->runtime in `MySQL_HostGroups_Manager`: in-memory table,
  incoming staging, table generation, commit, and save/dump handling, with
  nullable `green_writer_hostgroup` / `green_reader_hostgroup` support.
- Implement `LOAD/SAVE MYSQL SERVERS` paths; save-to-memory excludes
  `auto_generated` rows so only user configuration is persisted.
- Add `proxysql.cnf` read/write and the `CHECKSUM MYSQL RDS HOSTGROUPS` command.

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
- Add an RDS monitor thread (dispatcher + one worker per writer hostgroup),
  mirroring the Aurora monitor and driven by `AWS_RDS_Hosts_resultset`.
- Each worker pings a host and runs a persistent two-state probe of
  `mysql.rds_topology`: check existence, then fetch metadata.
- Detect role/status columns by name and branch on shape (Single/Multi-AZ
  Instance vs Multi-AZ Cluster); handlers currently only log.
- Handle a missing topology table gracefully; add `MON_AWS_RDS` task type.

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
- The read_only monitor flags a detected blue/green deployment
  and the RDS monitor picks it up to drive the switchover,
  gated by a new opt-in config `mysql-aws_blue_green_deployment_auto_discovery`.
- Scope the dedicated RDS monitor to blue/green deployments only, renaming
  AWS_RDS -> AWS_RDS_BGD throughout.
- Keep Multi-AZ Cluster auto-discovery unchanged in the read_only monitor.
- Split the hostgroup schema to distinguish user-defined from auto-generated entries.

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
- Add support for IP pinning in `DNS_Cache`; Pinned IP entries override
  resolved IPs in during `lookup()` until they are unpinned.
- Add a helper - `dns_resolve()` and reuse it in `monitor_dns_resolver_thread()`.

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
- Add a new helper `set_server_shun()` to shun/unshun a server that takes
  shun auto-recovery into account. It allows shunning a server while
  preventing the server-selection path from automatically unshunning it
  after shun_recovery_time.
- Make `IsServerOffline()` / `async_send_simple_command()` act based on
  `shunned_and_kill_all_connections` alone, without considering `shunned_automatic`.

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
Implement `handle_aws_rds_bgd()` as a state machine over the `mysql.rds_topology`
switchover status:

- `AVAILABLE`: Build the `blue<->green` host map by hostname.
- `SWITCHOVER_INITIATED` / `IN_PROGRESS`: Resolve the green IPs and keep them warm.
- `SWITCHOVER_IN_POST_PROCESSING`: Pin each blue host onto its green IP in the DNS
  cache and drop the blue free connections; shun blue readers that have no green
  counterpart, or enforce writer_is_also_reader, to funnel reads to the green writer.
- `SWITCHOVER_COMPLETED`: Undo the above and let server selection recover the readers
  once their connections drain.

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
…ver status

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
@wazir-ahmed wazir-ahmed requested a review from renecannao June 23, 2026 10:49
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bd72a400-38a2-4a4e-aa0f-ef8064711a37

📥 Commits

Reviewing files that changed from the base of the PR and between 4eac77d and 085785d.

📒 Files selected for processing (4)
  • include/MySQL_HostGroups_Manager.h
  • include/MySQL_Monitor.hpp
  • lib/MySQL_HostGroups_Manager.cpp
  • lib/MySQL_Monitor.cpp
🚧 Files skipped from review as they are similar to previous changes (3)
  • include/MySQL_HostGroups_Manager.h
  • include/MySQL_Monitor.hpp
  • lib/MySQL_Monitor.cpp
📜 Recent review details
⏰ Context from checks skipped due to timeout. (5)
  • GitHub Check: CI-builds / builds (ubuntu24,-tap-genai-gcov)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap-mysqlx)
  • GitHub Check: run / trigger
🧰 Additional context used
📓 Path-based instructions (2)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{cpp,h,hpp}: Feature tiers are controlled via flags: PROXYSQL31=1 for v3.1.x features (FFTO, TSDB), PROXYSQL40=1 for v4.0.x features (plugin loader). PROXYSQL40=1 implies both PROXYSQL31=1 and PROXYSQLFFTO=1 and PROXYSQLTSDB=1. Use conditional compilation with #ifdef PROXYSQL31, #ifdef PROXYSQL40, #ifdef PROXYSQLFFTO, #ifdef PROXYSQLTSDB, #ifdef PROXYSQLCLICKHOUSE.
Class names use PascalCase with protocol prefixes: MySQL_, PgSQL_, or ProxySQL_ (e.g., MySQL_Protocol, PgSQL_Session).
Member variables use snake_case.
Constants and macros use UPPER_SNAKE_CASE.
Use C++17; conditional compilation for feature tiers via #ifdef PROXYSQL31, #ifdef PROXYSQL40, #ifdef PROXYSQLFFTO, #ifdef PROXYSQLTSDB, #ifdef PROXYSQLCLICKHOUSE.
Use RAII for resource management; use jemalloc for memory allocation.
Use pthread mutexes for synchronization; use std::atomic<> for counters.

Files:

  • lib/MySQL_HostGroups_Manager.cpp
{lib,src}/**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

GenAI/MCP/RAG/LLM features live entirely in plugins/genai/ and load as a .so at runtime via dlopen. Do not guard with PROXYSQLGENAI in core code — that flag no longer guards any core code as of Step 7 of the GenAI plugin carve-out.

Files:

  • lib/MySQL_HostGroups_Manager.cpp
🔇 Additional comments (2)
lib/MySQL_HostGroups_Manager.cpp (2)

3872-3879: LGTM!


3888-3893: 🩺 Stability & Availability

No issue here. SHUNNED_AWS_BGD is only set on the shun path, which already marks the server shunned_automatic=true; this branch correctly keeps it shunned and defers recovery via time_last_detected_error.

			> Likely an incorrect or invalid review comment.

📝 Walkthrough

Walkthrough

Adds AWS RDS blue/green deployment monitoring, topology discovery, hostgroup schema and persistence support, DNS pinning, and connection-health handling for shunned servers.

Changes

AWS RDS Blue/Green Deployment Monitoring

Layer / File(s) Summary
DNS cache and result helpers
include/DNS_Cache.hpp, lib/DNS_Cache.cpp, include/proxysql_utils.h, lib/proxysql_utils.cpp
Adds DNS resolution helpers, pin/unpin state, pin-aware lookup metadata, and MySQL result formatting and dumping helpers.
BGD data model and public headers
include/ProxySQL_Admin_Tables_Definitions.h, include/MySQL_HostGroups_Manager.h, include/MySQL_Monitor.hpp, include/MySQL_Thread.h, include/proxysql_structs.h, include/ServerSelection.h, include/mysql_connection.h, include/proxysql_utils.h
Adds BGD table DDL, runtime status enums, monitor topology structs and task enums, the auto-discovery thread variable, and new connection health fields and helpers.
Admin registration and config persistence
lib/Admin_Bootstrap.cpp, lib/Admin_Handler.cpp, lib/ProxySQL_Admin.cpp, lib/ProxySQL_Config.cpp, lib/ProxySQL_Cluster.cpp
Registers the new BGD tables in admin startup, adds checksum support, and writes, reads, and normalizes the BGD hostgroup section through config, runtime, and cluster persistence paths.
Hostgroup manager BGD runtime
lib/MySQL_HostGroups_Manager.cpp, lib/Base_HostGroups_Manager.cpp
Adds BGD hostgroup materialization, runtime resultset refresh, shun/drain helpers, and status-string mapping in the hostgroup manager and related server-status reporting paths.
Topology parsing and BGD monitor FSM
lib/MySQL_Monitor.cpp
Splits AWS RDS discovery into standalone topology tasks, parses mysql.rds_topology, dispatches Multi-AZ or BGD handling, and runs the BGD monitor worker with switchover state transitions and stale-IP timeout suppression.
Connection health and shun handling
lib/mysql_connection.cpp, lib/mysql_data_stream.cpp, lib/MySQL_Session.cpp, lib/Base_Session.cpp, lib/MySQL_Thread.cpp, lib/MySrvConnList.cpp
Adds the healthy connection flag, marks pooled connections unhealthy during BGD transitions, and updates reset/offline handling to respect unhealthy or shunned-AWS-BGD connections.

Estimated code review effort: 5 (Critical) | ~120 minutes

Possibly related PRs

  • sysown/proxysql#5093: Shares lib/MySQL_Session.cpp error-handling paths that this PR also updates with new connection-state gating.
  • sysown/proxysql#5806: Touches the shared DNS_Cache implementation that this PR extends with pinning.
  • sysown/proxysql#5831: Also refactors AWS RDS topology discovery in MySQL_Monitor.

Suggested labels: status/triage

Poem

🐇 I hopped through DNS and found a pin,
Green lights spun up, blue lights dimmed within.
A BGD burrow, tidy and new,
With shunned servers snoozing, then waking too.
My ears are twitching, the switchover sings,
While monitor rabbits juggle many things.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 32.52% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the PR’s main change: adding AWS Multi-AZ blue/green deployment support.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/aws-rds-monitor

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces comprehensive support for AWS RDS Blue/Green Deployments (BGD) in ProxySQL, adding new configuration tables, a dedicated monitor thread, a status-driven switchover FSM, and DNS cache pinning capabilities. Key feedback points out critical issues: a SQLite constraint violation in the admin table definition for nullable green hostgroups, potential undefined behavior or crashes in DNS resolution when handling unexpected address families or unchecked inet_ntop results, and a potential crash on thread exit due to uninitialized memory allocation with malloc instead of calloc.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +240 to +248
#define ADMIN_SQLITE_TABLE_MYSQL_AWS_RDS_BGD_HOSTGROUPS "CREATE TABLE mysql_aws_rds_bgd_hostgroups (writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY , reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>0) , " \
"green_writer_hostgroup INT NOT NULL CHECK (green_writer_hostgroup>=0) , " \
"green_reader_hostgroup INT NOT NULL CHECK (green_reader_hostgroup>=0) , " \
"active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1 , writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1)) NOT NULL DEFAULT 0 , " \
"domain_name VARCHAR NOT NULL CHECK (domain_name = '' OR SUBSTR(domain_name,1,1) = '.') , " \
"check_interval_ms INT NOT NULL CHECK (check_interval_ms >= 100 AND check_interval_ms <= 600000) DEFAULT 1000 , " \
"check_timeout_ms INT NOT NULL CHECK (check_timeout_ms >= 80 AND check_timeout_ms <= 3000) DEFAULT 800 , " \
"autopurge_missing_checks INT NOT NULL CHECK (autopurge_missing_checks >= 0 AND autopurge_missing_checks <= 100) DEFAULT 0 , " \
"comment VARCHAR NOT NULL DEFAULT '' , UNIQUE (reader_hostgroup))"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The mysql_aws_rds_bgd_hostgroups table in the Admin database defines green_writer_hostgroup and green_reader_hostgroup as NOT NULL. However, for auto-discovery deployments, these columns are expected to be NULL. If a user omits them in the configuration file, loading the configuration will fail with a SQLite constraint violation. They should be defined as DEFAULT NULL to match the runtime table definition.

Suggested change
#define ADMIN_SQLITE_TABLE_MYSQL_AWS_RDS_BGD_HOSTGROUPS "CREATE TABLE mysql_aws_rds_bgd_hostgroups (writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY , reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>0) , " \
"green_writer_hostgroup INT NOT NULL CHECK (green_writer_hostgroup>=0) , " \
"green_reader_hostgroup INT NOT NULL CHECK (green_reader_hostgroup>=0) , " \
"active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1 , writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1)) NOT NULL DEFAULT 0 , " \
"domain_name VARCHAR NOT NULL CHECK (domain_name = '' OR SUBSTR(domain_name,1,1) = '.') , " \
"check_interval_ms INT NOT NULL CHECK (check_interval_ms >= 100 AND check_interval_ms <= 600000) DEFAULT 1000 , " \
"check_timeout_ms INT NOT NULL CHECK (check_timeout_ms >= 80 AND check_timeout_ms <= 3000) DEFAULT 800 , " \
"autopurge_missing_checks INT NOT NULL CHECK (autopurge_missing_checks >= 0 AND autopurge_missing_checks <= 100) DEFAULT 0 , " \
"comment VARCHAR NOT NULL DEFAULT '' , UNIQUE (reader_hostgroup))"
#define ADMIN_SQLITE_TABLE_MYSQL_AWS_RDS_BGD_HOSTGROUPS "CREATE TABLE mysql_aws_rds_bgd_hostgroups (writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY , reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>0) , " \
"green_writer_hostgroup INT DEFAULT NULL CHECK (green_writer_hostgroup IS NULL OR green_writer_hostgroup>=0) , " \
"green_reader_hostgroup INT DEFAULT NULL CHECK (green_reader_hostgroup IS NULL OR green_reader_hostgroup>=0) , " \
"active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1 , writer_is_also_reader INT CHECK (writer_is_also_reader IN (0,1)) NOT NULL DEFAULT 0 , " \
"domain_name VARCHAR NOT NULL CHECK (domain_name = '' OR SUBSTR(domain_name,1,1) = '.') , " \
"check_interval_ms INT NOT NULL CHECK (check_interval_ms >= 100 AND check_interval_ms <= 600000) DEFAULT 1000 , " \
"check_timeout_ms INT NOT NULL CHECK (check_timeout_ms >= 80 AND check_timeout_ms <= 3000) DEFAULT 800 , " \
"autopurge_missing_checks INT NOT NULL CHECK (autopurge_missing_checks >= 0 AND autopurge_missing_checks <= 100) DEFAULT 0 , " \
"comment VARCHAR NOT NULL DEFAULT '' , UNIQUE (reader_hostgroup))"

Comment thread lib/DNS_Cache.cpp
Comment on lines +93 to +105
char ip_addr[INET6_ADDRSTRLEN];
for (auto p = res; p != NULL; p = p->ai_next) {
if (p->ai_family == AF_INET) {
struct sockaddr_in* ipv4 = (struct sockaddr_in*)p->ai_addr;
inet_ntop(p->ai_addr->sa_family, &ipv4->sin_addr, ip_addr, INET_ADDRSTRLEN);
ips.push_back(ip_addr);
}
else {
struct sockaddr_in6* ipv6 = (struct sockaddr_in6*)p->ai_addr;
inet_ntop(p->ai_addr->sa_family, &ipv6->sin6_addr, ip_addr, INET6_ADDRSTRLEN);
ips.push_back(ip_addr);
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If getaddrinfo returns an address family other than AF_INET or AF_INET6 (which is possible if ai_family is AF_UNSPEC), executing the else block will cast the address to sockaddr_in6 and call inet_ntop with AF_INET6, leading to undefined behavior or a crash. Additionally, the return value of inet_ntop should be checked to ensure we do not push uninitialized/garbage strings to ips on failure.

	char ip_addr[INET6_ADDRSTRLEN];
	for (auto p = res; p != NULL; p = p->ai_next) {
		if (p->ai_family == AF_INET) {
			struct sockaddr_in* ipv4 = (struct sockaddr_in*)p->ai_addr;
			if (inet_ntop(AF_INET, &ipv4->sin_addr, ip_addr, sizeof(ip_addr))) {
				ips.push_back(ip_addr);
			}
		}
		else if (p->ai_family == AF_INET6) {
			struct sockaddr_in6* ipv6 = (struct sockaddr_in6*)p->ai_addr;
			if (inet_ntop(AF_INET6, &ipv6->sin6_addr, ip_addr, sizeof(ip_addr))) {
				ips.push_back(ip_addr);
			}
		}
	}

Comment thread lib/MySQL_Monitor.cpp
}
}

host_def_t *hpa = (host_def_t *)malloc(sizeof(host_def_t)*(num_hosts ? num_hosts : 1));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using malloc allocates uninitialized memory. If cur_host_idx ends up being less than num_hosts (e.g., due to any mismatch or early exit), the remaining elements in hpa will contain garbage pointers. When the thread exits, calling free(hpa[i].host) on these uninitialized pointers will cause a crash. Using calloc ensures the memory is zero-initialized, making free(NULL) safe.

	host_def_t *hpa = (host_def_t *)calloc(num_hosts ? num_hosts : 1, sizeof(host_def_t));

@wazir-ahmed

Copy link
Copy Markdown
Member Author

@nielsvogell PTAL

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (2)
lib/DNS_Cache.cpp (1)

87-108: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Wrap the getaddrinfo() result in RAII.

ips.push_back() can throw after getaddrinfo() succeeds, bypassing freeaddrinfo(res). A unique_ptr with freeaddrinfo as deleter keeps the helper exception-safe.

♻️ Proposed refactor
+ `#include` <memory>
+
-	struct addrinfo hints, *res = NULL;
+	struct addrinfo hints;
+	struct addrinfo* res_raw = NULL;
 	memset(&hints, 0, sizeof(hints));
@@
-	int gai_rc = getaddrinfo(hostname.c_str(), NULL, &hints, &res);
+	int gai_rc = getaddrinfo(hostname.c_str(), NULL, &hints, &res_raw);
+	std::unique_ptr<addrinfo, decltype(&freeaddrinfo)> res(res_raw, freeaddrinfo);
 	if (gai_rc != 0 || !res) {
 		proxy_error("An error occurred while resolving hostname: %s [%d]\n", hostname.c_str(), gai_rc);
 		return ips;
 	}
@@
-	for (auto p = res; p != NULL; p = p->ai_next) {
+	for (auto p = res.get(); p != NULL; p = p->ai_next) {
@@
-	freeaddrinfo(res);
 	return ips;

As per coding guidelines, Use RAII for resource management; use jemalloc for memory allocation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/DNS_Cache.cpp` around lines 87 - 108, Wrap the addrinfo pointer result
from getaddrinfo() in a unique_ptr with freeaddrinfo as the deleter to ensure
exception safety. Currently, if ips.push_back() throws an exception during the
loop that processes the addrinfo results, the freeaddrinfo(res) call is never
executed, causing a resource leak. Replace the raw res pointer with a unique_ptr
that automatically frees the memory when going out of scope, even if an
exception is thrown during the for loop iteration or ips.push_back() operations.

Source: Coding guidelines

include/MySQL_Monitor.hpp (1)

436-436: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Clarify green_ip_ttl semantics.

The comment states 0 => DNS_Cache-sourced, but it's unclear what a non-zero value represents. Based on the field name, it likely represents an expiry timestamp (e.g., monotonic_time() + ttl_ms), but the comment should explicitly state this to avoid ambiguity.

📝 Suggested clarification
-		unsigned long long green_ip_ttl = 0;  ///< expiry of a green_ip if it is resolved by BGD thread; 0 => DNS_Cache-sourced
+		unsigned long long green_ip_ttl = 0;  ///< expiry timestamp (monotonic_time() + ttl_ms) when green_ip was resolved by BGD thread; 0 => DNS_Cache-sourced (no expiry)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@include/MySQL_Monitor.hpp` at line 436, The comment for the green_ip_ttl
member variable in MySQL_Monitor.hpp is incomplete and creates ambiguity about
what non-zero values represent. Clarify the comment for green_ip_ttl to
explicitly state that a value of 0 indicates the entry is sourced from
DNS_Cache, while a non-zero value represents an expiry timestamp (specify the
format, such as monotonic_time() + ttl_ms) for entries resolved by the BGD
thread. This will make the semantic meaning of both zero and non-zero values
clear to future maintainers.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@include/proxysql_structs.h`:
- Line 1391: The thread-local variable
`mysql_thread___aws_blue_green_deployment_auto_discovery` is declared with type
`int` but should be declared as `bool` to match its definition in MySQL_Thread.h
and its registration in the boolean variables map in MySQL_Thread.cpp where it's
handled with REFRESH_VARIABLE_BOOL semantics. Change the type declaration of
`mysql_thread___aws_blue_green_deployment_auto_discovery` from `int` to `bool`
in the proxysql_structs.h file, and apply the same fix to the other instance
mentioned at line 1733.

In `@lib/MySQL_HostGroups_Manager.cpp`:
- Around line 7094-7105: The code does not verify that the INSERT statement
actually succeeded before setting added to true. Capture the return value from
mydb->execute(ins.c_str()) and check the number of affected rows to confirm the
insert was successful. Only set added = true after verifying that at least one
row was actually inserted into the mysql_aws_rds_bgd_hostgroups table, so that
monitor refresh is only triggered when a new BGD entry is genuinely created and
not when the insert is ignored due to constraint violations.

In `@lib/MySQL_Monitor.cpp`:
- Around line 6784-6789: The read-only monitor can remain suspended indefinitely
when the topology table vanishes (error 1146), the hostgroup definition changes,
or the worker exits while bgd_in_progress_set is true. Extract the cleanup logic
from the COMPLETED path into a helper function, then invoke this cleanup
function in the 1146 error branch (around the topology state assignment to
TOPOLOGY_TABLE_CHECK), in the worker-exit path, and at the cleanup locations
referenced in lines 6843 and 6846-6854, ensuring the cleanup occurs before
freeing the state to properly clear any side effects from bgd_in_progress_set.
- Around line 7180-7188: The current implementation in the blue connection drain
block only drops free connections via ConnectionsFree->drop_all_connections(),
but leaves in-use active connections still attached to blue hosts during DNS
cutover. Extend the drain logic for the MySrvC server object s to also handle
active connections after dropping free connections. Implement a targeted active
connection drain or kill mechanism (without permanently shunning the server)
that clears both the free pool and any active connections currently bound to
that blue host and port before traffic shifts to green. This drain operation
should occur within the existing MyHGM write-lock section where the server s is
retrieved.
- Around line 8355-8358: The read_only_check_ERR counter is being incremented
before checking whether the error is the expected 1146 (ER_NO_SUCH_TABLE for
missing mysql.rds_topology). Reorder the logic so that the error code check (if
err == 1146) happens before the __sync_fetch_and_add call for
read_only_check_ERR, ensuring that when error 1146 is detected, the function
skips the counter increment and proceeds quietly without inflating the failure
metrics on normal RDS instances.
- Around line 3446-3451: The code does not validate the port value before adding
topology nodes to runtime servers. Since parse_aws_rds_topology() maps absent or
unparseable ports to 0, invalid ports (s.port <= 0 or out-of-range such as >
65535) should be skipped to avoid creating unusable runtime_mysql_servers
entries. Add a guard condition after the existing endpoint validation check in
the for loop iterating through discovered_servers to continue and skip the
current iteration if s.port is <= 0 or exceeds the valid port range (65535).
This validation needs to be applied in both loop locations where
discovered_servers is processed in handle_aws_rds_multi_az_cluster().
- Around line 1718-1719: The `MON_AWS_RDS_TOPOLOGY_DISCOVERY` task type is
currently falling through to the read-only log and action blocks when executed
through `monitor_read_only_thread`, which causes unnecessary read-only checks
and actions. Add a special condition check for `MON_AWS_RDS_TOPOLOGY_DISCOVERY`
task types that bypasses the read-only log/action flow entirely. Instead,
process and free the topology result (handling the multiple-column response
properly) and then return or destroy the connection without calling
`mysql_server_read_only_log` or `read_only_action_v2()`. Apply this same logic
to all locations where this task type is handled, including the areas around the
async path checks in the condition blocks.
- Around line 7318-7321: The code dereferences AWS_RDS_BGD_Hosts_resultset
without checking if it is NULL, which can cause a null pointer dereference since
the constructor initializes it to NULL. Add a NULL check for
AWS_RDS_BGD_Hosts_resultset after acquiring the aws_rds_bgd_mutex lock and
before calling raw_checksum() on it. If it is NULL, assign a default checksum
value (such as 0 or an empty configuration constant) to new_raw_checksum to
treat it as an empty configuration until the admin/runtime loader publishes a
valid resultset.
- Around line 7125-7128: The early return statement that skips execution when
strcasecmp shows the status is unchanged prevents essential per-state FSM
actions from running on subsequent polls. Remove this early return or
restructure it so that the per-state FSM actions continue to execute on every
poll iteration. Instead of returning early, use the unchanged-status check only
to suppress duplicate logging by wrapping just the log statements in a
condition, while allowing all the per-state actions (like INITIATED/IN_PROGRESS
polling and POST_PROCESSING DNS pinning retry) to run unconditionally on each
poll.

In `@lib/ProxySQL_Admin.cpp`:
- Around line 8119-8129: The incoming_servers_t struct is missing the
incoming_aws_rds_bgd_hostgroups field, which causes
load_mysql_servers_to_runtime() to always read from the local admindb table
instead of using cluster-provided resultsets. Add the
incoming_aws_rds_bgd_hostgroups field to the incoming_servers_t struct
definition in proxysql_admin.h, update the incoming_servers_t constructor in
ProxySQL_Admin.cpp to accept this new parameter, then in
load_mysql_servers_to_runtime() extract and assign the resultset using the same
conditional pattern that Aurora uses (checking if the resultset is available
before using it), replace the direct admindb->execute_statement() call at line
8120 with this conditional approach, and finally update
convert_mysql_servers_resultsets() in ProxySQL_Cluster.cpp to handle the
additional resultset pointer alongside the other incoming pointers.

---

Nitpick comments:
In `@include/MySQL_Monitor.hpp`:
- Line 436: The comment for the green_ip_ttl member variable in
MySQL_Monitor.hpp is incomplete and creates ambiguity about what non-zero values
represent. Clarify the comment for green_ip_ttl to explicitly state that a value
of 0 indicates the entry is sourced from DNS_Cache, while a non-zero value
represents an expiry timestamp (specify the format, such as monotonic_time() +
ttl_ms) for entries resolved by the BGD thread. This will make the semantic
meaning of both zero and non-zero values clear to future maintainers.

In `@lib/DNS_Cache.cpp`:
- Around line 87-108: Wrap the addrinfo pointer result from getaddrinfo() in a
unique_ptr with freeaddrinfo as the deleter to ensure exception safety.
Currently, if ips.push_back() throws an exception during the loop that processes
the addrinfo results, the freeaddrinfo(res) call is never executed, causing a
resource leak. Replace the raw res pointer with a unique_ptr that automatically
frees the memory when going out of scope, even if an exception is thrown during
the for loop iteration or ips.push_back() operations.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c6e43d8e-f745-4b01-acab-756969e5a12e

📥 Commits

Reviewing files that changed from the base of the PR and between 2deb0ac and be9187d.

📒 Files selected for processing (15)
  • include/DNS_Cache.hpp
  • include/MySQL_HostGroups_Manager.h
  • include/MySQL_Monitor.hpp
  • include/MySQL_Thread.h
  • include/ProxySQL_Admin_Tables_Definitions.h
  • include/proxysql_structs.h
  • lib/Admin_Bootstrap.cpp
  • lib/Admin_Handler.cpp
  • lib/DNS_Cache.cpp
  • lib/MySQL_HostGroups_Manager.cpp
  • lib/MySQL_Monitor.cpp
  • lib/MySQL_Thread.cpp
  • lib/ProxySQL_Admin.cpp
  • lib/ProxySQL_Config.cpp
  • lib/mysql_connection.cpp
📜 Review details
⏰ Context from checks skipped due to timeout. (5)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: CI-builds / builds (ubuntu24,-tap-genai-gcov)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap-mysqlx)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: run / trigger
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{cpp,h,hpp}: Feature tiers are controlled via flags: PROXYSQL31=1 for v3.1.x features (FFTO, TSDB), PROXYSQL40=1 for v4.0.x features (plugin loader). PROXYSQL40=1 implies both PROXYSQL31=1 and PROXYSQLFFTO=1 and PROXYSQLTSDB=1. Use conditional compilation with #ifdef PROXYSQL31, #ifdef PROXYSQL40, #ifdef PROXYSQLFFTO, #ifdef PROXYSQLTSDB, #ifdef PROXYSQLCLICKHOUSE.
Class names use PascalCase with protocol prefixes: MySQL_, PgSQL_, or ProxySQL_ (e.g., MySQL_Protocol, PgSQL_Session).
Member variables use snake_case.
Constants and macros use UPPER_SNAKE_CASE.
Use C++17; conditional compilation for feature tiers via #ifdef PROXYSQL31, #ifdef PROXYSQL40, #ifdef PROXYSQLFFTO, #ifdef PROXYSQLTSDB, #ifdef PROXYSQLCLICKHOUSE.
Use RAII for resource management; use jemalloc for memory allocation.
Use pthread mutexes for synchronization; use std::atomic<> for counters.

Files:

  • include/ProxySQL_Admin_Tables_Definitions.h
  • lib/Admin_Bootstrap.cpp
  • include/proxysql_structs.h
  • lib/mysql_connection.cpp
  • lib/Admin_Handler.cpp
  • include/MySQL_Thread.h
  • include/MySQL_Monitor.hpp
  • include/MySQL_HostGroups_Manager.h
  • lib/DNS_Cache.cpp
  • lib/ProxySQL_Admin.cpp
  • lib/ProxySQL_Config.cpp
  • include/DNS_Cache.hpp
  • lib/MySQL_Thread.cpp
  • lib/MySQL_HostGroups_Manager.cpp
  • lib/MySQL_Monitor.cpp
include/**/*.{h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

Include guards in headers use #ifndef __CLASS_*_H format (e.g., #ifndef __MYSQL_PROTOCOL_H).

Files:

  • include/ProxySQL_Admin_Tables_Definitions.h
  • include/proxysql_structs.h
  • include/MySQL_Thread.h
  • include/MySQL_Monitor.hpp
  • include/MySQL_HostGroups_Manager.h
  • include/DNS_Cache.hpp
{lib/ProxySQL_Admin.cpp,lib/Admin_Handler.cpp,include/ProxySQL_Admin_Tables_Definitions.h}

📄 CodeRabbit inference engine (CLAUDE.md)

ProxySQL_Admin.cpp and Admin_Handler.cpp implement the Admin Interface (SQL-based configuration via SQLite3). Admin schema versions are tracked in ProxySQL_Admin_Tables_Definitions.h. Support runtime config changes without restart.

Files:

  • include/ProxySQL_Admin_Tables_Definitions.h
  • lib/Admin_Handler.cpp
  • lib/ProxySQL_Admin.cpp
{lib,src}/**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

GenAI/MCP/RAG/LLM features live entirely in plugins/genai/ and load as a .so at runtime via dlopen. Do not guard with PROXYSQLGENAI in core code — that flag no longer guards any core code as of Step 7 of the GenAI plugin carve-out.

Files:

  • lib/Admin_Bootstrap.cpp
  • lib/mysql_connection.cpp
  • lib/Admin_Handler.cpp
  • lib/DNS_Cache.cpp
  • lib/ProxySQL_Admin.cpp
  • lib/ProxySQL_Config.cpp
  • lib/MySQL_Thread.cpp
  • lib/MySQL_HostGroups_Manager.cpp
  • lib/MySQL_Monitor.cpp
🪛 ast-grep (0.44.0)
lib/ProxySQL_Config.cpp

[error] 1519-1519: Use of an unbounded buffer function that can overflow the destination; use a size-bounded equivalent (fgets, strncpy/strlcpy, strncat/strlcat, snprintf).
Context: strcpy(green_writer_str, "NULL")
Note: [CWE-120] Buffer Copy without Checking Size of Input ('Classic Buffer Overflow').

(dangerous-buffer-functions-cpp)


[error] 1524-1524: Use of an unbounded buffer function that can overflow the destination; use a size-bounded equivalent (fgets, strncpy/strlcpy, strncat/strlcat, snprintf).
Context: strcpy(green_reader_str, "NULL")
Note: [CWE-120] Buffer Copy without Checking Size of Input ('Classic Buffer Overflow').

(dangerous-buffer-functions-cpp)


[error] 1540-1540: Use of an unbounded buffer function that can overflow the destination; use a size-bounded equivalent (fgets, strncpy/strlcpy, strncat/strlcat, snprintf).
Context: sprintf(query,q, writer_hostgroup, reader_hostgroup, green_writer_str, green_reader_str, active, writer_is_also_reader, p, check_interval_ms, check_timeout_ms, autopurge_missing_checks, o)
Note: [CWE-120] Buffer Copy without Checking Size of Input ('Classic Buffer Overflow').

(dangerous-buffer-functions-cpp)

🪛 Cppcheck (2.21.0)
lib/DNS_Cache.cpp

[warning] 86-86: If memory allocation fails, then there is a possible null pointer dereference

(nullPointerOutOfMemory)


[warning] 138-138: If memory allocation fails, then there is a possible null pointer dereference

(nullPointerOutOfMemory)

🔇 Additional comments (18)
include/DNS_Cache.hpp (1)

24-24: LGTM!

Also applies to: 88-125, 185-194

lib/DNS_Cache.cpp (1)

119-172: LGTM!

Also applies to: 254-274, 294-297, 316-373

lib/mysql_connection.cpp (1)

2170-2170: LGTM!

Also applies to: 3037-3037

lib/MySQL_Monitor.cpp (1)

8503-8503: 🩺 Stability & Availability

No action needed — Monitor_Poll dynamically grows via expand().

The Monitor_Poll::add() method already handles capacity expansion. When len_ == capacity_, it calls expand(1), which reallocates both fds_ and mmsds_ arrays using realloc() to a size determined by near_pow_2(). This ensures additional tasks from discovery checks (read-only tasks plus AWS topology tasks) are safely accommodated without dropping tasks or writing past allocated storage.

include/MySQL_Monitor.hpp (5)

63-63: LGTM!


207-208: LGTM!


477-503: LGTM!


517-517: LGTM!


528-529: LGTM!

Also applies to: 577-599

include/MySQL_Thread.h (1)

456-457: LGTM!

lib/MySQL_Thread.cpp (4)

403-403: LGTM!


1278-1278: LGTM!


2535-2535: LGTM!


4660-4660: LGTM!

include/ProxySQL_Admin_Tables_Definitions.h (1)

240-243: 🗄️ Data Integrity & Integration

The NOT NULL constraints on green hostgroups in the admin table are correct by design.

Auto-discovered entries with auto_generated=1 are explicitly filtered out before inserting into the admin table (see line 7595: if (!_runtime && r->fields[11] && atoi(r->fields[11]) != 0) { continue; }). Only manually configured rows (auto_generated=0) are persisted to the config database, and these should always have green hostgroups defined. The separation between the admin table (manual config only, NOT NULL) and runtime table (includes auto-discovered entries, DEFAULT NULL) is intentional and working as documented in the code comments at lines 7556-7558.

lib/ProxySQL_Admin.cpp (1)

150-150: LGTM!

Also applies to: 1513-1514, 7553-7623, 7955-7955, 8189-8192

lib/ProxySQL_Config.cpp (2)

1531-1533: 🗄️ Data Integrity & Integration

Config-read defaults are correctly aligned with schema DEFAULTs.

The hardcoded values (check_interval_ms=1000, check_timeout_ms=800, autopurge_missing_checks=0) match the DEFAULT clauses in both mysql_aws_rds_bgd_hostgroups and mysql_aws_aurora_hostgroups schema definitions. No changes needed.


1113-1124: 🗄️ Data Integrity & Integration

No action required. The addField function correctly skips NULL and empty values.

The implementation at line 54-64 of lib/ProxySQL_Config.cpp confirms the inline comment's assertion:

void ProxySQL_Config::addField(std::string& data, const char* name, const char* value, const char* dq) {
    std::stringstream ss;
    if (!value || !strlen(value)) return;  // Early return for NULL or empty
    // ...
}

The condition if (!value || !strlen(value)) return; handles NULL pointers explicitly, returning before any field is written. Unit tests in test/tap/tests/unit/config_validation_unit-t.cpp confirm this behavior: test_addField_null_value() asserts data.empty() after passing nullptr. For green_writer_hostgroup and green_reader_hostgroup, when SQLite fields are NULL, addField will produce no output, allowing the config to be regenerated without those fields intact. The code is correct.

__thread int mysql_thread___monitor_ping_max_failures;
__thread int mysql_thread___monitor_ping_timeout;
__thread int mysql_thread___monitor_aws_rds_topology_discovery_interval;
__thread int mysql_thread___aws_blue_green_deployment_auto_discovery;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

Fix type mismatch: int vs bool for aws_blue_green_deployment_auto_discovery.

The thread-local variable is declared as int here, but it's defined as bool in MySQL_Thread.h (line 457), registered in the boolean variables map in MySQL_Thread.cpp (line 2535), and refreshed via REFRESH_VARIABLE_BOOL (line 4660). This type inconsistency will cause undefined behavior when the variable is accessed via different translation units or when boolean semantics are expected.

🔧 Proposed fix
-__thread int mysql_thread___aws_blue_green_deployment_auto_discovery;
+__thread bool mysql_thread___aws_blue_green_deployment_auto_discovery;
-extern __thread int mysql_thread___aws_blue_green_deployment_auto_discovery;
+extern __thread bool mysql_thread___aws_blue_green_deployment_auto_discovery;

Also applies to: 1733-1733

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@include/proxysql_structs.h` at line 1391, The thread-local variable
`mysql_thread___aws_blue_green_deployment_auto_discovery` is declared with type
`int` but should be declared as `bool` to match its definition in MySQL_Thread.h
and its registration in the boolean variables map in MySQL_Thread.cpp where it's
handled with REFRESH_VARIABLE_BOOL semantics. Change the type declaration of
`mysql_thread___aws_blue_green_deployment_auto_discovery` from `int` to `bool`
in the proxysql_structs.h file, and apply the same fix to the other instance
mentioned at line 1733.

Comment on lines +7094 to +7105
if (!exists) {
std::string ins =
"INSERT INTO mysql_aws_rds_bgd_hostgroups ("
"writer_hostgroup, reader_hostgroup, green_writer_hostgroup, green_reader_hostgroup, "
"active, writer_is_also_reader, domain_name, check_interval_ms, check_timeout_ms, "
"autopurge_missing_checks, comment, auto_generated"
") VALUES ("
+ std::to_string(writer_hg) + ", " + std::to_string(reader_hg)
+ ", NULL, NULL, 1, 0, '', 1000, 800, 0, '', 1)";
mydb->execute(ins.c_str());
added = true;
proxy_info(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Only mark BGD auto-discovery as added after a successful INSERT.

At Line 7103, mydb->execute(ins.c_str()) is unchecked and added is set to true immediately. If the insert is ignored/rejected (for example on UNIQUE(reader_hostgroup)), this returns a false success and triggers monitor refresh as if a row was created.

Proposed fix
-				mydb->execute(ins.c_str());
-				added = true;
-				proxy_info(
-					"AWS RDS: auto-generated blue/green hostgroup entry (writer HG %d, reader HG %d) from server %s:%d\n",
-					writer_hg, reader_hg, hostname.c_str(), port
-				);
+				std::string ins =
+					"INSERT OR IGNORE INTO mysql_aws_rds_bgd_hostgroups ("
+						"writer_hostgroup, reader_hostgroup, green_writer_hostgroup, green_reader_hostgroup, "
+						"active, writer_is_also_reader, domain_name, check_interval_ms, check_timeout_ms, "
+						"autopurge_missing_checks, comment, auto_generated"
+					") VALUES ("
+					+ std::to_string(writer_hg) + ", " + std::to_string(reader_hg)
+					+ ", NULL, NULL, 1, 0, '', 1000, 800, 0, '', 1)";
+				mydb->execute(ins.c_str());
+
+				char* chg_error = nullptr;
+				int chg_cols = 0;
+				int chg_affected_rows = 0;
+				SQLite3_result* chg_res = nullptr;
+				mydb->execute_statement("SELECT changes()", &chg_error, &chg_cols, &chg_affected_rows, &chg_res);
+				const bool inserted = (chg_res && chg_res->rows_count > 0 && atoi(chg_res->rows[0]->fields[0]) > 0);
+				if (chg_res) delete chg_res;
+
+				if (inserted) {
+					added = true;
+					proxy_info(
+						"AWS RDS: auto-generated blue/green hostgroup entry (writer HG %d, reader HG %d) from server %s:%d\n",
+						writer_hg, reader_hg, hostname.c_str(), port
+					);
+				}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/MySQL_HostGroups_Manager.cpp` around lines 7094 - 7105, The code does not
verify that the INSERT statement actually succeeded before setting added to
true. Capture the return value from mydb->execute(ins.c_str()) and check the
number of affected rows to confirm the insert was successful. Only set added =
true after verifying that at least one row was actually inserted into the
mysql_aws_rds_bgd_hostgroups table, so that monitor refresh is only triggered
when a new BGD entry is genuinely created and not when the insert is ignored due
to constraint violations.

Comment thread lib/MySQL_Monitor.cpp Outdated
Comment thread lib/MySQL_Monitor.cpp
Comment on lines +3446 to +3451
for (const AWS_RDS_Topology_Node& s : discovered_servers) {
if (s.endpoint.empty()) {
continue;
}
const string& current_discovered_hostname = s.endpoint;
int current_discovered_port_int = s.port;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟡 Minor | ⚡ Quick win

Skip topology rows with invalid ports before adding runtime servers.

parse_aws_rds_topology() maps absent/unparseable ports to 0, and handle_aws_rds_multi_az_cluster() then adds that as a runtime server port. Skip s.port <= 0 or out-of-range ports to avoid creating unusable runtime_mysql_servers entries.

🛡️ Proposed guard
 	for (const AWS_RDS_Topology_Node& s : discovered_servers) {
-		if (s.endpoint.empty()) {
+		if (s.endpoint.empty() || s.port <= 0 || s.port > 65535) {
 			continue;
 		}

Also applies to: 3511-3516

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/MySQL_Monitor.cpp` around lines 3446 - 3451, The code does not validate
the port value before adding topology nodes to runtime servers. Since
parse_aws_rds_topology() maps absent or unparseable ports to 0, invalid ports
(s.port <= 0 or out-of-range such as > 65535) should be skipped to avoid
creating unusable runtime_mysql_servers entries. Add a guard condition after the
existing endpoint validation check in the for loop iterating through
discovered_servers to continue and skip the current iteration if s.port is <= 0
or exceeds the valid port range (65535). This validation needs to be applied in
both loop locations where discovered_servers is processed in
handle_aws_rds_multi_az_cluster().

Comment thread lib/MySQL_Monitor.cpp
Comment thread lib/MySQL_Monitor.cpp
Comment thread lib/MySQL_Monitor.cpp Outdated
Comment thread lib/MySQL_Monitor.cpp
Comment on lines +7318 to +7321
// respawn the per-writer-HG workers when the host list/definition changes
pthread_mutex_lock(&aws_rds_bgd_mutex);
uint64_t new_raw_checksum = AWS_RDS_BGD_Hosts_resultset->raw_checksum();
pthread_mutex_unlock(&aws_rds_bgd_mutex);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Guard AWS_RDS_BGD_Hosts_resultset before computing its checksum.

The constructor initializes AWS_RDS_BGD_Hosts_resultset to NULL, but the BGD dispatcher starts unconditionally and dereferences it here. Treat a null resultset as an empty configuration until the admin/runtime loader publishes one.

🛡️ Proposed guard
 		// respawn the per-writer-HG workers when the host list/definition changes
 		pthread_mutex_lock(&aws_rds_bgd_mutex);
-		uint64_t new_raw_checksum = AWS_RDS_BGD_Hosts_resultset->raw_checksum();
+		uint64_t new_raw_checksum =
+			AWS_RDS_BGD_Hosts_resultset ? AWS_RDS_BGD_Hosts_resultset->raw_checksum() : 0;
 		pthread_mutex_unlock(&aws_rds_bgd_mutex);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/MySQL_Monitor.cpp` around lines 7318 - 7321, The code dereferences
AWS_RDS_BGD_Hosts_resultset without checking if it is NULL, which can cause a
null pointer dereference since the constructor initializes it to NULL. Add a
NULL check for AWS_RDS_BGD_Hosts_resultset after acquiring the aws_rds_bgd_mutex
lock and before calling raw_checksum() on it. If it is NULL, assign a default
checksum value (such as 0 or an empty configuration constant) to
new_raw_checksum to treat it as an empty configuration until the admin/runtime
loader publishes a valid resultset.

Comment thread lib/MySQL_Monitor.cpp
Comment on lines +8355 to +8358
__sync_fetch_and_add(&read_only_check_ERR, 1);
unsigned int err = mmsd->mysql ? mysql_errno(mmsd->mysql) : 0;
if (err == 1146) {
// mysql.rds_topology absent (no active blue/green deployment); expected, skip quietly

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Do not count missing mysql.rds_topology as a monitor error.

The branch treats ER_NO_SUCH_TABLE as expected, but read_only_check_ERR is already incremented. Move the 1146 check before the error counter so discovery cycles on normal RDS instances do not inflate failure metrics.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/MySQL_Monitor.cpp` around lines 8355 - 8358, The read_only_check_ERR
counter is being incremented before checking whether the error is the expected
1146 (ER_NO_SUCH_TABLE for missing mysql.rds_topology). Reorder the logic so
that the error code check (if err == 1146) happens before the
__sync_fetch_and_add call for read_only_check_ERR, ensuring that when error 1146
is detected, the function skips the counter increment and proceeds quietly
without inflating the failure metrics on normal RDS instances.

Comment thread lib/ProxySQL_Admin.cpp
Comment on lines +8119 to +8129
// support for AWS RDS, table mysql_aws_rds_bgd_hostgroups
query=(char *)"SELECT a.* FROM mysql_aws_rds_bgd_hostgroups a LEFT JOIN mysql_aws_rds_bgd_hostgroups b ON (a.writer_hostgroup=b.reader_hostgroup) WHERE b.reader_hostgroup IS NULL ORDER BY writer_hostgroup";
proxy_debug(PROXY_DEBUG_ADMIN, 4, "%s\n", query);
admindb->execute_statement(query, &error , &cols , &affected_rows , &resultset_aws_rds_bgd);
if (error) {
proxy_error("Error on %s : %s\n", query, error);
} else {
// Pass the resultset to MyHGM
MyHGM->save_incoming_mysql_table(resultset_aws_rds_bgd,"mysql_aws_rds_bgd_hostgroups");
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check whether mysql_aws_rds_bgd_hostgroups is carried through incoming_servers_t and peer/incoming loaders.
# Expectation: incoming_servers_t has a BGD resultset field, constructor parameter, call-site wiring, and the load block uses it before falling back to admindb.

rg -n -C3 'incoming_servers_t|incoming_aurora_hostgroups|incoming_aws_rds_bgd|mysql_aws_rds_bgd_hostgroups' lib include

Repository: sysown/proxysql

Length of output: 33500


Add incoming_aws_rds_bgd_hostgroups field to incoming_servers_t and plumb it through the loader.

The incoming_servers_t struct lacks the incoming_aws_rds_bgd_hostgroups field that Aurora already has. This causes load_mysql_servers_to_runtime() to always read the local main.mysql_aws_rds_bgd_hostgroups table instead of using peer/cluster-provided resultsets, leaving cluster/runtime config out of sync when incoming BGD hostgroups are available.

Required changes:

  • Add incoming_aws_rds_bgd_hostgroups field to incoming_servers_t struct definition (include/proxysql_admin.h:155)
  • Update incoming_servers_t constructor to accept the new parameter (lib/ProxySQL_Admin.cpp:874)
  • Extract and assign it in load_mysql_servers_to_runtime() with the other incoming pointers (lib/ProxySQL_Admin.cpp:7960)
  • Replace the direct admindb->execute_statement() call at line 8120 with the conditional check used for Aurora (lines 8107–8111)
  • Update convert_mysql_servers_resultsets() (lib/ProxySQL_Cluster.cpp:1862) to account for the additional resultset pointer
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/ProxySQL_Admin.cpp` around lines 8119 - 8129, The incoming_servers_t
struct is missing the incoming_aws_rds_bgd_hostgroups field, which causes
load_mysql_servers_to_runtime() to always read from the local admindb table
instead of using cluster-provided resultsets. Add the
incoming_aws_rds_bgd_hostgroups field to the incoming_servers_t struct
definition in proxysql_admin.h, update the incoming_servers_t constructor in
ProxySQL_Admin.cpp to accept this new parameter, then in
load_mysql_servers_to_runtime() extract and assign the resultset using the same
conditional pattern that Aurora uses (checking if the resultset is available
before using it), replace the direct admindb->execute_statement() call at line
8120 with this conditional approach, and finally update
convert_mysql_servers_resultsets() in ProxySQL_Cluster.cpp to handle the
additional resultset pointer alongside the other incoming pointers.

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
- Log and dump malformed `read_only` check resultsets for diagnostics
  and avoid falling back to the `read_only=1` default path.
- Keep AWS RDS topology discovery out of `read_only` result processing.

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
- Classify monitor query timeouts separately when the connection target no longer
  matches the current DNS cache entry.
- Keep the timeout visible in monitor logs, but avoid applying backend state changes
  based on the stale connection.

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
- During switchover, drop free backend connections for affected servers and mark
  used connections with `MySQL_Connection::healthy=false` and `reusable=false`.
- Treat unhealthy backend connections as offline in active backend paths, and
  prevent reset-algorithm-2 from resetting drained connections.
- Purge idle monitor pool entries when BGD repoints or unpins blue hosts,
  preventing monitor checks from reusing stale connections after DNS/cache changes.

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
lib/MySQL_Monitor.cpp (2)

8133-8149: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Return the actual stale-IP timeout result.

mark_task_as_timeout() can set TASK_RESULT_TIMEOUT_STALE_IP, but this branch always returns TASK_RESULT_TIMEOUT, so async processors miss the stale-DNS classification on the timeout call.

Proposed fix
 		if (task_expiry_time_ > 0 && now > task_expiry_time_) {
 `#ifdef` DEBUG
 			mark_task_as_timeout((GloMyMon->proxytest_forced_timeout == false) ? now : monotonic_time());
 `#else`
 			mark_task_as_timeout(now);
 `#endif`
-			return MySQL_Monitor_State_Data_Task_Result::TASK_RESULT_TIMEOUT;
+			return task_result_;
 		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/MySQL_Monitor.cpp` around lines 8133 - 8149, The timeout path in the
event handling branch always returns a generic timeout even when
mark_task_as_timeout() sets TASK_RESULT_TIMEOUT_STALE_IP, so the stale-IP
classification is lost. Update the logic around the event_ check and
task_expiry_time_ handling to return the actual task_result_ after marking the
task as timed out, preserving TASK_RESULT_TIMEOUT_STALE_IP when that is what
mark_task_as_timeout() assigns. Use the existing symbols mark_task_as_timeout(),
task_result_, and TASK_RESULT_TIMEOUT_STALE_IP to keep the fix localized.

7283-7301: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Do not leave mapped blue hosts online when their green IP is unavailable.

When p.green_ip is empty, the code logs and continues without pinning, draining, or shunning that mapped host. During POST_PROCESSING this can keep traffic on blue instead of failing over or funneling to a safe target.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/MySQL_Monitor.cpp` around lines 7283 - 7301, The mapped blue-host
handling in the AWS_RDS_BGD_State::BlueGreenPair loop currently skips all action
when p.green_ip is empty, leaving that host online. Update the branch in the
bg_map iteration so an empty green_ip still triggers a safe failover path
instead of just continue: use the existing proxy_warning context, then
pin/drain/shun or otherwise remove the blue host from service via MyHGM and
My_Conn_Pool so traffic cannot remain on blue during POST_PROCESSING.
lib/MySQL_HostGroups_Manager.cpp (1)

2402-2422: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Avoid reading ca[0] before validating cnt.

c = ca[i] runs even when cnt == 0, and the new while (i < cnt) loop dereferences null entries. Iterate with a local per element after the bounds check.

Proposed fix
-	unsigned int i=0; // Index variable for iterating through the array
-	MySQL_Connection *c = nullptr; // Pointer to hold the current connection from the array
-	c=ca[i];
-
 	// Acquire a write lock to perform the operations atomically
 	wrlock();
 
 	// Iterate through the array of connections
-	while (i < cnt) {
+	for (unsigned int i = 0; i < cnt; i++) {
+		MySQL_Connection *c = ca[i];
+		if (!c) {
+			continue;
+		}
 		if (!c->reusable) {
 			c->send_quit = false;
 			destroy_MyConn_from_pool(c, false);
 		} else {
 			// Push the current connection back to the pool without acquiring a lock for each individual push
 			push_MyConn_to_pool(c, false);
 		}
-		i++;
-		if (i<cnt)
-			c=ca[i];
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/MySQL_HostGroups_Manager.cpp` around lines 2402 - 2422, In
MySQL_HostGroups_Manager::push_MyConn_to_pool_array, avoid dereferencing ca[0]
before confirming cnt is nonzero and ensure each element is read only after the
bounds check. Move the per-iteration connection fetch inside the loop after
validating i < cnt, and keep the null/reusable handling within that guarded path
so the function never touches ca[i] when cnt is 0 or an entry is null. Use the
existing push_MyConn_to_pool_array, push_MyConn_to_pool, and destroy path logic,
but restructure the iteration to validate before access.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/DNS_Cache.cpp`:
- Around line 219-224: The DNS cache pin handling currently treats pinned_until
== 0 as “inactive,” so pins created by pin(host, ip) never stay authoritative.
Update the lookup/validation logic in DNS_Cache to consider ttl_ms == 0 pins as
permanently active in both affected paths, including the checks around
pinned_until, pinned_ip, and the other lookup blocks referenced by the review.
Keep the existing expiry behavior for nonzero TTLs, but make zero-TTL pins
always match pinned_ip until explicitly unpinned or replaced.

In `@lib/mysql_data_stream.cpp`:
- Around line 1767-1770: Clear the session’s last_HG_affected_rows before the
early return in the non-reusable backend path so the stale hostgroup marker is
not left behind after destroy_MySQL_Connection_From_Pool(true) detaches the
connection. Update the cleanup logic around the mc->reusable check in
MySQL_Data_Stream so that sess->last_HG_affected_rows is reset whenever the
backend being destroyed matches the tracked affected-rows hostgroup, even when
the function returns early.

In `@lib/MySQL_Monitor.cpp`:
- Around line 1791-1800: Short-circuit stale-IP timeout cases before the generic
timeout/error accounting paths. In the timeout handling around stale IP
detection in MySQL_Monitor.cpp, make sure stale_ip_timeout and
TASK_RESULT_TIMEOUT_STALE_IP are checked and returned/handled before any
proxy_error logging or p_update_mysql_error_counter-style increments. Update the
related timeout branches (including the other listed timeout handlers) so DNS/IP
churn only records the specific stale-IP message and does not fall through to
backend-failure accounting.

In `@lib/MySrvConnList.cpp`:
- Around line 51-56: MySrvConnList::mark_connections_unhealthy only clears
flags, but get_random_MyConn still has a direct remove_index_fast return path
that can hand out free-list entries even after they were marked unhealthy.
Update get_random_MyConn to skip or drain any MySQL_Connection whose
healthy/reusable flags were cleared, especially on the no-userinfo path, so
poisoned connections are never returned. Use the existing MySrvConnList,
get_random_MyConn, and mark_connections_unhealthy symbols to keep the unhealthy
state consistent across all selection paths.

---

Outside diff comments:
In `@lib/MySQL_HostGroups_Manager.cpp`:
- Around line 2402-2422: In MySQL_HostGroups_Manager::push_MyConn_to_pool_array,
avoid dereferencing ca[0] before confirming cnt is nonzero and ensure each
element is read only after the bounds check. Move the per-iteration connection
fetch inside the loop after validating i < cnt, and keep the null/reusable
handling within that guarded path so the function never touches ca[i] when cnt
is 0 or an entry is null. Use the existing push_MyConn_to_pool_array,
push_MyConn_to_pool, and destroy path logic, but restructure the iteration to
validate before access.

In `@lib/MySQL_Monitor.cpp`:
- Around line 8133-8149: The timeout path in the event handling branch always
returns a generic timeout even when mark_task_as_timeout() sets
TASK_RESULT_TIMEOUT_STALE_IP, so the stale-IP classification is lost. Update the
logic around the event_ check and task_expiry_time_ handling to return the
actual task_result_ after marking the task as timed out, preserving
TASK_RESULT_TIMEOUT_STALE_IP when that is what mark_task_as_timeout() assigns.
Use the existing symbols mark_task_as_timeout(), task_result_, and
TASK_RESULT_TIMEOUT_STALE_IP to keep the fix localized.
- Around line 7283-7301: The mapped blue-host handling in the
AWS_RDS_BGD_State::BlueGreenPair loop currently skips all action when p.green_ip
is empty, leaving that host online. Update the branch in the bg_map iteration so
an empty green_ip still triggers a safe failover path instead of just continue:
use the existing proxy_warning context, then pin/drain/shun or otherwise remove
the blue host from service via MyHGM and My_Conn_Pool so traffic cannot remain
on blue during POST_PROCESSING.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e8823f6e-b70a-4b0c-801f-dc1c218d209d

📥 Commits

Reviewing files that changed from the base of the PR and between be9187d and 63dca4b.

📒 Files selected for processing (20)
  • include/Base_HostGroups_Manager.h
  • include/DNS_Cache.hpp
  • include/MySQL_HostGroups_Manager.h
  • include/MySQL_Monitor.hpp
  • include/ProxySQL_Admin_Tables_Definitions.h
  • include/ServerSelection.h
  • include/mysql_connection.h
  • include/proxysql_structs.h
  • include/proxysql_utils.h
  • lib/Base_HostGroups_Manager.cpp
  • lib/Base_Session.cpp
  • lib/DNS_Cache.cpp
  • lib/MySQL_HostGroups_Manager.cpp
  • lib/MySQL_Monitor.cpp
  • lib/MySQL_Session.cpp
  • lib/MySQL_Thread.cpp
  • lib/MySrvConnList.cpp
  • lib/mysql_connection.cpp
  • lib/mysql_data_stream.cpp
  • lib/proxysql_utils.cpp
📜 Review details
⏰ Context from checks skipped due to timeout. (5)
  • GitHub Check: CI-builds / builds (ubuntu24,-tap-genai-gcov)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap-mysqlx)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: run / trigger
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{cpp,h,hpp}: Feature tiers are controlled via flags: PROXYSQL31=1 for v3.1.x features (FFTO, TSDB), PROXYSQL40=1 for v4.0.x features (plugin loader). PROXYSQL40=1 implies both PROXYSQL31=1 and PROXYSQLFFTO=1 and PROXYSQLTSDB=1. Use conditional compilation with #ifdef PROXYSQL31, #ifdef PROXYSQL40, #ifdef PROXYSQLFFTO, #ifdef PROXYSQLTSDB, #ifdef PROXYSQLCLICKHOUSE.
Class names use PascalCase with protocol prefixes: MySQL_, PgSQL_, or ProxySQL_ (e.g., MySQL_Protocol, PgSQL_Session).
Member variables use snake_case.
Constants and macros use UPPER_SNAKE_CASE.
Use C++17; conditional compilation for feature tiers via #ifdef PROXYSQL31, #ifdef PROXYSQL40, #ifdef PROXYSQLFFTO, #ifdef PROXYSQLTSDB, #ifdef PROXYSQLCLICKHOUSE.
Use RAII for resource management; use jemalloc for memory allocation.
Use pthread mutexes for synchronization; use std::atomic<> for counters.

Files:

  • include/mysql_connection.h
  • include/proxysql_utils.h
  • lib/proxysql_utils.cpp
  • include/Base_HostGroups_Manager.h
  • include/ProxySQL_Admin_Tables_Definitions.h
  • include/ServerSelection.h
  • lib/Base_Session.cpp
  • lib/MySrvConnList.cpp
  • include/DNS_Cache.hpp
  • lib/Base_HostGroups_Manager.cpp
  • lib/MySQL_Thread.cpp
  • include/proxysql_structs.h
  • lib/MySQL_Session.cpp
  • lib/mysql_data_stream.cpp
  • include/MySQL_Monitor.hpp
  • lib/mysql_connection.cpp
  • lib/DNS_Cache.cpp
  • include/MySQL_HostGroups_Manager.h
  • lib/MySQL_HostGroups_Manager.cpp
  • lib/MySQL_Monitor.cpp
include/**/*.{h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

Include guards in headers use #ifndef __CLASS_*_H format (e.g., #ifndef __MYSQL_PROTOCOL_H).

Files:

  • include/mysql_connection.h
  • include/proxysql_utils.h
  • include/Base_HostGroups_Manager.h
  • include/ProxySQL_Admin_Tables_Definitions.h
  • include/ServerSelection.h
  • include/DNS_Cache.hpp
  • include/proxysql_structs.h
  • include/MySQL_Monitor.hpp
  • include/MySQL_HostGroups_Manager.h
{lib,src}/**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

GenAI/MCP/RAG/LLM features live entirely in plugins/genai/ and load as a .so at runtime via dlopen. Do not guard with PROXYSQLGENAI in core code — that flag no longer guards any core code as of Step 7 of the GenAI plugin carve-out.

Files:

  • lib/proxysql_utils.cpp
  • lib/Base_Session.cpp
  • lib/MySrvConnList.cpp
  • lib/Base_HostGroups_Manager.cpp
  • lib/MySQL_Thread.cpp
  • lib/MySQL_Session.cpp
  • lib/mysql_data_stream.cpp
  • lib/mysql_connection.cpp
  • lib/DNS_Cache.cpp
  • lib/MySQL_HostGroups_Manager.cpp
  • lib/MySQL_Monitor.cpp
{lib/ProxySQL_Admin.cpp,lib/Admin_Handler.cpp,include/ProxySQL_Admin_Tables_Definitions.h}

📄 CodeRabbit inference engine (CLAUDE.md)

ProxySQL_Admin.cpp and Admin_Handler.cpp implement the Admin Interface (SQL-based configuration via SQLite3). Admin schema versions are tracked in ProxySQL_Admin_Tables_Definitions.h. Support runtime config changes without restart.

Files:

  • include/ProxySQL_Admin_Tables_Definitions.h
🔇 Additional comments (18)
include/mysql_connection.h (1)

169-169: 🩺 Stability & Availability

healthy is already initialized in the constructor and reset path.

			> Likely an incorrect or invalid review comment.
include/MySQL_Monitor.hpp (1)

211-214: 🎯 Functional Correctness

No extra consumer handling is needed for TASK_RESULT_TIMEOUT_STALE_IP. The state-changing monitor paths already branch on the stale-IP timeout and skip backend state updates; the remaining handlers only do generic error cleanup/logging.

			> Likely an incorrect or invalid review comment.
include/proxysql_structs.h (1)

22-23: 🎯 Functional Correctness

No action needed for MYSQL_SERVER_STATUS_SHUNNED_AWS_BGD.

lib/MySQL_Monitor.cpp (3)

7244-7247: Do not skip BGD FSM work when the status string is unchanged.

This still prevents idempotent per-poll actions like green IP refresh and POST_PROCESSING pin/drain retries.


8488-8495: Still counts absent mysql.rds_topology as a read-only error.

The 1146 branch skips the detailed error counter, but read_only_check_ERR is already incremented before that check.


247-256: LGTM!

Also applies to: 350-381, 838-859, 1763-1780, 1832-1841, 1859-1938, 2037-2243, 2395-2695, 7294-7379, 7767-7782, 8541-8598, 8826-8828, 9388-9390

lib/proxysql_utils.cpp (1)

849-923: LGTM!

Also applies to: 934-941

include/Base_HostGroups_Manager.h (1)

89-89: LGTM!

Also applies to: 98-98, 120-120

lib/mysql_connection.cpp (1)

431-431: LGTM!

Also applies to: 2164-2185, 3042-3048, 3101-3101

lib/DNS_Cache.cpp (1)

234-257: LGTM!

Also applies to: 268-295, 313-325, 376-399, 452-477

include/MySQL_HostGroups_Manager.h (1)

79-79: LGTM!

Also applies to: 88-88, 110-110, 189-189, 606-629, 733-744, 1070-1102, 1188-1209

lib/MySQL_HostGroups_Manager.cpp (1)

663-663: LGTM!

Also applies to: 1887-1889, 2552-2552, 3513-3515, 3848-3939

include/ProxySQL_Admin_Tables_Definitions.h (1)

150-150: LGTM!

Also applies to: 240-259

include/ServerSelection.h (1)

23-29: LGTM!

lib/Base_Session.cpp (1)

512-518: LGTM!

lib/Base_HostGroups_Manager.cpp (1)

1766-1783: LGTM!

Also applies to: 3130-3148

lib/MySQL_Thread.cpp (1)

403-403: LGTM!

Also applies to: 1278-1278, 2535-2535, 4660-4660

include/DNS_Cache.hpp (1)

120-128: 🩺 Stability & Availability

Ensure IP_ADDR::counter is only incremented under exclusive locking. If get_next_ip() can run while rwlock_ is held for reads, concurrent lookups race on this mutable counter.

Comment thread lib/DNS_Cache.cpp
Comment thread lib/mysql_data_stream.cpp Outdated
Comment thread lib/MySQL_Monitor.cpp
Comment on lines +1791 to +1800
stale_ip_timeout = GloMyMon->timeout_validate_ip_change(mmsd);
mmsd->mysql_error_msg=strdup(stale_ip_timeout ? "resolved IP no longer valid" : "timeout check");
if (stale_ip_timeout) {
proxy_debug(PROXY_DEBUG_MONITOR, 5,
"Ignoring read_only timeout for %s:%d because resolved IP is no longer valid\n",
mmsd->hostname, mmsd->port);
} else {
proxy_error("Timeout on read_only check for %s:%d after %lldms. If the server is overload, increase mysql-monitor_read_only_timeout.\n", mmsd->hostname, mmsd->port, (now-mmsd->t1)/1000);
MyHGM->p_update_mysql_error_counter(p_mysql_error_type::proxysql, mmsd->hostgroup_id, mmsd->hostname, mmsd->port, ER_PROXYSQL_READ_ONLY_CHECK_TIMEOUT);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Short-circuit stale-IP timeouts before generic error accounting.

The timeout sites identify stale DNS/IP changes, but later generic failure paths still increment ERR/error counters and emit error logs. Handle stale_ip_timeout / TASK_RESULT_TIMEOUT_STALE_IP before the generic failure branches so DNS churn does not look like backend health failure.

Also applies to: 1974-1987, 2023-2027, 8471-8506, 8514-8525, 8724-8743, 9210-9238

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/MySQL_Monitor.cpp` around lines 1791 - 1800, Short-circuit stale-IP
timeout cases before the generic timeout/error accounting paths. In the timeout
handling around stale IP detection in MySQL_Monitor.cpp, make sure
stale_ip_timeout and TASK_RESULT_TIMEOUT_STALE_IP are checked and
returned/handled before any proxy_error logging or
p_update_mysql_error_counter-style increments. Update the related timeout
branches (including the other listed timeout handlers) so DNS/IP churn only
records the specific stale-IP message and does not fall through to
backend-failure accounting.

Comment thread lib/MySrvConnList.cpp
Comment on lines +51 to +56
void MySrvConnList::mark_connections_unhealthy() {
for (unsigned int i = 0; i < conns_length(); i++) {
MySQL_Connection *conn = index(i);
conn->healthy=false;
conn->reusable=false;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Free-list connections marked unhealthy can still be reused.

This only flips flags. MySrvConnList::get_random_MyConn() still has direct remove_index_fast() return paths when the caller lacks userinfo, so sockets poisoned during a blue/green cutover can still be handed back out instead of being drained.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/MySrvConnList.cpp` around lines 51 - 56,
MySrvConnList::mark_connections_unhealthy only clears flags, but
get_random_MyConn still has a direct remove_index_fast return path that can hand
out free-list entries even after they were marked unhealthy. Update
get_random_MyConn to skip or drain any MySQL_Connection whose healthy/reusable
flags were cleared, especially on the no-userinfo path, so poisoned connections
are never returned. Use the existing MySrvConnList, get_random_MyConn, and
mark_connections_unhealthy symbols to keep the unhealthy state consistent across
all selection paths.

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
Signed-off-by: Wazir Ahmed <wazir@proxysql.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
include/MySQL_HostGroups_Manager.h (1)

717-728: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

Include BGD hostgroups in the MySQL servers v2 checksum path.

The new incoming_aws_rds_bgd_hostgroups staging table is not represented in HGM_TABLES or the checksum table list shown above. Changes to mysql_aws_rds_bgd_hostgroups can therefore miss the mysql_servers_v2 checksum/cluster propagation path and leave peers with stale BGD routing config.

Minimal direction
 enum HGM_TABLES {
 	MYSQL_SERVERS_V2 = 0,
 	MYSQL_REPLICATION_HOSTGROUPS,
 	MYSQL_GROUP_REPLICATION_HOSTGROUPS,
 	MYSQL_GALERA_HOSTGROUPS,
 	MYSQL_AWS_AURORA_HOSTGROUPS,
+	MYSQL_AWS_RDS_BGD_HOSTGROUPS,
 	MYSQL_HOSTGROUP_ATTRIBUTES,

Also update the checksum-generation and cluster fetch/convert paths for the new table.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@include/MySQL_HostGroups_Manager.h` around lines 717 - 728, The new
incoming_aws_rds_bgd_hostgroups/mysql_aws_rds_bgd_hostgroups table is missing
from the mysql_servers_v2 checksum/propagation flow, so updates can be skipped
for peers. Add this table to HGM_TABLES and the checksum table list used for
mysql_servers_v2, and make sure the checksum-generation plus cluster
fetch/convert paths recognize it so staged BGD hostgroups are included
end-to-end. Verify the new logic is wired through the same paths that handle
generate_mysql_aws_rds_bgd_hostgroups_table and incoming_aws_rds_bgd_hostgroups.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@include/MySQL_HostGroups_Manager.h`:
- Around line 717-728: The new
incoming_aws_rds_bgd_hostgroups/mysql_aws_rds_bgd_hostgroups table is missing
from the mysql_servers_v2 checksum/propagation flow, so updates can be skipped
for peers. Add this table to HGM_TABLES and the checksum table list used for
mysql_servers_v2, and make sure the checksum-generation plus cluster
fetch/convert paths recognize it so staged BGD hostgroups are included
end-to-end. Verify the new logic is wired through the same paths that handle
generate_mysql_aws_rds_bgd_hostgroups_table and incoming_aws_rds_bgd_hostgroups.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d4543c6f-6876-4d07-8333-d268afcd5a6d

📥 Commits

Reviewing files that changed from the base of the PR and between 63dca4b and 4eac77d.

📒 Files selected for processing (9)
  • include/DNS_Cache.hpp
  • include/MySQL_HostGroups_Manager.h
  • include/MySQL_Monitor.hpp
  • lib/DNS_Cache.cpp
  • lib/MySQL_HostGroups_Manager.cpp
  • lib/MySQL_Monitor.cpp
  • lib/ProxySQL_Admin.cpp
  • lib/ProxySQL_Cluster.cpp
  • lib/mysql_data_stream.cpp
🚧 Files skipped from review as they are similar to previous changes (5)
  • lib/mysql_data_stream.cpp
  • lib/DNS_Cache.cpp
  • include/DNS_Cache.hpp
  • include/MySQL_Monitor.hpp
  • lib/MySQL_Monitor.cpp
📜 Review details
⏰ Context from checks skipped due to timeout. (5)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap-mysqlx)
  • GitHub Check: CI-builds / builds (ubuntu24,-tap-genai-gcov)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: run / trigger
🧰 Additional context used
📓 Path-based instructions (4)
**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{cpp,h,hpp}: Feature tiers are controlled via flags: PROXYSQL31=1 for v3.1.x features (FFTO, TSDB), PROXYSQL40=1 for v4.0.x features (plugin loader). PROXYSQL40=1 implies both PROXYSQL31=1 and PROXYSQLFFTO=1 and PROXYSQLTSDB=1. Use conditional compilation with #ifdef PROXYSQL31, #ifdef PROXYSQL40, #ifdef PROXYSQLFFTO, #ifdef PROXYSQLTSDB, #ifdef PROXYSQLCLICKHOUSE.
Class names use PascalCase with protocol prefixes: MySQL_, PgSQL_, or ProxySQL_ (e.g., MySQL_Protocol, PgSQL_Session).
Member variables use snake_case.
Constants and macros use UPPER_SNAKE_CASE.
Use C++17; conditional compilation for feature tiers via #ifdef PROXYSQL31, #ifdef PROXYSQL40, #ifdef PROXYSQLFFTO, #ifdef PROXYSQLTSDB, #ifdef PROXYSQLCLICKHOUSE.
Use RAII for resource management; use jemalloc for memory allocation.
Use pthread mutexes for synchronization; use std::atomic<> for counters.

Files:

  • lib/ProxySQL_Cluster.cpp
  • lib/ProxySQL_Admin.cpp
  • lib/MySQL_HostGroups_Manager.cpp
  • include/MySQL_HostGroups_Manager.h
{lib,src}/**/*.{cpp,h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

GenAI/MCP/RAG/LLM features live entirely in plugins/genai/ and load as a .so at runtime via dlopen. Do not guard with PROXYSQLGENAI in core code — that flag no longer guards any core code as of Step 7 of the GenAI plugin carve-out.

Files:

  • lib/ProxySQL_Cluster.cpp
  • lib/ProxySQL_Admin.cpp
  • lib/MySQL_HostGroups_Manager.cpp
{lib/ProxySQL_Admin.cpp,lib/Admin_Handler.cpp,include/ProxySQL_Admin_Tables_Definitions.h}

📄 CodeRabbit inference engine (CLAUDE.md)

ProxySQL_Admin.cpp and Admin_Handler.cpp implement the Admin Interface (SQL-based configuration via SQLite3). Admin schema versions are tracked in ProxySQL_Admin_Tables_Definitions.h. Support runtime config changes without restart.

Files:

  • lib/ProxySQL_Admin.cpp
include/**/*.{h,hpp}

📄 CodeRabbit inference engine (CLAUDE.md)

Include guards in headers use #ifndef __CLASS_*_H format (e.g., #ifndef __MYSQL_PROTOCOL_H).

Files:

  • include/MySQL_HostGroups_Manager.h
🪛 ast-grep (0.44.0)
lib/ProxySQL_Cluster.cpp

[error] 2254-2254: Use of an unbounded buffer function that can overflow the destination; use a size-bounded equivalent (fgets, strncpy/strlcpy, strncat/strlcat, snprintf).
Context: sprintf(query, q, row[0], row[1], row[2], row[3], status, row[5], row[6], row[7], row[8], row[9], row[10], o)
Note: [CWE-120] Buffer Copy without Checking Size of Input ('Classic Buffer Overflow').

(dangerous-buffer-functions-cpp)

🔇 Additional comments (4)
lib/ProxySQL_Cluster.cpp (1)

2251-2255: LGTM!

lib/ProxySQL_Admin.cpp (1)

7333-7361: LGTM!

Also applies to: 7557-7627

lib/MySQL_HostGroups_Manager.cpp (1)

3605-3607: LGTM!

Also applies to: 3902-3938

include/MySQL_HostGroups_Manager.h (1)

1101-1106: 🩺 Stability & Availability

No counter imbalance here

aws_rds_bgd_set_bgd_in_progress() and aws_rds_bgd_clear_bgd_in_progress() are already guarded by AWS_RDS_BGD_State::bgd_in_progress_set, so each transition is applied at most once per state.

			> Likely an incorrect or invalid review comment.

- Move the blue readers to SHUNNED with a fixed recovery time
  after BGD status `SWITCHOVER_COMPLETED`

Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
@sonarqubecloud

sonarqubecloud Bot commented Jul 2, 2026

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
D Security Rating on New Code (required ≥ A)

See analysis details on SonarQube Cloud

Catch issues before they fail your Quality Gate with our IDE extension SonarQube for IDE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant