Review/test: Add wait events for server logging destination writes (v4-0001) #45
NikolayS wants to merge 1 commit into
When a backend writes server log output, the underlying call can
block: write(2) to the syslogger pipe or to stderr once the pipe
buffer fills up or the output device is slow, and syslog(3) when the
system logger is slow. These blocking calls were not instrumented, so
pg_stat_activity reported wait_event IS NULL during that time. Many
monitoring tools interpret NULL as on-CPU work, which made
heavy-logging stalls hard to attribute.
Add three new WaitEventIO events and report them around the relevant
calls:
IO / SysloggerWrite - write(2) to the syslogger pipe inside
write_pipe_chunks().
IO / StderrWrite - write(2) to stderr inside write_console().
IO / SyslogWrite - syslog(3) inside write_syslog().
The instrumentation is limited to the leaf write/syslog call. It uses
only the existing pgstat_report_wait_start()/end() inline helpers,
which are allocation-free and safe to call before MyProc is set up, so
this remains safe to invoke from within error reporting paths.
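
To illustrate the "leaf call only" shape described above, here is a minimal standalone sketch. It is not the patch's code: every `demo_`/`DEMO_` name is a hypothetical stand-in for the real helpers and event constants, and the wait state is modeled as a plain process-local variable.

```c
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's definitions. */
#define DEMO_WAIT_EVENT_NONE         0u
#define DEMO_WAIT_EVENT_STDERR_WRITE 1u

/* Current wait state, as a sampler would observe it. */
static uint32_t demo_wait_event_info = DEMO_WAIT_EVENT_NONE;
/* Last event ever reported; exists only so the sketch can be inspected. */
static uint32_t demo_last_reported = DEMO_WAIT_EVENT_NONE;

/* Allocation-free: a plain store, no locks, no memory management. */
static inline void
demo_report_wait_start(uint32_t event)
{
    demo_wait_event_info = event;
    demo_last_reported = event;
}

static inline void
demo_report_wait_end(void)
{
    demo_wait_event_info = DEMO_WAIT_EVENT_NONE;
}

/*
 * Only the potentially blocking write(2) sits between start and end, so
 * whatever time a sampler attributes to the event is time spent inside
 * the system call itself.
 */
static ssize_t
demo_instrumented_write(int fd, const void *buf, size_t len)
{
    ssize_t rc;

    demo_report_wait_start(DEMO_WAIT_EVENT_STDERR_WRITE);
    rc = write(fd, buf, len);
    demo_report_wait_end();

    return rc;
}
```

Because the helpers in this sketch are plain stores, wrapping the call adds no allocation and no new failure path, which is the property the commit message relies on for use inside error reporting.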
Subject: Re: Add wait events for server logging destination writes Hi Seongjun, I tested v4-0001 on Linux and macOS. It applies and builds cleanly, and On Andrey's nesting concern: I checked the most likely case, One minor thing: a short comment noting that the write happens before I also exercised 0002 on Windows via CI (MSVC build, heavy logging into Nik
What this is
Patch v4-0001 from Seongjun Shin's thread "Add wait events for server
logging destination writes", applied on top of current upstream
master for review and testing. Authorship is preserved (git am). 0002
(WriteConsoleW + EventlogWrite) is not included here.

The text below is my review, recorded here and intended to be sent to
pgsql-hackers.
Review
Hi Seongjun,
I reviewed and tested v4-0001 (the portable part). Short version: it applies
cleanly, builds without new warnings, and does exactly what it claims — log
write stalls that previously showed wait_event IS NULL ("on CPU") are now
attributable in pg_stat_activity. +1 on the approach.

Test setup
master (assert-enabled build); the patch also rebases cleanly onto the current tip.
Windows box for 0002).
RAISE LOG lines, while a server-side sampler snapshotted pg_stat_activity every ~2 ms (~5000 samples).

Results
logging_collector=on: write_pipe_chunks() -> syslogger pipe: IO / SysloggerWrite (33,575 samples)
logging_collector=off: write_console() -> stderr: IO / StderrWrite (34,206 samples)

On unpatched master the same workload shows wait_event IS NULL throughout.

I did not exercise IO / SyslogWrite (no syslog destination configured),
but its instrumentation is structurally identical to the two paths above.
Code
write() sites in write_pipe_chunks() (the full-payload loop and the final
chunk), the write() in write_console(), and both syslog() branches in
write_syslog(). No Unix-side path is missed.
pgstat_report_wait_start() writes to *my_wait_event_info, which defaults to
a process-local variable (local_my_wait_event_info) until
pgstat_set_wait_event_storage() runs at backend startup. So invoking these
before MyProc is attached, or from the postmaster, is safe; it simply won't
be visible in pg_stat_activity. That's the expected/acceptable behavior, but
it might be worth a one-line comment so readers don't expect
postmaster-side log writes to surface.
error reporting.
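
For readers unfamiliar with the pointer indirection mentioned above (writes go through *my_wait_event_info, which starts out aimed at a process-local variable), here is a simplified standalone model. It is a sketch, not the server's source: all `demo_` names are hypothetical, and `shared_slot` stands in for the per-backend field that pg_stat_activity would read.

```c
#include <stdint.h>

/* Process-local fallback slot; nothing outside this process can see it. */
static uint32_t demo_local_wait_event_info;

/* Reports go through this pointer, which initially targets the local slot. */
static uint32_t *demo_my_wait_event_info = &demo_local_wait_event_info;

/* Model of attaching to an externally visible slot at backend startup. */
static void
demo_set_wait_event_storage(uint32_t *shared_slot)
{
    demo_my_wait_event_info = shared_slot;
}

/* A single store through the pointer; valid before and after attachment. */
static inline void
demo_report_wait_start(uint32_t event)
{
    *(volatile uint32_t *) demo_my_wait_event_info = event;
}

static inline void
demo_report_wait_end(void)
{
    *(volatile uint32_t *) demo_my_wait_event_info = 0;
}
```

In this model, a report made before `demo_set_wait_event_storage()` runs lands in the local slot and is never visible elsewhere; the same call made afterwards lands in the shared slot. That matches the behavior described in the review: safe everywhere, visible only once the backend is attached.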
Minor / open
pgstat_report_wait_start() site could help, though the event names are self-explanatory.
upthread.
(WriteConsoleW/EventlogWrite), already flagged in the thread.

From my side the Unix portion is ready-for-committer; the outstanding item is
Windows verification of 0002.
Tested-by: Nikolay Samokhvalov <nik@postgres.ai>