Testpq/tap pq session/v14#3
Open
adunstan wants to merge 7 commits into
Open
Conversation
Introduce a session object for TAP tests that talks to the server directly through libpq instead of spawning a psql child process for each background session. Connecting in-process avoids the overhead and fragility of driving psql over pipes and gives tests synchronous and asynchronous query interfaces, notice capture, and pipelined queries. libpq is reached through an FFI::Platypus wrapper, so no compiled XS module is required. The new modules are: PostgreSQL::Test::Session - the libpq session object PostgreSQL::PqFFI - FFI bindings to libpq PostgreSQL::PqConstants - libpq constants PostgreSQL::PGTypes - libpq type OID definitions PostgreSQL::FindLib - locate the libpq shared library Unless the connection string already names a role, the session pins the connecting user to the operating-system user rather than letting a stray PGUSER in the environment select it; this matches how the rest of the test framework connects and, in particular, the SSPI usermap used on Windows. A failed connection records the libpq error message (in $PostgreSQL::Test::Session::connect_error) before returning undef, so callers that treat the failure as fatal can report why. Install the new modules from both the make and meson builds.
Make the test framework itself benefit from in-process libpq access:
* PostgreSQL::Test::Cluster::safe_psql() runs a single, parameterless
statement through a Session instead of spawning psql, and reports the
libpq connection error if the session cannot be established. It falls
back to psql for multi-statement or parameterized calls, and also when
PGOPTIONS is set: PGOPTIONS injects connection-time GUCs the caller
relies on, and an FFI-loaded libpq does not reliably see a PGOPTIONS
set in %ENV at run time on Windows (separate C runtime), whereas a
spawned psql inherits it.
* poll_query_until() polls through a Session.
* add poll_until_connection() to wait for the server to accept
connections.
Also harden PostgreSQL::Test::Utils against errors raised during global
destruction. Such errors happen after the test body has finished (for
example FFI::Platypus bindings or callback closures being freed in an
unpredictable order as the process exits); reporting them by calling
done_testing() a second time would spuriously fail an otherwise-passing
test, so they are now logged and ignored.
Convert tests that drove a long-lived background psql process (or repeatedly spawned psql) to use the libpq Session object instead. This covers concurrent-session and asynchronous-query tests across contrib, src/bin and src/test: amcheck, bloom, pg_visibility, test_decoding, pg_amcheck, pg_upgrade, authentication, oauth_validator, test_aio, test_checksums, test_misc, test_slru, xid_wraparound, postmaster, recovery and subscription. Where a test must load libpq from a different installation than the one it connects to (pg_upgrade's cross-version workload), it passes that node so Session locates the library correctly on all platforms.
PostgreSQL::Test::Session loads libpq in-process via FFI::Platypus, which the CI environments do not yet provide, and the switch exposed several other gaps in the CI setups: * Install FFI::Platypus at job setup time: from the Debian package on the Linux containers, via MacPorts on macOS, and from CPAN on MinGW (MSYS2 does not package it). The Windows VS job needs nothing, as Strawberry Perl already bundles FFI::Platypus. * Linux - Meson (64-bit) builds with -fsanitize=address. Loading the ASan-instrumented libpq into the uninstrumented perl aborts every TAP test with "ASan runtime does not come first in initial library list" (reported as exit status 250 via testwrap). Preload the ASan runtime for that job's "Test world" step, scoped to the step so the build and ccache are unaffected. * Linux - Meson (32-bit) runs its TAP tests with the image's i386 perl, so it needs the i386 build of libffi-platypus-perl. That package cannot be installed alongside the amd64 one (it depends on perl:i386, which conflicts with the installed amd64 perl), so download it and extract it into place with dpkg -x, the same way the image provides the i386 perl itself. Verified against the linux_debian_trixie_ci container image. * Adding p5.34-ffi-platypus to the macOS package list changed the MacPorts cache key, and the resulting fresh MacPorts installation revealed that curl and libuuid were never in the package list even though the job configures with -Dlibcurl=enabled and -Duuid=e2fs; both had been supplied by stale cached installations. Add curl and libuuid (MacPorts' build of util-linux's libuuid, which installs uuid/uuid.h and uuid.pc). Eventually the Linux requirements should instead be handled in the pg-vm-images container builds.
PostgreSQL::Test::Session, used throughout the TAP test framework, requires FFI::Platypus, but neither build system checked for it at configure time; a missing module surfaced as hundreds of individual test failures instead. Add the module to config/check_modules.pl, which both configure and meson already run when TAP tests are requested, requiring at least version 1.00 (PostgreSQL::PqFFI uses the api => 1 interface and record_layout_1). Mention the new requirement in the documentation and in src/test/perl/README. The generated configure was re-created with Autoconf 2.69.
4fffbe0 to
7a6a6ca
Compare
Every TAP test using PostgreSQL::Test::Session emitted a "die during global destruction (ignored): Can't use an undefined value as a subroutine reference" diagnostic, plus a Test2 "context destroyed" trace, at exit. The culprit is FFI::Platypus::Record: it keeps the PGnotify record's Record::Meta object in a package-glob closure, so the object survives until global destruction. Record::Meta's DESTROY is an FFI-attached function taking a custom type marshalled by a Perl closure; by the time it runs during global destruction that closure may already have been freed, so invoking it dies (observed with FFI::Platypus 2.08). Override the destructor to do nothing during global destruction; the process is exiting anyway, so the memory is reclaimed by the OS.
The conversion to PostgreSQL::Test::Session (commit dd52766) changed the background session's pg_logical_slot_peek_binary_changes() calls from background_psql's synchronous query_safe() to do_async(). That let the subsequent pg_drop_replication_slot() race against the still-running peek, which holds the slot, producing intermittent "replication slot \"regression_slot5\" is active for PID n" failures (seen on the slower ASan CI job). Use Session's synchronous query_safe() instead, restoring the original ordering: the peek completes (releasing the slot but keeping a local reference to the stats entry) before the slot is dropped.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.