Skip to content

cluster: make SELinux and THP check auto-fixes work on non-RHEL hosts#2712

Open
yahonda wants to merge 2 commits into
pingcap:masterfrom
yahonda:fix/selinux-status-check-no-getenforce
Open

cluster: make SELinux and THP check auto-fixes work on non-RHEL hosts#2712
yahonda wants to merge 2 commits into
pingcap:masterfrom
yahonda:fix/selinux-status-check-no-getenforce

Conversation

@yahonda

@yahonda yahonda commented Jun 4, 2026

Copy link
Copy Markdown
Member

What problem does this PR solve?

Two recently-merged checks added auto-fix commands that assume a RHEL-family host. They break tiup cluster check --apply (and the cluster/dm integration tests) on Debian/Ubuntu, where the assumed tools/files don't exist. Both currently fail on master.

1. SELinux status check (#2496). CheckSELinuxStatus runs getenforce. On hosts without the SELinux userspace tools (most Debian/Ubuntu, incl. the CI containers) the command is missing, so the check is reported as Fail. With --apply that triggers the SELinux fix:

sed -i 's/.../' /etc/selinux/config && setenforce 0

but /etc/selinux/config doesn't exist there → sed: can't read /etc/selinux/config: No such file or directory → the run aborts.

2. THP check (#2498). The THP auto-fix was extended with:

grubby --update-kernel=ALL --args="transparent_hugepage=never"

grubby is a RHEL-family tool and is absent on Debian/Ubuntu → grubby: command not found (exit 127) → the run aborts.

What is changed and how it works?

  • CheckSELinuxStatus: treat a getenforce execution error (binary absent → SELinux not installed/enforcing) as disabled rather than a failure, restoring the lenient pre-cluster: check both SELinux status and config #2496 behavior. Enforcing still fails and Permissive still warns. The configuration is still validated independently by CheckSELinuxConf.
  • THP auto-fix: guard the grubby call with command -v grubby so the persistent kernel argument is set where grubby exists, but skipped on hosts without it. The runtime setting (echo never > .../enabled) still applies everywhere.

Together these make the SELinux and THP checks Debian/non-RHEL safe and unblock the cluster/dm integration tests.

Check List

Tests

  • Unit test (go test ./pkg/cluster/operation/... ./pkg/cluster/manager/...)
  • Integration test (cluster/dm CI on this PR)

Side effects

  • None on RHEL-family hosts; behavior there is unchanged.

Related changes

pingcap#2496 added a getenforce-based SELinux status probe in addition to the
config-file check. On hosts without the SELinux userspace tools (most
Debian/Ubuntu systems, including the integration-test containers),
getenforce is missing, so CheckSELinuxStatus failed. With --apply that
failed check triggers the SELinux fix, which runs

    sed -i 's/.../' /etc/selinux/config && setenforce 0

but /etc/selinux/config does not exist on those hosts, so the command
errors with 'sed: can't read /etc/selinux/config: No such file or
directory' and the whole check/apply run aborts. This breaks the cluster
and dm integration tests on master.

Treat a getenforce execution error as 'SELinux disabled' (the host has no
SELinux), matching the lenient pre-pingcap#2496 behavior, instead of a failure.
Enforcing still fails and Permissive still warns. The configuration is
still validated independently by CheckSELinuxConf.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ti-chi-bot ti-chi-bot Bot requested a review from breezewish June 4, 2026 07:48
@ti-chi-bot

ti-chi-bot Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bb7133 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot requested a review from kaaaaaaang June 4, 2026 07:48
@ti-chi-bot ti-chi-bot Bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 4, 2026
@codecov-commenter

codecov-commenter commented Jun 4, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.30%. Comparing base (9bdae04) to head (48f3e74).

Files with missing lines Patch % Lines
pkg/cluster/operation/check.go 0.00% 2 Missing ⚠️
pkg/cluster/manager/check.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2712      +/-   ##
==========================================
- Coverage   42.72%   42.30%   -0.42%     
==========================================
  Files         424      424              
  Lines       49744    47146    -2598     
==========================================
- Hits        21253    19943    -1310     
+ Misses      25797    24504    -1293     
- Partials     2694     2699       +5     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

pingcap#2498 added a persistent THP disable via

    grubby --update-kernel=ALL --args="transparent_hugepage=never"

to the THP check auto-fix. grubby is a RHEL-family tool and is not present
on Debian/Ubuntu (including the integration-test containers), so with
--apply the fix aborts with 'grubby: command not found' (exit 127) and the
whole check/apply run fails.

Guard the grubby invocation with 'command -v grubby' so the persistent
kernel argument is still set where grubby exists, but is skipped on hosts
without it. The runtime THP setting (echo never > .../enabled) still
applies everywhere.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yahonda yahonda changed the title cluster: don't fail SELinux status check when getenforce is unavailable cluster: make SELinux and THP check auto-fixes work on non-RHEL hosts Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants