Skip to content

MCO: GOMAXPROCS Injection for Go Containers#2047

Open
haircommander wants to merge 1 commit into
openshift:masterfrom
haircommander:gomaxprocs-injection
Open

MCO: GOMAXPROCS Injection for Go Containers#2047
haircommander wants to merge 1 commit into
openshift:masterfrom
haircommander:gomaxprocs-injection

Conversation

@haircommander

Copy link
Copy Markdown
Member

Summary

This enhancement proposes adding a gomaxprocsInjectionBehavior field to the ContainerRuntimeConfig API that enables CRI-O to automatically inject the GOMAXPROCS environment variable into containers based on their CPU resource requests.

Details

  • New API field: gomaxprocsInjectionBehavior (enum: Inject | Disabled)
  • CRI-O configuration: Translates to min_injected_gomaxprocs in CRI-O config
  • Calculation formula: max(ceil(cpu_request_in_cores) * 2, 1)
  • Feature gate: GomaxprocsInjection (TechPreviewNoUpgrade)

Example Usage

apiVersion: machineconfiguration.openshift.io/v1
kind: ContainerRuntimeConfig
metadata:
  name: enable-gomaxprocs
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ''
  containerRuntimeConfig:
    gomaxprocsInjectionBehavior: Inject

Result:

  • Container with requests.cpu: 1500mGOMAXPROCS=4
  • Container with requests.cpu: 250mGOMAXPROCS=2

Related Issue

Tracking: https://redhat.atlassian.net/browse/OCPNODE-4600

Test Plan

  • Integration tests for API field validation
  • E2E tests for CRI-O configuration rendering
  • Functional tests for GOMAXPROCS injection behavior

🤖 Generated with Claude Code

@openshift-ci openshift-ci Bot requested review from dgoodwin and yuqi-zhang June 25, 2026 19:53
@openshift-ci

openshift-ci Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


2. The Machine Config Operator renders the CRI-O configuration setting `min_injected_gomaxprocs = 1`

3. CRI-O calculates GOMAXPROCS for each container as `max(ceil(cpu_request_in_cores) * 2, 1)` and injects it as an environment variable before the container starts

@harche harche Jun 26, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we double check the order of operations here? On master the hook doubles the CPU shares first and then takes the ceil, not ceil then double. calculateGOMAXPROCS returns max((shares*2+1023)/1024, 1), and per the function doc kubelet sets shares = cores * 1024, so it reduces to max(ceil(cores * 2), 1, floor). The unit tests confirm the doubling: 3 cores to 6, 4 cores to 8, 5 cores to 10, 8 cores to 16. So 1500m lands on 3 (ceil(1.5 * 2)), not 4.

Comment on lines +71 to +72
- Container with `requests.cpu: 1500m` → `GOMAXPROCS=4` (ceil(1.5) * 2 = 4)
- Container with `requests.cpu: 250m` → `GOMAXPROCS=2` (ceil(0.25) = 1, 1 * 2 = 2)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following from the formula, would these values need updating? The unit tests give the exact answers for the floor of 1 that this enhancement uses: 250m (shares=256), floor=1 returns 1. And 1500m works out to 3 via the same doubling the tests show (3 cores to 6, so 1.5 cores to 3). So I make these 3 and 1 rather than 4 and 2. Does that match what you see?

- Container with `requests.cpu: 1500m` → `GOMAXPROCS=4` (ceil(1.5) * 2 = 4)
- Container with `requests.cpu: 250m` → `GOMAXPROCS=2` (ceil(0.25) = 1, 1 * 2 = 2)
- Container with `requests.cpu: 100m` → `GOMAXPROCS=2` (ceil(0.1) = 1, 1 * 2 = 2)
- Container with no CPU request → No GOMAXPROCS injection

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small thing: best-effort pods (no CPU request) do get an injection on master, using the floor value directly rather than being skipped. The hook only returns early for pods outside burstable or besteffort (cgroup parent check), and the calculation returns the floor for shares=2 (best-effort). The config doc also states the floor is used directly for best-effort, and injecting a value like 1 is exercised in tests. Should this say they receive the floor value instead?


### Implementation Details/Notes/Constraints

CRI-O's GOMAXPROCS injection feature (https://github.com/cri-o/cri-o/pull/9961) sets a minimum floor value via the `min_injected_gomaxprocs` configuration option. When set to `1`, CRI-O injects GOMAXPROCS as `max(ceil(cpu_request_in_cores) * 2, 1)` for each container.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we point this at the base feature PR? #9961 is the later "run hook if pod is in workload partition" change; the min_injected_gomaxprocs option itself landed in cri-o/cri-o#9860. Also, the option reads as a floor rather than an enable flag: the hook is only wired in when the value is greater than 0, and that same value becomes the fallback floor in the calculation. Would it read more clearly to say injection runs whenever it is greater than 0, with 1 setting the floor to 1?


**Behavior for edge cases:**
- Container with no CPU request: CRI-O does not inject GOMAXPROCS (Go default behavior applies)
- Container with CPU limit but no explicit request: Kubernetes defaults the request to equal the limit, so CRI-O will inject based on that implicit request value

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one looks inverted on master. When a container has a CPU limit (quota greater than 0) the hook skips injection entirely, the stated reason being that Go 1.25+ already derives GOMAXPROCS from the cgroup quota. Could we flip this bullet? It might also be worth noting the Go 1.25 behavior in Motivation, since it narrows the scope to burstable requests without a limit.

**Behavior for edge cases:**
- Container with no CPU request: CRI-O does not inject GOMAXPROCS (Go default behavior applies)
- Container with CPU limit but no explicit request: Kubernetes defaults the request to equal the limit, so CRI-O will inject based on that implicit request value
- Container with CPU request < 500m: Receives `GOMAXPROCS=2` (minimum from formula: ceil(cpu < 1.0) = 1, doubled = 2)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the double then ceil formula, requests below 0.5 cores resolve to 1 rather than 2. The unit tests show this directly: 250m (shares=256), floor=1 returns 1. The same holds at 500m: the calculated value is 1 before the floor applies, visible in 500m (shares=512), floor=4 returns 4 (with a floor of 1 it returns 1). Could we adjust this bullet to match?


- Language detection or runtime-specific injection (CRI-O injects GOMAXPROCS for all containers; non-Go applications ignore it)
- Dynamic adjustment of GOMAXPROCS based on runtime CPU throttling or actual usage
- Per-pod or per-container override mechanisms (this is cluster-level configuration; users can override via pod spec environment variables)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cri-o does ship a per-pod opt out via the skip-gomaxprocs.crio.io annotation, which skips injection for that pod even when enabled globally. The hook checks it first in PreCreate. Would it be worth mentioning here so readers know a pod level escape hatch exists?

@haircommander haircommander force-pushed the gomaxprocs-injection branch from 04a98c3 to f16f0da Compare June 26, 2026 16:41

@harche harche left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick turnaround, the formula and edge cases read accurately now. On a fresh pass I noticed the old "no CPU request means no injection" assumption still lingers in a couple of sections, plus a few scope and wording points. Links to the relevant CRI-O code and tests are inline. As before, happy to be corrected if I am misreading anything.

- Deploy ContainerRuntimeConfig with injection enabled, verify GOMAXPROCS is set in containers with CPU requests
- Deploy ContainerRuntimeConfig with injection disabled, verify GOMAXPROCS is not injected
- Verify containers with explicit GOMAXPROCS environment variable use their own value (not injected)
- Verify containers without CPU requests do not receive GOMAXPROCS injection

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This e2e check looks like it now conflicts with the corrected edge case behavior. Lines 76 and 123 say best-effort pods (no CPU request) receive the floor value, and CRI-O agrees: best-effort (shares=2) returns the floor, and injecting a value like 1 is exercised. Should this verify that no-request pods receive the floor rather than no injection?


- **GOMAXPROCS not injected**: Check CPU request is set on container, verify feature gate is enabled, check CRI-O config
- **Wrong GOMAXPROCS value**: Verify CPU request matches expected value, check for explicit GOMAXPROCS in pod spec overriding injection
- **Container using default GOMAXPROCS**: Likely no CPU request set, or explicit override in pod/image

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same point as the test plan note: a best-effort pod with no CPU request still gets the floor injected (shares=2 returns the floor), so "no CPU request set" would not explain a container running on Go defaults. Could we drop or reword this bullet?


### Non-Goals

- Language detection or runtime-specific injection (CRI-O injects GOMAXPROCS for all containers; non-Go applications ignore it)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small accuracy point: CRI-O does not inject into all containers. It only runs for burstable and best-effort pods (cgroup gate) that have no CPU limit (quota greater than 0 is skipped) and no pre-existing GOMAXPROCS. Guaranteed and limited pods are excluded. Could we soften "all containers" here?


### Drawbacks

- **Universal injection may be unexpected**: CRI-O injects GOMAXPROCS into all containers when enabled, regardless of whether they are Go applications. Non-Go containers simply ignore the environment variable, but this may confuse users inspecting container environments.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same "all containers" wording as in Non-Goals. Injection is limited to burstable and best-effort pods without a CPU limit, see the cgroup gate and the limit skip. Worth aligning this line with that scope?


**Behavior for edge cases:**
- Container with no CPU request (best-effort): Receives the floor value directly (`GOMAXPROCS=1`)
- Container with CPU limit but no explicit request: CRI-O **skips injection entirely**. Go 1.25+ automatically derives GOMAXPROCS from the cgroup CPU quota, making injection redundant for limited containers

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we generalize this? On master any CPU limit skips injection (quota greater than 0 returns early), not only the "limit but no explicit request" case. So a normal burstable pod that sets both a request and a limit is skipped too, which is a large share of real workloads. It might also help to state explicitly that Guaranteed pods are excluded, since they hit both the limit skip and the burstable or besteffort gate. Relatedly, would it be worth adding "no CPU limit" to the examples on lines 73 to 76 so readers do not expect injection when a limit is present?


**Common troubleshooting scenarios**:

- **GOMAXPROCS not injected**: Check CPU request is set on container, verify feature gate is enabled, check CRI-O config

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For "GOMAXPROCS not injected", could we add the two most likely causes on master: the container has a CPU limit, or the pod is Guaranteed QoS. Both skip injection even when the feature is enabled, so a CPU request alone is not sufficient.


### Open Questions

None.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth listing a couple here? Two come to mind.

First, how this composes with the high-performance and CPU pinning hooks. CRI-O assembles hooks into a list and can attach more than one: a pod on a high-performance runtime handler gets the HighPerformanceHooks, and separately, when min_injected_gomaxprocs > 0, it also gets the GomaxprocsHooks, with both wrapped in CompositeHooks. The CompositeHooks doc calls out the exact overlap: a burstable pod on a high-performance handler runs both. So once injection is on cluster wide, a burstable no-limit pod on that path gets GOMAXPROCS set to ceil(2 * request), which can exceed the cpuset it is actually pinned to, and the GOMAXPROCS skip is decided on the CPU quota, which the high-performance hook (running first in the chain) may itself modify. It might be fine, or the answer might be to skip injection on high-performance runtimes, but it seems worth reasoning about rather than leaving implicit.

Second, the GA default flip to Inject (line 211). With Inject as the platform default, every cluster that does not set Disabled starts injecting into every eligible burstable or best-effort workload on upgrade. A Go service that relies on the default GOMAXPROCS = node CPU count would silently switch to ceil(2 * request). The Risks section argues this is safe because an explicit GOMAXPROCS wins, but the workloads that get changed are exactly the ones that do not set it, so on by default at GA feels like a decision worth recording here rather than assuming.

)
```

The field is optional. When omitted, the platform chooses a reasonable default (currently `Inject`).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional and non-blocking: since the API is an Inject or Disabled enum that maps to min_injected_gomaxprocs 1 or 0, the integer floor is hidden and fixed at 1. That seems reasonable given the Alternatives section, but an enum cannot carry a value later, so if a configurable floor or multiplier is ever wanted the API would need a new field. Would a sentence on why the enum is preferred over an optional integer be worth adding?

@haircommander haircommander force-pushed the gomaxprocs-injection branch from f16f0da to 769e98c Compare June 26, 2026 19:49
Signed-off-by: Peter Hunt <pehunt@redhat.com>
@haircommander haircommander force-pushed the gomaxprocs-injection branch from 769e98c to f22b5ca Compare June 26, 2026 20:31
@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

@haircommander: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants