Handle graceful shutdown in Kubernetes for seamless, no downtime upgrades #128

Description

@emanueleaina

Our runners often deal with very long-running jobs that may take as long as 24h to complete.

For that reason, when deploying a new version it is not reasonable to manually pause the runner and wait for all the jobs to complete before updating and unpausing. Doing so may require keeping the runner paused for many hours while a single job finishes, preventing other jobs from being processed in the meantime.

Kubernetes actually provides a way to handle this situation automatically via graceful shutdowns. In short, we can set a very long terminationGracePeriodSeconds value and, when getting SIGTERM from Kubernetes, stop polling for new jobs and exit once the last running job completes.
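To make the intended behaviour concrete, here is a minimal, language-agnostic sketch of the drain logic, written in Python only for brevity. It is not the gitlab-runner-rs API: `DrainingRunner`, `fetch_job` and `request_shutdown` are hypothetical names made up for illustration. On SIGTERM the loop stops polling for new jobs, and the process only exits once the jobs that are already running have completed.

```python
import signal
import threading
from concurrent.futures import ThreadPoolExecutor, wait


class DrainingRunner:
    """Hypothetical runner loop: poll for jobs until asked to stop, then drain."""

    def __init__(self, fetch_job, max_workers=4, poll_interval=1.0):
        # fetch_job is an assumed callback returning a callable job, or None
        # when there is nothing to do right now.
        self._fetch_job = fetch_job
        self._poll_interval = poll_interval
        self._stop = threading.Event()
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._futures = []

    def request_shutdown(self, signum=None, frame=None):
        # Signature matches what signal.signal() expects from a handler.
        # Only sets a flag; the actual draining happens in run().
        self._stop.set()

    def run(self):
        # Phase 1: keep polling until shutdown is requested.
        while not self._stop.is_set():
            job = self._fetch_job()
            if job is not None:
                self._futures.append(self._pool.submit(job))
            else:
                # Sleep between polls, but wake up immediately on shutdown.
                self._stop.wait(self._poll_interval)
        # Phase 2: no more polling; block until every started job is done.
        wait(self._futures)
        self._pool.shutdown(wait=True)
        return [f.result() for f in self._futures]


if __name__ == "__main__":
    # With no jobs available this simply idles until SIGTERM arrives.
    runner = DrainingRunner(fetch_job=lambda: None)
    signal.signal(signal.SIGTERM, runner.request_shutdown)
    runner.run()
```

The signal handler deliberately does nothing beyond setting a flag, so the long wait happens in the main loop rather than inside the handler. This only helps if terminationGracePeriodSeconds is longer than the longest job, since Kubernetes sends SIGKILL once the grace period expires.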

Once the required APIs and examples are available in gitlab-runner-rs (see collabora/gitlab-runner-rs#129), we need to set a large terminationGracePeriodSeconds by default (while letting people customize it) and wire up the logic to drain jobs on termination, providing seamless, no-downtime upgrades.
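For illustration, the grace period is a field of the pod spec. The fragment below is only a sketch: the Deployment name, labels, image and the 25h value are assumptions chosen to cover a 24h job with some margin, not the defaults this project ships or will ship.

```yaml
# Hypothetical Deployment fragment; names, image and values are illustrative.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-runner
  template:
    metadata:
      labels:
        app: example-runner
    spec:
      # 25h: must exceed the longest expected job, otherwise the pod is
      # killed with SIGKILL before it finishes draining.
      terminationGracePeriodSeconds: 90000
      containers:
        - name: runner
          image: registry.example.com/example-runner:latest
```

Without an explicit value Kubernetes uses a 30 second grace period, which is why a large default matters here.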

See also collabora/obs-gitlab-runner#125
