Handle graceful shutdown in Kubernetes for seamless, no-downtime upgrades #128

Our runners often deal with very long-running jobs that can take as long as 24 hours to complete.

For that reason, when deploying a new version it is not reasonable to manually pause the runner, wait for all jobs to complete, update, and then unpause. Doing so may mean keeping the runner paused for many hours while a single job finishes, which prevents other jobs from being processed in the meantime.

Kubernetes provides a way to handle this situation automatically via [graceful shutdowns](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination). In short, we can set a very long `terminationGracePeriodSeconds` value and, on receiving `SIGTERM` from Kubernetes, stop polling for new jobs and exit once the last running job completes.
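To make the intended behaviour concrete, here is a minimal, language-agnostic sketch of the drain logic, written in Python with only the standard library. It is not the `gitlab-runner-rs` API: `DrainingRunner`, `poll_job`, `request_shutdown` and `install_sigterm_handler` are hypothetical names chosen for illustration, and the real implementation would be written in Rust on top of whatever that crate ends up exposing.

```python
import signal
import threading
from concurrent.futures import ThreadPoolExecutor


class DrainingRunner:
    """Hypothetical runner loop: polls for jobs until asked to shut down."""

    def __init__(self, poll_job, max_workers=4, poll_interval=0.01):
        # poll_job is an assumed callable returning a job (a callable) or None.
        self._poll_job = poll_job
        self._poll_interval = poll_interval
        self._stop = threading.Event()  # set once shutdown is requested
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def request_shutdown(self, signum=None, frame=None):
        # The (signum, frame) signature matches what signal.signal() passes
        # to a handler, so this method can be registered directly.
        self._stop.set()

    def run(self):
        # Keep picking up new jobs until shutdown has been requested.
        while not self._stop.is_set():
            job = self._poll_job()
            if job is not None:
                self._pool.submit(job)
            else:
                # Nothing to do: sleep, but wake up early on shutdown.
                self._stop.wait(self._poll_interval)
        # Drain: no new jobs are accepted, already running ones finish.
        self._pool.shutdown(wait=True)


def install_sigterm_handler(runner):
    # Must be called from the main thread. On pod termination the SIGTERM
    # is delivered to the container's main process.
    signal.signal(signal.SIGTERM, runner.request_shutdown)
```

A process would call `install_sigterm_handler(runner)` and then `runner.run()`; once `run()` returns, the process can exit. This only works if `terminationGracePeriodSeconds` is longer than the longest expected job, because Kubernetes sends `SIGKILL` when the grace period expires.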

Once all the right APIs and examples are available in `gitlab-runner-rs` (see https://github.com/collabora/gitlab-runner-rs/issues/129), we need to set a large `terminationGracePeriodSeconds` by default (while letting people customize it) and wire up the logic to drain jobs on termination, providing seamless, no-downtime upgrades.

See also https://github.com/collabora/obs-gitlab-runner/issues/125
