tools: add pg backfill_toofull fix by JoshuaGabriel · Pull Request #51 · clyso/otto

JoshuaGabriel · 2025-11-04T06:59:24Z

Reads the failure domain of a pool, then upmaps the backfill_toofull PG onto another OSD based on %utilization.

Signed-off-by: Joshua Blanch <joshua.blanch@clyso.com>
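A minimal Python sketch of the selection step described above. It is not the PR's code. It assumes the caller has already built two hypothetical maps, `osd_domain` (OSD id to failure-domain bucket, e.g. host) and `osd_util` (OSD id to %utilization), for example from `ceph osd tree -f json` and `ceph osd df -f json`. `pick_target`, `upmap_command` and the 85% cap (mirroring Ceph's default nearfull ratio of 0.85) are illustrative choices.

```python
from typing import Dict, List, Optional

# Ceph reports an unmapped EC shard in up/acting sets as this value.
CRUSH_ITEM_NONE = 2147483647


def pick_target(
    up: List[int],
    full_osd: int,
    osd_domain: Dict[int, str],
    osd_util: Dict[int, float],
    max_util: float = 85.0,  # hypothetical cap, in percent
) -> Optional[int]:
    """Return the least-utilized OSD that can take over full_osd's slot, or None.

    osd_domain and osd_util are hypothetical inputs: OSD id -> failure-domain
    bucket name, and OSD id -> %utilization.
    """
    # Failure domains already occupied by the *other* members of this PG;
    # the full OSD's own domain is free because that OSD is being replaced.
    used = {osd_domain.get(o) for o in up if o not in (full_osd, CRUSH_ITEM_NONE)}
    candidates = [
        o
        for o, util in osd_util.items()
        if o not in up
        and osd_domain.get(o) is not None
        and osd_domain[o] not in used
        and util < max_util
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda o: osd_util[o])


def upmap_command(pgid: str, from_osd: int, to_osd: int) -> str:
    """Format a `ceph osd pg-upmap-items` call that moves one PG member."""
    return f"ceph osd pg-upmap-items {pgid} {from_osd} {to_osd}"


if __name__ == "__main__":
    up = [0, 3, 6]
    osd_domain = {0: "h1", 1: "h1", 3: "h2", 4: "h2", 6: "h3", 7: "h3", 9: "h4"}
    osd_util = {0: 91.0, 1: 60.0, 3: 70.0, 4: 55.0, 6: 72.0, 7: 50.0, 9: 40.0}
    target = pick_target(up, full_osd=0, osd_domain=osd_domain, osd_util=osd_util)
    if target is not None:
        print(upmap_command("2.1f", 0, target))  # prints: ceph osd pg-upmap-items 2.1f 0 9
```

Here OSDs 4 and 7 are rejected because hosts h2 and h3 already hold other members of the PG, leaving OSD 1 (h1, 60%) and OSD 9 (h4, 40%); the emptier one wins.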

yzhan298 · 2025-11-04T20:12:06Z

+```
+
+Problem:
+Usually when a node goes down or when draining capacity, there are some OSDs that become nearfull and eventually can lead to PGs being backfill_toofull warning pops up.


"Usually when a node goes down or when draining capacity" reads as broken. Should it be "Usually when a node goes down or when draining with limited capacity"?

sam0044

This is a nice script to have. I would rephrase the problem statement to make it clearer. Otherwise, the logic looks pretty solid.

JoshuaGabriel · 2025-11-06T19:58:42Z

Actually, I don't think this takes the CRUSH rule's device class into account; I only tried it on an all-NVMe cluster. With mixed HDD/SSD OSDs, it could create an upmap to an OSD outside the rule's device class.
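One possible way to close that gap, sketched in Python and not taken from the PR: read the device class from the rule and only consider OSDs of that class as upmap targets. It assumes JSON shaped like `ceph osd crush rule dump <rule> -f json`, where a class-restricted rule has a `take` step whose `item_name` looks like `default~ssd`, and like `ceph osd tree -f json`, where OSD nodes carry a `device_class` field. `rule_device_class` and `osds_in_class` are hypothetical helpers; the sketch only looks at the first `take` step.

```python
from typing import List, Optional


def rule_device_class(rule: dict) -> Optional[str]:
    """Return the device class a CRUSH rule is restricted to, or None."""
    for step in rule.get("steps", []):
        if step.get("op") == "take":
            # Class-restricted rules take from a shadow bucket such as "default~ssd".
            name = step.get("item_name", "")
            return name.split("~", 1)[1] if "~" in name else None
    return None


def osds_in_class(tree: dict, device_class: Optional[str]) -> List[int]:
    """List OSD ids from an `osd tree` dump matching device_class (all OSDs if None)."""
    return sorted(
        node["id"]
        for node in tree.get("nodes", [])
        if node.get("type") == "osd"
        and (device_class is None or node.get("device_class") == device_class)
    )
```

Intersecting the candidate list with `osds_in_class(tree, rule_device_class(rule))` before picking the least-utilized OSD would keep upmaps inside the rule's device class.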

dvanders · 2026-05-21T01:24:46Z

Tried this today on a Pacific cluster with some 6+3 PGs in backfill_toofull.
It did not output any upmaps.

dvanders · 2026-05-21T01:24:58Z

@bstillwell ^

JoshuaGabriel · 2026-05-21T16:28:37Z

+        if ("nearfull" in st) or ("backfillfull" in st):
+            flagged.add(oid)


@bstillwell @dvanders
If the OSDs aren't marked nearfull/backfillfull, they aren't flagged and so aren't taken into account. This can happen if an operator manually changes the nearfull/backfillfull ratios.

It should probably just check for the most utilized OSDs here; I had assumed that nearfull/backfillfull OSDs were the most utilized.
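A minimal Python sketch of flagging by utilization instead of by the `nearfull`/`backfillfull` status strings. It assumes the `nodes` list from `ceph osd df -f json`, where each entry has `id` and `utilization` (a percentage). `most_utilized`, the 85% threshold and the top-N fallback are hypothetical choices, not the PR's logic.

```python
from typing import List


def most_utilized(osd_df: dict, threshold: float = 85.0, top_n: int = 5) -> List[int]:
    """Return OSD ids at or above threshold (%), else the top_n most utilized."""
    nodes = sorted(osd_df.get("nodes", []), key=lambda n: n["utilization"], reverse=True)
    over = [n["id"] for n in nodes if n["utilization"] >= threshold]
    # Fall back to the top_n busiest OSDs when none crosses the threshold,
    # so raised nearfull/backfillfull ratios don't hide the fullest OSDs.
    return over if over else [n["id"] for n in nodes[:top_n]]
```

Because it sorts by utilization directly, this works regardless of how the warning ratios are configured.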
Alternatively, work backwards from each PG in backfill_toofull to the OSD that is 'full': use the backfill_toofull PG state to find those PGs, then identify the 'full' OSD by looking at their up/acting sets.
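A minimal Python sketch of working backwards from the PGs. It assumes a list of PG stat dicts like the `pg_stats` entries in `ceph pg ls backfill_toofull -f json` output (keys `pgid`, `state`, `up`, `acting`). Backfill targets are members of the up set that are not yet in the acting set, so those are the candidates for the 'full' OSD. Comparing the sets ignores EC shard positions, which is a simplification; `toofull_targets` is a hypothetical helper.

```python
from typing import Dict, List

# Ceph reports an unmapped EC shard in up/acting sets as this value.
CRUSH_ITEM_NONE = 2147483647


def toofull_targets(pg_stats: List[dict]) -> Dict[str, List[int]]:
    """Map each backfill_toofull pgid to the up-set OSD(s) it is waiting on."""
    result: Dict[str, List[int]] = {}
    for pg in pg_stats:
        # States look like "active+remapped+backfill_toofull".
        if "backfill_toofull" not in pg.get("state", "").split("+"):
            continue
        acting = set(pg["acting"])
        # Backfill targets: in the up set but not (yet) in the acting set.
        result[pg["pgid"]] = [
            o for o in pg["up"] if o not in acting and o != CRUSH_ITEM_NONE
        ]
    return result
```

The resulting OSD ids could then feed the upmap selection directly, without depending on the nearfull/backfillfull flags at all.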

JoshuaGabriel requested a review from sam0044 November 4, 2025 06:59

tools: add pg backfill_toofull fix

9ebe69c

Signed-off-by: Joshua Blanch <joshua.blanch@clyso.com>

JoshuaGabriel force-pushed the toolkit/pg_toofull branch from 189310e to 9ebe69c November 4, 2025 20:10

yzhan298 reviewed Nov 4, 2025


sam0044 requested changes Nov 6, 2025


JoshuaGabriel commented May 21, 2026


JoshuaGabriel wants to merge 1 commit into clyso:main from JoshuaGabriel:toolkit/pg_toofull
