otto/prometheus: add --format prometheus to checkup output #84

Open
JoshuaGabriel wants to merge 1 commit into clyso:main from JoshuaGabriel:wip-prometheus-output

@JoshuaGabriel JoshuaGabriel commented Jun 10, 2026

Collaborator

Add --format prometheus output to the checkup command, for use with Prometheus.
It reports the overall score, the per-section scores, and a PASS/WARN/UNKNOWN/FAIL
status for each check.

$ otto checkup --ceph_report_json ~/collects/ceph-collect/cluster_health-report --format prometheus
# HELP otto_checkup_score Overall checkup score
# TYPE otto_checkup_score gauge
otto_checkup_score 29.5
# HELP otto_checkup_max_score Maximum possible checkup score
# TYPE otto_checkup_max_score gauge
otto_checkup_max_score 34
# HELP otto_checkup_section_score Per-section checkup score
# TYPE otto_checkup_section_score gauge
otto_checkup_section_score{section="Cluster"} 0.5
otto_checkup_section_score{section="Version"} 2.5
otto_checkup_section_score{section="Operating System"} 1.0
otto_checkup_section_score{section="Capacity"} 1.0
otto_checkup_section_score{section="Pools"} 8.0
otto_checkup_section_score{section="CephFS"} 0.5
otto_checkup_section_score{section="MON Health"} 3.0
otto_checkup_section_score{section="OSD Health"} 13.0
otto_checkup_section_score{section="Configuration"} 0
# HELP otto_checkup_check_status Check status (0=PASS, 1=WARN, 2=UNKNOWN, 3=FAIL)
# TYPE otto_checkup_check_status gauge
otto_checkup_check_status{section="Cluster",check="Health"} 1
otto_checkup_check_status{section="Version",check="Release"} 1
otto_checkup_check_status{section="Version",check="Check for Known Issues in Running Version"} 0
otto_checkup_check_status{section="Version",check="Mixing Ceph Versions"} 0
otto_checkup_check_status{section="Operating System",check="OS Support"} 0
otto_checkup_check_status{section="Capacity",check="Check Cluster Capacity Fullness"} 0
otto_checkup_check_status{section="Pools",check="Recommended Flags"} 3
otto_checkup_check_status{section="Pools",check="Pool Sizing"} 0
otto_checkup_check_status{section="Pools",check="Pool Autoscale Mode"} 0
otto_checkup_check_status{section="Pools",check="Minimum PG Count"} 0
otto_checkup_check_status{section="Pools",check="Pool CRUSH Failure Domain Buckets"} 0
otto_checkup_check_status{section="Pools",check="Zero weight buckets in CRUSH Tree"} 0
otto_checkup_check_status{section="Pools",check="CRUSH Tree Balanced"} 0
otto_checkup_check_status{section="Pools",check="Pool Average Object Size"} 0
otto_checkup_check_status{section="Pools",check="Pool Space Amplification"} 3
otto_checkup_check_status{section="Pools",check="Cache Tiering"} 0
otto_checkup_check_status{section="CephFS",check="Multi-MDS Safety"} 1
otto_checkup_check_status{section="MON Health",check="Monitor Committed Maps"} 0
otto_checkup_check_status{section="MON Health",check="Number of Monitors"} 0
otto_checkup_check_status{section="MON Health",check="Even Number of Monitors"} 0
otto_checkup_check_status{section="OSD Health",check="Check osdmap flags"} 0
otto_checkup_check_status{section="OSD Health",check="Check require_osd_release flag"} 0
otto_checkup_check_status{section="OSD Health",check="Check OSD Primary Affinity"} 0
otto_checkup_check_status{section="OSD Health",check="Check OSD Weights"} 3
otto_checkup_check_status{section="OSD Health",check="Check osdmap pg_upmap list"} 0
otto_checkup_check_status{section="OSD Health",check="Check BlueFS DB/Journal is on Flash"} 0
otto_checkup_check_status{section="OSD Health",check="Check OSD bluefs db size"} 0
otto_checkup_check_status{section="OSD Health",check="Check OSD bluefs wal size"} 0
otto_checkup_check_status{section="OSD Health",check="OSD bluestore min_alloc_size"} 0
otto_checkup_check_status{section="OSD Health",check="OSD host memory"} 0
otto_checkup_check_status{section="OSD Health",check="OSD host swap"} 0
otto_checkup_check_status{section="OSD Health",check="Check number of osdmaps stored"} 0
otto_checkup_check_status{section="OSD Health",check="Check CRUSH Tunables"} 0
otto_checkup_check_status{section="OSD Health",check="Dedicated Cluster Network"} 0

Signed-off-by: Joshua Blanch <joshua.blanch@clyso.com>
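
The sample lines above follow the Prometheus text exposition format: a `# HELP` and `# TYPE` line per metric, then one sample per label set. As a rough illustration of how the `otto_checkup_check_status` block could be produced, here is a minimal Python sketch. It is not otto's actual implementation; the `results` tuple shape and the helper names `escape_label_value` and `render_check_status` are hypothetical. Only the metric name and the status encoding are taken from the output above.

```python
# Status encoding as stated in the HELP line of the sample output.
STATUS_CODES = {"PASS": 0, "WARN": 1, "UNKNOWN": 2, "FAIL": 3}


def escape_label_value(value):
    """Escape backslash, double quote and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_check_status(results):
    """Render (section, check, status_name) tuples as exposition-format text.

    The tuple shape is an assumption made for this sketch.
    """
    lines = [
        "# HELP otto_checkup_check_status Check status (0=PASS, 1=WARN, 2=UNKNOWN, 3=FAIL)",
        "# TYPE otto_checkup_check_status gauge",
    ]
    for section, check, status in results:
        lines.append(
            'otto_checkup_check_status{section="%s",check="%s"} %d'
            % (escape_label_value(section), escape_label_value(check), STATUS_CODES[status])
        )
    # The exposition format expects the final line to end with a newline.
    return "\n".join(lines) + "\n"
```

Escaping matters here because check names are free-form text; a name containing a double quote would otherwise produce a line the textfile collector cannot parse.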
@JoshuaGabriel JoshuaGabriel requested a review from sam0044 June 10, 2026 21:30
@JoshuaGabriel

Collaborator Author

@sam0044 maybe we could add a JSON dashboard file that can be imported into Grafana once this is set up
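
A dashboard or alert built on these metrics has to translate the numeric `otto_checkup_check_status` values back into names. Below is a minimal Python sketch of that mapping, assuming the encoding stated in the HELP line of the PR description (0=PASS, 1=WARN, 2=UNKNOWN, 3=FAIL). `non_passing_checks` is a hypothetical helper, not part of otto, and the regular expression assumes label values contain no escaped quotes.

```python
import re

# Encoding taken from the HELP line in the PR description.
STATUS_NAMES = {0: "PASS", 1: "WARN", 2: "UNKNOWN", 3: "FAIL"}

# Matches one check-status sample; assumes no escaped quotes inside label values.
SAMPLE_RE = re.compile(
    r'^otto_checkup_check_status\{section="([^"]*)",check="([^"]*)"\} (\d+)$'
)


def non_passing_checks(text):
    """Return (section, check, status_name) for every check that is not PASS."""
    found = []
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if match is None:
            continue  # comment lines and other metrics are skipped
        code = int(match.group(3))
        if code != 0:
            found.append((match.group(1), match.group(2), STATUS_NAMES.get(code, "UNKNOWN")))
    return found
```

Fed the sample output from the PR description, this would list the WARN and FAIL checks, which is roughly what a status panel would highlight.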

Comment thread README.md

```bash
# /etc/cron.d/otto-checkup — refresh metrics every 15 minutes
*/15 * * * * root otto cluster checkup --format prometheus > /var/lib/node_exporter/otto.prom.$$ && mv /var/lib/node_exporter/otto.prom.$$ /var/lib/node_exporter/otto.prom
```

Collaborator Author

s/cluster// (the command is `otto checkup`, as in the PR description)

Also, maybe we should give the relevant cephadm path?

@JoshuaGabriel

Collaborator Author

cephadm instructions for the node-exporter textfile collector and Prometheus:

  # node-exporter.yaml
  service_type: node-exporter
  placement:
    host_pattern: '*'
  extra_entrypoint_args:
    - "--collector.textfile.directory=/var/lib/node_exporter/textfile_collector"
  extra_container_args:
    - "-v"
    - "/var/lib/node_exporter/textfile_collector:/var/lib/node_exporter/textfile_collector:z"

  $ ceph orch apply -i node-exporter.yaml

  # Write the textfile otto generates into the chosen directory; node-exporter
  # exposes it and Prometheus picks it up on the next scrape
  DIR=/var/lib/node_exporter/textfile_collector
  mkdir -p "$DIR"
  otto checkup --format prometheus > "$DIR/otto.prom.$$" && mv "$DIR/otto.prom.$$" "$DIR/otto.prom"
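
The write-to-a-temporary-name-then-`mv` step matters because the textfile collector may read the directory at any moment; renaming within the same directory ensures it never sees a half-written file. For anyone driving this from Python rather than a shell one-liner, here is a minimal sketch of the same pattern. `write_atomically` is a hypothetical helper, not part of otto, and the `0o644` mode is an assumption about what node-exporter needs in order to read the file.

```python
import os
import tempfile


def write_atomically(path, text):
    """Write text to path so that readers see the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the rename stays on one filesystem.
    # The name ends in random characters, not ".prom", so the collector ignores it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=os.path.basename(path) + ".")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())
        # mkstemp creates the file with mode 0600; widen it so another user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A caller would capture the output of `otto checkup --format prometheus` (for example with `subprocess.run`) and pass it to `write_atomically` with the `otto.prom` path inside the textfile collector directory.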

    

@JoshuaGabriel

Collaborator Author

Example in Grafana:
[image]
