✨ B4/Trustees > Add missing live/readiness probes#2673
Open
yuvalkom-M wants to merge 7 commits into
Open
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Adds missing HTTP live/readiness probe endpoints for trustee-related services and refactors probe checks to reuse shared helpers in sequent-core, improving operational observability for deployments tied to meta issue #6677.
Changes:
- Refactor Windmill readiness checks to use shared
sequent-coreprobe helpers (PostgresSELECT 1, S3 JWKS fetch). - Add new probe HTTP servers to
b3andbraid(readiness checks dependencies;braidadditionally checks disk headroom). - Introduce
sequent-core::services::setup_probemodule and expandprobefeature dependencies accordingly.
Reviewed changes
Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/windmill/src/services/probe.rs | Updates DB/S3 readiness checks to use new shared helpers (but currently contains a compile-breaking bug). |
| packages/sequent-core/src/services/setup_probe.rs | New shared probe helper functions for Postgres SELECT 1 and S3 JWKS retrieval. |
| packages/sequent-core/src/services/mod.rs | Exposes the new setup_probe module under the probe feature. |
| packages/sequent-core/Cargo.toml | Extends probe feature to include dependencies needed by setup_probe. |
| packages/Cargo.lock | Updates lockfile for new/adjusted dependencies (e.g., warp, sequent-core). |
| packages/braid/src/probe.rs | Adds a Warp-based probe server with readiness checks for B3 reachability and disk space. |
| packages/braid/src/lib.rs | Exports the new probe module. |
| packages/braid/src/bin/main.rs | Starts the braid probe server during startup. |
| packages/braid/src/bin/main_m.rs | Starts the braid probe server during startup (and removes redundant store_root init). |
| packages/braid/Dockerfile | Copies sequent-core into the build context (needed due to new optional path dep in workspace). |
| packages/braid/Cargo.toml | Adds warp and libc dependencies for the new probe implementation. |
| packages/b3/src/probe.rs | Adds a Warp-based probe server with readiness checks for Postgres and B3 index reachability. |
| packages/b3/src/lib.rs | Exposes probe module behind the probe feature. |
| packages/b3/src/bin/server.rs | Starts the B3 probe server when running in server mode. |
| packages/b3/Cargo.toml | Introduces probe feature and dependencies (warp, optional sequent-core). |
| .devcontainer/.env.development | Adds dev env variables for B3 and braid probe configuration. |
Comments suppressed due to low confidence (2)
packages/b3/src/probe.rs:200
/liveis currently wired to the fullreadiness(...)check (PostgreSQL + boards). In Kubernetes this can cause unnecessary restarts during transient dependency outages; typically liveness should only assert the process is running/serving, while readiness gates traffic on dependencies. Consider makingset_livealways returntrue(or only check internal invariants) and keep dependency checks in/ready.
if let Ok(addr) = addr {
let ph = ProbeHandler::new(&live_path, &ready_path, addr);
let f = ph.future();
let pg0 = pg.clone();
ph.set_live(move || {
let pg = pg0.clone();
Box::pin(async move { readiness(pg).await })
})
.await;
let pg1 = pg;
ph.set_ready(move || {
let pg = pg1.clone();
Box::pin(async move { readiness(pg).await })
})
packages/braid/src/probe.rs:264
/liveis currently set to run the full readiness check (B3 connectivity + disk headroom). If the B3 endpoint is temporarily unreachable, Kubernetes liveness failures can trigger restarts even though the process is healthy. Consider making/livea lightweight self-check (always true once started) and reserve dependency checks for/ready.
if let Ok(addr) = addr {
let ph = ProbeHandler::new(&live_path, &ready_path, addr);
let f = ph.future();
let url0 = b3_url.clone();
let root0 = store_root.clone();
let min0 = min_free_bytes;
ph.set_live(move || {
let url = url0.clone();
let root = root0.clone();
let min = min0;
Box::pin(async move { readiness(url, root, min).await })
})
.await;
let url1 = b3_url;
let root1 = store_root;
let min1 = min_free_bytes;
ph.set_ready(move || {
let url = url1.clone();
let root = root1.clone();
let min = min1;
Box::pin(async move { readiness(url, root, min).await })
})
.await;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+117
to
+118
| Ok(config) => { | ||
| Some(check_postgres_select_one_no_tls(&cfg).await)}, |
Comment on lines
+24
to
+113
| pub struct ProbeHandler { | ||
| address: SocketAddr, | ||
| live_path: String, | ||
| ready_path: String, | ||
| is_live: Arc< | ||
| Mutex< | ||
| Box< | ||
| dyn Fn() -> std::pin::Pin<Box<dyn std::future::Future<Output = bool> + Send>> | ||
| + Send | ||
| + Sync, | ||
| >, | ||
| >, | ||
| >, | ||
| is_ready: Arc< | ||
| Mutex< | ||
| Box< | ||
| dyn Fn() -> std::pin::Pin<Box<dyn std::future::Future<Output = bool> + Send>> | ||
| + Send | ||
| + Sync, | ||
| >, | ||
| >, | ||
| >, | ||
| } | ||
|
|
||
| impl ProbeHandler { | ||
| pub fn new(live_path: &str, ready_path: &str, address: impl Into<SocketAddr>) -> ProbeHandler { | ||
| ProbeHandler { | ||
| address: address.into(), | ||
| live_path: live_path.to_string(), | ||
| ready_path: ready_path.to_string(), | ||
| is_live: Arc::new(Mutex::new(Box::new(|| Box::pin(async { false })))), | ||
| is_ready: Arc::new(Mutex::new(Box::new(|| Box::pin(async { false })))), | ||
| } | ||
| } | ||
|
|
||
| pub fn future(&self) -> impl Future<Output = ()> { | ||
| let il = Arc::clone(&self.is_live); | ||
| let ir = Arc::clone(&self.is_ready); | ||
|
|
||
| let filter = warp::get().and( | ||
| warp::path(self.live_path.to_string()) | ||
| .and_then(move || { | ||
| let il = Arc::clone(&il); | ||
| async move { | ||
| let is_live = il.lock().await; | ||
| let is_live_future = is_live(); | ||
| if is_live_future.await { | ||
| Ok::<_, warp::Rejection>( | ||
| Response::builder() | ||
| .status(warp::http::StatusCode::OK) | ||
| .body("Live") | ||
| .unwrap(), | ||
| ) | ||
| } else { | ||
| Ok::<_, warp::Rejection>( | ||
| Response::builder() | ||
| .status(warp::http::StatusCode::BAD_REQUEST) | ||
| .body("Not live") | ||
| .unwrap(), | ||
| ) | ||
| } | ||
| } | ||
| }) | ||
| .or(warp::path(self.ready_path.to_string()).and_then(move || { | ||
| let ir = Arc::clone(&ir); | ||
| async move { | ||
| let is_ready = ir.lock().await; | ||
| let is_ready_future = is_ready(); | ||
| if is_ready_future.await { | ||
| Ok::<_, warp::Rejection>( | ||
| Response::builder() | ||
| .status(warp::http::StatusCode::OK) | ||
| .body("Ready") | ||
| .unwrap(), | ||
| ) | ||
| } else { | ||
| Ok::<_, warp::Rejection>( | ||
| Response::builder() | ||
| .status(warp::http::StatusCode::BAD_REQUEST) | ||
| .body("Not ready") | ||
| .unwrap(), | ||
| ) | ||
| } | ||
| } | ||
| })), | ||
| ); | ||
|
|
||
| warp::serve(filter).bind(self.address) | ||
| } | ||
|
|
Comment on lines
+44
to
+80
| pub struct ProbeHandler { | ||
| address: SocketAddr, | ||
| live_path: String, | ||
| ready_path: String, | ||
| is_live: Arc< | ||
| Mutex< | ||
| Box< | ||
| dyn Fn() -> std::pin::Pin<Box<dyn std::future::Future<Output = bool> + Send>> | ||
| + Send | ||
| + Sync, | ||
| >, | ||
| >, | ||
| >, | ||
| is_ready: Arc< | ||
| Mutex< | ||
| Box< | ||
| dyn Fn() -> std::pin::Pin<Box<dyn std::future::Future<Output = bool> + Send>> | ||
| + Send | ||
| + Sync, | ||
| >, | ||
| >, | ||
| >, | ||
| } | ||
|
|
||
| impl ProbeHandler { | ||
| pub fn new(live_path: &str, ready_path: &str, address: impl Into<SocketAddr>) -> ProbeHandler { | ||
| ProbeHandler { | ||
| address: address.into(), | ||
| live_path: live_path.to_string(), | ||
| ready_path: ready_path.to_string(), | ||
| is_live: Arc::new(Mutex::new(Box::new(|| Box::pin(async { false })))), | ||
| is_ready: Arc::new(Mutex::new(Box::new(|| Box::pin(async { false })))), | ||
| } | ||
| } | ||
|
|
||
| pub fn future(&self) -> impl Future<Output = ()> { | ||
| let il = Arc::clone(&self.is_live); |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
https://github.com/sequentech/meta/issues/6677