Skip to content

✨ B4/Trustees > Add missing live/readiness probes#2673

Open
yuvalkom-M wants to merge 7 commits into
mainfrom
feat/meta-6677/main
Open

✨ B4/Trustees > Add missing live/readiness probes#2673
yuvalkom-M wants to merge 7 commits into
mainfrom
feat/meta-6677/main

Conversation

@yuvalkom-M
Copy link
Copy Markdown
Contributor

@yuvalkom-M yuvalkom-M marked this pull request as ready for review May 18, 2026 07:31
Copilot AI review requested due to automatic review settings May 18, 2026 07:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds missing HTTP live/readiness probe endpoints for trustee-related services and refactors probe checks to reuse shared helpers in sequent-core, improving operational observability for deployments tied to meta issue #6677.

Changes:

  • Refactor Windmill readiness checks to use shared sequent-core probe helpers (Postgres SELECT 1, S3 JWKS fetch).
  • Add new probe HTTP servers to b3 and braid (readiness checks dependencies; braid additionally checks disk headroom).
  • Introduce sequent-core::services::setup_probe module and expand probe feature dependencies accordingly.

Reviewed changes

Copilot reviewed 15 out of 17 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
packages/windmill/src/services/probe.rs Updates DB/S3 readiness checks to use new shared helpers (but currently contains a compile-breaking bug).
packages/sequent-core/src/services/setup_probe.rs New shared probe helper functions for Postgres SELECT 1 and S3 JWKS retrieval.
packages/sequent-core/src/services/mod.rs Exposes the new setup_probe module under the probe feature.
packages/sequent-core/Cargo.toml Extends probe feature to include dependencies needed by setup_probe.
packages/Cargo.lock Updates lockfile for new/adjusted dependencies (e.g., warp, sequent-core).
packages/braid/src/probe.rs Adds a Warp-based probe server with readiness checks for B3 reachability and disk space.
packages/braid/src/lib.rs Exports the new probe module.
packages/braid/src/bin/main.rs Starts the braid probe server during startup.
packages/braid/src/bin/main_m.rs Starts the braid probe server during startup (and removes redundant store_root init).
packages/braid/Dockerfile Copies sequent-core into the build context (needed due to new optional path dep in workspace).
packages/braid/Cargo.toml Adds warp and libc dependencies for the new probe implementation.
packages/b3/src/probe.rs Adds a Warp-based probe server with readiness checks for Postgres and B3 index reachability.
packages/b3/src/lib.rs Exposes probe module behind the probe feature.
packages/b3/src/bin/server.rs Starts the B3 probe server when running in server mode.
packages/b3/Cargo.toml Introduces probe feature and dependencies (warp, optional sequent-core).
.devcontainer/.env.development Adds dev env variables for B3 and braid probe configuration.
Comments suppressed due to low confidence (2)

packages/b3/src/probe.rs:200

  • /live is currently wired to the full readiness(...) check (PostgreSQL + boards). In Kubernetes this can cause unnecessary restarts during transient dependency outages; typically liveness should only assert the process is running/serving, while readiness gates traffic on dependencies. Consider making set_live always return true (or only check internal invariants) and keep dependency checks in /ready.
    if let Ok(addr) = addr {
        let ph = ProbeHandler::new(&live_path, &ready_path, addr);
        let f = ph.future();
        let pg0 = pg.clone();
        ph.set_live(move || {
            let pg = pg0.clone();
            Box::pin(async move { readiness(pg).await })
        })
        .await;

        let pg1 = pg;
        ph.set_ready(move || {
            let pg = pg1.clone();
            Box::pin(async move { readiness(pg).await })
        })

packages/braid/src/probe.rs:264

  • /live is currently set to run the full readiness check (B3 connectivity + disk headroom). If the B3 endpoint is temporarily unreachable, Kubernetes liveness failures can trigger restarts even though the process is healthy. Consider making /live a lightweight self-check (always true once started) and reserve dependency checks for /ready.
    if let Ok(addr) = addr {
        let ph = ProbeHandler::new(&live_path, &ready_path, addr);
        let f = ph.future();
        let url0 = b3_url.clone();
        let root0 = store_root.clone();
        let min0 = min_free_bytes;
        ph.set_live(move || {
            let url = url0.clone();
            let root = root0.clone();
            let min = min0;
            Box::pin(async move { readiness(url, root, min).await })
        })
        .await;

        let url1 = b3_url;
        let root1 = store_root;
        let min1 = min_free_bytes;
        ph.set_ready(move || {
            let url = url1.clone();
            let root = root1.clone();
            let min = min1;
            Box::pin(async move { readiness(url, root, min).await })
        })
        .await;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/windmill/src/services/probe.rs Outdated
Comment on lines +117 to +118
Ok(config) => {
Some(check_postgres_select_one_no_tls(&cfg).await)},
Comment thread packages/b3/src/probe.rs Outdated
Comment on lines +24 to +113
pub struct ProbeHandler {
address: SocketAddr,
live_path: String,
ready_path: String,
is_live: Arc<
Mutex<
Box<
dyn Fn() -> std::pin::Pin<Box<dyn std::future::Future<Output = bool> + Send>>
+ Send
+ Sync,
>,
>,
>,
is_ready: Arc<
Mutex<
Box<
dyn Fn() -> std::pin::Pin<Box<dyn std::future::Future<Output = bool> + Send>>
+ Send
+ Sync,
>,
>,
>,
}

impl ProbeHandler {
pub fn new(live_path: &str, ready_path: &str, address: impl Into<SocketAddr>) -> ProbeHandler {
ProbeHandler {
address: address.into(),
live_path: live_path.to_string(),
ready_path: ready_path.to_string(),
is_live: Arc::new(Mutex::new(Box::new(|| Box::pin(async { false })))),
is_ready: Arc::new(Mutex::new(Box::new(|| Box::pin(async { false })))),
}
}

pub fn future(&self) -> impl Future<Output = ()> {
let il = Arc::clone(&self.is_live);
let ir = Arc::clone(&self.is_ready);

let filter = warp::get().and(
warp::path(self.live_path.to_string())
.and_then(move || {
let il = Arc::clone(&il);
async move {
let is_live = il.lock().await;
let is_live_future = is_live();
if is_live_future.await {
Ok::<_, warp::Rejection>(
Response::builder()
.status(warp::http::StatusCode::OK)
.body("Live")
.unwrap(),
)
} else {
Ok::<_, warp::Rejection>(
Response::builder()
.status(warp::http::StatusCode::BAD_REQUEST)
.body("Not live")
.unwrap(),
)
}
}
})
.or(warp::path(self.ready_path.to_string()).and_then(move || {
let ir = Arc::clone(&ir);
async move {
let is_ready = ir.lock().await;
let is_ready_future = is_ready();
if is_ready_future.await {
Ok::<_, warp::Rejection>(
Response::builder()
.status(warp::http::StatusCode::OK)
.body("Ready")
.unwrap(),
)
} else {
Ok::<_, warp::Rejection>(
Response::builder()
.status(warp::http::StatusCode::BAD_REQUEST)
.body("Not ready")
.unwrap(),
)
}
}
})),
);

warp::serve(filter).bind(self.address)
}

Comment thread packages/braid/src/probe.rs Outdated
Comment on lines +44 to +80
pub struct ProbeHandler {
address: SocketAddr,
live_path: String,
ready_path: String,
is_live: Arc<
Mutex<
Box<
dyn Fn() -> std::pin::Pin<Box<dyn std::future::Future<Output = bool> + Send>>
+ Send
+ Sync,
>,
>,
>,
is_ready: Arc<
Mutex<
Box<
dyn Fn() -> std::pin::Pin<Box<dyn std::future::Future<Output = bool> + Send>>
+ Send
+ Sync,
>,
>,
>,
}

impl ProbeHandler {
pub fn new(live_path: &str, ready_path: &str, address: impl Into<SocketAddr>) -> ProbeHandler {
ProbeHandler {
address: address.into(),
live_path: live_path.to_string(),
ready_path: ready_path.to_string(),
is_live: Arc::new(Mutex::new(Box::new(|| Box::pin(async { false })))),
is_ready: Arc::new(Mutex::new(Box::new(|| Box::pin(async { false })))),
}
}

pub fn future(&self) -> impl Future<Output = ()> {
let il = Arc::clone(&self.is_live);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants