Health check reports tiers 'healthy' while the vLLM engine loop crashes on every request

### Summary
`mlx-stack up`/`status` report a tier as `healthy` even when its engine loop is erroring on **every** inference request. The health probe appears to check only `/v1/models` (which responds even when the generation engine is dead), so a fully-broken server shows green.

### How I hit it
With `continuous_batching` enabled (see #51), the draft tier logged `Engine loop error: ArraysCache...` on every request and all completions hung — yet:
```
draft   qwen3.5-9b                  8000  healthy   3m 46s
judge   qwen3.5-27b-opus-distilled  8001  healthy   3m 38s
litellm proxy                       4000  healthy   3m 31s
```
A user sees all-green and has no signal that inference is 100% broken.

### Suggested fix
Health check should perform a minimal **generation** probe (e.g. `max_tokens: 1`) and require a valid completion before marking a tier healthy — not just a `/v1/models` (or port-open) check. This would also let `up` fail fast on issues like #51.

### Severity
High — it silently masks complete inference failure, which is the hardest class of problem for a user to diagnose.

**Environment**
- mlx-stack 0.3.8
- vllm-mlx v0.2.6
- mlx 0.31.1
- macOS 26.2 (arm64), Apple M4 Pro, 64 GB

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Health check reports tiers 'healthy' while the vLLM engine loop crashes on every request #52

Summary

How I hit it

Suggested fix

Severity

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Health check reports tiers 'healthy' while the vLLM engine loop crashes on every request #52

Description

Summary

How I hit it

Suggested fix

Severity

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions