[FEATURE] Cluster Validation Metrics & Automated Group Detection #31

## Problem

ProtSpace visualizes protein embeddings but provides no quantitative measure of how well biological annotations separate in embedding space, nor any way to automatically discover natural groupings.

## Proposed Solution

### 1. Cluster Validation Metrics

Compute standard clustering quality scores for any selected feature/annotation:

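- **Silhouette Score**: per-point cohesion vs. separation, averaged over all points; ranges from -1 to 1 (higher = better)
- **Davies-Bouldin Index (DBI)**: average similarity of each cluster to its most similar cluster, i.e. within-cluster scatter relative to between-centroid distance (lower = better, minimum 0)
- **Calinski-Harabasz Index (CH)**: ratio of between-cluster to within-cluster dispersion (higher = better)
- **ARI / NMI**: Adjusted Rand Index and Normalized Mutual Information; agreement between a computed clustering and ground-truth labels (external validation)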

> **Key:** Metrics should be computed on the **original high-dimensional embeddings**, not on UMAP/t-SNE projections, which distort distances.
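
As a rough illustration of the three internal metrics, the sketch below scores an annotation directly against high-dimensional vectors using scikit-learn. The helper name `internal_metrics` and the synthetic `make_blobs` data are assumptions for demonstration only; in ProtSpace the inputs would be the loaded embeddings and the selected feature column.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def internal_metrics(embeddings, labels):
    """Score how well `labels` separate in the original embedding space.

    Hypothetical helper, not part of ProtSpace.
    """
    embeddings = np.asarray(embeddings)
    labels = np.asarray(labels)
    n_groups = np.unique(labels).size
    # The scores are undefined for a single group or one group per point.
    if n_groups < 2 or n_groups >= len(labels):
        raise ValueError("need between 2 and n_samples - 1 distinct labels")
    return {
        "silhouette": float(silhouette_score(embeddings, labels)),
        "davies_bouldin": float(davies_bouldin_score(embeddings, labels)),
        "calinski_harabasz": float(calinski_harabasz_score(embeddings, labels)),
    }


# Synthetic stand-in for real embeddings: 300 points, 64 dimensions, 3 groups.
X, y = make_blobs(n_samples=300, n_features=64, centers=3, random_state=0)
scores = internal_metrics(X, y)
print(scores)
```

The labels may be strings (for example Pfam identifiers); scikit-learn encodes them internally. Silhouette is quadratic in the number of points, so for large datasets the `sample_size` argument of `silhouette_score` may be worth considering.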

### 2. Automated Optimal *k* Detection

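- **Elbow Method**: plot K-means inertia against *k* and detect the knee
- **Silhouette Analysis**: mean silhouette score for *k* = 2..k_max; pick the maximum
- **Gap Statistic**: compare within-cluster dispersion to its expectation under a null reference distribution
- **BIC/AIC via GMM**: fit a Gaussian mixture per *k* and pick the *k* that minimizes the information criterion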
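
A minimal sketch of how such a scan over *k* might look, covering inertia (for the elbow curve), mean silhouette, and GMM BIC. The function name `scan_k`, the diagonal covariance choice, and the synthetic data are assumptions; knee detection and the gap statistic are left out here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture


def scan_k(embeddings, k_min=2, k_max=10, seed=0):
    """Return one row per k with inertia, mean silhouette, and GMM BIC.

    Hypothetical helper, not part of ProtSpace.
    """
    rows = []
    for k in range(k_min, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(embeddings)
        # Diagonal covariances keep the parameter count manageable in
        # high dimensions; this is an assumption, not a recommendation.
        gmm = GaussianMixture(
            n_components=k, covariance_type="diag", random_state=seed
        ).fit(embeddings)
        rows.append(
            {
                "k": k,
                "inertia": float(km.inertia_),
                "silhouette": float(silhouette_score(embeddings, km.labels_)),
                "bic": float(gmm.bic(embeddings)),
            }
        )
    return rows


# Synthetic stand-in for real embeddings: 4 well-separated groups.
X, _ = make_blobs(n_samples=300, n_features=16, centers=4, random_state=0)
rows = scan_k(X, k_min=2, k_max=8)

best_by_silhouette = max(rows, key=lambda r: r["silhouette"])["k"]
best_by_bic = min(rows, key=lambda r: r["bic"])["k"]
print(best_by_silhouette, best_by_bic)
```

The criteria can disagree on real data, so reporting the full per-*k* table alongside the suggested value is probably more useful than returning a single number.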

### 3. Auto-Clustering

Run K-means or GMM with the detected *k*, or HDBSCAN (which determines the number of clusters itself and takes no *k*), and store the results as a new feature column in the parquetbundle for immediate visualization.
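
The sketch below shows one way the auto-clustering step could produce such a column with K-means, and how ARI/NMI could then compare it against an existing annotation. The `columns` dict is a stand-in for the real feature table; how the parquetbundle is actually written is not shown, and the data is synthetic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

# Synthetic stand-in: 200 embeddings with a known 3-class annotation.
X, annotation = make_blobs(n_samples=200, n_features=32, centers=3, random_state=1)

k = 3  # assumed to come from the k-detection step
auto_cluster = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# One label per protein, in input order. This array is what would be
# stored as the new `auto_cluster` feature column.
columns = {"annotation": annotation, "auto_cluster": auto_cluster}

# External validation: both scores ignore how cluster ids are numbered.
ari = adjusted_rand_score(columns["annotation"], columns["auto_cluster"])
nmi = normalized_mutual_info_score(columns["annotation"], columns["auto_cluster"])
print(f"ARI={ari:.3f} NMI={nmi:.3f}")
```

Because ARI and NMI are invariant to label permutation, the arbitrary cluster ids produced by K-means need no alignment with the annotation values.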

## CLI Integration
```bash
protspace-local -i embeddings.h5 -o out/ --cluster-metrics --feature pfam
protspace-local -i embeddings.h5 -o out/ --auto-cluster --k-range 2:20
```

**Output:** metrics as JSON/CSV, elbow/silhouette plots as PNG/SVG, new `auto_cluster` feature in the parquetbundle.

## Implementation Notes

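- Validation metrics (silhouette, DBI, CH, ARI, NMI) and GMM-based BIC/AIC are available via scikit-learn (already a dependency); the gap statistic and knee detection are not part of scikit-learn and would need a small custom implementation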
- HDBSCAN via `sklearn.cluster.HDBSCAN` (sklearn ≥ 1.3)
- Web UI integration (metrics panel, interactive plots, auto-cluster button) → separate `protspace_web` issue
