Python implementation of COMBSS (Continuous Optimisation for Best Subset Selection) for generalised linear models. COMBSS reformulates the NP-hard discrete subset selection problem as a continuous optimisation over the hypercube [0,1]^p, making it scalable to high-dimensional settings with p >> n.
Supported model types:
- Linear regression (continuous response)
- Binary logistic regression (two-class classification)
- Multinomial logistic regression (multi-class, C > 2)
Moka, Liquet, Zhu & Muller (2024). COMBSS: best subset selection via continuous optimization. Statistics and Computing.
Mathur, Liquet, Muller & Moka (2026). Parsimonious Subset Selection for Generalized Linear Models with Biomedical Applications. arXiv preprint.
pip install combssimport combss
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=1000, n_features=50, n_informative=5,
noise=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.4)
# Frank-Wolfe method (default) — with validation data
model = combss.linear.model()
model.fit(X_train, y_train, X_val=X_val, y_val=y_val, q=10)
print("Best subset:", model.subset)
print("Best MSE:", model.mse)
# All subsets for k = 1, ..., q
for k, feat in enumerate(model.subset_list, 1):
print(f"k={k:2d} features={feat.tolist()}")Without validation data, subset_list is still available
but subset, mse, and coef_ will be None:
model = combss.linear.model()
model.fit(X_train, y_train, q=10)
for k, feat in enumerate(model.subset_list, 1):
print(f"k={k:2d} features={feat.tolist()}")To use the original Adam + dynamic-lambda method from v1.x:
model = combss.linear.model()
model.fit(X_train, y_train, X_val=X_val, y_val=y_val, q=10, method='original')
print("Best subset:", model.subset)
print("Validation MSE:", model.mse)
print("Optimal lambda:", model.lambda_)import combss
# X_train, X_val : (n, p) feature matrices
# y_train, y_val : binary labels {0, 1}
model = combss.logistic.model()
model.fit(X_train, y_train, X_val=X_val, y_val=y_val, q=15)
print("Best subset:", model.subset)
print("Best accuracy:", model.accuracy)
for k, feat in enumerate(model.subset_list, 1):
print(f"k={k:2d} features={feat.tolist()}")import combss
import numpy as np
import pandas as pd
# Example with the Khan SRBCT dataset (4 tumour classes, 2308 genes)
train = pd.read_csv('data/Khan_train.csv')
test = pd.read_csv('data/Khan_test.csv')
y_train = train.iloc[:, 0].values.astype(int)
X_train = train.iloc[:, 1:].values.astype(float)
y_test = test.iloc[:, 0].values.astype(int)
X_test = test.iloc[:, 1:].values.astype(float)
C = len(np.unique(y_train))
model = combss.multinomial.model()
model.fit(X_train, y_train, X_val=X_test, y_val=y_test, q=20, C=C)
print("Best subset:", model.subset)
print("Best accuracy:", model.accuracy)
for k, feat in enumerate(model.subset_list, 1):
print(f"k={k:2d} features={feat.tolist()}")Select the ridge penalty using leave-one-out cross-validation:
import combss
# Classification
best_lam, best_lam_per_k, cv_df = combss.cv.select_lambda(
X_train, y_train,
q=15, C=4,
model_type='multinomial',
)
# Linear regression
best_lam, best_lam_per_k, cv_df = combss.cv.select_lambda(
X_train, y_train,
q=15,
model_type='linear',
)Best subset selection for linear regression.
model = combss.linear.model()
model.fit(X_train, y_train, q=10) # FW (default), no validation
model.fit(X_train, y_train, X_val, y_val, q=10) # FW with validation
model.fit(X_train, y_train, X_val, y_val, q=10, # Original method
method='original')Sparsity is controlled by k (model size): COMBSS returns selected features for
each k = 1, ..., q. The lam_ridge parameter is an optional ridge regularisation
on the coefficients in the inner solver, unrelated to the sparsity penalty lambda
used in the original method.
| Parameter | Default | Description |
|---|---|---|
q |
min(n,p) | Maximum subset size |
Niter |
25 | Number of homotopy iterations |
lam_ridge |
0 | Ridge penalty on coefficients in the inner solver |
alpha |
0.01 | Frank-Wolfe step size |
scale |
True | Column-normalise X |
mandatory_features |
None | 1-indexed features always included |
inner_tol |
1e-4 | Inner solver tolerance |
patience |
20 | Early stopping: stop after this many k with no improvement. Requires validation data. Set to None to disable |
min_k |
20 | Minimum k values before early stopping can trigger. Automatically disabled if min_k + patience > q |
verbose |
True | Print progress |
Sparsity is controlled by lambda: a grid of lambda values is searched, and each
lambda yields a different subset. The best subset is selected by validation MSE.
| Parameter | Default | Description |
|---|---|---|
q |
min(n,p) | Maximum subset size |
nlam |
50 | Number of lambda values in dynamic grid |
scaling |
True | Enable feature scaling |
tau |
0.5 | Threshold parameter |
delta_frac |
1 | n/delta in objective function |
eta |
0.001 | Truncation parameter |
gd_patience |
10 | Patience for Adam termination |
gd_maxiter |
1000 | Maximum Adam iterations |
gd_tol |
1e-5 | Adam convergence tolerance |
cg_maxiter |
n | Conjugate gradient max iterations |
cg_tol |
1e-5 | Conjugate gradient tolerance |
| Attribute | Description |
|---|---|
subset |
Best subset (0-indexed). Requires validation data, otherwise None |
coef_ |
Regression coefficients (length p, zeros for unselected). Requires validation data, otherwise None |
mse |
Validation MSE for best subset. Requires validation data, otherwise None |
lam_ridge |
Ridge penalty value used |
subset_list |
List of subsets for k = 1, ..., q (0-indexed). May be shorter if early stopping triggered |
| Attribute | Description |
|---|---|
subset |
Best subset (0-indexed) |
coef_ |
Regression coefficients (length p, zeros for unselected) |
mse |
Validation MSE for best subset |
lambda_ |
Optimal lambda value |
subset_list |
List of subsets across the lambda grid (0-indexed) |
lambda_list |
Lambda grid values |
Best subset selection for binary logistic regression.
model = combss.logistic.model()
model.fit(X_train, y_train, q=15) # Without validation
model.fit(X_train, y_train, X_val, y_val, q=15) # With validationLabels y must be binary {0, 1}. Parameters are the same as the Frank-Wolfe method
in combss.linear.model.
| Attribute | Description |
|---|---|
subset |
Best subset (0-indexed). Requires validation data, otherwise None |
coef_ |
Logistic coefficients (length p, zeros for unselected). Requires validation data, otherwise None |
accuracy |
Validation accuracy for best subset. Requires validation data, otherwise None |
lam_ridge |
Ridge penalty value used |
subset_list |
List of subsets for k = 1, ..., q (0-indexed) |
Best subset selection for multinomial logistic regression.
model = combss.multinomial.model()
model.fit(X_train, y_train, q=20, C=4) # Without validation
model.fit(X_train, y_train, X_val, y_val, q=20, C=4) # With validationLabels y must be in {1, ..., C}. Parameters are the same as the Frank-Wolfe method
in combss.linear.model, plus:
| Additional Parameter | Default | Description |
|---|---|---|
C |
inferred | Number of classes |
| Attribute | Description |
|---|---|
subset |
Best subset (0-indexed). Requires validation data, otherwise None |
coef_ |
Multinomial coefficients (shape (C, p), zeros for unselected). Requires validation data, otherwise None |
accuracy |
Validation accuracy for best subset. Requires validation data, otherwise None |
lam_ridge |
Ridge penalty value used |
subset_list |
List of subsets for k = 1, ..., q (0-indexed) |
LOOCV-based ridge penalty selection.
best_lam, best_lam_per_k, results_df = combss.cv.select_lambda(
X, y, q, C=None, lambda_grid=None, model_type='multinomial', ...
)| Parameter | Default | Description |
|---|---|---|
X |
-- | Feature matrix (n, p), no intercept |
y |
-- | Response / labels |
q |
-- | Maximum subset size |
C |
None | Number of classes (required for classification) |
lambda_grid |
[0] + logspace(-3,1,10) | Candidate ridge penalty values |
Niter |
50 | Homotopy iterations |
model_type |
'multinomial' | 'logit', 'multinomial', or 'linear' |
lambda_refit |
0 | Ridge penalty for LOOCV refit |
Returns: (best_lambda, best_lambda_per_k, results_df)
- Classification:
results_dfhas columnslambda, k, loocv_acc, selected - Linear:
results_dfhas columnslambda, k, loocv_mse, selected
Performance metrics for variable selection evaluation.
result = combss.metrics.performance_metrics(X, beta_true, beta_pred)
# result keys: pe, mcc, acc, sens, spec, f1, precThe Frank-Wolfe homotopy algorithm operates as follows:
- Initialise t = (k/p, ..., k/p) -- centroid of the k-sparse simplex T_k
- For i = 1, ..., N (homotopy loop):
- Set delta_i on a geometric schedule from delta_min to delta_max
- Solve the ridge-penalised GLM inner problem at current t
- Compute the Danskin gradient (no Hessian required)
- Find the Frank-Wolfe vertex (k smallest gradient components)
- Update: t <- (1 - alpha) t + alpha s
- Select the k features with the largest final t values
The penalty schedule is auto-calibrated from the spectral norm of X, so no tuning is needed in practice.
The intercept is handled internally and is not subject to selection. For the Frank-Wolfe method, an intercept column is prepended automatically. For the original method, the intercept is treated the same as other features.
- Python 3.8+
- NumPy (>= 1.21.0)
- SciPy (>= 1.7.0)
- scikit-learn (>= 1.0.0)
- pandas (>= 1.3.0)
- Sarat Moka
- Anant Mathur
- Hua Yang Hu
Contributions are welcome! Please open an issue or submit a pull request at github.com/saratmoka/combss.
Apache 2.0
