obliquetree

obliquetree is an advanced decision tree implementation designed to provide high-performance and interpretable models. It supports both classification and regression tasks, enabling a wide range of applications. By offering traditional and oblique splits, it ensures flexibility and improved generalization with shallow trees. This makes it a powerful alternative to regular decision trees.

Getting Started

obliquetree combines advanced capabilities with efficient performance. It supports oblique splits, leveraging a custom L-BFGS optimization routine to determine the best linear weights for splits, ensuring both speed and accuracy.

In traditional mode, without oblique splits, obliquetree outperforms scikit-learn in terms of speed and adds support for categorical variables, providing a significant advantage over many traditional decision tree implementations.

When the oblique feature is enabled, obliquetree dynamically selects the optimal split type between oblique and traditional splits. If no weights can be found to reduce impurity, it defaults to an axis-aligned split, ensuring robustness and adaptability in various scenarios.

In very large trees (e.g., depth 10 or more), the performance of obliquetree may converge closely with traditional trees. The true strength of obliquetree lies in their ability to perform exceptionally well at shallower depths, offering improved generalization with fewer splits. Moreover, thanks to linear projections, obliquetree significantly outperform traditional trees when working with datasets that exhibit linear relationships.

Installation

To install obliquetree, use the following pip command:

pip install obliquetree

Using the obliquetree library is simple and intuitive. Here's a more generic example that works for both classification and regression:

from obliquetree import Classifier, Regressor

# Initialize the model (Classifier or Regressor)
model = Classifier(  # Replace "Classifier" with "Regressor" if performing regression
    use_oblique=True,       # Enable oblique splits
    max_depth=2,            # Set the maximum depth of the tree
    n_pair=2,               # Number of feature pairs for optimization
    top_k=None,             # Optional: keep the default oblique feature screening heuristic
    random_state=42,        # Set a random state for reproducibility
    categories=[0, 10, 32], # Specify which features are categorical
)

# Train the model on the training dataset
model.fit(X_train, y_train)

# Predict on the test dataset
y_pred = model.predict(X_test)

Documentation

For example usage, API details, comparisons with axis-aligned trees, and in-depth insights into the algorithmic foundation, we strongly recommend referring to the full documentation.

Key Features

Oblique Splits
Perform oblique splits using linear combinations of features to capture complex patterns in data. Supports both linear and soft decision tree objectives for flexible and accurate modeling.
Axis-Aligned Splits
Offers conventional (axis-aligned) splits, enabling users to leverage standard decision tree behavior for simplicity and interpretability.
Feature Constraints
Limit the number of features used in oblique splits with the n_pair parameter, promoting simpler, more interpretable tree structures while retaining predictive power.
Seamless Categorical Feature Handling
Natively supports categorical columns with minimal preprocessing. Only label encoding is required, removing the need for extensive data transformation.
Robust Handling of Missing Values
Automatically assigns NaN values to the optimal leaf for axis-aligned splits.
Customizable Tree Structures
The flexible API empowers users to design their own tree architectures easily.
Exact Equivalence with scikit-learn
Guarantees results identical to scikit-learn's decision trees when oblique and categorical splitting are disabled.
Parallel Split Search Utilizes OpenMP-based parallelism during tree construction to evaluate splits in parallel across features, enabling significant speedups on multi-core systems.
Optimized Performance Outperforms scikit-learn in terms of speed and efficiency when oblique and categorical splitting are disabled:
- Up to 600% faster for datasets with float columns.
- Up to 400% faster for datasets with integer columns.

Contributing

Contributions are welcome! If you'd like to improve obliquetree or suggest new features, feel free to fork the repository and submit a pull request.

License

obliquetree is released under the MIT License. See the LICENSE file for more details.

Name		Name	Last commit message	Last commit date
Latest commit History 88 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
obliquetree		obliquetree
tests		tests
.DS_Store		.DS_Store
.gitattributes		.gitattributes
.gitignore		.gitignore
.readthedocs.yaml		.readthedocs.yaml
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

obliquetree

Getting Started

Installation

Documentation

Key Features

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

obliquetree

Getting Started

Installation

Documentation

Key Features

Contributing

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages