Releases: w2rc/ucluster
v1.0.0
This is a substantial modernization release with breaking API and dependency changes. Pin `ucluster<1.0` if you need the old behavior.
Removed
- `TransformerCluster` has been removed. `FuzzyClusterer` is now the transformer-based clusterer; there is no separate class. Update imports: `from ucluster import TransformerCluster` → `from ucluster import FuzzyClusterer`.
- The `tf-cluster` VisiData command has been removed (it was redundant with `fuzzy-cluster`).
- FastText is no longer a dependency. The "train word vectors from scratch on each run" pipeline is gone. If you depended on per-corpus FastText training, pin `ucluster<1.0`.
Changed
- `FuzzyClusterer` now uses the `paraphrase-multilingual-MiniLM-L12-v2` sentence-transformer model instead of training FastText from scratch. The model is downloaded from the Hugging Face Hub on first use (~470 MB) and cached, so the first run requires network access.
- The semantics of `FuzzyClusterer.outlier_probabilities()` have changed. It previously returned `hdbscan.HDBSCAN.outlier_scores_` (the GLOSH algorithm: a per-point outlier score, assigned even to in-cluster members, effectively unbounded). It now returns `1 - probabilities_`, which is bounded in `[0, 1]`, with `1.0` for noise points. The new values are not comparable to the old GLOSH scores.
- The `FuzzyClusterer.__init__` signature has changed: the `dims` parameter (FastText embedding dimension) is gone. A new `model` parameter accepts a sentence-transformer model name or path.
- HDBSCAN now comes from `sklearn.cluster.HDBSCAN` (upstreamed in scikit-learn 1.3) instead of the standalone `hdbscan` package. Same algorithm, but no separate Cython wheel to compile.
- VisiData dependency bumped from `^2.11` to `>=3.0`. The plugin works on both the 2.x and 3.x runtimes, but the lockfile pins 3.x.
- The NLTK punkt resource is now downloaded as `punkt_tab` (the NLTK 3.9+ split). Existing `~/nltk_data/tokenizers/punkt` installs will trigger a one-time re-download.
- Python requirement relaxed from `^3.10` (equivalent to `>=3.10,<4`) to a plain `>=3.10`: the floor stays at 3.10 and the upper bound is dropped.
- Packaging migrated from old-style Poetry (`[tool.poetry]`) to PEP 621 (`[project]`) with `hatchling` as the build backend. Use `uv sync` instead of `poetry install`.
Removed installation pain
- The Mac M-series + OpenBLAS + Conda dance is gone. `uv sync` works out of the box on Apple Silicon and x86 because `scikit-learn`, `torch`, and `sentence-transformers` all ship pre-built wheels.
- `env.yml` (Conda environment) has been deleted.
- `poetry.lock` has been deleted; `uv.lock` takes its place.
Plugin
- VisiData plugin bumped to `2.0.0` to reflect the removed `tf-cluster` command.
v0.3.2b3
Initial release
v0.1.0
README fixes