Releases: peterc/whatlanguage
v2.0.0 - Full rewrite
After many years dormant (though still working well!), I've finally bashed WhatLanguage into shape with a more modern approach than hammering away at megabytes of Bloom filters.
The new approach is based on trigram detection, with data vendored from the established whatlang, itself a port of Franc, whose models are built from the public-domain UDHR corpus (see Credits). The model is a ~220 KB JSON file.
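For anyone curious what trigram detection looks like in practice, here's a minimal sketch of one common way to do it: rank the trigrams in the input by frequency and pick the language whose profile is the closest match in rank order. This is not WhatLanguage's actual code. The names `TOY_PROFILES`, `MAX_PENALTY`, `ranked_trigrams`, `distance`, and `guess_language` are made up for illustration, and the tiny profiles are hand-written stand-ins for the real vendored model.

```ruby
# Hypothetical toy profiles: trigrams ordered from most to least frequent.
# A real model holds far more trigrams per language; these are illustrative only.
TOY_PROFILES = {
  "eng" => [" th", "the", "he ", "ing", "nd ", " an", "and"],
  "spa" => [" de", "de ", " la", "la ", "os ", "que", " qu"]
}.freeze

# Penalty for a trigram that does not appear in a profile at all (assumed value).
MAX_PENALTY = 300

# Lowercase the text, collapse non-letters to single spaces, pad with spaces,
# then return its trigrams sorted from most to least frequent.
def ranked_trigrams(text)
  padded = " #{text.downcase.gsub(/[^\p{L}]+/, ' ').strip} "
  counts = Hash.new(0)
  padded.chars.each_cons(3) { |tri| counts[tri.join] += 1 }
  counts.sort_by { |tri, n| [-n, tri] }.map(&:first)
end

# Sum of rank differences between the text and a profile.
# Lower means the text looks more like that language.
def distance(text_trigrams, profile)
  text_trigrams.each_with_index.sum do |tri, rank|
    profile_rank = profile.index(tri)
    profile_rank ? (rank - profile_rank).abs : MAX_PENALTY
  end
end

# Return the code of the profile with the smallest distance.
def guess_language(text, profiles = TOY_PROFILES)
  trigrams = ranked_trigrams(text)
  profiles.min_by { |_lang, profile| distance(trigrams, profile) }.first
end
```

With these toy profiles, `guess_language("the king and the thing")` picks `"eng"` and `guess_language("que de la casa de los")` picks `"spa"`, because none of the other profile's trigrams occur in either sentence.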
v2.0 has many breaking changes because the entire library has been rewritten, though the core WhatLanguage.language API remains similar. Versions 1.0.6 and earlier (so the 2007-2025 run of the library) used a Bloom-filter technique and needed 5 MB of binary files to handle ~20 languages. Version 2.0 is more accurate, faster, and supports more languages from a single ~220 KB JSON file :-)
Ruby 3.0+ is now required, and the README has all the goodies you'll need to use it.