See wiki links:
- k-Means generally: https://en.wikipedia.org/wiki/K-means_clustering#Software_implementations
- Elbow method: https://en.wikipedia.org/wiki/Elbow_method_(clustering)
- Silhouette score: https://en.wikipedia.org/wiki/Silhouette_(clustering)
There are two main ways of executing the algorithm. The difference is in how the original centroids are established.
I would recommend using kMeans, as shown below. The kMeans function randomizes the points to create the original centroids. Every time the function is called, a different set of centroids will be initialized.
import KMeans.Algorithm
-- kMeans :: Point a => Int -> [a] -> IO [Cluster a]
main :: IO ()
main =
do clusters <- kMeans k points
putStrLn $ displayClusters clusters
where
k = 2 -- How many clusters to create
points = [(5, 3), (9, 2), (2, 5), (4, 1), (7, 3), (10, 4), (7, 2), (1, 2)]
But the library also contains kMeansStatic, which allows you to specify a seed for the randomized centroids. This way, you can get the same output from the algorithm every time you call it, which can be helpful for testing.
-- kMeansStatic :: Point a => Int -> Int -> [a] -> [Cluster a]
main :: IO ()
main =
kMeansStatic seed k points >>=
(putStrLn . displayClusters)
where
seed = 6 -- You can try different seeds to get specific outcomes
k = 2
points = -- ...
The library provides the following instances of Point:
Point (Double, Double), intended for coordinates. The distance between two points is the hypotenuse, as determined according to the Pythagorean theorem.Point Int, intended mostly to facilitate other instances involving numerical values.Point Char, again intended to facilitate another instance, in this casePoint Text.Point Text, allows clustering of text. Distance is determined according to the Levenshtein distance.Point [a], allows clustering of lists of anything that with an instance ofPoint.
If the library doesn't contain an instance of the Point class to your needs, feel free to create one.
The library also provides a module to permit multidimensional scaling (MDS) using the SMACOF (Scaling by Majorizing a Complicated Function) method. Specifically, the user can create a set of coordinates on a 2d Cartesian plane, to visualize a set of points, even if the Point type doesn't lend itself to visual presentation.
import KMeans.Scaling
numbers :: [Int]
numbers = [ 10, 16, 20, 17 ]
main = print $ plotPoints numbers
-- [ (-1.87, -5.40)
-- , (0.84, 0.11)
-- , (0.23, 4.16)
-- , (0.80, 1.13)
-- ]
The above coordinates are plotted below:
