While not a particularly interesting computation, it is a now-classic
benchmark for Big Data ML. We specifically developed our PC $k$-means implementation to closely match the implementation in Spark’s mllib.
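Both implementations use a standard trick: to find the centroid closest to a given point,
the lower bound $\|a - b\|_2 \geq \left|\, \|a\|_2 - \|b\|_2 \,\right|$ is computed first, and the full distance to a centroid is evaluated only when this bound is smaller than the best distance found so far, which avoids unnecessary distance computations.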
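
To make the pruning step concrete, here is a minimal Python sketch of a closest-centroid search that uses this bound. It is an illustration only, not the code from PC or Spark: the function name `find_closest`, the precomputed `centroid_norms` argument, and the `skipped` counter are all assumptions made for this example.

```python
import math


def find_closest(point, centroids, centroid_norms):
    """Return (index, distance, skipped) for the centroid nearest to `point`.

    `centroid_norms[i]` is assumed to hold the precomputed L2 norm of
    `centroids[i]`. `skipped` counts centroids pruned by the lower bound.
    All names here are hypothetical and chosen for illustration.
    """
    point_norm = math.sqrt(sum(x * x for x in point))
    best_index, best_dist = -1, math.inf
    skipped = 0
    for i, (centroid, c_norm) in enumerate(zip(centroids, centroid_norms)):
        # Reverse triangle inequality: ||a - b|| >= | ||a|| - ||b|| |.
        lower_bound = abs(point_norm - c_norm)
        if lower_bound >= best_dist:
            # The true distance cannot beat the current best, so skip it.
            skipped += 1
            continue
        dist = math.dist(point, centroid)  # full Euclidean distance
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist, skipped


centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
centroid_norms = [math.sqrt(sum(x * x for x in c)) for c in centroids]
index, dist, skipped = find_closest((1.0, 1.0), centroids, centroid_norms)
print(index, skipped)
```

In this small example the point lies next to the first centroid, so the bound rules out the other two centroids without computing their distances. The norms cost one pass over each vector and can be reused across iterations for the points and across all points for the centroids, which is where the savings would come from.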
