Streaming Implementation of K-Means - Spark 1.2

Spark 1.2 introduced a streaming implementation of k-means that updates its cluster estimates as new data arrives, so it can dynamically detect (and remove) clusters over time. The key to this feature is forgetfulness: a decay parameter, which can be specified as a half-life, reduces the influence of old data so that recent batches count for more. The Databricks blog has a post with more details on the algorithm, including several visualizations of it in action:

http://databricks.com/blog/2015/01/28/introducing-streaming-k-means-in-spark-1-2.html
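
To make the forgetfulness idea concrete, here is a minimal sketch in plain Python of how a half-life can be turned into a per-batch decay factor and used to update one cluster center. It is an illustration under stated assumptions, not Spark's code: the function names are hypothetical, the half-life is measured in batches, and the stored weight is discounted along with the center.

```python
import math


def decay_from_half_life(half_life):
    """Per-batch decay factor such that old weight halves after `half_life` batches.

    Hypothetical helper: assumes the half-life is counted in batches.
    """
    return math.exp(math.log(0.5) / half_life)


def update_center(center, weight, batch_points, decay):
    """One forgetful update of a single cluster center.

    center       -- current center, a list of floats
    weight       -- effective number of points behind the current center
    batch_points -- new points assigned to this cluster in the latest batch
    decay        -- factor in [0, 1]; 1 keeps all history, 0 keeps only the new batch
    """
    discounted = weight * decay  # old data counts for less (an assumption of this sketch)
    m = len(batch_points)
    if m == 0:
        # No new points: the center stays put but its weight keeps decaying.
        return center, discounted

    dim = len(center)
    batch_sum = [sum(p[i] for p in batch_points) for i in range(dim)]
    new_weight = discounted + m
    # Weighted average of the discounted old center and the new points.
    new_center = [
        (center[i] * discounted + batch_sum[i]) / new_weight for i in range(dim)
    ]
    return new_center, new_weight


if __name__ == "__main__":
    decay = decay_from_half_life(5)  # old data loses half its weight every 5 batches
    center, weight = [0.0, 0.0], 10.0
    center, weight = update_center(center, weight, [[2.0, 2.0], [4.0, 4.0]], decay)
    print(center, weight)
```

With a decay of 1 every point ever seen counts equally, and with a decay of 0 only the most recent batch matters. A cluster that stops receiving points sees its weight shrink toward zero, which is the kind of signal a streaming implementation could use to decide that a cluster has died.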