The document reviews existing methods for the k-means clustering algorithm. It discusses how k-means clustering works and some of its limitations when dealing with large datasets, such as being dependent on the initial choice of centroids. It then proposes using Hadoop to overcome big data challenges and calculate preliminary centroids for k-means clustering in a distributed manner. Finally, it reviews different techniques that have been proposed in other research to improve k-means clustering, such as methods for selecting better initial centroids or determining the optimal number of clusters.