The Analysis and Implementation of the K-Means Algorithm Based on Hadoop Platform

Liu Xiang Wei

Abstract


Society has entered the era of big data, and the growing diversity and volume of data pose great challenges for data storage and processing. Hadoop's HDFS and MapReduce address these two problems, storage and processing respectively. The classical K-means algorithm is the most widely used partition-based clustering algorithm. After completing the cluster configuration, this paper describes the operating principle of the K-means algorithm in cluster mode, implements the algorithm in that mode, analyzes the experimental results, and summarizes the strengths and limitations of running K-means on the Hadoop platform.
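As a rough illustration of how K-means is commonly decomposed into map and reduce steps, the following single-process Python sketch may help. It is not the paper's implementation and does not use Hadoop: the function names (map_point, reduce_cluster, kmeans_iteration, kmeans), the Euclidean distance, and the stopping tolerance are all assumptions made for this example. The map step assigns each point to its nearest centroid, and the reduce step averages the points in each group to produce the new centroid.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step (hypothetical name): emit (index of nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step (hypothetical name): average the points of one cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One map/shuffle/reduce pass, simulated in a single process."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)  # stands in for the shuffle phase
    # A centroid with no assigned points is kept unchanged (an assumption).
    return [
        reduce_cluster(groups[i]) if groups[i] else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat passes until no centroid moves more than tol, or max_iter is hit."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

On a real cluster each pass would typically run as a separate MapReduce job, with the current centroids distributed to every mapper; the loop in kmeans only stands in for that job chaining.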




DOI: https://doi.org/10.5539/cis.v11n1p98

Copyright (c) 2018 Liu Xiang Wei

License URL: http://creativecommons.org/licenses/by/4.0

Computer and Information Science   ISSN 1913-8989 (Print)   ISSN 1913-8997 (Online)  Email: cis@ccsenet.org


Copyright © Canadian Center of Science and Education
