Data mining classification problem of clustering is improved with the help of K-Mean++ Algorithm

Arvind Singh; Pratibha Thakur

Arvind Singh, Research Scholar, Department of Computer Science, SIRT, M.P.
Pratibha Thakur, Research Scholar, Department of Computer Science, MCU, M.P.

Keywords: Knowledge discovery, KDD, Data mining, K-Mean Rule, Clustering

Abstract

The goal of data mining is to extract or "mine" knowledge from large amounts of data. Understanding a problem is always the first step in identifying effective solutions. However, data is often collected by several different sites, and privacy, legal, and commercial concerns restrict centralized access to it. The KDD process, by contrast, assumes that all the data is easily accessible at a central location or through centralized access mechanisms such as federated databases and virtual warehouses.

The application of data mining techniques to official data has great potential for supporting good public policy. These techniques can be used to detect errors in data collection, to cluster, classify, and make predictions, and to generate interesting association patterns from survey databases.
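As an illustration of the clustering technique named in the title, the following is a minimal sketch of the k-means++ seeding step as it is commonly described: the first center is chosen uniformly at random, and each later center is drawn with probability proportional to its squared distance from the nearest center already chosen. The function names `squared_distance` and `kmeans_pp_seeds`, the `seed` parameter, and the sample points are assumptions made for this example and do not come from the paper.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, seed=0):
    """Pick k initial centers from `points` using k-means++ style seeding.

    Hypothetical helper for illustration only; not the authors' code.
    """
    rng = random.Random(seed)
    # First center: uniform random choice among the data points.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform choice.
            centers.append(rng.choice(points))
            continue
        # Sample the next center with probability proportional to d2.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (20.0, 0.0)]
    print(kmeans_pp_seeds(sample, 3, seed=1))
```

The seeds returned here would then be handed to an ordinary k-means loop; only the initialization differs from the standard algorithm.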

Data mining is the process of semi-automatically analyzing large databases to find interesting and useful patterns.

Recommender systems based on automated collaborative filtering predict new items of interest for a user based on predictive relationships discovered between that user and other participants of a community. Most of the successful research and commercial systems in collaborative filtering use a nearest-neighbor model for generating predictions. Automated collaborative filtering systems based on the nearest-neighbor method work in three simple phases.
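The abstract does not name the three phases. As a hedged sketch, the example below assumes the usual breakdown for user-based nearest-neighbor collaborative filtering: (1) compute the similarity between the target user and every other user, (2) select the most similar users as the neighborhood, and (3) predict a rating as the similarity-weighted average of the neighbors' ratings. The function names, the use of cosine similarity, and the sample ratings are assumptions made for illustration, not the authors' method.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity over the items that both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(ratings, user, item, k=2):
    """Predict `user`'s rating of `item` from the k most similar users.

    `ratings` maps user -> {item: rating}. Returns None when no
    neighbor with positive similarity has rated the item.
    """
    # Phase 1 (assumed): similarity to every other user who rated the item.
    scored = [
        (cosine_similarity(ratings[user], other), other[item])
        for name, other in ratings.items()
        if name != user and item in other
    ]
    # Phase 2 (assumed): keep the k most similar users as the neighborhood.
    neighbors = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    # Phase 3 (assumed): similarity-weighted average of neighbor ratings.
    total_weight = sum(sim for sim, _ in neighbors)
    if total_weight == 0:
        return None
    return sum(sim * rating for sim, rating in neighbors) / total_weight


if __name__ == "__main__":
    sample = {
        "alice": {"a": 5, "b": 3},
        "bob": {"a": 5, "b": 3, "c": 4},
        "carol": {"a": 1, "b": 5, "c": 2},
    }
    print(predict_rating(sample, "alice", "c", k=2))
```

Production systems typically mean-center the ratings and use correlation-based similarity; the plain weighted average is kept here only to make the three phases easy to follow.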

