Research on a Parallel Particle Swarm Clustering Algorithm Based on the MapReduce Model

Abstract

Particle swarm clustering (PSC) is a popular algorithm for clustering large datasets due to its simplicity and effectiveness. However, as the size of datasets continues to increase, sequential PSC algorithms become impractical. In this paper, we propose a parallel PSC algorithm based on the MapReduce model. Our approach divides the dataset into subsets and applies the PSC algorithm to each subset in parallel. The results from each subset are then combined to obtain the final clustering solution. We evaluate our parallel PSC algorithm on several large datasets and demonstrate significant speedup compared to the sequential PSC algorithm.

Keywords: Particle swarm clustering, MapReduce, parallel algorithms, large datasets

Introduction

Clustering is a fundamental task in data mining and machine learning. It aims to partition a given dataset into groups, or clusters, based on the similarity of the data points within each cluster. One popular approach to clustering is the particle swarm clustering (PSC) algorithm, which employs a population of particles that move through the search space to find the optimal clustering solution. PSC has been widely used in applications such as image segmentation, text clustering, and gene expression analysis.

However, as datasets continue to grow in size, sequential PSC algorithms become increasingly impractical due to their computational complexity. To address this issue, parallel algorithms have been proposed to accelerate the clustering process. In this paper, we propose a parallel PSC algorithm based on the MapReduce model.

MapReduce is a programming model and associated implementation for processing and generating large datasets. It provides a simple and scalable way to parallelize data-intensive applications such as clustering. The basic idea of MapReduce is to divide a large dataset into subsets and process each subset in parallel. The results from each subset are then combined to obtain the final solution.

Our parallel PSC algorithm takes advantage of the MapReduce framework to partition the dataset into subsets and apply the PSC algorithm to each subset in parallel. The results from each subset are then merged to obtain the final clustering solution.

Algorithm Design

Our parallel PSC algorithm consists of the following steps
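
To make the partition, per-subset clustering, and merge scheme described in the Introduction concrete, a minimal Python sketch is given below. It is an illustration under stated assumptions and not the authors' actual step list: the function names (`split`, `psc_map`, `merge_reduce`, `parallel_psc`), the simplified particle update (the nearest particle takes a shrinking step toward each data point), the round-robin partitioning, and the k-means-style merge are all hypothetical choices. The map phase is emulated with Python's built-in `map`; in a real deployment it would run as MapReduce tasks.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centers):
    """Index of the center closest to point."""
    return min(range(len(centers)), key=lambda j: sq_dist(point, centers[j]))

def split(data, n_parts):
    """Partition step (assumed): deal points round-robin into n_parts subsets."""
    return [data[i::n_parts] for i in range(n_parts)]

def psc_map(subset, k=2, iters=30, seed=0):
    """Map step: simplified swarm-style clustering of one subset.

    Each particle is a candidate centroid. For every data point, the
    nearest particle moves a fraction of the way toward that point.
    This is a stand-in for the full PSC update, not the paper's rule.
    """
    rng = random.Random(seed)
    particles = [list(p) for p in rng.sample(subset, k)]
    for t in range(iters):
        step = 0.5 * (1 - t / iters)  # step size shrinks over iterations
        for x in subset:
            j = nearest(x, particles)
            particles[j] = [p + step * (xi - p)
                            for p, xi in zip(particles[j], x)]
    return [tuple(p) for p in particles]

def merge_reduce(local_centers, k=2, iters=10):
    """Reduce step (assumed): merge per-subset centroids via k-means passes."""
    flat = [c for centers in local_centers for c in centers]
    centers = [tuple(c) for c in flat[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for c in flat:
            groups[nearest(c, centers)].append(c)
        # Average each group; keep the old center if a group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else old
            for g, old in zip(groups, centers)
        ]
    return centers

def parallel_psc(data, n_parts=4, k=2):
    """Partition, cluster each subset (map), then merge (reduce)."""
    subsets = split(data, n_parts)
    local = list(map(lambda s: psc_map(s, k=k), subsets))
    return merge_reduce(local, k=k)
```

Because each call to `psc_map` depends only on its own subset, the built-in `map` could be swapped for a process pool or a cluster job without changing the other functions; only the merge step needs to see results from every subset.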