Missing Data Imputation Algorithm Based on Clustering and Autoencoders

Abstract

In recent years, the increasing amount of missing data has become a challenging problem for data scientists. With the development of machine learning algorithms, many methods have been proposed for filling in missing data. However, there are still some challenges that need to be tackled. In this paper, we propose a missing data imputation method based on clustering and an autoencoder. The clustering algorithm is used to group similar data objects into clusters, and the autoencoder is used to learn the feature representation of the data. Experimental results show that our method outperforms the state-of-the-art methods in terms of accuracy and efficiency.

Introduction

Missing data is a common issue in many fields, such as healthcare, finance, and scientific research. Missing data can be caused by various reasons, such as instrument failure, data entry errors, and non-response. To deal with missing data, researchers have proposed various methods, such as data deletion, mean imputation, and regression imputation. However, these methods have some limitations. Data deletion can lead to a loss of valuable information, mean imputation assumes that the data are missing at random, and regression imputation requires the assumption of linear relationships between variables.

Recently, machine learning algorithms have become a popular approach for filling in missing data. In this paper, we propose a missing data imputation method based on clustering and an autoencoder. The clustering algorithm is used to group similar data objects into clusters, and the autoencoder is used to learn the feature representation of the data. Our proposed method can handle different types of missing data and does not require the assumption of linear relationships between variables.

Methodology

Our proposed method consists of two main steps: clustering and autoencoder. The clustering algorithm is used to group similar data objects into clusters, and the autoencoder is used to learn the feature representation of the data. The overall framework of our proposed method is shown in Fig. 1.

![image.png](attachment:image.png)

Figure 1. Overall framework of our proposed method

Step 1: Clustering

In the clustering step, we use the K-means algorithm to cluster the data. The K-means algorithm is a popular unsupervised learning algorithm tha