A Survey of Web Spam Techniques

Introduction

Web Spam is a set of malicious techniques that aim to deceive search engines by manipulating their algorithms so that low-quality websites rank higher in search results. Web Spam emerged alongside search engines themselves. The purpose of this paper is to provide a comprehensive overview and analysis of Web Spam techniques.

Web Spam Techniques

Web Spam techniques can generally be categorized by the search engine phase they target: Web Crawling, Indexing and Ranking.

1. Web Crawling

Web Crawling refers to the process of searching for and discovering URLs so that their contents can be extracted. Several Web Spam techniques can be applied during the Web Crawling phase:

(i) Cloaking: This technique displays different versions of a web page to search engines and to users in order to manipulate search results. The content presented to the search engine spider differs from what the user sees.

(ii) Doorway pages: These pages are designed specifically for search engines, not humans, and contain many spam keywords and links to low-quality websites.

(iii) Link farms: Link farms are networks of websites that contain numerous unrelated links to other sites solely to increase the linked sites' rankings on search engines.

2. Indexing

Indexing is the process of adding crawled web pages to the search engine's database. The following Web Spam techniques are applied during the Indexing phase:

(i) Duplicate Content: This technique involves copying an entire web page, or parts of it, and spreading it across multiple sources, resulting in multiple pages that contain the same content.

(ii) Keyword Stuffing: This technique involves repeating keywords excessively on web pages, resulting in lower-quality content.

3. Ranking

Ranking is the process of prioritizing search engine results and ordering them according to their relevance. The following Web Spam techniques are applied during the Ranking phase:

(i) Link Spamming: This technique involves creating multiple hyperlinks that lead to a website regardless of their relevance.

(ii) Hidden Text: This technique involves hiding text by giving it the same color as the background or by making the font size very small.

(iii) Clickbait: This technique involves creating sensational headlines to lure readers into clicking a link, leading to inorganic traffic.

Conclusion

Web Spam techniques can