The TEA Method for Entity Discovery and Ranking in Internetware (English)

Introduction

As the amount of data on the internet continues to increase, more and more of it is created as unstructured text. This makes it difficult for humans to sort through, and even more difficult for machines to understand. Entity extraction is the process of identifying and extracting entities from unstructured text, and it is a key component of natural language processing (NLP). Entities can be people, places, organizations, or any other named items in the text. The TEA (Topic Entity Association) method is a widely used algorithm for discovering and ranking entities in unstructured text. This paper explores the TEA method in more detail, including its background, methodology, and applications.

Background

The TEA method was developed by researchers at the University of California, Berkeley, as a way to automatically identify and rank entities found in text. The method is based on the assumption that entities that co-occur in the same sentence are likely to be related. In addition, the method incorporates information about the topics discussed in the text, which allows the algorithm to identify the entities most relevant to the topic at hand.

Methodology

The TEA method consists of three main steps: preprocessing, entity discovery, and entity ranking. Preprocessing parses the text and breaks it down into sentences. The entity discovery step uses natural language processing techniques to identify entities within each sentence. Finally, the entity ranking step ranks the identified entities by their relevance to the topic at hand.

In the entity discovery step, the TEA method uses a statistical model to identify entities that co-occur in the same sentence. The method then calculates a score for each identified entity based on its co-occurrence with other entities. The score rests on the assumption that entities that co-occur more frequently are more likely to be related. The TEA method also takes into account the positions at which entities appear within the sentence, with entities that are closer together treated as more strongly related.

In the entity ranking step, the TEA method calculates a relevance score for each identified entity. The relevance score is based on the entity's co-occurrence with other entities, as well
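
The co-occurrence scoring described in the Methodology section can be illustrated with a minimal Python sketch. It is not the TEA implementation: the function names `cooccurrence_scores` and `rank_entities`, the inverse-distance weight, and the assumption that entities have already been extracted per sentence are all hypothetical choices made for illustration, and the topic-relevance component of the ranking is not modeled.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_scores(sentences):
    """Score entities by distance-weighted co-occurrence within sentences.

    `sentences` is a list of lists; each inner list holds the entities
    found in one sentence, in order of appearance (assumed input format).
    """
    scores = defaultdict(float)
    for entities in sentences:
        # Pair every two entity mentions that share a sentence.
        for (i, a), (j, b) in combinations(enumerate(entities), 2):
            if a == b:
                continue  # skip repeated mentions of the same entity
            # Assumed weighting: adjacent mentions count 1, farther ones less.
            weight = 1.0 / (j - i)
            scores[a] += weight
            scores[b] += weight
    return dict(scores)


def rank_entities(sentences):
    """Return (entity, score) pairs, highest score first, ties by name."""
    scores = cooccurrence_scores(sentences)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    example = [["Berkeley", "TEA", "NLP"], ["TEA", "NLP"]]
    for entity, score in rank_entities(example):
        print(entity, score)
```

In this sketch an entity that appears alone in a sentence receives no score, which mirrors the stated assumption that relatedness comes from co-occurrence. A different decay function could replace the inverse-distance weight without changing the overall structure.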