Semantic Relatedness Computation Method in a Low-Dimensional Explicit Semantic Space

    Abstract: Semantic relatedness computation is a key fundamental problem in data science, with wide applications in information retrieval and natural language processing. To address the limitations of the ESA (Explicit Semantic Analysis) algorithm, an explicit semantic feature selection algorithm is proposed and used to construct a low-dimensional semantic space. On this basis, a semantic relatedness computation method for the low-dimensional explicit semantic space is proposed, drawing on the mapping information of feature concepts in Wikipedia. The method addresses the problem that ESA's subsequent relatedness computation is not accurate enough because it operates in a high-dimensional sparse space. Experimental results show that, compared with other current methods, the results of the proposed method agree more closely with human judgments, as measured by the Pearson (P) and Spearman (S) correlation coefficients.
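The abstract does not give the details of the feature selection algorithm or of the relatedness formula, so the following is only a minimal sketch of the general pipeline it describes, under stated assumptions: terms are represented as weighted vectors over Wikipedia concepts, a simple top-k weight cut-off stands in for the paper's feature selection, cosine similarity stands in for its relatedness measure, and the scores are compared with human ratings using Pearson and Spearman correlation. All function names and the toy data are hypothetical.

```python
import math


def select_top_k(vector, k):
    """Keep the k highest-weighted concepts.

    Assumption: a plain top-k cut-off, used here only as a stand-in
    for the paper's explicit semantic feature selection algorithm.
    """
    top = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)[:k]
    return dict(top)


def cosine(u, v):
    """Cosine similarity between two sparse concept vectors (dicts)."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


def relatedness(u, v, k):
    """Relatedness of two terms after reducing each vector to k concepts."""
    return cosine(select_top_k(u, k), select_top_k(v, k))


if __name__ == "__main__":
    # Hypothetical concept vectors; real ESA weights come from Wikipedia text.
    car = {"Automobile": 0.9, "Engine": 0.6, "Road": 0.3, "Music": 0.01}
    truck = {"Automobile": 0.7, "Engine": 0.5, "Cargo": 0.4, "Film": 0.02}
    piano = {"Music": 0.9, "Keyboard": 0.5, "Concert": 0.4, "Road": 0.01}

    scores = [relatedness(car, truck, 3), relatedness(car, piano, 3),
              relatedness(truck, piano, 3)]
    human = [8.5, 1.2, 0.9]  # made-up ratings, for illustration only
    print(scores, pearson(scores, human), spearman(scores, human))
```

Truncating each vector before the cosine step is what keeps the comparison in a low-dimensional space; how the retained concepts are actually chosen, and how Wikipedia mapping information enters the score, is specific to the paper and is not reproduced here.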

     
