Method for automatic acquisition of Chinese reduplicated words

Abstract

The invention discloses a method for automatically acquiring Chinese reduplicated words. A quintuple model with a reasonable structure is used to gather statistics from a word-segmented corpus, producing a candidate set for each type of reduplicated word. On this basis, AAB, ABB, ABA, ABAB and AABB reduplicated words are acquired automatically by calculating and judging a reduplication degree. For AA reduplicated words, the reduplication-degree judgment is further combined with the calculation and judgment of left and right adjacency entropy. By combining the statistics obtained from the quintuple model with judgments based on the reduplication degree and information entropy, the method makes the identification of reduplicated words quantitative and automatic. Experiments show that the method achieves high accuracy and supports more precise natural language processing; it therefore has clear practical value in that field and can be widely applied.
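The abstract names the processing steps but does not define the quintuple model or the reduplication degree. The Python sketch below therefore illustrates only the parts that can be stated without those definitions: classifying a string into the six reduplication patterns, collecting candidates together with their left and right neighboring tokens from a word-segmented corpus, and filtering AA candidates by left and right adjacency entropy. The helper names (`reduplication_pattern`, `collect_candidates`, `accept_aa`) and the `min_freq` and `min_entropy` thresholds are hypothetical. The scan over spans of consecutive tokens is a simple stand-in for the quintuple model, not the patent's model, and the reduplication-degree test is omitted.

```python
import math
from collections import Counter, defaultdict
from typing import Optional


def reduplication_pattern(s: str) -> Optional[str]:
    """Classify a string as AA, AAB, ABB, ABA, ABAB or AABB; return None otherwise."""
    if len(s) == 2 and s[0] == s[1]:
        return "AA"            # e.g. 看看
    if len(s) == 3:
        a, b, c = s
        if a == b != c:
            return "AAB"       # e.g. 毛毛雨
        if a != b == c:
            return "ABB"       # e.g. 绿油油
        if a == c != b:
            return "ABA"       # e.g. 看一看
    if len(s) == 4:
        a, b, c, d = s
        if a == c and b == d and a != b:
            return "ABAB"      # e.g. 研究研究
        if a == b and c == d and a != c:
            return "AABB"      # e.g. 高高兴兴
    return None


def entropy(counter: Counter) -> float:
    """Shannon entropy (in bits) of a neighbor-frequency distribution."""
    total = sum(counter.values())
    return -sum((n / total) * math.log2(n / total) for n in counter.values())


def collect_candidates(sentences, max_chars=4):
    """Hypothetical stand-in for the quintuple model: scan spans of consecutive
    tokens, keep spans matching a reduplication pattern, and record their
    frequency plus the token immediately to their left and right."""
    freq = Counter()
    pattern_of = {}
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    for tokens in sentences:
        for i in range(len(tokens)):
            text = ""
            for j in range(i, len(tokens)):
                text += tokens[j]
                if len(text) > max_chars:
                    break
                pat = reduplication_pattern(text)
                if pat is None:
                    continue
                freq[text] += 1
                pattern_of[text] = pat
                left[text][tokens[i - 1] if i > 0 else "<s>"] += 1
                right[text][tokens[j + 1] if j + 1 < len(tokens) else "</s>"] += 1
    return freq, pattern_of, left, right


def accept_aa(freq, pattern_of, left, right, min_freq=2, min_entropy=1.0):
    """Keep AA candidates whose left AND right adjacency entropy reach a
    (hypothetical) threshold. The patent's reduplication-degree test is not
    reproduced here because the abstract does not define it."""
    accepted = []
    for word, count in freq.items():
        if pattern_of[word] != "AA" or count < min_freq:
            continue
        if min(entropy(left[word]), entropy(right[word])) >= min_entropy:
            accepted.append(word)
    return sorted(accepted)


if __name__ == "__main__":
    # Toy word-segmented corpus: one list of tokens per sentence.
    corpus = [
        ["我", "看看", "书"],
        ["你", "看看", "这个"],
        ["他", "看看", "吧"],
        ["高高兴兴", "地", "走"],
        ["研究", "研究", "问题"],
    ]
    freq, pattern_of, left, right = collect_candidates(corpus)
    print(accept_aa(freq, pattern_of, left, right))  # ['看看']
```

On this toy corpus only 看看 passes the AA filter, because it occurs three times with three distinct left neighbors and three distinct right neighbors (entropy log2 3 ≈ 1.58 bits on each side). The naive scan also yields 我看看 as an ABB candidate, a span that crosses a word boundary. Spurious candidates of this kind are presumably what the reduplication-degree judgment is meant to reject, but the abstract does not say how it is computed.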
