A Double-layer Word Segmentation Combined with Local Ambiguity Word Grid and CRF

This paper presents a double-layer Chinese word segmentation model that combines the Local Ambiguity Word Grid with Conditional Random Fields (CRF). First, the Local Ambiguity Word Grid algorithm produces a rough segmentation in the lower layer. The text is then segmented again by a CRF in the upper layer, with the rough segmentation used as one of its features. The Local Ambiguity Word Grid algorithm is good at detecting ambiguity during segmentation, while the CRF handles in-vocabulary and out-of-vocabulary words equally well. The hybrid approach is therefore an effective way to resolve both segmentation ambiguity and out-of-vocabulary words. The system is evaluated in closed tests on the MSRA and PKU test sets provided by the SIGHAN 2005 Chinese Language Processing Bakeoff, and a four-tag label set is compared with a six-tag label set. In the closed tests, the F-measures on the MSRA and PKU test sets reach 97.1% and 95.1%, respectively. In addition, open-test results demonstrate the model's practical applicability.
http://www.ivypub.org/cst/paperInfo.aspx?ID=2304
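
The abstract does not spell out the label sets or the feature layout, so the following Python sketch only illustrates the tagging scheme it describes, under stated assumptions. It assumes the common four-tag set B/M/E/S and, for the six-tag comparison, the B/B2/B3/M/E/S set used elsewhere in the CRF segmentation literature; the paper's actual six-tag set may differ. The lower-layer Local Ambiguity Word Grid is not reproduced: its rough output is supplied by hand, and encoding it as a CRF feature through each character's four-tag position in the rough segmentation is a hypothetical choice. The rows use a tab-separated column layout with the label last, similar to the input read by CRF toolkits such as CRF++. The helpers `tags_4`, `tags_6`, `crf_rows`, `word_spans` and `f_measure` are hypothetical names, not the authors' code; `f_measure` scores word spans against a gold standard, in the spirit of the SIGHAN bakeoff scoring.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical helpers illustrating the double-layer tagging idea;
# not the paper's implementation.
Tagger = Callable[[List[str]], List[str]]


def tags_4(words: List[str]) -> List[str]:
    """Four-tag set: B (begin), M (middle), E (end), S (single-character word)."""
    tags: List[str] = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_6(words: List[str]) -> List[str]:
    """Assumed six-tag set: B, B2, B3, M, E, S.

    B2 and B3 mark the second and third characters of multi-character words,
    so long words are described more finely than with the four-tag set.
    """
    tags: List[str] = []
    for w in words:
        n = len(w)
        if n == 1:
            tags.append("S")
            continue
        for i in range(n):
            if i == 0:
                tags.append("B")
            elif i == n - 1:
                tags.append("E")
            elif i == 1:
                tags.append("B2")
            elif i == 2:
                tags.append("B3")
            else:
                tags.append("M")
    return tags


def crf_rows(rough: List[str], gold: List[str], label_tagger: Tagger = tags_4) -> List[str]:
    """One tab-separated row per character: character, rough tag, gold label.

    The rough tag (hypothetical encoding) is the character's four-tag position
    in the lower-layer segmentation; the last column is the training label.
    """
    if "".join(rough) != "".join(gold):
        raise ValueError("rough and gold segmentations must cover the same text")
    chars = "".join(gold)
    rough_tags = tags_4(rough)
    gold_tags = label_tagger(gold)
    return [f"{c}\t{r}\t{g}" for c, r, g in zip(chars, rough_tags, gold_tags)]


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmentation into a set of (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def f_measure(pred: List[str], gold: List[str]) -> float:
    """Word-level F-measure: harmonic mean of precision and recall over word spans."""
    p, g = word_spans(pred), word_spans(gold)
    correct = len(p & g)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["研究", "生命", "起源"]   # "study the origin of life"
    rough = ["研究生", "命", "起源"]  # ambiguous split: 研究生 = "graduate student"
    for row in crf_rows(rough, gold):
        print(row)
    print(tags_6(["中华人民共和国"]))
    print(round(f_measure(rough, gold), 3))
```

On the classic ambiguous string 研究生命起源, the hand-made rough split 研究生/命/起源 matches the gold split 研究/生命/起源 on only one of three words, so its F-measure is 1/3. In a double-layer design of this kind, the upper CRF sees the rough tag as one feature among others and can learn when to override it.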


Additional Info

Last Updated: October 10, 2013, 19:29 (UTC)
Created: May 15, 2013, 09:04 (UTC)