A New Model for Automatic Sentence Segmentation

Funkun Xing

Abstract


This article presents the Context Overlapping Model (COM) for the task of Automatic Sentence Segmentation (ASS). Compared with the HMM, COM expands the observation from a single word to an n-gram unit, and neighboring units share an overlapping part. Through its co-occurrence and transition constraints, COM reduces the search space and improves tagging accuracy. We treat ASS as a sequence labeling task and apply a 2-gram COM to it. Experimental results show that the overall accuracy on the open test reaches 90.11%, which is significantly higher than the 85.16% achieved by the baseline model (a second-order HMM).
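
The abstract does not give the model's formal definition, so the following is only an illustrative sketch of the overlapping idea, not the author's implementation. It assumes that a 2-gram unit is a pair of adjacent tokens, that each unit carries a pair of tags, and that the transition constraint requires neighboring units to agree on the tag of the token they share. The tag names "B" (boundary after this token) and "N" (no boundary) and all function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Unit = Tuple[str, str]      # two adjacent tokens
TagPair = Tuple[str, str]   # one tag per token in the unit


def overlapping_bigrams(tokens: Sequence[str]) -> List[Unit]:
    """Build 2-gram units; each unit shares one token with its neighbor."""
    return [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]


def consistent(prev_tags: TagPair, next_tags: TagPair) -> bool:
    """Assumed transition constraint: the shared token keeps a single tag."""
    return prev_tags[1] == next_tags[0]


def candidate_transitions(tag_set: Sequence[str]) -> List[Tuple[TagPair, TagPair]]:
    """Enumerate the unit-to-unit transitions that satisfy the constraint."""
    pairs = list(product(tag_set, repeat=2))
    return [(p, q) for p in pairs for q in pairs if consistent(p, q)]


def merge_unit_tags(unit_tags: Sequence[TagPair]) -> List[str]:
    """Collapse a consistent sequence of unit tag pairs into per-token tags."""
    for prev_tags, next_tags in zip(unit_tags, unit_tags[1:]):
        if not consistent(prev_tags, next_tags):
            raise ValueError("neighboring units disagree on the shared token")
    return [unit_tags[0][0]] + [tags[1] for tags in unit_tags]
```

With two tags there are 4 possible tag pairs per unit and 16 unconstrained transitions between neighboring units; under the assumed constraint only 8 remain, which illustrates how an overlap can shrink the search space.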

DOI: 10.5539/cis.v4n4p134

This work is licensed under a Creative Commons Attribution 3.0 License.

Computer and Information Science   ISSN 1913-8989 (Print)   ISSN 1913-8997 (Online)
Copyright © Canadian Center of Science and Education
