Representation of textual documents by the approach wordnet and n-grams for the unsupervised classification (clustering) with 2D cellular automata: a comparative study

HAMOU Reda Mohamed, LEHIRECHE Ahmed, LOKBANI Ahmed Chaouki, RAHMANI Mohamed

Abstract


In this article we present a 2D cellular automaton (Class_AC) for a text mining problem: the unsupervised classification (clustering) of documents. Before running the cellular automaton, we vectorized text documents from the Reuters-21578 corpus in two ways: by indexing them with the WordNet approach and by representing them with the n-gram method. Our work is a comparative study of these two representations, the conceptual approach (WordNet) and n-grams. Section 1 introduces biomimicry and text mining, Section 2 presents the representation of texts based on the WordNet approach and on n-grams, Section 3 describes the cellular automaton for clustering, Section 4 reports the experiments and comparison results, and Section 5 gives a conclusion and perspectives.

This work is licensed under a Creative Commons Attribution 3.0 License.

Computer and Information Science   ISSN 1913-8989 (Print)   ISSN 1913-8997 (Online)
Copyright © Canadian Center of Science and Education
