Topic Modelling in Bangla Language: An LDA Approach to Optimize Topics and News Classification

DOI: 10.5539/cis.v11n4p77

Malek Mouhoub; Mustakim Al Helal

Abstract

Topic modeling is a powerful technique for the unsupervised analysis of large document collections. Topic models have a wide range of applications in text mining, information retrieval, and statistical language modeling, including tag recommendation, text categorization, keyword extraction, and similarity search. Research on topic modeling is steadily gaining popularity. Various efficient topic modeling techniques are available for English, one of the most widely spoken languages in the world, but comparable techniques are lacking for many other languages. Bangla is the seventh most spoken native language in the world by population, and it needs automated text processing in many areas. This paper identifies the core topics of a Bangla news corpus and classifies news articles using similarity measures. The document models are built using LDA (Latent Dirichlet Allocation) with bigrams.
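To make the pipeline described in the abstract more concrete, here is a minimal sketch in Python using only the standard library. It is not the authors' implementation, and several choices are assumptions: "LDA with bigrams" is read as feeding LDA both unigrams and adjacent-word bigrams; LDA is fitted by collapsed Gibbs sampling; an unseen article is folded in with the trained topic-word counts held fixed; and the "similarity measure" is taken to be cosine similarity between topic mixtures, with the new article receiving the label of its most similar training article. The function names (`tokenize`, `lda_gibbs`, `infer_theta`, `cosine`, `classify`), the hyperparameters `alpha`, `beta`, `iters`, and `seed`, and the toy corpus are all hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Unigrams plus adjacent-word bigrams (an assumed reading of 'LDA with bigrams')."""
    words = text.split()  # Bangla text is also space-delimited between words
    return words + [f"{a}_{b}" for a, b in zip(words, words[1:])]


def lda_gibbs(docs, k, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return topic mixtures and topic-word counts."""
    rng = random.Random(seed)
    index = {t: i for i, t in enumerate(sorted({t for d in docs for t in d}))}
    v = len(index)
    ndk = [[0] * k for _ in docs]       # tokens of document d assigned to topic j
    nkw = [[0] * v for _ in range(k)]   # tokens of word w assigned to topic j
    nk = [0] * k                        # total tokens assigned to topic j
    z = []                              # current topic of every token
    for d, doc in enumerate(docs):
        z.append([])
        for t in doc:                   # random initial assignment
            j = rng.randrange(k)
            z[d].append(j)
            ndk[d][j] += 1
            nkw[j][index[t]] += 1
            nk[j] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                w, old = index[t], z[d][i]
                ndk[d][old] -= 1        # remove the token's current assignment
                nkw[old][w] -= 1
                nk[old] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                           for j in range(k)]
                new = rng.choices(range(k), weights=weights)[0]
                z[d][i] = new           # resample the topic and add the token back
                ndk[d][new] += 1
                nkw[new][w] += 1
                nk[new] += 1
    theta = [[(c + alpha) / (len(doc) + k * alpha) for c in row]
             for row, doc in zip(ndk, docs)]
    return theta, nkw, nk, index


def infer_theta(doc, nkw, nk, index, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Fold in an unseen document, keeping the trained topic-word counts fixed."""
    rng = random.Random(seed)
    k, v = len(nk), len(index)
    words = [index[t] for t in doc if t in index]  # tokens unseen in training are dropped
    z = [rng.randrange(k) for _ in words]
    ndk = Counter(z)
    for _ in range(iters):
        for i, w in enumerate(words):
            ndk[z[i]] -= 1
            weights = [(ndk[j] + alpha) * (nkw[j][w] + beta) / (nk[j] + v * beta)
                       for j in range(k)]
            z[i] = rng.choices(range(k), weights=weights)[0]
            ndk[z[i]] += 1
    return [(ndk[j] + alpha) / (len(words) + k * alpha) for j in range(k)]


def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    return dot / (math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q)))


def classify(theta_new, thetas, labels):
    """Label of the training document whose topic mixture is most cosine-similar."""
    best = max(range(len(thetas)), key=lambda i: cosine(theta_new, thetas[i]))
    return labels[best]


# Hypothetical toy corpus: English placeholder tokens stand in for Bangla words.
train = [
    ("cricket team won match", "sports"),
    ("football team won league match", "sports"),
    ("cricket captain scored century match", "sports"),
    ("minister announced election date", "politics"),
    ("parliament passed budget minister", "politics"),
    ("election commission announced parliament vote", "politics"),
]
docs = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
theta, nkw, nk, index = lda_gibbs(docs, k=2)
new_theta = infer_theta(tokenize("cricket team won final match"), nkw, nk, index)
print(classify(new_theta, theta, labels))
```

Cosine similarity is only one possible choice here; the abstract does not say which similarity measure the paper uses, and Hellinger distance or Jensen-Shannon divergence are common alternatives for comparing topic distributions. Because the sampler is stochastic, the recovered topics and the predicted label can depend on the seed and the number of iterations.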