Topic Modelling in Bangla Language: An LDA Approach to Optimize Topics and News Classification


  •  Malek Mouhoub    
  •  Mustakim Al Helal    

Abstract

Topic modeling is a powerful technique for the unsupervised analysis of large document collections. Topic models have a wide range of applications, including tag recommendation, text categorization, keyword extraction and similarity search, in text mining, information retrieval and statistical language modeling. Research on topic modeling is steadily gaining popularity. Many efficient topic modeling techniques are available for English, one of the most widely spoken languages in the world, but far fewer exist for other languages. Bangla is the seventh most spoken native language in the world by population, yet it still lacks such automated tools in many areas. This paper deals with finding the core topics of a Bangla news corpus and classifying news articles using similarity measures. The document models are built using LDA (Latent Dirichlet Allocation) with bigrams.
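The abstract names two steps: building LDA document models over unigrams plus bigrams, and classifying news by similarity. The following is a minimal pure-Python sketch of that pipeline under stated assumptions. LDA is fitted here by collapsed Gibbs sampling. The hyperparameters (`alpha=0.1`, `beta=0.01`, 200 iterations) are arbitrary choices. Cosine similarity between per-document topic mixtures stands in for the unspecified similarity measure, and classification uses a nearest-neighbour rule. The helper names (`add_bigrams`, `lda_gibbs`, `cosine`, `classify`) and the four-headline Bangla corpus are hypothetical and not taken from the paper.

```python
import math
import random


def add_bigrams(tokens):
    """Append adjacent-word bigrams, joined with '_', to a unigram token list."""
    return tokens + [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (theta, phi, vocab)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]        # tokens of each topic in each document
    nkw = [[0] * V for _ in range(K)]    # occurrences of each word in each topic
    nk = [0] * K                         # total tokens assigned to each topic
    z = []                               # current topic of every token

    def update(d, k, wi, delta):
        ndk[d][k] += delta
        nkw[k][wi] += delta
        nk[k] += delta

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        z.append([rng.randrange(K) for _ in doc])
        for w, k in zip(doc, z[d]):
            update(d, k, word_id[w], +1)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = word_id[w]
                update(d, z[d][i], wi, -1)  # remove the token's current assignment
                # Full conditional p(z_i = k | all other assignments), unnormalised.
                weights = [
                    (ndk[d][k] + alpha) * (nkw[k][wi] + beta) / (nk[k] + V * beta)
                    for k in range(K)
                ]
                z[d][i] = rng.choices(range(K), weights=weights)[0]
                update(d, z[d][i], wi, +1)

    # Smoothed document-topic (theta) and topic-word (phi) distributions.
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return theta, phi, vocab


def cosine(u, v):
    """Cosine similarity of two non-zero vectors (the assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def classify(theta, labels, query):
    """Nearest-neighbour rule (an assumption): copy the label of the most similar document."""
    best = max((j for j in range(len(theta)) if j != query),
               key=lambda j: cosine(theta[query], theta[j]))
    return labels[best]


if __name__ == "__main__":
    # Hypothetical, pre-tokenised Bangla headlines; not data from the paper.
    corpus = [
        ["ক্রিকেট", "দল", "খেলা", "মাঠ"],
        ["খেলা", "মাঠ", "ক্রিকেট", "দল"],
        ["নির্বাচন", "ভোট", "সরকার", "মন্ত্রী"],
        ["সরকার", "মন্ত্রী", "নির্বাচন", "ভোট"],
    ]
    labels = ["sports", "sports", "politics", "politics"]
    docs = [add_bigrams(doc) for doc in corpus]
    theta, phi, vocab = lda_gibbs(docs, num_topics=2)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=row.__getitem__, reverse=True)[:3]
        print(f"topic {k}:", [vocab[v] for v in top])
    print("document 0 classified as:", classify(theta, labels, 0))
```

Appending `_`-joined bigrams to the unigram list is one simple way to let LDA treat a two-word phrase as a single vocabulary item. This sketch keeps every bigram and does no frequency filtering. A larger system would more likely rely on a library implementation such as gensim's `LdaModel`, with `Phrases` for bigram detection.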



This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN (Print): 1913-8989
  • ISSN (Online): 1913-8997
  • Started: 2008
  • Frequency: quarterly

