Similarity Matrix Based Session Clustering by Sequence Alignment Using Dynamic Programming

  •  K. Duraiswamy    
  •  V. Valli Mayil    


With the rapidly increasing popularity of the WWW, websites play a crucial role in conveying knowledge to end users. Every request to a website, or transaction on the server, is recorded in a file called the server log file. Providing Web administrators with meaningful information about user access behavior (also called click stream data) has become a necessity for improving the quality of Web information and service performance. As such, the hidden knowledge obtained by mining web server traffic data and user access patterns (called Web Usage Mining) can be used directly for marketing and management of E-business, E-services, E-searching, E-education and so on.

Categorizing visitors or users based on their interaction with a website is a key problem in web usage mining. The click streams generated by different users often follow distinct patterns. Clustering these access patterns yields knowledge that can help a recommender system find the learning patterns of users in an E-learning system, find groups of visitors with similar interests, provide customized content in a site manager, categorize customers in E-shopping, etc.

Given session information, this paper focuses on a method for finding session similarity by sequence alignment using dynamic programming, and proposes a similarity matrix as the model for representing session similarity measures. The work presented in this paper uses the Agglomerative Hierarchical Clustering method to cluster the similarity matrix in order to group similar sessions, and the clustering process is depicted in a dendrogram.
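The pipeline described above (align sessions, fill a similarity matrix, cluster it agglomeratively) can be illustrated with a minimal sketch. The abstract does not give the paper's scoring scheme, normalization, or linkage criterion, so everything specific here is an assumption made for illustration: global (Needleman-Wunsch style) alignment, the `match`, `mismatch`, and `gap` values, normalization by the longer session, average linkage, and the `threshold` stopping rule. All function names are hypothetical.

```python
from itertools import combinations


def alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Global alignment score of two page sequences via dynamic programming.

    The scoring values are illustrative assumptions, not the paper's.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # aligning a prefix of a against nothing
        dp[i][0] = i * gap
    for j in range(1, cols):          # aligning a prefix of b against nothing
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + step,   # align the two pages
                           dp[i - 1][j] + gap,        # gap in b
                           dp[i][j - 1] + gap)        # gap in a
    return dp[-1][-1]


def similarity(a, b, match=2):
    """Score scaled to [0, 1] by the best score of the longer session.

    Negative alignment scores are clamped to 0 (an assumed convention).
    """
    if not a and not b:
        return 1.0
    best = match * max(len(a), len(b))
    return max(0.0, alignment_score(a, b, match=match) / best)


def similarity_matrix(sessions):
    """Symmetric matrix of pairwise session similarities."""
    n = len(sessions)
    sim = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        sim[i][j] = sim[j][i] = similarity(sessions[i], sessions[j])
    return sim


def agglomerative(sim, threshold):
    """Average-linkage agglomerative clustering on a similarity matrix.

    Repeatedly merges the most similar pair of clusters until the best
    similarity falls below `threshold`. Returns the clusters and the
    merge history (the information a dendrogram would display).
    """
    clusters = [[i] for i in range(len(sim))]
    merges = []

    def link(c1, c2):
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        pairs = combinations(range(len(clusters)), 2)
        x, y = max(pairs, key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        best = link(clusters[x], clusters[y])
        if best < threshold:
            break
        merges.append((clusters[x], clusters[y], best))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters, merges


# Hypothetical sessions: each is the ordered list of pages a visitor requested.
sessions = [list("ABCD"), list("ABCE"), list("XYZ"), list("XYW")]
sim = similarity_matrix(sessions)
clusters, merges = agglomerative(sim, threshold=0.3)
```

With these made-up sessions, the two sessions that share the prefix A, B, C end up in one cluster and the two that share X, Y in another. A different scoring scheme or linkage criterion could change both the matrix and the grouping.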

This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1913-8989
  • ISSN(Online): 1913-8997
  • Started: 2008
  • Frequency: quarterly
