Multi-level Frontier based Topic-specific Crawler Design with Improved URL Ordering


  •  Akilandeswari Jeyapal    
  •  Gopalan Palanisamy    

Abstract

The rapid growth of the World Wide Web has spurred the development of retrieval tools such as search engines. Topic-specific crawlers are best suited to users looking for results on a particular subject. In this paper, a novel design of a topic-specific Web crawler based on a multi-agent system is presented. The proposed architecture employs two types of agents: retrieval agents and coordinator agents. The coordinator agent is responsible for disseminating URLs from the crawling frontier to individual retrieval agents. The URL frontier is modeled as multi-level queues to implement tunneling and is populated with URLs by a rule-based engine. The coordinator agent dynamically assigns URLs to retrieval agents to avoid downloading non-productive and duplicate Web pages. The empirical results show the advantage of using multi-level frontier queues in terms of harvest ratio, time, and the downloading of highly relevant Web pages.
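
The frontier and dispatch ideas described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class `MultiLevelFrontier`, the functions `assign_level`, `dispatch`, and `harvest_ratio`, the choice of three levels, and the two boolean rule inputs are all assumptions made for illustration, since the abstract does not specify the rule-based engine or the number of queues. The harvest ratio shown is the commonly used definition (relevant pages divided by pages downloaded), which may differ in detail from the metric used in the paper.

```python
from collections import deque


class MultiLevelFrontier:
    """URL frontier with several FIFO queues; level 0 is served first."""

    def __init__(self, levels=3):  # three levels is an assumption
        self.queues = [deque() for _ in range(levels)]
        self.seen = set()  # every URL ever enqueued, used to skip duplicates

    def add(self, url, level):
        """Enqueue url at the given level; return False for duplicates."""
        if url in self.seen:
            return False
        self.seen.add(url)
        # Clamp so out-of-range levels fall into the nearest valid queue.
        level = max(0, min(level, len(self.queues) - 1))
        self.queues[level].append(url)
        return True

    def next_url(self):
        """Pop from the highest-priority non-empty queue, or return None."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


def assign_level(parent_relevant, anchor_matches_topic):
    """Hypothetical stand-in for the rule-based engine (rules are assumed)."""
    if parent_relevant and anchor_matches_topic:
        return 0
    if parent_relevant or anchor_matches_topic:
        return 1
    # Low-priority links are kept rather than dropped, so the crawler can
    # still tunnel through weakly related pages once better queues are empty.
    return 2


def dispatch(frontier, agent_ids):
    """Coordinator step: hand at most one URL to each retrieval agent."""
    assignments = {}
    for agent in agent_ids:
        url = frontier.next_url()
        if url is None:
            break  # frontier is empty, remaining agents stay idle
        assignments[agent] = url
    return assignments


def harvest_ratio(relevant_pages, downloaded_pages):
    """Fraction of downloaded pages judged relevant (0.0 if none downloaded)."""
    return relevant_pages / downloaded_pages if downloaded_pages else 0.0
```

In this sketch, a URL is recorded in `seen` when it is first enqueued, so a second `add` of the same URL is rejected and no two agents receive the same page. Calling `dispatch` with more agents than queued URLs simply leaves the extra agents without an assignment.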


This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1913-8989
  • ISSN(Online): 1913-8997
  • Started: 2008
  • Frequency: semiannual
