Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476484
Title: Focused crawling of online business web pages using enhanced latent semantic indexing approach
Authors: Thamer Saleh Tuwaya (P80532)
Supervisor: Sabrina Tiun, Dr.
Keywords: Latent semantic indexing
Web search engines
Universiti Kebangsaan Malaysia -- Dissertations
Dissertations, Academic -- Malaysia
Issue Date: 2017
Description: With the exponential growth of textual information available on the Internet, there is an emerging need to find relevant, timely and in-depth knowledge about business topics. The sheer volume of such data makes manually retrieving, analyzing and using the valuable information in these texts very difficult. This research addresses a challenging task: crawling business-specific knowledge on the Web. The main goal of the study is to describe a new focused-crawling method for online business web pages based on latent semantic indexing. The study describes a new model for online business text crawling that seeks, acquires, maintains and filters business web pages. The model consists of two main modules: a crawling system and a text filtering system. The crawler collects as many web pages as possible from news websites. This focused crawler is guided by a latent semantic index and information from WordNet (the business filter), which learns to recognize the relevance of a web page to the business topic; it also uses a set of domain-specific keywords. Several models for business web page classification were designed and evaluated using latent semantic indexing with two weighting methods: Term Frequency (TF) and Term Frequency x Inverse Document Frequency (TF.IDF). The results showed that latent semantic indexing with TF.IDF weighting achieved the best performance, with an F-measure of 92.6% on business web page classification. Results on real-world online data also show that the focused crawler using latent semantic indexing with TF.IDF weighting is very effective for building high-quality collections of business web documents. "Certification of Master's/Doctoral Thesis" is not available. Degree: Master of Information Technology.
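The description above contrasts TF and TF.IDF weighting and describes a filter that scores a page's relevance to the business topic. The following is a minimal sketch of that weighting and filtering idea only, not the thesis's implementation: it omits the latent semantic indexing projection (a truncated SVD of the term-document matrix) and the WordNet component, and scores pages by plain cosine similarity against a topic profile. All function names, the smoothed IDF formula, the seed sentences and the 0.2 threshold are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic whitespace-separated tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def idf_table(token_lists):
    """Smoothed IDF, log(N / df) + 1, over the seed documents (assumed variant)."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}


def weight_vector(tokens, idf=None):
    """Return raw TF weights when idf is None, otherwise TF.IDF weights.

    Terms that never occurred in the seed documents get weight 0 under TF.IDF.
    """
    counts = Counter(tokens)
    if idf is None:
        return {term: float(c) for term, c in counts.items()}
    return {term: c * idf.get(term, 0.0) for term, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_topic_profile(seed_texts):
    """Sum the TF.IDF vectors of the seed pages into one topic profile."""
    token_lists = [tokenize(text) for text in seed_texts]
    idf = idf_table(token_lists)
    profile = {}
    for tokens in token_lists:
        for term, weight in weight_vector(tokens, idf).items():
            profile[term] = profile.get(term, 0.0) + weight
    return profile, idf


def relevance(page_text, profile, idf):
    """Score a page against the topic profile using TF.IDF and cosine similarity."""
    return cosine(weight_vector(tokenize(page_text), idf), profile)


def is_relevant(page_text, profile, idf, threshold=0.2):
    """Hypothetical filter rule: keep the page if its score reaches the threshold."""
    return relevance(page_text, profile, idf) >= threshold


if __name__ == "__main__":
    # Toy seed pages standing in for labelled business news text.
    seeds = [
        "stock market shares rise as company profit grows",
        "bank raises interest rates and investors sell shares",
        "company reports quarterly profit and revenue growth",
    ]
    profile, idf = build_topic_profile(seeds)
    for page in (
        "investors buy shares after company profit report",
        "the team won the football match last night",
    ):
        print(round(relevance(page, profile, idf), 3), is_relevant(page, profile, idf))
```

In a crawler, a score like this would decide whether a fetched page is kept and whether its outgoing links are followed. An LSI-based filter would compare pages in the reduced latent space rather than on raw term weights.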
Pages: 64
Call Number: TK5105.884.T848 2017 3 tesis
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_107073+SOURCE1+SOURCE1.0.PDF (Restricted Access)
Size: 937.77 kB
Format: Adobe PDF

