Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/476484
Title: | Focused crawling of online business web pages using enhanced latent semantic indexing approach |
Authors: | Thamer Saleh Tuwaya (P80532) |
Supervisor: | Sabrina Tiun, Dr. |
Keywords: | Latent semantic indexing; Web search engines; Universiti Kebangsaan Malaysia -- Dissertations; Dissertations, Academic -- Malaysia |
Issue Date: | 2017 |
Description: | With the exponential growth of textual information available on the Internet, there is an emerging need to find relevant, timely and in-depth knowledge about business topics. The sheer size of this data makes retrieving, analyzing and using the valuable information in such texts manually a very difficult task. This research addresses a challenging task: crawling business-specific knowledge on the Web. The main goal of the study is to describe a new focused-crawling method for online business web pages based on latent semantic indexing. The study describes a new model for online business text crawling that seeks, acquires, maintains and filters business web pages. The model consists of two main modules: a crawling system and a text filtering system. The crawler collects as many web pages as possible from news websites. It is guided by a latent semantic index and information from WordNet (the business filter), which learns to recognize the relevance of a web page to the business topic, and it also uses a set of domain-specific keywords. Several models for business web page classification were designed and evaluated using latent semantic indexing with two weighting methods: Term Frequency (TF) and Term Frequency x Inverse Document Frequency (TF.IDF). The results show that latent semantic indexing with TF.IDF weighting achieved the best performance, with an F-measure of 92.6% on business web page classification. Results on real-world online data also show that the focused crawler using latent semantic indexing with TF.IDF weighting is very effective for building high-quality collections of business web documents. “Certification of Master's/Doctoral Thesis” is not available. Master of Information Technology. |
Pages: | 64 |
Call Number: | TK5105.884.T848 2017 3 tesis |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
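The description names two term-weighting schemes (TF and TF.IDF) and reports an F-measure. As a rough illustration only, the sketch below shows one common way to compute these quantities. It is not taken from the thesis: the function names, the `log(N / df)` form of IDF, the toy documents and the labels are all assumptions, and the latent semantic indexing step itself (a truncated SVD of the weighted term-document matrix) is omitted.

```python
import math
from collections import Counter


def tf_vectors(docs):
    """Raw term-frequency weights (one Counter per tokenised document)."""
    return [Counter(doc) for doc in docs]


def tfidf_vectors(docs):
    """TF x IDF weights with idf = log(N / df).

    This IDF variant is an assumption; the thesis may use a different one.
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def f_measure(predicted, actual):
    """Balanced F-measure (F1) for binary labels, where True means 'business'."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus: two business-like pages and one sports page.
docs = [
    "stock market shares rise".split(),
    "market profit shares fall".split(),
    "football match final score".split(),
]
tf = tf_vectors(docs)
tfidf = tfidf_vectors(docs)

# Hypothetical classifier output versus true labels for three pages.
score = f_measure([True, True, False], [True, False, False])
```

In this toy corpus, "market" appears in two of the three documents, so TF.IDF gives it a lower weight than "stock", which appears in only one. That down-weighting of widely shared terms is the usual motivation for preferring TF.IDF over raw TF.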
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ukmvital_107073+SOURCE1+SOURCE1.0.PDF (Restricted Access) | | 937.77 kB | Adobe PDF | View/Open |