Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476484
Full metadata record
DC Field Value Language
dc.contributor.advisor Sabrina Tiun, Dr. -
dc.contributor.author Thamer Saleh Tuwaya (P80532) -
dc.date.accessioned 2023-10-06T09:19:20Z -
dc.date.available 2023-10-06T09:19:20Z -
dc.date.issued 2017 -
dc.identifier.other ukmvital:107073 -
dc.identifier.uri https://ptsldigital.ukm.my/jspui/handle/123456789/476484 -
dc.description With the exponential growth of textual information available on the Internet, there is an emerging need to find relevant, timely and in-depth knowledge about business topics. The sheer size of such data makes retrieving, analyzing and using the valuable information in these texts manually a very difficult task. In this research, we address a challenging task, namely crawling business-specific knowledge on the Web. The main goal of this study is to describe a new focused-crawling method for online business web pages based on latent semantic indexing. The study describes a new model for online business text crawling that seeks, acquires, maintains and filters business web pages. The model consists of two main modules: a crawling system and a text filtering system. The crawler is used to collect as many web pages as possible from news websites. This focused crawler is guided by a latent semantic index and information from WordNet (the Business filter), which learns to recognize the relevance of a web page to the business topic; it also uses a set of domain-specific keywords. Several models for business web page classification were designed and evaluated using latent semantic indexing with two weighting methods: Term Frequency (TF) and Term Frequency x Inverse Document Frequency (TF.IDF). The results showed that latent semantic indexing with TF.IDF weighting achieved the best performance, with an F-measure of 92.6% on business web page classification. Results on real-world online data also show that the focused crawler using latent semantic indexing with TF.IDF weighting is very effective for building high-quality collections of business web documents.,“Certification of Master's/Doctoral Thesis” is not available,Master of Information Technology -
dc.language.iso eng -
dc.publisher UKM, Bangi -
dc.relation Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat -
dc.rights UKM -
dc.subject Latent semantic indexing -
dc.subject Web search engines -
dc.subject Universiti Kebangsaan Malaysia -- Dissertations -
dc.subject Dissertations, Academic -- Malaysia -
dc.title Focused crawling of online business web pages using enhanced latent semantic indexing approach -
dc.type theses -
dc.format.pages 64 -
dc.identifier.callno TK5105.884.T848 2017 3 tesis -
dc.identifier.barcode 003802(2019) -
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
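
Illustrative note: the abstract compares Term Frequency (TF) and TF.IDF weighting as inputs to latent semantic indexing. The following is a minimal Python sketch of those two weighting schemes and a cosine relevance score, not the thesis's implementation. The function names, the `log(N / df)` form of IDF, and the toy documents are all assumptions made for illustration, and the singular value decomposition step that latent semantic indexing applies to the weighted term-document matrix is deliberately left out.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Raw term-frequency weights for one tokenized document."""
    return dict(Counter(tokens))


def idf_table(tokenized_docs):
    """Inverse document frequency, assumed here to be log(N / df)."""
    n_docs = len(tokenized_docs)
    # Count each term once per document it appears in.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """TF.IDF weights; terms not seen in the corpus get weight 0."""
    return {t: f * idf.get(t, 0.0) for t, f in Counter(tokens).items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus and topic query, for illustration only.
docs = [
    "stock market shares rise".split(),
    "market news football match".split(),
    "football team wins match".split(),
]
idf = idf_table(docs)
query = tfidf_vector("stock market".split(), idf)
scores = [cosine(query, tfidf_vector(doc, idf)) for doc in docs]
```

In this sketch, `scores` holds one relevance value per toy document; a focused crawler could compare such a value against a threshold to decide whether to keep a page, although the thesis applies this decision in the reduced latent semantic space rather than on raw TF.IDF vectors.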

Files in This Item:
File: ukmvital_107073+SOURCE1+SOURCE1.0.PDF (Restricted Access)
Size: 937.77 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.