Please use this identifier to cite or link to this item:
https://ptsldigital.ukm.my/jspui/handle/123456789/513224
Title: | Ranking algorithms for semantic annotated documents |
Authors: | Syarifah Bahiyah Rahayu Syed Mansoor (P37895) |
Supervisor: | Shahrul Azman Mohd Noah, Prof. Dr. |
Keywords: | Semantic computing. |
Issue Date: | 6-Aug-2014 |
Description: | Semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable data space. One of the aims of the Semantic Web vision is to support semantic search by representing information as data with ontologies that provide their semantics in a machine-accessible manner. However, the majority of web documents are still in human-readable form. Therefore, these human-readable documents are usually semantically annotated with a domain-specific ontology, resulting in semantically annotated documents. The annotations consist of semantic statements that enable computers to interpret the human-readable documents. Semantic statements are encoded as Resource Description Framework (RDF) triples: <subject, predicate, object>. Although such documents and technologies have shown promising results, document ranking is still one of the most discussed issues in semantic search. Previous studies have reported that most ranking algorithms lack a weighting method for semantic features. Consequently, search engines return retrieved results without considering differences in the significance of the relationships in semantic statements. The aim of this research is therefore to propose and explore ranking algorithms for semantically annotated documents based on semantic features. The proposed ranking algorithms use semantic features to weight semantically annotated documents. Semantic features are defined as relationships within a triple consisting of <subject, predicate> or <predicate, object>. 
Three ranking algorithms are proposed, each based on different measures: i) related resources weighted with a fixed weight parameter (RelFix); ii) a combination of related resources weighted with a calculated weight parameter and knowledge completeness weighted with FF-ICF (FFICFCalc); and iii) a combination of related resources weighted with a fixed weight parameter and knowledge completeness weighted with FF-ICF (ComFFICFFix). These algorithms are evaluated against the semantic TF-IDF algorithm and the Lucene Luke retrieval system. The collection used for testing is OCAS2008 (Summer Olympic Games 2004), which comprises an ontology and a corpus of textual documents with their associated pre-annotated semantic documents. A total of 51,198 nodes and 719,435 triples from OCAS2008 were used for testing. The testing evaluated the effectiveness of the ranking algorithms with fifteen single-instance queries and fifteen multiple-instance queries. Normalized Discounted Cumulative Gain (NDCG), precision, and recall were selected as metrics to evaluate the results against a human benchmark. The evaluation of the single-instance queries indicated that ComFFICFFix is the most effective ranking algorithm. For the multiple-instance queries, however, the results were mixed. Overall, the ranking algorithm that combines fixed feature weighting with knowledge completeness, ComFFICFFix, retrieved better results. Ph.D. |
Pages: | 165 |
Call Number: | QA76.5913 .S974 2014 3 |
Publisher: | UKM, Bangi |
Appears in Collections: | Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat |
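The Description above names Normalized Discounted Cumulative Gain (NDCG) as one of the metrics used to compare rankings against a human benchmark. The record does not say which NDCG variant the thesis applies, so the following is only a minimal sketch of one common textbook formulation (gain divided by log2 of rank + 1). The function names `dcg` and `ndcg`, the `judged` and `k` parameters, and the graded relevance values are illustrative assumptions, not taken from the thesis.

```python
import math
from typing import Optional, Sequence


def dcg(relevances: Sequence[float]) -> float:
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(
    ranked: Sequence[float],
    judged: Optional[Sequence[float]] = None,
    k: Optional[int] = None,
) -> float:
    """NDCG of a ranked list of human-assigned relevance grades.

    ranked: grades in the order the ranking algorithm returned the documents.
    judged: grades of all judged documents for the query (assumption: if
            omitted, the ideal ranking is built from `ranked` itself).
    k:      optional cut-off rank (assumption: defaults to len(ranked)).
    """
    pool = list(ranked if judged is None else judged)
    cut = len(ranked) if k is None else k
    ideal = dcg(sorted(pool, reverse=True)[:cut])
    actual = dcg(list(ranked)[:cut])
    # A query with no relevant documents has no meaningful ideal ranking.
    return actual / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical grades (3 = highly relevant, 0 = not relevant).
    print(ndcg([3, 2, 1]))   # already in ideal order
    print(ndcg([1, 2, 3]))   # same documents, worst order
```

A ranking that already lists documents in descending order of relevance scores 1.0, and any other order of the same documents scores lower, which is what makes the metric suitable for comparing ranking algorithms on the same query set.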
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
ukmvital_82273+SOURCE1+SOURCE1.0.PDF Restricted Access | | 2.71 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.