Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476429
Title: Question answering system using Support Vector Machine for hadith domain
Authors: Nabeel Thajeel Neamah (P72239)
Supervisor: Saidah Saad, Dr.
Keywords: Question answering system
Hadith
Support Vector Machine
Natural Language Processing
Dissertations, Academic -- Malaysia
Issue Date: 5-Sep-2016
Description: The main aim of a question answering (QA) system is to provide correct answers to users’ queries. Such systems are developed either for various domains or for a restricted domain. QA systems face two main challenges: extracting answers when a user’s query expresses its concepts weakly, and retrieving accurate answers from a large corpus of documents. These challenges make it harder to analyze questions and to retrieve relevant, correct answers. The main aim of this research is to develop a QA system for Hadiths on the subjects of prayer and fasting, in order to improve the accuracy of QA for Hadiths. The research has two main objectives: to analyze the real needs behind a user’s query using suitable techniques, and to narrow the answer search space so that more focused answers are retrieved. The first objective is addressed with several Natural Language Processing (NLP) methods: tokenization, stop-word removal, and N-grams are used to analyze the user’s query, and WordNet is used to enrich the concepts of the query. The second objective is addressed with a Support Vector Machine (SVM), which classifies Hadith documents by subject and question type. Documents in the Hadith corpus are classified into four classes that combine two subjects with two question types: prayer documents for when questions, fasting documents for when questions, prayer documents for where questions, and fasting documents for where questions. The proposed QA system analyzes and enriches the user’s query using NLP methods, identifies the subject and question type of the query using SVM, classifies Hadith documents using SVM, and extracts candidate answers from the most suitable class of documents. 
The final answers are extracted using Cosine Similarity (CS) and Longest Common Subsequence (LCS) techniques. The proposed system is tested on 132 Hadith documents about fasting and prayer selected from the Al-Bukhari collection. Accuracy is measured using the F-score. Testing is conducted with 12 queries, selected from questions about prayer and fasting proposed by 15 students from UKM. The findings show an average answer accuracy of 67% using CS, 66% using LCS, 70% using CS and LCS combined, and 80% using CS, LCS, and SVM together. Adding SVM classification therefore improves accuracy by 10 percentage points (from 70% to 80%) over the similarity techniques without classification. The main contribution of this research is the use of SVM to narrow the search scope of Hadith documents by subject and question type, alongside effective analysis of the query’s needs using NLP methods. SVM yields more accurate answers than extraction based only on similarity techniques such as CS and LCS.
Note: "Certification of Master's/Doctoral Thesis" is not available.
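The answer-extraction step described above, scoring candidates with CS and LCS, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the tokenizer, the normalization of the LCS length by the longer token sequence, the equal weighting of the two scores, and the function names (`tokenize`, `lcs_score`, `rank_candidates`) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a_tokens, b_tokens):
    # Cosine of the angle between the two term-frequency vectors.
    va, vb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lcs_length(a, b):
    # Classic dynamic programme, keeping only the previous row.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_score(a_tokens, b_tokens):
    # Assumption: normalize by the longer sequence to get a value in [0, 1].
    if not a_tokens or not b_tokens:
        return 0.0
    return lcs_length(a_tokens, b_tokens) / max(len(a_tokens), len(b_tokens))


def rank_candidates(query, candidates, weight=0.5):
    # Assumption: combine CS and LCS as a weighted average; the source
    # does not say how the two techniques are combined.
    q = tokenize(query)
    scored = []
    for text in candidates:
        t = tokenize(text)
        score = weight * cosine_similarity(q, t) + (1 - weight) * lcs_score(q, t)
        scored.append((score, text))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_candidates(query, candidates)` on a query string and a list of candidate sentences returns `(score, sentence)` pairs with the best match first. In the pipeline the abstract describes, the candidates would come only from the document class that the SVM selected for the query.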
Pages: 79
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_85944+SOURCE1+SOURCE1.0.PDF (Restricted Access)
Size: 220.68 kB
Format: Adobe PDF

