Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/476603
Title: Question classification based on Bloom's taxonomy cognitive domain modified TF-IDF and word2vec
Authors: Manal Mohammed Al-Tamimi (P82838)
Supervisor: Nazlia Omar, Prof. Dr.
Keywords: Natural language processing (Computer science)
Machine learning
Universiti Kebangsaan Malaysia -- Dissertations
Dissertations, Academic -- Malaysia
Issue Date: 13-Aug-2018
Description: Examination question assessment plays an important role in educational institutions, since examinations are one of the most common methods of evaluating students' achievement in a specific course. There is therefore a crucial need to write balanced, high-quality exams that cover different cognitive levels. Many lecturers use the cognitive domain of Bloom's taxonomy, a popular framework developed for assessing students' intellectual abilities and skills. However, classifying questions automatically according to Bloom's taxonomy is challenging because questions are short. Several studies have addressed automatic question classification in accordance with Bloom's taxonomy, but most classify questions in a single domain, and techniques for classifying questions across multiple domains are lacking. The aim of this study is to build a generic question classification model that classifies questions from several areas based on the cognitive domain of Bloom's taxonomy. This study proposes a new method for classifying questions automatically by extracting two features: TFPOS-IDF and pre-trained word2vec. The first feature computes term frequency-inverse document frequency based on part of speech, in order to assign a suitable weight to the important words in a question. The second, pre-trained word2vec, is a semantic feature used to enhance the classification process. The combination of both features is then fed into Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers to classify the questions. The experiments used two datasets: the first contains 141 questions and the second contains 600 questions. The questions in both datasets were collected from different domains and divided into an 80% training set and a 20% test set.
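The part-of-speech weighting idea behind the first feature can be illustrated with a minimal sketch. This record does not give the thesis's actual TFPOS-IDF formula, so everything specific here is an assumption made for illustration: the weight values in `POS_WEIGHTS`, the toy tag lookup `TOY_POS` (standing in for a real POS tagger), and the function names `tokenize` and `tfpos_idf`. The word2vec feature and the SVM/KNN classifiers are omitted.

```python
import math
from collections import Counter

# Hypothetical POS weights (not taken from the thesis): verbs count most,
# on the assumption that action verbs signal the Bloom's taxonomy level.
POS_WEIGHTS = {"VERB": 3.0, "NOUN": 2.0, "OTHER": 1.0}

# Toy tag lookup standing in for a real part-of-speech tagger (assumption).
TOY_POS = {
    "define": "VERB", "compare": "VERB", "design": "VERB",
    "stack": "NOUN", "queue": "NOUN", "algorithm": "NOUN",
}


def tokenize(question):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip("?.,").lower() for t in question.split())
    return [t for t in tokens if t]


def tfpos_idf(questions):
    """Return one {term: score} dict per question.

    Each occurrence of a term contributes its POS weight instead of 1 to
    the term frequency; the result is normalised by the question's total
    weight and multiplied by the usual idf = log(N / df).
    """
    docs = [tokenize(q) for q in questions]
    n_docs = len(docs)
    # Document frequency: number of questions containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        weighted = {
            t: c * POS_WEIGHTS[TOY_POS.get(t, "OTHER")]
            for t, c in counts.items()
        }
        total = sum(weighted.values())
        vectors.append({
            t: (w / total) * math.log(n_docs / df[t])
            for t, w in weighted.items()
        })
    return vectors


questions = [
    "Define the term stack",
    "Compare the stack and the queue",
    "Design the algorithm",
]
for question, vector in zip(questions, tfpos_idf(questions)):
    print(question, vector)
```

In this sketch a word that occurs in every question (here "the") receives a score of zero, while a verb such as "define" outscores an equally rare untagged word because of its larger POS weight. A real pipeline would replace the toy lookup with a trained tagger and tune the weights on the training set.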
For the first dataset, classification achieves average weighted F1-measures of 83.7% and 71.1% for the two classifiers, respectively. For the second dataset, the averages are 89.7% and 85.4%, respectively. The findings of this study show that the proposed method is effective for classifying questions from multiple domains.
Master of Computer Science
"Certification of Master's / Doctoral Thesis" is not available
Pages: 98
Call Number: QA76.9.N38T335 2018 3 tesis
Publisher: UKM, Bangi
Appears in Collections: Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat

Files in This Item:
File: ukmvital_121845+SOURCE1+SOURCE1.0.PDF
Description: Restricted Access
Size: 15.75 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.