Feature selection by integrating differential evolution and support vector machine for low similarity protein prediction based on feature extraction

Mohammed Hasan Alwan Al-Dulaimi (P72757)

Please use this identifier to cite or link to this item: https://ptsldigital.ukm.my/jspui/handle/123456789/513293

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Suhaila Zainudin, Dr.	-
dc.contributor.author	Mohammed Hasan Alwan Al-Dulaimi (P72757)	-
dc.date.accessioned	2023-10-16T04:35:18Z	-
dc.date.available	2023-10-16T04:35:18Z	-
dc.date.issued	2017-10-01	-
dc.identifier.other	ukmvital:97952	-
dc.identifier.uri	https://ptsldigital.ukm.my/jspui/handle/123456789/513293	-
dc.description	Predicting protein structural classes is an important task in bioinformatics. Many approaches have been proposed to improve the accuracy of protein structural class prediction. However, most techniques have neglected low-similarity sequence datasets, with 20% to 40% similarity. Feature selection (FS) plays a vital role in classification tasks, particularly when datasets are high-dimensional. Such datasets pose a challenge to the data mining community and demand a highly effective FS method before classification can be made more accurate. However, research still focuses on improving classification accuracy and pays less attention to the time needed to learn a classification function and to which types of features are best fed to the classifier. Various algorithms have been proposed to improve classification accuracy. Among them, Support Vector Machine (SVM) and Differential Evolution (DE) have shown the best results, with a high ability to classify known and unknown protein structural classes. Therefore, this research proposed an improved feature extraction model that enhances the FS strategy by integrating SVM with the DE algorithm to reduce the feature space. The advantage of the method is that it requires less computational effort prior to classification. Specifically, SVM and DE were used for feature selection, selecting the top N features by level of feature importance using a wrapper approach. A 71-dimensional integrated feature vector was extracted from the predicted secondary structure and hydropathy sequence to categorize proteins into four major structural classes: all-α, all-β, α/β, and α+β. Each class is vital towards pinpointing the proteins' structural classes. 
The proposed method simplified the classification process by reducing the feature space in the feature extraction phase before optimizing the features using the DE+SVM model. Several state-of-the-art feature extraction methods were explored in the feature generation phase. The proposed method was compared with state-of-the-art feature extraction methods, and the results showed that it performed better, with higher accuracy and fewer feature dimensions. Moreover, a common drawback of the grid search method currently used for tuning the classifier is the long time needed to set the SVM parameters. As an alternative, this research proposed using the DE algorithm to tune the SVM parameters. The results of both phases were analyzed in terms of classification accuracy, number of selected features and modelling time, and were compared with prior work. The experimental results show that the proposed DE+SVM greatly reduced the number of features and achieved higher classification accuracy than the full feature set. Testing was done on two low-similarity datasets, ASTRAL and D640, and achieved accuracies of 85.7% (ASTRAL) and 91.9% (D640). These results improved on current methods, raising average accuracy by 2.17% to 6.68%; the number of features was reduced from more than 1000 to 45; and the CPU time required for optimizing the SVM parameters was reduced from 6821 to 215 seconds for ASTRAL and from 1534 to 126 seconds for D640. "Certification of Master's/Doctoral Thesis" is not available.	-
dc.language.iso	eng	-
dc.publisher	UKM, Bangi	-
dc.relation	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat	-
dc.rights	UKM	-
dc.subject	Feature extraction	-
dc.subject	Protein Structural Class	-
dc.subject	Bioinformatics	-
dc.subject	Universiti Kebangsaan Malaysia -- Dissertations	-
dc.title	Feature selection by integrating differential evolution and support vector machine for low similarity protein prediction based on feature extraction	-
dc.type	Theses	-
dc.format.pages	145	-
dc.identifier.barcode	003035(2017)	-
Appears in Collections:	Faculty of Information Science and Technology / Fakulti Teknologi dan Sains Maklumat
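
Illustrative note (not part of the thesis record): the abstract describes using DE instead of grid search to tune SVM parameters. The block below is a minimal sketch of a generic DE loop (the common rand/1/bin variant), assuming a two-parameter search over log2(C) and log2(gamma). The function names, bounds, and control settings are hypothetical, and toy_cv_error is a made-up stand-in for a cross-validated SVM error; the thesis's actual DE variant and objective are not given in this record.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.6, cr=0.9,
                           generations=100, seed=0):
    """Minimise `objective` over box `bounds` with a basic rand/1/bin DE."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    lo, hi = bounds[d]
                    v = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][d]
                trial.append(v)
            # Greedy selection: keep the trial only if it is no worse.
            s = objective(trial)
            if s <= scores[i]:
                pop[i], scores[i] = trial, s
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_cv_error(params):
    """Hypothetical stand-in for cross-validated SVM error.

    In a real wrapper this would train an SVM with C = 2**log2_c and
    gamma = 2**log2_gamma and return 1 - cross-validation accuracy.
    """
    log2_c, log2_gamma = params
    return (log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2


# Assumed search ranges for log2(C) and log2(gamma).
search_bounds = [(-5.0, 15.0), (-15.0, 3.0)]
best_params, best_error = differential_evolution(toy_cv_error, search_bounds)
```

To use the same loop for feature selection, the objective could take a vector of per-feature scores, keep the top N, and return the classifier's error on that subset; that extension is likewise an assumption rather than the thesis's stated procedure.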

Files in This Item:

File	Description	Size	Format
Feature selection by integrating differential evolution and support vector machine for low similarity protein prediction based on feature extraction.pdf Restricted Access	Partial	318.82 kB	Adobe PDF	View/Open
