Training data selection using ensemble dataset approach for software defect prediction

Md Fahimuzzman Sohan, Md Alamgir Kabir, Mostafijur Rahman, S. M. Hasan Mahmud, Touhid Bhuiyan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

2 Citations (Scopus)

Abstract

Cross-project defect prediction (CPDP) is used in software defect prediction (SDP) research because of the limitations of within-project defect prediction (WPDP). CPDP trains a machine learning model on data from one project to predict defects in another. Because the source and target projects differ, and their data can be structured differently, a given source-target pairing is not always a good match. This study presents a categorical data set ensemble technique in which multiple data sets are aggregated to form the source data, instead of using a single data set. The method was evaluated on nine data sets taken from a publicly accessible repository, using two performance indicators. The data set ensemble approach improved prediction performance in over 65% of combinations compared with traditional CPDP models. The results also show that train-test data set pairs from the same category (homogeneous pairs) give high performance, whereas prediction performance mostly collapses when the data sets come from different categories. The proposed scheme is therefore recommended as an alternative for defect prediction that improves on traditional cross-project SDP models in most cases.
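The aggregation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_ensemble_source`, the `categories` mapping, and the toy rows are all hypothetical, and the abstract does not say how categories are assigned or which classifier is trained afterwards. The sketch only shows one plausible reading, in which every project in the same category as the target (excluding the target itself) is pooled into a single source set.

```python
from typing import Dict, List, Tuple

# One row = (software metric values, defect label), where 1 means defective.
Row = Tuple[List[float], int]


def build_ensemble_source(
    projects: Dict[str, List[Row]],
    categories: Dict[str, str],
    target: str,
) -> List[Row]:
    """Pool the rows of every project in the target's category, except the target.

    Assumption: 'category' is a label supplied by the caller; the abstract
    does not define how data sets are categorised.
    """
    target_category = categories[target]
    pooled: List[Row] = []
    for name, rows in projects.items():
        # Skip the target itself so no test data leaks into training.
        if name != target and categories[name] == target_category:
            pooled.extend(rows)
    return pooled


# Toy, made-up data: three projects, two categories.
projects = {
    "A": [([1.0, 2.0], 0)],
    "B": [([3.0, 1.0], 1), ([0.5, 0.5], 0)],
    "C": [([9.0, 9.0], 1)],
}
categories = {"A": "x", "B": "x", "C": "y"}

# Source data for target "A" is drawn only from "B" (same category, not the target).
source = build_ensemble_source(projects, categories, "A")
```

The pooled `source` rows would then be used to train any classifier, which is evaluated on the target project's rows. A target with no same-category peers (such as "C" here) yields an empty source set, so a real pipeline would need a fallback for that case.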
Original language: English
Title of host publication: Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST
Pages: 243-256
Number of pages: 14
ISBN (Electronic): 9783030528553
Publication status: Published - 1 Jan 2020
Externally published: Yes
Event: Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST -
Duration: 1 Jan 2020 → …

Publication series

Name: Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST
Volume: 325 LNICST
ISSN (Print): 1867-8211

Conference

Conference: Lecture Notes of the Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering, LNICST
Period: 1/01/20 → …

Keywords

  • Cross-project defect prediction
  • Data set ensemble
  • Software defect prediction
  • Training data selection
