A Review of Informative Data Level Resampling Approaches for Solving Class Imbalanced Problem

Dickson, Dako Apaleokhai; Alhassan, John Kolo; Adepoju, Solomon Adelowo

Please use this identifier to cite or link to this item: http://ir.futminna.edu.ng:8080/jspui/handle/123456789/6776

Full metadata record

DC Field	Value	Language
dc.contributor.author	Dickson, Dako Apaleokhai	-
dc.contributor.author	Alhassan, John Kolo	-
dc.contributor.author	Adepoju, Solomon Adelowo	-
dc.date.accessioned	2021-07-06T12:40:47Z	-
dc.date.available	2021-07-06T12:40:47Z	-
dc.date.issued	2021-05	-
dc.identifier.uri	http://repository.futminna.edu.ng:8080/jspui/handle/123456789/6776	-
dc.description.abstract	In machine learning, imbalanced learning is among the most challenging classification problems, and it is very common in application datasets. Imbalanced learning has received increasing attention over the years because real-world datasets are usually skewed and exhibit various data difficulty factors. The goal of this work is to review resampling techniques in order to determine whether data intrinsic characteristics were generally considered in their design. The techniques are further categorised into distance-based, cluster-based and evolutionary-based methods. The advantages and disadvantages of each category are then presented, followed by the general achievements and drawbacks of resampling approaches. The search conducted for this work yielded 227 papers published within the last two decades, with emphasis on the most recent decade. These articles from imbalanced data domains went through several filtering stages before being reduced to 52. The review shows that distance-based methods have received more attention than cluster-based and evolutionary-based methods, possibly because of the merits presented in this work. Several previous works have found data intrinsic characteristics to be more problematic for learning classifiers than class imbalance itself. However, this work found that, despite such reports, most existing resampling techniques do not take data intrinsic characteristics into account in their design. This may be due to the popularity of, and attention drawn by, the imbalance problem in publications.
However, some limiting factors also need to be resolved across all resampling methods. These include the lack of consideration of possibly relevant examples in the undersampling process and the lack of methods for evaluating the interrelationships and similarities among outstanding examples. For future work, a robust resampling technique should be developed that critically considers data difficulty factors when evaluating the regions and examples to oversample and undersample. Resampling techniques should also be evaluated against the different types of difficulty factors to determine the type of difficulty on which each achieves the best results.	en_US
dc.language.iso	en	en_US
dc.publisher	Cyber Nigeria/IEEE	en_US
dc.subject	machine learning	en_US
dc.subject	imbalance data	en_US
dc.subject	preprocessing	en_US
dc.subject	data difficulty factors	en_US
dc.title	A Review of Informative Data Level Resampling Approaches for Solving Class Imbalanced Problem	en_US
dc.type	Article	en_US
Appears in Collections:	Computer Science
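To make the "distance-based" category in the abstract concrete, here is a minimal sketch of one well-known distance-based oversampling idea in the style of SMOTE. The record does not say which specific methods the review covers, so SMOTE is an assumed, representative example, and the function name `smote_like_oversample` and its parameters are hypothetical. Each synthetic minority example is placed at a random point on the line segment between a minority example and one of its k nearest minority neighbours, using Euclidean distance.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, seed=None):
    """Hypothetical SMOTE-style sketch (an assumption, not the reviewed authors' method).

    Creates n_new synthetic minority examples by interpolating between a
    randomly chosen minority example and one of its k nearest minority
    neighbours (Euclidean distance).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority examples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances among minority examples only.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # an example is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)                   # random minority example
        nb = neighbours[base, rng.integers(k)]   # one of its k nearest neighbours
        gap = rng.random()                       # interpolation factor in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic


# Usage: four minority points on the corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new = smote_like_oversample(X_min, n_new=6, k=2, seed=0)
print(new.shape)  # (6, 2)
```

Because every synthetic point is a convex combination of two minority examples, it stays inside the region those examples span. Note that this plain version ignores data difficulty factors such as noise or class overlap when choosing where to oversample, which is the kind of omission the abstract reports for most existing resampling techniques.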

Files in This Item:

File	Description	Size	Format
Imbalance dataset abs.pdf		1.26 MB	Adobe PDF
