Improving Machine Learning Performance of Imbalanced Data by Resampling: DBSCAN and Weighted Arithmetic Mean

GÜLDAL, Serkan

dc.contributor.author	GÜLDAL, Serkan
dc.date.accessioned	2024-04-04T12:46:04Z
dc.date.available	2024-04-04T12:46:04Z
dc.date.issued	2021
dc.identifier.issn	2147-3129
dc.identifier.uri	http://dspace.beu.edu.tr:8080/xmlui/handle/123456789/14771
dc.description.abstract	Advances in digital technology have caused the volume of collected data to grow at an accelerating rate. This growth brings new problems, such as imbalanced data. In an imbalanced dataset, the classes are not equally represented. Classification therefore suffers performance losses, because classification algorithms typically assume balanced datasets: the classifier favors the majority class, and the minority class is often misclassified. To reduce the imbalance ratio, various studies have been carried out in recent years. In general terms, these studies apply undersampling, oversampling, or both to balance imbalanced datasets. In this study, an oversampling method is proposed that combines distance-based and mean-based resampling to produce synthetic samples for the minority class. For the resampling process, pairwise distances between minority class members are calculated with the Euclidean distance metric. Based on these distances, denser zones are identified, in the sense of DBSCAN, around every datum. New synthetic samples are formed between the points in each zone and the central point by using the Weighted Arithmetic Mean. In this study, the dataset is thereby approximately balanced at 500 majority samples and 535 minority samples (from 268 original minority data). Random Forest (RF) and Support Vector Machine (SVM) algorithms are then used to classify the raw and balanced datasets. The results show that the proposed method achieves the best machine learning performance among all the listed methods.	tr_TR
dc.language.iso	English	tr_TR
dc.publisher	Bitlis Eren Üniversitesi	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	tr_TR
dc.subject	Machine Learning	tr_TR
dc.subject	Random Forest	tr_TR
dc.subject	Support Vector Machine	tr_TR
dc.subject	Synthetic Data	tr_TR
dc.subject	Medical Data	tr_TR
dc.title	Improving Machine Learning Performance of Imbalanced Data by Resampling: DBSCAN and Weighted Arithmetic Mean	tr_TR
dc.type	Article	tr_TR
dc.identifier.issue	4	tr_TR
dc.relation.journal	Bitlis Eren Üniversitesi Fen Bilimleri Dergisi	tr_TR
dc.identifier.volume	10	tr_TR
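
The resampling steps summarized in the abstract (Euclidean distances between minority members, DBSCAN-style dense zones around each datum, and synthetic points formed with a weighted arithmetic mean) can be illustrated with a minimal sketch. This is not the author's implementation: the function names, the `eps` and `min_pts` parameters, the equal default weights, and the choice of one random zone member per central point are all assumptions made here for illustration, since the record does not give those details.

```python
import math
import random


def weighted_mean(p, q, w_p, w_q):
    """Weighted arithmetic mean of two points, coordinate by coordinate."""
    total = w_p + w_q
    return [(w_p * a + w_q * b) / total for a, b in zip(p, q)]


def oversample_minority(points, eps, min_pts, w_center=1.0, w_neighbor=1.0, seed=0):
    """Hypothetical sketch of distance- and mean-based oversampling.

    For every minority point (the "central point"), collect the other
    minority points within Euclidean distance eps. If at least min_pts
    such points exist, the zone counts as dense in the DBSCAN sense
    (assumed criterion), and one synthetic sample is created between the
    central point and a randomly chosen zone member.
    """
    rng = random.Random(seed)
    synthetic = []
    for i, center in enumerate(points):
        # Zone: all other minority points within radius eps of the center.
        zone = [q for j, q in enumerate(points)
                if j != i and math.dist(center, q) <= eps]
        if len(zone) < min_pts:
            continue  # sparse neighbourhood: no synthetic sample here
        neighbor = rng.choice(zone)
        synthetic.append(weighted_mean(center, neighbor, w_center, w_neighbor))
    return synthetic


# Toy usage: three nearby minority points and one isolated point.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
new_points = oversample_minority(minority, eps=1.5, min_pts=2)
print(len(new_points), "synthetic samples created")
```

Because each synthetic point is a weighted mean with positive weights, it lies on the segment between two existing minority points, so the isolated point in the toy data contributes no new samples.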

Files in this item

Name: 10.17798-bitlisfen.985519-1935 ...
Size: 910.2Kb
Format: PDF
Description: Full Text

This item appears in the following Collection(s)

Volume 10, Issue 4 (2021) [35]
