dc.description.abstract | Advances in digital technology have caused the volume of collected data to grow at an accelerating rate. This
growth brings new problems such as imbalanced data. In an imbalanced dataset, the classes are not equally
represented. Since classification algorithms typically assume balanced datasets, classification performance suffers:
the classifier favors the majority class, and the minority class is often misclassified. Various studies in recent
years have sought to reduce the imbalance ratio. In general terms, these studies apply undersampling, oversampling,
or a combination of both to balance the imbalanced datasets. In this
study, an oversampling method is proposed that combines distance-based and mean-based resampling to
produce synthetic samples for the minority class. For the resampling process, the pairwise Euclidean distances
between the minority class members are calculated. Based on the calculated
distances, the denser zones around every data point are identified in the sense of DBSCAN. The new synthetic samples
are formed between the points in each zone and the zone's central point by using the weighted arithmetic mean. As a
result, the dataset is approximately balanced at 500 majority samples and 535 minority samples (generated from 268 original minority samples). Moreover, Random
Forest (RF) and Support Vector Machine (SVM) algorithms are used to classify both the raw and the balanced
datasets. The results show that the proposed method achieves the best classification performance among all the
listed methods. | tr_TR |
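
The resampling steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `oversample_minority`, the zone radius `eps`, the density threshold `min_pts`, and the weight `w_center` are all hypothetical, since the abstract does not say how the zone radius or the weights are chosen.

```python
import math


def oversample_minority(minority, eps, min_pts=3, w_center=0.6):
    """Return synthetic minority points (illustrative sketch only).

    eps, min_pts and w_center are assumed parameters; the abstract
    does not specify their values or how they are selected.
    """
    n = len(minority)
    # Pairwise Euclidean distances between minority class members.
    dist = [[math.dist(a, b) for b in minority] for a in minority]

    synthetic = []
    for i, center in enumerate(minority):
        # Members of the eps-neighbourhood of point i, excluding i itself.
        zone = [j for j in range(n) if j != i and dist[i][j] <= eps]
        # DBSCAN-style density test: the count includes the point itself.
        if len(zone) + 1 < min_pts:
            continue
        for j in zone:
            # Weighted arithmetic mean of the central point and a zone member.
            synthetic.append(tuple(
                w_center * c + (1.0 - w_center) * p
                for c, p in zip(center, minority[j])
            ))
    return synthetic


# Toy data (made up for illustration): three close points and one outlier.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
new_points = oversample_minority(minority, eps=1.5)
```

In this toy example the isolated point at (10, 10) does not form a dense zone, so synthetic samples are generated only among the three nearby points. A weight other than 0.5 is used here so that the two directions of each pair do not produce the same midpoint; that choice is an assumption of the sketch.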