
dc.contributor.author | GÜLDAL, Serkan
dc.date.accessioned | 2024-04-04T12:46:04Z
dc.date.available | 2024-04-04T12:46:04Z
dc.date.issued | 2021
dc.identifier.issn | 2147-3129
dc.identifier.uri | http://dspace.beu.edu.tr:8080/xmlui/handle/123456789/14771
dc.description.abstract | Advances in digital technology have caused the volume of collected data to grow at an accelerating rate. Larger datasets bring new problems, such as class imbalance: in an imbalanced dataset, the classes are not equally represented. Because many classification algorithms assume balanced classes, classifying such data leads to performance losses; the classifier favors the majority class, and the minority class is often misclassified. Various studies in recent years have aimed to reduce the imbalance ratio, generally by undersampling, oversampling, or a combination of both. In this study, an oversampling method is proposed that combines distance-based and mean-based resampling to produce synthetic samples for the minority class. First, the pairwise Euclidean distances between minority class members are calculated. Based on these distances, dense zones around each data point are identified in the sense of DBSCAN. New synthetic samples are then formed between the points in each zone and its central point using the weighted arithmetic mean. In this way, the dataset was brought to approximately 500 majority and 535 minority samples (the latter obtained from 268 original minority samples). Random Forest (RF) and Support Vector Machine (SVM) algorithms were then used to classify both the raw and the balanced datasets. The results showed that the proposed method achieved the best machine learning performance among all the compared methods. | tr_TR
dc.language.iso | English | tr_TR
dc.publisher | Bitlis Eren Üniversitesi | tr_TR
dc.rights | info:eu-repo/semantics/openAccess | tr_TR
dc.subject | Machine Learning | tr_TR
dc.subject | Random Forest | tr_TR
dc.subject | Support Vector Machine | tr_TR
dc.subject | Synthetic Data | tr_TR
dc.subject | Medical Data | tr_TR
dc.title | Improving Machine Learning Performance of Imbalanced Data by Resampling: DBSCAN and Weighted Arithmetic Mean | tr_TR
dc.type | Article | tr_TR
dc.identifier.issue | 4 | tr_TR
dc.relation.journal | Bitlis Eren Üniversitesi Fen Bilimleri Dergisi | tr_TR
dc.identifier.volume | 10 | tr_TR
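The resampling procedure summarized in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the author's implementation. The neighbourhood radius `eps`, the density threshold `min_pts`, the choice of each dense zone's core point as its central point, and the random positive weights used in the weighted arithmetic mean are all hypothetical choices. The function names `dense_zones`, `weighted_mean` and `oversample` are also hypothetical. With positive weights w_c and w_n, the synthetic point (w_c·c + w_n·n)/(w_c + w_n) lies on the segment between the central point c and a zone member n. Only the Python standard library is used.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dense_zones(minority: List[Point], eps: float,
                min_pts: int) -> List[Tuple[int, List[int]]]:
    """Return (center_index, neighbour_indices) for each minority point whose
    eps-neighbourhood is dense in the DBSCAN sense (a core point).

    As in DBSCAN, the point itself counts toward min_pts; zones with no
    neighbours are skipped because there is nothing to interpolate with.
    """
    zones = []
    for i, p in enumerate(minority):
        neighbours = [j for j, q in enumerate(minority)
                      if j != i and euclidean(p, q) <= eps]
        if neighbours and len(neighbours) + 1 >= min_pts:
            zones.append((i, neighbours))
    return zones


def weighted_mean(a: Point, b: Point, w_a: float, w_b: float) -> Point:
    """Weighted arithmetic mean (w_a*a + w_b*b) / (w_a + w_b), per feature."""
    total = w_a + w_b
    return tuple((w_a * x + w_b * y) / total for x, y in zip(a, b))


def oversample(minority: List[Point], n_new: int, eps: float,
               min_pts: int, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic minority samples (hypothetical sketch)."""
    rng = random.Random(seed)
    zones = dense_zones(minority, eps, min_pts)
    if not zones:
        raise ValueError("no dense zones; try a larger eps or a smaller min_pts")
    synthetic = []
    while len(synthetic) < n_new:
        center_idx, neighbours = rng.choice(zones)
        neighbour = minority[rng.choice(neighbours)]
        # Assumption: random positive weights for the central point and the
        # zone member, so the new sample lies on the segment between them.
        w_c, w_n = rng.uniform(0.1, 1.0), rng.uniform(0.1, 1.0)
        synthetic.append(weighted_mean(minority[center_idx], neighbour, w_c, w_n))
    return synthetic


# Usage: three nearby minority points and one isolated outlier.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0)]
new = oversample(minority, n_new=4, eps=0.5, min_pts=3)
print(len(new))  # 4
```

Because the weights are positive, each synthetic sample stays inside the eps-neighbourhood of its central point. Isolated minority points such as (5.0, 5.0) never form a dense zone, so no samples are generated around them. To grow 268 minority samples to the 535 reported in the abstract, `n_new` would be 267.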

