    • Bitlis Eren Üniversitesi Fen Bilimleri Dergisi (Bitlis Eren University Journal of Science)
    • Volume 10, Issue 4 (2021)

    Improving Machine Learning Performance of Imbalanced Data by Resampling: DBSCAN and Weighted Arithmetic Mean

    View/Open
    Full Text (910.2Kb)
    Date
    2021
    Author
    GÜLDAL, Serkan
    Abstract
    Advances in digital technology have caused the volume of collected data to grow at an accelerating rate. This growth brings new problems, one of which is imbalanced data: in an imbalanced dataset, the classes are not equally represented. Because classification algorithms assume that datasets are balanced, classifying such data leads to performance losses; the classifier favors the majority class, and the minority class is often misclassified. Various studies in recent years have sought to reduce the imbalance ratio. In general terms, they apply undersampling, oversampling, or both to balance the dataset. This study proposes an oversampling method that combines distance with mean-based resampling to produce synthetic samples for the minority class. For resampling, the pairwise Euclidean distances between minority class members are calculated. Based on these distances, the denser zones around every datum are identified in the sense of DBSCAN. New synthetic samples are then formed between the points in each zone and the central points by using the Weighted Arithmetic Mean. In this way, the dataset has been approximately balanced at 500 majority samples and 535 minority samples (generated from 268 minority data). Random Forest (RF) and Support Vector Machine (SVM) algorithms are then used to classify the raw and balanced datasets. The results show that the proposed method yields the best machine learning performance among all the listed methods.
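The resampling steps named in the abstract (pairwise Euclidean distances, dense zones in the DBSCAN sense, weighted arithmetic mean between a central point and a zone member) can be illustrated with a minimal sketch. This is not the author's implementation: the function name `oversample_minority`, the parameters `eps`, `min_pts`, `w_centre`, and `w_point`, and the random choice of pairs are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def oversample_minority(minority, eps, min_pts, n_new,
                        w_centre=1.0, w_point=1.0, seed=0):
    """Hypothetical sketch: create n_new synthetic minority samples.

    A point counts as "central" when at least min_pts OTHER minority
    points lie within distance eps of it (a DBSCAN-style density test;
    this threshold convention is an assumption). Each synthetic sample
    is the weighted arithmetic mean of a central point and one of the
    points in its zone.
    """
    rng = random.Random(seed)

    # eps-neighbourhood of every minority point, excluding the point itself
    zones = [
        [j for j, q in enumerate(minority)
         if j != i and euclidean(p, q) <= eps]
        for i, p in enumerate(minority)
    ]

    # Indices of points whose zone is dense enough
    centres = [i for i, zone in enumerate(zones) if len(zone) >= min_pts]
    if not centres:
        return []  # no dense zone found, so nothing is generated

    total = w_centre + w_point
    synthetic = []
    for _ in range(n_new):
        c = rng.choice(centres)      # pick a central point
        n = rng.choice(zones[c])     # pick a member of its zone
        centre, point = minority[c], minority[n]
        # Weighted arithmetic mean, coordinate by coordinate
        synthetic.append(tuple(
            (w_centre * ci + w_point * pi) / total
            for ci, pi in zip(centre, point)
        ))
    return synthetic


# Toy usage: three clustered points and one isolated point
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
new_points = oversample_minority(minority, eps=1.5, min_pts=2, n_new=5)
```

With equal weights, every synthetic sample in the toy example is the midpoint of two clustered points, and the isolated point at (10, 10) never contributes because its zone is empty. Unequal weights would pull the new samples toward either the central point or the zone member.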
    URI
    http://dspace.beu.edu.tr:8080/xmlui/handle/123456789/14771
    Collections
    • Volume 10, Issue 4 (2021) [35]





    Creative Commons License
    DSpace@BEU by Bitlis Eren University Institutional Repository is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 Unported License.

    DSpace software copyright © 2002-2016  DuraSpace