Chulalongkorn University Theses and Dissertations (Chula ETD)

The density-based minority over-sampling framework for class imbalanced problems

Other Title (Parallel Title in Other Language of ETD)

กรอบการทำงานสุ่มเพิ่มกลุ่มข้อมูลด้อยด้วยความหนาแน่นสำหรับปัญหากลุ่มข้อมูลอสมดุล

Chumphol Bunkhumpornpat, Faculty of Science

Year (A.D.)

2011

Document Type

Thesis

First Advisor

Krung Sinapiromsaran

Second Advisor

Chidchanok Lursinsap

Faculty/College

Faculty of Science (คณะวิทยาศาสตร์)

Degree Name

Doctor of Philosophy

Degree Level

Doctoral Degree

Degree Discipline

Computer Science

DOI

10.58837/CHULA.THE.2011.1087

Abstract

A dataset exhibits the class imbalance problem when the target class has very few instances relative to the other classes. A standard classifier typically fails to predict the positive instances because there are so few of them. In this thesis, the density-based minority over-sampling framework is proposed. It relies on a density-based notion of clusters and is designed to over-sample an arbitrarily shaped cluster discovered by a density-based clustering algorithm. In detail, my framework generates a synthetic instance along the shortest path from each instance in a cluster of the minority class to the pseudo-centroid of that cluster. Consequently, the set of synthetic instances is dense near the pseudo-centroid and sparse far from it. Because of this distribution, a classifier places more emphasis on the core region of the cluster than on its border region. The experimental results show that my framework improves the accuracy, F-value (a combined measure of precision and recall), and AUC of a classifier more than SMOTE and Safe-Level-SMOTE do.
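The generation step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, and several details are assumptions made only for illustration: the minority cluster is taken as already discovered (the density-based clustering step is omitted), the pseudo-centroid is assumed to be the cluster member nearest to the arithmetic mean, the graph is assumed to link members lying within a radius `eps` of each other, and each synthetic instance is assumed to be placed at a random position on a randomly chosen edge of the shortest path. The names `pseudo_centroid`, `shortest_path_tree`, and `oversample_cluster` are hypothetical.

```python
import heapq
import math
import random


def pseudo_centroid(cluster):
    """Index of the member closest to the arithmetic mean (assumed definition)."""
    dim = len(cluster[0])
    mean = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    return min(range(len(cluster)), key=lambda i: math.dist(cluster[i], mean))


def shortest_path_tree(cluster, source, eps):
    """Dijkstra over an assumed graph linking members within eps of each other.

    Returns parent[i], the next member on the shortest path from i to source
    (None for the source itself and for unreachable members).
    """
    n = len(cluster)
    dist = [math.inf] * n
    parent = [None] * n
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v in range(n):
            w = math.dist(cluster[u], cluster[v])
            if v != u and w <= eps and d + w < dist[v]:
                dist[v] = d + w
                parent[v] = u
                heapq.heappush(heap, (dist[v], v))
    return parent


def oversample_cluster(cluster, eps, rng=None):
    """Generate one synthetic instance per member, on its path to the pseudo-centroid."""
    if rng is None:
        rng = random.Random(0)
    c = pseudo_centroid(cluster)
    parent = shortest_path_tree(cluster, c, eps)
    synthetic = []
    for i in range(len(cluster)):
        if i == c or parent[i] is None:
            continue  # skip the pseudo-centroid and unreachable members
        # Collect the edges of the shortest path from member i to the pseudo-centroid.
        edges = []
        u = i
        while u != c:
            edges.append((u, parent[u]))
            u = parent[u]
        # Assumed placement rule: random point on a randomly chosen path edge.
        a, b = rng.choice(edges)
        t = rng.random()
        synthetic.append(
            tuple(x + t * (y - x) for x, y in zip(cluster[a], cluster[b]))
        )
    return synthetic


if __name__ == "__main__":
    # An L-shaped (non-convex) toy cluster of minority instances.
    toy = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (2.0, 2.0)]
    for point in oversample_cluster(toy, eps=1.1):
        print(point)
```

In this sketch, edges close to the pseudo-centroid belong to many shortest paths, so they are sampled more often than edges near the border. That is one way to obtain the dense-core, sparse-border distribution the abstract describes, and because points are placed only on graph edges, they stay inside an arbitrarily shaped cluster rather than cutting across its concave regions.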

Other Abstract (Other language abstract of ETD)

เซตข้อมูลจัดอยู่ในปัญหากลุ่มข้อมูลอสมดุลเมื่อกลุ่มข้อมูลเป้าหมายมีจำนวนข้อมูลน้อยมากเปรียบเทียบกับกลุ่มข้อมูลอื่น ตัวจำแนกกลุ่มข้อมูลโดยทั่วไปมีความผิดพลาดในการทำนายกลุ่มข้อมูลด้อยนี้เพราะจำนวนข้อมูลในกลุ่มมีขนาดเล็ก วิทยานิพนธ์ฉบับนี้ได้นำเสนอกรอบการทำงานสุ่มเพิ่มกลุ่มข้อมูลด้อยด้วยความหนาแน่น กรอบการทำงานนี้ถูกออกแบบให้สุ่มเพิ่มข้อมูลในกลุ่มข้อมูลรูปร่างทั่วไป โดยใช้หลักความหนาแน่นของกลุ่มข้อมูล กล่าวโดยละเอียด กรอบการทำงานนี้สร้างข้อมูลสังเคราะห์ตามแนววิถีสั้นสุดระหว่างข้อมูลแต่ละตัวและจุดเซนทรอยด์เทียมในกลุ่มข้อมูลของกลุ่มข้อมูลด้อย ดังนั้น เซตของข้อมูลสังเคราะห์มีความหนาแน่นใกล้จุดเซนทรอยด์เทียมและมีความเบาบางไกลจุดเซนทรอยด์เทียม จากการกระจายของเซตข้อมูลดังกล่าว ตัวจำแนกกลุ่มข้อมูลเน้นการเรียนรู้บริเวณแกนมากกว่าบริเวณขอบของกลุ่มข้อมูล ผลการทดลองแสดงให้เห็นว่ากรอบการทำงานนี้พัฒนา ความแม่นยำ ค่าเอฟ (เทอมรวมของพรีซิชันและรีคอล) และ เอยูซี มากกว่าขั้นตอนวิธีสโมทและเซฟเลเวลสโมท

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Recommended Citation

Bunkhumpornpat, Chumphol, "The density-based minority over-sampling framework for class imbalanced problems" (2011). Chulalongkorn University Theses and Dissertations (Chula ETD). 61099.
https://digital.car.chula.ac.th/chulaetd/61099
