Chulalongkorn University Theses and Dissertations (Chula ETD)

การเปรียบเทียบวิธีการปรับแต่งไฮเปอร์พารามิเตอร์ในโมเดลจำแนกทางการศึกษาด้วยการจำลองแบบมอนติคาร์โล

Other Title (Parallel Title in Other Language of ETD)

Comparison of Hyperparameter Tuning Techniques in Educational Classification Models by Monte-Carlo Simulation

Author

โยธิน หมายมั่น, Faculty of Education

Year (A.D.)

2024

Document Type

Thesis

First Advisor

ประภาศิริ รัชชประภาพรกุล

Faculty/College

Faculty of Education (คณะครุศาสตร์)

Department (if any)

Department of Educational Research and Psychology (ภาควิชาวิจัยและจิตวิทยาการศึกษา)

Degree Name

Master of Education (ครุศาสตรมหาบัณฑิต)

Degree Level

Master's Degree (ปริญญาโท)

Degree Discipline

Methodology for Educational Innovation Development (วิธีวิทยาการพัฒนานวัตกรรมทางการศึกษา)

DOI

10.58837/CHULA.THE.2024.1419

Abstract

This research aimed to (1) analyze the factors that affect hyperparameter tuning performance in classification models built with the Random Forest (RF) and Support Vector Machine (SVM) algorithms, and (2) propose guidelines for choosing a tuning algorithm and defining a hyperparameter search space suited to the characteristics of the data. The study was divided into two parts. Part 1 used Monte Carlo simulation to generate datasets that varied on the following factors: (1) sample size (50, 200, 500), (2) number of continuous variables (3, 5, 7), (3) effect size of the variables (high, medium), (4) relationship among variables (linear, nonlinear), and (5) hyperparameter search space (all, importance). Hyperparameters were then tuned with four methods: Bayesian Optimization (BO), Genetic Algorithm (GA), Hyperband (HPB), and Particle Swarm Optimization (PSO). Model performance was evaluated on three components: (1) classification performance, reported as accuracy; (2) performance stability, measured by the coefficient of variation (CV); and (3) a time-efficiency index. The importance of each factor was examined with an XAI technique (feature permutation mean dropout loss). The results showed that, for classification performance, the number of continuous variables was the most important factor (mean dropout loss = 0.110), followed by sample size (mean dropout loss = 0.103–0.101). For performance stability, sample size was the most important factor (mean dropout loss = 0.090). For time efficiency, sample size was the most important factor (mean dropout loss = 1.563), followed by the tuning algorithm (mean dropout loss = 1.456). Part 2 synthesized the findings to propose guidelines for choosing an algorithm and defining a search space suited to the characteristics of the data. It found that (1) when high accuracy is required, BO or HPB should be used with RF or SVM (kernel = radial), searching only the important hyperparameters (importance); (2) when a trade-off between performance and time is required, PSO should be used with RF, searching all hyperparameters (all); and (3) when both accuracy and stability are required, BO should be used with SVM (kernel = linear) and the all search space.

Other Abstract (Other language abstract of ETD)

This study aims to (1) analyze factors influencing the performance of hyperparameter tuning in classification models using Random Forest (RF) and Support Vector Machine (SVM) algorithms, and (2) propose guidelines for selecting appropriate optimization algorithms and defining hyperparameter search spaces based on data characteristics. The study is divided into two parts. Part 1 involves Monte Carlo simulations to generate datasets under varying conditions: (1) sample sizes (50, 200, 500), (2) numbers of continuous variables (3, 5, 7), (3) levels of predictor influence (high, medium), (4) relationships among variables (linear, nonlinear), and (5) hyperparameter search spaces (all, importance). Hyperparameter tuning was conducted using four algorithms: Bayesian Optimization (BO), Genetic Algorithm (GA), Hyperband (HPB), and Particle Swarm Optimization (PSO). Model performance was evaluated based on three criteria: (1) classification performance (accuracy), (2) stability (coefficient of variation: CV), and (3) time-efficiency index. The XAI technique (feature permutation mean dropout loss) was used to interpret feature importance. Results showed that, for classification accuracy, the number of continuous variables was the most important factor (mean dropout loss = 0.110), followed by sample size (mean dropout loss = 0.103–0.101). For performance stability, sample size was the most influential factor (mean dropout loss = 0.090). For time efficiency, sample size again had the greatest impact (mean dropout loss = 1.563), followed by the tuning algorithm (mean dropout loss = 1.456). Part 2 synthesizes the findings to propose guidelines for algorithm and search space selection based on data characteristics. It was found that: (1) to maximize accuracy, BO or HPB combined with RF or SVM (radial kernel) and an importance-based search space should be used; (2) for a balance between accuracy and time efficiency, PSO combined with RF and an all-inclusive search space is recommended; and (3) for both accuracy and stability, BO combined with SVM (linear kernel) using an all-inclusive search space is most suitable.
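
As a rough illustration of the simulation design described in the abstract, the sketch below enumerates the five listed factors as a grid and computes a coefficient of variation. It rests on assumptions not stated in the abstract: that the factors are fully crossed, and that CV is the sample standard deviation divided by the mean. The names `FACTORS`, `TUNERS`, `design_cells`, and `coefficient_of_variation` are hypothetical, and the accuracy values are made up for demonstration only; they are not results from the thesis.

```python
from itertools import product
from statistics import mean, stdev

# Factor levels as listed in the abstract.
FACTORS = {
    "sample_size": [50, 200, 500],
    "n_continuous_vars": [3, 5, 7],
    "effect_size": ["high", "medium"],
    "relationship": ["linear", "nonlinear"],
    "search_space": ["all", "importance"],
}

# Tuning algorithms compared in the study.
TUNERS = ["BO", "GA", "HPB", "PSO"]


def design_cells():
    """Return every combination of factor levels as a dict.

    Assumption: a fully crossed design; the abstract does not say so.
    """
    names = list(FACTORS)
    return [dict(zip(names, combo)) for combo in product(*FACTORS.values())]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m


cells = design_cells()
print(len(cells))                # 3 * 3 * 2 * 2 * 2 = 72 data conditions
print(len(cells) * len(TUNERS))  # 72 * 4 = 288 condition-by-tuner pairs

# Made-up accuracies from repeated runs of one cell, for illustration only.
accuracies = [0.80, 0.82, 0.78, 0.81, 0.79]
print(coefficient_of_variation(accuracies))
```

A lower CV across replications would indicate more stable accuracy for that combination of data condition and tuner, which is how the stability criterion in the abstract can be read.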

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

หมายมั่น, โยธิน, "การเปรียบเทียบวิธีการปรับแต่งไฮเปอร์พารามิเตอร์ในโมเดลจำแนกทางการศึกษาด้วยการจำลองแบบมอนติคาร์โล" (2024). Chulalongkorn University Theses and Dissertations (Chula ETD). 75613.
https://digital.car.chula.ac.th/chulaetd/75613

Included in

Educational Assessment, Evaluation, and Research Commons
