Chulalongkorn University Theses and Dissertations (Chula ETD)

Multi-output learning for predicting evaluation and reopening of github pull requests on open-source projects

Other Title (Parallel Title in Other Language of ETD)

การเรียนรู้หลายผลลัพธ์สำหรับการทำนายการประเมินผลและการพิจารณาใหม่ของกิตฮับพูลรีเควสบนโอเพ่นซอร์สโปรเจค

Peerachai Banyongrakkul, Faculty of Commerce and Accountancy

Year (A.D.)

2022

Document Type

Thesis

First Advisor

Suronapee Phoomvuthisarn

Faculty/College

Faculty of Commerce and Accountancy (คณะพาณิชยศาสตร์และการบัญชี)

Department (if any)

Department of Statistics (ภาควิชาสถิติ)

Degree Name

Master of Science

Degree Level

Master's Degree

Degree Discipline

Statistics

DOI

10.58837/CHULA.THE.2022.340

Abstract

GitHub's pull-based development model is widely used by software development teams to manage software complexity. Contributors create pull requests to merge changes into the main codebase, and integrators review these requests to maintain quality and stability. However, a high volume of pull requests can overburden integrators and delay feedback. Previous studies have applied machine learning and statistical techniques using tabular data as features, which may lose meaningful information. Additionally, acceptance and latency alone may not be sufficient for evaluating a pull request, and reopened pull requests can add maintenance costs and burden already-busy developers. This thesis proposes a novel multi-output deep learning approach that predicts the acceptance, latency, and reopening of pull requests early and effectively handles multiple data sources, including tabular and textual data. Our approach also applies SMOTE and VAE techniques to address the highly imbalanced nature of pull request reopening. We evaluate our approach on 143,886 pull requests from 54 well-known projects across four popular programming languages. The experimental results show that our approach significantly outperforms the randomized baseline. Compared with the existing approach, it improves Accuracy by 8.68% and F1-Score by 6.77% in acceptance prediction, MMAE by 6.07% in latency prediction, and Balanced Accuracy by 9.43% and AUC by 9.37% in reopening prediction.

Other Abstract (Other language abstract of ETD)

GitHub's pull-based software development model is widely used. Contributors create pull requests to merge changes into a project's main code repository, and integrators are responsible for reviewing these pull requests to maintain quality and stability. However, a large number of requests can increase the integrators' workload and delay responses to contributors. Previous research has applied machine learning and statistical techniques to tabular data, but these may lose important information. In addition, acceptance and review latency alone may not be sufficient for evaluating pull requests. Furthermore, reopened pull requests can increase maintenance costs and add to the workload of developers who are already busy. This thesis therefore proposes a novel multi-output deep learning approach that predicts the acceptance, latency, and reopening of pull requests in advance and effectively handles diverse data sources, here meaning tabular and textual data. Our approach also uses SMOTE and VAE techniques to handle the imbalance of pull request reopening. We evaluated the approach on 143,886 pull requests from 54 widely known projects in 4 popular programming languages. The results show that our approach outperforms the randomized baseline. Compared with the existing approach, it improves Accuracy by 8.68% and F1-Score by 6.77% in acceptance prediction, MMAE by 6.07% in latency prediction, and Balanced Accuracy by 9.43% and AUC by 9.37% in reopening prediction.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

Banyongrakkul, Peerachai, "Multi-output learning for predicting evaluation and reopening of github pull requests on open-source projects" (2022). Chulalongkorn University Theses and Dissertations (Chula ETD). 6051.
https://digital.car.chula.ac.th/chulaetd/6051

Included in

Statistics and Probability Commons
