Chulalongkorn University Theses and Dissertations (Chula ETD)

An application of reinforcement learning to credit scoring based on the logistic Bandit framework

Other Title (Parallel Title in Other Language of ETD)

การประยุกต์ใช้การเรียนรู้แบบเสริมกำลังสำหรับการให้คะแนนเครดิตภายใต้กรอบปัญหาโลจิสติกแบนดิต

Author

Kantapong Visantavarakul, Faculty of Commerce and Accountancy

Year (A.D.)

2022

Document Type

Thesis

First Advisor

Seksan Kiatsupaibul

Faculty/College

Faculty of Commerce and Accountancy (คณะพาณิชยศาสตร์และการบัญชี)

Department (if any)

Department of Statistics (ภาควิชาสถิติ)

Degree Name

Master of Science

Degree Level

Master's Degree

Degree Discipline

Statistics

DOI

10.58837/CHULA.THE.2022.339

Abstract

This study applies reinforcement learning to credit scoring using the logistic bandit framework. Credit scoring and credit underwriting are modeled as a single sequential decision problem in which the credit underwriter takes a sequence of actions over an indefinite number of time steps. The traditional credit scoring approach treats model construction separately from the underwriting process. In the reinforcement learning literature, this approach corresponds to a greedy algorithm, which is commonly believed to be inferior to an efficient reinforcement learning approach such as Thompson sampling. That belief holds in the simple setting, i.e., when credit is granted to a single borrower per action and the pool of borrowers is fixed. However, in the more realistic scenario where these two conditions are relaxed, the greedy approach can outperform Thompson sampling, because the greedy algorithm no longer commits too early to an inferior action as it does in the simple setting. Still, the efficient exploration of Thompson sampling remains beneficial: when borrower characteristics are captured by a large number of features, the exploration mechanism enables Thompson sampling to outperform the greedy algorithm. The results of the simulation study permit a deeper understanding of reinforcement learning approaches to logistic bandits, especially in the setting of credit scoring and credit underwriting.
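To make the comparison in the abstract concrete, the following is a minimal sketch of the simple setting only (a fixed borrower pool, one credit granted per step). It is not the thesis's implementation. The function names, the standard normal prior, the Laplace (Gaussian) approximation to the posterior, and the definition of regret as the gap in repayment probability are all assumptions introduced for illustration.

```python
import numpy as np

def sigmoid(z):
    # Clip to keep np.exp from overflowing on extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def laplace_posterior(X, y, prior_precision=1.0, iters=25):
    """Return the MAP estimate and posterior precision (Hessian at the MAP)
    for logistic regression with a zero-mean Gaussian prior (assumed)."""
    d = X.shape[1]
    theta = np.zeros(d)
    for _ in range(iters):  # plain Newton steps on the negative log posterior
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + prior_precision * theta
        hess = (X.T * (p * (1.0 - p))) @ X + prior_precision * np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    p = sigmoid(X @ theta)
    precision = (X.T * (p * (1.0 - p))) @ X + prior_precision * np.eye(d)
    return theta, precision

def simulate(policy, pool, theta_true, steps, rng):
    """Run one policy ("greedy" or "thompson") and return cumulative regret.
    Each row of `pool` is one borrower's feature vector (hypothetical data)."""
    d = pool.shape[1]
    X_hist, y_hist = np.empty((0, d)), np.empty(0)
    repay_prob = sigmoid(pool @ theta_true)
    regret = 0.0
    for _ in range(steps):
        mean, precision = laplace_posterior(X_hist, y_hist)
        if policy == "greedy":
            theta = mean  # act on the point estimate, no exploration
        else:
            # Thompson sampling: draw theta ~ N(mean, precision^{-1})
            # using the Cholesky factor of the precision matrix.
            chol = np.linalg.cholesky(precision)
            theta = mean + np.linalg.solve(chol.T, rng.standard_normal(d))
        a = int(np.argmax(pool @ theta))        # borrower who gets credit
        repaid = rng.random() < repay_prob[a]   # Bernoulli repayment outcome
        X_hist = np.vstack([X_hist, pool[a]])
        y_hist = np.append(y_hist, float(repaid))
        regret += repay_prob.max() - repay_prob[a]
    return regret

rng = np.random.default_rng(0)
pool = rng.normal(size=(20, 3))     # 20 borrowers, 3 features (arbitrary)
theta_true = rng.normal(size=3)     # unknown "true" scoring weights
for policy in ("greedy", "thompson"):
    print(policy, simulate(policy, pool, theta_true, 200, rng))
```

A single short run like this says little about which policy is better. Relaxing the two conditions (several borrowers per action, a changing pool) or increasing the number of features would require extending `simulate`, which this sketch does not attempt.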

Other Abstract (Other language abstract of ETD)

งานวิจัยนี้มีวัตถุประสงค์เพื่อประยุกต์ใช้การเรียนรู้แบบเสริมกำลังสำหรับการให้คะแนนเครดิตภายใต้กรอบปัญหาโลจิสติกแบนดิต การให้คะแนนเครดิตและการให้สินเชื่อสามารถจัดอยู่ในรูปแบบปัญหาการตัดสินใจอย่างเป็นลำดับโดยผู้ให้สินเชื่อจะตัดสินใจเลือกการกระทำโดยที่จุดสิ้นสุดของเวลานั้นไม่มีกำหนด วิธีการให้คะแนนเครดิตแบบดั้งเดิมพิจารณาการสร้างโมเดลแยกออกจากการให้สินเชื่อ ในการเรียนรู้แบบเสริมกำลัง วิธีนี้เรียกว่า ขั้นตอนวิธีแบบละโมบ (greedy algorithm) ซึ่งเชื่อกันอย่างแพร่หลายว่าให้ประสิทธิภาพที่ด้อยกว่าการเรียนรู้แบบเสริมกำลังที่มีประสิทธิภาพ เช่น การสุ่มตัวอย่างแบบทอมสัน (Thompson sampling) สมมติฐานนี้เป็นจริงในสถานการณ์แบบง่าย นั่นคือ ในแต่ละช่วงเวลาผู้ให้กู้จะให้สินเชื่อได้แค่คนเดียวในขณะที่ผู้กู้สินเชื่อยังเป็นรายเดิมอยู่ตลอด อย่างไรก็ตาม ในสถานการณ์ที่สมจริงมากขึ้น นั่นคือ ไม่มีเงื่อนไขทั้งสองข้อดังกล่าว ขั้นตอนวิธีแบบละโมบสามารถให้ประสิทธิภาพที่ดีกว่าการสุ่มตัวอย่างแบบทอมสัน เนื่องจาก ขั้นตอนวิธีแบบละโมบไม่ได้ยึดติดเร็วเกินไปกับการกระทำที่ให้ผลตอบแทนที่ด้อยกว่าซึ่งไม่เหมือนกับในสถานการณ์แบบง่าย ถึงแม้จะเป็นเช่นนั้น การสำรวจแบบมีประสิทธิภาพของการสุ่มตัวอย่างแบบทอมสันก็ยังมีประโยชน์ในการเรียนรู้ภายใต้สถานการณ์นี้ ในกรณีที่จำนวนตัวแปรที่อธิบายลักษณะของผู้ขอกู้สินเชื่อมีจำนวนมาก การสุ่มตัวอย่างแบบทอมสันสามารถให้ผลลัพธ์ที่ดีกว่าขั้นตอนวิธีแบบละโมบ ผลลัพธ์ที่ได้จากการศึกษานี้คาดว่าจะเป็นประโยชน์ในการทำความเข้าใจการเรียนรู้แบบเสริมกำลังภายใต้กรอบปัญหาโลจิสติกแบนดิตได้ดียิ่งขึ้น โดยเฉพาะในกระบวนการให้คะแนนเครดิตและการให้สินเชื่อ

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

Visantavarakul, Kantapong, "An application of reinforcement learning to credit scoring based on the logistic Bandit framework" (2022). Chulalongkorn University Theses and Dissertations (Chula ETD). 6050.
https://digital.car.chula.ac.th/chulaetd/6050

Included in

Statistics and Probability Commons
