Chulalongkorn University Theses and Dissertations (Chula ETD)

แบบจำลองการเรียนรู้โดยคำกระตุ้นสำหรับการจำแนกข้อความแบบน้อยนัด

Other Title (Parallel Title in Other Language of ETD)

Prompt-based learning model for few-shot text classification

ธนกร ทำอิ่นแก้ว, Faculty of Engineering

Year (A.D.)

2023

Document Type

Thesis

First Advisor

พีรพล เวทีกูล

Second Advisor

ปิยวัฒน์ เลิศวิทยากำจร

Faculty/College

Faculty of Engineering (คณะวิศวกรรมศาสตร์)

Department (if any)

Department of Computer Engineering (ภาควิชาวิศวกรรมคอมพิวเตอร์)

Degree Name

Master of Science

Degree Level

Master's Degree

Degree Discipline

Computer Science

DOI

10.58837/CHULA.THE.2023.898

Abstract

Prompt-based learning has been shown to be effective for text classification in few-shot settings, outperforming full fine-tuning. The method transforms input text with a predefined template so that the model predicts a masked word (masked language modeling), then uses a verbalizer to map the model's output to the desired label. However, prompt-based text classification has been developed mainly for English datasets, which may limit its use in low-resource languages such as Thai. To address this, this research introduces two methods: LAAV for single-label text classification and PLAML for multi-label text classification, both of which aim to improve prompt-based learning for few-shot text classification. LAAV improves performance by incorporating the word "and", which yields more suitable words for the verbalizer and aligns the labels more closely with the language model's outputs. PLAML, in turn, was developed to handle the constraints of multi-label text classification, in which labels are often correlated. It proposes three techniques: label-aware weighting of verbalizer tokens, a label-aware template, and a dynamic threshold mechanism. Experiments show that LAAV and PLAML increase classification accuracy on both Thai and English datasets.

Other Abstract (Other language abstract of ETD)

Prompt-based learning has proven effective in few-shot text classification, outperforming traditional fine-tuning methods. The approach converts text inputs into masked language modeling prompts via templates, uses a fine-tuned language model to complete these prompts, and then employs a verbalizer to map the model's output to a specific class. However, the method was developed mainly for English datasets, which may limit its application to low-resource languages such as Thai. This research therefore introduces two methods: the Label-Aware Automatic Verbalizer (LAAV) for single-label classification and the Prompt-based Label-Aware framework for Multi-Label classification (PLAML), both aimed at refining prompt-based learning for few-shot classification. LAAV augments manual labels with the conjunction “and”, which yields more suitable words for the verbalizer and aligns manual labels more effectively with the language model's outputs. PLAML addresses the challenges of multi-label text classification, in which labels are often correlated. It integrates three techniques: a token weighting algorithm that considers label correlations, a label-aware training sample augmentation template, and a dynamic threshold mechanism for precise label prediction. Experiments show that LAAV and PLAML improve classification accuracy on both Thai and English datasets.
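
The template-and-verbalizer pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the thesis's implementation: the template string, the verbalizer words, the function names, and the hard-coded [MASK] logits are all assumptions made for the example. A real system would obtain the logits from a masked language model, and the fixed 0.5 threshold in the multi-label function is only a placeholder, not PLAML's dynamic threshold mechanism.

```python
import math

# Hypothetical template: the input is wrapped so that choosing a label
# becomes a masked-word prediction problem.
TEMPLATE = "{text} This is about [MASK]."

# Hypothetical verbalizer: each class is represented by a few vocabulary words.
VERBALIZER = {
    "sports": ["sports", "football"],
    "politics": ["politics", "government"],
}


def build_prompt(text: str) -> str:
    """Wrap raw text in the masked-language-modeling template."""
    return TEMPLATE.format(text=text)


def class_scores(mask_logits: dict[str, float]) -> dict[str, float]:
    """Average the [MASK] logits of the words that represent each class."""
    return {
        label: sum(mask_logits[word] for word in words) / len(words)
        for label, words in VERBALIZER.items()
    }


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Normalize class scores into a probability distribution."""
    peak = max(scores.values())
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict_single(mask_logits: dict[str, float]) -> str:
    """Single-label case: pick the class with the highest probability."""
    probs = softmax(class_scores(mask_logits))
    return max(probs, key=probs.get)


def predict_multi(mask_logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Multi-label case: keep every class whose sigmoid score passes a
    fixed threshold (a stand-in for a learned or dynamic threshold)."""
    scores = class_scores(mask_logits)
    return [
        label
        for label, s in scores.items()
        if 1.0 / (1.0 + math.exp(-s)) >= threshold
    ]


# Made-up logits standing in for a language model's output at [MASK].
example_logits = {"sports": 2.0, "football": 1.0, "politics": -1.0, "government": 0.0}
print(build_prompt("The team won the final."))
print(predict_single(example_logits))
print(predict_multi(example_logits))
```

Averaging logits over several verbalizer words is one simple way to make the class score less dependent on any single word; methods such as LAAV focus on choosing those words well, which this sketch does not attempt.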

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

ทำอิ่นแก้ว, ธนกร, "แบบจำลองการเรียนรู้โดยคำกระตุ้นสำหรับการจำแนกข้อความแบบน้อยนัด" (2023). Chulalongkorn University Theses and Dissertations (Chula ETD). 11939.
https://digital.car.chula.ac.th/chulaetd/11939

Included in

Computer Sciences Commons
