Chulalongkorn University Theses and Dissertations (Chula ETD)

Improving thai-dialect automatic speech recognition with curriculum learning

Other Title (Parallel Title in Other Language of ETD)

การเพิ่มความสามารถระบบรู้จําเสียงภาษาถิ่นด้วยการเรียนรู้แบบเป็นหลักสูตร

Artit Suwanbandit, Faculty of Engineering

Year (A.D.)

2024

Document Type

Thesis

First Advisor

Ekapol Chuangsuwanich

Faculty/College

Faculty of Engineering (คณะวิศวกรรมศาสตร์)

Department (if any)

Department of Computer Engineering (ภาควิชาวิศวกรรมคอมพิวเตอร์)

Degree Name

Master of Engineering

Degree Level

Master's Degree

Degree Discipline

Computer Engineering

DOI

10.58837/CHULA.THE.2024.291

Abstract

Automatic Speech Recognition (ASR) has been a prominent feature integrated into many real-world applications for decades, and deep learning architectures now dominate the field. However, ASR still struggles in low-resource settings, especially for Thai dialects. Transfer learning is the conventional machine learning approach for improving performance in low-resource settings. When additional fine-tuning tasks are not feasible, making the most of the target data becomes essential. To overcome this stagnation, we propose two transfer-based curriculum learning methods for low-resource dialectal ASR: Phonemetically-induced subword Out-of-vocabulary rate (PhIS-OOV) and Model Confidence Distance (MCD). PhIS-OOV is a curriculum scoring function that calibrates difficulty based on differences in pronunciation and spelling. MCD combines model probabilities with the edit distance algorithm to measure difficulty. In addition, we contribute to the Thai and dialectal ASR communities by publishing Thai-central, a 700-hour Thai dataset, and Thai-dialect, a dataset covering three of the most widely spoken regional dialects, i.e., Korat, Khummuang, and Pattani. We also describe the procedure used to build both datasets: constructing the read sentences, audio recording, voice verification, and text tokenization. Our results show that the curriculum learning approaches outperform conventional transfer learning. Moreover, PhIS-OOV achieves character error rate reductions (CERR) of 4.5%, 11.48%, and 12.78% in Khummuang, Korat, and Pattani, respectively.
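
The abstract refers to edit distance, character error rate, and ordering training samples by a difficulty score. The following Python sketch illustrates those general ideas only; it is not the thesis's PhIS-OOV or MCD definition. The plain out-of-vocabulary rate is used here as a stand-in difficulty score, the relative-reduction formula is an assumed reading of CERR, and all function names and the toy vocabulary are hypothetical.

```python
from typing import Callable, List, Sequence, Set


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction in percent (assumed interpretation of CERR)."""
    return 100.0 * (baseline - improved) / baseline


def oov_rate(tokens: List[str], source_vocab: Set[str]) -> float:
    """Fraction of tokens missing from the source vocabulary.

    Illustrative stand-in for a difficulty score, not PhIS-OOV itself.
    """
    if not tokens:
        return 0.0
    return sum(t not in source_vocab for t in tokens) / len(tokens)


def curriculum_order(samples: List[List[str]],
                     score: Callable[[List[str]], float]) -> List[List[str]]:
    """Sort samples from easiest (lowest score) to hardest."""
    return sorted(samples, key=score)


if __name__ == "__main__":
    vocab = {"a", "b", "c"}                      # hypothetical source vocabulary
    data = [["a", "x", "y"], ["a", "b"], ["a", "x"]]
    ordered = curriculum_order(data, lambda s: oov_rate(s, vocab))
    print(ordered)
    print(cer("abcd", "abxd"), relative_reduction(0.20, 0.15))
```

Sorting by score is the simplest form of curriculum; a fuller training setup would typically also decide how quickly harder samples are introduced, which this sketch leaves out.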

Other Abstract (Other language abstract of ETD)

Automatic speech recognition is one of the most natural channels of interaction between humans and computers and is now widely used in daily life. However, automatic speech recognition is hindered by the scarcity of data resources, especially for Thai dialects. Transfer learning is the conventional method for improving speech recognition models that face low-resource data. When access to fine-tuning data is limited, extracting as much as possible from the available data is essential. To address this problem, we propose two curriculum learning methods for transfer learning: PhIS-OOV and Model Confidence Distance. PhIS-OOV measures the gap in spelling and pronunciation between the pre-training data and the fine-tuning data and converts that gap into a difficulty score for each sample. Model Confidence Distance extends edit distance by including the model's confidence in the calculation. In addition, this work presents speech data for Thai and Thai dialects to advance Thai and dialectal speech recognition. The data consists of 700 hours of Thai and 40 hours for each of the regional dialects: Khummuang (Northern), Korat (Northeastern), and Pattani (Southern). The experimental results show that curriculum learning for transfer learning outperforms ordinary transfer learning, and that PhIS-OOV improves the character error rate by 4.5%, 11.48%, and 12.78% on the Khummuang, Korat, and Pattani data, respectively.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

Suwanbandit, Artit, "Improving thai-dialect automatic speech recognition with curriculum learning" (2024). Chulalongkorn University Theses and Dissertations (Chula ETD). 11546.
https://digital.car.chula.ac.th/chulaetd/11546

Included in

Computer Engineering Commons, Computer Sciences Commons
