Chulalongkorn University Theses and Dissertations (Chula ETD)

Thai-English supported automatic speech recognition for endoscopic reporting

Other Title (Parallel Title in Other Language of ETD)

การรู้จำเสียงอัตโนมัติที่รองรับภาษาไทย-อังกฤษสำหรับการออกรายงานการส่องกล้อง

Arpanant Saeng-xuto, Faculty of Engineering

Year (A.D.)

2024

Document Type

Thesis

First Advisor

Peerapon Vateekul

Faculty/College

Faculty of Engineering (คณะวิศวกรรมศาสตร์)

Department (if any)

Department of Computer Engineering (ภาควิชาวิศวกรรมคอมพิวเตอร์)

Degree Name

Master of Engineering

Degree Level

Master's Degree

Degree Discipline

Computer Engineering

DOI

10.58837/CHULA.THE.2024.976

Abstract

This thesis presents an automatic speech recognition (ASR) system for endoscopic reporting that supports Thai-English code-switching. During endoscopic procedures, gastroenterologists need both hands to handle instruments, which makes it difficult to document abnormal findings in real time. Although recent advances in speech recognition offer promising solutions, existing models struggle with Thai-English code-switching and tend to overfit when fine-tuned on limited datasets. To overcome these limitations, we propose an ASR model enhanced with the Mixture of Experts (MoE) technique to improve transcription accuracy. In addition, a Named Entity Recognition (NER) model extracts gastrointestinal (GI) terminology from the transcriptions and classifies each term by category to facilitate the reporting process. Experimental results show that our ASR model achieves low word error rates (GI: 1.12%, Thai: 2.06%) and high recall for medical terms (96.53%) and non-medical terms (95.85%), outperforming baseline models. The NER model also performs strongly, with an F1-score, recall, and precision of 96.11%, 96.06%, and 96.16%, respectively.
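For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how word error rate (WER) and F1-score are conventionally computed. It is not taken from the thesis: the function names, the pre-tokenized input format, and the example utterance are illustrative assumptions, and the thesis may tokenize Thai text or average scores differently.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words.

    Both arguments are lists of tokens. Thai is written without spaces,
    so a real pipeline would need a word segmenter first; this sketch
    assumes tokenization has already been done.
    """
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical utterance (not from the thesis data): one dropped word
# out of five reference words.
ref = ["found", "polyp", "at", "sigmoid", "colon"]
hyp = ["found", "polyp", "sigmoid", "colon"]
print(word_error_rate(ref, hyp))          # 0.2

# Precision and recall as reported for the NER model in the abstract.
print(round(f1_score(96.16, 96.06), 2))   # 96.11
```

Under this standard definition, the reported NER precision (96.16%) and recall (96.06%) give an F1-score of 96.11%, which matches the figure stated in the abstract.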

Other Abstract (Other language abstract of ETD)

วิทยานิพนธ์นี้นำเสนอระบบรู้จำเสียงอัตโนมัติ (ASR) สำหรับการออกรายงานการส่องกล้อง ที่รองรับการสลับภาษาระหว่างภาษาไทยและภาษาอังกฤษ ในระหว่างกระบวนการส่องกล้อง แพทย์ระบบทางเดินอาหารจำเป็นต้องใช้มือทั้งสองข้างในการควบคุมอุปกรณ์ ส่งผลให้การบันทึกข้อมูลความผิดปกติแบบทันทีเป็นเรื่องยาก แม้ว่าการพัฒนาล่าสุดในเทคโนโลยีรู้จำเสียงจะมีแนวโน้มที่ดี แต่โมเดลที่มีอยู่ยังประสบปัญหาในการจัดการกับการสลับภาษาไทย-อังกฤษ และมักเกิดการเรียนรู้ตรงกับข้อมูลฝึกสอนได้ดีมากเกินไป (overfitting) เมื่อทำการปรับแต่งกับชุดข้อมูลที่มีขนาดจำกัด เพื่อแก้ไขข้อจำกัดเหล่านี้ งานวิจัยนี้จึงเสนอโมเดล ASR ที่ได้รับการปรับปรุงด้วยเทคนิค “Mixture of Experts (MoE)” เพื่อเพิ่มความแม่นยำในการถอดเสียง นอกจากนี้ โมเดลรู้จำชื่อเฉพาะ (NER) ยังถูกนำมาใช้เพื่อสกัดคำศัพท์ทางระบบทางเดินอาหาร (GI) จากข้อความถอดเสียง และจัดประเภทคำศัพท์เหล่านี้เพื่ออำนวยความสะดวกในกระบวนการสร้างรายงาน ผลการทดลองแสดงให้เห็นว่าโมเดล ASR ของเรามีอัตราความผิดพลาดของคำ (word error rate) ต่ำ (GI: 1.12%, ภาษาไทย: 2.06%) พร้อมทั้งมีค่า “recall” สูงทั้งสำหรับคำศัพท์ทางการแพทย์ (96.53%) และคำศัพท์ทั่วไป (95.85%) ซึ่งดีกว่าโมเดลมาตรฐานทั้งหมด โมเดล NER ยังให้ผลการทำงานที่โดดเด่นด้วยค่า “F1-score”, “recall” และ “precision” อยู่ที่ 96.11%, 96.06% และ 96.16% ตามลำดับ

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

Saeng-xuto, Arpanant, "Thai-English supported automatic speech recognition for endoscopic reporting" (2024). Chulalongkorn University Theses and Dissertations (Chula ETD). 74814.
https://digital.car.chula.ac.th/chulaetd/74814

Included in

Computer Engineering Commons, Computer Sciences Commons
