Chulalongkorn University Theses and Dissertations (Chula ETD)

Efficient text bounding box identification using mask R-CNN: case of thai documents

Other Title (Parallel Title in Other Language of ETD)

การหากรอบข้อความจากภาพด้วยการใช้ แมสก์ อาซีเอ็นเอ็น: กรณีเอกสารภาษาไทย

Phanthakan Kiatphaisansophon, Faculty of Science

Year (A.D.)

2023

Document Type

Thesis

First Advisor

Dittaya Wanvarie

Second Advisor

Nagul Cooharojananone

Faculty/College

Faculty of Science (คณะวิทยาศาสตร์)

Degree Name

Master of Science

Degree Level

Master's Degree

Degree Discipline

Science for Industry

DOI

10.58837/CHULA.THE.2023.940

Abstract

Text detection is a fundamental task in computer vision, particularly for Optical Character Recognition (OCR) applications. This research focuses on the text detection part of OCR. Our previous text detection model, CRAFT (Character-Region Awareness For Text detection), has shown promising results in bounding box identification. Nevertheless, it encounters issues with post-processing and with multiline text. The post-processing issue arises because the model must be reconfigured whenever new documents are introduced, which causes inefficiency and complexity. In addition, the CRAFT model tends to merge bounding boxes from consecutive lines, which introduces errors and reduces accuracy in the text recognition part of the application. To tackle these challenges, this study proposes an adjusted approach based on Mask R-CNN, an object detection model that separates objects at the pixel level. By adopting the Mask R-CNN approach, the study successfully tackles the post-processing issue, and the multiline text issue is effectively resolved. Comparative experiments reveal that the Mask R-CNN model achieves results similar to CRAFT in terms of accuracy and versatility when new documents are introduced. The results demonstrate high efficiency and potential for real-world applications, including reduced inference time and resource utilization when deploying the model.
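As a rough illustration of why pixel-level instance masks help with the multiline issue, the sketch below derives one axis-aligned bounding box per mask. It is not the thesis's implementation: the function name `mask_to_box` and the toy masks are hypothetical, and a real Mask R-CNN would produce the masks itself. Because each text line has its own mask, the boxes for consecutive lines stay separate without a merging step.

```python
def mask_to_box(mask):
    """Return (x_min, y_min, x_max, y_max) for a binary mask, or None if empty.

    `mask` is a list of rows, each a list of 0/1 values (hypothetical format).
    """
    # Row indices that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    # Column indices of every foreground pixel.
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), min(rows), max(cols), max(rows))


# Two toy instance masks standing in for two consecutive text lines.
line_a = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
line_b = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
]

# One box per instance mask, so the two lines are never blended.
boxes = [mask_to_box(m) for m in (line_a, line_b)]
```

Here `boxes` holds one tuple per text line, in pixel coordinates with inclusive bounds.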

Other Abstract (Other language abstract of ETD)

Text detection is a fundamental task in computer vision, especially for use in optical character recognition applications. This research focuses on and examines in depth the text detection part of the application. Previously, the text detection model Character-Region Awareness For Text detection (CRAFT) already provided high accuracy. However, this model has two shortcomings: refining text boxes in post-processing, and detecting multiline text. The post-processing problem arises because the model must be reconfigured when new documents arrive, which causes inefficiency and complexity. The other problem found with the CRAFT model is that it tends to merge text boxes from different lines, which causes errors in text recognition, the next part of the system. To solve these problems, we propose this research, which adapts the Mask R-CNN model to detect objects and separate each object from the others at the pixel level. Using the Mask R-CNN model eliminates the post-processing problem. In addition, the multiline text detection problem is also effectively resolved. Comparative experiments show that the proposed Mask R-CNN model gives results equivalent to CRAFT in terms of accuracy and flexibility when new documents arrive. The results show high efficiency and potential for real-world use, including reduced running time and resource usage when deploying the model.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

Kiatphaisansophon, Phanthakan, "Efficient text bounding box identification using mask R-CNN: case of thai documents" (2023). Chulalongkorn University Theses and Dissertations (Chula ETD). 11736.
https://digital.car.chula.ac.th/chulaetd/11736

Included in

Biotechnology Commons, Chemistry Commons, Physics Commons
