Chulalongkorn University Theses and Dissertations (Chula ETD)

Thai scene text recognition

Other Title (Parallel Title in Other Language of ETD)

การรู้จำข้อความภาษาไทยในภาพถ่าย

Thananop Kobchaisawat, Faculty of Engineering

Year (A.D.)

2019

Document Type

Thesis

First Advisor

Thanarat Chalidabhongse

Faculty/College

Faculty of Engineering (คณะวิศวกรรมศาสตร์)

Department (if any)

Department of Computer Engineering (ภาควิชาวิศวกรรมคอมพิวเตอร์)

Degree Name

Doctor of Philosophy

Degree Level

Doctoral Degree

Degree Discipline

Computer Engineering

DOI

10.58837/CHULA.THE.2019.160

Abstract

Automatic scene text detection and recognition can benefit many everyday applications, such as reading signs and labels and assisting visually impaired people. Reading text in scene images is more challenging than reading scanned documents because of factors such as variation in font styles and unpredictable lighting conditions. The problem can be decomposed into two sub-problems: text localization and text recognition. The proposed scene text localization method works at the pixel level, combining a new text representation with a fully convolutional neural network. It can detect text of arbitrary shape and is not limited to a particular language. Experimental results on standard benchmarks show improved accuracy and speed compared with existing methods. The cropped text instances are then passed to the proposed text recognition algorithm, which consists of four stages: transformation, feature extraction, sequence modeling, and prediction. The recognizer is a fully learnable deep learning model combined with multi-level attention, which is inspired by the Thai writing system. The training data is entirely synthetic, generated from a variety of fonts with novel techniques that make the images look realistic. Experimental results on the test dataset show high accuracy and fast inference.
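
As a rough illustration of the hand-off between the two sub-problems, the sketch below groups pixel-level text predictions into instances that could then be cropped for recognition. It is not the thesis's method: the text representation used there is not described in this record, so plain 4-connected component grouping is swapped in as an assumption, and the function name `mask_to_boxes` and the toy mask are hypothetical. Axis-aligned boxes are also a simplification, since the thesis targets text of arbitrary shape.

```python
from collections import deque
from typing import List, Tuple

# (top, left, bottom, right), all inclusive pixel indices
Box = Tuple[int, int, int, int]


def mask_to_boxes(mask: List[List[int]]) -> List[Box]:
    """Group 4-connected text pixels (value 1) into bounding boxes.

    Assumption: `mask` is a binary map such as a thresholded output of a
    pixel-level text classifier. Real systems usually keep the full
    region shape instead of reducing it to a rectangle.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []

    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected text region.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


# Toy 3x5 mask with two separate text regions (hypothetical data).
toy_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
boxes = mask_to_boxes(toy_mask)
```

Each resulting box marks one candidate text instance; in a full pipeline the corresponding image crop would be fed to the recognizer.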

Other Abstract (Other language abstract of ETD)

การระบุตำแหน่งและรู้จำข้อความจากภาพถ่ายโดยอัตโนมัติ สามารถนำไปใช้ประโยชน์ได้หลากหลายในชีวิตประจำวัน เช่น การอ่านป้ายบอกทาง ฉลากสินค้า และการช่วยเหลือคนพิการทางการมองเห็น การอ่านข้อความจากภาพถ่ายนั้น มีความแตกต่างจากภาพเอกสารในหลายแง่มุม เช่น ความหลากหลายของรูปแบบอักษร การเรียงตัวของข้อความและสภาพแสงที่คาดเดาได้ยาก ปัญหานี้สามารถแบ่งได้เป็น 2 ปัญหาย่อยคือ การระบุตำแหน่งข้อความและการอ่านข้อความจากภาพถ่าย ขั้นตอนวิธีการระบุตำแหน่งข้อความที่เสนอ ใช้หลักการจำแนกประเภทระดับจุดภาพ ร่วมกับการบ่งบอกบริเวณของข้อความ และการเรียนรู้เชิงลึกแบบคอนโวลูชันทั้งหมด วิธีการที่นำเสนอนั้นสามารถตรวจจับข้อความได้ไม่จำกัดภาษา โดยไม่จำกัดรูปแบบ ผลการทดลองด้วยวิธีที่นำเสนอบนชุดข้อมูลทดสอบมาตรฐานแสดงให้เห็นถึงประสิทธิภาพที่ดีขึ้นทั้งในด้านความแม่นยำและความเร็วเมื่อเทียบกับวิธีอื่นๆ ส่วนภาพของข้อความจะถูกตัดแบ่งเพื่อเข้าสู่ขั้นตอนวิธีการรู้จำข้อความจากภาพถ่าย ประกอบไปด้วย 4 ขั้นตอนคือ การแปลงสภาพ การสกัดคุณลักษณะสำคัญ การสกัดคุณลักษณะของลำดับและการทำนาย ขั้นตอนที่เสนอถูกออกแบบเป็นโมเดลการเรียนรู้เชิงลึก แบบสามารถเรียนรู้ได้ทั้งหมดร่วมกับกลไกจุดสนใจแบบหลายระดับ ตามรูปแบบการเขียนในภาษาไทย โดยใช้ชุดข้อมูลสอนจากภาพข้อความที่สร้างขึ้นจากรูปแบบตัวอักษร ร่วมกับขั้นตอนวิธีที่ทำให้ภาพข้อความใกล้เคียงกับที่ปรากฏในภาพถ่าย ผลการทดลองบนชุดข้อมูลทดสอบ แสดงให้เห็นถึงประสิทธิภาพ ความแม่นยำ และผลกระทบของส่วนต่างๆในขั้นตอนวิธีที่นำเสนอ

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Recommended Citation

Kobchaisawat, Thananop, "Thai scene text recognition" (2019). Chulalongkorn University Theses and Dissertations (Chula ETD). 8536.
https://digital.car.chula.ac.th/chulaetd/8536

Included in

Computer Engineering Commons, Computer Sciences Commons
