Chulalongkorn University Theses and Dissertations (Chula ETD)

An attention-guided image super-resolution for scene text

Other Title (Parallel Title in Other Language of ETD)

Super-resolution imaging using an attention mechanism for scene text

Author

Tithnorakneath Em, Faculty of Engineering

Year (A.D.)

2024

Document Type

Thesis

First Advisor

Supavadee Aramvith

Faculty/College

Faculty of Engineering (คณะวิศวกรรมศาสตร์)

Department (if any)

Department of Electrical Engineering (ภาควิชาวิศวกรรมไฟฟ้า)

Degree Name

Master of Engineering

Degree Level

Master's Degree

Degree Discipline

Electrical Engineering

DOI

10.58837/CHULA.THE.2024.1365

Abstract

Scene Text Image Super-Resolution (STISR) is crucial for improving text recognition on low-resolution images, which are often degraded by environmental conditions and device limitations. While many existing STISR approaches apply general super-resolution techniques to refine image quality, they often lack the specialized mechanisms needed to extract text-specific features. Most existing methods rely on feature extractors that miss critical details, such as edges and text-background contrast, that are usually important for recognition. In this work, we propose a new feature extraction method that uses Channel Attention and Spatial Attention mechanisms to emphasize text-related features, namely edges, contours, and spatial alignment. We also add a new guidance branch that steers the super-resolution process using high-level text-specific cues from a pre-trained text recognizer. This text guidance supplies context that enhances feature extraction: it focuses the model on the critical regions while suppressing interference from irrelevant backgrounds. Our model is trained and evaluated on the TextZoom dataset, a benchmark designed specifically for scene text super-resolution that contains paired low-resolution and high-resolution text images captured under various real-world conditions. The results show that our approach substantially improves text recognition accuracy and readability across different levels of image degradation. This underlines the potential of our method in practical STISR applications, where clarity and recognition reliability are essential.
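
The abstract names Channel Attention and Spatial Attention but gives no architectural details, so the following is only a minimal, standard-library sketch of the general idea and not the thesis's actual network. It assumes a feature map stored as nested lists of shape C x H x W, a hypothetical scalar weight per channel in place of a learned MLP, and hypothetical scalar weights `w_avg` and `w_max` in place of a learned convolution. All function and parameter names are illustrative.

```python
import math


def sigmoid(x):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feat, weights):
    """Rescale each channel by a gate computed from its global average.

    feat:    C x H x W nested lists of floats.
    weights: one hypothetical scalar per channel (stand-in for a learned MLP).
    """
    gates = []
    for chan, w in zip(feat, weights):
        pooled = sum(sum(row) for row in chan) / (len(chan) * len(chan[0]))
        gates.append(sigmoid(w * pooled))
    return [[[v * g for v in row] for row in chan]
            for chan, g in zip(feat, gates)]


def spatial_attention(feat, w_avg=1.0, w_max=1.0):
    """Rescale each pixel by a gate computed from cross-channel statistics.

    w_avg and w_max are hypothetical scalars (stand-in for a learned conv).
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    mask = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            vals = [feat[c][i][j] for c in range(C)]
            mask[i][j] = sigmoid(w_avg * sum(vals) / C + w_max * max(vals))
    return [[[feat[c][i][j] * mask[i][j] for j in range(W)]
             for i in range(H)] for c in range(C)]


# Toy input: channel 0 responds strongly in the right column (an "edge"),
# channel 1 is a flat, weak background response.
feat = [
    [[0.0, 4.0], [0.0, 4.0]],
    [[0.1, 0.1], [0.1, 0.1]],
]
out = spatial_attention(channel_attention(feat, weights=[1.0, 1.0]))
```

In this toy input, the column with the strong response receives a larger spatial gate than the flat column, so even the background channel is attenuated less there. That is the sense in which attention can emphasize text-related regions while suppressing the rest.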

Other Abstract (Other language abstract of ETD)

Scene Text Image Super-Resolution (STISR) is highly important for improving text recognition on low-resolution images, whose quality is often degraded by environmental and device limitations. Although many existing STISR methods use resolution-enhancement techniques to improve image quality, they often require specialized mechanisms to extract text-specific features. Most existing methods use feature extraction that misses important details, such as edges and the contrast between text and background, which are usually important for recognition. In this research, we propose a new feature extraction method that uses a channel attention mechanism and a spatial attention mechanism to emphasize text-related features, namely edges, contours, and spatial arrangement. We add a new guidance branch to make the super-resolution process more effective, using high-level text-specific information from a pre-trained text recognizer. This text guidance includes context that improves feature extraction, making the model focus on important regions while reducing interference from irrelevant backgrounds. Our model is trained and evaluated on the TextZoom dataset, a benchmark designed specifically for scene text image super-resolution, which includes paired low-resolution and high-resolution text images taken in diverse real-world conditions. These results show that our method substantially increases text recognition accuracy and improves readability at various levels of image degradation. This result highlights the potential of our method in practical STISR applications, where clarity and recognition reliability are important.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

Em, Tithnorakneath, "An attention-guided image super-resolution for scene text" (2024). Chulalongkorn University Theses and Dissertations (Chula ETD). 75121.
https://digital.car.chula.ac.th/chulaetd/75121

Included in

Electrical and Electronics Commons
