Chulalongkorn University Theses and Dissertations (Chula ETD)

ขั้นตอนวิธีการจัดทำดัชนีสำหรับข้อความไทยที่มีความผิดพลาด

Other Title (Parallel Title in Other Language of ETD)

Indexing algorithm for Thai text with errors

วรวัฒน์ วรศิลป์, Faculty of Engineering

Year (A.D.)

1999

Document Type

Thesis

First Advisor

สมชาย ประสิทธิ์จูตระกูล

Faculty/College

Faculty of Engineering (คณะวิศวกรรมศาสตร์)

Degree Name

Master of Science

Degree Level

Master's Degree

Degree Discipline

Computer Science

DOI

10.58837/CHULA.THE.1999.729

Abstract

This thesis describes an indexing algorithm for Thai text that contains errors. Its aim is to make the index more complete by adding the correct words to the index when the text being indexed contains errors. The proposed indexing relies on the "uniqueness" property of a string, defined as the number of times the string appears as part of a word in the dictionary. The algorithm has three steps: (1) find a list of substrings of the text that can be assembled back into the original text and that minimizes the sum of a function whose value varies with uniqueness; (2) from the result of the first step, find the substrings that are likely to have been caused by errors in the text, by considering substrings whose uniqueness values exceed a preset threshold; and (3) find dictionary words that are close to the words obtained by combining the substrings from the second step with adjacent strings in the text, and add them as additional index terms. Experimental results show that the algorithm increases the completeness of the original index, which does not take errors into account, from 87% to 97%, while reducing the precision of the original index from 83% to 60%.

Other Abstract (Other language abstract of ETD)

This thesis presents an indexing algorithm for Thai text with errors. The algorithm uses a string's "uniqueness" property, defined as the number of times the string appears as part of words in a dictionary. The algorithm has three steps. First, we find a list of substrings that can be reassembled into the original text and that minimizes a function of the substring uniquenesses. Second, substrings in the list that were potentially caused by errors are identified by comparing a function of substring uniqueness to a preset threshold. Last, dictionary words that approximately match strings obtained by concatenating the potentially error-caused substrings with adjacent substrings are added to the index list. Experimental results showed that this algorithm improves index completeness from 87% to 94% while decreasing index precision from 83% to 60%.
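
The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the toy English dictionary, the cost function (the uniqueness count itself, with a fixed penalty for strings found in no word), the threshold of 1, the use of Levenshtein distance with a limit of 1 for "approximate match", and every function name are assumptions made here for illustration. Unspaced English stands in for Thai text.

```python
# Hypothetical toy dictionary; a real system would use a Thai word list.
DICTIONARY = ["thai", "text", "index", "error"]
UNKNOWN_PENALTY = 100  # assumed cost for a string found in no dictionary word
THRESHOLD = 1          # assumed uniqueness threshold for step 2
MAX_DISTANCE = 1       # assumed edit-distance limit for step 3


def uniqueness(s):
    """Number of dictionary words that contain s as a substring."""
    return sum(1 for word in DICTIONARY if s in word)


def cost(s):
    """Assumed cost: the uniqueness count, or a penalty if it is zero."""
    u = uniqueness(s)
    return u if u > 0 else UNKNOWN_PENALTY


def segment(text):
    """Step 1: split text into substrings with minimum total cost (DP)."""
    n = len(text)
    best = [0] + [float("inf")] * n   # best[i] = min cost of text[:i]
    back = [0] * (n + 1)              # back[i] = start of the last piece
    for end in range(1, n + 1):
        for start in range(end):
            c = best[start] + cost(text[start:end])
            if c < best[end]:
                best[end], back[end] = c, start
    pieces, end = [], n
    while end > 0:
        pieces.append(text[back[end]:end])
        end = back[end]
    return pieces[::-1]


def suspicious(pieces):
    """Step 2: positions of pieces whose uniqueness exceeds the threshold."""
    return [i for i, p in enumerate(pieces)
            if uniqueness(p) > THRESHOLD or uniqueness(p) == 0]


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def index_terms(text):
    """Exact dictionary words plus step-3 approximate matches."""
    pieces = segment(text)
    terms = {p for p in pieces if p in DICTIONARY}
    for i in suspicious(pieces):
        left = pieces[i - 1] if i > 0 else ""
        right = pieces[i + 1] if i + 1 < len(pieces) else ""
        # Step 3: join the flagged piece with its neighbours.
        for cand in {left + pieces[i], pieces[i] + right,
                     left + pieces[i] + right}:
            terms.update(w for w in DICTIONARY
                         if edit_distance(cand, w) <= MAX_DISTANCE)
    return sorted(terms)


if __name__ == "__main__":
    # "thaitxt" is "thaitext" with the letter "e" dropped.
    print(segment("thaitxt"))      # ['thai', 't', 'xt']
    print(index_terms("thaitxt"))  # ['text', 'thai']
```

In this toy run, an index built only from exact matches would contain "thai" alone; the approximate-match step recovers "text" from the damaged fragment. Widening the threshold or the distance limit would admit more candidate words, which is one way the trade-off between completeness and precision reported in the abstract could arise.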

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

วรศิลป์, วรวัฒน์, "ขั้นตอนวิธีการจัดทำดัชนีสำหรับข้อความไทยที่มีความผิดพลาด" (1999). Chulalongkorn University Theses and Dissertations (Chula ETD). 63387.
https://digital.car.chula.ac.th/chulaetd/63387
