Chulalongkorn University Theses and Dissertations (Chula ETD)

การพัฒนาตัวแบบแปลความหมายทางภูมิศาสตร์และจำแนกประเภทอัตโนมัติจากข้อมูลภาษาไทยบนทวิตเตอร์

Other Title (Parallel Title in Other Language of ETD)

The development of geoparsing and automated classification from Thai Twitter text data

Author

ธุวชิต แฉล้มเขตต์, Faculty of Engineering

Year (A.D.)

2022

Document Type

Thesis

First Advisor

ชนินทร์ ทินนโชติ

Second Advisor

อรรถพล ธำรงรัตนฤทธิ์

Faculty/College

Faculty of Engineering (คณะวิศวกรรมศาสตร์)

Department (if any)

Department of Survey Engineering (ภาควิชาวิศวกรรมสำรวจ)

Degree Name

Doctor of Engineering

Degree Level

Doctoral Degree

Degree Discipline

Survey Engineering

DOI

10.58837/CHULA.THE.2022.876

Abstract

Twitter is an extremely fast source of news and information. The massive volume of messages exchanged on it contains information about new places, both their names and text describing where they are. This makes it an important data source for keeping geospatial databases in information systems, such as navigation map systems, continuously up to date. Two steps are essential: toponym recognition, which finds and extracts place names in text, and geocoding, which estimates the geographic location of those places. At present, research and tools for toponym recognition developed for other languages have seen only limited use on Thai data, and existing geocoding techniques still do not give adequate positional accuracy. This research develops a geoparsing model for Thai. For toponym recognition, it applies several machine learning techniques: a CRF model with additional geography-specific feature functions; recurrent neural networks, namely LSTM and GRU; and finally a transfer learning model, BERT. BERT gave an overall phrase-level score (F1-Phrase) of 0.919. For geocoding, which locates the newly extracted place names, this research develops a new algorithm that uses the spatial relationships between other place names with known locations in the text as weights when estimating the position of the new place. The algorithm is named Topology words. The results show that the Topology words model performs best, with the lowest root mean square error of 0.947 kilometers, which is more accurate than the existing techniques DBSCAN, K-means, K-medoids, and Agglomerative clustering.
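The abstract reports F1-Phrase without defining it. As a minimal sketch, assuming F1-Phrase means exact-match F1 over whole place-name spans in BIO-tagged text (an assumption, not something this record confirms), the metric could be computed as follows. The function names `extract_spans` and `phrase_f1` and the `LOC` label are hypothetical.

```python
def extract_spans(tags):
    """Return the set of (start, end, label) spans encoded by a BIO tag sequence."""
    spans = set()
    start, label = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def phrase_f1(gold_tags, pred_tags):
    """Exact-match span F1: a predicted span counts only if both its
    boundaries and its label agree with a gold span (assumed definition)."""
    gold = extract_spans(gold_tags)
    pred = extract_spans(pred_tags)
    correct = len(gold & pred)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Under this definition a partially matched place name earns no credit, which is why a phrase-level score is stricter than a token-level one.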

Other Abstract (Other language abstract of ETD)

Twitter is a rapid news source with a wealth of geo-referenced information. Geoparsing transforms textual place names into geospatial data, which navigation systems and geospatial data retrieval systems can use to locate new places. No such tool currently exists for Thai-language data, so this study develops a geoparsing model for Thai. Geoparsing comprises two crucial steps: toponym recognition and geocoding. In the first step, toponym recognition, several machine learning techniques are applied: a CRF model with additional geographic feature functions; the recurrent neural networks LSTM and GRU; and finally the transfer learning model BERT, which achieved the highest overall phrase-level score (F1-Phrase). The second step is geocoding. This research extends geocoding to estimating the location of a place that cannot be found in the existing database. An algorithm called "topology words" uses the spatial reference relationships between locations mentioned in the text. Clustering models, including DBSCAN, K-means, K-medoids, and Agglomerative clustering, are also used to select the group of place names from which the location is estimated. According to the research findings, the topology words model performed best, with the lowest root mean square error of 0.94 km.
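The abstract does not give the weighting scheme behind the topology words algorithm, so the following is only an illustrative sketch of the general idea: estimate an unknown place as a weighted average of co-mentioned places with known coordinates, then score the estimates by root mean square error in kilometers. The weights, the function names, and the use of the haversine distance are all assumptions, and averaging latitude and longitude directly is a small-area approximation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def weighted_location(known_places):
    """Weighted mean of known places, given as ((lat, lon), weight) pairs.

    The weights are hypothetical stand-ins for whatever the spatial
    relationship words in the text would contribute.
    """
    total = sum(weight for _, weight in known_places)
    lat = sum(point[0] * weight for point, weight in known_places) / total
    lon = sum(point[1] * weight for point, weight in known_places) / total
    return lat, lon


def rmse_km(estimates, truths):
    """Root mean square of the distance errors, in km."""
    squared = [haversine_km(e, t) ** 2 for e, t in zip(estimates, truths)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, `weighted_location([((13.0, 100.0), 1.0), ((14.0, 101.0), 3.0)])` pulls the estimate three quarters of the way toward the second, more heavily weighted place.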

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

แฉล้มเขตต์, ธุวชิต, "การพัฒนาตัวแบบแปลความหมายทางภูมิศาสตร์และจำแนกประเภทอัตโนมัติจากข้อมูลภาษาไทยบนทวิตเตอร์" (2022). Chulalongkorn University Theses and Dissertations (Chula ETD). 6586.
https://digital.car.chula.ac.th/chulaetd/6586

Included in

Engineering Commons
