Chulalongkorn University Theses and Dissertations (Chula ETD)

การสกัดคำสำคัญที่เป็นกระแสและคำหยุดจากเพจเฟซบุ๊กภาษาไทยโดยใช้เอ็นแกรมแบบตัวอักษร

Other Title (Parallel Title in Other Language of ETD)

Extraction of Trend Keywords and Stop Words from Thai Facebook Pages using Character n-Grams

ณัษฐพงษ์ อู่สิริมณีชัย, Faculty of Engineering

Year (A.D.)

2018

Document Type

Thesis

First Advisor

สุกรี สินธุภิญโญ

Faculty/College

Faculty of Engineering (คณะวิศวกรรมศาสตร์)

Department (if any)

Department of Computer Engineering (ภาควิชาวิศวกรรมคอมพิวเตอร์)

Degree Name

Master of Engineering

Degree Level

Master's Degree

Degree Discipline

Computer Engineering

DOI

10.58837/CHULA.THE.2018.1254

Abstract

Social media can be used to analyze the behavior of people in society, and the social media platform most popular among Thais is Facebook. If we can analyze how people behave on Facebook, we can therefore understand the behavior of most Thai people in society. One common way to analyze people's behavior is through the trends that arise in society: how people pay attention to a trend, when the trend started, and so on. Trend analysis can be done by analyzing the keywords associated with a trend. However, current keyword extraction methods require a Thai word segmentation tool, and existing tools are trained on language corpora that do not include sentences found on social media such as Facebook. As a result, the word segmentation tools have problems when they encounter non-standard words, which degrades keyword extraction performance. Moreover, current keyword extraction methods support only fixed-length keywords. This thesis therefore develops a method for extracting trend keywords without a word segmenter, using a character n-gram algorithm instead, which makes it possible to extract keywords of variable length. It also uses the characteristics of trends to build a stop word database and to filter out only the words that are trends. Compared with the traditional TF-IDF and TF methods, the method proposed in this thesis achieves an F1 score of 0.402, which is better than TF-IDF with an F1 score of 0.165 and TF with an F1 score of 0.183. The proposed method is particularly well suited to tasks that require keywords of variable length, such as finding trends on the social media platform Facebook.
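
As an illustration of the character n-gram idea named in the abstract, the following is a minimal sketch of counting character n-grams over a set of posts without any word segmentation. It is not the thesis's algorithm: the function names, the range of n, and the sample strings are assumptions made for this example, and the abstract does not say how n-grams are merged into variable-length keywords or how the stop word database is built.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=5):
    """Yield every character n-gram of `text` with n_min <= n <= n_max.

    The n range is an assumption for illustration; the abstract does not
    state which lengths the thesis uses.
    """
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def count_ngrams(posts, n_min=2, n_max=5):
    """Count character n-grams across all posts (no word segmenter needed)."""
    counts = Counter()
    for post in posts:
        counts.update(char_ngrams(post, n_min, n_max))
    return counts


# Hypothetical toy input; real input would be Thai Facebook page posts.
posts = ["abcab", "cab"]
counts = count_ngrams(posts, n_min=2, n_max=3)
print(counts.most_common(3))
```

Because candidates of several lengths are counted side by side, a later filtering step can choose among them instead of being tied to one fixed length.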

Other Abstract (Other language abstract of ETD)

Social media can be used to analyze the behavior of people in society, often through the trends that emerge there. Trend analysis can be carried out by analyzing the keywords related to each trend. However, existing methods for extracting trend keywords require Thai word segmentation tools, which are trained on Thai corpora that do not include sentences found on social media. As a result, these tools have trouble segmenting non-standard words, which reduces the effectiveness of keyword extraction. In addition, existing keyword extraction methods support only fixed-length keywords. This thesis develops a method for extracting trend keywords that uses character n-grams instead of word segmentation, which makes it possible to extract keywords of variable length. It also uses the characteristics of trends to build a stop word database and then keeps only the words that are trends. Compared with the traditional TF-IDF and TF methods, the proposed method achieved an F1 score of 0.402, better than TF-IDF (F1 score of 0.165) and TF (F1 score of 0.183). The proposed method is especially suitable for tasks that require variable-length keywords, such as finding trends on social media platforms like Facebook.
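
To make the reported comparison concrete, here is a minimal sketch of how an F1 score can be computed for a set of extracted keywords against a reference set. It assumes exact-match, set-based scoring, which the abstract does not confirm; the function name and the sample keyword sets are hypothetical.

```python
def f1_score(predicted, gold):
    """Set-based F1: harmonic mean of precision and recall.

    Assumes a keyword counts as correct only on exact match with a
    reference keyword; the thesis may score matches differently.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical keyword sets, for illustration only.
extracted = ["a", "b", "c", "d"]
reference = ["a", "b", "e"]
score = f1_score(extracted, reference)  # precision 2/4, recall 2/3
print(round(score, 3))
```

Under this definition, a method that returns many spurious candidates lowers precision and one that misses reference keywords lowers recall, so F1 penalizes both.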

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

อู่สิริมณีชัย, ณัษฐพงษ์, "การสกัดคำสำคัญที่เป็นกระแสและคำหยุดจากเพจเฟซบุ๊กภาษาไทยโดยใช้เอ็นแกรมแบบตัวอักษร" (2018). Chulalongkorn University Theses and Dissertations (Chula ETD). 3385.
https://digital.car.chula.ac.th/chulaetd/3385

Included in

Computer Engineering Commons, Computer Sciences Commons
