Chulalongkorn University Theses and Dissertations (Chula ETD)

การสกัดตารางและรายการบนเว็บเป็นอาร์ดีเอฟ

Other Title (Parallel Title in Other Language of ETD)

Extraction of tables and lists on the web to RDF

จุลเทพ นันทขว้าง, คณะวิศวกรรมศาสตร์

Year (A.D.)

2020

Document Type

Thesis

First Advisor

ประภาส จงสถิตย์วัฒนา

Faculty/College

Faculty of Engineering (คณะวิศวกรรมศาสตร์)

Department (if any)

Department of Computer Engineering (ภาควิชาวิศวกรรมคอมพิวเตอร์)

Degree Name

วิศวกรรมศาสตรดุษฎีบัณฑิต

Degree Level

ปริญญาเอก

Degree Discipline

วิศวกรรมคอมพิวเตอร์

DOI

10.58837/CHULA.THE.2020.1131

Abstract

ทุกวันนี้ ลิงก์เดต้าได้เติบโตเพิ่มขึ้นอย่างรวดเร็วตามการเติบโตของเว็บ นอกเหนือจากข้อมูลใหม่ที่สร้างขึ้นในรูปแบบซีแมนติกโดยเฉพาะ ส่วนหนึ่งมาจากการแปลงข้อมูลโครงสร้างที่มีอยู่ให้อยู่ในรูปแบบของข้อมูลเปิดระดับห้าดาว อย่างไรก็ตามยังคงมีข้อมูลจำนวนมากในรูปแบบโครงสร้างและกึ่งโครงสร้าง ตัวอย่างเช่นตารางและรายการซึ่งเป็นรูปแบบหลักที่มนุษย์ใช้อ่าน ยังรอการแปลงอยู่ งานวิจัยนี้กล่าวถึงงานวิจัยต่าง ๆ ที่เกี่ยวกับการแปลงตารางและรายการมาเป็นข้อมูลในรูปแบบต่าง ๆ เพื่อให้เครื่องสามารถอ่านได้ นอกจากนี้ยังเสนอวิธีการในการแปลงตารางและรายการเป็นรูปแบบ Resource Description Framework และยังคงเก็บโครงสร้างต้นฉบับที่จำเป็นไว้อย่างละเอียด ซึ่งทำให้สามารถที่จะสร้างข้อมูลโครงสร้างเดิมกลับมาได้ ระบบ TULIP ถูกสร้างขึ้นเพื่อเป็นเครื่องมือสำหรับการพัฒนาซีแมนติกเว็บ วิธีการที่เสนอมีความยืดหยุ่นมากกว่าเมื่อเทียบกับงานอื่น ๆ เดต้าโมเดลของ TULIP สามารถรองรับการเก็บข้อมูลต้นฉบับอย่างครบถ้วน และสามารถนำมาแสดงใหม่ในมุมมองที่แตกต่างไปจากเดิม เครื่องมือนี้สามารถใช้สร้างข้อมูลจำนวนมหาศาลสำหรับเครื่องคอมพิวเตอร์เพื่อให้ใช้งานได้กว้างมากขึ้นกว่าเดิม

Other Abstract (Other language abstract of ETD)

Currently, Linked Data is increasing at a rapid rate as the growth of the Web. Aside from new information that has been created exclusively as Semantic Web-ready, part of them comes from the transformation of existing structural data to be in the form of five-star open data. However, there are still many legacy data in structured and semi-structured form, for example, tables and lists, which are the principal format for human-readable, waiting for transformation. This work discusses attempts in the research area to transform table and list data to make them machine-readable in various formats. Furthermore, the research proposes a method for transforming tables and lists into Resource Description Framework format while maintaining their essential configurations thoroughly. It is possible to recreate their original form back informatively. A system named TULIP has been developed which embodied this conversion method as a tool for the future development of the Semantic Web. The proposed method is more flexible compared to other works. The TULIP data model contains complete information of the source; hence it can be projected into different views. This tool can be used to create a tremendous amount of data for machines to be used at a broader scale.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

นันทขว้าง, จุลเทพ, "การสกัดตารางและรายการบนเว็บเป็นอาร์ดีเอฟ" (2020). Chulalongkorn University Theses and Dissertations (Chula ETD). 3789.
https://digital.car.chula.ac.th/chulaetd/3789

Download

Included in

Computer Engineering Commons, Computer Sciences Commons

COinS

Chulalongkorn University Theses and Dissertations (Chula ETD)

การสกัดตารางและรายการบนเว็บเป็นอาร์ดีเอฟ

Other Title (Parallel Title in Other Language of ETD)

Year (A.D.)

Document Type

First Advisor

Faculty/College

Department (if any)

Degree Name

Degree Level

Degree Discipline

DOI

Abstract

Other Abstract (Other language abstract of ETD)

Creative Commons License

Recommended Citation

Included in

Search

Browse

Author Corner

Chulalongkorn University Theses and Dissertations (Chula ETD)

การสกัดตารางและรายการบนเว็บเป็นอาร์ดีเอฟ

Other Title (Parallel Title in Other Language of ETD)

Author

Year (A.D.)

Document Type

First Advisor

Faculty/College

Department (if any)

Degree Name

Degree Level

Degree Discipline

DOI

Abstract

Other Abstract (Other language abstract of ETD)

Creative Commons License

Recommended Citation

Included in

Share

Search

Browse

Author Corner