Chulalongkorn University Theses and Dissertations (Chula ETD)

Enhanced realism in virtual try-on tasks using diffusion methods

Other Title (Parallel Title in Other Language of ETD)

การเพิ่มความสมจริงในการลองเสื้อแบบเสมือนด้วยวิธีแพร่กระจาย

Saris Kiattithapanayong, Faculty of Commerce and Accountancy

Year (A.D.)

2024

Document Type

Thesis

First Advisor

Suronapee Phoomvuthisarn

Faculty/College

Faculty of Commerce and Accountancy (คณะพาณิชยศาสตร์และการบัญชี)

Department (if any)

Department of Statistics (ภาควิชาสถิติ)

Degree Name

Master of Science

Degree Level

Master's Degree

Degree Discipline

Statistics and Data Science

DOI

10.58837/CHULA.THE.2024.725

Abstract

Virtual try-on technology is revolutionizing online retail by enabling customers to visualize garments on their bodies before purchasing. Traditional methods, often based on Generative Adversarial Networks (GANs), face challenges such as misalignment and visual artifacts, especially in complex poses. We present a virtual try-on framework leveraging diffusion models to enhance realism, accuracy, and garment detail preservation. Our approach integrates Vector Quantized Variational Autoencoders (VQ-VAEs) for precise feature matching within a diffusion U-Net architecture. By adopting image-based conditioning with the CLIP image encoder, our system utilizes visual features directly from clothing images for more faithful garment representations. Additionally, an Additional Feature Preserving Block (ControlNet) maintains intricate details like textures and logos, addressing fine-grained garment fidelity challenges. Quantitative evaluation demonstrates our system's superior performance, achieving the best LPIPS of 0.082. We also achieve a Fréchet Inception Distance (FID) of 7.782 and Kernel Inception Distance (KID) of 1.53, indicating enhanced image quality and feature alignment. Although the Structural Similarity Index Measure (SSIM) of 0.825 is slightly lower, it underscores the trade-off for improved realism and garment detail preservation. Our contributions set a new benchmark for accurate and realistic clothing visualization in virtual try-on systems.

Other Abstract (Other language abstract of ETD)

Virtual try-on technology is playing an important role in revolutionizing the online retail industry by letting customers simulate wearing garments before deciding to purchase. However, traditional methods based on Generative Adversarial Networks (GANs) still suffer from misalignment and unrealistic images, especially when parts of the person's body occlude the garment. This research presents a framework for a virtual try-on system that uses diffusion models to increase realism, accuracy, and preservation of garment details. The proposed approach integrates Vector Quantized Variational Autoencoders (VQ-VAEs) to enable precise matching of image elements within a diffusion U-Net architecture. By using image-based conditioning through the CLIP image encoder, the system can draw features directly from the garment image to produce a more realistic rendering of the garment. In addition, an Additional Feature Preserving Block (ControlNet) is used to preserve intricate garment details such as fabric texture and logos, improving garment detail fidelity at a finer level. Quantitative evaluation shows the superior performance of the proposed system, which obtains the lowest Learned Perceptual Image Patch Similarity (LPIPS) of 0.082, a Fréchet Inception Distance (FID) of 7.782, and a Kernel Inception Distance (KID) of 1.53, indicating improved image quality and feature alignment. Although the Structural Similarity Index Measure (SSIM) of 0.825 is slightly lower, it reflects the trade-off between increased realism and preservation of garment details. This research sets a new standard for virtual try-on systems by presenting a highly accurate and realistic approach that can be applied to further develop virtual try-on technology in commerce and the fashion industry.

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Recommended Citation

Kiattithapanayong, Saris, "Enhanced realism in virtual try-on tasks using diffusion methods" (2024). Chulalongkorn University Theses and Dissertations (Chula ETD). 74563.
https://digital.car.chula.ac.th/chulaetd/74563

Included in

Statistics and Probability Commons
