Chulalongkorn University Theses and Dissertations (Chula ETD)

Multi-agent deep reinforcement learning for cryptocurrency trading

Other Title (Parallel Title in Other Language of ETD)

การเรียนรู้แบบเสริมกำลังเชิงลึกแบบหลายตัวกระทำสำหรับการซื้อขายคริปโทเคอร์เรนซี

Author

Kittiwin Kumlungmak, Faculty of Engineering

Year (A.D.)

2022

Document Type

Thesis

First Advisor

Peerapon Vateekul

Faculty/College

Faculty of Engineering (คณะวิศวกรรมศาสตร์)

Department (if any)

Department of Computer Engineering (ภาควิชาวิศวกรรมคอมพิวเตอร์)

Degree Name

Master of Science

Degree Level

Master's Degree

Degree Discipline

Computer Science

DOI

10.58837/CHULA.THE.2022.95

Abstract

Reinforcement learning has emerged as a promising approach for improving profitability in cryptocurrency trading. However, the inherent volatility of the market, especially during bearish periods, poses significant challenges. Existing work addresses this issue with single-agent techniques such as deep Q-network (DQN), advantage actor-critic (A2C), and proximal policy optimization (PPO), or ensembles of them. Despite these efforts, the mechanisms used to mitigate losses during bearish market conditions lack robustness, so the performance of reinforcement learning methods for cryptocurrency trading remains limited. To overcome this limitation, we present a novel cryptocurrency trading method based on multi-agent proximal policy optimization (MAPPO). Our approach incorporates a collaborative multi-agent scheme and a local-global reward function to optimize both individual and collective agent performance. Using a multi-objective optimization technique and a multi-scale continuous loss (MSCL) reward, we train the agents with a progressive penalty mechanism to prevent consecutive losses of portfolio value. We compare our method against multiple baselines and find that it achieves higher cumulative returns. Its strength is clearest on the bearish test set, where only our approach yields a profit: it achieves a cumulative return of 2.36%, while all baseline methods produce negative cumulative returns. On the bullish test set, our approach achieves a 46.05% greater cumulative return than FinRL-Ensemble, a reinforcement learning-based method.
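The abstract names a local-global reward and a progressive penalty for consecutive losses but does not give their formulas. The following is only a minimal illustrative sketch of how such a reward could be shaped, not the thesis's MSCL definition. The names `LossStreakPenalty` and `local_global_reward`, the doubling penalty schedule, and the 0.5 blending weight are all assumptions introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class LossStreakPenalty:
    """Hypothetical progressive penalty that grows with each consecutive loss.

    The base value and growth factor are assumptions for illustration only.
    """
    base: float = 0.01    # assumed penalty for the first losing step
    growth: float = 2.0   # assumed multiplier for each further consecutive loss
    streak: int = 0       # number of consecutive losing steps seen so far

    def update(self, step_return: float) -> float:
        """Return the penalty for this step and update the loss streak."""
        if step_return < 0:
            self.streak += 1
            return self.base * self.growth ** (self.streak - 1)
        # Any non-losing step resets the streak, so the penalty starts over.
        self.streak = 0
        return 0.0


def local_global_reward(local_return: float,
                        global_return: float,
                        penalty: float,
                        weight: float = 0.5) -> float:
    """Blend one agent's own return with the team return, minus a penalty.

    `weight` is an assumed mixing coefficient between the local and global terms.
    """
    return weight * local_return + (1.0 - weight) * global_return - penalty


if __name__ == "__main__":
    tracker = LossStreakPenalty()
    portfolio_returns = [-0.01, -0.02, 0.03, -0.01]  # made-up step returns
    for r in portfolio_returns:
        p = tracker.update(r)
        # Here the same return is used as both local and global signal.
        print(r, p, local_global_reward(r, r, p))
```

In this sketch two losses in a row are penalized more heavily than two isolated losses, which is the general behavior the abstract attributes to a progressive penalty; the actual scales and schedule used in the thesis may differ.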

Other Abstract (Other language abstract of ETD)

การเรียนรู้แบบเสริมกำลัง (Reinforcement learning) เป็นวิธีการที่ถูกนำมาใช้ในการเพิ่มผลกำไรในการซื้อขายคริปโทเคอร์เรนซี (cryptocurrency) อย่างไรก็ตาม ความผันผวนของตลาด โดยเฉพาะในช่วงเวลาที่ตลาดเป็นลักษณะตลาดขาลง (Bearish) กลายเป็นอุปสรรคที่สำคัญของด้านนี้ งานวิจัยที่มีอยู่ในปัจจุบัน มีความพยายามที่จะแก้ปัญหานี้โดยการใช้เทคนิค Deep Q-Network (DQN), Advantage Actor-Critic (A2C), และ Proximal Policy Optimization (PPO) หรือการผสมผสานกันของเทคนิคดังกล่าว (Ensemble) แต่อย่างไรก็ตาม กลไกที่นำมาใช้เพื่อลดความเสียหายในช่วงตลาดขาลงสำหรับคริปโทเคอร์เรนซียังไม่มีประสิทธิภาพเท่าที่ควร ดังนั้นประสิทธิภาพของวิธีการเรียนรู้แบบเสริมกำลังสำหรับการซื้อขายคริปโทเคอร์เรนซียังถูกจำกัด เพื่อเอาชนะข้อจำกัดนี้ เรานำเสนอเทคนิคใหม่สำหรับการซื้อขายคริปโทเคอร์เรนซี โดยใช้การเรียนรู้แบบหลายตัวกระทำ (Multi-Agent) และฟังก์ชันรางวัลร่วม (Local-Global Reward Function) เพื่อปรับปรุงประสิทธิภาพในการทำงานร่วมกันของตัวกระทำทุกตัว รวมถึงการทำงานของตัวกระทำแต่ละตัวไปพร้อมกันด้วย นอกจากนั้น เรายังใช้เทคนิคการปรับปรุงเป้าหมายหลายวัตถุประสงค์ (Multi-Objective Optimization Technique) และการทำโทษเมื่อมีการสูญเสียแบบต่อเนื่อง ซึ่งเราเรียกว่า Multi-Scale Continuous Loss (MSCL) Reward ที่เราดัดแปลงมาจากการลงโทษแบบเพิ่มเติม (Progressive Penalty) เพื่อป้องกันความสูญเสียต่อเนื่องของมูลค่าพอร์ตการลงทุน ในการประเมินผลของวิธีการที่เรานำเสนอ เราได้ทำการเปรียบเทียบกับเทคนิคอื่นๆที่เป็นที่นิยม และพบว่าผลตอบแทนสะสม (cumulative return) ของเทคนิคของเรามีค่าสูงกว่าเทคนิคดังกล่าว โดยเฉพาะในช่วงตลาดขาลง มีเพียงวิธีการของเราเท่านั้นที่สามารถให้ผลกำไรได้ ซึ่งวิธีการของเราสร้างผลตอบแทนสะสมได้ถึง 2.36% ในขณะที่วิธีการอื่นๆที่เรานำมาเปรียบเทียบเกิดการขาดทุนทั้งหมด และเมื่อเปรียบเทียบกับ FinRL-Ensemble ซึ่งเป็นวิธีการที่ใช้การเรียนรู้แบบเสริมกำลัง เราพบว่าวิธีการของเราได้รับผลตอบแทนสะสมที่สูงกว่าถึง 46.05% ในช่วงตลาดขาขึ้น (Bullish)

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Recommended Citation

Kumlungmak, Kittiwin, "Multi-agent deep reinforcement learning for cryptocurrency trading" (2022). Chulalongkorn University Theses and Dissertations (Chula ETD). 5806.
https://digital.car.chula.ac.th/chulaetd/5806

Included in

Computer Sciences Commons
