Managing Tail Risk in Online Learning: When Safety Meets Efficiency
In "Seminars and talks"

Speakers

Feng Zhu

Massachusetts Institute of Technology

Feng Zhu is a 5th-year PhD student at MIT, advised by Prof. David Simchi-Levi. His research interests lie broadly in sequential decision-making under uncertainty, including online learning and online matching, with applications to online experimentation, supply chain management, and revenue management. His research goal is to develop data-driven decision-making paradigms that ensure the safety and resiliency of modern operational systems, with a keen focus on managing (hidden) risk in various decision-making environments. Prior to MIT, he majored in Mathematics and Statistics and minored in Economics at Peking University.


Date:
Wednesday, 4 December 2024
Time:
10:00 am - 11:30 am
Venue:
NUS Business School
Mochtar Riady Building BIZ1-0302
15 Kent Ridge Drive
Singapore 119245

Abstract

Online learning is a fast-growing research area in sequential decision-making. While previous work mostly focuses on achieving efficiency by minimizing expected regret, controlling the tail risk of regret to ensure safety is essential in applications such as revenue management, clinical trials, and financial investment, yet it has not been well studied. This work provides a detailed characterization of the regret distribution in online learning under the safety concern of managing tail risk.
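As a point of reference, the sketch below is not part of the talk: a minimal simulation, assuming a standard UCB1 policy on a two-armed Bernoulli bandit, that looks at the whole distribution of regret across independent runs rather than only its mean. The arm means, horizon, and replication count are illustrative choices.

```python
# A minimal sketch, assuming a standard UCB1 policy on Bernoulli arms.
# The arm means, horizon, and replication count are illustrative, not from the talk.
import numpy as np

rng = np.random.default_rng(0)

def run_ucb1(means, horizon):
    """Run UCB1 on Bernoulli arms and return the cumulative pseudo-regret."""
    k, best = len(means), max(means)
    counts, sums, regret = np.zeros(k), np.zeros(k), 0.0
    for t in range(1, horizon + 1):
        if t <= k:                       # initialize: pull each arm once
            arm = t - 1
        else:                            # UCB1 index: empirical mean + sqrt(2 ln t / n)
            arm = int(np.argmax(sums / counts + np.sqrt(2.0 * np.log(t) / counts)))
        sums[arm] += rng.binomial(1, means[arm])
        counts[arm] += 1
        regret += best - means[arm]      # accumulate pseudo-regret
    return regret

means, horizon, reps = [0.6, 0.4], 2000, 1000
regrets = np.array([run_ucb1(means, horizon) for _ in range(reps)])
print("mean regret:", regrets.mean())                                   # efficiency
print("95th / 99th percentiles:", np.percentile(regrets, [95, 99]))     # tail risk
print("P(regret > 2 x mean):", np.mean(regrets > 2 * regrets.mean()))
```

Reading off the upper percentiles or an exceedance probability alongside the mean is one simple way to see the distinction between efficiency (expected regret) and safety (regret tail risk) that the talk addresses.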


In Part I, we aim to design policies that enjoy both optimal expected regret and a light-tailed regret distribution. We first show that any policy achieving the optimal instance-dependent expected regret can incur heavy-tailed regret. We then design a novel policy that attains the optimal worst-case expected regret together with the optimal worst-case regret tail risk, decaying exponentially at the optimal rate for every regret threshold. Numerical experiments show that our new policy design achieves efficiency comparable to celebrated policies while providing much better safety. Our policy design also bears an interesting connection to the Monte Carlo Tree Search (MCTS) used in AlphaGo.

In Part II, we study the optimal trade-off between the expectation and the tail risk of the regret distribution. We fully characterize the interplay among three desired properties of policy design: worst-case optimality, instance-dependent consistency, and light-tailed risk. Our results reveal several insights into designing policies that balance efficiency and safety. All our results extend to the stochastic linear bandit setting.
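To make the efficiency-safety trade-off mentioned above tangible, the companion sketch below (again illustrative, not the policy from the talk) reruns the same kind of simulation as the earlier one, with the exploration bonus scaled by a width parameter c. On many instances a larger c costs a little in average regret but thins out the regret tail; the exact numbers depend on the instance and horizon, and all parameters here are hypothetical choices.

```python
# A companion sketch: the UCB exploration bonus scaled by a width parameter c.
# Under-exploring (small c) tends to produce a heavier regret tail on many
# instances; all parameters are illustrative, not from the talk.
import numpy as np

rng = np.random.default_rng(1)

def run_ucb(means, horizon, c):
    """UCB with bonus c * sqrt(ln t / n); returns cumulative pseudo-regret."""
    k, best = len(means), max(means)
    counts, sums, regret = np.zeros(k), np.zeros(k), 0.0
    for t in range(1, horizon + 1):
        if t <= k:                       # initialize: pull each arm once
            arm = t - 1
        else:
            arm = int(np.argmax(sums / counts + c * np.sqrt(np.log(t) / counts)))
        sums[arm] += rng.binomial(1, means[arm])
        counts[arm] += 1
        regret += best - means[arm]
    return regret

means, horizon, reps = [0.6, 0.4], 2000, 1000
for c in (0.5, 2.0):                     # under- vs. over-exploration
    r = np.array([run_ucb(means, horizon, c) for _ in range(reps)])
    print(f"c={c}: mean regret {r.mean():6.1f}, "
          f"99th percentile {np.percentile(r, 99):6.1f}")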