Lower Sample Complexity of Reinforcement Learning for Structured MDPs: Evidence from Inventory Control
In "Seminars and talks"

Speakers

Qin Hanzhang

Assistant Professor, Department of Industrial Systems Engineering and Management, National University of Singapore

Hanzhang Qin is an Assistant Professor in the Department of Industrial Systems Engineering and Management at NUS. He is also an affiliated faculty member at the NUS Institute for Operations Research and Analytics. His research has been recognized by several awards, including the INFORMS TSL Intelligent Transportation Systems Best Paper Award and the MIT MathWorks Prize for Outstanding CSE Doctoral Research. Before joining NUS, Hanzhang spent one year as a postdoctoral scientist in the Supply Chain Optimization Technologies group of Amazon in New York City. He earned his PhD in Computational Science and Engineering under the supervision of Professor David Simchi-Levi, and his research interests span stochastic control, applied probability, and statistical learning, with applications in supply chain analytics and transportation systems. He holds two master's degrees from MIT, one in EECS and one in Transportation. Prior to attending MIT, Hanzhang received bachelor's degrees in Industrial Engineering and Mathematics from Tsinghua University.


Date:
Friday, 19 January 2024
Time:
10:00 am - 11:30 am
Venue:
Institute of Data Science
Innovation 4.0 I4-01-03 (Level 1, Seminar Room)
3 Research Link
Singapore 117602

Abstract

I will discuss two important open problems: 1) what is the sample complexity (i.e., how many data samples are needed) of learning a nearly optimal policy for multi-stage stochastic inventory control when the underlying demand distribution is initially unknown; and 2) how to compute such a policy once the required number of data samples is available.

In the first half of the talk, setting aside fixed ordering costs, I will begin answering these questions in the backlog setting via SAIL, a novel SAmple-based Inventory Learning algorithm. I will then discuss results for the more practical lost-sales setting, including the first sample complexity result for this more challenging setting under only mild assumptions (which ensure data quality), obtained by leveraging both recent developments in variance reduction techniques for reinforcement learning and the structural properties of the dynamic programming formulation of inventory control. Numerical simulations show that SAIL significantly outperforms competing methods in terms of inventory cost minimization.

In the second half, I will discuss several recent developments on the sample complexity of all three types of MDP formulations (finite-horizon MDPs, and infinite-horizon discounted and average-cost MDPs) for inventory control with fixed ordering costs. Somewhat surprisingly, in all three cases the sample complexity of the most naïve plug-in estimators is strictly lower than the "best possible" bounds derived for general MDPs.
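
To make the "plug-in" idea in the abstract concrete, here is a minimal sketch in Python (not the speaker's SAIL algorithm or any code from the talk): the unknown demand distribution is replaced by the empirical distribution built from i.i.d. demand samples, a finite-horizon backlog inventory DP is solved with that empirical distribution, and the resulting policy is evaluated under the true distribution. The horizon, cost parameters, demand support, and state truncation are all hypothetical choices made for illustration; a sample-complexity analysis asks how many samples are needed before the two printed costs are close.

```python
# Minimal plug-in sketch (illustration only, not the speaker's SAIL algorithm):
# estimate the demand distribution from samples, solve the backlog inventory DP
# with the empirical distribution, then evaluate the resulting policy under the
# true distribution.  Per-unit and fixed ordering costs are omitted for brevity.
import numpy as np

rng = np.random.default_rng(0)

T = 4                                   # planning horizon (hypothetical)
h, b = 1.0, 4.0                         # holding / backlog cost per unit (hypothetical)
demand = np.arange(0, 21)               # demand takes integer values 0..20
true_pmf = rng.dirichlet(np.ones(len(demand)))   # the "unknown" demand distribution
states = np.arange(-60, 81)             # truncated grid of inventory positions


def stage_cost(y, pmf):
    """Expected one-period holding plus backlog cost at post-order level y."""
    return np.sum(pmf * (h * np.maximum(y - demand, 0) + b * np.maximum(demand - y, 0)))


def solve_dp(pmf):
    """Finite-horizon DP: V_t(x) = min_{y >= x} E[cost(y, D) + V_{t+1}(y - D)]."""
    V = np.zeros(len(states))           # terminal value function
    policy = []                         # policy[t][i] = post-order level in state states[i]
    for _ in range(T):
        newV = np.empty(len(states))
        act = np.empty(len(states), dtype=int)
        for i, x in enumerate(states):
            candidates = states[states >= x]
            costs = [stage_cost(y, pmf)
                     + np.sum(pmf * V[np.clip(y - demand, states[0], states[-1]) - states[0]])
                     for y in candidates]
            j = int(np.argmin(costs))
            newV[i], act[i] = costs[j], candidates[j]
        V = newV
        policy.append(act)
    return policy[::-1]                 # reorder so policy[0] is the first period


def evaluate(policy, pmf, x0=0):
    """Expected T-period cost of a fixed policy under demand distribution pmf."""
    V = np.zeros(len(states))
    for act in reversed(policy):
        newV = np.empty(len(states))
        for i in range(len(states)):
            y = act[i]
            nxt = np.clip(y - demand, states[0], states[-1]) - states[0]
            newV[i] = stage_cost(y, pmf) + np.sum(pmf * V[nxt])
        V = newV
    return V[x0 - states[0]]


# Plug-in step: replace the unknown pmf by the empirical pmf from n samples.
n = 200
samples = rng.choice(demand, size=n, p=true_pmf)
emp_pmf = np.bincount(samples, minlength=len(demand)) / n

oracle_policy = solve_dp(true_pmf)      # needs knowledge of the true distribution
plugin_policy = solve_dp(emp_pmf)       # uses only the n demand samples
print("oracle cost :", evaluate(oracle_policy, true_pmf))
print("plug-in cost:", evaluate(plugin_policy, true_pmf))
```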


The first half will be based on joint work with David Simchi-Levi (MIT) and Ruihao Zhu (Cornell), and the second half will be based on joint work with Boxiao Chen (UIC), Xiaoyu Fan (NYU), Michael Pinedo (NYU) and Zhengyuan Zhou (NYU).