[{"id":21190,"title":"A Broader View of Thompson Sampling","permalink":"https:\/\/bschool.nus.edu.sg\/biz-events\/event\/a-broader-view-of-thompson-sampling\/","category":"Seminars and talks","event_dept":{"value":"analytics-operations","label":"Analytics & Operations"},"event_sec_dept":false,"event_details":{"event_start_date":"21  November  2025","event_end_date":"21  November  2025","event_start_time":"10:00 am","event_end_time":"11:30 am","event_dress_code":"NA"},"event_loc":{"eve_address_selection":"1","eve_location_1":{"eve_org":"NUS Business School","eve_build":"Mochtar Riady Building","eve_room":"BIZ1 0302","eve_add":"15 Kent Ridge Drive","eve_count":"Singapore","eve_copos":"119245","eve_map_url":"https:\/\/goo.gl\/maps\/Q1kyjwxHNE22"},"eve_location_2":{"eve_org":"Shaw Foundation Alumni House","eve_build":"","eve_room":"Clove and Lemongrass Room Level 2","eve_add":"11 Kent Ridge Drive","eve_count":"Singapore","eve_copos":119244,"eve_map_url":"https:\/\/goo.gl\/maps\/docgThkDWFxKdb9c7"},"eve_location_3":{"eve_org":"Hon Sui Sen Memorial Library Auditorium","eve_build":"","eve_room":"","eve_add":"1 Hon Sui Sen Drive","eve_count":"Singapore","eve_copos":117588,"eve_map_url":"https:\/\/goo.gl\/maps\/NJjWK4RMpC92"},"eve_location_4":{"eve_org":"NUSS Kent Ridge Guild House","eve_build":"","eve_room":"Dalvey Room","eve_add":"9 Kent Ridge Drive","eve_count":"Singapore","eve_copos":119241,"eve_map_url":"https:\/\/goo.gl\/maps\/nXn2Luh96pH2"},"eve_location_5":{"eve_org":"Institute of Data Science","eve_build":"Innovation 4.0","eve_room":"1-3","eve_add":"3 Research Link","eve_count":"Singapore","eve_copos":117602,"eve_map_url":"https:\/\/goo.gl\/maps\/i1xocvvDh27QUXem7"},"eve_location_6":{"eve_org":"","eve_build":"","eve_room":"","eve_add":"","eve_count":"","eve_copos":"","eve_map_url":""},"eve_location_7":""},"event_introduction":"","event_short_intro":"","event_topic":null,"event_banner":false,"event_external_url":"","event_registration_details":{"event_registration_form":false,"event_registration_message":"","event_registration_deadline":null,"eve_registration_url":"","event_form":"","event_registration_ack":""},"event_speaker":[{"event_speaker_name":"Yanlin Qu","event_speaker_designation":"","event_speaker_affiliation":"Stanford University","event_speaker_picture":false,"event_speaker_url":"","event_speaker_introduction":"<table>\n<tbody>\n<tr>\n<td>Yanlin Qu is a postdoctoral research scholar in the Decision, Risk, and Operations Division at Columbia Business School, working with Assaf Zeevi and Hongseok Namkoong. He earned his PhD in Management Science and Engineering from Stanford University, advised by Peter Glynn and Jose Blanchet. At the interface of Operations Research and Machine Learning, his research synergizes methods from both fields to study stochastic systems and their associated decision-making problems, such as analyzing Markov chains via deep learning and understanding Bayesian bandits via online optimization.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n"}],"event_agenda":false,"event_photo_gallery":false,"event_presentations":false,"event_custom_heading":[{"event_custom_title":"Abstract","event_custom_details":"<p>Thompson Sampling is one of the most widely used and studied bandit algorithms, known for its simple structure, low-regret performance, and solid theoretical guarantees. Yet, in stark contrast to most other families of bandit algorithms, the exact mechanism through which posterior sampling (as introduced by Thompson) is able to &#8220;properly&#8221; balance exploration and exploitation remains a mystery. In this talk, we show that the core insight needed to address this question stems from recasting Thompson Sampling as an online optimization algorithm. To distill this insight, we introduce a key conceptual tool, which we refer to as &#8220;faithful&#8221; stationarization of the regret formulation. Essentially, the finite-horizon dynamic optimization problem is converted into a stationary counterpart that &#8220;closely resembles&#8221; the original objective (in contrast, the classical infinite-horizon discounted formulation, which leads to the Gittins index, alters the problem and objective too significantly). The newly crafted time-invariant objective can be studied using Bellman&#8217;s principle, which leads to a time-invariant optimal policy. When viewed through this lens, Thompson Sampling admits a simple online optimization form that mimics the structure of the Bellman-optimal policy and in which greediness is regularized by a measure of residual uncertainty based on point-biserial correlation. This answers the question of how Thompson Sampling balances exploration and exploitation and, moreover, provides a principled framework to study and further improve Thompson&#8217;s original idea.<\/p>\n"}],"event_enquiry_details":{"event_enq_full_name":"","event_enq_department":"","event_enq_email":"","event_enq_telephone":"","event_enq_website":""}}]