Shangtong Zhang: Off-Policy Evaluation

Data Fest Online 2020, Reinforcement Learning track

In this talk, I will present my recent work on off-policy evaluation, where we want to estimate the performance of a policy using only a given dataset, without executing the policy. Off-policy evaluation has broad real-world applications, such as recommendation systems. I will start with a brief introduction to reinforcement learning and discuss the main challenges in off-policy evaluation. Then I will present our work GradientDICE, published at ICML 2020, and discuss how and why it improves on previous methods such as DualDICE and GenDICE, both theoretically and empirically.
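
To make the problem setting concrete, here is a minimal sketch of off-policy evaluation in the simplest possible case, a one-step (bandit) problem, using ordinary importance sampling. This is not GradientDICE, DualDICE, or GenDICE; it is a textbook baseline shown only to illustrate what "estimating a policy's performance from a fixed dataset" means. The logged data, the action names, and the function name `importance_sampling_estimate` are all hypothetical.

```python
def importance_sampling_estimate(logged, target_policy):
    """Estimate the target policy's expected reward from logged data.

    logged: list of (action, reward, behavior_prob) tuples, where
            behavior_prob is the probability with which the data-collecting
            (behavior) policy chose that action.
    target_policy: dict mapping each action to its probability under the
            policy we want to evaluate (never executed here).
    """
    total = 0.0
    for action, reward, behavior_prob in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy[action] / behavior_prob
        total += weight * reward
    return total / len(logged)


# Hypothetical dataset collected by a uniform behavior policy over two actions.
logged = [
    ("a", 1.0, 0.5),
    ("b", 0.0, 0.5),
    ("a", 1.0, 0.5),
    ("b", 0.0, 0.5),
]

# Hypothetical target policy that strongly prefers action "a".
target_policy = {"a": 0.9, "b": 0.1}

estimate = importance_sampling_estimate(logged, target_policy)
print(estimate)
```

In this toy dataset, action "a" always yields reward 1 and action "b" always yields reward 0, so the estimate matches the target policy's probability of choosing "a". In multi-step problems the importance weights are products over time steps, which is one reason simple estimators of this kind become hard to use in practice.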
