Tag: reinforcement-learning
- Top-k Off-Policy Correction for a REINFORCE Recommender System (10 Oct 2020)
- Multi-armed Bandits: Thompson Sampling (06 May 2020)
- Structured Prediction and Reinforcement Learning (25 Aug 2019)
- Stochastic Policy Gradient (20 Apr 2019)
- Q Learning (12 Apr 2019)