Chapter 14
[14:1] Introduction to Machine Learning §Reinforcement learning: Single State Case: K-Armed Bandit, E. Alpaydin - MIT Press 2007
[14:2] Algorithms for the Multiarmed Bandit Problem, V. Kuleshov, D. Precup – McGill University – 2000 - https://www.cs.mcgill.ca/~vkules/bandits.pdf
[14:3] Multiarmed Bandits and Exploration Strategies, S. Raja - https://sudeepraja.github.io/Bandits/
[14:4] A Tutorial on Thompson Sampling, D. Russo, B. Van Roy, A. Kazerouni, I. Osband, Z. Wen – Stanford University, Columbia University, Google DeepMind, Adobe Research – 2017 - http://web.stanford.edu/~bvr/pubs/TS_Tutorial.pdf
[14:5] Analysis of Thompson Sampling for the Multiarmed Bandit Problem, S. Agrawal, N. Goyal – Microsoft Research India – 2012 - http://proceedings.mlr.press/v23/agrawal12/agrawal12.pdf
[14:6] Generalized Thompson Sampling for Contextual Bandits, L. Li – Microsoft Research – 2013 - https://arxiv.org/pdf/1310.7163.pdf
[14:7] Bandit Algorithms Continued...