Odalric-Ambrym Maillard
In Algorithmic Learning Theory, 2013.
Abstract:
We study a variant of the standard stochastic multi-armed bandit problem in which one is interested not in the arm with the best mean, but in the arm maximizing some coherent risk measure criterion. Further, we study the deviations of the regret instead of the less informative expected regret. We provide an algorithm, called RA-UCB, to solve this problem, together with a high-probability bound on its regret.
You can download the paper from the ALT website (here) or from the HAL open-access online archive* (here).
BibTeX:
@incollection{Maillard2013,
  year={2013},
  isbn={978-3-642-40934-9},
  booktitle={Algorithmic Learning Theory},
  volume={8139},
  series={Lecture Notes in Computer Science},
  editor={Jain, Sanjay and Munos, Rémi and Stephan, Frank and Zeugmann, Thomas},
  title={Robust Risk-Averse Stochastic Multi-armed Bandits},
  publisher={Springer Berlin Heidelberg},
  author={Maillard, Odalric-Ambrym},
  pages={218-233}
}
Related publications:
Kullback-Leibler Upper Confidence Bounds for Optimal Sequential Allocation.
Finite-Time Analysis of Multi-armed Bandits Problems with Kullback-Leibler Divergences.
--
* The HAL open-access online archive system seeks to make research results available to the widest audience, independently of major publishers, and cooperates with other large international archives such as arXiv.