Abstract:
Recently, multi-armed bandit algorithms have been deployed in many realistic large-scale applications, e.g., ranking in search engines and accelerating model selection. In these settings, the workload, measured by the number of arms K, is simply too high to be handled by a single processor. A natural way out is to introduce several independent nodes that take actions and observe rewards in parallel. We refer to such problems as multi-agent multi-armed bandits.
In this talk, we introduce various methods for solving multi-agent multi-armed bandits, most of which are derived from classical solutions to the single-agent problem. Beyond the regret analysis for each participating node, we concentrate on the communication overhead of these algorithms. Moreover, the graph structure of the network connecting the independent nodes greatly affects the design and analysis of the proposed algorithms. Let T, K, and N denote the time horizon, the number of arms, and the number of participating agents, respectively. Recent works have achieved a communication overhead of O(\log(T)\log(K)) while enjoying a \sqrt{N} speed-up for every participating node in finding the optimal arm.
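To make the setting concrete, the following is a minimal sketch (not any specific algorithm from the talk) of a multi-agent bandit protocol: N agents run UCB1 in parallel on K Bernoulli arms, keep local statistics, and periodically broadcast them in a full-synchronization round. The arm means, the synchronization period, and all function names are illustrative assumptions; the communication count here grows as T / sync_every, whereas the algorithms discussed in the talk use far sparser schedules to reach O(\log(T)\log(K)) overhead.

```python
import math
import random

def multi_agent_ucb(means, N, T, sync_every, seed=0):
    """Toy multi-agent UCB1 sketch (illustrative, not the talk's algorithm).

    means      -- hypothetical Bernoulli reward means, one per arm
    N          -- number of agents pulling in parallel
    T          -- rounds per agent (time horizon)
    sync_every -- rounds between full-broadcast synchronizations
    Returns (empirically best arm, number of communication rounds).
    """
    rng = random.Random(seed)
    K = len(means)
    glob_c = [0] * K                      # globally shared pull counts
    glob_s = [0.0] * K                    # globally shared reward sums
    delta_c = [[0] * K for _ in range(N)]   # local pulls since last sync
    delta_s = [[0.0] * K for _ in range(N)] # local rewards since last sync
    comm_rounds = 0
    for t in range(1, T + 1):
        for a in range(N):
            # Each agent's view = global stats + its own unsynced deltas.
            c = [glob_c[k] + delta_c[a][k] for k in range(K)]
            s = [glob_s[k] + delta_s[a][k] for k in range(K)]
            if min(c) == 0:
                arm = c.index(0)          # play every arm once first
            else:
                # UCB1 index with effective sample budget t * N
                arm = max(range(K), key=lambda k: s[k] / c[k]
                          + math.sqrt(2 * math.log(t * N) / c[k]))
            r = 1.0 if rng.random() < means[arm] else 0.0
            delta_c[a][arm] += 1
            delta_s[a][arm] += r
        if t % sync_every == 0:           # one broadcast communication round
            comm_rounds += 1
            for a in range(N):
                for k in range(K):
                    glob_c[k] += delta_c[a][k]
                    glob_s[k] += delta_s[a][k]
                delta_c[a] = [0] * K
                delta_s[a] = [0.0] * K
    best = max(range(K), key=lambda k: glob_s[k] / max(glob_c[k], 1))
    return best, comm_rounds
```

Because the N agents share their samples, each one effectively sees up to N times as much data per round as a lone learner would, which is the source of the per-node speed-up; the design question the talk addresses is how rarely the agents can communicate while preserving that speed-up.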