✨ TL;DR
This paper introduces a bandit algorithm for learning smooth functions over graphs, where neighboring nodes have similar payoffs. The approach achieves regret bounds that scale with an effective dimension rather than with the total number of nodes, enabling efficient online learning in applications such as content recommendation, where a good item among thousands can be identified after only tens of evaluations.
The paper addresses online learning problems on graphs where the payoff function is smooth, meaning neighboring nodes have similar expected rewards. Traditional multi-armed bandit approaches scale poorly with the number of nodes, making them impractical for large graphs. This is particularly relevant for applications like content-based recommendation systems, where items form a graph structure and similar items should have similar user ratings. The challenge is to efficiently learn which items (nodes) have high expected payoffs without having to evaluate all or most nodes in potentially very large graphs.
The authors formulate this as a spectral bandit problem that exploits the smoothness of the payoff function over the graph structure. They introduce the concept of an effective dimension, which captures the complexity of smooth functions on graphs and is typically much smaller than the total number of nodes in real-world graphs. They propose two algorithms that leverage this quantity: one whose regret scales linearly in the effective dimension and another whose regret scales sublinearly in it. The algorithms use the spectral properties of the graph Laplacian to generalize observations from evaluated nodes to their neighbors, allowing efficient exploration and exploitation across the graph structure.
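To make the spectral idea concrete, here is a minimal sketch, not the paper's exact algorithm: a function that is smooth over the graph concentrates on the low-eigenvalue eigenvectors of the graph Laplacian, so a linear bandit run in the Laplacian eigenbasis (with an eigenvalue-weighted penalty, a UCB-style index, and an exploration weight of 0.5 chosen here arbitrarily) can find high-payoff nodes after far fewer pulls than there are nodes. All concrete choices (path graph, 3 nonzero coefficients, noise level) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: a path of N nodes, so neighboring nodes are index-adjacent.
N = 50
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A      # graph Laplacian
eigvals, Q = np.linalg.eigh(L)      # eigenvalues ascending, eigenvectors in columns

# Smooth payoff: only a few low-frequency eigenvectors carry weight,
# so neighboring nodes get similar expected rewards.
alpha_true = np.zeros(N)
alpha_true[:3] = rng.normal(size=3)
mu = Q @ alpha_true                 # expected payoff per node

lam, noise, T = 1e-3, 0.1, 200
# Ridge regression in the eigenbasis, penalized by the eigenvalues
# (an illustrative stand-in for the paper's spectral regularizer).
V = np.diag(eigvals + lam)
b = np.zeros(N)
regret = 0.0
for t in range(T):
    alpha_hat = np.linalg.solve(V, b)
    Vinv = np.linalg.inv(V)
    # UCB-style index: estimated payoff plus a per-node confidence width.
    width = np.sqrt(np.einsum('ij,jk,ik->i', Q, Vinv, Q))
    ucb = Q @ alpha_hat + 0.5 * width
    arm = int(np.argmax(ucb))
    reward = mu[arm] + noise * rng.normal()
    x = Q[arm]                      # node's feature vector = its eigenbasis row
    V += np.outer(x, x)             # update design matrix and response vector
    b += reward * x
    regret += mu.max() - mu[arm]
```

The key design point this illustrates is that the statistical complexity is governed by how many eigenvectors the payoff actually uses (here 3), not by the number of nodes N, which is the intuition behind the paper's effective dimension.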