✨ TL;DR
This paper introduces a bandit algorithm for learning smooth functions over graphs, where neighboring nodes have similar payoffs. The approach achieves regret bounds that scale with an effective dimension rather than with the total number of nodes, enabling efficient online learning in applications such as content recommendation, where a good item among thousands can be identified after only tens of evaluations.
The paper addresses online learning problems on graphs where the payoff function is smooth, meaning neighboring nodes have similar expected rewards. Traditional multi-armed bandit approaches scale poorly with the number of nodes, making them impractical for large graphs. This is particularly relevant for applications like content-based recommendation systems, where items form a graph structure and similar items should have similar user ratings. The challenge is to efficiently learn which items (nodes) have high expected payoffs without having to evaluate all or most nodes in potentially very large graphs.
The authors formulate this as a spectral bandit problem that exploits the smoothness of the payoff function over the graph structure. They introduce the concept of an effective dimension, which captures the complexity of smooth functions on graphs and is typically much smaller than the total number of nodes in real-world graphs. They propose two algorithms that leverage this quantity: one whose regret scales linearly in the effective dimension and another whose regret scales sublinearly in it. The algorithms use the spectral properties of the graph Laplacian to generalize observations from evaluated nodes to their neighbors, allowing efficient exploration and exploitation across the graph structure.
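To make the spectral idea concrete, here is a minimal sketch, not the paper's exact algorithm: a function that is smooth over the graph concentrates on the low-eigenvalue eigenvectors of the graph Laplacian, so a linear bandit run in the Laplacian eigenbasis (with an eigenvalue-weighted penalty, a UCB-style index, and an exploration weight of 0.5 chosen here arbitrarily) can find high-payoff nodes after far fewer pulls than there are nodes. All concrete choices (path graph, 3 nonzero coefficients, noise level) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy graph: a path of N nodes, so neighboring nodes are index-adjacent.
N = 50
A = np.zeros((N, N))
for i in range(N - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A      # graph Laplacian
eigvals, Q = np.linalg.eigh(L)      # eigenvalues ascending, eigenvectors in columns

# Smooth payoff: only a few low-frequency eigenvectors carry weight,
# so neighboring nodes get similar expected rewards.
alpha_true = np.zeros(N)
alpha_true[:3] = rng.normal(size=3)
mu = Q @ alpha_true                 # expected payoff per node

lam, noise, T = 1e-3, 0.1, 200
# Ridge regression in the eigenbasis, penalized by the eigenvalues
# (an illustrative stand-in for the paper's spectral regularizer).
V = np.diag(eigvals + lam)
b = np.zeros(N)
regret = 0.0
for t in range(T):
    alpha_hat = np.linalg.solve(V, b)
    Vinv = np.linalg.inv(V)
    # UCB-style index: estimated payoff plus a per-node confidence width.
    width = np.sqrt(np.einsum('ij,jk,ik->i', Q, Vinv, Q))
    ucb = Q @ alpha_hat + 0.5 * width
    arm = int(np.argmax(ucb))
    reward = mu[arm] + noise * rng.normal()
    x = Q[arm]                      # node's feature vector = its eigenbasis row
    V += np.outer(x, x)             # update design matrix and response vector
    b += reward * x
    regret += mu.max() - mu[arm]
```

The key design point this illustrates is that the statistical complexity is governed by how many eigenvectors the payoff actually uses (here 3), not by the number of nodes N, which is the intuition behind the paper's effective dimension.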