✨ TL;DR
This paper analyzes active sequential prediction-powered mean estimation, where ground-truth labels are selectively queried and ML predictions fill in the gaps. The authors find that, contrary to intuition, a nearly constant query probability (one that largely ignores model uncertainty) often yields tighter confidence intervals than adaptive uncertainty-based querying.
In prediction-powered inference, researchers estimate population means using a combination of expensive ground-truth labels and cheap ML model predictions. The key challenge is deciding when to query a true label versus relying on the prediction. Prior work proposed mixing an uncertainty-based adaptive query strategy with a constant baseline probability, but the optimal mixing weight remained unclear. The fundamental question is how to allocate a limited labeling budget across sequential samples so as to minimize the width of the confidence interval around the mean estimate.
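To make the setup concrete, here is a minimal sketch of a prediction-powered mean estimate with selective labeling. This is not the paper's exact estimator; it uses the standard inverse-probability-weighted correction, and all names and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def active_pp_mean(y, preds, query_prob, rng):
    """Sketch: prediction-powered mean estimate with Bernoulli label queries.

    Each sample contributes its model prediction plus an inverse-probability-
    weighted correction on the randomly queried samples, which keeps the
    estimate unbiased even when the predictions are systematically off.
    """
    n = len(y)
    queried = rng.random(n) < query_prob                # which labels we pay for
    correction = np.where(queried, (y - preds) / query_prob, 0.0)
    return np.mean(preds + correction)

# Toy stream: population mean 1.0, predictions biased upward by 0.3.
y = rng.normal(1.0, 1.0, size=10_000)
preds = y + 0.3 + rng.normal(0.0, 0.2, size=10_000)

est = active_pp_mean(y, preds, query_prob=0.2, rng=rng)
```

Despite labeling only about 20% of the stream and using a biased model, the correction term removes the bias; the query probability controls the variance of that correction, which is exactly what the interval-width analysis turns on.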
The authors conduct both theoretical and empirical analysis of the query-probability selection mechanism. They develop a non-asymptotic analysis that yields data-dependent bounds on the confidence interval for the mean estimator, and they examine how different mixing weights between uncertainty-based and constant query probabilities affect performance. Additionally, they analyze what happens when a no-regret learning algorithm adaptively chooses the query probability, treating the confidence bound as a loss to minimize over time.
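The no-regret idea can be sketched with a generic multiplicative-weights (Hedge) update over a grid of candidate query probabilities. The loss below is a hypothetical proxy for the confidence-bound width (inverse-probability-weighted prediction-error variance plus a labeling-cost term), not the paper's actual bound; the grid, learning rate, and cost are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

grid = np.array([0.05, 0.1, 0.2, 0.4, 0.8])   # candidate query probabilities
weights = np.ones_like(grid)                   # Hedge weights over candidates
eta, cost = 0.1, 1.0                           # learning rate, per-label cost

for t in range(1000):
    err2 = rng.normal(0.0, 0.5) ** 2           # squared prediction error (toy stream)
    losses = err2 / grid + cost * grid         # width proxy: variance term + label cost
    weights *= np.exp(-eta * losses)           # multiplicative-weights update
    weights /= weights.sum()                   # renormalize for numerical stability

chosen = grid[np.argmax(weights)]              # probability the learner converges to
```

Under this proxy loss the weights concentrate on a single fixed probability, which mirrors the paper's observation: when the bound being minimized does not reward uncertainty-adaptive queries much, the learner effectively settles on a near-constant query rate.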