The poster appeared at TPDP 2024. Graham gave the conference talk at PODS.

Bug Report: 2026-04-22

Christian and his co-authors identified a bug in the proof for the non-streaming version. The conference version of the non-streaming HHH protocol is not private.

The sketch of the non-privacy argument is that a single $\gamma \samples \text{Laplace}(1/\epsilon)$ is used across every node in the tree. But the Laplace mechanism requires that every release of a 1-sensitive query use a fresh sample from a Laplace distribution with appropriate variance. Because we re-use $\gamma$, the adversary has more information about $\gamma$ than differential privacy allows.
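To make the leak concrete, here is a minimal toy sketch in Python. It is not the protocol from the paper: it simply assumes that the same draw of $\gamma$ is added to the count of every node, and the names `release_shared` and `counts` as well as the prefix counts themselves are hypothetical. Under that assumption, subtracting two released values cancels the shared noise and exposes an exact difference of counts.

```python
import numpy as np

def release_shared(counts, epsilon, rng):
    """Toy release (assumed setup): ONE Laplace draw is added to every node count."""
    gamma = rng.laplace(loc=0.0, scale=1.0 / epsilon)  # single shared sample
    return {node: c + gamma for node, c in counts.items()}

rng = np.random.default_rng(0)
counts = {"a": 40, "ab": 25, "ac": 15}  # hypothetical prefix counts
noisy = release_shared(counts, epsilon=1.0, rng=rng)

# The shared gamma cancels, so the difference of two releases is exact
# (up to floating-point rounding).
diff = noisy["ab"] - noisy["ac"]
print(diff)
```

Each released value looks noisy on its own, but any pair of them reveals the true difference between the two counts, which is the kind of extra information a fresh sample per release would hide.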

As a fix, we will use a separate $\gamma_p$ for every node $p \in \mathcal{H}$ in the tree, where $\gamma_p \samples \text{Laplace}(O(1/\epsilon))$, and truncate. A full proof will be available shortly.
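The per-node sampling step can be sketched as follows. This is only an illustration under stated assumptions, not the corrected protocol: the function name `release_per_node`, the constant `c` standing in for the unspecified $O(1/\epsilon)$ scale, and the example counts are all hypothetical, and the truncation step is omitted because its details are not given here. The sketch makes no privacy claim of its own.

```python
import numpy as np

def release_per_node(counts, epsilon, rng, c=1.0):
    """Toy release: an independent Laplace draw gamma_p for each node p.

    `c` is a placeholder for the constant hidden in O(1/epsilon);
    truncation is intentionally left out of this sketch.
    """
    noisy = {}
    for node, count in counts.items():
        gamma_p = rng.laplace(loc=0.0, scale=c / epsilon)  # fresh sample per node
        noisy[node] = count + gamma_p
    return noisy

rng = np.random.default_rng(0)
counts = {"a": 40, "ab": 25, "ac": 15}  # hypothetical prefix counts
noisy = release_per_node(counts, epsilon=1.0, rng=rng)

# Noise terms now differ between nodes, so differences of releases stay noisy.
noise = {node: noisy[node] - counts[node] for node in counts}
print(noise)
```

Because each node receives its own draw, the difference of two released values no longer cancels the noise.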

Abstract

The task of finding Hierarchical Heavy Hitters (HHH) was introduced by Cormode et al. [12] as a generalisation of the heavy hitter problem. While finding HHH in data streams has been studied extensively, the question of releasing HHH when the underlying data is private remains unexplored. In this paper, we formalise and study the notion of differentially private HHH, in both the streaming and non-streaming settings. In the non-streaming setting, we show the surprising result that the relative error in estimating the count for any prefix is independent of the height of the hierarchy and the number of heavy hitters in the stream. Additionally, our algorithms improve the error guarantees of Ghazi et al. [24] for the problem of counting over trees. Meanwhile, in the streaming setting, the main issue is that although the exact version of HHH has low global sensitivity (as counting queries are 1-sensitive), the approximation functions due to streaming have high global sensitivity, linear in the available space. Despite this obstacle, we show that the absolute error for estimating frequencies in the streaming setting is independent of the available space.