Interactive Proofs For Distribution Testing With Conditional Oracles
In Submission
We revisit the framework of interactive proofs for distribution testing, first introduced by Chiesa and Gur (ITCS 2018), which has recently experienced a surge in interest, accompanied by notable progress (e.g., Herman and Rothblum, STOC 2022, FOCS 2023; Herman, RANDOM~2024). In this model, a data-poor verifier determines whether a probability distribution has a property of interest by interacting with an all-powerful, data-rich but untrusted prover bent on convincing them that it has the property. While prior work gave sample-, time-, and communication-efficient protocols for a testing and estimating a range of distribution properties, it all suffers from an inherent issue: for most interesting properties of distributions over a domain of size N, the verifier must draw at least Ω(N1/2) samples of its own. While sublinear in N, this is still prohibitive for large domains encountered in practice.In this work, we circumvent this limitation by augmenting the verifier with the ability to perform an exponentially smaller number of more powerful (but reasonable) pairwise conditional queries, effectively enabling them to perform ``local comparison checks’’ of the prover’s claims.We systematically investigate the landscape of interactive proofs in this new setting, giving polylogarithmic query and sample protocols for (tolerantly) testing all label-invariant properties, thus demonstrating exponential savings without compromising on communication, for this large and fundamental class of testing tasks.
Refutations of Perfect Matchings In Expanders Is Hard
In Submission
This work studies the complexity of refuting the existence of a perfect matching in spectral expanders with an odd number of vertices, in the Polynomial Calculus (PC) and Sum of Squares (SoS) proof system. Austrin and Risse [SODA, 2021] showed that refuting perfect matchings in sparse d-regular random graphs in the above proof systems requires proofs with degree Ω(n/log n) on average. We extend their result by showing the same lower bound holds for all d-regular graphs with a mild spectral gap.Additionally, as Austrin and Risse do for random graphs, we show that the hardness of refuting a more general class of formulae, which includes both perfect matchings and even-colouring formulas, also reduces to the hardness of refuting perfect matchings in spectral expanders, thereby settling an open problem of Buss and Nordstrom.
Differentially Private Hierarchical Heavy Hitters
The task of finding Hierarchical Heavy Hitters (HHH) was introduced by Cormode et al. [12] as a generalisation of the heavy hitter problem. While finding HHH in data streams has been studied extensively, the question of releasing HHH when the underlying data is private remains unexplored. In this paper, we formalise and study the notion of differentially private HHH, in both the streaming and non-streaming setting. In the non-streaming setting, we show the surprising result that the relative error in estimating the count for any prefix is independent of the height of the hierarchy and the number of heavy hitters in the stream. Additionally, our algorithms also improve the error guarantees of Ghazi et al. [24] for the problem of counting over trees. Meanwhile, in the streaming setting, the main issue is that although the exact version of HHH has low global sensitivity (as counting queries are 1-sensitive), the approximation functions due to streaming have high global sensitivity, linear in the available space. Despite this obstacle, we show that the absolute error for estimating frequencies in the streaming setting is independent of the available space.
The poster appeared at TPDP 2024. Graham gave the conference talk @ PODS.
Interactive Proofs For Differentially Private Counting
Differential Privacy (DP) is often presented as a strong privacy-enhancing technology with broad applicability and advocated as a de facto standard for releasing aggregate statistics on sensitive data. However, in many embodiments, DP introduces a new attack surface: a malicious entity entrusted with releasing statistics could manipulate the results and use the randomness of DP as a convenient smokescreen to mask its nefariousness. Since revealing the random noise would obviate the purpose of introducing it, the miscreant may have a perfect alibi. To close this loophole, we introduce the idea of Interactive Proofs For Differential Privacy, which requires the publishing entity to output a zero knowledge proof that convinces an efficient verifier that the output is both DP and reliable. Such a definition might seem unachievable, as a verifier must validate that DP randomness was generated faithfully without learning anything about the randomness itself. We resolve this paradox by carefully mixing private and public randomness to compute verifiable DP counting queries with theoretical guarantees and show that it is also practical for real-world deployment. We also demonstrate that computational assumptions are necessary by showing a separation between information-theoretic DP and computational DP under our definition of verifiability.
The extended abstract appeared at TPDP 2023
Seeker: Real-Time Interactive Search
This paper introduces Seeker, a system that allows users to interactively refine search rankings in real time, through feedback in the form of likes and dislikes. When searching online, users may not know how to accurately describe their product of choice in words. An alternative approach is to search an embedding space, allowing the user to query using a representation of the item (like a tune for a song, or a picture for an object). However, this approach requires the user to possess an example representation of their desired item. Additionally, most current search systems do not allow the user to dynamically adapt the results with further feedback. On the other hand, users often have a mental picture of the desired item and are able to answer ordinal questions of the form: Is this item similar to what you have in mind? With this assumption, our algorithm allows for users to provide sequential feedback on search results to adapt the search feed. We show that our proposed approach works well both qualitatively and quantitatively. Unlike most previous representation-based search systems, we can quantify the quality of our algorithm by evaluating humans-in-the-loop experiments.
Adversarial Principal Component Analysis
This paper studies the following question: where should an adversary place an outlier of a given magnitude in order to maximize the error of the subspace estimated by PCA? We give the exact location of this worst possible outlier, and the exact expression of the maximum possible error. Equivalently, we determine the information-theoretic bounds on how much an outlier can tilt a subspace in its direction. This in turn provides universal (worst-case) error bounds for PCA under arbitrary noisy settings. Our results also have several implications on adaptive PCA, online PCA, and rank-one updates. We illustrate our results with a subspace tracking experiment.