Publications
Refutations of Perfect Matchings In Expanders Is Hard
This work studies the complexity of refuting the existence of a
perfect matching in spectral expanders with an odd number of
vertices, in the Polynomial Calculus (PC) and Sum of Squares (SoS)
proof system. Austrin and Risse [SODA, 2021] showed that refuting
perfect matchings in sparse \(d\)-regular random graphs in
the above proof systems requires proofs with degree \(\Omega(n/\log n)\) on average. We
extend their result by showing the same lower bound holds for
all \(d\)-regular
graphs with a mild spectral gap.Additionally, as Austrin and Risse
do for random graphs, we show that the hardness of refuting a more
general class of formulae, which includes both perfect matchings and
even-colouring formulas, also reduces to the hardness of
refuting perfect matchings in spectral expanders, thereby settling
an open problem of Buss and Nordstrom.
Differentially Private Hierarchical Heavy Hitters
The task of finding Hierarchical Heavy Hitters (HHH) was introduced
by Cormode et al. [12] as a generalisation of the heavy hitter
problem. While finding HHH in data streams has been studied
extensively, the question of releasing HHH when the underlying data
is private remains unexplored. In this paper, we formalise and study
the notion of differentially private HHH, in both the streaming and
non-streaming setting. In the non-streaming setting, we show the
surprising result that the relative error in estimating the count
for any prefix is independent of the height of the hierarchy and the
number of heavy hitters in the stream. Additionally, our algorithms
also improve the error guarantees of Ghazi et al. [24] for the
problem of counting over trees. Meanwhile, in the streaming setting,
the main issue is that although the exact version of HHH has low
global sensitivity (as counting queries are 1-sensitive), the
approximation functions due to streaming have high global
sensitivity, linear in the available space. Despite this obstacle,
we show that the absolute error for estimating frequencies in the
streaming setting is independent of the available space.
The poster appeared at
TPDP
2024
Interactive Proofs For Differentially Private Counting
Differential Privacy (DP) is often presented as a strong
privacy-enhancing technology with broad applicability and advocated
as a de facto standard for releasing aggregate statistics on
sensitive data. However, in many embodiments, DP introduces a new
attack surface: a malicious entity entrusted with releasing
statistics could manipulate the results and use the randomness of DP
as a convenient smokescreen to mask its nefariousness. Since
revealing the random noise would obviate the purpose of introducing
it, the miscreant may have a perfect alibi. To close this loophole,
we introduce the idea of Interactive Proofs For Differential
Privacy, which requires the publishing entity to output a zero
knowledge proof that convinces an efficient verifier that the output
is both DP and reliable. Such a definition might seem unachievable,
as a verifier must validate that DP randomness was generated
faithfully without learning anything about the randomness itself. We
resolve this paradox by carefully mixing private and public
randomness to compute verifiable DP counting queries with
theoretical guarantees and show that it is also practical for
real-world deployment. We also demonstrate that computational
assumptions are necessary by showing a separation between
information-theoretic DP and computational DP under our definition
of verifiability.
The extended abstract appeared at
TPDP
2023
Seeker: Real-Time Interactive Search
This paper introduces Seeker, a system that allows users to
interactively refine search rankings in real time, through feedback
in the form of likes and dislikes. When searching online, users may
not know how to accurately describe their product of choice in
words. An alternative approach is to search an embedding space,
allowing the user to query using a representation of the item (like
a tune for a song, or a picture for an object). However, this
approach requires the user to possess an example representation of
their desired item. Additionally, most current search systems do not
allow the user to dynamically adapt the results with further
feedback. On the other hand, users often have a mental picture of
the desired item and are able to answer ordinal questions of the
form: Is this item similar to what you have in mind? With this
assumption, our algorithm allows for users to provide sequential
feedback on search results to adapt the search feed. We show that
our proposed approach works well both qualitatively and
quantitatively. Unlike most previous representation-based search
systems, we can quantify the quality of our algorithm by evaluating
humans-in-the-loop experiments.
Adversarial Principal Component Analysis
This paper studies the following question: where should an adversary
place an outlier of a given magnitude in order to maximize the error
of the subspace estimated by PCA? We give the exact location of this
worst possible outlier, and the exact expression of the maximum
possible error. Equivalently, we determine the information-theoretic
bounds on how much an outlier can tilt a subspace in its direction.
This in turn provides universal (worst-case) error bounds for PCA
under arbitrary noisy settings. Our results also have several
implications on adaptive PCA, online PCA, and rank-one updates. We
illustrate our results with a subspace tracking experiment.
Hands-on Introduction To Computer Science At The Freshman Level
This paper details the creation of a hands-on introduction course
that reflects the dramatic growth and diversity in computer science.
Our aim was to enable students to get an end-to-end perspective on
computer system design by building one. We report on a two-year
exercise in using the Arduino platform to build a series of hands-on
projects. We have used these projects in two course instances, and
have obtained detailed student feedback, which we analyze and
present in this paper. The instructions, code and videos developed
are available open-source.