Deploying FuzzyTags Securely
The properties provided by this system are highly dependent on selecting a false positive rate p. In the following sections we will cover a number of considerations you should take into account when integrating fuzzytags into a larger privacy preserving application.
How bad is it to let people select their own false-positive rates?
The short answer is "it depends".
The longer answer:
When different parties have different false positive rates the server can calculate the skew between a party's ideal false positive rate and observed false positive rate.
That skew leaks information, especially given certain message distributions. Specifically it leaks parties who receive a larger proportion of system messages than their ideal false positive rate.
i.e. for low false positive rates and high message volume for a specific receiver, the adversarial server can calculate a skew that leaks the recipient of individual messages - breaking privacy for that receiver.
It also removes those messages from the pool of messages that an adversarial server needs to consider for other receivers. Effectively reducing the anonymity set for everyone else.
Should Senders use an anonymous communication network?
If statistical & differential attacks are likely e.g. few parties download everything and multiple messages are expected to originate from a sender to a receiver or there is other information that might otherwise link a set of messages to a sender or receiver then you may want to consider how to remove that context.
One potential way of removing context is by having senders send their message to the server through some kind of anonymous communication network e.g. a mixnet or tor.
Be warned: This may not eliminate all the context!
How bad is it to select a poor choice of p?
Consider a pareto distribution where most users only receive a few messages, and small subset of users receive a large number of messages it seems that increasing the number of parties is generally more important to overall anonymity of the system than any individual selection of p.
Under a certain threshold of parties, trivial breaks (i.e. tags that only match to a single party) are a bigger concern.
Assuming we have large number of parties (N), the following heuristic emerges:
- Parties who only expect to receive a small number of messages can safely choose smaller false positive rates, up to a threshold θ, where θ > 2^-N. The lower the value of θ the greater the possibility of random trivial breaks for the party.
- Parties who expect a large number of messages should choose to receive all messages for 2 reasons:
- Even high false positive rates for power users result in information leaks to the server (due to the large skew) i.e. a server can trivially learn what users are power users.
- By choosing to receive all messages, power users don't sacrifice much in terms of bandwidth, but will provide cover for parties who receive a small number of messages and who want a lower false-positive rate.
(We consider a pareto distribution here because we expect many applications to have parties that can be modelled as such - especially over short-time horizons)