In this section we will document simulations performed on the Email EU Core dataset (details below). In particular, we assess the worst-case scenario of a server with access to a sender-oracle (i.e. able to attribute tags to a particular sender) to understand how much information is leaked by fuzzytags without appropriate deployment mitigations.
Email EU Core Dataset Simulations
Nodes: 1004
Temporal Edges: 332334
Time span: 803 days
Citation: Ashwin Paranjape, Austin R. Benson, and Jure Leskovec. "Motifs in Temporal Networks." In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017.
Scenario 1
Setup: 1 month of email events between 1004 nodes, 20k events (5148 links). False positive rates: [0.007812, 0.5]. No entangling.
Result: An adversarial server can identify ~7% of original graph (393 links) with a 6% false positive rate. Threshold: 0.0001
Scenario 2
Setup: The same month of emails, 20k events (5148 links) Same false positive rates. Every tag is entangled with 1 random node.
Result: Server can identify ~6.6% of original graph with a 6.8% false positive rate.
Discussion
Entanglement seems to have some impact on the servers ability to relearn the social graph, in particular it increases the false positive rate of the derived graph. However, this impact is not significant enough in the observed simulation.