Topic 2: Information-Theoretic Security and Privacy
Thursday 20 May Schedule
Information-theoretic security and privacy
- 13:00: Keynote talk by Sennur Ulukus (University of Maryland)
- Title: Recent Advances in Private Information Retrieval: Semantics, Symmetry, Storage and Side Information
Private information retrieval (PIR) refers to the problem of retrieving a file (a message) out of K messages from N distributed databases in such a way that no individual database can tell which message has been retrieved, hence the name, “private” information retrieval. PIR has originated in the computer science literature in late 1990s and has been revisited by the information theory community recently. Information-theoretic reformulation of the problem defines the “PIR capacity” as the largest number of bits that can be retrieved privately per download, equivalently, the smallest number of downloads needed per bit of privately retrieved information. In this talk, I will describe the problem, summarize a few break-through results in the history of the problem, and present some of our recent results focusing on semantics of information, symmetry in privacy, effects of storage constraints on privacy, and critical use of side information especially in single-database systems.
- 14:00: Invited talk by Mikael Skoglund (KTH)
- Title: Secure Block Source Coding with Sequential Encoding
We introduce fundamental bounds on achievable cumulative rate distribution functions (CRDF) to characterize a sequential encoding process that ensures lossless or lossy reconstruction subject to an average distortion criterion using a non-causal decoder. The CRDF describes the rate resources spent sequentially to compress the sequence. We also include a security constraint that affects the set of achievable CRDF. The information leakage is defined sequentially based on the mutual information between the source and its compressed representation, as it evolves. To characterize the security constraints, we introduce the concept of cumulative leakage distribution functions (CLF), which determines the allowed information leakage as distributed over encoded sub-blocks. Utilizing tools from majorization theory, we derive necessary and sufficient conditions on the achievable CRDF for a given independent and identically distributed (IID) source and CLF. One primary result of this paper is that the concave-hull of the CRDF characterizes the optimal achievable rate distribution.
- 14:20: Invited talk by Gautam Kamath (University of Waterloo)
- Title: Private Hypothesis Selection
We revisit classical methods for hypothesis selection, such as the Scheffe estimator, in the context of differential privacy. Given samples from an unknown probability distribution P and a set of m probability distributions H, the goal is to output, in a ε-differentially private manner, a distribution from H whose total variation distance to P is comparable to that of the best such distribution (which we denote by α). The sample complexity of our basic algorithm is O(log(m)/α2+ log(m)/α×ε), representing a minimal cost for privacy when compared to the non-private algorithm.
As a corollary of our approach, given a cover for a class of distributions, we can provide a learning algorithm. Our algorithm is shown to be essentially sample optimal for every class of distributions, due to lower bounds for differentially private learning in terms of packings, combined with covering-packing duality. For example, we give the first near-optimal sample complexity algorithms for estimating multivariate Gaussians.
Based on joint works with Ishaq Aden-Ali, Hassan Ashtiani, Mark Bun, Thomas Steinke, and Zhiwei Steven Wu.
- 14:40: Invited talk by Ravi Tandon (University of Arizona)
- Title: Capacity of Information Retrieval with Latent Variable Privacy
Private information retrieval (PIR) allows a user to download one out of K messages from a curious database, without revealing the message index to the database. Achieving perfect information-theoretic privacy, however, comes with a prohibitive overhead in many practical scenarios. For instance, one has to download the entire repository of messages when the repository is hosted by a single database. In many settings of interest, however, one may be interested in only hiding latent attributes that could be inferred from the identity of the content being retrieved. A compelling use-case is as follows: a user accesses an online service to obtain information (e.g., news articles, movies, etc.) hosted by a remote database. The retrieved information reveals with high certainty the user's political affiliation, making her a target of political advertisements and biased news reports. This scenario is commonplace for the plethora of online services that employ data analytics to profile users and infer private information such as their religious, political views, and personality traits.
In this talk, we will introduce the problem of latent-variable PIR (LV-PIR), where the goal is to retrieve the desired content efficiently, while keeping sensitive latent attributes perfectly private. We will describe our recent results on the LV-PIR problem, which show that it is possible to achieve significant savings in download overhead compared to conventional PIR protocols. In addition to describing the key new ideas and techniques needed to characterize the capacity of LV-PIR for a single database, we will also discuss low-complexity, albeit sub-optimal schemes that can also outperform conventional PIR based solutions. Several open problems that emerge from this work will also be discussed.
This is joint work with Islam Samy (U. Arizona), Mohamed Atia (U. Arizona) and Prof. Loukas Lazos (U. Arizona).
- 15:00: Invited talk by Christina Fragouli (UCLA)
- Title: Distortion Security for Cyber Physical Systems
In Cyber Physical Systems (CPS), inference based on communicated data is of critical significance as it can be used to manipulate or damage the control operations by adversaries. In our work, we explored distortion based security measures, that use small amounts of key and low complexity operations to protect the information that matters for control decisions, as opposed to raw data. We showed that for a single source, each additional bit of secret key is exponentially useful in distortion security and designed optimal polynomial time encoders; we also considered protecting time-series data, such as resulting from a drone trajectory, and derived schemes both for average and worst case distortion measures.
- 15:30-16:00: Break
16:00: Invited Session on Blockchains and Communication Systems
Organized by Giulia Fanti (Carnegie Mellon University)
- 16:00: Invited talk by Sreeram Kannan (University of Washington)
- Title: Fair ordering for Permissionless Blockchains
Over the past five years, a significant line of research has investigated the blockchain consensus problem in the general permissionless setting, where protocol nodes can leave and join dynamically. The work of Garay et al. (Eurocrypt 2015) and Pass et al. (Eurocrypt 2017) showed the security properties of consistency and liveness for Nakamoto's seminal proof-of-work protocol. However, consistency and liveness do not provide any guarantees on the relationship between the order in which transactions arrive into the network and the finalized order in the ledger, making protocols prone to transaction order-manipulation attacks. In practice, there have been severe transaction manipulation attacks in decentralized exchanges on Ethereum, where nodes can front-run others transactions. While recent work by Kelkar et al. (Crypto 2020) has created new definitions and protocols for fair ordering in the permissioned setting, the permissionless setting (which is where Ethereum and most public blockchains operate) remains open. In this work, we provide the first definition of order-fairness in the permissionless setting and provide two protocols that realize it. Our protocols work in a synchronous network and use an underlying longest-chain blockchain. As an added contribution, we show that any fair ordering protocol achieves a powerful zero-block confirmation property, through which honest transactions can be securely confirmed even before they are included in any block.
- 16:20: Invited talk by Jacob Leshno (University of Chicago)
- Title: Economic Security of Distributed Systems
We consider the economic security of a distributed system against a profit motivated attacker, and define a system to be economically secure if an attacker's cost of inducing a consistency violation (e.g., double spend) is high. Contrary to popular belief, we find that mining rewards are neither sufficient nor necessary for economic security of the system. We show that when miners are profit motivated and rational, Nakamoto consensus provides weak economic security guarantees even if mining rewards are large. Alternative explanations for the lack of attacks on Bitcoin and other cryptocurrencies do not require large mining rewards. An alternative protocol design shows that economic security can be achieved without large payment to miners.
- 16:40: Invited talk by Pramod Viswanath (UIUC)
- Title: Information and Coding Theory for Blockchains and Vice Versa
Blockchains are the technology underlying decentralized digital trust platforms. At the heart of blockchains is a core multiterminal communication problem. In this talk we use the lens of information theory to view this communication problem as a new arbitrarily varying channel (AVC) and state a corresponding capacity theorem. The information theoretic viewpoint sheds light on the mathematical formulation of the core blockchain operation; conversely, blockchains provide new models and arenas for information theory to grow and make an impact on. Given the basic role of communication, it is natural for coding theory to play a basic role as well; conversely, we find that the blockchain application raises new and basic coding theoretic formulations.
- 17:00: Invited talk by Lara Dolecek (UCLA)
- Title: Concentrated Stopping Set Design for Coded Merkle Tree: Improving Security Against Data Availability Attacks in Blockchain Systems
- Authors: Debarnab Mitra, Lev Tauz, and Lara Dolecek (UCLA)
In certain blockchain systems, light nodes are clients that download only a small portion of the block. Light nodes are vulnerable to data availability (DA) attacks where a malicious node hides an invalid portion of the block from the light nodes. Recently, a technique based on erasure codes called Coded Merkle Tree (CMT) was proposed by Yu et al. that enables light nodes to detect a DA attack with high probability. The CMT is constructed using LDPC codes for fast decoding but can fail to detect a DA attack if a malicious node hides a small stopping set of the code. To combat this, Yu et al. used well-studied techniques to design random LDPC codes with high minimum stopping set size. Although effective, these codes are not necessarily optimal for this application. In this talk, we demonstrate a more specialized LDPC code design to improve the security against DA attacks. We achieve this goal by providing a deterministic LDPC code construction that focuses on concentrating stopping sets to a small group of variable nodes rather than only eliminating stopping sets. We design these codes by modifying the Progressive Edge Growth algorithm into a technique called the entropy-constrained PEG (EC-PEG) algorithm. This new method demonstrates a higher probability of detecting DA attacks and allows for good codes at short lengths.