e.g., Gentry [2009]); (2) Approximate-GCD based FHE (see, e.g., Dijk et al. [2010]); (3) (R)LWE-based FHE (e.g., Lyubashevsky et al. [2010] and Brakerski et al. [2011]); and (4) NTRU-like FHE (see, e.g., López-Alt et al. [2012]).

      The existing FHE schemes are built on SHE by assuming circular security and implementing an expensive bootstrap operation. The bootstrap operation re-encrypts a ciphertext by evaluating the decryption and encryption functions over the ciphertext and the encrypted secret key, in order to reduce the ciphertext noise and allow further computation. Because of the costly bootstrap operation, FHE schemes are very slow and not competitive against general MPC approaches in practice. Researchers therefore currently focus on finding more efficient SHE schemes that satisfy the requirements of a given application, instead of trying to develop a full FHE scheme. In addition, FHE schemes assume circular security (a.k.a. key-dependent message (KDM) security), i.e., that the secret key remains secure even when encrypted under its own public key. However, no FHE scheme has been proven to be semantically secure with respect to arbitrary functions, nor to be IND-CCA1 secure [Acar et al., 2018].
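      To make the SHE idea concrete, the following is a minimal sketch using the open-source TenSEAL library (an implementation of the CKKS scheme, not one of the schemes cited above). The parameter choices are illustrative; the point is that the encryption parameters fix a bounded multiplicative depth, beyond which bootstrapping would be required.

```python
# A minimal leveled (somewhat homomorphic) computation using the TenSEAL library,
# which implements the CKKS scheme. Parameters are illustrative; the modulus chain
# fixes how many multiplications can be evaluated before bootstrapping is needed.
import tenseal as ts

context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40

x = ts.ckks_vector(context, [1.0, 2.0, 3.0])  # encrypt a small vector
y = x * x + x                                 # evaluate x^2 + x under encryption
print(y.decrypt())                            # approximately [2.0, 6.0, 12.0]
```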

       Application in PPML

      Many research efforts based on HE have been devoted to PPML in the past. For example, Hardy et al. [2017] proposed algorithms for privacy-preserving two-party logistic regression over vertically partitioned data. Paillier’s scheme is leveraged in a secure gradient descent procedure to train the logistic regression model, in which constant multiplication and addition are carried out over Paillier-encrypted masks and the intermediate data computed by each party. The encrypted masked intermediate results are exchanged between the two parties during secure gradient descent. Finally, the encrypted gradient is sent to a coordinator for decryption and model update.
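      As a rough illustration of the building blocks used in such protocols, the sketch below uses the python-paillier (phe) library. The party roles, the residual, the feature value, and the mask are invented for the example and do not reproduce the exact protocol of Hardy et al. [2017].

```python
from phe import paillier

# The coordinator generates the key pair; the data parties only see the public key.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Party A encrypts an intermediate result (e.g., a prediction residual).
residual = 0.37
enc_residual = public_key.encrypt(residual)

# Party B multiplies the ciphertext by a plaintext feature value and adds a mask:
# Paillier supports ciphertext addition and ciphertext-by-plaintext multiplication.
feature_value = 1.8
mask = 0.05
enc_masked_gradient = enc_residual * feature_value + mask

# The coordinator decrypts the masked gradient term for the model update.
print(private_key.decrypt(enc_masked_gradient))  # approximately 0.716
```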

      CryptoNets [Gilad-Bachrach et al., 2016] is an HE-based methodology from Microsoft Research that allows secure evaluation (inference) of encrypted queries over already trained neural networks hosted on cloud servers: queries from the clients can be classified by the trained neural network model on a cloud server without the server learning any information about the query or the result. The CryptoDL [Hesamifard et al., 2017] framework is a leveled HE-based approach for secure neural network inference. In CryptoDL, several activation functions are approximated using low-degree polynomials, and mean-pooling is used as a replacement for max-pooling. The GAZELLE [Juvekar et al., 2018] framework is a scalable and low-latency system for secure neural network inference. To evaluate the nonlinear functions in neural network inference securely, GAZELLE combines HE with traditional secure two-party computation techniques (such as GC) in an intricate way. The packed additive homomorphic encryption (PAHE) employed in GAZELLE allows single instruction multiple data (SIMD) arithmetic homomorphic operations over encrypted data.
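      The polynomial-approximation idea behind CryptoNets and CryptoDL can be sketched as follows; the interval, degree, and fitting method here are illustrative choices, not the ones used in those systems.

```python
# Replace a nonlinear activation by a low-degree polynomial so that only
# additions and multiplications (the operations an HE scheme evaluates cheaply)
# are required at inference time.
import numpy as np

x = np.linspace(-5.0, 5.0, 1001)
sigmoid = 1.0 / (1.0 + np.exp(-x))

coeffs = np.polyfit(x, sigmoid, deg=3)  # degree-3 polynomial fit on [-5, 5]

def poly_activation(z):
    """Polynomial stand-in for the sigmoid activation."""
    return np.polyval(coeffs, z)

print(np.max(np.abs(poly_activation(x) - sigmoid)))  # worst-case error on the interval
```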

      FedMF [Chai et al., 2019] uses Paillier’s HE for secure federated matrix factorization, assuming an honest-but-curious server and honest clients. Secure federated transfer learning via Paillier’s HE scheme is studied in Liu et al. [2019], where the need for a semi-honest third party is removed by combining HE with additive secret sharing during the decryption process.
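      A minimal sketch of how additive masking can remove the need for a trusted decryptor (illustrative only, not the exact construction of Liu et al. [2019]): before requesting decryption, the ciphertext is blinded with a random additive share, so the key holder only ever sees a masked value.

```python
import random
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair()  # held by party B

# Party A holds an encrypted gradient component it cannot decrypt by itself.
enc_grad = public_key.encrypt(0.42)

# Party A adds a random additive share (mask) under encryption and sends it to B.
mask = random.uniform(-1000.0, 1000.0)
enc_masked = enc_grad + mask

# Party B decrypts but learns only the masked value, which it returns to A.
masked_plain = private_key.decrypt(enc_masked)

# Party A removes its mask locally to recover the gradient component.
print(round(masked_plain - mask, 6))  # 0.42
```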

       Differential Privacy

      DP was originally developed to facilitate secure analysis over sensitive data. With the rise of ML, DP has again become an active research field in the ML community, motivated by the fact that many exciting results from DP can be applied to PPML [Dwork et al., 2016, 2006]. The key idea of DP is to confuse adversaries when they try to query individual information from the database, so that they cannot distinguish individual-level sensitive information from the query results.

       Definition

      DP is a privacy definition initially proposed by Dwork et al. [2006], developed in the context of statistical disclosure control. It provides an information-theoretic security guarantee that the outcome of a function is insensitive to any particular record in the dataset. Therefore, DP can be used to resist membership inference attacks. The (∊, δ)-differential privacy is defined as follows.

      Definition 2.2 (∊, δ)-differential privacy. A randomized mechanism M preserves (∊, δ)-differential privacy if, for any two datasets D and D′ differing by only one record and for all S ⊆ Range(M),

      Pr[M(D) ∈ S] ≤ e^∊ · Pr[M(D′) ∈ S] + δ,

      where ∊ is the privacy budget and δ is the failure probability.

      The quantity ln(Pr[M(D) ∈ S] / Pr[M(D′) ∈ S]) is called the privacy loss, with ln denoting the natural logarithm. When δ = 0, the stronger notion of ∊-differential privacy is achieved.
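      For a concrete (illustrative) instance of the definition, consider randomized response, which reports a sensitive bit truthfully with probability p and flips it otherwise; its privacy loss is bounded by ln(p/(1 − p)), so it satisfies ∊-differential privacy with ∊ = ln(p/(1 − p)) and δ = 0.

```python
import numpy as np

def randomized_response(bit: int, p: float = 0.75, rng=np.random.default_rng()) -> int:
    """Report the true bit with probability p, the flipped bit otherwise."""
    return bit if rng.random() < p else 1 - bit

p = 0.75
epsilon = np.log(p / (1 - p))   # privacy budget of this mechanism
print(f"epsilon = {epsilon:.3f}")  # ln(3) ~= 1.099 for p = 0.75
```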

      DP involves utility-privacy trade-offs because it introduces noise into the data. Jayaraman and Evans [2019] found that current differentially private ML mechanisms rarely offer acceptable utility-privacy trade-offs: settings that incur limited accuracy loss provide little effective privacy, while settings that provide strong privacy result in large accuracy loss.

       Categorization of DP Schemes

      There are two main ways to achieve DP by adding noise to the data. One is the addition of noise calibrated to the sensitivity of a function [Dwork et al., 2006]. The other is choosing noise according to an exponential distribution among discrete values [McSherry and Talwar, 2007].

      The sensitivity of a real-valued function expresses the maximum possible change in its value due to the addition or removal of a single sample.

      Definition 2.3 Sensitivity. For any two datasets D and D′ differing by only one record, and a function M : D → R^d over an arbitrary domain, the sensitivity of M is the maximum change in its output over all such pairs of datasets:

      ΔM = max_{D, D′} ‖M(D) − M(D′)‖,

      where ‖·‖ is a norm of the vector. The l1-sensitivity or the l2-sensitivity is defined when the l1-norm or l2-norm is applied, respectively.
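      As a small illustration (the clipping bound, the synthetic data, and the dataset size are made up for the example), a counting query has l1-sensitivity 1, and a sum over per-record values clipped to [0, B] has l1-sensitivity B.

```python
import numpy as np

rng = np.random.default_rng(0)
B = 5.0                                                   # per-record clipping bound
data = np.clip(rng.exponential(2.0, size=1000), 0.0, B)   # synthetic clipped records

count_sensitivity = 1.0   # counting query: one record changes the count by at most 1
sum_sensitivity = B       # clipped sum: one record changes the sum by at most B

# Empirical check: removing the largest record changes the clipped sum by at most B.
change = data.sum() - np.delete(data, data.argmax()).sum()
assert change <= sum_sensitivity
print(count_sensitivity, sum_sensitivity, change)
```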

      We denote the Laplace distribution with parameter b as Lap(b). Lap(b) has the probability density function p(x | b) = (1/(2b)) exp(−|x|/b). Given a function M with sensitivity ΔM, the addition of noise drawn from a calibrated Laplace distribution Lap(ΔM/∊) maintains ∊-differential privacy [Dwork et al., 2006].

      Theorem 2.4 Given a function M : D → R^d over an arbitrary domain D, for any input X, the function

      M(X) + Lap(ΔM/∊)

      provides ∊-differential privacy. The ∊-differential privacy can also be achieved by adding independently generated Laplace noise from distribution Lap(ΔM/∊) to each of the d output terms.
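      A minimal sketch of the Laplace mechanism for a counting query (l1-sensitivity 1); the query, privacy budget, and true count below are illustrative.

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=np.random.default_rng()):
    """Add Laplace(sensitivity / epsilon) noise to each output coordinate."""
    scale = sensitivity / epsilon
    return value + rng.laplace(loc=0.0, scale=scale, size=np.shape(value))

true_count = 1234
noisy_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(noisy_count)  # epsilon-differentially private release of the count
```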

      The addition of Gaussian or binomial noise, scaled to the l2-sensitivity of the function, sometimes yields better accuracy, while only ensuring the weaker (∊, δ)-differential privacy [Dwork et al., 2006, Dwork and Nissim, 2004].
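      For comparison, a sketch of the Gaussian mechanism with the standard calibration σ ≥ √(2 ln(1.25/δ)) · ΔM/∊ for the l2-sensitivity (valid for ∊ < 1); the sensitivity, ∊, δ, and query values below are illustrative.

```python
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta, rng=np.random.default_rng()):
    """Add Gaussian noise calibrated to the l2-sensitivity for (epsilon, delta)-DP."""
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * l2_sensitivity / epsilon
    return value + rng.normal(0.0, sigma, size=np.shape(value))

noisy = gaussian_mechanism(np.array([3.0, 7.0]), l2_sensitivity=1.0,
                           epsilon=0.5, delta=1e-5)
print(noisy)  # (epsilon, delta)-differentially private release of the vector
```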

      The exponential mechanism [McSherry and Talwar, 2007] is another way to obtain DP. The exponential mechanism is given a quality function q that scores outcomes of a calculation, where higher scores are better. For a given database and ∊ parameter, the quality function induces a probability distribution over the output domain, from which the exponential mechanism samples the outcome. This probability distribution favors high-scoring outcomes,
