Hardness of Approximation Between P and NP. Aviad Rubinstein

Чтение книги онлайн.

Читать онлайн книгу Hardness of Approximation Between P and NP - Aviad Rubinstein страница 13

Автор:
Жанр:
Серия:
Издательство:
Hardness of Approximation Between P and NP - Aviad Rubinstein ACM Books

Скачать книгу

defines a satisfying assignment to Β. Therefore there is a one-to-one correspondence between satisfying assignments to the 3SAT formula and to the instance of LABEL COVER.

      ■

      For a universe (or ground set) U, a concept C is simply a subset of U and a concept class C is a collection of concepts. For convenience, we sometimes relax the definition and allow the concepts to not be subsets of U; all definitions here extend naturally to this case.

      The VC and Littlestone’s Dimensions can be defined as follows.

       Definition 2.8

      VC Dimension [Vapnik and Chervonenkis 1971]. A subset S ⊆ U is said to be shattered by a concept class C if, for every TS, there exists a concept C ∈ C such that T = SC.

      The VC Dimension VC-dim(C, U) of a concept class C with respect to the universe U is the largest d such that there exists a subset S ⊆ U of size d that is shattered by C.

       Definition 2.9

      Mistake tree and Littlestone’s Dimension [Littlestone 1987]. A depth-d instance-labeled tree of U is a full binary tree of depth d such that every internal node of the tree is assigned an element of U. For convenience, we will identify each node in the tree canonically by a binary string s of length at most d.

      A depth-d mistake tree (aka shattered tree [Ben-David et al. 2009]) for a universe U and a concept class C is a depth-d instance-labeled tree of U such that, if we let vs ∈ U denote the element assigned to the vertex s for every s ∈ {0, 1}<d, then, for every leaf ∈ {0, 1}d, there exists a concept C ∈ C that agrees with the path from root to it, i.e., that, for every imageC iff i+1 = 1 where ≤i denotes the prefix of of length i.

      The Littlestone’s Dimension L-dim(C, U) of a concept class C with respect to the universe U is defined as the maximum d such that there exists a depth-d mistake tree for U, C.

      An equivalent formulation of Littlestone’s Dimension is through mistakes made in online learning, as stated below. This interpretation will be useful in the proof of Theorem 14.4.

       Definition 2.10

      Mistake bound. An online algorithm A is an algorithm that, at time step i, is given an element xi ∈ U and the algorithm outputs a prediction pi ∈ {0, 1} whether x is in the class. After the prediction, the algorithm is told the correct answer hi ∈ {0, 1}. For a sequence (x1, h1),…, (xn, hn), a prediction mistake of A is defined as the number of incorrect predictions, i.e., Σi∈n 1[pihi]. The mistake bound of A for a concept class C is defined as the maximum prediction mistake of A over all the sequences (x1, h1),…, (xn, hn), which corresponds to a concept C ∈ C (i.e., hi = 1[xiC] for all i ∈ [n]).

       Theorem 2.3

      Littlestone [1987]. For any universe U and any concept class C, L-dim(C, U) is equal to the minimum mistake bound of C, U over all online algorithms.

      The following facts are well-known and follow easily from the above definitions.

       Fact 2.2

      For any universe U and concept class C, we have

image

       Fact 2.3

      For any two universes U1, U2 and any concept class C,

image

      In this section, we introduce information-theoretic quantities used mostly in Chapter 12. For a more thorough introduction, the reader should refer to Cover and Thomas [2012]. Unless stated otherwise, all log’s in this book paper are base-2.

       Definition 2.11

      Let μ be a probability distribution on sample space Ω. The Shannon entropy (or just entropy) of μ, denoted by H(μ), is defined as image.

       Definition 2.12

      Binary entropy function. For p ∈ [0, 1], the binary entropy function is defined as follows (with a slight abuse of notation): H(p) := − p log p − (1 − p) log(1 − p).

       Fact 2.4

      Concavity of binary entropy. Let μ be a distribution on [0, 1], and let p ~ μ. Then image.

      For a random variable A, we shall write H(A) to denote the entropy of the induced distribution on the support of A. We use the same abuse of notation for other information-theoretic quantities appearing later in this section.

       Definition 2.13

      The conditional entropy of a random variable A conditioned on Β is defined as

image

       Fact 2.5

      Chain rule.

image

       Fact 2.6

      Conditioning decreases entropy. image.

      Another measure we will use (briefly) in our proof is that of mutual information, which informally captures the correlation between two random variables.

       Definition 2.14

      Conditional mutual information. The mutual information between two random variables A and Β, denoted by I(A; B), is defined as

image

      The conditional mutual information between A and Β given C, denoted by I(A; B|C), is defined as

Скачать книгу