Data Science in Theory and Practice. Maria Cristina Mariani


… $\mathcal{F}$ in that space if the inverse image of the set $B$, defined as $f^{-1}(B) \equiv \{\omega \in E : f(\omega) \in B\}$, is a set in the $\sigma$-algebra $\mathcal{F}$ for all Borel sets $B$ of $\mathbb{R}$. Borel sets are sets constructed from open or closed sets by repeatedly taking countable unions, countable intersections, and relative complements.

      Measurable functions will be discussed in detail in Section 20.5.
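      On a finite space the measurability condition can be checked by brute force, because a function then takes only finitely many values and every preimage $f^{-1}(B)$ equals the preimage of a subset of those values. The following is a minimal sketch under that assumption; the space `E`, the $\sigma$-algebra `F`, and the functions `f_ok` and `f_bad` are hypothetical examples, not taken from the text.

```python
from itertools import chain, combinations

def preimage(f, domain, B):
    """Return f^{-1}(B) = {w in domain : f(w) in B} as a frozenset."""
    return frozenset(w for w in domain if f(w) in B)

def is_measurable(f, domain, sigma_algebra):
    """Check that the preimage of every subset of f's range is in sigma_algebra.

    On a finite domain this is enough: for any Borel set B, f^{-1}(B)
    only depends on which of the finitely many values of f lie in B.
    """
    values = sorted({f(w) for w in domain})
    subsets = chain.from_iterable(
        combinations(values, r) for r in range(len(values) + 1)
    )
    return all(preimage(f, domain, set(B)) in sigma_algebra for B in subsets)

# Hypothetical finite space and a sigma-algebra that cannot
# separate 1 from 2, nor 3 from 4.
E = {1, 2, 3, 4}
F = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(E)}

def f_ok(w):
    # Constant on {1, 2} and on {3, 4}, so every preimage is in F.
    return 0 if w <= 2 else 1

def f_bad(w):
    # Separates 1 from 2, so the preimage of {0} is {2, 4}, which is not in F.
    return w % 2

measurable_ok = is_measurable(f_ok, E, F)
measurable_bad = is_measurable(f_bad, E, F)
```

      Here `f_ok` passes the check and `f_bad` fails it, which illustrates that measurability depends on the $\sigma$-algebra chosen, not only on the function.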

      Suppose we have a random vector $\mathbf{X}$ defined on a space $(\Omega, \mathcal{F}, \mathbf{P})$. The $\sigma$-algebra generated by $\mathbf{X}$ is the smallest $\sigma$-algebra in $(\Omega, \mathcal{F}, \mathbf{P})$ that contains all the preimages of Borel sets in $\mathbb{R}$ through $\mathbf{X}$. That is,

\[ \sigma(\mathbf{X}) = \sigma\left(\left\{ \mathbf{X}^{-1}(B) \mid \text{for all Borel sets } B \text{ in } \mathbb{R} \right\}\right). \]

      This abstract concept is necessary to make sure that we may calculate any probability related to the random vector $\mathbf{X}$.
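      For a finite sample space, the generated $\sigma$-algebra can be listed explicitly: its atoms are the level sets of $\mathbf{X}$, and every event in it is a union of atoms. The sketch below assumes a hypothetical experiment of two coin tosses with $X$ counting tails; the function name `generated_sigma_algebra` is illustrative only.

```python
from itertools import combinations

def generated_sigma_algebra(X, omega):
    """Return sigma(X) on a finite sample space as a set of frozensets."""
    # Atoms are the level sets {w : X(w) = v}, one per value v of X.
    atoms = {}
    for w in omega:
        atoms.setdefault(X(w), set()).add(w)
    atom_list = [frozenset(a) for a in atoms.values()]

    # Every preimage X^{-1}(B) is a union of atoms, so enumerate all unions.
    events = set()
    for r in range(len(atom_list) + 1):
        for combo in combinations(atom_list, r):
            events.add(frozenset().union(*combo))
    return events

# Hypothetical example: two coin tosses, X = number of tails.
omega = {"HH", "HT", "TH", "TT"}

def number_of_tails(w):
    return w.count("T")

sigma_X = generated_sigma_algebra(number_of_tails, omega)
```

      In this example `sigma_X` contains the event `{"HT", "TH"}` (exactly one tail) but not `{"HT"}` alone, since knowing the value of $X$ cannot distinguish the two orderings.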

      Any random vector has a distribution function, defined analogously to the one-dimensional case. Specifically, if the random vector $\mathbf{X}$ has components $\mathbf{X} = (X_1, \ldots, X_n)$, its cumulative distribution function (cdf) is defined as:

\[ F_{\mathbf{X}}(\mathbf{x}) = \mathbf{P}(\mathbf{X} \le \mathbf{x}) = \mathbf{P}(X_1 \le x_1, \ldots, X_n \le x_n) \quad \text{for all } \mathbf{x}. \]
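      As a concrete illustration of the joint cdf, the sketch below evaluates $F_{\mathbf{X}}(x_1, x_2)$ for a hypothetical random vector made of two fair, independent dice, by counting the equally likely outcomes that satisfy both inequalities. The fairness and independence of the dice are assumptions of this example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (X1, X2) of two fair dice (assumed).
outcomes = list(product(range(1, 7), repeat=2))

def joint_cdf(x1, x2):
    """F_X(x1, x2) = P(X1 <= x1, X2 <= x2) for two fair dice."""
    favourable = sum(1 for a, b in outcomes if a <= x1 and b <= x2)
    return Fraction(favourable, len(outcomes))

p = joint_cdf(2, 3)      # 2 * 3 favourable outcomes out of 36 -> Fraction(1, 6)
top = joint_cdf(6, 6)    # every outcome qualifies -> Fraction(1, 1)
```

      The function accepts any real arguments, not only integers, which matches the definition "for all $\mathbf{x}$": for instance `joint_cdf(2.5, 3)` equals `joint_cdf(2, 3)`.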

      Associated with a random variable $\mathbf{X}$ and its cdf $F_{\mathbf{X}}$ is another function, called the probability density function (pdf) or probability mass function (pmf). The terms pdf and pmf refer to the continuous and discrete cases of random variables, respectively.

| Experiment           | Random variable                                |
|----------------------|------------------------------------------------|
| Toss two dice        | $\mathbf{X}$ = sum of the numbers              |
| Toss a coin 10 times | $\mathbf{X}$ = number of tails in 10 tosses    |

\[ f_{\mathbf{X}}(\mathbf{x}) = \mathbf{P}(\mathbf{X} = \mathbf{x}) \quad \text{for all } \mathbf{x}. \]
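      The formula $f_{\mathbf{X}}(\mathbf{x}) = \mathbf{P}(\mathbf{X} = \mathbf{x})$ can be evaluated directly for the two experiments listed above. The sketch below assumes fair dice, a fair coin, and independent tosses (none of which is stated in the text), and uses exact fractions so that the probabilities can be checked to sum to one.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb

# Experiment 1: sum of the numbers on two fair dice (fairness assumed).
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf_dice = {s: Fraction(c, 36) for s, c in counts.items()}

# Experiment 2: number of tails in 10 tosses of a fair coin (fairness assumed).
# Each of the 2**10 sequences is equally likely; comb(10, k) of them have k tails.
pmf_tails = {k: Fraction(comb(10, k), 2**10) for k in range(11)}

p_seven = pmf_dice[7]        # 6 of 36 outcomes -> Fraction(1, 6)
p_five_tails = pmf_tails[5]  # 252 of 1024 sequences -> Fraction(63, 256)
```

      Both dictionaries map each possible value of the random variable to its probability, which is exactly the role the pmf plays in the discrete case.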

      Definition 2.21 (Probability density function) The pdf $f_{\mathbf{X}}(\mathbf{x})$ of a continuous random variable $\mathbf{X}$ is the function that satisfies

\[ F(\mathbf{x}) = F(x_1, \ldots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_{\mathbf{X}}(t_1, \ldots, t_n) \, dt_n \cdots dt_1. \]

      We will discuss these notations in detail in Chapter 20.
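      The relation in Definition 2.21 can be checked numerically in a simple case. The sketch below assumes a hypothetical two-dimensional random vector whose components are independent exponential variables with rate 1, so that $f_{\mathbf{X}}(t_1, t_2) = e^{-(t_1 + t_2)}$ for $t_1, t_2 \ge 0$ and the cdf has the closed form $(1 - e^{-x_1})(1 - e^{-x_2})$. The double integral is approximated with a midpoint rule; this is an illustration, not a recommended integration method.

```python
import math

def density(t1, t2):
    """Joint pdf of two independent Exp(1) variables (assumed example)."""
    return math.exp(-(t1 + t2)) if t1 >= 0 and t2 >= 0 else 0.0

def cdf_by_integration(x1, x2, n=200):
    """Midpoint-rule approximation of the double integral of the density
    over (-inf, x1] x (-inf, x2]. The density vanishes for negative
    arguments, so the integration starts at 0."""
    if x1 <= 0 or x2 <= 0:
        return 0.0
    h1, h2 = x1 / n, x2 / n
    total = 0.0
    for i in range(n):
        t1 = (i + 0.5) * h1
        for j in range(n):
            t2 = (j + 0.5) * h2
            total += density(t1, t2)
    return total * h1 * h2

def cdf_closed_form(x1, x2):
    """Exact cdf of the assumed example, for comparison."""
    if x1 <= 0 or x2 <= 0:
        return 0.0
    return (1 - math.exp(-x1)) * (1 - math.exp(-x2))

approx = cdf_by_integration(1.0, 2.0)
exact = cdf_closed_form(1.0, 2.0)
```

      With the default grid the two values agree to several decimal places, which is what the definition requires: integrating the pdf up to $\mathbf{x}$ recovers the cdf at $\mathbf{x}$.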

      Using these concepts, we can define the moments of the distribution. Suppose that $g : \mathbb{R}^n \to \mathbb{R}$ is any function. Then, when the joint density exists, we can calculate the expected value of the random variable $g(X_1, \ldots, X_n)$ as:

\[ E[g(X_1, \ldots, X_n)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_n) \, f(x_1, \ldots, x_n) \, dx_1 \cdots dx_n. \]
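      The expectation formula can be illustrated with a small numerical sketch. It assumes a hypothetical joint density that is uniform on the unit square, $f(x_1, x_2) = 1$ for $0 \le x_1, x_2 \le 1$ and zero elsewhere, so the integral reduces to the unit square. The function name `expectation` and the midpoint-rule discretization are choices made for this example only.

```python
def uniform_density(x1, x2):
    """Joint pdf of the assumed example: uniform on the unit square."""
    return 1.0 if 0.0 <= x1 <= 1.0 and 0.0 <= x2 <= 1.0 else 0.0

def expectation(g, density, n=200):
    """Approximate E[g(X1, X2)] as the double integral of g * density
    over the unit square, using a midpoint rule with n x n cells."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x1 = (i + 0.5) * h
        for j in range(n):
            x2 = (j + 0.5) * h
            total += g(x1, x2) * density(x1, x2)
    return total * h * h

# Mixed moment E[X1 * X2]; equals 1/4 for independent uniform components.
mixed_moment = expectation(lambda x1, x2: x1 * x2, uniform_density)

# Second moment E[X1 ** 2]; equals 1/3 for a uniform variable on [0, 1].
second_moment = expectation(lambda x1, x2: x1 ** 2, uniform_density)
```

      Choosing $g(x_1, x_2) = x_1^k$ gives the $k$-th moment of the first component, and $g(x_1, x_2) = x_1 x_2$ gives the mixed moment, so the same formula covers all the moments mentioned in the text.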
