Probability. Robert P. Dobrow


reading of the manuscript and for many suggestions that led to numerous improvements.

      The staff at Wiley, including Steve Quigley, Amy Hendrickson, and Sari Friedman, provided encouragement and valuable assistance in preparing this book.

      This book is accompanied by a companion website:

       www.wiley.com/go/wagaman/probability2e


      The book companion site is split into:

       The student companion site includes chapter reviews and is open to all.

       The instructor companion site includes the instructor solutions manual.

      All theory, dear friend, is gray, but the golden tree of life springs ever green.

      —Johann Wolfgang von Goethe

      Probability began with the study of games of chance. Today, it has practical applications in areas as diverse as astronomy, economics, social networks, and zoology, which enrich the theory and give the subject its unique appeal.

      In this book, we will flip coins, roll dice, and pick balls from urns, all the standard fare of a probability course. But we have also tried to make connections with real-life applications and illustrate the theory with examples that are current and engaging.

      You will see some of the following case studies again throughout the text. They are meant to whet your appetite for what is to come.

      There are about one trillion websites on the Internet. When you google a phrase like “Can Chuck Norris divide by zero?,” a remarkable algorithm called PageRank searches these sites and returns a list ranked by importance and relevance, all in the blink of an eye. PageRank is the heart of the Google search engine. The algorithm assigns an “importance value” to each web page and gives it a rank to determine how useful it is.

      What is the PageRank of site x? Suppose a web surfer has been randomly walking the web, following links from page to page, for a very long time (infinitely long in theory). The probability that the surfer visits site x is precisely the PageRank of that site. Sites that have lots of incoming links will have a higher PageRank value than sites with fewer links.

      The PageRank algorithm is actually best understood as an assignment of probabilities to each site on the web. Such a list of numbers is called a probability distribution. And since it comes as the result of a theoretically infinitely long random walk, it is known as the limiting distribution of the random walk. Remarkably, the PageRank values for billions of websites can be computed quickly and in real time.
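      The idea of a limiting distribution can be sketched on a toy example. The code below is a minimal illustration, not Google's implementation: the four-page link graph, the function name limiting_distribution, and the damping parameter (the chance that the surfer follows a link rather than jumping to a random page) are all assumptions introduced here. It repeatedly pushes the surfer's probability one step forward until the values settle.

```python
# Hypothetical 4-page web: page -> pages it links to (made up for illustration).
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}


def limiting_distribution(links, damping=0.85, steps=200):
    """Approximate the long-run visit probabilities of a random surfer.

    `damping` is an assumed parameter: with probability `damping` the surfer
    follows a link on the current page, otherwise jumps to a random page.
    """
    pages = sorted(links)
    n = len(pages)
    prob = {p: 1.0 / n for p in pages}  # start at a uniformly random page
    for _ in range(steps):
        # Probability mass arriving through random jumps.
        nxt = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            out = links[p] or pages  # a page with no links jumps anywhere
            share = damping * prob[p] / len(out)
            for q in out:
                nxt[q] += share
        prob = nxt
    return prob


ranks = limiting_distribution(links)
for page, value in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(value, 4))
```

      In this toy graph, page c receives links from three pages and ends up with the largest value, while page d, which no page links to, is reached only by random jumps.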

      Turn to a random page in this book. Look in the middle of the page and point to the first number you see. Write down the first digit of that number.

Figure: Bar chart of Benford's law, which describes the frequencies of first digits for many real-life datasets.

      Durtschi et al. [2004] describe an investigation of a large medical center in the western United States. The distribution of first digits of check amounts differed significantly from Benford's law. A subsequent investigation uncovered that the financial officer had created bogus shell insurance companies in her own name and was writing large refund checks to those companies. Applications to international trade were investigated in Cerioli et al. [2019].
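      A comparison of this kind can be sketched in a few lines. Benford's law assigns first digit d the probability log10(1 + 1/d). The sketch below tabulates those probabilities and compares them with the first digits of the powers of 2, a sequence well known to follow the law; that dataset and the function names are choices made here for illustration and are not the check amounts from the investigation.

```python
import math
from collections import Counter


def benford_probabilities():
    """Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_frequencies(values):
    """Relative frequency of each leading digit among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


benford = benford_probabilities()
# Illustrative dataset (an assumption, not from the text): 2^1, ..., 2^1000.
observed = first_digit_frequencies(2 ** k for k in range(1, 1001))

for d in range(1, 10):
    print(d, round(benford[d], 3), round(observed[d], 3))
```

      A dataset whose observed column departs sharply from the Benford column, as the check amounts did, is a candidate for closer inspection. A formal comparison would use a statistical test rather than a side-by-side printout.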

      Few areas of modern science employ probability more than biology and genetics. A strand of DNA, with its four nucleotide bases adenine, cytosine, guanine, and thymine, abbreviated by their first letters, presents itself as a sequence of outcomes of a four-sided die. The enormity of the data—about three billion “letters” per strand of human DNA—makes randomized methods relevant and viable.

      Restriction sites are locations on the DNA that contain a specific sequence of nucleotides, such as G-A-A-T-T-C. Such sites are important to identify because they are locations where the DNA can be cut and studied. Finding all these locations is akin to finding patterns of heads and tails in a long sequence of coin tosses. Theoretical limit theorems for idealized sequences of coin tosses become practically relevant for exploring the genome. The locations for such restriction sites are well described by the Poisson process, a fundamental class of random processes that model locations of restriction sites on a chromosome, as well as car accidents on the highway, service times at a fast food chain, and when you get your text messages.
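      The search for restriction sites can be sketched as a pattern search in a simulated sequence. The code below assumes, purely for illustration, that the four bases are equally likely and independent (the "four-sided die" picture); real genomes are not that simple. The sequence length, seed, and function name find_sites are hypothetical choices.

```python
import random

SITE = "GAATTC"  # the restriction site mentioned in the text


def find_sites(dna, site=SITE):
    """Return the start index of every occurrence of `site` in `dna`."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200_000             # assumed length, far shorter than a real genome
dna = "".join(rng.choice("ACGT") for _ in range(n))

positions = find_sites(dna)
# Under the equal-probability assumption, each start position matches
# with probability (1/4)^6, so the expected number of sites is:
expected = (n - len(SITE) + 1) / 4 ** len(SITE)
gaps = [b - a for a, b in zip(positions, positions[1:])]

print("sites found:", len(positions), "expected about:", round(expected, 1))
```

      Because a match at any given position is rare and the sequence is long, the gaps between consecutive sites behave approximately like the waiting times of a Poisson process, with an average gap near 4^6 = 4096 bases under these assumptions.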

      On the macrolevel, random processes are used to study the evolution of DNA over time in order to construct evolutionary trees showing the divergence of species. DNA sequences change over time as a result of mutation and natural selection. Models for sequence evolution, called Markov processes, are continuous time analogues of the type of random walk models introduced earlier.
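      A continuous-time model of this kind can be sketched for a single nucleotide. The code below assumes the simplest possible scheme, in which substitutions arrive after exponentially distributed waiting times and each substitution picks one of the other three bases with equal probability (this is the Jukes-Cantor model, chosen here for illustration; the text does not name a specific model). The rate, time span, and function name evolve_site are hypothetical.

```python
import random

BASES = "ACGT"


def evolve_site(base, total_time, rate, rng):
    """Simulate one nucleotide site as a continuous-time Markov chain.

    The site waits an exponential time with the given rate, then changes
    to one of the other three bases chosen uniformly (an assumed model).
    """
    t = rng.expovariate(rate)  # waiting time until the first substitution
    while t < total_time:
        base = rng.choice([b for b in BASES if b != base])
        t += rng.expovariate(rate)  # waiting time until the next one
    return base


rng = random.Random(1)  # fixed seed so the run is repeatable
ancestor = "GAATTC"
descendant = "".join(
    evolve_site(b, total_time=0.5, rate=1.0, rng=rng) for b in ancestor
)
print(ancestor, "->", descendant)
```

      Unlike the web surfer, who moves at every step, the site here changes at random moments in continuous time, which is the sense in which such models are continuous-time analogues of a random walk.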

      Miller
