Population Genetics. Matthew B. Hamilton
It is also possible to use well‐tested and accepted model expectations as a basis to hypothesize what processes caused an observed pattern in a biological population. Again, to use a D. melanogaster population as an example, we might ask whether an observed change in allele frequency over some generations in a wild population could be explained by genetic drift. If the observed allele frequency change is within the range of the predicted change in allele frequencies based on a model of genetic drift, then we have identified a possible cause of the observed pattern. Comparing observed genetic patterns in populations with model predictions often requires modifying existing models or constructing novel models in order to develop appropriate expectations. For example, a model of genetic drift constructed for D. melanogaster might naturally assume that all individuals in the population are diploid (individuals that possess paired sets of homologous chromosomes). If we wanted to use that same model to predict genetic drift in a population of honeybees, we would have to account for the fact that their males are haploid (individuals that possess single copies of each chromosome) while females are diploid. This change in reproductive biology could be taken into account by altering the assumptions of the model of genetic drift to make predictions appropriate for honeybee populations. Note that without some modifications, a single model of genetic drift would not accurately predict allele frequencies over time in both fruit flies and honeybees since their patterns of reproduction and chromosomal inheritance are different.
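The comparison of an observed allele frequency change with the range predicted by a model of genetic drift can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a method taken from the text: it assumes a simple Wright-Fisher style model in which every generation is formed by randomly drawing a fixed number of gene copies (`n_copies`, equal to 2N for N diploid individuals), and the function names, the 95% range, and the number of replicates are all hypothetical choices. A haplodiploid species such as the honeybee would need a modified sampling scheme that this sketch does not attempt.

```python
import random


def simulate_drift(p0, n_copies, generations, rng):
    """Return the allele frequency after `generations` of random sampling.

    Assumes a Wright-Fisher style model: each of the `n_copies` gene copies
    in the next generation is drawn independently with probability p.
    """
    p = p0
    for _ in range(generations):
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count / n_copies
    return p


def drift_interval(p0, n_copies, generations,
                   replicates=2000, coverage=0.95, seed=1):
    """Central range of final allele frequencies across replicate simulations."""
    rng = random.Random(seed)
    finals = sorted(
        simulate_drift(p0, n_copies, generations, rng)
        for _ in range(replicates)
    )
    tail = (1.0 - coverage) / 2.0
    lower = finals[int(tail * replicates)]
    upper = finals[int((1.0 - tail) * replicates) - 1]
    return lower, upper


def consistent_with_drift(p0, p_observed, n_copies, generations, **kwargs):
    """True if the observed frequency lies inside the simulated drift range."""
    lower, upper = drift_interval(p0, n_copies, generations, **kwargs)
    return lower <= p_observed <= upper
```

For example, `consistent_with_drift(0.5, 0.62, n_copies=200, generations=10)` asks whether a change from 0.5 to 0.62 over ten generations falls inside the simulated range for 100 diploid individuals. A result of `True` identifies drift as a possible cause of the pattern; it does not rule out other processes.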
Parameters and parameter estimates
While developing the expectations of population genetics in this book, we will most often be working with idealized quantities. For example, allele frequency in a population is a fundamental quantity. For a genetic locus with two alleles, A and a, it is common to say that p equals the frequency of the A allele and q equals the frequency of the a allele. In mathematics, parameter is another term for an idealized quantity like an allele frequency. It is assumed that parameters have an exact value. Put another way, parameters are idealized quantities where the messy, real‐life details of how to measure the quantities they represent are completely ignored.
Empirical population genetics measures quantities such as allele frequencies to give parameter estimates by sampling and then measuring the alleles and genotypes present in actual populations. All experiments, observations, and even simulations in population genetics produce parameter estimates of some sort. A subtle notational convention indicates an estimate: the hat or ^ character above a variable. Estimates wear hats whereas parameters do not. Using allele frequency as an example, we would say $\hat{p}$ (pronounced “p hat”) equals the number of A alleles sampled divided by the total number of alleles sampled. Intuitively, we can see from the denominator in the expression for $\hat{p}$ that the allele frequency estimate will depend on the sample we gather to make the estimate.
In actual populations, a parameter has a true value. For the allele frequency p, knowing this true value would require examining the genotype of every individual and counting all A and a alleles to determine their frequency in the population. This task is impractical or impossible in most cases. Instead, we rely on an estimate of allele frequency, $\hat{p}$, obtained from a sample of individuals from the population. Sampling leads to some uncertainty in parameter estimates because repeating the sampling and parameter estimate process would likely lead to a somewhat different parameter estimate each time. Quantifying this uncertainty is important to determine whether repeated sampling might change a parameter estimate by just a little or change it by a lot. When dealing with parameters, we might expect that p + q = 1 exactly if there are only two alleles with allele frequencies p and q. However, if we are dealing with estimates, we might say the two allele frequency estimates should sum to approximately one ($\hat{p} + \hat{q} \approx 1$) since each allele frequency is estimated with some error. The more uncertain the estimates $\hat{p}$ and $\hat{q}$, the less we should be surprised to find that their sum does not equal the expected value of one.
Parameter: A variable or constant appearing in a mathematical expression; a value (usually unknown) used to represent a certain population characteristic; any factor that defines a system and determines or limits its performance.
Estimate: An indication of the value of an unknown quantity based on observed data; an approximation of a true score, parameter, or value; a statistical estimate of the value of a parameter.
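The allele frequency estimate described above, the number of A alleles sampled divided by the total number of alleles sampled, can be written as a short Python sketch. The function names are hypothetical, and the standard error shown is the usual binomial approximation, which is an assumption added here for illustration: it treats each sampled allele as an independent random draw from the population.

```python
import math


def estimate_allele_frequency(alleles):
    """Estimate of p: count of 'A' alleles divided by total alleles sampled."""
    count_A = sum(1 for allele in alleles if allele == "A")
    return count_A / len(alleles)


def binomial_standard_error(p_hat, n_alleles):
    """Approximate standard error of the estimate.

    Assumes alleles are sampled independently (binomial sampling).
    """
    return math.sqrt(p_hat * (1.0 - p_hat) / n_alleles)


# Hypothetical sample: 10 diploid individuals give 20 sampled alleles.
sample = ["A"] * 13 + ["a"] * 7
p_hat = estimate_allele_frequency(sample)
se = binomial_standard_error(p_hat, len(sample))
```

The estimate depends entirely on which alleles happen to be in `sample`, so a different sample from the same population would generally give a different value, while the parameter p itself stays fixed.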
It could be said that statistics sits at the intersection of theoretical and empirical population genetics. Parameters and parameter estimates are fundamentally different things. Estimation requires effort to understand sampling variation and quantify sources of error and bias in samples and estimates. The distinction between parameters and estimates is critical when comparing actual populations with expectations to test hypotheses. When large, random samples can be taken, estimates are likely to have minimal errors. However, there are many cases where estimates have a great deal of uncertainty, which limits the ability to evaluate expectations. There are also instances where very different processes may produce very similar expected results. In such cases, it may be difficult or impossible to distinguish the different potential causes of a pattern due to the approximate nature of estimates. While this book focuses mostly on parameters, it is useful to bear in mind that testing or comparing expectations requires the use of parameter estimates and statistics that quantify sampling error. The Appendix provides a review of some basic statistics that are used in the text.
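The effect of sample size on the uncertainty of an estimate can be illustrated by repeating the sampling process many times. The sketch below is an assumption-laden illustration rather than a procedure from the text: it assumes a population with a known true allele frequency `p`, draws alleles independently at random, and uses hypothetical function names. It measures uncertainty as the standard deviation of the repeated estimates.

```python
import random
import statistics


def repeated_estimates(p, n_alleles, replicates, seed=0):
    """Allele frequency estimates from `replicates` independent random samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        # Draw n_alleles alleles; each is 'A' with probability p.
        count_A = sum(1 for _ in range(n_alleles) if rng.random() < p)
        estimates.append(count_A / n_alleles)
    return estimates


def spread_of_estimates(p, n_alleles, replicates=1000, seed=0):
    """Standard deviation of the repeated estimates (sampling variation)."""
    return statistics.stdev(repeated_estimates(p, n_alleles, replicates, seed))
```

Comparing `spread_of_estimates(0.3, 20)` with `spread_of_estimates(0.3, 2000)` should show much less variation among estimates based on the larger sample, which is the sense in which large, random samples give estimates with little error.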
Inductive and deductive reasoning
Population genetics employs both inductive and deductive reasoning in an effort to understand the biological processes operating in actual populations as well as to elucidate the general processes that cause population genetic phenomena. The inductive approach to population genetics involves assembling measures of genetic variation (parameter estimates) from various populations to build up evidence that can be used to identify the underlying processes that produced the observed patterns. This approach is logically identical to that used by Isaac Newton, who used knowledge of how objects fall to the surface of the Earth as well as knowledge of the movement of planets to arrive at the general principles of gravity. Application of inductive reasoning requires detailed familiarity with the various empirical data types in population genetics, such as DNA sequences, along with the results of studies that report observed patterns of genetic variation. From this accumulated empirical information, it is then possible to draw more general conclusions about the qualities and quantities of genetic variation in populations. Model organisms like D. melanogaster and Arabidopsis thaliana play a large role in population genetic conclusions reached by inductive reasoning. Because model organisms receive a large amount of scientific effort, for example, to completely sequence and annotate their genomes, a great deal of genetic data has accumulated for these species. Based on this evidence, many inferences have been made about population genetic processes. Although model organisms are very rich sources of empirical information, the number of such species is limited by definition, so any generalizations drawn from them may not apply universally to all species.