Prediction Revisited. Mark P. Kritzman
Information theory: A unified mathematical theory of communication, created by Claude Shannon, which expresses messages as sequences of 0s and 1s and, based on the inverse relationship of information and probability, prescribes the optimal redundancy of symbols to manage the speed and accuracy of transmission.
Circumstance: A set of attribute values that collectively describes an observation.
Informativeness: A measure of the information conveyed by the circumstances of an observation, based on the inverse relationship of information and probability. For an observation of a single attribute, it is equal to the squared distance of the observation from the average, measured in units of standard deviation. For an observation of two or more uncorrelated attributes, it is equal to the sum of each individual attribute's informativeness. For an observation of two or more correlated attributes (the most general case), it is given by the Mahalanobis distance of the observation from the average of the observations. Informativeness is a component of relevance. It does not depend on the units of measurement.
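As a rough illustration of this quantity, here is a minimal numpy sketch (not the authors' code; the data values are hypothetical):

```python
import numpy as np

# Hypothetical observations: rows are observations, columns are attributes.
X = np.array([[ 0.2, 1.1],
              [ 0.5, 0.9],
              [ 1.8, 2.0],
              [-0.1, 0.4]])

mean = X.mean(axis=0)                              # the average circumstance
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))   # inverse covariance of the attributes

def informativeness(x):
    """Mahalanobis distance (quadratic form) of a circumstance from the average.
    For a single attribute this reduces to the squared distance from the
    average measured in standard deviations, as described above."""
    d = x - mean
    return float(d @ cov_inv @ d)

print([round(informativeness(x), 2) for x in X])   # each observation's informativeness
```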
Co-occurrence: The degree of alignment between two attributes for a single observation. It ranges between –1 and +1 and does not depend on the units of measurement.
Correlation: The average co-occurrence of a pair of attributes across all observations, weighted by the informativeness of each observation. In classical statistics, it is known as the Pearson correlation coefficient.
Covariance matrix: A symmetric square matrix of numbers that concisely summarizes the spreads of a set of attributes along with the signs and strengths of their correlations. Each element pertains to a pair of attributes and is equal to their correlation times their respective standard deviations (the square root of variance or spread).
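In symbols, the element for a pair of attributes $j$ and $k$ is

$$\Sigma_{jk} \;=\; \rho_{jk}\,\sigma_j\,\sigma_k,$$

where $\rho_{jk}$ is their correlation and $\sigma_j$, $\sigma_k$ are their standard deviations; the diagonal elements $\Sigma_{jj} = \sigma_j^2$ are the variances (spreads).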
Mahalanobis distance: A standardized measure of distance or surprise for a single observation across many attributes, which incorporates all the information from the covariance matrix. The Mahalanobis distance of a set of attribute values (a circumstance) from the average of the attribute values measures the informativeness of that observation. Half of the negative of the Mahalanobis distance of one circumstance from another measures the similarity between them.
Similarity: A measure of the closeness between one circumstance and another, based on their attributes. It is equal to the opposite (negative) of half the Mahalanobis distance between the two circumstances. Similarity is a component of relevance.
Relevance: A measure of the importance of an observation to forming a prediction. Its components are the informativeness of past circumstances, the informativeness of current circumstances, and the similarity of past circumstances to current circumstances.
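One way to make the combination concrete (an illustrative weighting, not a quotation of the book's formula) is to add the similarity between the past and current circumstances to half of each circumstance's informativeness:

$$\mathrm{relevance}(x_i, x_t) \;=\; \mathrm{sim}(x_i, x_t) \;+\; \tfrac{1}{2}\,\mathrm{info}(x_i) \;+\; \tfrac{1}{2}\,\mathrm{info}(x_t).$$

Under this weighting, with sim and info defined as above and circumstances measured as deviations from the average, the expression simplifies to $(x_i - \bar{x})' \Sigma^{-1} (x_t - \bar{x})$.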
Partial sample regression: A two-step prediction process in which one first identifies a subset of observations that are relevant to the prediction task and, second, forms the prediction as a relevance-weighted average of the historical outcomes in that subset. When the subset from the first step equals the full sample, this procedure is equivalent to classical linear regression.
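A minimal sketch of this two-step procedure, using the relevance weighting assumed above (function names, data, and the choice of a fixed retained fraction are all hypothetical):

```python
import numpy as np

def partial_sample_prediction(X, y, x_t, keep_fraction=0.5):
    """Keep the most relevant past observations, then combine their outcomes
    with relevance weights (a sketch, not the book's exact procedure)."""
    X, y, x_t = np.asarray(X, float), np.asarray(y, float), np.asarray(x_t, float)
    mean = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

    def info(x):                  # informativeness: distance from the average
        d = x - mean
        return d @ cov_inv @ d

    def sim(u, v):                # similarity: negative half the Mahalanobis distance
        d = u - v
        return -0.5 * (d @ cov_inv @ d)

    # Relevance of each past circumstance to the current one (assumed weighting).
    relevance = np.array([sim(x_i, x_t) + 0.5 * (info(x_i) + info(x_t)) for x_i in X])

    # Step 1: identify the most relevant subset.
    n_keep = max(1, int(round(keep_fraction * len(y))))
    keep = np.argsort(relevance)[-n_keep:]

    # Step 2: relevance-weighted combination of the subset's outcome deviations.
    y_bar = y.mean()
    return y_bar + relevance[keep] @ (y[keep] - y_bar) / (len(y) - 1)

# Hypothetical data; with keep_fraction=1.0 this weighting reproduces the
# classical linear-regression prediction for x_t.
X = [[0.2, 1.1], [0.5, 0.9], [1.8, 2.0], [-0.1, 0.4], [0.9, 1.3]]
y = [1.0, 0.8, 2.5, 0.3, 1.4]
print(partial_sample_prediction(X, y, [1.0, 1.5], keep_fraction=1.0))
```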
Asymmetry: A measure of the extent to which predictions differ when they are formed from a partial sample regression that includes the most relevant observations compared to one that includes the least relevant observations. It is computed as the average dissimilarity of the predictions from these two methods. Equivalently, it may be computed by comparing the respective fits of the most and least relevant subsets of observations to the cross-fit between them. The presence of asymmetry causes partial sample regression predictions to differ from those of classical linear regression. The minimum amount of asymmetry is zero, in which case the predictions from full-sample and partial-sample regression match.
Fit: The average alignment between relevance and outcomes across all observation pairs for a single prediction. It is normalized by the spreads of relevance and outcomes, and while the alignment for any one pair of observations may be positive or negative, the average across all pairs always falls between zero and one. A large value indicates that observations that are similarly relevant have similar outcomes, in which case one should have more confidence in the prediction. A small value indicates that relevance does not line up with the outcomes, in which case one should view the prediction more cautiously.
Bias: The artificial inflation of fit resulting from the inclusion of the alignment of each observation with itself. This bias is addressed by partitioning fit into two components—outlier influence, which is the fit of observations with themselves, and agreement, which is the fit of observations with their peers—and using agreement to give an unbiased measure of fit.
Outlier influence: The fit of observations with themselves. It is always greater than zero, owing to the inherent bias of comparing observations with themselves, and it is larger to the extent that unusual circumstances coincide with unusual outcomes.
Agreement: The fit of observations with their peers. It may be positive, negative, or zero, and is not systematically biased.
Precision: The inverse of the extent to which the randomness of historical observations (often referred to as noise) introduces uncertainty to a prediction.
Focus: The choice to form a prediction from a subset of relevant observations even though the smaller subset may be more sensitive to noise than the full sample of observations, because the consistency of the relevant subset improves confidence in the prediction more than noise undermines confidence.
Reliability: The average fit across a set of prediction tasks, weighted by the informativeness of each prediction circumstance. For a full sample of observations, it may be computed as the average alignment of pairwise relevance and outcomes and is equivalent to the classical R-squared statistic.
Complexity: The presence of nonlinearities or other conditional features that undermine the efficacy of linear prediction models. The conventional approach for addressing complexity is to apply machine learning algorithms, but one must counter the tendency of these algorithms to overfit the data. In addition, it can be difficult to interpret the inner workings of machine learning models. A simpler and more transparent approach to complexity is to filter observations by relevance. The two approaches can also be combined.
Preface
The path that led us to write this book began in 1999. We wanted to build an investment portfolio that would perform well across a wide range of market environments. We quickly came to the view that we needed more reliable estimates of volatilities and correlations—the inputs that determine portfolio risk—than the estimates given by the conventional method of extrapolating historical values. Our thought back then was to measure these statistics from a subset of the most unusual periods in history. We reasoned that unusual observations were likely to be associated with material events and would therefore be more informative than common observations, which probably reflected useless noise. We had not yet heard of the Mahalanobis distance, nor were we aware of Claude Shannon's information theory. Nonetheless, as we worked on our task, we derived the same formula Mahalanobis originated to analyze human skulls in India more than 60 years earlier.
As we extended our research to a broader set of problems, we developed a deep appreciation of the versatility of the Mahalanobis distance. In a single number, his distance measure tells us how dissimilar two items are from each other, accounting not only