Profit Driven Business Analytics. Baesens Bart
Чтение книги онлайн.
Читать онлайн книгу Profit Driven Business Analytics - Baesens Bart страница 10
The idea for PCA is simple: is it possible to engulf our data in an ellipsoid? If so, what would that ellipsoid look like? We would like four properties to hold:
1. Each principal component should capture as much variance as possible.
2. The variance that each principal component captures should decrease in each step.
3. The transformation should respect the distances between the observations and the angles that they form (i.e., should be orthogonal).
4. The coordinates should not be correlated with each other.
The answer to these questions lies in the eigenvectors and eigenvalues of the data matrix. The orthogonal basis of a matrix is the set of eigenvectors (coordinates) so that each one is orthogonal to each other, or, from a statistical point of view, uncorrelated with each other. The order of the components comes from a property of the covariance matrix XTX: if the eigenvectors are ordered by the eigenvalues of XTX, then the highest eigenvalue will be associated with the coordinate that represents the most variance. Another interesting property of the eigenvalues and the eigenvectors, proven below, is that the eigenvalues of XTX are equal to the square of the eigenvalues of X, and that the eigenvectors of X and XTX are the same. This will simplify our analyses, as finding the orthogonal basis of X will be the same as finding the orthogonal basis of XTX.
The principal component transformation of X will then calculate a new matrix P from the eigenvectors of X (or XTX). If V is the matrix with the eigenvectors of X, then the transformation will calculate a new matrix
. The question is how to calculate this orthogonal basis in an efficient way.The singular value decomposition (SVD) of the original dataset X is the most efficient method of obtaining its principal components. The idea of the SVD is to decompose the dataset (matrix) X into a set of three matrices, U, D, and V, such that
, where VT is the transpose of the matrix V1, and U and V are unitary matrices, so . The matrix D is a diagonal matrix so that each element di is the singular value of matrix X.Now we can calculate the principal component transformation P of X. If
, then we can calculate the expression , and identifying terms we can see that matrix V is composed by the eigenvectors of XTX, which are equal to the eigenvectors of X, and the eigenvalues of X will be equal to the square root of the eigenvalues of XTX, D2, as we previously stated. Thus, , with D the eigenvalues of X and U the eigenvectors, or left singular vectors, of X.Конец ознакомительного фрагмента.
Текст предоставлен ООО «ЛитРес».
Прочитайте эту книгу целиком, купив полную легальную версию на ЛитРес.
Безопасно оплатить книгу можно банковской картой Visa, MasterCard, Maestro, со счета мобильного телефона, с платежного терминала, в салоне МТС или Связной, через PayPal, WebMoney, Яндекс.Деньги, QIWI Кошелек, бонусными картами или другим удобным Вам способом.