Principal components analysis (PCA) tries to re-express the data as a sum of uncorrelated components. There are lots of other techniques which try to do similar things, like Fourier analysis or wavelet decomposition. Things like Fourier analysis decompose the data into a sum of a fixed set of basis functions or basis vectors. This has the advantage of making results comparable across data sets, and of making the meaning of the components clear. So why ever do PCA rather than a Fourier transform?

First, in some situations the idea of doing a Fourier transform is just embarrassingly weird. For the states or cars data sets, we could number the features and take cosines of the feature numbers, etc., but it just seems crazy. No such embarrassment attends PCA.

Second, when using a fixed set of components, there is no guarantee that a small number of components will give a good reconstruction of the original data. PCA guarantees that the first q components will do a better (mean-square) job of reconstructing the original data than any other linear method using only q components.

Third, PCA is good at preserving distances between the points: the component scores give the optimal linear multidimensional scaling.

PCA gives us uncorrelated components, which are generally not independent components; for that you need independent component analysis. PCA looks for linear combinations of the original features; one could well do better by finding nonlinear combinations. Rather than directions in feature space, these would be curves or surfaces. PCA is purely a descriptive technique; in itself it makes no prediction about what future data will look like.

The Truth about Principal Components and Factor Analysis
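
To make the reconstruction claim for PCA concrete, here is a minimal sketch in Python with NumPy. Everything in it is an assumption made for illustration and not part of the notes: the data are synthetic, the function names `pca_reconstruct` and `fixed_basis_reconstruct` are hypothetical, and a cosine (DCT-style) basis over the feature index stands in for "a fixed set of components". The sketch compares the mean-square reconstruction error of the two approaches using the same number q of components, and also checks that the PCA scores are uncorrelated.

```python
import numpy as np

def pca_reconstruct(X, q):
    """Project centered data onto the first q principal directions, then map back."""
    mu = X.mean(axis=0)
    Xc = X - mu
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:q].T
    return mu + Xc @ W @ W.T

def fixed_basis_reconstruct(X, q):
    """Same idea, but with the first q vectors of a fixed cosine basis (illustrative choice)."""
    p = X.shape[1]
    mu = X.mean(axis=0)
    Xc = X - mu
    k = np.arange(p)
    # Column j is a cosine of frequency j evaluated at the feature positions.
    B = np.cos(np.pi * (k[:, None] + 0.5) * k[None, :] / p)
    B /= np.linalg.norm(B, axis=0)  # scale each column to unit length
    W = B[:, :q]
    return mu + Xc @ W @ W.T

def mse(X, X_hat):
    """Mean squared reconstruction error over all entries."""
    return float(np.mean((X - X_hat) ** 2))

# Synthetic correlated data: 200 observations of 6 features (made up for this sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))

q = 2
err_pca = mse(X, pca_reconstruct(X, q))
err_fixed = mse(X, fixed_basis_reconstruct(X, q))
print("PCA error:", err_pca, "fixed-basis error:", err_fixed)

# The component scores are uncorrelated: their covariance matrix is diagonal
# up to floating-point error.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T
score_cov = np.cov(scores, rowvar=False)
off_diag = score_cov - np.diag(np.diag(score_cov))
print("largest off-diagonal covariance:", np.abs(off_diag).max())
```

On any data set the PCA error should come out no larger than the fixed-basis error for the same q, since both reconstructions are linear and of rank at most q. Note that a diagonal covariance matrix only shows the scores are uncorrelated; it says nothing about whether they are independent.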