Principal Component Analysis (PCA)... High Eigenvalues!

06-21-2011 05:35 PM
VictorThomasson
New Contributor
I'm new on this forum but regularly read some of the posts.
I'm currently trying to compute principal components from 19 climatic variables. They are all from WORLDCLIM, in the same coordinate system and with the same extent. By default I created 19 principal components and it worked! But I realized the eigenvalues were really high. Here is what I get for the first 3.

PCA   Eigenvalues
1-     7.652846e+004
2-     3.276850e+003
3-     9.996242e+002

Why are they so high? In a regular PCA, the eigenvalues are usually equal to or lower than the number of variables... Is it different because we are dealing with spatial analyses?

Thank you,
Victor
5 Replies
VictorThomasson
New Contributor
This is not good. My 19 climatic variables are highly correlated and I would like to produce principal components to have fewer, uncorrelated variables. I believe I should be doing a PCA with the correlation matrix, no? Any idea how to do this?
JeffreyEvans
Occasional Contributor III
Using covariance only makes sense if all the variables are measured in the same units. Variables with high variances will dominate the principal components. You could mitigate this by standardizing your variables. One assumption of PCA is that relationships are approximately linear or monotonic. Non-linear relationships cause distortion of the vectors through p-dimensional space, making results uninformative. You need to be careful not to include variables that are inversely correlated, and I would check the normality of the variables as well. Centroids are derived from the means, making PCA sensitive to non-normality.
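The effect of standardizing can be illustrated with a minimal NumPy sketch. This is not the ArcGIS tool's implementation; the three synthetic variables (`temp`, `precip`, `season`) and the helper `pca_eigenvalues` are hypothetical stand-ins for raster cells stacked into a table. It shows that covariance-based eigenvalues add up to the total variance in raw units, so they can be very large, while eigenvalues from standardized variables (equivalent to using the correlation matrix) add up to the number of variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: rows are raster cells, columns are climate variables
# on very different scales (all values are made up for illustration).
n_cells = 5000
base = rng.normal(size=n_cells)
temp = 150.0 + 60.0 * base + rng.normal(scale=10.0, size=n_cells)
precip = 900.0 - 300.0 * base + rng.normal(scale=80.0, size=n_cells)
season = 40.0 + 5.0 * rng.normal(size=n_cells)
X = np.column_stack([temp, precip, season])


def pca_eigenvalues(X, standardize):
    """Return PCA eigenvalues in descending order.

    standardize=False -> eigenvalues of the covariance matrix.
    standardize=True  -> eigenvalues of the correlation matrix
                         (each column is scaled to unit variance first).
    """
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / X.std(axis=0, ddof=1)
    cov = np.cov(Xc, rowvar=False)
    # eigvalsh handles symmetric matrices and returns ascending order
    return np.linalg.eigvalsh(cov)[::-1]


cov_eig = pca_eigenvalues(X, standardize=False)
corr_eig = pca_eigenvalues(X, standardize=True)

print("covariance eigenvalues: ", cov_eig)
print("correlation eigenvalues:", corr_eig)
print("sum of correlation eigenvalues:", corr_eig.sum())
```

With real rasters, the same idea applies after rescaling each band to zero mean and unit variance before running the PCA.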

You need to really think about your data and not just apply a statistic to reduce dimensionality and redundancy for convenience's sake. Your climate variables represent temperature, moisture and timing, with many variables that are inversely correlated. Additionally, some climate transformations create curvilinear relationships. Some exploratory analysis will reveal much about your data, test assumptions, and allow you to make informed decisions before applying a PCA. Creating a subset of variables that represents the process you are interested in will provide a more relevant interpretation of the principal axes.
Siew_FongChen
New Contributor
Hey,

Actually, I have had this problem before. What happened was that I realized I had forgotten to normalize my data before running Principal Components. Is this the case?

Actually, I do have a problem now. I am doing the same thing, but also with topographic and edaphic data. My problem is... the eigenvalues are too low, and I do not know how to use the results to carry out varimax rotation.

Help is VERY VERY much appreciated!
JeffreyEvans
Occasional Contributor III
Thanks to Bill for addressing some points in my post; his clarity of thinking is always welcome. I do want to say that PCA does have a multivariate normality assumption (crack open any introductory multivariate statistics textbook). On the question of violating assumptions, the overwhelming consensus is that it does not matter as long as your intent is data reduction and not inference. In statistics we violate assumptions all the time; it is just a matter of how much and what the effect is. In image analysis, where the intent is pure data reduction, PCA is commonly applied without a second thought to data distributions.

The motivation behind my advice was in specific reference to understanding climate process. I can think of very few instances in climate analysis where, at some point along the line, inference is not necessary. As such, it does not strike me as a data reduction problem. The relationship between temperature or precipitation and timing/duration (e.g., frost-free period) can be a driver of many ecological processes. It is important to understand these relationships to fully realize the effects on a given process. These relationships can be non-linear, lending themselves to more suitable statistics than PCA. Additionally, it is possible for certain climate variables to have multiple modes, which standardization does not correct, thus resulting in difficult interpretation. PCA is commonly applied to climate data (in both climate science and ecology) and there has recently been considerable "push back" as to how analyses are conducted, particularly in community ecology. I review for several ecological and modeling journals and this trend is becoming clear. I have seen several papers rejected recently because one of the reviewers did not agree with how a PCA was applied and did not believe that the results were supported. You just need to be careful that your question does not unintentionally lead to an ordination-type analysis. This is driven by what you intend to do with the resulting reduced-climate PCA results. If your intent is to look at a process (e.g., species richness, productivity, etc.) along an ecological gradient, then you start running into issues. This is very well documented in the ecological literature.
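One simple exploratory check for the non-linear relationships mentioned above is to compare Pearson (linear) and Spearman (rank-based) correlation for a pair of variables; a large gap suggests a monotonic but curvilinear relationship. The sketch below is only an illustration under stated assumptions: the data are synthetic, the exponential `response` is a made-up example, and the rank transform assumes there are no tied values.

```python
import numpy as np


def pearson(x, y):
    """Pearson (linear) correlation coefficient."""
    return np.corrcoef(x, y)[0, 1]


def spearman(x, y):
    """Spearman rank correlation; assumes no tied values."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]


rng = np.random.default_rng(1)

# Hypothetical temperature values and a made-up curvilinear response
temp = rng.uniform(0.0, 30.0, size=2000)
response = np.exp(0.3 * temp) + rng.normal(scale=1.0, size=2000)

r_pearson = pearson(temp, response)
r_spearman = spearman(temp, response)

print("Pearson: ", round(r_pearson, 3))
print("Spearman:", round(r_spearman, 3))
```

A Spearman value clearly higher than the Pearson value, as expected here, is a hint that a linear method such as PCA may misrepresent that relationship unless the variable is transformed first.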

However, I do have to admit that I made an overreaching assumption on what the intent of your analysis was, so my advice may be way off base. Just remember that, in statistics, what is technically correct and what you can get away with are often very different and the subject of much debate among statisticians. Because of this, "correct" methodology can be very confusing with much contradictory information.
ArmaganKARABULUT
New Contributor
Dear All,

Has anybody performed any map similarity work among different raster maps using ArcGIS?
Any ideas or advice on this issue would be very much appreciated.
Thank you very much.
Armagan