PCA for categorical variables?

02-01-2014 10:27 AM
MeganSebasky
New Contributor
Hello,

I was wondering if anyone knew whether the principal components function in Spatial Analyst was appropriate for use on categorical variables. Since my data are in rasters they will have to be assigned integers which may result in the variables being considered continuous. Does anyone know of a way to do PCA on categorical variables in rasters?

Thanks. Any advice is appreciated!
8 Replies
DanPatterson_Retired
MVP Emeritus
http://en.wikipedia.org/wiki/Principal_component_analysis
No, since categorical data are measured on a nominal scale, meaning that the spacing between categories has no interval or ratio meaning.
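
To illustrate the point about nominal scales, here is a minimal sketch in plain Python. The land-cover classes, elevation values, and the two integer codings are hypothetical, made up for the illustration. PCA is built on the covariance or correlation matrix, so if the correlation between an integer-coded category and another variable changes when the categories are relabeled, the components will change too.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

# Hypothetical land-cover class and elevation for six raster cells.
land_cover = ["forest", "water", "urban", "forest", "urban", "water"]
elevation = [310.0, 120.0, 200.0, 330.0, 210.0, 115.0]

# Two equally valid integer codings of the same three categories.
coding_a = {"forest": 1, "water": 2, "urban": 3}
coding_b = {"forest": 3, "water": 1, "urban": 2}

r_a = pearson([coding_a[c] for c in land_cover], elevation)
r_b = pearson([coding_b[c] for c in land_cover], elevation)

# The data are identical; only the arbitrary labels differ.
print(f"coding A: r = {r_a:.3f}")
print(f"coding B: r = {r_b:.3f}")
```

With these made-up numbers the correlation even changes sign between the two codings, which is why an ordinary PCA on raw category codes is hard to interpret.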
RamB
Occasional Contributor III
Yes, it is possible. You must do non-linear PCA. I am not sure if it exists in ArcGIS, but you can bring in Python modules to do it.

In non-linear PCA you first convert the categorical variables into continuous variables and then proceed as in ordinary PCA. So you can first run that step in SPSS or R (or other software) and then bring the resulting tables into ArcGIS for the PCA.

I did something similar many years ago.

regards,
MeganSebasky
New Contributor
Thanks for the advice. I want to run the analysis in SAS, since I know you can run a PCA with categorical variables there, but how would you import raster data into statistical software? SAS takes ASCII files, but I don't know about spatial ASCII files.
curtvprice
MVP Esteemed Contributor
The easiest way to get your values into SAS would be to convert your raster to points with the Raster To Point tool, and then export the point table (or just the value field) to a text or dbf file.

Given this may be more data than you can handle (or you even need) you probably want to resample or aggregate your raster first to minimize the number of points. (It adds up fast, 1000 x 1000 -> 1e6 points.)
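
As a rough sketch of the aggregate-then-export idea, the plain-Python example below reduces a small grid of category codes and writes it out as a row/column/value table. Everything in it is an assumption for illustration: the function names are hypothetical, the grid is made up, and the majority rule is just one aggregation choice that keeps the output values valid category codes (averaging codes would not). In ArcGIS the same steps would be done with the geoprocessing tools rather than with this code.

```python
import csv
import io
from collections import Counter

def aggregate_majority(grid, factor):
    """Aggregate a 2-D list of category codes by taking the most common
    value in each factor x factor block. Edge cells that do not fill a
    whole block are dropped; ties go to the value seen first."""
    rows = len(grid) // factor
    cols = len(grid[0]) // factor
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [grid[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(Counter(block).most_common(1)[0][0])
        out.append(out_row)
    return out

def to_csv(grid):
    """Write one 'row,col,value' record per cell, as text a stats package can read."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["row", "col", "value"])
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            writer.writerow([r, c, value])
    return buf.getvalue()

# Hypothetical 4 x 4 categorical raster, aggregated to 2 x 2.
grid = [
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [3, 3, 1, 2],
    [3, 3, 1, 1],
]
coarse = aggregate_majority(grid, 2)
table = to_csv(coarse)
print(table)
```

Aggregating by a factor of 10 in each direction would cut the 1e6 points mentioned above to 1e4.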

The PCA in SAS will return factor weights, which you would then apply to your data in ArcGIS using the Raster Calculator to transform your input rasters into PCA rasters. The categorical flavor of PCA will probably complicate the map algebra because there is a category -> value transformation in there (which, one would hope, the SAS proc would report as well).
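
The map algebra for applying the weights amounts to a weighted sum of the (centered) input rasters, cell by cell. Below is a minimal plain-Python sketch of that arithmetic. The band values, means, and weights are hypothetical, and it assumes the statistical package reports weights for mean-centered, unstandardized inputs; if the PCA was run on standardized variables, each band would also need dividing by its standard deviation. For categorical inputs the category -> value transformation would have to be applied to each band first.

```python
def apply_weights(bands, means, weights):
    """Return one component raster where each cell is
    sum_k weights[k] * (bands[k][r][c] - means[k])."""
    rows, cols = len(bands[0]), len(bands[0][0])
    return [
        [
            sum(w * (band[r][c] - m)
                for band, m, w in zip(bands, means, weights))
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# Two hypothetical 2 x 2 input rasters on the same grid.
band1 = [[1.0, 2.0], [3.0, 4.0]]
band2 = [[10.0, 20.0], [30.0, 40.0]]

means = [2.5, 25.0]    # per-band means used for centering
weights = [0.6, 0.8]   # hypothetical weights for the first component

pc1 = apply_weights([band1, band2], means, weights)
for row in pc1:
    print(row)
```

In the Raster Calculator the same expression would be written once per component, with one term per input raster.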
MeganSebasky
New Contributor
Hi Curtis,

That makes sense - thanks so much!

I have another statistical question - if I am running a PCA to get a new raster that explains the variation in the others, and some of the others have smaller extents than the area I want, is it better to only use the area where I have data for all cells to run the PCA? I'm not sure how well PCA deals with missing data.
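
For reference, restricting the analysis to the area where every raster has data could be sketched as below. This only illustrates the "complete cells" option raised in the question; it is not a recommendation from the thread. It assumes the rasters are already aligned on the same grid, and the `NODATA` marker and function name are hypothetical.

```python
NODATA = None  # hypothetical marker for cells without data

def complete_cells(bands):
    """Return (row, col, values) for cells where every band has data."""
    rows, cols = len(bands[0]), len(bands[0][0])
    records = []
    for r in range(rows):
        for c in range(cols):
            values = [band[r][c] for band in bands]
            if all(v is not NODATA for v in values):
                records.append((r, c, values))
    return records

# Two hypothetical rasters with different data extents.
band_a = [[1, 2], [NODATA, 4]]
band_b = [[5, NODATA], [7, 8]]

records = complete_cells([band_a, band_b])
for row, col, values in records:
    print(row, col, values)
```

Only the cells returned here would go into the table passed to the PCA.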
RamB
Occasional Contributor III
SAS can also read .asc files, shapefiles, and grid formats. Open-source tools like R are more flexible. Once you import them into SAS, I am sure you can do non-linear PCA if such a thing exists in SAS (it exists in SPSS, so I assume SAS has it too). I found something here:
http://support.sas.com/documentation/cdl/en/apdatgis/65034/PDF/default/apdatgis.pdf

Categorical values will not complicate the analysis as long as you understand the method and accept its assumptions. In short, the non-linear version of PCA first converts all categorical data to numeric format. You can stop there and bring the data into the ArcGIS PCA tool, or you can continue until you get the factors and then bring those into ArcGIS.

regards,
MeganSebasky
New Contributor
I can change my categorical data to numerical values in GIS (well, they are rasters so they already have numerical values), but I think there is a different method for running the PCA with categorical variables (even if they are numerical) that the PCA tool in Arc cannot account for (maybe this is the non-linear aspect?)
RamB
Occasional Contributor III
Yes, I did not mean that you should change them to numerical values yourself. 🙂 The non-linear PCA method changes them based on established algorithms, taking cognizance of the other variables and their correlations, and then reduces the dimensionality.

regards