Hello,
I have a categorical variable named "ethnicity" with the values Black and Latinos. I wanted to create a dummy variable for each value. First, I created a dummy variable for Black in a new column, giving Black the value "1" and Latinos the value "0".
Then I created another dummy variable for Latinos in a new column and gave it the value of "1", and black the value of "0".
So when I ran Ordinary Least Squares (the OLS tool in ArcGIS Desktop) on these two dummy variables, I got a multicollinearity error; it regarded the two variables as redundant. Is there any way I can create the two dummy variables without getting this error? (I want to create them in ArcMap using the Field Calculator.)
Much appreciated.
When using categorical data (specifically nominal data with a binary representation), you use neither correlation nor regression. They are not the correct statistical techniques, and even if you managed to get a 'number', it would have no meaning.
You should really be looking at non-parametric statistics, perhaps a simple Chi-square test or some test of association but definitely not a parametric correlation test.
Correlation and regression in your usage would require interval/ratio data, and the tests themselves have a whole set of underlying assumptions about the distribution of the data with respect to normality and so on.
Dan,
Thanks a lot for your response. I read the following statement about OLS on the ArcGIS Desktop help page "What they don't tell you about regression analysis":
"Do you see regional clusters, or can you recognize trends in your data? If so, creating a dummy variable to capture these regional differences may be effective. The classic example for a dummy variable is one that distinguishes urban and rural features. By assigning all rural features a value of 1 and all other features a value of 0, you may be able to capture spatial relationships in the landscape that could be important to your model."
So when I read this statement, I thought I could create a dummy variable for OLS.
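For what it is worth, the redundancy reported in the original question can be reproduced outside ArcGIS. The sketch below is plain Python, not ArcGIS code, and the sample values are made up for illustration. It builds both dummy columns and shows that they always sum to 1, which is exactly the constant (intercept) column that OLS adds, so a model with both dummies has one redundant column. The helper names `make_dummy` and `matrix_rank` are hypothetical, written only for this example.

```python
from fractions import Fraction

def make_dummy(values, category):
    """Return 1 where the value equals `category`, else 0."""
    return [1 if v == category else 0 for v in values]

def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

# Hypothetical sample values for the "ethnicity" field.
ethnicity = ["Black", "Latino", "Latino", "Black", "Black", "Latino"]

black = make_dummy(ethnicity, "Black")
latino = make_dummy(ethnicity, "Latino")

# Design matrix rows: intercept, Black dummy, Latino dummy.
both = [[1, b, l] for b, l in zip(black, latino)]
# Design matrix rows: intercept, Black dummy only.
one = [[1, b] for b in black]

rank_both = matrix_rank(both)  # 3 columns, but only 2 are independent
rank_one = matrix_rank(one)    # 2 columns, both independent
print(rank_both, rank_one)
```

With only two categories, one dummy already carries all the information: the category left out becomes the baseline, and the coefficient on the remaining dummy is the difference from that baseline. In ArcMap's Field Calculator with the Python parser, an expression along the lines of `1 if !ethnicity! == "Black" else 0` should produce that single column, assuming the field holds exactly the text "Black"; I have not tested this against your data.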