
Problem with Dummy Variables

04-07-2017 01:52 PM
ABDALLAMOHAMED
Occasional Contributor

Hello,

I have a categorical variable named "ethnicity" with the values Black and Latinos. I wanted to create a dummy variable for each value. First, I created a dummy variable for Black in a new column, giving Black the value "1" and Latinos the value "0".

Then I created another dummy variable for Latinos in a new column, giving Latinos the value "1" and Black the value "0".

So when I ran Ordinary Least Squares (Ordinary Least Squares (OLS)—Help | ArcGIS Desktop) on these two dummy variables, I got a multicollinearity error; it regarded the two variables as redundant. Is there any way I can create the two dummy variables without getting this error? (I wanted to create them in ArcMap using the Field Calculator.)
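For illustration, here is a minimal Python sketch of why the two columns are flagged as redundant. The sample values and the `matrix_rank` helper are made up for this example and are not part of ArcGIS. With only two categories, the two dummies always sum to 1, which is exactly the intercept column, so the design matrix loses a rank. Keeping only one of the dummies (the other category becomes the reference) removes the dependence.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a small matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Look for a non-zero pivot in this column at or below the current rank row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate the entries below the pivot.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical sample of the "ethnicity" field (values invented for illustration).
ethnicity = ["Black", "Latinos", "Black", "Latinos", "Latinos"]

black = [1 if e == "Black" else 0 for e in ethnicity]
latino = [1 if e == "Latinos" else 0 for e in ethnicity]
intercept = [1] * len(ethnicity)

# Design matrix with intercept and BOTH dummies: 3 columns.
both = [list(row) for row in zip(intercept, black, latino)]
# Design matrix with intercept and ONE dummy: 2 columns.
one = [list(row) for row in zip(intercept, black)]

rank_both = matrix_rank(both)
rank_one = matrix_rank(one)

print("columns:", 3, "rank:", rank_both)  # rank is lower than the column count
print("columns:", 2, "rank:", rank_one)   # rank equals the column count
```

In the Field Calculator with the Python parser, the single dummy could presumably be computed with an expression along the lines of `1 if !ethnicity! == "Black" else 0`; treat that expression as an assumption to check against your own field name and values.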

Much appreciated.

2 Replies
DanPatterson_Retired
MVP Emeritus

When using categorical data (specifically nominal data with a binary representation), you don't use correlation or regression. They are not the correct statistical techniques, and even if you managed to get a 'number', it would have no meaning.

You should really be looking at non-parametric statistics, perhaps a simple Chi-square test or some test of association but definitely not a parametric correlation test.
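As a rough sketch of what such a test of association looks like, the Python below computes Pearson's chi-square statistic for a 2x2 contingency table. The counts and the outcome categories are invented for illustration; only the arithmetic is the point. For a 2x2 table there is 1 degree of freedom, so the p-value can be obtained from the complementary error function without any extra library.

```python
import math

# Hypothetical 2x2 table of counts (invented for illustration):
# rows = ethnicity (Black, Latinos), columns = some binary outcome (yes, no).
observed = [
    [30, 20],
    [15, 35],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
# where expected = row total * column total / grand total.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (count - expected) ** 2 / expected

# With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print("chi-square:", round(chi2, 4))
print("p-value:", round(p_value, 4))
```

No continuity correction is applied here, so results from statistical packages that apply one by default may differ slightly.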

Correlation and regression in your usage would require interval/ratio data, and the tests themselves have a whole set of underlying assumptions about the distribution of the data with respect to normality, etc.

ABDALLAMOHAMED
Occasional Contributor

Dan,

Thanks a lot for your response. I read the following statement about OLS in What they don't tell you about regression analysis—Help | ArcGIS Desktop:

"Do you see regional clusters, or can you recognize trends in your data? If so, creating a dummy variable to capture these regional differences may be effective. The classic example for a dummy variable is one that distinguishes urban and rural features. By assigning all rural features a value of 1 and all other features a value of 0, you may be able to capture spatial relationships in the landscape that could be important to your model."

So when I read this statement, I thought I could create a dummy variable for OLS.
