Hello Esri community,

I am a doctoral student at DU, working on a research project that uses data on Colorado's 178 school districts. Guided by theory, I have compiled a list of variables and plan to run OLS to analyze how these variables interact with those of neighboring school districts and how they predict school district performance scores.

Unfortunately, I found that some of these variables are moderately to substantially skewed (not normally distributed) and therefore need to be transformed. I am struggling because these variables differ in degree of skewness (moderate to substantial), are either positively or negatively skewed, include some zero values, and come in different formats: percentages, ratios, dollar amounts, counts, and sum totals. Given this variability, I am uncertain which log transformation would be most appropriate for each of these data types.
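As a rough illustration of how the choice can depend on data type, zeros, and skew direction, here is a minimal Python sketch. It is not a standard recipe: the |skewness| < 0.5 "leave it alone" threshold, the logit clipping constant, the assumption that percentages are on a 0-100 scale, and all function names (for example the hypothetical `suggest_transform`) are assumptions for illustration. It uses log(x + 1) where zeros occur, a clipped logit for percentages, reflection for negative skew, and asinh when negative values are present.

```python
import math
import statistics


def sample_skewness(values):
    """Adjusted Fisher-Pearson sample skewness (G1)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant variable: no skew to correct
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def log1p_transform(values):
    """log(x + 1): defined at zero, so it suits non-negative counts and dollar amounts."""
    if any(v < 0 for v in values):
        raise ValueError("log1p transform assumes non-negative data")
    return [math.log1p(v) for v in values]


def logit_transform(percentages, eps=0.005):
    """Logit of a 0-100 percentage, clipped to (eps, 1 - eps) so 0% and 100% stay finite."""
    out = []
    for pct in percentages:
        p = min(max(pct / 100.0, eps), 1.0 - eps)
        out.append(math.log(p / (1.0 - p)))
    return out


def reflect_then_log1p(values):
    """For negative skew: reflect around the maximum so the tail points right, then log1p.
    Note: reflection reverses the sign of any regression coefficient on this variable."""
    top = max(values)
    return [math.log1p(top - v) for v in values]


def suggest_transform(values, kind):
    """Hypothetical rule of thumb mapping a variable's type and skew to a transform."""
    skew = sample_skewness(values)
    if abs(skew) < 0.5:  # assumed threshold for "roughly symmetric"
        return "none", list(values)
    if kind == "percentage":  # logit is monotonic and handles skew in either direction
        return "logit", logit_transform(values)
    if skew < 0:
        return "reflect+log1p", reflect_then_log1p(values)
    if min(values) >= 0:
        return "log1p", log1p_transform(values)
    # asinh behaves like a log for large |x| but is defined at zero and for negatives
    return "asinh", [math.asinh(v) for v in values]


if __name__ == "__main__":
    toy_counts = [0, 0, 1, 1, 2, 50]  # made-up toy values, not district data
    name, transformed = suggest_transform(toy_counts, kind="count")
    print(name, round(sample_skewness(transformed), 2))
```

Whatever transform is chosen, it is worth re-checking skewness and the OLS residuals afterwards, since normality matters most for the residuals rather than for each raw variable.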

Any guidance would be very helpful.

Thanks.

Saj

More questions than answers

- What do the descriptive statistics and/or the spatial patterns show?

- Why do you need to use OLS when there are non-parametric alternatives?

- Are you just doing univariate analysis, or are you looking at multivariate descriptors?

- If zero is a valid observation, then that will limit your choice of transformations (assuming that transformations make sense at all).

- Ratios, percentages, and the like can be problematic (e.g. spurious correlation; see "Spurious Correlation and the Fallacy of the Ratio Standard Revisited").
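To see why ratios that share a denominator can mislead, here is a small simulation in the spirit of Karl Pearson's classic spurious-correlation example (a swapped-in illustration, not something from this thread). Three variables x, y, and z are generated independently, yet x/z and y/z come out clearly positively correlated. For school districts, a shared denominator might be something like enrollment in several per-pupil variables; that mapping is an assumption. The sample size, seed, and uniform ranges are arbitrary choices.

```python
import random
import statistics


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spurious_ratio_demo(n=5000, seed=1):
    """Return (corr(x, y), corr(x/z, y/z)) for independent simulated x, y, z."""
    rng = random.Random(seed)
    # Three mutually independent positive variables (simulated, hypothetical).
    x = [rng.uniform(50, 150) for _ in range(n)]
    y = [rng.uniform(50, 150) for _ in range(n)]
    z = [rng.uniform(50, 150) for _ in range(n)]
    raw = pearson(x, y)
    # Dividing both by the same z induces correlation that x and y do not have.
    ratio = pearson([a / c for a, c in zip(x, z)], [b / c for b, c in zip(y, z)])
    return raw, ratio


if __name__ == "__main__":
    raw, ratio = spurious_ratio_demo()
    print(f"corr(x, y) = {raw:.3f}, corr(x/z, y/z) = {ratio:.3f}")
```

One common way around this is to keep the numerator as the variable of interest and enter the denominator as a separate predictor, rather than regressing one ratio on another that shares its denominator.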