How to handle data-related problems in OLS and GWR (ArcGIS)?

10-22-2017 12:07 AM
ArjumandZaidi
New Contributor

My study evaluates the association between dengue incidence and weather parameters, land use/cover, and demographic characteristics of the study area. I am using the ArcGIS regression tools OLS and GWR. The study area is divided into 150 polygons, for which average values of the explanatory variables are calculated using zonal statistics. No dengue cases are reported in 20 of these polygons. My first question: would it be okay to remove these polygons from the analysis, since the GWR model gives negative predictions when run with predicted values of the explanatory variables? Otherwise, is there any justification for negative dengue cases?
Also, the histograms of the variables show non-normal distributions. When I apply a natural logarithm transformation, the variables look closer to a normal distribution (although some are still skewed), but the OLS regression results deteriorate badly (very low R-squared value), and the GWR model fails to run. My second question: can I still use the non-transformed variables? And would it be fine to discuss all these trials in the paper, or should I simply omit the transformation exercise, since it is not giving any result (and, I think, is not a requirement for regression analysis)?
One of my reviewers suggested using Poisson regression, which is considered a better model for count variables such as dengue cases. The scope of my study was to evaluate the spatial association between dengue and other parameters using ArcGIS. Can I perform the suggested analysis in ArcGIS (I could not find it)? If not, would stating the scope as the reason for not employing a Poisson model be a reasonable response?
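
For context on the zero counts and the log transformation, here is a minimal sketch in plain Python. The counts are made up for illustration and are not the study data, and `safe_log` is a hypothetical helper. It assumes the dengue counts were among the transformed variables, which the post does not state. The natural log is undefined at zero, so polygons with no cases cannot be log-transformed directly, and a variance well above the mean is one common sign that count data are overdispersed.

```python
import math
import statistics

# Hypothetical dengue counts per polygon (made-up numbers, not the study data).
cases = [0, 0, 3, 1, 0, 7, 2, 12, 0, 5]


def safe_log(values):
    """Natural log, returning None where the log is undefined (value <= 0)."""
    return [math.log(v) if v > 0 else None for v in values]


logged = safe_log(cases)
undefined = sum(1 for v in logged if v is None)

# log1p(x) = ln(1 + x) stays defined at zero, but it changes how the
# regression coefficients have to be interpreted.
logged_plus_one = [math.log1p(v) for v in cases]

mean = statistics.mean(cases)
variance = statistics.variance(cases)  # sample variance
dispersion = variance / mean           # near 1 for Poisson-like counts

print(f"polygons with undefined log: {undefined} of {len(cases)}")
print(f"mean = {mean:.2f}, variance = {variance:.2f}, variance/mean = {dispersion:.2f}")
```

If zeros in a transformed variable turn out to be the cause, dropping those polygons or adding a constant would only hide the issue, which is part of why a count model is usually suggested instead.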

2 Replies
DanPatterson_Retired
MVP Emeritus
  • Why do you feel that regression is the best tool? Are you trying to predict? Interpolate? Or are you just looking for an association, specifically a spatial association? If it is the latter, examine some of the other tools in the Geostatistical Analyst that are better suited to that case.
  • You don't dump data, period. Perhaps it is those polygons with no occurrence that are the real locations of interest.
  • Not all data can be transformed to a normal distribution, and the transform that is used needs to make sense. If the data remain non-normal, then move on to non-parametric statistical tests that don't have the normality requirement. I would examine the distribution of the data itself. Perhaps your observations are not drawn from a single sample population. There may be inherent attribute and/or spatial distributions that make them different, so combining them rather than stratifying them would be an oversight. This goes back to my point about not dumping the polygons that have no observations... they may be a different population.
  • On your last point about not using Poisson (which may be appropriate): your reason for not using it isn't good. You shouldn't ignore other possibilities because you have chosen a path of investigation and a particular set of tools to use. These are the things that need investigation: the variables you are using, their distributions, which statistical tests are appropriate, and whether your distributions allow for their use.
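
To illustrate why a Poisson model may suit count data with zeros, here is a minimal sketch in plain Python, outside ArcGIS. The data are made up, `fit_poisson` is a hypothetical helper, and the model has a single explanatory variable and no spatial component, so it is not a substitute for GWR. It fits log(mu) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood. Because the fitted mean is exp(b0 + b1 * x), predictions cannot go negative, and polygons with zero cases stay in the data.

```python
import math


def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Fit log(mu) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve information * step = score for the 2 x 2 case.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Hypothetical explanatory variable and dengue counts (including zeros).
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
y = [0, 1, 0, 2, 3, 4, 6, 9]

b0, b1 = fit_poisson(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
for xi, yi, fi in zip(x, y, fitted):
    print(f"x = {xi:.1f}  observed = {yi}  fitted = {fi:.2f}")
```

With real data, an established statistics package would be the safer choice than a hand-rolled fit, and spatial structure in the residuals would still need to be checked separately.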

Good luck
