negative R2 in GWR

JochenAlbrecht · 06-23-2011

I have been struggling with my attempt to regress global greenhouse gas emissions on a bunch of social and environmental variables such as population density, GDP, distance to coast, elevation, heating degree days, and so on. With standard regression techniques I am getting rather low R-squared values, and with GWR I managed to raise that to about .25 using four variables (a brownie if you guess which ones). The results of my latest attempt threw me a bit though: I am now getting negative R-squared values, around -.3. Obviously something went wrong, but what? I am attaching a screenshot of the results window.
Btw, this is still exploratory. I will eventually move to spatial regression techniques but first wanted to get a feel for the spatial effects, which are far more local than I would have thought given that I am working with global data sets.
Cheers,
Jochen

AndrewTimleck · 06-28-2011

Hi Jochen,

First "hello from Baltimore" - Andrew Timleck here (was in your class at UMD with Maurice C.)...

Not sure if I can help at all, been messing with this stuff too....

In Fischer & Getis' Handbook of Applied Spatial Analysis (2010), Wheeler & Paez have a chapter on what works and what doesn't in GWR (see esp. pp 486-469). In it they note that the choice of bandwidth and number of neighbors can be highly problematic: with too many neighbors or too far a reach, you of course get no spatial variability; with too few neighbors, too close together, you end up with spatial autocorrelation and wild local swings in the regression coefficients, which sounds like what you're describing.
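To make the bandwidth point concrete, here is a rough sketch in plain NumPy, not the ArcGIS GWR tool. Everything in it is an assumption for illustration: the synthetic data, the fixed Gaussian kernel, the two bandwidth values, and the helper name `gwr_coefficients`. The true slope is the same everywhere, so any spatial variation in the local estimates is produced by the method and the noise, not by the data-generating process.

```python
import numpy as np

def gwr_coefficients(coords, x, y, bandwidth):
    """Fit one Gaussian-kernel weighted least-squares regression per location."""
    design = np.column_stack([np.ones(len(y)), x])  # intercept + one predictor
    betas = np.empty((len(y), design.shape[1]))
    for i in range(len(y)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        # Square root of the Gaussian weights exp(-0.5 * (d / h)^2)
        sqrt_w = np.exp(-0.25 * (dist / bandwidth) ** 2)
        betas[i] = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)[0]
    return betas

# Synthetic data (assumed): constant true slope of 2 across the whole study area
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Spread of the local slope estimates under a small and a very large bandwidth
spread_small = gwr_coefficients(coords, x, y, bandwidth=0.7)[:, 1].std()
spread_large = gwr_coefficients(coords, x, y, bandwidth=50.0)[:, 1].std()
print(f"slope std, bandwidth 0.7: {spread_small:.3f}")
print(f"slope std, bandwidth 50:  {spread_large:.3f}")
```

With the large bandwidth every local fit sees nearly the whole data set, so the slopes barely move; with the small one each fit rests on a handful of points and the slopes swing around even though nothing is really varying.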

Cheers,
Andrew (Andy)

Clearly, if the bandwidth is such as to include a large number of observations, there will be relatively little or no spatial variation in the coefficients, and if the bandwidth is small, there will potentially be large amounts of variation. A natural concern emerges that some variation or smoothness in the pattern of estimated coefficients may be artificially introduced by the technique and may not represent true regression effects. This situation is at the heart of the discussion about the utility of GWR for inference on regression coefficients and is not answered by existing statistical (Leung et al. 2000a) or Monte Carlo (Fotheringham et al. 2002) tests for significant variation of GWR coefficients because these tests do not consider the source of the variation. This is important because one source of regression coefficient variability in GWR can come from collinearity, or dependence in the kernel-weighted design matrix. Collinearity is known in linear models to inflate the variances of regression coefficients (Neter et al. 1996), and GWR is no exception (Griffith 2008). Collinearity has been found in empirical work to be an issue in GWR models at the local level when it is not present in the global linear regression model using the same data (Wheeler 2007). In addition to large variation of estimated regression coefficients, there can be strong dependence in GWR coefficients for different regression terms, including the intercept, at least partly attributable to collinearity. Wheeler and Tiefelsdorf (2005) show in a simulation study that while GWR coefficients can be correlated when there is no explanatory variable correlation, the coefficient correlation increases systematically with increasingly more collinearity.
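One way to look for the local collinearity described in that passage is to compare the condition number of the global design matrix with the condition numbers of the kernel-weighted design matrices at each location. The sketch below is only an illustration under assumptions of my own: the synthetic data (two predictors that are nearly uncorrelated globally but strongly related within each half of the study area), the Gaussian kernel, the bandwidth, and the helper name `scaled_condition_number` are all made up for the example, and the condition number is a generic collinearity check, not necessarily the diagnostic used in the cited papers.

```python
import numpy as np

def scaled_condition_number(design, weights):
    """Condition number of the weighted design matrix, columns scaled to unit length."""
    weighted = design * np.sqrt(weights)[:, None]
    weighted = weighted / np.linalg.norm(weighted, axis=0)
    return np.linalg.cond(weighted)

# Synthetic data (assumed): x2 tracks x1 in the west and -x1 in the east,
# so the two predictors are close to uncorrelated over the whole area
rng = np.random.default_rng(1)
n = 300
coords = rng.uniform(0, 10, size=(n, 2))
x1 = rng.normal(size=n)
sign = np.where(coords[:, 0] < 5, 1.0, -1.0)
x2 = sign * x1 + 0.1 * rng.normal(size=n)
design = np.column_stack([np.ones(n), x1, x2])

# Global model: every observation gets weight 1
global_cond = scaled_condition_number(design, np.ones(n))

# Local models: Gaussian kernel weights centred on each observation
bandwidth = 1.0
local_cond = np.array([
    scaled_condition_number(
        design,
        np.exp(-0.5 * (np.linalg.norm(coords - c, axis=1) / bandwidth) ** 2),
    )
    for c in coords
])

print(f"global condition number:       {global_cond:.2f}")
print(f"median local condition number: {np.median(local_cond):.2f}")
```

Mapping `local_cond` next to the local coefficients would be a natural next step for seeing whether the wildest coefficient swings sit where the weighted design matrix is worst conditioned.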