GWR: accident points and lit/unlit motorway segments

ZAHIAKIKI · ‎05-31-2011

Any thoughts on how to accurately apply GWR to point/segment data? I am looking into the correlation between road accident severity (points) and lit/unlit segments of a motorway.

Most of my reading showed GWR applied to areas, not segments.

Initial considerations:
- dependent variable: centroid of each segment, with total crash data weighted by severity.
- explanatory variables: accident data variables (lit/unlit, weather, lanes, speed, gender...)

Could use any help and advice!
Much appreciated.

Zak

LaurenRosenshein · ‎06-14-2011

Hi Zahi,

This is a great question! There is absolutely no reason that you can't use GWR to analyze road segments. As long as each of the centroids that you're analyzing has all of the necessary dependent and explanatory variables, you can use OLS and GWR to analyze them. With all of the tools in the Spatial Statistics toolbox, points, lines and polygons can be used as the unit of analysis. For lines and polygons, most of the time the centroid of the feature is used for analysis (for instance, if a fixed distance band is used to determine spatial relationships, that distance is measured from the centroid of a line or polygon, not the edge). The major exception is when polygon contiguity is used.

That being said, it's important to remember that you always want to start out your regression analysis by finding a properly specified model using OLS. Once you find a properly specified model using OLS, you can feel confident in the results that you find using GWR. What does it mean to be properly specified? Well, there are several assumptions of OLS that need to be met, and they are all explained in this ArcUser article called Finding a Meaningful Model. There is also a Regression Analysis Tutorial, and a free training seminar called Regression Analysis Basics. There are a bunch of other resources about spatial statistics available here: http://esriurl.com/spatialstats.

Hope this helps.

Lauren Rosenshein
Geoprocessing Product Engineer

ZAHIAKIKI · ‎06-20-2011

I'm not sure if there is a missing step in my process; I hope someone can spot it!
"GWR" + "Moran I" are created in a model and all outputs are generated except the numerical output (shown in the training presentation). Although the generated tables present most of the parameters needed for interpretation, some are missing, notably when looking for the "*" to confirm significance, or for the VIF.

Another query: dummy variables are acceptable in OLS but not in GWR. What if my main variable of interest is a dummy variable (lit/unlit)? It doesn't make sense to just exclude it from the analysis in GWR!

Hope that was clear enough to get a reply!
Thanks!

JeffreyEvans · ‎06-20-2011

Sorry, but I must respectfully disagree that this is a valid approach. It is not statistically justifiable to treat a line segment centroid as a spatial process. It is difficult enough to deal with linear dependencies in sampled point data (i.e., streams), but generalizing a linear dependency to a point centroid is just not supported. Perhaps if all of your segments were exactly the same length there are some statistical approaches that could be applied, but not GWR. GWR assumes a sample from a uniform random field and is not designed to account for the types of spatial structures inherent in linearly dependent data. How could a spatial weights matrix represent the data in a coherent way? The matrix would be derived from the spatial relationships of centroids, whereas the actual spatial process is being dictated by a distance-based, linearly ordered relationship. You could just run an OLS in a mixed model form. The random effect could be segment order or length of each line segment. This is a very good problem for a graph theoretical approach. There is a commercial ArcGIS 9.3 package (SANET: http://sanet.csis.u-tokyo.ac.jp/) designed for spatial statistics on linear networks (specifically K and Cross-K functions, KNN, KDE and Voronoi analysis).

LaurenScott · ‎06-21-2011

Hi Zahi,
If you are using ArcGIS 10.0, you should be able to see the numerical output by:
1) Disabling background processing (click the Geoprocessing Menu, then Geoprocessing Options... UNcheck "Enable" for background processing). OR
2) Open the Results window (Geoprocessing Menu, then Results). You will see an entry for your model... open that, right click on Messages and select View.

The "*" to determine statistical significance isn't written to the coefficient or diagnostic tables (as you noticed), but you can easily interpret p-value significance as follows:
* P < 0.10 means statistically significant at the 90% confidence level (less conservative)
* P < 0.05 means statistically significant at the 95% confidence level
* P < 0.01 means statistically significant at the 99% confidence level (more conservative).
To learn more about interpreting z-scores and p-values, please see: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/What_is_a_z_score_What_is_a_p_value/00...
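
As a rough illustration only, the p-value thresholds above can be written as a small helper in plain Python. The function name `confidence_label` is made up for this sketch; it is not part of ArcGIS or any toolbox output.

```python
def confidence_label(p_value):
    """Map a p-value to the confidence level it clears (hypothetical helper)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    if p_value < 0.01:
        return "significant at 99%"   # most conservative threshold
    if p_value < 0.05:
        return "significant at 95%"
    if p_value < 0.10:
        return "significant at 90%"   # least conservative threshold
    return "not significant"


# Made-up p-values, as you might read them from a coefficient table.
for p in (0.003, 0.04, 0.08, 0.35):
    print(p, "->", confidence_label(p))
```

You would apply the same reading to each p-value column in the coefficient table yourself, since the asterisk is not written there.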

The VIF values don't reflect results that are either significant or not significant... rather the rule of thumb is that if a VIF value is larger than about 7.5 there are issues with variable redundancy (multicollinearity) that could potentially lead to model instability.
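
To give a feel for what a VIF measures, here is a minimal sketch in plain Python. It assumes a model with exactly two explanatory variables, in which case each variable's VIF reduces to 1 / (1 - r^2), where r is their Pearson correlation. The sample values and function names are invented for illustration, and this is not how the OLS tool computes VIF internally.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_variables(x1, x2):
    """VIF shared by both variables in a two-explanatory-variable model."""
    r_squared = pearson_r(x1, x2) ** 2
    if r_squared >= 1.0:
        return math.inf  # perfectly redundant variables
    return 1.0 / (1.0 - r_squared)


VIF_RULE_OF_THUMB = 7.5  # threshold quoted above

# Made-up segment attributes that rise together (redundant).
speed_limit = [50, 60, 70, 80, 90, 100]
lanes = [2, 2, 3, 3, 4, 4]

vif = vif_two_variables(speed_limit, lanes)
print("VIF:", round(vif, 2), "redundant" if vif > VIF_RULE_OF_THUMB else "ok")
```

With more than two explanatory variables, each VIF comes from regressing that variable on all the others, so the shortcut above no longer applies.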

For more information about interpreting OLS diagnostics, please see:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Interpreting_OLS_results/005p000000300...

With regard to removing dummy variables when you move from OLS to GWR: this strong recommendation applies to spatial regime dummy variables where you have a bunch of 1's spatially clustered and/or a bunch of 0's spatially clustered. The reason this is a problem for GWR is what's called "local multicollinearity". GWR creates a separate equation for each feature and calibrates it (i.e., computes the coefficients) using nearby features (rather than using ALL features). When the values for a variable cluster spatially (e.g., all 1's) there is the potential that the nearby features used for calibrating an equation will have all the same values and this would result in perfect multicollinearity with the Intercept ...and GWR cannot solve (calibrate) in that situation. Even if there isn't perfect local multicollinearity, when there is very little variation in a variable's values, results can be unstable. You can tell if you are having this problem by looking at your output feature class from GWR. When the condition number for a feature is larger than about 30, there are issues with local multicollinearity and you have less confidence in the results associated with those features. I hope this is clear... if not, please let me know and I can try again 🙂
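
Here is a small sketch of why a clustered dummy variable causes trouble, in plain Python. It assumes a local model with just an intercept and one 0/1 variable, scales the two weighted columns to unit length, and reports the resulting condition number. The function name, weights and neighborhoods are made up, and the GWR tool's own computation may differ in detail.

```python
import math


def local_condition_number(weights, dummy):
    """Condition number of the weighted [intercept, dummy] design matrix.

    Columns are scaled to unit length, so the scaled cross-product matrix is
    [[1, c], [c, 1]] with eigenvalues 1 + |c| and 1 - |c|.
    """
    sum_w = sum(weights)
    sum_wd = sum(w * d for w, d in zip(weights, dummy))
    sum_wd2 = sum(w * d * d for w, d in zip(weights, dummy))
    if sum_w <= 0 or sum_wd2 <= 0:
        return math.inf  # dummy is all 0's nearby: coefficient not estimable
    c = sum_wd / math.sqrt(sum_w * sum_wd2)  # cosine between the two columns
    if 1.0 - abs(c) <= 1e-12:
        return math.inf  # dummy is all 1's nearby: identical to the intercept
    return math.sqrt((1.0 + abs(c)) / (1.0 - abs(c)))


CONDITION_RULE_OF_THUMB = 30  # threshold quoted above

weights = [1.0, 0.8, 0.5, 0.2]        # made-up kernel weights of 4 neighbors
mixed = [1, 0, 1, 0]                  # lit and unlit segments nearby
all_lit = [1, 1, 1, 1]                # only lit segments nearby
nearly_all_lit = [1] * 399 + [0]      # 400 equally weighted neighbors, 1 unlit

print(local_condition_number(weights, mixed))
print(local_condition_number(weights, all_lit))
print(local_condition_number([1.0] * 400, nearly_all_lit))
```

The all-lit neighborhood cannot be calibrated at all, and the nearly-all-lit one lands above the rule-of-thumb threshold even though it is technically solvable.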

You also asked about the validity of using GWR for linear data. This is fine as long as you recognize that network relationships are not used to define which features are "nearby" or how much weighting a nearby feature has with regard to calibration. Remember, calibration of the equation associated with a particular feature is based on features that are "nearby"... the idea is that nearby features better reflect relationships between your dependent variable and the explanatory variables than features that are far away. "Nearby" is a function of your answers for the Kernel Type and Bandwidth Method parameters and we can talk more about that if you want, but the point is that distances to determine which features are nearby are computed using plain ol' Euclidean straight-line, as-the-crow-flies distance. With a road network, we might expect two points that are on the same street to be more alike (to deserve a larger weight/influence) than two points that are the same distance apart but on parallel streets... these types of network relationships won't be considered. So if you use GWR, you should ask yourself if including network relationships in the calibrations for each feature equation is important to the question you want to answer.
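
To make the parallel-streets point concrete, here is a minimal sketch in plain Python. It assumes a Gaussian kernel with a fixed bandwidth purely for illustration; the coordinates, the bandwidth and especially the network distance are invented numbers, not output from any tool.

```python
import math


def gaussian_weight(distance, bandwidth):
    """Gaussian kernel weight: 1 at distance 0, decaying with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def euclidean(p, q):
    """Straight-line (as-the-crow-flies) distance between two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


bandwidth = 500.0                 # made-up fixed bandwidth, in meters

target = (0.0, 0.0)               # feature whose equation is being calibrated
same_street = (300.0, 0.0)        # neighbor further along the same street
parallel_street = (0.0, 300.0)    # neighbor on a parallel street

# Euclidean distance treats both neighbors identically.
w_same = gaussian_weight(euclidean(target, same_street), bandwidth)
w_parallel = gaussian_weight(euclidean(target, parallel_street), bandwidth)

# Hypothetical driving distance to the parallel-street neighbor, assuming
# you have to go around via the nearest junction.
network_distance_parallel = 1500.0
w_parallel_network = gaussian_weight(network_distance_parallel, bandwidth)

print("Euclidean weights:", w_same, w_parallel)
print("Network-based weight for parallel street:", w_parallel_network)
```

Under Euclidean distance the two neighbors get the same weight; a network-aware weighting would give the parallel-street neighbor much less influence.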

For other tools in the Spatial Statistics toolbox you can create spatial relationships based on a road network (Generate Network Spatial Weights tool), but unfortunately our GWR tool currently cannot take advantage of this option.

I hope this answers your questions!
Best wishes,
Lauren

Lauren M. Scott, PhD
Esri
Geoprocessing, Spatial Statistics