How can I aggregate incident points for OLS and GWR while preserving the detail of my explanatory variable dataset?

NC · ‎02-13-2020

Hi everyone,

I want to run an ordinary least squares (OLS) regression in ArcGIS Pro on a dataset made up of about 400 fire occurrences (point data) and demographic variables (100x100 m polygons of income, education, etc.). I plan to use the fire occurrences as my dependent variable and the demographics as my explanatory variables.

The problem I have is with aggregating my fire incident point data. Essentially, I don't want to aggregate my incidents into larger polygons (a large fishnet, census districts, postal districts, etc.) and lose the detail of my 100x100 m demographic variables. However, the dependent variable input for OLS requires variation in the weighted value.

I have tried the Integrate/Collect Events method of aggregation suggested in the help documentation (and ESRI seminar videos). However, because I have very few coincident points, I end up with a low ICOUNT (a maximum of 3 events aggregated).

Does anyone know of a method that lets me both aggregate my points properly and preserve the high detail of my 100x100 m demographics for my regression?

Thanks for your help,
Naomi

MervynLotter · ‎02-13-2020

You may want to try a slightly different approach: a logistic model rather than OLS. Although I have not tried it, you should be able to run the Generalized Linear Regression (GLR) tool on binary variables; see the Generalized Linear Regression (GLR) page in the ArcGIS Pro documentation. When you run the tool, you can choose between continuous, count, or binary variables.
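For anyone unsure what a binary (logistic) model does with a 0/1 dependent variable, here is a minimal sketch in plain Python. It is not the GLR tool's implementation; it fits an intercept and one coefficient by simple gradient descent. The function names, the learning rate, and the data values are all made up for illustration.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, x):
    """Predicted probability of presence (1) for one row of explanatory values."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit intercept + coefficients by batch gradient descent on the mean log-loss."""
    n, k = len(rows), len(rows[0])
    weights = [0.0] * (k + 1)  # weights[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights

# Hypothetical data: one standardised explanatory variable per grid cell
# (made-up values) and a 0/1 fire label for each cell.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.0], [0.5],
     [1.0], [1.5], [2.0], [-0.2], [0.2]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

weights = fit_logistic(X, y)
print("intercept, coefficient:", weights)
print("P(fire) at x = 2.0:", predict_proba(weights, [2.0]))
```

The output of such a model is a probability per cell rather than a count, so the dependent variable only needs to be 0 or 1 and no aggregation into larger polygons is required.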

NC · ‎02-14-2020

Thanks for your response! I'll give this a try and report back if it works for me.

Naomi

ClaudiaCaceres4 · ‎10-24-2020

Hi, would you please let me know how this worked for you? I am doing a similar analysis.

Thank you.

NaomiCrump · ‎12-28-2020

Hi again,

As an update on this post, I did end up using logistic regression instead of OLS. This allowed me to simply mark which 100x100 m grid cells had a fire present (1) and which did not (0). The ratio of fire presence to absence was quite low (500 fires to roughly 15,000 non-fires), so I used undersampling: I reduced the data to 500 fires and 500 non-fires (randomly selected from all the non-fires).
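In case it helps, the labelling and undersampling steps can be sketched roughly like this in plain Python. This is not my actual R code; it assumes projected coordinates in metres and a grid aligned to the origin, and the helper names and toy data are hypothetical.

```python
import math
import random

CELL_SIZE = 100.0  # metres, matching the 100x100 m grid

def cell_of(x, y, cell_size=CELL_SIZE):
    """Return the (column, row) index of the grid cell containing a point."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))

def label_cells(cells, fire_points):
    """Map each grid cell to 1 if any fire point falls inside it, else 0."""
    burned = {cell_of(x, y) for x, y in fire_points}
    return {cell: int(cell in burned) for cell in cells}

def undersample(labels, seed=0):
    """Keep every presence cell plus an equal-sized random sample of absence cells."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    presence = sorted(c for c, v in labels.items() if v == 1)
    absence = sorted(c for c, v in labels.items() if v == 0)
    kept_absence = rng.sample(absence, min(len(presence), len(absence)))
    return presence + kept_absence

# Toy data (made up): a 20 x 20 grid and three fires, two of which share a cell.
cells = [(col, row) for col in range(20) for row in range(20)]
fires = [(150.0, 250.0), (160.0, 290.0), (1230.0, 40.0)]

labels = label_cells(cells, fires)
sample = undersample(labels)
print(sum(labels.values()), "presence cells;", len(sample), "cells after undersampling")
```

Note that two fires in the same cell still give a single presence cell, which is why coincident points stop being a problem with this setup.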

The Generalized Linear Regression tool in ArcGIS Pro does have a binary logistic regression option, but I ended up using R so that I could have more freedom in my analysis. All in all, this method worked well for me, yielding around 75% accuracy (as well as other reasonable evaluation metrics such as AUC, kappa, F1, etc.).
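For reference, the evaluation metrics mentioned above can be computed by hand as in the sketch below. It is plain Python rather than the R I used, the 0.5 threshold is an assumption, and the labels and scores are made-up toy values that have nothing to do with my actual results.

```python
def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp, tn, fp, fn):
    return 2 * tp / (2 * tp + fp + fn)

def cohen_kappa(tp, tn, fp, fn):
    """Agreement beyond what the class frequencies alone would produce."""
    n = tp + tn + fp + fn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (observed - expected) / (1 - expected)

def auc(y_true, scores):
    """Share of (presence, absence) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy held-out labels and predicted probabilities (made up for illustration).
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.65, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

cm = confusion(y_true, y_pred)
print("accuracy:", accuracy(*cm))
print("F1:", f1_score(*cm))
print("kappa:", cohen_kappa(*cm))
print("AUC:", auc(y_true, scores))
```

With balanced (undersampled) data, accuracy and kappa are easier to interpret, since the chance agreement is close to 50%.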

It helped me a lot to look into peer-reviewed articles with similar analyses, like the ones below (sorry for the inconsistent citation styles), as well as the ESRI spatial stats talks. The book 'Applied Logistic Regression' by Hosmer and Lemeshow was also very useful. I hope this helps. /Naomi

Z.X. Zhang, H.Y. Zhang, D.W. Zhou, "Using GIS spatial analysis and logistic regression to predict the probabilities of human-caused grassland fires," Journal of Arid Environments.

H. Zhang, X. Han and S. Dai, "Fire Occurrence Probability Mapping of Northeast China With Binary Logistic Regression Model," in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 1, pp. 121-127, Feb. 2013, doi: 10.1109/JSTARS.2012.2236680.

https://www.esri.com/arcgis-blog/products/product/analytics/spatial-statistics-resources/