POST
Almost exactly right! Humor me for a minute; I'm going to backtrack just a bit. Hot Spot Analysis and Emerging Hot Spot Analysis both run the Gi* statistic under the hood. Conceptually, the Gi* statistic works by computing the mean value for a feature and its neighbors, and comparing that local mean to the global mean (the mean value for ALL features in the dataset). Then, taking the number of features and the variance into account, it decides whether the local mean is different enough from the global mean to be a statistically significant hot or cold spot. For Emerging Hot Spot Analysis, as you know, neighbors can include temporal AND spatial neighbors.

Let's consider an easy case: you define neighbors to be the 8 closest spatial neighbors and 1 time step. The local neighborhood would be 9 features from the same time step (the feature itself plus the 8 nearest neighboring features) and 9 features from the preceding time step, for a total of 18 features. If you choose 2 for the Neighborhood Time Step parameter, there will be a total of 27 features in the neighborhood for each feature in the dataset. Note: the Gi* analysis is performed for every feature at every time step.

Now on to your question. If you choose Entire Cube to decide whether a feature and its neighbors at a particular location in the cube are a statistically significant hot or cold spot, the tool will compare the local mean to the mean for all features in the cube. You did understand that one well! 🙂

If you choose Neighborhood Time Step (let's assume you typed 2 for the Neighborhood Time Step parameter), the tool will compare a feature and its neighbors (27 features total) to the mean for all features in the same time step along with all features in the preceding 2 time steps. This is a good option if you have strong trends in the data (like with Covid: fewer cases at the beginning, more at the end of the time period you're analyzing). So that's the one you didn't quite get right. The number of time steps used to compute the Neighborhood ("global") mean will match whatever you typed for the Neighborhood Time Step parameter (it wouldn't necessarily be TWO).

Last is Individual Time Step. This option is like taking a snapshot at each time step to compute the individual ("global") mean. The mean for each feature and its neighbors (27 features, assuming you typed 2 for the Neighborhood Time Step parameter) will be compared to the mean for all features in the same time step. Only one time step is used to compute the "global" mean, and it's based on all features in the current time step (the time step matching the feature being analyzed). This is often a good option if you are making decisions about how to react today (assuming the final time step is today) to address emerging trends.

I hope this is clearer. If not, please ask again. Best wishes, Lauren
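The neighborhood arithmetic above is easy to sketch in a few lines of Python (a conceptual illustration only, not tool code; the function name is mine):

```python
# Conceptual illustration of the space-time neighborhood size described above:
# k spatial neighbors plus the feature itself, repeated for the current time
# step and each of the preceding `time_steps` time steps.
def neighborhood_size(k_spatial_neighbors, time_steps):
    per_step = k_spatial_neighbors + 1   # the feature itself plus its spatial neighbors
    return per_step * (time_steps + 1)   # current step plus preceding time steps

print(neighborhood_size(8, 1))  # 18 features: 9 now + 9 in the previous step
print(neighborhood_size(8, 2))  # 27 features across 3 time steps
```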
Posted 11-21-2023 10:47 AM
POST
What version of the software are you using? (It shouldn't matter, but I can try to reproduce what you're seeing.) This is what I'm seeing: In the Geoprocessing pane, I select Toolboxes. In the Spatial Statistics toolbox, under Mapping Clusters, I right-click the Cluster and Outlier Analysis tool, select Edit, and up pops the script. Another way to find it is to navigate to your ArcGIS folder (wherever you installed the software, likely something like Program Files\ArcGIS), then navigate to \Pro\Resources\ArcToolBox\Scripts and view the LocalMorans.py Python script file. For example, you should be able to right-click that file and choose to open it with Notepad or WordPad. I hope this works for you! Best wishes, Lauren
Posted 01-26-2023 12:34 PM
POST
Sure. Please contact me at LGriffin@esri.com and we can chat about it. If we figure out anything that may be helpful to others, I’ll ask you to please post what we learn to the community 😊 Lauren
Posted 10-18-2022 02:20 PM
POST
This is a great question! Thanks for asking it. Sometimes I think these tools should be called hot regions, rather than hot spots. Conceptually, they work by visiting each bin in the cube and computing the mean value for that bin and all its space-time neighbors. The tool then compares that local mean to the global mean (the mean for all bins in the cube, since you used Entire Cube for the Global Window parameter). So even if there aren't any points in a New Hot Spot bin, if the local mean for that bin and its space-time neighbors is hot, you can still have a new hot spot.

Let me try to put it into the context of your analysis. It looks like you used Create Space Time Cube By Aggregating Points. You said you wanted each fishnet bin to be about 1/4 square mile by 4 weeks. Great. Then you ran Emerging Hot Spot Analysis and defined the spatial-temporal neighborhood to extend 1/2 mile around each bin and one time step previous (so the current time step plus the previous time step, encompassing 8 weeks).

Several bins in your graphic are sporadic hot spots, so sometimes those bins and their space-time neighbors were hot spots and sometimes they weren't, but they WERE hot for the last (most recent) time step. One of the bins is a consecutive hot spot, so for the last time step and at least one other immediately previous time step (fewer than 90% of all time steps, though), it was a statistically significant hot spot (the local mean was significantly higher than the global mean). For the new hot spot bins, only the last (most recent) time step is hot. Even though there aren't any points in those bins (according to the graphic), the local neighborhoods have points that apparently make those bins hot. Maybe look at the 3D version of the cube columns. I don't know what the points represent, but the pattern might suggest a spreading process to the east?

Also, I'm not sure if your image is showing ALL points in one part of your study area or just points for the last 4 weeks. If it is ALL points, you will clearly have a LOT of zero bins in your cube. In that case, the global mean is close to zero, and it doesn't take many points in a neighborhood to create a hot spot (since finding a point is soooo very rare). My own feeling is that a ton of zeros (the vast majority of bins) makes the analysis unstable. Perhaps you can increase the bin size so you have fewer zero bins? I'm not sure what your points represent, but I might have other ideas if indeed you are dealing with a cube that is almost entirely zeros. I hope this helps! Thanks again for your question, Stephen! Lauren Griffin, Esri
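To see why a mostly-zero cube behaves this way, here is a tiny numeric sketch (the numbers are made up purely for illustration):

```python
# Illustrative only: a cube that is almost entirely zeros has a global mean
# near zero, so even a handful of points in one local neighborhood stands out.
cube = [0] * 990 + [1, 2, 1, 0, 3, 1, 0, 2, 1, 1]  # 1000 bins, mostly empty
global_mean = sum(cube) / len(cube)

local_neighborhood = [1, 2, 1, 0, 3, 1, 0, 2, 1]   # 9 bins around one feature
local_mean = sum(local_neighborhood) / len(local_neighborhood)

print(global_mean)  # 0.012 -- very close to zero
print(local_mean)   # about 1.22 -- roughly 100x the global mean
```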
Posted 10-18-2022 12:04 PM
POST
Hi Christine, I don't think the problem is the date format. I think the problem is that you have incomplete data. Check to make sure you have a data value for every month, for every well. If you are missing just a few data entries here and there, the Fill Missing Values tool may be able to estimate the missing data. I'm pretty sure (not positive) you would still need, at minimum, a data entry for the first and last month for every well. I hope this helps! Lauren
Posted 08-10-2022 12:13 PM
POST
If you have leading zeros, your ID field is of type Text, and, sorry, but you won't be able to keep the leading zeros when you convert the field to Integer. If the ID field isn't too long (and all the values are numbers), you should be able to run Calculate Field to create a new field of type Long; just use the current ID field as the expression (Calculate Field will do the conversion from text to integer for you). Best wishes! Lauren. BTW, just in case it's helpful, check out this Learn lesson
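The conversion itself is just a text-to-integer cast, which is exactly where the leading zeros disappear. A plain-Python illustration of the behavior (the sample IDs are hypothetical):

```python
# Why leading zeros can't survive a conversion to an integer type:
text_ids = ["00042", "00107", "12345"]
long_ids = [int(t) for t in text_ids]  # the cast an integer field conversion performs
print(long_ids)  # [42, 107, 12345] -- the zeros are gone

# Converting back to text does NOT restore them unless you re-pad explicitly:
restored = [f"{n:05d}" for n in long_ids]
print(restored)  # ['00042', '00107', '12345']
```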
Posted 08-01-2022 08:41 AM
POST
Only because this has happened to me, make sure your Location ID fields are numeric (rather than text fields). If that's not the issue and you can send me your data, please contact me at LGriffin@esri.com and I'm happy to explore further. Best wishes, Lauren Griffin, Esri
Posted 07-31-2022 01:10 PM
POST
Hi again, The EMERGING_{ANALYSIS_VARIABLE}_HS_BIN variable takes one of 7 values for each bin in the Space Time Cube and can be visualized in 3D:

3: Statistically significant clustering of high/large values at the 0.01 level
2: Statistically significant clustering of high/large values at the 0.05 level
1: Statistically significant clustering of high/large values at the 0.10 level
0: No statistically significant clustering
-1: Statistically significant clustering of low/small values at the 0.10 level
-2: Statistically significant clustering of low/small values at the 0.05 level
-3: Statistically significant clustering of low/small values at the 0.01 level

That variable does not include the Mann-Kendall trend information. The 2D output layer from Emerging Hot Spot Analysis (EHSA) shows the patterns and encapsulates both the hot spot and trend information. To me, it is difficult to draw conclusions from 3D maps (unless your dataset is very small). The 2D maps (like the default output from the EHSA tool) seem more useful, but it very much depends on the questions you're hoping to answer with your analysis. If you haven't seen it already, consider watching this video (at about 5:41 I give some examples of interpreting the EHSA results). The entire learning path might be interesting to you, though (my colleague Kevin provides some great examples of Time Series Clustering and Forecasting). Best wishes! Lauren Griffin, Esri
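The seven bin codes map cleanly to a significance level and a direction; here is a small lookup sketch (my own helper for reading results, not part of the tool):

```python
# Mapping from the EMERGING_{ANALYSIS_VARIABLE}_HS_BIN codes to their meaning.
HS_BIN_LABELS = {
     3: "hot spot, significant at the 0.01 level",
     2: "hot spot, significant at the 0.05 level",
     1: "hot spot, significant at the 0.10 level",
     0: "not statistically significant",
    -1: "cold spot, significant at the 0.10 level",
    -2: "cold spot, significant at the 0.05 level",
    -3: "cold spot, significant at the 0.01 level",
}

def describe_bin(code):
    return HS_BIN_LABELS[code]

print(describe_bin(3))   # hot spot, significant at the 0.01 level
print(describe_bin(-2))  # cold spot, significant at the 0.05 level
```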
Posted 07-18-2022 12:38 PM
POST
Your interpretation of the Gi* z-scores is exactly right! Technically, though, you could have a z-score of 5.0 for time period N-2, 4.5 for time period N-1, and 4.0 for time period N (where N is the total number of time steps in the cube). All of those z-scores would be statistically significant at the 0.01 level (ignoring the FDR correction for simplicity), but the pattern would not be an intensifying hot spot because the z-scores are getting smaller. Even if the z-scores are increasing, Mann-Kendall would be able to sort out whether the increases are significant (little tiny increases probably wouldn't be considered "intensifying", but rather "persistent", and Mann-Kendall considers all the values in the column, not just the last couple). If you look at the tool in the Geoprocessing pane toolbox, you can right-click it and see the source code. I can try to find the section of the code that provides the exact descriptions for each of the patterns, or you can see if you can find it. Let me know if you want additional information. Again, thank you so much for pointing out the need to improve our documentation on the Emerging Hot Spot Analysis tool outputs! An issue has been created, so better documentation will be available in a future software release. Best wishes, Lauren
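The core of the Mann-Kendall test is simple enough to sketch: it sums the signs of all pairwise differences down a location's time series of Gi* z-scores. This shows only the S statistic (the tool also computes its variance and a p-value):

```python
from itertools import combinations

def mann_kendall_s(series):
    """Mann-Kendall S: +1 for each later value above an earlier one, -1 below."""
    return sum(
        (later > earlier) - (later < earlier)
        for earlier, later in combinations(series, 2)
    )

print(mann_kendall_s([5.0, 4.5, 4.0]))  # -3: every pair decreases -> downward trend
print(mann_kendall_s([4.0, 4.5, 5.0]))  #  3: upward trend
print(mann_kendall_s([4.0, 4.0, 4.0]))  #  0: no trend
```

This is why the 5.0, 4.5, 4.0 example above could never be classified as intensifying: every pairwise comparison points downward, so S is as negative as it can be for three values.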
Posted 07-18-2022 10:36 AM
POST
I agree it's a shame that the chart isn't generated automatically. Fortunately it's quite easy to create the chart from the output table. Right-click the output table and choose to create a line chart. I hope this helps! Best wishes, Lauren Griffin, Esri
Posted 07-15-2022 05:03 PM
POST
Hi! Sorry for the delay in our reply. Both @LynneBuie and I were swamped with the Esri User Conference this week. But I'm so glad you brought this to our attention. The documentation is not clear at all, as you mention. The table, here, does indeed correctly document the fields that are added to the Input Space Time Cube (STC) after you run Emerging Hot Spot Analysis (EHSA), but it doesn't document how to interpret the fields that appear in the EHSA output layer results (as you found). Hopefully, this will help:

The EHSA tool runs Gi*, comparing each bin and its space-time neighbors to the Global Window (Entire Cube, Neighborhood Time Step, or Individual Time Step) to determine whether the local values are significantly larger (hot spot) or significantly lower (cold spot). Once all bins in the cube have been assessed (for the categories shown in the graphic above), it runs Mann-Kendall on the Gi* z-score values to get the trend for each location (column). The PATTERN field in the EHSA output layer is one of potentially 17 different categories documented here. If you read the description for each category, you'll notice that to classify each location, the tool needs to know the count and percentage of significant bins, whether they are hot or cold, and whether they are at the top of the cube (most recent time steps) or not. For most of the patterns this is enough, but to determine whether the hot bins are intensifying or the cold bins are diminishing, it computes Mann-Kendall on the Gi* z-score values. The TREND_Z, TREND_P, and TREND_BIN values in the EHSA output layer are those Mann-Kendall results. If that doesn't answer your first question, please ask again and I'll do my best to clarify.

For your second question, you want to visualize EMERGING_{ANALYSIS_VARIABLE}_HS_BIN. The 3rd column of the variable table indicates that this particular variable is available in 3D. To visualize it, there are two steps: 1) Insert a NEW SCENE (select New Global Scene if it makes sense to show the curvature of the earth; otherwise select New Local Scene). 2) Run the Visualize Space Time Cube in 3D tool. Then you can navigate around the 3D scene with your mouse. I'm not sure if this will be helpful to you or not (it's NOT based on raster input), but @KevinButler-Analysis and I created a Learning Path for a workshop we did for UCGIS. I hope this answers your questions. If not, please let me know and I'm happy to try again. Best wishes, Lauren Griffin, Esri
Posted 07-15-2022 04:52 PM
POST
Check out this video on Emerging Hot Spot Analysis. It visually describes the differences between each of the Global Window choices, and also discusses the differences in the results (and when one option might be better than another). https://www.youtube.com/watch?v=9VDRYBvOoDI&list=PLGZUzt4E4O2LuV0vuH74WN6j9nxv0jUty&index=5 It's one of 5 videos associated with this learning path: https://learn.arcgis.com/en/paths/spatio-temporal-analysis-of-covid-19-daily-confirmed-cases/ I hope this helps! Lauren
Posted 07-15-2022 01:32 PM
POST
Hi, If you choose Entire Cube, the tool conceptually compares the mean of each bin in the space time cube to the mean for all bins in the space time cube and determines if the means are significantly different (it's a bit more complicated than that, taking the number of features and variance into account, not just the mean). Check out this video, especially beginning at 5:37. https://www.youtube.com/watch?v=9VDRYBvOoDI&list=PLGZUzt4E4O2LuV0vuH74WN6j9nxv0jUty&index=5 That video is part of a learning path about space time analysis, in case it's useful: https://learn.arcgis.com/en/paths/spatio-temporal-analysis-of-covid-19-daily-confirmed-cases/ Best wishes, Lauren
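Conceptually (and only conceptually; as noted above, the real Gi* statistic also factors in the number of features and the variance), the Entire Cube comparison looks like this toy sketch:

```python
# A deliberately simplified picture of the Entire Cube comparison:
# each bin's local (bin + neighbors) mean vs. the mean of ALL bins in the cube.
def local_vs_global(bin_and_neighbors, all_bins):
    local_mean = sum(bin_and_neighbors) / len(bin_and_neighbors)
    global_mean = sum(all_bins) / len(all_bins)
    return local_mean - global_mean  # Gi* then asks: is this gap significant?

all_bins = [0, 1, 2, 1, 0, 3, 8, 9, 7, 1]    # toy cube values
print(local_vs_global([8, 9, 7], all_bins))  # large positive gap -> hot-leaning
print(local_vs_global([0, 1, 0], all_bins))  # negative gap -> cold-leaning
```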
Posted 05-09-2022 07:49 AM
POST
Is your data associated with any geometry? If not, you can download the state polygons from the Living Atlas and join the table data to them. Once your data is associated with polygons, those tools should work well. If your data IS associated with polygons and you're getting those errors, try running Repair Geometry. Otherwise, copy the layer, remove all the fields, add a numeric field (Add Field, then Calculate Field to match the Object ID field, for example), then see if you still get errors running those tools on the new field. If you do, the issue is your geometry. Consider downloading the state polygons from the Living Atlas instead of using your current geometry (if you need points instead of polygons, you can use the Feature To Point tool). If removing the fields corrects the problem, the issue is with one of your fields. See if you can pinpoint which field(s) are creating the problem. I hope this helps! Lauren
Posted 03-02-2022 10:08 AM
POST
I think you have a couple of options. The first thing you'll want to do is clarify the question you're trying to answer: Which census tracts, with at least N of "our health plan members", are part of a statistically significant cluster of high prevalence (a hot spot)? If that's your question, you'll want to begin by removing all census tracts with fewer than N members from your analysis. If the tract denominator reflecting the number of members is larger than your threshold (N), and your numerator reflecting the number of cases is 0, your prevalence rate is zero (which is accurate and valid).

Keep in mind that hot spot analysis looks at each tract within the context of neighboring tracts. If many tracts won't have neighbors because they don't have at least N members, then ask yourself if you're really looking for clusters of high prevalence after all. Are you trying to determine which tracts have higher than expected prevalence (if so, just map prevalence, but also see the disparity index suggestion below), or do you want to know where tracts with high prevalence cluster spatially (and where that clustering is statistically significant)? Hot spot analysis will show you statistically significant regions of high prevalence. I'm thinking that mixing tracts WITH members and tracts with NO members will complicate your analysis: you wouldn't know for sure if a cold spot was cold because of clustering of low prevalence or because of clustering of low membership (a cluster of zeros because there aren't N members), for example. You could, however, aggregate census tracts so that all your polygons (tracts or groups of tracts) have at least N members. Here is a case study that provides a workflow that might help you do that: https://desktop.arcgis.com/en/analytics/case-studies/linguistic-diversity-1-intro.htm

I'm thinking the best solution, however, might be to compute disparity indices. The disparity indices would identify where the disease is not distributed "fairly"/evenly based on health plan membership. You could then run hot spot analysis on the disparity indices if you choose to. Computing the disparity indices addresses 2 problems with rates: division by zero, and the small numbers problem (a tract has 2 people, one gets the disease, so the rate is 50%, yikes!). Running hot spot analysis on the disparity indices addresses a third problem: the artificial nature of tract boundaries in relation to disease cases. These three issues with rates are discussed here: https://desktop.arcgis.com/en/analytics/case-studies/locating-a-new-retirement-community.htm (that case study refers to the disparity index as "Level of Service", but it's the same thing). In this other Learn lesson, the disparity index is used to see how equitably trees are distributed across race/ethnicity and susceptible populations: https://learn.arcgis.com/en/projects/shade-equity-determine-tree-planting-locations-with-suitability-analysis/

Oh, and if you decide to go the disparity index route, you can use all your tracts, even those with 0 or only a few members. Basically, the disparity index expects a census tract with 2% of all your health plan members to be associated with 2% of all the cases. The formula, for each tract, is:

(Ci / All Cases) - (Mi / All Members)

where Ci is the number of cases in the tract, Mi is the number of members in the tract, All Cases is the sum of cases across all tracts, and All Members is the sum of members across all tracts. A positive result means the proportion of cases is higher than the proportion of members (a higher-than-expected rate/prevalence). A negative result means a lower-than-expected proportion of cases. When the case proportion matches the member proportion (the expectation), the result is zero.

When you run hot spot analysis on the indices, you'll see hot spots in locations where positive indices cluster and cold spots where negative indices cluster. I hope this helps, or at least gives you some ideas for other options. Best wishes! Lauren
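Under those definitions, the disparity index is a one-line computation per tract. A sketch in plain Python (the tract counts are made up purely for illustration):

```python
# Disparity index per tract: (share of all cases) - (share of all members).
# Positive -> more cases than membership alone would predict; the indices
# always sum to zero across all tracts, since both shares each sum to 1.
def disparity_indices(cases, members):
    total_cases = sum(cases)
    total_members = sum(members)
    return [
        c / total_cases - m / total_members
        for c, m in zip(cases, members)
    ]

cases   = [10, 0, 30, 10]      # cases per tract (toy numbers)
members = [100, 50, 100, 250]  # health plan members per tract (toy numbers)

for i, d in enumerate(disparity_indices(cases, members)):
    print(f"tract {i}: {d:+.3f}")
```

Note that a tract with zero cases (tract 1 here) still gets a well-defined negative index, which is one way this formulation sidesteps the division-by-zero problem with per-tract rates.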
Posted 12-21-2021 05:43 PM