
Tabular selection doesn't equal spatial selection... Would appreciate input on a possible solution

09-28-2015 03:02 PM
MaribethMilner
Occasional Contributor

ArcGIS 10.3.1

Data Sets: Sub-Saharan African grids with resolution varying from ~250 m (~260 million cells) to ~1 km.

Objective: Using a set of rules, select research site data (~4200 points) based on the grid values at a user-generated point (script)

Objective2: Define an inference space for the user-generated point and estimate crop production (script)

Knowing I needed the inference space, I selected the research sites spatially.

Not knowing the computer power of potential users, I didn't run the analysis in memory, so the analysis isn't quick. As a result, I was asked to select research sites by table instead.

Spatial Analysis Description:

  • Apply rules to the grids (0 = cell doesn't match the criteria, 1 = cell matches)
  • Sum the grids (for a given cell, the total ranges from 0, no grids match, to 7, all grids match)
  • Convert cells with value = 7 to a shapefile
  • Intersect the shapefile with the research site points
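A minimal sketch of this mask-and-sum logic, using NumPy arrays in place of ArcGIS grids (the array names, thresholds, and 3-grid count are illustrative; the real workflow uses 7 grids):

```python
import numpy as np

# Three toy "grids" standing in for the 7 criteria rasters; values are made up
dem = np.array([[200., 800., 1200.],
                [400., 900., 1500.]])
rain = np.array([[300., 600., 1000.],
                 [500., 700., 1200.]])
sand = np.array([[10., 40., 70.],
                 [20., 50., 80.]])

# Apply a rule to each grid: True (1) where the cell meets the criterion
masks = [
    (dem > 300) & (dem < 1300),   # hypothetical elevation rule
    rain >= 500,                  # hypothetical rainfall rule
    sand <= 60,                   # hypothetical texture rule
]

# Sum the binary grids; a cell equal to len(masks) satisfies every rule
total = np.sum([m.astype(int) for m in masks], axis=0)
inference_space = total == len(masks)
print(total)            # [[1 3 2] [3 3 1]]
print(inference_space)  # True only where all rules pass
```

The `inference_space` boolean array corresponds to the "value = 7" cells that get converted to a shapefile in the ArcGIS workflow.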

Tabular Analysis:

  • Extract grid values for the research site points (ExtractMultiValuesToPoints)
  • Apply the rules to the extracted values (TableSelect)
  • Export the result to an Excel file

Problem:

Compared the spatial and tabular output at a point. The total counts were nearly identical (391 for spatial; 392 for tabular), but the overlap was only ~75% (i.e., 91 spatially selected points didn't match the tabular analysis, which is, of course, the correct answer).

I didn't dice the rasters into identically sized and oriented cells due to file size considerations. So ArcGIS resampled all the rasters to the coarsest resolution (~1 km) before summing the grids. Presumably this spatial "rounding" is the source of the error.
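A toy illustration of why the resampling matters (assuming nearest-neighbor behavior; the sampled cell position is illustrative): one 1 km cell swallows a 4×4 block of 250 m cells and keeps a single value, even when most of the underlying fine cells disagree with it.

```python
import numpy as np

# A 4x4 block of 250 m binary rule results that collapses into one 1 km cell
fine = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])

# Nearest-neighbor resampling keeps one representative sample
# (assumed here to be the upper-left cell), so the whole km reads as 1 ...
coarse_nearest = fine[0, 0]

# ... even though only a quarter of the underlying 250 m cells actually match
fraction_matching = fine.mean()
print(coarse_nearest, fraction_matching)
```

Any research site falling in the other 75% of that coarse cell gets spatially selected (or rejected) based on a value that isn't its own, which is consistent with the ~25% mismatch observed above.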

Possible Solutions:

a) I could dice all the grids to 250 m and make them overlap, and the mismatches would go away. But depending upon how far off the grids are relative to each other, shifting the cells could introduce significant error.

b) I could also create a set of points from the ~250m centroids (network points) and extract their associated grid values. For a given user-selected point, I would select network points using the tabular analysis described above and spatially identify the research sites that are associated with the selected network points [i.e. convert selected network points back to 250m raster cells, convert cells to shapefile and intersect shapefile with research site points].
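Option (b) can be sketched as a two-pass scheme: a tabular pass over the 250 m network points, then a spatial pass that maps research sites back through the selected cells. The coordinates, cell size, and rule below are all illustrative:

```python
cell = 250.0                             # assumed cell size in map units

# Network points: (x, y, DEM500) for 250 m cell centroids (toy values)
network = [
    (125.0, 125.0, 650),
    (375.0, 125.0, 1400),
    (125.0, 375.0, 900),
]
# Research sites: (id, x, y)
sites = [("A", 130.0, 120.0), ("B", 380.0, 130.0), ("C", 140.0, 360.0)]

# Tabular pass: keep centroids whose extracted values satisfy the rule
keep = {(x, y) for x, y, dem in network if 300 < dem < 1300}

def centroid_of(x, y):
    """Snap a point to the centroid of the 250 m cell containing it."""
    return (int(x // cell) * cell + cell / 2,
            int(y // cell) * cell + cell / 2)

# Spatial pass: a site is selected if its cell's centroid survived the rule
selected = [sid for sid, x, y in sites if centroid_of(x, y) in keep]
print(selected)  # site B's cell fails the DEM rule
```

The snap-to-centroid step stands in for the raster-to-polygon-and-intersect chain in the ArcGIS workflow.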

As long as the 7 grids aren't aligned, there will be mismatch errors. But the increased resolution of this analysis should decrease the number of mismatches.

Also, the productivity data could be extracted to the network points, eliminating the need to distribute productivity grids. But the user would still need the 7 grids to identify the grid values at the user-defined point.

c) Create a new ~250 m integer grid from the network points, so each cell carries the 7 associated grid values. Then extract those 7 grid values to the user-identified point, which would eliminate mismatches.
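One way to picture option (c): the integer grid's cell values key into a single attribute table that holds all 7 extracted values, so a point needs only one raster lookup. The names and values below are hypothetical:

```python
# The ~250 m integer grid (toy 2x2); each cell stores only an id
cell_ids = [[0, 1],
            [2, 3]]

# One row of associated grid values per cell id (only 2 of the 7 shown)
lookup = {
    0: {"DEM500": 650,  "RAIN": 800},
    1: {"DEM500": 1400, "RAIN": 450},
    2: {"DEM500": 900,  "RAIN": 700},
    3: {"DEM500": 200,  "RAIN": 300},
}

def values_at(row, col):
    """Return all grid values for the cell containing a user point."""
    return lookup[cell_ids[row][col]]

print(values_at(0, 1))
```

Because every value comes from the same cell of the same grid, the alignment mismatches disappear by construction, at the cost of a large attribute table.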

Summary:

I'm leaning towards option "c"... but I'm concerned about efficiency and error generation.

Is point analysis ever preferable to raster analysis? (Predefining the relationship between the 7 grids at a series of points has to be worth some efficiency points.)

Are 7 single-valued grids more efficient to use than a single grid that has 7 values associated with each cell? The latter would likely require some sort of raster look-up function.

How would the raster look-up function analysis compare with the TableSelect analysis (with respect to efficiency)?

If I were to run the analysis in memory - would the above questions become moot?

3 Replies
DanPatterson_Retired
MVP Emeritus

Not sure I follow it all, but here are some points that need clarification:

  • when creating rasters, particularly multiple rasters the following should be specified
    • cell size
    • extent
    • snap raster once one is created and others are being created
  • read the caveats and warnings for Extract Multi Values to Points—Help | ArcGIS for Desktop
  • I don't see any advantage to Table Select—Help | ArcGIS for Desktop
  • I understand your file size issue and your desire to aggregate finer to coarser, but I'm not clear what option you used for the aggregation (majority?) or the type of data (crop production? yield? nominal, ordinal, interval/ratio?)
  • starting to lose me on option c
  • do you have some visual(s) showing your situation
  • how is your Python/NumPy?

Add any further information as you see fit.

PS

Where exactly in SS-Africa? When you say production, which producing group are you referring to? What types of crops? Farmer selection is not necessarily done to maximize yield, for example.

MaribethMilner
Occasional Contributor

Before responding to your points... let me clarify...

My code works... but I don't like the answer.

When I spatially identify the research sites that conform to a set of rules, ~25% of the selections don't meet the criteria. I know this because the selected table items contain the actual grid values and 25% of the field property values clearly don't match the criteria.

Because the file contains the actual grid values associated with each research site, I can apply the rules directly to the shapefile table. And when I do that, ~25% of the (true) selections aren't selected by the spatial analysis.

Varying grid cell size is the primary source of the spatial selection error. ArcGIS resamples cells to the coarsest size / grid orientation before doing the math (in this case, the inference space rules).

I can resolve the problem by changing the cell size / grid orientation of all the grids to match that of the highest resolution cell. The ~1km cell would become 16 ~250m cells and the ~500m cell would become 4 ~250m cells. That's option a.

I can convert the highest resolution grid to points (file geodatabase), extract the 7 grid values associated with each point, and apply the inference space rules to the extracted values in the file geodatabase. The selected points represent the entire inference space. Then I'd need to use the selected points to identify the associated research site points. One approach: convert the selected points back into a ~250m grid, convert the grid to polygons and intersect the polygon with the research sites. That's option b.

Option c: Convert the highest resolution grid to points (file geodatabase), extract the 7 grid values associated with each point and convert those points into an integer grid. (The grid format might not be the best raster format for this approach due to the number of cells and table values.)

Crop production is only relevant in that I'll need an areal representation of all locations that meet the inference space criteria (whether or not there are research sites at those cells) in order to characterize the production potential of that area. "All locations" means all of Sub-Saharan Africa.

The table also contains crop x nutrient modeling coefficients developed from research site data. Some coefficients are based on a considerable amount of research. Others are not. The modeling coefficients will allow a researcher to locate potential undeveloped production areas.

As for your comments...

>The Extract Values to Points tool extracts a grid value for each point and creates a new point file, while the Extract Multi Values to Points tool modifies the original point file (i.e., adds the extracted values to it).

However, I did run into a data corruption issue. I usually create point shapefiles from tab-delimited text files exported from Excel. The coefficient values are all floating point, but if the first row of coefficients happened to be integers, ArcGIS turned all the coefficient data into integers. Now I'm creating point files directly from Excel files (no extra rows or columns) and I haven't run into any problems.

>TableSelect worked for me. Here's the code for selecting extracted dem values from the research site point file (Data9-28-15.shp) using the "dem" value associated with a user-selected point:

# ======= 6 of 7 =========
demmax = 5818
print "dem500 resolution ~500m; Range: -155 - 5818; Selected dem500 value: " + str(dem)

# Decrease amount to add (demAdd) if dem + 300 is undefined
if demmax - dem < 300:
    demAdd = demmax - dem
    ##print "Will add " + str(demAdd) + " instead of 300"
else:
    demAdd = 300
    ##print "Will add 300"

demGT = dem - 300
demLT = dem + demAdd
demField = arcpy.AddFieldDelimiters("Data9-28-15.shp", "DEM500")

# Create rule-based where clause for TableSelect
if dem < 700:
    whereClause = "{0} <= {1}".format(demField, 1000)
    print whereClause
else:
    whereClause = "{0} > {1} AND {0} < {2}".format(demField, demGT, demLT)
    print whereClause

# Select data and output to table
arcpy.TableSelect_analysis("D:/New/.../Out/sandout", "D:/New/.../Out/demout", whereClause)

My arcpy/Python background consists of the 2-3 hour ESRI online short courses (Python Basics & Python Scripting for ArcGIS), a book, and a whole lot of internet searching. I don't know anything about Python/NumPy.

I think I've addressed your other points.

DanPatterson_Retired
MVP Emeritus

Maribeth... I will digest this tonight.

My query about the coverage was about the raster representation: whether it is fairly continuous with few areas of nodata, or spotty, with valued cells clustered around points. That has a lot to do with raster processing and efficiency (ergo my query about NumPy). Hence: are you trying to process the whole of SS-Africa as one unit, or are you interested in solutions that would tile it into sub-regions? No rush; this is obviously not one of those easy-answer questions. Should you need/want to take anything off-line, you can email me at my Uni email at Dan_Patterson @ Carleton.ca ... it is on my GeoNet profile if you forget.
