I have a large set of collected GPS points. I want to select every point whose spar depth differs by more than 20% from that of the point before or after it (ordered by GISID).
GISID Spar Depth
2022 4.035
2023 4.066
2024 3.604
2025 3.576
2026 4.593
2027 2.690
2028 2.960
2029 4.59
import arcpy

counter = 0
with arcpy.da.SearchCursor('GPS Gas Main Pipe Point', ['GISID', 'SPAR_DEPTH']) as cursor:
    for row in cursor:
        MSDepth = row[1]
        if counter > 0:
            # Relative change against the previous point, not an absolute
            # difference: 20% means a ratio of 0.2. Note this only compares
            # each point to the one before it.
            diff = abs(MSDepth - oldMSDepth) / oldMSDepth
            if diff > 0.2:
                print('GISID {}'.format(row[0]))
        oldMSDepth = row[1]
        oldGISID = row[0]
        counter += 1
Or is the percentage difference meant to be 20% relative to both 2023 and 2025, the neighbours of 2024?
I wondered about that trailing/leading difference as well, since it is more common to take differences across the key point (i.e., 2024).
import arcpy

counter = 0
FirstMSDepth = -1
InspectMSDepth = -1
NextMSDepth = -1
FirstGisID = -1
InspectGisID = -1
NextGisID = -1
thresHold = 0.2
gisIDArray = []

# Slide a three-point window (First, Inspect, Next) over the rows in
# GISID order; sql_clause expects a (prefix, postfix) tuple
with arcpy.da.SearchCursor('GPS Gas Main Pipe Point', ['GISID', 'SPAR_DEPTH'],
                           sql_clause=(None, 'ORDER BY GISID')) as cursor:
    for row in cursor:
        currRate = 0
        # Backward comparison is only valid once FirstMSDepth holds a
        # real point (counter > 2), not the -1 sentinel
        if counter > 2:
            currRate = abs(FirstMSDepth - InspectMSDepth) / InspectMSDepth
        # Forward comparison, if the backward one did not already trip
        if currRate < thresHold and counter > 1:
            currRate = abs(NextMSDepth - InspectMSDepth) / InspectMSDepth
        if currRate >= thresHold:
            print("GISID {} - Rate {}".format(InspectGisID, currRate))
            gisIDArray += [InspectGisID]
        FirstMSDepth, FirstGisID = InspectMSDepth, InspectGisID
        InspectMSDepth, InspectGisID = NextMSDepth, NextGisID
        NextMSDepth, NextGisID = row[1], row[0]
        counter += 1

# After the loop, InspectMSDepth holds the second-to-last point
currRate = 0
if counter > 2:
    currRate = abs(FirstMSDepth - InspectMSDepth) / InspectMSDepth
if currRate < thresHold and counter > 1:
    currRate = abs(NextMSDepth - InspectMSDepth) / InspectMSDepth
if currRate >= thresHold:
    print("GISID {} - Rate {}".format(InspectGisID, currRate))
    gisIDArray += [InspectGisID]

# The very last point only has a backward comparison
if counter > 1:
    currRate = abs(InspectMSDepth - NextMSDepth) / NextMSDepth
    if currRate >= thresHold:
        print("GISID {} - Rate {}".format(NextGisID, currRate))
        gisIDArray += [NextGisID]
How do your results compare to those already posted? Are they the same? With the NumPy solution, the forward- and backward-looking comparisons are done in one pass.
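For reference, a minimal sketch of the kind of one-pass NumPy approach meant here, assuming the whole table fits in memory (table and field names taken from the question):

import arcpy
import numpy as np

# Load just the two fields into a structured array and sort by GISID
arr = arcpy.da.FeatureClassToNumPyArray('GPS Gas Main Pipe Point',
                                        ['GISID', 'SPAR_DEPTH'])
arr.sort(order='GISID')
depth = arr['SPAR_DEPTH']

# One pass over the diffs covers both directions: back[i-1] is the change
# of point i relative to its predecessor, fwd[i] relative to its successor
step = np.abs(np.diff(depth))
back = step / depth[1:]
fwd = step / depth[:-1]

flag = np.zeros(len(depth), dtype=bool)
flag[1:] |= back >= 0.2   # the first point has no predecessor
flag[:-1] |= fwd >= 0.2   # the last point has no successor
print(arr['GISID'][flag])

With the sample depths above, this flags GISIDs 2025 through 2029, matching the corrected cursor version.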
The difference is that it does not cache the data in memory. Loading all the data into arrays first is often not possible for big datasets, so you need sequential algorithms or geoprocessing tools to solve the problem. In addition, it extends the original approach, which is probably easier to follow.
I have found that you can handle dictionaries with 2+ million records before you run out of memory. If that happens, switch to the 64-bit version of Python; it is installed when you install the Background Geoprocessing (64-bit) module for ArcGIS Desktop.
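As a rough illustration of the dictionary approach (table and field names taken from the question; the record counts above are an observation, not something this sketch measures):

import arcpy

# Build an in-memory lookup of GISID -> spar depth; the whole table is
# cached, so memory grows linearly with the record count
depths = {}
with arcpy.da.SearchCursor('GPS Gas Main Pipe Point',
                           ['GISID', 'SPAR_DEPTH']) as cursor:
    for gisid, depth in cursor:
        depths[gisid] = depth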
With NumPy you don't need to load the whole dataset into memory either. I doubt that a SearchCursor partitions the data before it begins processing. The search cursor is designed for large datasets in a relational database and loads the data sequentially, in buffers.
We do a lot of raster data processing with Python/GDAL/NumPy (on 64-bit Python) and had to partition the data ourselves, as in the sketch below. Is there another way? Do you know how to automatically process feature class data in sequential buffers with NumPy arrays? That would be a great help. Your example just shows static arrays, which would have to be populated in advance.
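For concreteness, a minimal sketch of the kind of manual partitioning meant here; the chunk size is arbitrary, and handling comparisons across chunk boundaries is left out:

import arcpy
import numpy as np

def chunked(table, fields, size=100000):
    # Buffer cursor rows and hand them out as NumPy arrays of at most
    # `size` rows; neighbour checks across chunk edges need extra care
    buf = []
    with arcpy.da.SearchCursor(table, fields,
                               sql_clause=(None, 'ORDER BY GISID')) as cursor:
        for row in cursor:
            buf.append(row)
            if len(buf) == size:
                yield np.array(buf)
                buf = []
    if buf:
        yield np.array(buf)

for chunk in chunked('GPS Gas Main Pipe Point', ['GISID', 'SPAR_DEPTH']):
    ...  # vectorised processing per chunk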
If you are doing a lot with Python, NumPy, and GDAL, then *.npy/*.npz or NetCDF would be the preferred storage rather than a geodatabase anyway.
You are right. It does not make sense to store large raster datasets in geodatabases for raster processing, and we effectively do not. But we also process large vector datasets, and I worked for a long time as a software engineer on ArcGIS network information systems in the utilities sector (to which gas pipelines belong).
Nevertheless, the memory problem with NumPy arrays stays the same for raster and vector data, and an out-of-the-box solution for partitioning does not seem to exist.
By the way, another difference with my solution is that the database model of the original point feature class does not change (no fields are added), which is often a requirement in a production environment.