I have a large set of collected GPS points. I want to select every point whose spar depth differs by more than 20% from that of the point before or after it (ordered by GISID).
GISID Spar Depth
2022 4.035
2023 4.066
2024 3.604
2025 3.576
2026 4.593
2027 2.690
2028 2.960
2029 4.59
import arcpy

counter = 0
with arcpy.da.SearchCursor('GPS Gas Main Pipe Point', ['GISID', 'SPAR_DEPTH']) as cursor:
    for row in cursor:
        MSDepth = row[1]
        if counter > 0:
            # Relative change against the previous point, not an absolute
            # difference: 20% means a ratio of 0.2. Note this only compares
            # each point to the one before it.
            diff = abs(MSDepth - oldMSDepth) / oldMSDepth
            if diff > 0.2:
                print('GISID {}'.format(row[0]))
        oldMSDepth = row[1]
        oldGISID = row[0]
        counter += 1
Or is the percentage difference meant to be 20% relative to both 2023 and 2025, the neighbours of 2024?
I wondered about that trailing/leading difference as well, since it is more common to take differences across the key point (i.e., 2024).
import arcpy

counter = 0
FirstMSDepth = -1
InspectMSDepth = -1
NextMSDepth = -1
FirstGisID = -1
InspectGisID = -1
NextGisID = -1
thresHold = 0.2
gisIDArray = []

# Slide a three-point window (First, Inspect, Next) over the rows in
# GISID order; sql_clause expects a (prefix, postfix) tuple
with arcpy.da.SearchCursor('GPS Gas Main Pipe Point', ['GISID', 'SPAR_DEPTH'],
                           sql_clause=(None, 'ORDER BY GISID')) as cursor:
    for row in cursor:
        currRate = 0
        # Backward comparison is only valid once FirstMSDepth holds a
        # real point (counter > 2), not the -1 sentinel
        if counter > 2:
            currRate = abs(FirstMSDepth - InspectMSDepth) / InspectMSDepth
        # Forward comparison, if the backward one did not already trip
        if currRate < thresHold and counter > 1:
            currRate = abs(NextMSDepth - InspectMSDepth) / InspectMSDepth
        if currRate >= thresHold:
            print("GISID {} - Rate {}".format(InspectGisID, currRate))
            gisIDArray += [InspectGisID]
        FirstMSDepth, FirstGisID = InspectMSDepth, InspectGisID
        InspectMSDepth, InspectGisID = NextMSDepth, NextGisID
        NextMSDepth, NextGisID = row[1], row[0]
        counter += 1

# After the loop, InspectMSDepth holds the second-to-last point
currRate = 0
if counter > 2:
    currRate = abs(FirstMSDepth - InspectMSDepth) / InspectMSDepth
if currRate < thresHold and counter > 1:
    currRate = abs(NextMSDepth - InspectMSDepth) / InspectMSDepth
if currRate >= thresHold:
    print("GISID {} - Rate {}".format(InspectGisID, currRate))
    gisIDArray += [InspectGisID]

# The very last point only has a backward comparison
if counter > 1:
    currRate = abs(InspectMSDepth - NextMSDepth) / NextMSDepth
    if currRate >= thresHold:
        print("GISID {} - Rate {}".format(NextGisID, currRate))
        gisIDArray += [NextGisID]
How do your results compare to those already posted? Are they the same? With the NumPy solution, the forward- and backward-looking comparisons are done in one pass.
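For reference, a minimal sketch of the kind of one-pass NumPy approach meant here, assuming the whole table fits in memory (table and field names taken from the question):

import arcpy
import numpy as np

# Load just the two fields into a structured array and sort by GISID
arr = arcpy.da.FeatureClassToNumPyArray('GPS Gas Main Pipe Point',
                                        ['GISID', 'SPAR_DEPTH'])
arr.sort(order='GISID')
depth = arr['SPAR_DEPTH']

# One pass over the diffs covers both directions: back[i-1] is the change
# of point i relative to its predecessor, fwd[i] relative to its successor
step = np.abs(np.diff(depth))
back = step / depth[1:]
fwd = step / depth[:-1]

flag = np.zeros(len(depth), dtype=bool)
flag[1:] |= back >= 0.2   # the first point has no predecessor
flag[:-1] |= fwd >= 0.2   # the last point has no successor
print(arr['GISID'][flag])

With the sample depths above, this flags GISIDs 2025 through 2029, matching the corrected cursor version.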
The difference is that it does not cache the data in memory. Loading all the data into arrays first is often not possible for big datasets, so you need sequential algorithms or geoprocessing tools to solve the problem. In addition, it extends the original approach, which is probably easier to follow.
I have found that you can handle dictionaries with 2+ million records before you run out of memory. If that happens, switch to the 64-bit version of Python; it is installed when you install the Background Geoprocessing (64-bit) module for ArcGIS Desktop.
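As a rough illustration of the dictionary approach (table and field names taken from the question; the record counts above are an observation, not something this sketch measures):

import arcpy

# Build an in-memory lookup of GISID -> spar depth; the whole table is
# cached, so memory grows linearly with the record count
depths = {}
with arcpy.da.SearchCursor('GPS Gas Main Pipe Point',
                           ['GISID', 'SPAR_DEPTH']) as cursor:
    for gisid, depth in cursor:
        depths[gisid] = depth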
With NumPy you don't need to load the whole dataset into memory either. I doubt that a SearchCursor partitions the data before it begins processing. The search cursor is designed for large datasets in a relational database and loads the data sequentially, in buffers.
We do a lot of raster data processing with Python/GDAL/NumPy (on 64-bit Python) and had to partition the data ourselves, as in the sketch below. Is there another way? Do you know how to automatically process feature class data in sequential buffers with NumPy arrays? That would be a great help. Your example just shows static arrays, which would have to be populated in advance.
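For concreteness, a minimal sketch of the kind of manual partitioning meant here; the chunk size is arbitrary, and handling comparisons across chunk boundaries is left out:

import arcpy
import numpy as np

def chunked(table, fields, size=100000):
    # Buffer cursor rows and hand them out as NumPy arrays of at most
    # `size` rows; neighbour checks across chunk edges need extra care
    buf = []
    with arcpy.da.SearchCursor(table, fields,
                               sql_clause=(None, 'ORDER BY GISID')) as cursor:
        for row in cursor:
            buf.append(row)
            if len(buf) == size:
                yield np.array(buf)
                buf = []
    if buf:
        yield np.array(buf)

for chunk in chunked('GPS Gas Main Pipe Point', ['GISID', 'SPAR_DEPTH']):
    ...  # vectorised processing per chunk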
If you are doing a lot with Python, NumPy, and GDAL, then *.npy/*.npz or NetCDF would be the preferred storage rather than a geodatabase anyway.
You are right. It does not make sense to store large raster datasets in geodatabases for raster processing, and we effectively do not. But we also process large vector datasets, and I worked for a long time as a software engineer on ArcGIS network information systems in the utilities sector (to which gas pipelines belong).
Nevertheless, the memory problem with NumPy arrays stays the same for raster and vector data, and an out-of-the-box solution for partitioning does not seem to exist.
By the way, another difference with my solution is that the database model of the original point feature class does not change (no fields are added), which is often a requirement in a production environment.