How can I compare one row of data to the row that came before it and the row that came after it?

09-07-2018 11:55 AM
MeganWelch1
New Contributor

I have a large set of collected GPS points. I want to select every point whose spar depth is more than 20% greater or less than that of the point immediately before or after it (ordered by GISID).

GISID    Spar Depth
2022     4.035
2023     4.066
2024     3.604
2025     3.576
2026     4.593
2027     2.690
2028     2.960
2029     4.59

import arcpy

counter = 0
with arcpy.da.SearchCursor('GPS Gas Main Pipe Point',
                           ['GISID', 'SPAR_DEPTH']) as cursor:
    for row in cursor:
        MSDepth = row[1]
        if counter > 0:
            # relative change against the previous row only
            diff = abs(MSDepth - oldMSDepth) / oldMSDepth
            if diff > 0.2:
                print('GISID {}'.format(row[0]))
        oldMSDepth = row[1]
        oldGISID = row[0]
        counter += 1

18 Replies
DanPatterson_Retired
MVP Emeritus

Or is the percentage difference meant to be 20% relative to 2023 and 2025?

I wondered about that trailing/leading difference approach, when it is more common to do differences across the key point (i.e. 2024).
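For illustration, that central-difference reading on the sample values from the question might look like this (plain Python, untested; the hard-coded lists just copy the example data above):

depths = [4.035, 4.066, 3.604, 3.576, 4.593, 2.690, 2.960, 4.59]
gisids = list(range(2022, 2030))

# compare the neighbours on either side of each point,
# e.g. 2023 vs 2025 to test 2024
for i in range(1, len(depths) - 1):
    rate = abs(depths[i + 1] - depths[i - 1]) / depths[i - 1]
    if rate >= 0.2:
        print('GISID {} - Rate {:.2f}'.format(gisids[i], rate))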

MarianneRohrbach
Occasional Contributor
It is probably not possible to solve the problem in a single iteration. I therefore suggest saving the selected GISIDs in a table and joining it to the layer to select the affected points.
You could do the following based on your sequential approach, without caching (I didn't test it):
import arcpy

thresHold = 0.2                     # flag changes of 20% or more
counter = 0
gisIDArray = []
# three-row sliding window: First = previous row, Inspect = row under
# test, Next = following row; -1 is a "no row yet" sentinel
FirstMSDepth, FirstGisID = -1, -1
InspectMSDepth, InspectGisID = -1, -1
NextMSDepth, NextGisID = -1, -1

# sql_clause is a (prefix, postfix) tuple; ORDER BY goes in the postfix
with arcpy.da.SearchCursor('GPS Gas Main Pipe Point',
                           ['GISID', 'SPAR_DEPTH'],
                           sql_clause=(None, 'ORDER BY GISID')) as cursor:
    for row in cursor:
        currRate = 0
        if counter > 2:             # First holds a real row: look backwards
            currRate = abs(FirstMSDepth - InspectMSDepth) / InspectMSDepth
        if currRate < thresHold and counter > 1:    # look forwards
            currRate = abs(NextMSDepth - InspectMSDepth) / InspectMSDepth
        if currRate >= thresHold:
            print("GISID {} - Rate {}".format(InspectGisID, currRate))
            gisIDArray += [InspectGisID]
        # shift the window one row forward
        FirstMSDepth, FirstGisID = InspectMSDepth, InspectGisID
        InspectMSDepth, InspectGisID = NextMSDepth, NextGisID
        NextMSDepth, NextGisID = row[1], row[0]
        counter += 1

# the loop tests a row only after its following row has been read, so the
# last two rows still need the same test once the cursor is exhausted
if counter > 1:
    currRate = 0
    if counter > 2:
        currRate = abs(FirstMSDepth - InspectMSDepth) / InspectMSDepth
    if currRate < thresHold:
        currRate = abs(NextMSDepth - InspectMSDepth) / InspectMSDepth
    if currRate >= thresHold:
        print("GISID {} - Rate {}".format(InspectGisID, currRate))
        gisIDArray += [InspectGisID]
    # the very last row only has a previous neighbour
    currRate = abs(InspectMSDepth - NextMSDepth) / NextMSDepth
    if currRate >= thresHold:
        print("GISID {} - Rate {}".format(NextGisID, currRate))
        gisIDArray += [NextGisID]
  
Since you need a selection, you would probably want to save the GISIDs in a table, then join and select those records with GP tools (a rough sketch follows the list):
  • CreateTable_management to persist the selected GISIDs
  • an arcpy.da.InsertCursor in a loop over gisIDArray to append each GISID to that table
  • AddIndex_management on the GISID fields the join is based on
  • AddJoin_management to join the new GISID table to the original GPS Gas Main Pipe Point layer with KEEP_COMMON
  • SelectLayerByAttribute_management with NEW_SELECTION and e.g. OBJECTID >= 0 to select all joined records
  • RemoveJoin_management
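Put together, that workflow might look roughly like this untested sketch (the scratch workspace, table name, and index name are placeholders, not anything from a real project):

import arcpy

arcpy.env.workspace = r'C:\temp\scratch.gdb'      # placeholder workspace

# persist the flagged GISIDs in their own table
flagged = arcpy.CreateTable_management(arcpy.env.workspace, 'FlaggedGisIDs')[0]
arcpy.AddField_management(flagged, 'GISID', 'LONG')
with arcpy.da.InsertCursor(flagged, ['GISID']) as icur:
    for gisID in gisIDArray:                      # from the script above
        icur.insertRow([gisID])

# index, join, select, then drop the join again
arcpy.AddIndex_management(flagged, 'GISID', 'gisid_idx')
lyr = 'GPS Gas Main Pipe Point'
arcpy.AddJoin_management(lyr, 'GISID', flagged, 'GISID', 'KEEP_COMMON')
arcpy.SelectLayerByAttribute_management(lyr, 'NEW_SELECTION', 'OBJECTID >= 0')
arcpy.RemoveJoin_management(lyr)                  # the selection persists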
DanPatterson_Retired
MVP Emeritus

How do your results compare to those already posted? Are they the same? With the numpy solution, the forward and backward looking are done in one pass.
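(The numpy post itself is not reproduced here, but a one-pass forward-and-backward test along those lines might look like the following untested sketch, using arcpy.da.TableToNumPyArray:)

import arcpy
import numpy as np

arr = arcpy.da.TableToNumPyArray('GPS Gas Main Pipe Point',
                                 ['GISID', 'SPAR_DEPTH'])
arr.sort(order='GISID')
d = arr['SPAR_DEPTH']

# relative change against the previous and the next row, all at once
back = np.abs(np.diff(d)) / d[1:]       # each row vs. its predecessor
fwd = np.abs(np.diff(d)) / d[:-1]       # each row vs. its successor
flag = np.zeros(d.size, dtype=bool)
flag[1:] |= back >= 0.2
flag[:-1] |= fwd >= 0.2
print(arr['GISID'][flag])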

MarianneRohrbach
Occasional Contributor

The difference is that it does not cache data in memory. Loading all the data into arrays first is often not possible with big data, so you need sequential algorithms or geoprocessing tools to solve the problem.

In addition, it expands the original approach, which is probably easier to understand.

NeilAyres
MVP Alum

I have found that you can handle dictionaries with 2+ million records before you run out of memory.

If that happens, switch to the 64-bit version of Python. It is installed when you install the Background Geoprocessing (64-bit) module for Desktop.
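A dictionary-based pass over the sorted GISIDs might look like this (untested sketch; the two end rows are skipped for brevity):

import arcpy

depths = {}
with arcpy.da.SearchCursor('GPS Gas Main Pipe Point',
                           ['GISID', 'SPAR_DEPTH']) as cursor:
    for gisid, depth in cursor:
        depths[gisid] = depth          # cache the whole table, keyed by GISID

ids = sorted(depths)
flagged = []
for prev_id, cur_id, next_id in zip(ids, ids[1:], ids[2:]):
    cur = depths[cur_id]
    if (abs(depths[prev_id] - cur) / cur >= 0.2 or
            abs(depths[next_id] - cur) / cur >= 0.2):
        flagged.append(cur_id)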

DanPatterson_Retired
MVP Emeritus

With numpy you don't need to load the whole set of data into memory either. I doubt that a SearchCursor does a dice on the data before it begins to process.

MarianneRohrbach
Occasional Contributor

The search cursor is designed for large datasets in a relational database and loads the data sequentially in buffered blocks.

We do a lot of raster data processing with Python/GDAL/numpy (with 64-bit Python) and had to partition the data ourselves. Is there another way? Do you know how to automatically process feature class data sequentially and buffered with numpy arrays? That would be a great help. Your example just shows static arrays, which would have to be populated in advance.
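One hand-rolled pattern (an assumption on my part, not an out-of-the-box arcpy feature) is to read the cursor in fixed-size blocks with a one-row carry-over, shown here for the backward comparison only:

import itertools
import arcpy
import numpy as np

CHUNK = 100000                         # rows per block; tune to memory
with arcpy.da.SearchCursor('GPS Gas Main Pipe Point',
                           ['GISID', 'SPAR_DEPTH'],
                           sql_clause=(None, 'ORDER BY GISID')) as cursor:
    carry = []                         # last row of the previous block
    while True:
        rows = carry + list(itertools.islice(cursor, CHUNK))
        if len(rows) < 2:
            break
        d = np.array([r[1] for r in rows])
        rate = np.abs(np.diff(d)) / d[1:]   # each row vs. its predecessor
        for i in np.nonzero(rate >= 0.2)[0] + 1:
            print('GISID {}'.format(rows[i][0]))
        carry = rows[-1:]              # keep one row so no pair is missed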

DanPatterson_Retired
MVP Emeritus

If you are doing a lot with Python, numpy and GDAL, then *.npy/*.npz or netCDF would be the preferred storage rather than a geodatabase anyway.
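(.npy files can also be memory-mapped on load, so the whole array never has to be pulled into memory; the file name below is just an example:)

import numpy as np

depths = np.array([4.035, 4.066, 3.604, 3.576, 4.593, 2.690, 2.960, 4.59])
np.save('spar_depths.npy', depths)                  # compact binary storage
view = np.load('spar_depths.npy', mmap_mode='r')    # memory-mapped, lazy
print(view[:3])                                     # only these pages are read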

MarianneRohrbach
Occasional Contributor

You are right, it does not make sense to store large raster datasets in geodatabases for raster processing, and we effectively do not. But we also process large vector datasets, and I worked for a long time as a software engineer on ArcGIS network information systems in the utilities sector (which is where gas pipelines belong).

Nevertheless, the memory problem with numpy arrays is the same for raster and vector data, and an out-of-the-box solution for partitioning does not seem to exist.

By the way, another difference with my solution is that the data model of the original point feature class does not change (no fields are added), which is often a requirement in a production environment.