Select to view content in your preferred language

Database is stale after deleted table data using Pro UI and fetching using SearchCursor

284
20
Tuesday
chris_del101
Regular Contributor

`arcpy.da.SearchCursor` is returning unpredicable results. 

I have an empty table in pro. I just deleted it and saved it, so I know it's empty. 
When I then `SearchCursor` on that table it returns results! But the table is empty! 

cur = arcpy.da.SearchCursor(full_table_path, field_names)
for c in cur:
    print(c)

I wanted to use this function to check for duplicates before inserting, but it seems highly unstable. Is there any way to make it pick up recent changes?
Whatever the issue, if the table is not actually empty but just shows empty, or if it's the function, I just need it to work accurately.
It's a file table, so maybe this is why?

0 Kudos
20 Replies
TonyAlmeida
MVP Regular Contributor

Try this, always use the cursor within a WITH statement to ensure proper cleanup.

with arcpy.da.SearchCursor(full_table_path, field_names) as cur:
    for row in cur:
        print(row) 

 

HaydenWelch
MVP Regular Contributor

As Tony said, you always want to use the context manager protocol when it's implemented (with Foo() as bar : ...).

Cursors will maintain a lock on the table until they're deleted (with del) which happens automatically once you exit the with block.

They can also be confusing if you are using different inputs. Say a layer with a selection or a definition query instead of a raw table/feature resource.

Cursors that are passed a layer object with implicitly respect any selections and queries you have on that layer, but if you pass it the raw data source, they won't respect those selections and queries.

It's good practice to be very clear what you're passing into the Cursor. I usually do this by extracting the selection set from the layer and populating the "where_clause" argument of the initializer with a SQL statement "OBJECTID IN (<selection_set>)" as this lets me more easily debug what the Cursor is seeing.

RPGIS
by MVP Regular Contributor
MVP Regular Contributor

I agree with both @TonyAlmeida and@HaydenWelch. Using the 'with' statement is general best practice when it comes to using any cursor. Like @HaydenWelch said, the 'with' statement ensures the object deletes after the cursor finishes executing. The other thing you can include are break conditions to stop the execution earlier or pass conditions to skip over rows.

HaydenWelch
MVP Regular Contributor

I'm gonna make a "Comprehending Cursors in Context" Blog post soon to try and document more advanced Cursor patterns. My favorite way to use a SearchCursor is in a comprehension because it immediately de-structures the data into whatever Python object you want and doesn't store a ref so it's deleted as soon as the comprehension runs.

RPGIS
by MVP Regular Contributor
MVP Regular Contributor

I think that is a great idea. Let me know if you are open to making it a joint blog or allowing some input from others to add to it. I often use the comprehension to create dictionaries while simultaneously having another dictionary to field map different schemas on the fly. That significantly reduces the amount of effort when it comes to migrating data or running specific analyses.

HaydenWelch
MVP Regular Contributor

I'll set up a repo on GitHub, DM me your username and I'll add you as a collaborator!

ShariF
by
Regular Contributor

I find cursors to be very confounding although I read they can be very powerful. Here's a use case. We have data we pull from a 3rd party app, do GIS magic, and put it back up into our geodatabase daily. We do this for both code violations and mapping certificates of occupancy for a local municipality.  Currently the workflow using a python script is:  pull data from a 3rd party app, do GIS magic, delete all (rows) of data in geodatabase (this can be over 20K records at a time)  and then append the new data, even though we KNOW some of that data hasn't changed. It would be great (and more efficient and better coding) to use an update cursor to update the values that have changed and then an insert cursor for those items that are new. 

 

HaydenWelch
MVP Regular Contributor

Here's a pretty lightweight implementation of a TableUpdater class that you could use. It leverages the 'EDITED@' token to compare update timestamps and then applies updates to existing rows with an update cursor and inserts rows that don't map to the target.

from datetime import datetime

from arcpy import (
    Exists, 
    Describe,
    Geometry,
)

from typing import (
    Generator,
    Any,
)

from arcpy.da import (
    Editor,
    SearchCursor,
    UpdateCursor,
    InsertCursor,
)

import arcpy.typing.describe as typedesc

from dataclasses import dataclass

@dataclass
class Changes:
    updates: list[int]
    inserts: list[int]
    deletes: list[int]
           
class TableUpdater: 
    
    Row = tuple[int, Geometry, datetime] # Type first 3 values of Row records
     
    def __init__(self, source_table: str, target_table: str) -> None:
        if not Exists(source_table) and Exists(target_table):
            raise ValueError('Invalid table paths!')
        
        self.target_table = target_table
        self.target_desc: typedesc.FeatureClass = Describe(target_table)
        
        self.source_table = source_table
        self.source_desc: typedesc.FeatureClass = Describe(source_table)
        
        # Do some filering to make sure we can use OID@ and SHAPE@ for the table cursors
        # By default a cursor will return SHAPE@XY
        self.source_shape_field = self.source_desc.shapeFieldName
        self.source_oid_field = self.source_desc.OIDFieldName
        self.source_fields = [field.name for field in self.source_desc.fields]
        
        self.target_shape_field = self.target_desc.shapeFieldName
        self.target_oid_field = self.target_desc.OIDFieldName
        self.target_fields = [field.name for field in self.target_desc.fields]
        
        self.target_editor = Editor(self.target_desc.workspace.catalogPath)
                
        self.fields = ['OID@', 'SHAPE@', 'EDITED@'] + [
            field
            for field in 
            set(self.source_fields) | set(self.target_fields)
            if field not in [self.source_oid_field, self.target_oid_field, self.source_shape_field, self.target_shape_field]
        ]
        
        self._table_diff = None
        
    @property
    def source_state(self):
        return self._gather_rows(self.source_table)
    
    @property
    def target_state(self):
        return self._gather_rows(self.target_table)
    
    @property
    def table_diff(self):
        if not self._table_diff:
            self._table_diff = self._get_table_diff()
        return self._table_diff
    
    def _gather_rows(self, table: str) -> dict[int, str]:
        """Gather all rows and their last updated times
        
        Parameters:
            table (str): Path to the table that needs updating
            
        Returns:
            ( dict[int, str] ): Mapping of OID to updated_at string
        """
        return {oid: updated_at for oid, updated_at in SearchCursor(table, ['OID@', 'EDITED@'])}
    
    def _get_table_diff(self) -> Changes:
        """Filter out records from the new_table that are unchanged from the old_table
        
        Returns:
            a list of OIDs in the new_table that need to be 
        """
        old_rows = self.target_state
        new_rows = self.source_state
        return Changes(
            updates=list(oid for oid in new_rows if oid in old_rows and old_rows[oid] < new_rows[oid]), 
            inserts=list(new_rows.keys() - old_rows.keys()), 
            deletes=list(old_rows.keys() - new_rows.keys()),
        )
    
    # Change Detection methods
    def _inserts(self) -> list[tuple[Any, ...]] | None:
        inserts = self.table_diff.inserts
        if inserts:
            if len(inserts) == 1:
                where_clause = f"{self.source_oid_field} = {inserts[0]}"
            else:    
                where_clause = f"{self.source_oid_field} IN ({','.join(map(str, inserts))})"
                
            return [row for row in SearchCursor(self.source_table, self.fields, where_clause=where_clause)]
    
    def _updates(self) -> dict[int, tuple[Any, ...]] | None:
        updates = self.table_diff.updates
        if updates:
            if len(updates) == 1:
                where_clause=f"{self.source_oid_field} = ({updates[0]})"
            else:
                where_clause=f"{self.source_oid_field} IN ({','.join(map(str, updates))})"
                
            return {
                row[0]: row
                for row in SearchCursor(self.source_table, self.fields, where_clause=where_clause)
            }
    
    def _deletes(self) -> list[int] | None:
        return self.table_diff.deletes or None
    
    def apply_changes(self) -> dict[str, list[int]]:
        """Applies all changed rows to target dataset
        
        Returns:
            ( dict[str, list[int]] ): A dictionary with 'updates' and 'insert' keys that have a list of oids updated or inserted
        """
        changes: dict[str, list[int]] = {
            'updates': [],
            'inserts': [],
            'deletes': [],
        }
        
        if ( updates := self._updates() ):
            if len(updates) == 1:
                oid = list(updates.keys())[0]
                where_clause = f"{self.target_oid_field} = {oid}"
            else:
                where_clause = f"{self.target_oid_field} IN ({','.join(map(str, updates.keys()))})"
            with self.target_editor, UpdateCursor(self.target_table, self.fields, where_clause=where_clause) as cur:
                changes['updates'] = [cur.updateRow(updates[oid]) or int(oid) for oid, *_ in cur]
        
        if ( inserts := self._inserts() ):
            with self.target_editor, InsertCursor(self.target_table, self.fields) as cur:
                changes['inserts'] = [cur.insertRow(row) for row in inserts]
        
        if ( deletes := self._deletes() ):
            if len(deletes) == 1:
                oid = deletes[0]
                where_clause = f"{self.target_oid_field} = {oid}"
            else:
                where_clause = f"{self.target_oid_field} IN ({','.join(map(str, deletes))})"
            with self.target_editor, UpdateCursor(self.target_table, self.fields, where_clause=where_clause) as cur:
                changes['deletes'] = [cur.deleteRow() or oid for oid, *_ in cur]
        
        return changes
    
def main():
    from pprint import pprint
    source = r'source_table'
    target = r'target_table'
    
    table_updater = TableUpdater(source, target)
    changes = table_updater.apply_changes()
    print(f"Updated {len(changes['updates'])} rows")
    print(f"Inserted {len(changes['inserts'])} rows")
    print(f"Deleted {len(changes['deletes'])} rows")
    pprint(changes)
    
if __name__ == '__main__':
    main()

 

You can use that as a rough base for what you're describing. I tried to use some advanced techniques in there so you can see the creative ways to use a cursor.

0 Kudos
ShariF
by
Regular Contributor

😮wow. Thank you. That looks daunting but I'll see if I can figure it out. I hope to see your blog about cursors.