Select to view content in your preferred language

How to use Python to report version differences?

244
9
a week ago
kduerksen_nwic
Emerging Contributor

I am running ArcGIS Pro 3.4.3, with its Python 3.11 interpreter. I posted this same question in Data Management, but this seems the better place for this question. I have created a tool written in Python that takes a SDE file as input and loops through the child versions listed within that SDE file, trying to gather and report the version differences (inserts, updates, deletes) for each of those child versions. The code reports different difference data and values for each of the different child versions, but does not appear to even closely resemble the difference information reported within ArcGIS Pro's "Version Changes" tool button.

For example, within ArcGIS Pro the Version Differences for the version I'm testing the code on displays:

dbo.report_points

--Delete (1)

dbo.report_polys:

--Insert (3)

--Delete (7)

--Update (1)

dbo.report_other:

--Delete (1)

--Update (1)

dbo.report_cfmou

--Insert (1)

 

However, my code's output (debug and other messages) displays:

Processing Feature Layer: DBO.informal_resource_lines
my_layer=DBO.informal_resource_lines, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.informal_resource_lines
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.informal_resource_lines':
Added features: 0
Deleted features: 0
Updated features: 2
Processing Feature Layer: DBO.informal_resource_points
my_layer=DBO.informal_resource_points, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.informal_resource_points
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.informal_resource_points':
Added features: 15
Deleted features: 6
Updated features: 1
Processing Feature Layer: DBO.informal_resource_polys
my_layer=DBO.informal_resource_polys, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.informal_resource_polys
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.informal_resource_polys':
Added features: 23
Deleted features: 8
Updated features: 3
Processing Feature Layer: DBO.report_lines
my_layer=DBO.report_lines, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.report_lines
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.report_lines':
Added features: 5
Deleted features: 193
Updated features: 15
Processing Feature Layer: DBO.report_points
my_layer=DBO.report_points, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.report_points
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.report_points':
Added features: 26
Deleted features: 201
Updated features: 14
Processing Feature Layer: DBO.report_polys
my_layer=DBO.report_polys, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.report_polys
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.report_polys':
Added features: 123
Deleted features: 4526
Updated features: 162
Processing Feature Layer: DBO.report_aprxloc
my_layer=DBO.report_aprxloc, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.report_aprxloc
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.report_aprxloc':
Added features: 25
Deleted features: 449
Updated features: 34
Processing Feature Layer: DBO.report_other
my_layer=DBO.report_other, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.report_other
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.report_other':
Added features: 28
Deleted features: 310
Updated features: 22
Processing Feature Layer: DBO.report_restricted
my_layer=DBO.report_restricted, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.report_restricted
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.report_restricted':
Added features: 0
Deleted features: 0
Updated features: 0
Processing Feature Layer: DBO.report_cfmou
my_layer=DBO.report_cfmou, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.report_cfmou
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.report_cfmou':
Added features: 1
Deleted features: 0
Updated features: 0
Processing Feature Layer: DBO.resource_districts
my_layer=DBO.resource_districts, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.resource_districts
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.resource_districts':
Added features: 1
Deleted features: 24
Updated features: 12
Processing Feature Layer: DBO.resource_lines
my_layer=DBO.resource_lines, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.resource_lines
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.resource_lines':
Added features: 21
Deleted features: 388
Updated features: 76
Processing Feature Layer: DBO.resource_points
my_layer=DBO.resource_points, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.resource_points
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.resource_points':
Added features: 42
Deleted features: 1055
Updated features: 21
Processing Feature Layer: DBO.resource_polys
my_layer=DBO.resource_polys, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.resource_polys
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.resource_polys':
Added features: 135
Deleted features: 2627
Updated features: 353
Processing Feature Layer: DBO.resource_aprxloc
my_layer=DBO.resource_aprxloc, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.resource_aprxloc
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.resource_aprxloc':
Added features: 58
Deleted features: 627
Updated features: 103
Processing Feature Layer: DBO.resource_restricted
my_layer=DBO.resource_restricted, ver_name.name="NWIC\RINERG".GR_report_voids_1, parent ver=dbo.DEFAULT
Creating Feature Class DBO.resource_restricted
Comparison between 'dbo.DEFAULT' and '"NWIC\RINERG".GR_report_voids_1' for feature 'DBO.resource_restricted':
Added features: 1
Deleted features: 2
Updated features: 5
"NWIC\RINERG".GR_report_voids_1 Version Changes = 11743

 

If someone can review my code and pass along any tips regarding what I am missing or doing incorrectly, it would be appreciated.

The main code I have performing the version processing for differences is as follows:

    # start the outer loop through the list of versions
    for ver_name in versions_list:
        if ver_name in db_roots:															# check for versions to ignore
            arcpy.AddMessage("**SKIPPING** ROOT VERSION {0}".format(ver_name.name))
            continue

        arcpy.AddMessage("PROCESSING VERSION {0}".format(ver_name.name))
        VersionChangesTally = 0

        for curr_feature in feature_list:
            arcpy.AddMessage("Processing Feature Layer: {0}".format(curr_feature))

            arcpy.AddMessage("my_layer={0}, ver_name.name={1}, parent ver={2}".format(curr_feature, ver_name.name, ver_name.parentVersionName))

            # Get add, update, delete counts
            # --- Define workspace and versions ---
            parent_version = "dbo.DEFAULT" 
            child_version = ver_name.name

            # --- Create temporary table views for comparison ---
            parent_view = "parent_view"
            child_view = "child_view"

            # Ensure temporary views are cleared
            if arcpy.Exists(parent_view):
                arcpy.Delete_management(parent_view)
            if arcpy.Exists(child_view):
                arcpy.Delete_management(child_view)

            arcpy.AddMessage("Creating Feature Class {0}".format(curr_feature))

            # Make table views for the parent and child versions
            try:
                # Set the workspace to the parent version and make a table view
                arcpy.env.workspace = sde_file
                parent_layer = arcpy.management.MakeFeatureLayer(curr_feature, parent_view)
                arcpy.ChangeVersion_management(parent_view, "TRANSACTIONAL", parent_version)

                # Set the workspace to the child version and make a table view
                child_layer = arcpy.management.MakeFeatureLayer(curr_feature, child_view)
                arcpy.ChangeVersion_management(child_view, "TRANSACTIONAL", child_version)
                arcpy.MakeTableView_management(curr_feature, child_view) # NOTE: this code seems to have no effect on execution

                # --- Extract OIDs into Python sets for efficient comparison ---
                parent_oids = set()
                with arcpy.da.SearchCursor(parent_view, "OBJECTID") as cursor:
                    for row in cursor:
                        parent_oids.add(row[0])

                child_oids = set()
                with arcpy.da.SearchCursor(child_view, "OBJECTID") as cursor:
                    for row in cursor:
                        child_oids.add(row[0])
                try:
                    add_oids = child_oids - parent_oids
                    adds = len(add_oids)

                    del_oids = parent_oids - child_oids
                    deletes = len(del_oids)

                    update_oids = set() # empty set to be added to below
                    updates = 0

                    common_oids = parent_oids.intersection(child_oids)
                finally:
                    parent_oids.clear()
                    child_oids.clear()

                # A more robust update check would involve comparing a field value
                # (e.g., a last_edited_date) or hashing the row values.
                with arcpy.da.SearchCursor(parent_view, ["OBJECTID", "SHAPE@JSON"]) as p_cursor, \
                        arcpy.da.SearchCursor(child_view, ["OBJECTID", "SHAPE@JSON"]) as c_cursor:
                    parent_dict = {row[0]: row[1] for row in p_cursor if row[0] in common_oids}
                    child_dict = {row[0]: row[1] for row in c_cursor if row[0] in common_oids}

                with arcpy.da.SearchCursor(child_view, ["OBJECTID", "OtherID"]) as c_cursor:
                    child_otherid_dict = {row[0]: row[1] for row in c_cursor} #if row[0] in common_oids}

                with arcpy.da.SearchCursor(parent_view, ["OBJECTID", "OtherID"]) as p_cursor:
                    parent_otherid_dict = {row[0]: row[1] for row in p_cursor} #if row[0] in common_oids}

                for oid in common_oids:
                    if parent_dict[oid] != child_dict[oid]:
                        updates += 1
                        update_oids.add(oid)

                # --- Report results ---
                arcpy.AddMessage(f"Comparison between '{parent_version}' and '{child_version}' for feature '{curr_feature}':")
                arcpy.AddMessage(f"Added features: {adds}")
                arcpy.AddMessage(f"Deleted features: {deletes}")
                arcpy.AddMessage(f"Updated features: {updates}")

                # innermost loop: go through the 3 kinds of differences (inserts, deletes, updates)
                for oid in del_oids:
                    VersionChangesTally += 1
                    Primary = "<unknown>"

                    xls_file.row(child_version, curr_feature, "Deletes", Primary, oid, parent_otherid_dict[oid]) #"other")

                for oid in update_oids:
                    VersionChangesTally += 1
                    Primary = "<unknown>"
                    xls_file.row(child_version, curr_feature, "Updates", Primary, oid, child_otherid_dict[oid]) #"other")

                for oid in add_oids:
                    VersionChangesTally += 1
                    Primary = "<unknown>"
                    xls_file.row(child_version, curr_feature, "Inserts", Primary, oid, child_otherid_dict[oid]) #"other")


                #arcpy.AddMessage(f"VERSION CHANGE COUNT: {VersionChangesTally}")

            except arcpy.ExecuteError:
                arcpy.AddMessage(arcpy.GetMessages(2))
            finally:
                # Clean up temporary table views
                if arcpy.Exists(parent_view):
                    arcpy.Delete_management(parent_view)
                if arcpy.Exists(child_view):
                    arcpy.Delete_management(child_view)

        # report tallied changes
        arcpy.AddMessage("{0} Version Changes = {1}".format(ver_name.name, VersionChangesTally))

 

0 Kudos
9 Replies
TonyAlmeida
MVP Regular Contributor

A couple things:

1. Use  ChangeVersion_management tool

2. Compare all fields, not just geometry and one field

3. Use GlobalIDs if available (more reliable than OIDs for versioning)

0 Kudos
kduerksen_nwic
Emerging Contributor

Thanks for the feedback, but I don't see anything in arcpy resembling "CompareVersions_management" (or arcpy.manangement.CompareVersions) to call in Python code.

0 Kudos
TonyAlmeida
MVP Regular Contributor

My bad, I meant ChangeVersion_management, which you are using.

List all versions like below or call a specific version by name or GlobalID.

You must first start_reading on a version, generate a dictionary of changes, and then stop_reading.

from arcgis.gis import GIS
from arcgis.features import _version

my_gis = GIS(URL, Username, Password) #many options depending on authentication type
my_vms = r'https://gis.***.com/un/rest/services/***/VersionManagementServer'
VersionManager = _version.VersionManager(my_vms, my_gis)
VersionList = VersionManager.all

for Version in VersionList:
    if not Version.properties["isLocked"]:
        Version.start_reading()
        Differences = Version.differences(result_type='objectIds', moment=None) 
        Version.stop_reading()
        if 'differences' in Differences:
            print(Differences)
        else:
            print('No changes: '+ str(Version))

 

0 Kudos
kduerksen_nwic
Emerging Contributor

Thanks again for your feedback and code example, but the data I'm working with is different. As noted in my original post, I am taking an SDE file as a parameter to this ArcGIS Pro Python-based tool. I am not dealing with web-based data, so I don't have a URL, username, and password to pass along to GIS(). I properly round up the list of versions I have available in my SDE file with the following code:

versions_list = []
    for version in arcpy.da.ListVersions(sde_file):
        # The version.parent property will be None for the DEFAULT version
        # It will contain the name of the parent version for all other versions
        if version.parentVersionName and version.parentVersionName == "dbo.DEFAULT":
            # Add to list of versions to report on
            versions_list.append(version)

However, the version information and its object properties are not the same as the code example you have. There is no start/stop reading method, and no differences method.

Your feedback, along with one other user's tip, did make me wonder about the date the child version was created and how it relates to the parent version. Since the child "version" information I am working with does have a created date, and ChangeVersion_management() does have a date parameter, I tried passing the child version's date of creation like so:

arcpy.ChangeVersion_management(parent_view, "TRANSACTIONAL", parent_version, ver_name.created.date())

...but I still did not get any different result.

0 Kudos
HaydenWelch
MVP Regular Contributor

If you want to do manual checks, I've written a simple table diff class that will handle checking for changes using edit_date:

from datetime import datetime

from arcpy import (
    Exists, 
    Describe,
    Geometry,
)

from typing import (
    Any,
)

from arcpy.da import (
    Editor,
    SearchCursor,
    UpdateCursor,
    InsertCursor,
)

import arcpy.typing.describe as typedesc

from dataclasses import dataclass

@dataclass
class Changes:
    updates: list[int]
    inserts: list[int]
    deletes: list[int]
           
class TableUpdater: 
    
    Row = tuple[int, Geometry, datetime] # Type first 3 values of Row records
    
    def __init__(self, source_table: str, target_table: str) -> None:
        if not Exists(source_table) and Exists(target_table):
            raise ValueError('Invalid table paths!')
        
        self.target_table = target_table
        self.target_desc: typedesc.FeatureClass = Describe(target_table)
        
        self.source_table = source_table
        self.source_desc: typedesc.FeatureClass = Describe(source_table)
        
        # Do some filering to make sure we can use OID@ and SHAPE@ for the table cursors
        # By default a cursor will return SHAPE@XY
        self.source_shape_field = self.source_desc.shapeFieldName
        self.source_oid_field = self.source_desc.OIDFieldName
        self.source_fields = [field.name for field in self.source_desc.fields]
        
        self.target_shape_field = self.target_desc.shapeFieldName
        self.target_oid_field = self.target_desc.OIDFieldName
        self.target_fields = [field.name for field in self.target_desc.fields]
        
        self.target_editor = Editor(self.target_desc.workspace.catalogPath)
        
        self.fields = ['OID@', 'SHAPE@', 'EDITED@'] + [
            field
            for field in 
            set(self.source_fields) | set(self.target_fields)
            if field not in [self.source_oid_field, self.target_oid_field, self.source_shape_field, self.target_shape_field]
        ]
        
        self._table_diff = None
        
    @property
    def source_state(self):
        return self._gather_rows(self.source_table)
    
    @property
    def target_state(self):
        return self._gather_rows(self.target_table)
    
    @property
    def table_diff(self):
        if not self._table_diff:
            self._table_diff = self._get_table_diff()
        return self._table_diff
    
    def _gather_rows(self, table: str) -> dict[int, str]:
        """Gather all rows and their last updated times
        
        Parameters:
            table (str): Path to the table that needs updating
            
        Returns:
            ( dict[int, str] ): Mapping of OID to updated_at string
        """
        return {oid: updated_at for oid, updated_at in SearchCursor(table, ['OID@', 'EDITED@'])}
    
    def _get_table_diff(self) -> Changes:
        """Filter out records from the new_table that are unchanged from the old_table
        
        Returns:
            a list of OIDs in the new_table that need to be 
        """
        old_rows = self.target_state
        new_rows = self.source_state
        return Changes(
            updates=list(oid for oid in new_rows if oid in old_rows and old_rows[oid] < new_rows[oid]), 
            inserts=list(new_rows.keys() - old_rows.keys()), 
            deletes=list(old_rows.keys() - new_rows.keys()),
        )
    
    # Change Detection methods
    def _inserts(self) -> list[tuple[Any, ...]] | None:
        inserts = self.table_diff.inserts
        if inserts:
            if len(inserts) == 1:
                where_clause = f"{self.source_oid_field} = {inserts[0]}"
            else:    
                where_clause = f"{self.source_oid_field} IN ({','.join(map(str, inserts))})"
                
            return [row for row in SearchCursor(self.source_table, self.fields, where_clause=where_clause)]
    
    def _updates(self) -> dict[int, tuple[Any, ...]] | None:
        updates = self.table_diff.updates
        if updates:
            if len(updates) == 1:
                where_clause=f"{self.source_oid_field} = ({updates[0]})"
            else:
                where_clause=f"{self.source_oid_field} IN ({','.join(map(str, updates))})"
                
            return {
                row[0]: row
                for row in SearchCursor(self.source_table, self.fields, where_clause=where_clause)
            }
    
    def _deletes(self) -> list[int] | None:
        return self.table_diff.deletes or None
    
    def apply_changes(self) -> dict[str, list[int]]:
        """Applies all changed rows to target dataset
        
        Returns:
            ( dict[str, list[int]] ): A dictionary with 'updates' and 'insert' keys that have a list of oids updated or inserted
        """
        changes: dict[str, list[int]] = {
            'updates': [],
            'inserts': [],
            'deletes': [],
        }
        
        if ( updates := self._updates() ):
            if len(updates) == 1:
                oid = list(updates.keys())[0]
                where_clause = f"{self.target_oid_field} = {oid}"
            else:
                where_clause = f"{self.target_oid_field} IN ({','.join(map(str, updates.keys()))})"
            with self.target_editor, UpdateCursor(self.target_table, self.fields, where_clause=where_clause) as cur:
                changes['updates'] = [cur.updateRow(updates[oid]) or int(oid) for oid, *_ in cur]
        
        if ( inserts := self._inserts() ):
            with self.target_editor, InsertCursor(self.target_table, self.fields) as cur:
                changes['inserts'] = [cur.insertRow(row) for row in inserts]
        
        if ( deletes := self._deletes() ):
            if len(deletes) == 1:
                oid = deletes[0]
                where_clause = f"{self.target_oid_field} = {oid}"
            else:
                where_clause = f"{self.target_oid_field} IN ({','.join(map(str, deletes))})"
            with self.target_editor, UpdateCursor(self.target_table, self.fields, where_clause=where_clause) as cur:
                changes['deletes'] = [cur.deleteRow() or oid for oid, *_ in cur]
        
        # Clear the diff after operations are completed in case the object is re-used later
        self._table_diff = None
        return changes

 

It's meant to be an example for advanced cursor usage, but you're welcome to use it and expand on it for your use case. Or just copy parts out that you need

0 Kudos
kduerksen_nwic
Emerging Contributor

At a glance, that code you've assembled looks like it will be useful. I'll have to implement it in my project and see if I can gather the info I'm needing. Thanks for your feedback and code.

HaydenWelch
MVP Regular Contributor

This code is looking exclusively at last edit date for the diff, so you'll need to make sure that all tables you want to do a diff on have that field and have it managed by the system.

This was originally written to show how to keep one table in sync with another assuming that both have an objectid sequence that is synced. 

0 Kudos
kduerksen_nwic
Emerging Contributor

Thanks for the additional clarification about last edit date. I've implemented the code in my project, and I'm testing its usage for my needs. I'll have more feedback on whether I got my issue solved in a while.

0 Kudos
kduerksen_nwic
Emerging Contributor

Thanks again for the code to try out. I cannot get the results I'm trying to achieve using that example, nor can I get my original code manipulated enough to get the desired results. I all changes and testing, I don't get any insert/update/delete counts that nearly match the version changes in the ArcGIS Pro window. In all tests of all the coding methods, I got child versions reporting changes where there were actually none, and other cases where child versions had changes but were reported as having zero changes.

It's Friday afternoon and I'm brain dead. Thanks again for the input and code example. I'm not giving up on this, but need a break.

0 Kudos