Append only ids that don't exist in another class

ThomasHamming · ‎02-03-2022

I have 2 points classes with the same attributes, and I am looking to append data from one to the other. This is simple enough, but the first class has a lot of data that is already in the second class. I am looking to append only the rows/ids that don't exist in the other class already, and I can't seem to find an easy way to do this in pro. Any idea on how to do this?

For example, if this were a program it might look something like this:

for row in class1:

if row.id not in class2:

class2.append(row)

RexRobichaux · ‎08-11-2022

Hello Johannes,

Awesome and elegant solution! I'm trying to use this script to achieve a similar delta load between two feature classes but I'm running into an error I can't seem to get past. Regardless of how I try to structure the WHERE clause...arcpy continues to throw the 000358 error code indicating something isn't right with the SQL syntax. I'm printing out the (large) where clause (I have about 18k unique STATION_IDs that it's parsing through)...but everything looks find from a syntax perspective. After two days of testing/tweaking...I'm about to give up. Do you see anything amiss here? For development, both feature classes are local in a file geodatabase. I've tested in Python 3.x and 2.x and receive the same error in either environment.

import arcpy

target_class = "C:\\Rex\\WQM_STATIONS_FINAL\\WQM_STATIONS_DELTA_LOAD.gdb\\WQM_STATIONS_FINAL"
append_class = "C:\\Rex\\WQM_STATIONS_FINAL\\WQM_STATIONS_DELTA_LOAD.gdb\\WQM_STATIONS"
id_field = "STATION_ID"

# read the ids that are already in the target class
existing_ids = [row[0] for row in arcpy.da.SearchCursor(target_class, [id_field])]

# create a SQL where clause "STATION_ID NOT IN (1, 2, 3)"
if isinstance(existing_ids[0], str):
    id_list = ["'{}'".format(i) for i in existing_ids]
    #print id_list
else:
    id_list = ["'{}'".format(i) for i in existing_ids]
    #print id_list
where_clause = '{} NOT IN ({})'.format(id_field, ", ".join(id_list))
print (where_clause)
# create a layer of all append features that are not in the target class, append
append_layer = arcpy.management.MakeFeatureLayer(append_class,  "append_layer", where_clause)
arcpy.management.Append(append_layer, target_class)

Thanks Johannes (and/or anyone else) that might have any ideas!

JohannesLindner · ‎08-11-2022

Syntax looks fine. Could maybe be something with your weird id format.

Is the field in the append class really called STATION_ID? Are you able to use the query in the Select By Attributes tool?

Can you post the gdb, either here or in a pm?

Have a great day!
Johannes

RexRobichaux · ‎08-12-2022

Thanks for the prompt response Johannes! Correct- the field name in both feature classes is "STATION_ID". I've tried a subset selection in both ArcGIS Pro and ArcMap and both worked fine (with say 4-5 stations)... which further made me scratch my head. I've attached the zip scratch/dev file geodatabase to this post to see if that helps. If/when I get this working and pushed to test/prod, the WQM_STATIONS feature class will be access/stored in an enterprise Oracle geodatabase, however the target FC (WQM_STATIONS_FINAL) will still be in a FGDB.

PS- it's expected for there to be a good deal more attributes in the WQM_STATIONS_FINAL FC...I'll likely have to make a field mapping at some point as well for the append if I can just get past this SQL expression error. Thanks again for offering to take a look at this!

JohannesLindner · ‎08-12-2022

Yeah, it's your id values. Some of them contain apostrophes/single quotes. These characters are used in SQL to denote string values, so the where clause got corrupted.

For example, in this part, it recognizes the commas as string values and everything you actually want to select as mumbo jumbo:

(for the record, putting the where clause in there froze my PC for over 5 minutes. Can't imagine why, it's only 404.000 characters long :D)

Well, turns out there is only one id with an apostrophe:

[i for i in existing_ids if "'" in i]
["YOUNG'S POND"]

So either fix that id or make the code safer. Online search suggests that to escape a single quote in SQL, you just put another single quote in front of it.

So the code gets a tiny bit more complicated (replace call in line 12):

import arcpy

target_class = "C:\\Rex\\WQM_STATIONS_FINAL\\WQM_STATIONS_DELTA_LOAD.gdb\\WQM_STATIONS_FINAL"
append_class = "C:\\Rex\\WQM_STATIONS_FINAL\\WQM_STATIONS_DELTA_LOAD.gdb\\WQM_STATIONS"
id_field = "STATION_ID"

# read the ids that are already in the target class
existing_ids = [row[0] for row in arcpy.da.SearchCursor(target_class, [id_field])]

# create a SQL where clause "STATION_ID NOT IN (1, 2, 3)"
if isinstance(existing_ids[0], str):
    id_list = ["'{}'".format(i.replace("'", "''")) for i in existing_ids]
    #print id_list
else:
    id_list = ["'{}'".format(i) for i in existing_ids]
    #print id_list
where_clause = '{} NOT IN ({})'.format(id_field, ", ".join(id_list))
print (where_clause)
# create a layer of all append features that are not in the target class, append
append_layer = arcpy.management.MakeFeatureLayer(append_class,  "append_layer", where_clause)
arcpy.management.Append(append_layer, target_class)

And it works!

(And then the Append throws an exception, because your fields don't match, but that's besides the point...)

Have a great day!
Johannes

RexRobichaux · ‎08-12-2022

Awesome- thanks for sniffing that issue out Johannes...I suspected there might be a illicit character or two in there somewhere. This gets me up and running.

It's always surprised me there wasn't an off the shelf GP tool to do a delta compare/load as it's such a common ETL scenario (for us at least). Thanks again for the help and this nifty solution!

PeterDouglassEPD · ‎07-10-2023

Hello,

Is there a way ion which the syntax can be run but doesn't create a new feature layer, and just appends the target dataset? It may already do this, I'm a python novice

JohannesLindner · ‎07-11-2023

doesn't create a new feature layer

The three easiest ways to solve the original question in Python are:

create a layer with a definition query and append that layer (this is what my original answer does)
create a layer of the whole feature class, select the wanted rows and append that layer
do it all with arcpy.da.*Cursors

Options one and two create a temporal layer. If you run the script in the Python Window of an ArcGIS Project, that layer will be added to the current map. If that is your problem, you can add these lines to the end to remove the layer.

try:
    m = arcpy.mp.ArcGISProject("current").activeMap
    lyr = m.listLayers("append_layer")[0]
    m.removeLayer(lyr)
except:
    print("Could not remove the temp layer")

If you don't want to create the layer for a different reason, you can use option 3 (untested, will probably be slower for large datasets):

target_class = "path:/to/class_2"
append_class = "path:/to/class_1"
id_field = "RowID"

# read the ids that are already in the target class
existing_ids = [row[0] for row in arcpy.da.SearchCursor(target_class, [id_field])]

# create an SQL where clause "IdField NOT IN (1, 2, 3)"
if isinstance(existing_ids[0], str):
    id_list = ["'{}'".format(i) for i in existing_ids]
else:
    id_list = [str(i) for i in existing_ids]
where_clause = "{} NOT IN ({})".format(id_field, ", ".join(id_list))

# get a list of all field names
fields = [f.name for f in arcpy.ListFields(target_class)]
# exclude some of those fields to avoid raising errors
# replace the field "Shape" with "SHAPE@"
exclude_fields = ["OBJECTID", "GlobalID", "Shape_Length", "Shape_Area"]
fields = [f for f in fields if f not in exclude_fields]
fields = ["SHAPE@" if f == "Shape" else f for f in fields]

# start inserting into the target class
with arcpy.da.InsertCursor(target_class, fields) as i_cursor:
    # start reading from the append_class, use the query we defined earlier
    with arcpy.da.SearchCursor(append_class, fields, where_clause) as s_cursor:
        # copy the rows
        for row in s_cursor:
            i_cursor.insertRow(row)

Have a great day!
Johannes

PeterDouglassEPD · ‎09-15-2023

Hi Johannes,

This for the code, but it is returning this error:

NameError                                 Traceback (most recent call last)
In  [3]:
Line 18:    fields = [f.name for f in arcpy.ListFields(fc)]

NameError: name 'fc' is not defined

JohannesLindner · ‎09-15-2023

Oops, used a wrong variable name... fixed

Have a great day!
Johannes

PeterDouglassEPD · ‎09-22-2023

Perfect, thanks Johannes