Select to view content in your preferred language

Append only ids that don't exist in another class

4221
19
Jump to solution
02-03-2022 07:22 AM
ThomasHamming
Emerging Contributor

I have 2 points classes with the same attributes, and I am looking to append data from one to the other. This is simple enough, but the first class has a lot of data that is already in the second class. I am looking to append only the rows/ids that don't exist in the other class already, and I can't seem to find an easy way to do this in pro. Any idea on how to do this?

For example, if this were a program it might look something like this:

for row in class1:

if row.id not in class2:

class2.append(row)

0 Kudos
19 Replies
RexRobichaux
Frequent Contributor

Hello Johannes, 

 

 Awesome and elegant solution! I'm trying to use this script to achieve a similar delta load between two feature classes but I'm running into an error I can't seem to get past. Regardless of how I try to structure the WHERE clause...arcpy continues to throw the 000358 error code indicating something isn't right with the SQL syntax. I'm printing out the (large) where clause (I have about 18k unique STATION_IDs that it's parsing through)...but everything looks find from a syntax perspective.  After two days of testing/tweaking...I'm about to give up. Do you see anything amiss here? For development, both feature classes are local in a file geodatabase. I've tested in Python 3.x and 2.x and receive the same error in either environment. 

 

 

 

 

import arcpy

target_class = "C:\\Rex\\WQM_STATIONS_FINAL\\WQM_STATIONS_DELTA_LOAD.gdb\\WQM_STATIONS_FINAL"
append_class = "C:\\Rex\\WQM_STATIONS_FINAL\\WQM_STATIONS_DELTA_LOAD.gdb\\WQM_STATIONS"
id_field = "STATION_ID"

# read the ids that are already in the target class
existing_ids = [row[0] for row in arcpy.da.SearchCursor(target_class, [id_field])]

# create a SQL where clause "STATION_ID NOT IN (1, 2, 3)"
if isinstance(existing_ids[0], str):
    id_list = ["'{}'".format(i) for i in existing_ids]
    #print id_list
else:
    id_list = ["'{}'".format(i) for i in existing_ids]
    #print id_list
where_clause = '{} NOT IN ({})'.format(id_field, ", ".join(id_list))
print (where_clause)
# create a layer of all append features that are not in the target class, append
append_layer = arcpy.management.MakeFeatureLayer(append_class,  "append_layer", where_clause)
arcpy.management.Append(append_layer, target_class)

 

 

 

 

RexRobichaux_0-1660244105414.png

 

Thanks Johannes (and/or anyone else) that might have any ideas!

 

0 Kudos
JohannesLindner
MVP Frequent Contributor

Syntax looks fine. Could maybe be something with your weird id format.

Is the field in the append class really called STATION_ID? Are you able to use the query in the Select By Attributes tool?

Can you post the gdb, either here or in a pm?


Have a great day!
Johannes
RexRobichaux
Frequent Contributor

Thanks for the prompt response Johannes! Correct- the field name in both feature classes is "STATION_ID". I've tried a subset selection in both ArcGIS Pro and ArcMap and both worked fine (with say 4-5 stations)... which further made me scratch my head. I've attached the zip scratch/dev file geodatabase to this post to see if that helps. If/when I get this working and pushed to test/prod, the WQM_STATIONS feature class will be access/stored in an enterprise Oracle geodatabase, however the target FC (WQM_STATIONS_FINAL) will still be in a FGDB.

PS- it's expected for there to be a good deal more attributes in the WQM_STATIONS_FINAL FC...I'll likely have to make a field mapping at some point as well for the append if I can just get past this SQL expression error. Thanks again for offering to take a look at this!

0 Kudos
JohannesLindner
MVP Frequent Contributor

Yeah, it's your id values. Some of them contain apostrophes/single quotes. These characters are used in SQL to denote string values, so the where clause got corrupted.

For example, in this part, it recognizes the commas as string values and everything you actually want to select as mumbo jumbo:

JohannesLindner_0-1660314404543.png

(for the record, putting the where clause in there froze my PC for over 5 minutes. Can't imagine why, it's only 404.000 characters long :D)

 

Well, turns out there is only one id with an apostrophe:

[i for i in existing_ids if "'" in i]
["YOUNG'S POND"]

 

So either fix that id or make the code safer. Online search suggests that to escape a single quote in SQL, you just put another single quote in front of it.

So the code gets a tiny bit more complicated (replace call in line 12):

import arcpy

target_class = "C:\\Rex\\WQM_STATIONS_FINAL\\WQM_STATIONS_DELTA_LOAD.gdb\\WQM_STATIONS_FINAL"
append_class = "C:\\Rex\\WQM_STATIONS_FINAL\\WQM_STATIONS_DELTA_LOAD.gdb\\WQM_STATIONS"
id_field = "STATION_ID"

# read the ids that are already in the target class
existing_ids = [row[0] for row in arcpy.da.SearchCursor(target_class, [id_field])]

# create a SQL where clause "STATION_ID NOT IN (1, 2, 3)"
if isinstance(existing_ids[0], str):
    id_list = ["'{}'".format(i.replace("'", "''")) for i in existing_ids]
    #print id_list
else:
    id_list = ["'{}'".format(i) for i in existing_ids]
    #print id_list
where_clause = '{} NOT IN ({})'.format(id_field, ", ".join(id_list))
print (where_clause)
# create a layer of all append features that are not in the target class, append
append_layer = arcpy.management.MakeFeatureLayer(append_class,  "append_layer", where_clause)
arcpy.management.Append(append_layer, target_class)

 

And it works!

(And then the Append throws an exception, because your fields don't match, but that's besides the point...)


Have a great day!
Johannes
RexRobichaux
Frequent Contributor

Awesome- thanks for sniffing that issue out Johannes...I suspected there might be a illicit character or two in there somewhere. This gets me up and running.

It's always surprised me there wasn't an off the shelf GP tool to do a delta compare/load as it's such a common ETL scenario (for us at least). Thanks again for the help and this nifty solution!

0 Kudos
PeterDouglassEPD
Emerging Contributor

Hello,

 

Is there a way ion which the syntax can be run but doesn't create a new feature layer, and just appends the target dataset? It may already do this, I'm a python novice

0 Kudos
JohannesLindner
MVP Frequent Contributor

doesn't create a new feature layer

The three easiest ways to solve the original question in Python are:

  1. create a layer with a definition query and append that layer (this is what my original answer does)
  2. create a layer of the whole feature class, select the wanted rows and append that layer
  3. do it all with arcpy.da.*Cursors

Options one and two create a temporal layer. If you run the script in the Python Window of an ArcGIS Project, that layer will be added to the current map. If that is your problem, you can add these lines to the end to remove the layer.

 

 

try:
    m = arcpy.mp.ArcGISProject("current").activeMap
    lyr = m.listLayers("append_layer")[0]
    m.removeLayer(lyr)
except:
    print("Could not remove the temp layer")

 

 

 

 

If you don't want to create the layer for a different reason, you can use option 3 (untested, will probably be slower for large datasets):

 

 

target_class = "path:/to/class_2"
append_class = "path:/to/class_1"
id_field = "RowID"

# read the ids that are already in the target class
existing_ids = [row[0] for row in arcpy.da.SearchCursor(target_class, [id_field])]

# create an SQL where clause "IdField NOT IN (1, 2, 3)"
if isinstance(existing_ids[0], str):
    id_list = ["'{}'".format(i) for i in existing_ids]
else:
    id_list = [str(i) for i in existing_ids]
where_clause = "{} NOT IN ({})".format(id_field, ", ".join(id_list))

# get a list of all field names
fields = [f.name for f in arcpy.ListFields(target_class)]
# exclude some of those fields to avoid raising errors
# replace the field "Shape" with "SHAPE@"
exclude_fields = ["OBJECTID", "GlobalID", "Shape_Length", "Shape_Area"]
fields = [f for f in fields if f not in exclude_fields]
fields = ["SHAPE@" if f == "Shape" else f for f in fields]

# start inserting into the target class
with arcpy.da.InsertCursor(target_class, fields) as i_cursor:
    # start reading from the append_class, use the query we defined earlier
    with arcpy.da.SearchCursor(append_class, fields, where_clause) as s_cursor:
        # copy the rows
        for row in s_cursor:
            i_cursor.insertRow(row)

 

 

 


Have a great day!
Johannes
0 Kudos
PeterDouglassEPD
Emerging Contributor

Hi Johannes, 

This for the code, but it is returning this error:

NameError                                 Traceback (most recent call last)
In  [3]:
Line 18:    fields = [f.name for f in arcpy.ListFields(fc)]

NameError: name 'fc' is not defined
0 Kudos
JohannesLindner
MVP Frequent Contributor

Oops, used a wrong variable name... fixed


Have a great day!
Johannes
0 Kudos
PeterDouglassEPD
Emerging Contributor

Perfect, thanks Johannes

0 Kudos