Hi, I'm trying to use the overwrite feature in the ArcGIS Python API to update a feature layer. The script is supposed to refresh the layer's data every morning, but it isn't doing so. I'm not getting any error message; in fact, the overwrite method returns {'success': True}. Below is the code I'm using for the update/overwrite in my Python script. I followed the instructions in https://developers.arcgis.com/python/sample-notebooks/overwriting-feature-layers/ and reused a method from another post here to go from a table to a pandas DataFrame. Can anyone help? If I do the overwrite manually within Pro, using the layers generated by the script, everything works fine and the data is updated.
from arcgis import GIS
from arcgis.features import FeatureLayerCollection
import arcpy
import os
import pandas as pd
import numpy as np

Default_gdb = 'Project.gdb'
arcpy.env.overwriteOutput = True
arcpy.env.workspace = Default_gdb

gis = GIS("pro")

# Pull the live parcel layer down as a spatially enabled DataFrame
parcel_layer = gis.content.get(id_layer1)
parcel_layer = parcel_layer.layers[0]  # .layers is a list; take the first layer
parcelDF = parcel_layer.query().sdf

# Copy the refresh table from the database connection into the local gdb
DB = 'DBConnection.odc/dbo.Refresh'
arcpy.conversion.TableToGeodatabase(DB, Default_gdb)

def table_to_data_frame(in_table, input_fields=None, where_clause=None):
    """Convert an ArcGIS table into a pandas DataFrame with an ObjectID
    index, reading the selected input fields via arcpy.da.SearchCursor."""
    OIDFieldName = arcpy.Describe(in_table).OIDFieldName
    if input_fields:
        final_fields = [OIDFieldName] + input_fields
    else:
        final_fields = [field.name for field in arcpy.ListFields(in_table)]
    data = [row for row in arcpy.da.SearchCursor(
        in_table, final_fields, where_clause=where_clause)]
    fc_dataframe = pd.DataFrame(data, columns=final_fields)
    fc_dataframe = fc_dataframe.set_index(OIDFieldName, drop=True)
    return fc_dataframe

# Join the refreshed attributes onto the parcels and write a feature class
CDW = table_to_data_frame(os.path.join(Default_gdb, 'dbo_Refresh'))
parcel_land = parcelDF.merge(CDW, on='FOLIO', how='left')
parcel_land.spatial.to_featureclass(os.path.join(Default_gdb, 'Data_All'))

# Overwrite the hosted layer with the new feature class
data_layer = gis.content.get(id2)
data_item = FeatureLayerCollection.fromitem(data_layer)
data_item.manager.overwrite(os.path.join(Default_gdb, 'Data_All'))
In my own personal experience, doing a full overwrite is hit-or-miss, and I have broken quite a number of layers that way.
Since you're already using the ArcGIS Python API, I would suggest looking at truncating the layer and then appending to it instead, unless you're changing the schema. I've found in-place updates to be more reliable than a full overwrite.
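The truncate-and-append idea looks roughly like this. This is an untested sketch, not a drop-in script: the function and item ID are mine, the batch size is a guess, and you'd want to wrap the reload in error handling so a failed run doesn't leave the layer empty.

```python
def truncate_and_reload(item_id, sdf, batch_size=1000):
    """Sketch: empty a hosted feature layer, then re-add features in
    batches. Requires the arcgis package and a signed-in GIS, so it is
    only defined here, not run."""
    from arcgis import GIS
    gis = GIS("pro")
    layer = gis.content.get(item_id).layers[0]
    # Truncate drops all rows but keeps the service itself intact,
    # so the item ID, maps, pop-ups, and symbology all survive.
    layer.manager.truncate()
    feats = sdf.spatial.to_featureset().features
    # edit_features has practical payload limits, so add in chunks
    # rather than one giant call.
    for batch in chunks(feats, batch_size):
        layer.edit_features(adds=batch)

def chunks(seq, size):
    """Yield successive slices of seq of length size."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]
```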
Also, you could use pandas.read_sql to convert your DB table to a pandas DataFrame directly; you might not need arcpy at all.
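To keep the sketch self-contained I'm using a throwaway SQLite database below; against SQL Server you'd pass a SQLAlchemy engine (or pyodbc connection) instead, and the table and column names here are made up.

```python
import sqlite3  # stand-in for your real SQL Server connection
import pandas as pd

# Fake a tiny "Refresh" table so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Refresh (FOLIO TEXT, assessed INTEGER)")
conn.executemany("INSERT INTO Refresh VALUES (?, ?)",
                 [("001", 100), ("002", 250)])

# read_sql pulls the query result straight into a DataFrame -- no
# TableToGeodatabase / SearchCursor round trip through arcpy needed.
cdw = pd.read_sql("SELECT FOLIO, assessed FROM Refresh", conn)
```

From there you can merge `cdw` onto your spatially enabled DataFrame exactly as before.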
Thanks Josh. I'm using overwrite because I have around a million rows, and at the moment a lot of those are updated daily. The schema is the same as in the feature layer. My biggest concern is that when I run the overwrite manually from Pro, using the same layer generated by the script, it works fine, while from the script it doesn't. Thanks again, I'll take a look at pd.read_sql.
Out of curiosity, how many actually get updated? Our update scripts also handle big datasets (60k is a far cry from a million, though!), but only a subset of the features is modified on a given day. It gets a bit more complex, but we use last_edited_date timestamps and pandas.DataFrame.compare to selectively update only those features which have been edited in a given time span.
So for instance, out of our 60k parcels, we identify the few hundred which had their assessment data updated, and apply edits to just those features. Not sure if your particular dataset would work with that idea, but figured it's worth a mention.
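The core of that selective-update idea can be sketched as a small comparison step. This is my own minimal version, not jcarlson's actual script: align the hosted layer and the source table on a key column (I'm assuming FOLIO here), run DataFrame.compare, and collect the keys whose attributes differ; those are the only features you'd then push edits for.

```python
import pandas as pd

def changed_keys(live_df, source_df, key="FOLIO"):
    """Return the key values whose attributes differ between the hosted
    layer's current data and the fresh source table.

    Sketch only: assumes both frames carry the key column, and it
    ignores rows/columns that exist on just one side (pure adds or
    deletes would need separate handling)."""
    a = live_df.set_index(key).sort_index()
    b = source_df.set_index(key).sort_index()
    # DataFrame.compare requires identical labels, so restrict both
    # sides to their shared rows and columns first.
    common = a.index.intersection(b.index)
    cols = a.columns.intersection(b.columns)
    diff = a.loc[common, cols].compare(b.loc[common, cols])
    # compare() keeps only the rows with at least one difference.
    return list(diff.index)
```

With the changed keys in hand, you'd query just those features from the service and apply updates via `edit_features(updates=...)` instead of touching all 60k rows.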
Right now fewer than 10k are updated daily. The problem is that we don't have last_edited_date in our database, so I'll have to come up with a way to track that. I do have another layer with just 22k records, and I overwrite it daily without any problem using the same approach. I'll look at checking for updates manually against an audit table we use. Thanks for the suggestion.
Hey @jcarlson ,
how do you use the last_edited_date timestamp with pandas.DataFrame.compare? We have an Enterprise SQL Server 2019 database whose data we'd like to push to our published layers in ArcGIS Online. Automating this has been quite difficult. We've tried using Python to overwrite layers, which breaks them, and GeoJSON seems to be hit-or-miss in terms of breaking too. We can't seem to find any working examples for setting up a workflow.
Edits only happen on one side of the process, so the last_edited_date timestamp isn't included in the comparison. We just use it to identify features which have been recently edited, but you could easily run the comparison against the entire layer; it just takes longer.
I put a much-simplified version of the process into a GitHub repo for a presentation recently, which you can find here: https://github.com/jdcarls2/ilgisa-2022/blob/main/hosted-copy/hosted-copy.ipynb
This is the exact problem I've been bumping into for at least a year: overwrites are unreliable.
I will try your suggestions. Typically, when I update the base layers we use, I end up having to delete services and am forced to update maps. I've even written scripts to find which maps use the broken layers, to speed up those 11th-hour repairs.
I have found that vector tile layers are usually pretty easy to update, and the "replace" function works. Sadly, you can't query them with Esri code (I'm pretty sure OpenLayers can!), so I still keep feature layers (which break on overwrite) on top of the vector tiles, and I still need a reliable way to update those in place.
Aha, I hit that case where it ALWAYS returns "success" too, so I added code at the end of the overwrite operation that queries the finished layer to make sure the data is really there. When I do stuff like that, I always put (sometimes less than complimentary) comments into the Python so that I can figure out in six months why I did it.
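A minimal version of that trust-but-verify step might look like this. The function name and error message are mine; the idea is just to re-query the service for its feature count after the overwrite and fail loudly if it doesn't match what you published, instead of trusting the {'success': True} response.

```python
def verify_overwrite(layer, expected_count):
    """Sketch: after an overwrite, re-query the hosted layer and confirm
    it actually holds the rows you just published.

    `layer` is assumed to be an arcgis FeatureLayer, whose
    query(return_count_only=True) returns an int; any object with a
    matching query() method works."""
    served = layer.query(return_count_only=True)
    if served != expected_count:
        raise RuntimeError(
            f"Overwrite reported success but the layer serves {served} "
            f"features; expected {expected_count}")
    return served
```

Running it right after `manager.overwrite(...)` with `expected_count=len(parcel_land)` turns a silent failure into a hard error your scheduler can flag.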
It's a joy to revisit something like this in six months or a year and find I can safely delete the extra code. Usually I leave it in if it's not a time sink; extra testing leads to more confidence in the results.