I'm curious what the best practices are for truncate and load of large datasets into ArcGIS Online feature services using Python. My inclination is to use a spatially enabled DataFrame with the manager.truncate() and edit_features() methods on the FeatureLayer class. However, is writing 100k records using edit_features() supported? Should I break it up into chunks of 10k and load it that way? Maybe wait 10 seconds between each load?
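Roughly what I have in mind, as a sketch (the credentials, URL, and chunk size below are placeholders, not recommendations):

import pandas as pd
from arcgis.gis import GIS
from arcgis.features import FeatureLayer, GeoAccessor

gis = GIS("https://www.arcgis.com", "user", "password")  # placeholder credentials
fl = FeatureLayer("https://services.arcgis.com/xxxx/arcgis/rest/services/demo/FeatureServer/0", gis)  # placeholder URL

# Build a spatially enabled DataFrame (here from plain x/y columns for illustration)
df = pd.DataFrame({"name": ["a", "b"], "x": [-77.0, -76.9], "y": [38.9, 38.8]})
sdf = GeoAccessor.from_xy(df, "x", "y")

# Truncate via the admin endpoint, then add features in chunks
fl.manager.truncate()
features = sdf.spatial.to_featureset().features
chunk = 10000  # placeholder chunk size
for i in range(0, len(features), chunk):
    fl.edit_features(adds=features[i:i + chunk])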
Another pattern (one I don't like as much) is using a CSV, shapefile, or something else and then doing the "overwrite layer" pattern. However, in most cases I am querying the source data from an API or database elsewhere, so it's easier to create an SDF than a shapefile or CSV.
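For reference, the overwrite pattern looks something like this (item id and file path are placeholders; the replacement file has to match the schema and format of the originally published file):

from arcgis.gis import GIS
from arcgis.features import FeatureLayerCollection

gis = GIS("https://www.arcgis.com", "user", "password")  # placeholder credentials
item = gis.content.get("0123456789abcdef0123456789abcdef")  # placeholder item id

# Republish the hosted layer from a new copy of the source file
flc = FeatureLayerCollection.fromitem(item)
flc.manager.overwrite(r"C:\data\source.csv")  # placeholder path to the replacement file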
Yet another option is creating a new feature service from the data and choosing "replace layer".
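If I understand it correctly, that swap looks roughly like this (item ids are placeholders):

from arcgis.gis import GIS

gis = GIS("https://www.arcgis.com", "user", "password")  # placeholder credentials
target = gis.content.get("targetserviceitemid")    # placeholder: service currently in use
replacement = gis.content.get("newserviceitemid")  # placeholder: newly published service

# Swap the new service in for the old one
gis.content.replace_service(target, replacement)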
The context here is that I will sometimes get unexplained errors using the edit_features() method - it will just crash with a "json decode error".
Hi @Jay_Gregory, I've recommended the following script to several users who have had success. Just today I updated it to include upsert functionality.
@JakeSkinner Thanks - though this seems a little beefy for a lot of use cases, and my question is specific to large feature classes. I would like to use the following two methods:
import numpy as np
import pandas as pd
from time import sleep
from arcgis.features import FeatureLayer, GeoAccessor  # GeoAccessor registers the .spatial accessor

def truncate_portal_data(fl_url: str) -> object:
    """Truncates a feature layer or table through the REST endpoint.

    Args:
        fl_url (str): REST endpoint of the feature layer

    Returns:
        object: results object
    """
    fl = FeatureLayer(fl_url)
    ids = fl.query(return_ids_only=True)['objectIds']
    # edit_features takes deletes as a comma-separated string of object IDs
    if len(ids) > 0:
        results = fl.edit_features(deletes=','.join(map(str, ids)))
    else:
        results = {"results": "No features to delete"}
    return results
def update_portal_data(df: pd.DataFrame, fl_url: str, truncate: bool = True, chunk_size: int = 500) -> object:
    """Adds features from a DataFrame to a Portal / AGOL feature service.

    Args:
        df (pd.DataFrame): spatially enabled DataFrame from which to update features
        fl_url (str): Feature service URL
        truncate (bool, optional): Truncate table before updating. Defaults to True.
        chunk_size (int, optional): Approximate features per edit_features call. Defaults to 500.

    Returns:
        object: list of results, one per chunk
    """
    if truncate:
        truncate_portal_data(fl_url)
    fl = FeatureLayer(fl_url)
    # array_split tolerates uneven splits; make sure there is at least one chunk
    numchunks = int(len(df) / chunk_size) or 1
    chunks = np.array_split(df, numchunks)
    results = []
    for chunk in chunks:
        results.append(fl.edit_features(adds=chunk.spatial.to_featureset()))
        sleep(5)  # brief pause so the service isn't overwhelmed between chunks
    return results
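For context, I'd call them roughly like this (the URL and DataFrame construction are placeholders standing in for data queried from an API or database):

url = "https://services.arcgis.com/xxxx/arcgis/rest/services/demo/FeatureServer/0"  # placeholder
sdf = GeoAccessor.from_xy(pd.DataFrame({"name": ["a"], "x": [-77.0], "y": [38.9]}), "x", "y")
update_portal_data(sdf, url, truncate=True, chunk_size=10000)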
But my question is: is there any reason your script is preferable to the much simpler API methods above for large feature classes?