I'm curious what the best practices are for truncate and load of large datasets into ArcGIS Online feature services using Python. My inclination is to use a spatially enabled DataFrame with the manager.truncate() and edit_features() methods on the FeatureLayer class. However, is writing 100k records using edit_features() supported? Should I break it up into chunks of 10k and load it that way? Maybe wait 10 seconds between each load?
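Roughly what I have in mind, as a sketch (the credentials, URL, and chunk size below are placeholders, not recommendations):

import pandas as pd
from arcgis.gis import GIS
from arcgis.features import FeatureLayer, GeoAccessor

gis = GIS("https://www.arcgis.com", "user", "password")  # placeholder credentials
fl = FeatureLayer("https://services.arcgis.com/xxxx/arcgis/rest/services/demo/FeatureServer/0", gis)  # placeholder URL

# Build a spatially enabled DataFrame (here from plain x/y columns for illustration)
df = pd.DataFrame({"name": ["a", "b"], "x": [-77.0, -76.9], "y": [38.9, 38.8]})
sdf = GeoAccessor.from_xy(df, "x", "y")

# Truncate via the admin endpoint, then add features in chunks
fl.manager.truncate()
features = sdf.spatial.to_featureset().features
chunk = 10000  # placeholder chunk size
for i in range(0, len(features), chunk):
    fl.edit_features(adds=features[i:i + chunk])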
Another pattern (one I don't like as much) is using a CSV, shapefile, or something else and then doing the "overwrite layer" pattern. However, in most cases I am querying the source data from an API or database elsewhere, so it's easier to create an SDF than a shapefile or CSV.
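For reference, the overwrite pattern looks something like this (item id and file path are placeholders; the replacement file has to match the schema and format of the originally published file):

from arcgis.gis import GIS
from arcgis.features import FeatureLayerCollection

gis = GIS("https://www.arcgis.com", "user", "password")  # placeholder credentials
item = gis.content.get("0123456789abcdef0123456789abcdef")  # placeholder item id

# Republish the hosted layer from a new copy of the source file
flc = FeatureLayerCollection.fromitem(item)
flc.manager.overwrite(r"C:\data\source.csv")  # placeholder path to the replacement file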
Yet another option is creating a new feature service from the data and choosing "replace layer".
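If I understand it correctly, that swap looks roughly like this (item ids are placeholders):

from arcgis.gis import GIS

gis = GIS("https://www.arcgis.com", "user", "password")  # placeholder credentials
target = gis.content.get("targetserviceitemid")    # placeholder: service currently in use
replacement = gis.content.get("newserviceitemid")  # placeholder: newly published service

# Swap the new service in for the old one
gis.content.replace_service(target, replacement)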
The context here is that I will sometimes get unexplained errors using the edit_features() method - it will just crash with a "json decode error".
Hi @Jay_Gregory, I've recommended the following script to several users who have had success. Just today I updated it to include upsert functionality.
@JakeSkinner Thanks - though this seems a little beefy for a lot of use cases, and my question is specific to large feature classes. I would like to use the following two methods:
import numpy as np
import pandas as pd
from time import sleep
from arcgis.features import FeatureLayer, GeoAccessor  # GeoAccessor registers the .spatial accessor

def truncate_portal_data(fl_url: str) -> object:
    """Truncates a feature layer or table through the REST endpoint.

    Args:
        fl_url (str): REST endpoint of the feature layer

    Returns:
        object: results object
    """
    fl = FeatureLayer(fl_url)
    ids = fl.query(return_ids_only=True)['objectIds']
    # edit_features takes deletes as a comma-separated string of object IDs
    if len(ids) > 0:
        results = fl.edit_features(deletes=','.join(map(str, ids)))
    else:
        results = {"results": "No features to delete"}
    return results
def update_portal_data(df: pd.DataFrame, fl_url: str, truncate: bool = True, chunk_size: int = 500) -> object:
    """Adds features from a DataFrame to a Portal / AGOL feature service.

    Args:
        df (pd.DataFrame): spatially enabled DataFrame from which to update features
        fl_url (str): Feature service URL
        truncate (bool, optional): Truncate table before updating. Defaults to True.
        chunk_size (int, optional): Approximate features per edit_features call. Defaults to 500.

    Returns:
        object: list of results, one per chunk
    """
    if truncate:
        truncate_portal_data(fl_url)
    fl = FeatureLayer(fl_url)
    # array_split tolerates uneven splits; make sure there is at least one chunk
    numchunks = int(len(df) / chunk_size) or 1
    chunks = np.array_split(df, numchunks)
    results = []
    for chunk in chunks:
        results.append(fl.edit_features(adds=chunk.spatial.to_featureset()))
        sleep(5)  # brief pause so the service isn't overwhelmed between chunks
    return results
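For context, I'd call them roughly like this (the URL and DataFrame construction are placeholders standing in for data queried from an API or database):

url = "https://services.arcgis.com/xxxx/arcgis/rest/services/demo/FeatureServer/0"  # placeholder
sdf = GeoAccessor.from_xy(pd.DataFrame({"name": ["a"], "x": [-77.0], "y": [38.9]}), "x", "y")
update_portal_data(sdf, url, truncate=True, chunk_size=10000)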
But my question is: is there any reason your script is preferable to the much simpler API methods above for large feature classes?