Appending from pandas dataframe directly to hosted table

03-07-2024 12:05 AM
HanliePetoors
New Contributor III

Hi,

I have a hosted feature layer in AGOL that contains 209 layers of asset types.

I want to export the assets to an asset register that resides in a hosted table in AGOL.

To do this I loop through the asset layers and append them to a pandas dataframe containing a subset of the common fields in the asset layers. All the layers' data are then in one dataframe.
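The loop described above can be sketched with plain pandas, using placeholder frames in place of the per-layer query results (the field names here are hypothetical, and the real per-layer step would be something like layer.query(out_fields=..., as_df=True)):

```python
import pandas as pd

# Hypothetical common fields shared by all asset layers
COMMON_FIELDS = ["asset_id", "asset_type", "install_date"]

# Stand-ins for the frames returned by each layer's query
layer_frames = [
    pd.DataFrame({"asset_id": [1, 2], "asset_type": "pipe", "install_date": "2020-01-01"}),
    pd.DataFrame({"asset_id": [3], "asset_type": "valve", "install_date": "2021-06-15"}),
]

# Combine all layers into one register frame holding only the common fields
register = pd.concat(
    [df[COMMON_FIELDS] for df in layer_frames],
    ignore_index=True,
)
print(len(register))  # 3
```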

I then want to append this dataframe to the hosted table. The table must be truncated and appended each time the script is run. I already have the code to truncate the table but now I'm struggling with the append part.

The intention is for the script to be run as an AGOL notebook. Most of the examples I have found require a local CSV version of the dataframe that then gets re-published to AGOL.

1. Is it possible to bypass the local CSV and just append the dataframe directly to the hosted table? Perhaps with an insert cursor?

2. If not, how does the local file work if one is running the notebook from inside AGOL? How do you specify where the local file is supposed to be?

Thanks

Hanlie

4 Replies
jcarlson
MVP Esteemed Contributor

In the ArcGIS Python API, there is actually a method for converting your GeoPandas GeoDataFrame to Esri's "GeoAccessor".

https://developers.arcgis.com/python/api-reference/arcgis.features.toc.html#geoaccessor

Once your data is in a GeoAccessor, you can use GeoAccessor.spatial.to_featureset() to generate a featureset that can be applied directly to a table.

The "append" method in the API is far more trouble than it's worth, so we can just use "edit_features" on the feature layer object.

from arcgis.features import GeoAccessor

# your current process goes here where you build your geodataframe
# let's assume your geodataframe is called "gdf"
# you also truncated the layer already, so let's assume that featurelayer object is called "fl"

# convert to geoaccessor
sdf = GeoAccessor.from_geodataframe(gdf)

# convert to featureset
fs = sdf.spatial.to_featureset()

# apply edits
fl.edit_features(adds=fs)


The documentation suggests doing your edits in batches if there is a very large number of features.
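One way to do that batching is sketched below; the 500-feature chunk size is an arbitrary choice, and the edit_features call itself is commented out because it needs a live FeatureLayer connection:

```python
# Split a feature list into fixed-size chunks so each
# edit_features call stays small
def chunked(seq, size=500):
    return [seq[i:i + size] for i in range(0, len(seq), size)]

# features = fs.features  # from sdf.spatial.to_featureset()
features = [{"attributes": {"asset_id": i}} for i in range(1200)]  # stand-in

batches = chunked(features, 500)
print(len(batches))  # 3 batches: 500 + 500 + 200

# for batch in batches:
#     fl.edit_features(adds=batch)  # requires a live FeatureLayer "fl"
```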

Also, if you're certain that the data all needs to be refreshed, truncating and appending is fine. But if some of the features don't actually change, consider using the pandas method compare. You'd have to query the hosted layer to its own dataframe, but once you have it, you can compare the two frames to identify which specific rows and columns have changed.
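A small illustration of that compare approach, using two synthetic frames indexed on the object ID (the column names are hypothetical; in practice the "hosted" frame would come from querying the layer back into a dataframe):

```python
import pandas as pd

# What the hosted layer currently holds (queried back to a frame)
hosted = pd.DataFrame(
    {"objectid": [1, 2, 3], "owner": ["Smith", "Jones", "Brown"]}
).set_index("objectid")

# The freshly built frame, with one changed owner
latest = pd.DataFrame(
    {"objectid": [1, 2, 3], "owner": ["Smith", "Jones", "Green"]}
).set_index("objectid")

# compare() keeps only the cells that differ; both frames must
# share the same index and columns for it to work
diff = hosted.compare(latest)
print(diff.index.tolist())  # [3] -> only objectid 3 changed
```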

We've moved many of our scripts to that model. Our parcels layer, for instance, has about 60k features in it, but on a given day, only about 300 parcels are edited, and those only in a few columns like owner name. By isolating the specific attributes edited, our actual edits are kept to a minimum, and the layer is never empty.

- Josh Carlson
Kendall County GIS
HanliePetoors
New Contributor III

Hi Josh,

Thanks for the reply.

I'm struggling with your suggestion though because I'm using a standard pandas dataframe and I'm not sure about using GeoPandas. If the script is going to run from a Notebook server in AGOL, then the administrator of the organisation will need to add that module, correct? I'm not sure if that's an option.

On top of that, GeoPandas requires an address or coordinates that I don't have. I could put in fake ones, but it seems like a very roundabout way of doing things.

Why do you say that the append method in the API is far more trouble than it's worth?

Thanks

Hanlie

0 Kudos
jcarlson
MVP Esteemed Contributor

Ah, I see. Well, you could start your notebook with a command to install geopandas, but it's probably not the right approach.

Does your data have any spatial information in it?

As for the append method, when I'm in a situation where I have a dataframe or other data source that I want to append to my layer, the process, in my mind, is very simple:

source → function → destination

But the append method requires you to export your dataframe to a file, upload the file to your portal, get its itemid, run append and refer to the itemid, plus any additional info like field mappings, etc. It seems like a lot of extra steps when I already have the data in my python workspace. Creating and uploading files is just a waste of time and storage space.

Also, the append method assumes that your data is complete, which is not always the case for what I'm doing. I may have a dataframe of a particular subset of columns that I want to update. As long as I have the matching objectid, FeatureLayer.edit_features can handle that just fine. If you know what you're doing, you can be a lot more precise and efficient than the append method permits.
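To illustrate that partial-update pattern, here is one way to turn a subset dataframe into the updates payload that edit_features expects (a list of dicts with an "attributes" key); the field values are made up, and the edit_features call is commented out since it needs a live layer:

```python
import pandas as pd

# A subset frame: just the objectid plus the columns being changed
changes = pd.DataFrame({
    "objectid": [101, 102],
    "owner": ["New Owner A", "New Owner B"],
})

# edit_features takes updates as [{"attributes": {...}}, ...];
# only the fields present are touched, matched on objectid
updates = [{"attributes": row} for row in changes.to_dict("records")]
print(updates[0]["attributes"]["objectid"])  # 101

# fl.edit_features(updates=updates)  # requires a live FeatureLayer "fl"
```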

- Josh Carlson
Kendall County GIS
HanliePetoors
New Contributor III

Hi Josh,

Thanks, I'll give the edit_features method a go.

The data from which I construct the dataframe is spatial, but I don't bring the geometries into the dataframe because my output is to a hosted table. Basically an asset register for the financial people.

Regards

Hanlie
