I am writing a pipeline script that gets data from different sources using POST requests, adds it to a GeoDataFrame and processes the data with GeoPandas functions. Then I need to add the result to AGOL.
I don't use local files; everything is in memory.
Can I use gis.content.add to add a GeoJSON dictionary to AGOL? Or any other in-memory data, like a FeatureSet?
I have been trying this:
from arcgis.gis import GIS

gis = GIS("https://arcgis.com", username, password)
json_gdf = gdf.to_json()
data_properties = {
    'title': 'new_test',
    'description': 'test',
    'tags': 'test',
    'type': 'GeoJson'
}
flayer = gis.content.add(data_properties, json_gdf)
I get the following error:
flayer=gis.content.add(data_properties, json_gdf)
File "/mnt/c/Users//Documents/Github//.venv/lib/python3.8/site-packages/arcgis/gis/__init__.py", line 5141, in add
itemid = self._portal.add_item(
File "/mnt/c/Users//Documents/Github//.venv/lib/python3.8/site-packages/arcgis/gis/_impl/_portalpy.py", line 362, in add_item
raise RuntimeError("File(" + data + ") not found.")
In a way it makes sense, because ContentManager.add expects the following formats:
"Content can be a file (such as a service definition, shapefile, CSV, layer package, file geodatabase, geoprocessing package, map package) or it can be a URL (to an ArcGIS Server service, WMS service, or an application)."
What are my options here?
I also tried adding the geodataframe to AGOL directly using spatial.to_featurelayer - it works, but the field names are messed up, since that tool uses a shapefile as an intermediary (e.g. "SOURCE_FEATURE" becomes "source_fea"). That means I cannot append that data to an existing feature layer without having to write hundreds of field mapping rules.
NOTE: edit_features is not an option, the data is too big, and even chunked, it takes too long to add (and times out in Azure, but that's another story).
Take a look at arcgis.features.GeoAccessor.from_geodataframe. You can convert your geodataframe to a spatially enabled dataframe, which can then be written directly to a new layer using to_featurelayer. Using an intermediate shapefile shouldn't be necessary.
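Untested sketch of what I mean, assuming gdf is the GeoDataFrame from your pipeline, gis is the connection from your snippet above, and the title/tags are placeholders:

from arcgis.features import GeoAccessor

# convert in memory, no shapefile involved
sdf = GeoAccessor.from_geodataframe(gdf)

# publish the spatially enabled dataframe as a new hosted feature layer item
new_item = sdf.spatial.to_featurelayer(title='new_test', gis=gis, tags='test')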
There is also the append function, but I don't have any experience with using it.
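If you do go the append route, the rough pattern (going by the docs, not experience) is to upload the GeoJSON as an item first and then append from that item into the existing layer. A minimal sketch, assuming gis and json_gdf from your snippet, a hypothetical target_item pointing at the existing hosted feature layer, and that the layer has the Append capability enabled:

import tempfile

# append needs an uploaded item (or upload id) as its source, so the
# in-memory GeoJSON briefly touches disk as a temp file here
with tempfile.NamedTemporaryFile('w', suffix='.geojson', delete=False) as f:
    f.write(json_gdf)
    tmp_path = f.name

geojson_item = gis.content.add(
    {'title': 'append_source', 'type': 'GeoJson'}, data=tmp_path
)

# append the uploaded GeoJSON into the first layer of the existing service
target_lyr = target_item.layers[0]
target_lyr.append(item_id=geojson_item.id, upload_format='geojson', upsert=False)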
I know you said it's another story, but how does edit_features time out? How are you chunking it up when you try that?
Also: your post makes it sound like you want this data to get added to an existing layer, rather than be added as a new layer, so it's a bit confusing that you're focusing on the content.add function here. Is the end goal a separate layer, or edits to an existing service?
If the latter: are you simply adding new records to the layer, or are some rows being updated / deleted?
I am sorry for the confusion.
Yes, I am trying to add new records to an existing layer. I was using edit_features with good results, but since this is a massive dataset, it times out. I am rerunning now so I can add the exact error here.
How do I chunk? I wrote a function that basically iterates over the feature set and adds a couple hundred features at a time. Nothing fancy.
Documentation for edit_features also mentions: "When making large number (250+ records at once) of edits, append should be used over edit_features to improve performance and ensure service stability."
Tried arcgis.features.GeoAccessor.from_geodataframe and to_featurelayer, but it messes up the field names. It's not me using an intermediate shapefile; to_featurelayer uses one internally, and that is what mangles the field names. This makes the result unusable with the append tool (unless I write field mappings).
First thing: have you tried using sanitize_columns=False when you convert the dataframe? It defaults to True, and does have the potential to mess with your columns quite a bit, depending on the original names. But I know there are places where it will sanitize the column names and not let you choose otherwise.
For chunking your edits, I do the same thing, basically.
i = 0
chunk = 200
while i < len(sdf):
    fs = sdf.iloc[i:i+chunk].spatial.to_featureset()
    featurelayer.edit_features(adds=fs)
    i += chunk
I have a few services I regularly append to and edit, and some of them are quite large datasets (though I wouldn't call them "massive"), and this method seems to work just fine.
Finally, if you're working w/ JSON data, you can submit a list of dicts to the edit_features function, so long as they follow the format it expects. I have a couple of scripts that use that method for one reason or another.
feats = [
    {
        'attributes': {
            'some_attribute': 'a value',
            'another_attribute': 1002.14
        },
        'geometry': {
            'x': 44.05,
            'y': 27.2201,
            'spatialReference': {
                'wkid': 4326
            }
        }
    },
    {
        'attributes': {
            'some_attribute': 'a different value',
            'another_attribute': -1.04
        },
        'geometry': {
            'x': 41.3,
            'y': 28.912,
            'spatialReference': {
                'wkid': 4326
            }
        }
    }
]
featurelayer.edit_features(adds=feats)
I also do chunks with edit_features to avoid timeouts. In a recent version of the Python API they also added the future parameter, which allows asynchronous updates. I haven't had a chance to see what kind of effect that has on timeouts, though.
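Roughly like this, though again I haven't measured the effect on timeouts; I'm assuming the returned object behaves like the other async jobs in the API, where .result() blocks until the server finishes:

# submit the chunk without waiting for the server to finish
job = featurelayer.edit_features(adds=fs, future=True)

# ... build the next chunk while the edits run ...

result = job.result()  # block here to collect the edit results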
Josh
Documentation says sanitize_columns is set to False by default, but I just tried it nonetheless. This was the code:
from arcgis.features import GeoAccessor

sdf = GeoAccessor.from_geodataframe(gdf)
lyr = sdf.spatial.to_featurelayer(title='test', sanitize_columns=False)
That threw an error too (I'll add the exact message once I have it). In the meantime, this is the edit_features fallback I'm running:
try:
    add_result = Target_Layer.layers[0].edit_features(adds=fset)
except Exception:
    # fall back to adding the features one at a time
    for feature in fset.features:
        try:
            add_result = Target_Layer.layers[0].edit_features(adds=[feature])
        except Exception:
            # some error message
            pass
I'll see how this acts once deployed to Azure.