Hi,
I am trying to write a Python script that calculates field data from one table to another table in a feature service hosted in AGOL. Both tables have 90K+ records and share a joinID field, which I use in a query to calculate the data. I am using the code below:
from arcgis.gis import GIS
username = "user"
password = "password"
gis = GIS("https://www.arcgis.com", username=username, password=password)
# Get the feature service item ID
feature_service_item_id = "item_id"
# Access the feature service item
feature_service_item = gis.content.get(feature_service_item_id)
feature_service_tables = feature_service_item.tables
feature_service_layers = feature_service_item.layers
data_table = None
feat_layer = None
joinTable = None
for table in feature_service_tables:
    if table.properties.name == 'feat_DataTable':
        data_table = table
        print("Table found: {}".format(data_table.properties.name))
    elif table.properties.name == 'feat_linear_interpolation':
        joinTable = table
        print("Table found: {}".format(joinTable.properties.name))
for layer in feature_service_layers:
    if layer.properties.name == 'featurelayer':
        feat_layer = layer
        print("Layer found: {}".format(feat_layer.properties.name))
query = "year = 1995"
result_set = joinTable.query(where=query, out_fields=["joinID", "year", "F01_human_field"])
for feature in result_set.features:
    joinID = feature.attributes["joinID"]
    year = feature.attributes["year"]
    dataValue = feature.attributes["F01_human_field"]
    innerquery = "joinID = {}".format(joinID)
    data_table.calculate(where=innerquery,
                         calc_expression={"field": "Data_Value", "value": dataValue})
    data_table.calculate(where=innerquery,
                         calc_expression={"field": "year", "value": year})
The code works, but it takes 25 minutes to calculate 3,000 features, which is impractical. Is there a better approach to make the calculation efficient? Any code suggestions would be highly appreciated. I haven't found a method in the API reference that batch-calculates the whole lot in one go.
Thanks in advance.
Regards,
Tauhid
You could try storing the results of your initial query in a Python dictionary, then using FeatureLayer.edit_features() to modify the existing records you wish to perform the calculations on.
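Something like the sketch below. It reuses the data_table and joinTable variables from your script, assumes joinID is numeric, and relies on the object ID field coming back with the target query, so treat it as a starting point rather than a drop-in replacement.

# One query to build a joinID -> value lookup from the source table
result_set = joinTable.query(where="year = 1995",
                             out_fields="joinID,year,F01_human_field")
lookup = {f.attributes["joinID"]: f.attributes["F01_human_field"]
          for f in result_set.features}

# One query for the target rows; edit them locally instead of per-row calculate() calls
target = data_table.query(out_fields="*")
updates = []
for feat in target.features:
    jid = feat.attributes["joinID"]
    if jid in lookup:
        feat.attributes["Data_Value"] = lookup[jid]
        feat.attributes["year"] = 1995
        updates.append(feat)

# Push the edits in chunks so each request stays small
chunk = 250
for i in range(0, len(updates), chunk):
    data_table.edit_features(updates=updates[i:i + chunk])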
For one, you can store references to the tables / layers directly.
data_table = feature_service.tables[0] # or whatever the index is
The layer index will not change unless the service is republished with drastic changes.
And as @Raul suggests, storing the contents of the tables in some temporary object will make this much easier. Personally, I would suggest using pandas DataFrames.
table_df = data_table.query(as_df=True)
layer_df = feat_layer.query(as_df=True)
join_df = table_df.merge(
    layer_df,
    on='joinID'
)
You can use all the usual pandas methods to join, update, calculate, filter, etc., and prepare the data using local resources, rather than sending many requests to your server.
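For instance, the per-record calculate loop from the original post collapses to a couple of vectorized assignments (column names are borrowed from that post and assumed to be present in join_df after the merge):

join_df = join_df[join_df['year'] == 1995]            # filter locally
join_df['Data_Value'] = join_df['F01_human_field']    # "calculate" the field in one shot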
Once you have your join_df DataFrame updated as you need, you can truncate the join table and apply the new values to the service endpoint using:
joinTable.manager.truncate()
joinTable.edit_features(adds=join_df.spatial.to_featureset())
Or if you like, you can query the join table, join that in, and use the ObjectID field to submit the dataframe as a series of updates instead of adds.
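A rough sketch of that update pattern, assuming the object ID field is literally named OBJECTID and that join_df still carries it (plus the Data_Value and year columns) from the query:

# One {"attributes": {...}} dict per row, keyed by OBJECTID so the service
# matches each edit to an existing record
updates = [
    {"attributes": {
        "OBJECTID": row["OBJECTID"],
        "Data_Value": row["Data_Value"],
        "year": row["year"],
    }}
    for _, row in join_df.iterrows()
]

# Apply in batches to keep each request a reasonable size
batch = 500
for i in range(0, len(updates), batch):
    joinTable.edit_features(updates=updates[i:i + batch])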
For reference, we run a number of scripts like this against very large datasets, and none of them take more than a few minutes to run.
Hi Josh,
I have tried to adapt the code per your suggestion; it looks like this:
from arcgis import GIS
import pandas as pd
portal_url = "portalarcgis"
username = "user"
password = "pass"
gis = GIS(portal_url, username, password)
service_name = 'service_name'
items = gis.content.search(query=service_name, item_type='Feature Service')
if len(items) == 0:
    print(f"No feature service named '{service_name}' found.")
    exit()
feature_service = items[0]
tables = feature_service.tables
data_table = None
join_table = None
for table in tables:
    if table.properties.name == 'DataTable':
        data_table = table
        print("Table found: {}".format(data_table.properties.name))
    elif table.properties.name == 'linear_interpolation_20230612':
        join_table = table
        print("Table found: {}".format(join_table.properties.name))
if data_table is not None and join_table is not None:
    data_table_df = data_table.query(as_df=True)
    join_table_df = join_table.query(as_df=True, out_fields=["joinid", "year", "farm_kg_sqkm"])
    join_table_df = join_table_df.rename(columns={'year': 'year_join'})
    join_df = data_table_df.merge(join_table_df, on='joinid', how='inner')
    try:
        join_df['year'] = join_df['year_join']
        join_df['data_value'] = join_df['farm_kg_sqkm']
        data_table_df['year'] = join_df['year_join']
        data_table_df['data_value'] = join_df['data_value']
        updates = data_table_df.to_dict('records')
        # Update the feature service table
        data_table.edit_features(updates=updates)
        print("Feature Service table Updated!!!")
    except Exception as e:
        print(str(e))
I tried to get the tables directly by index, but that gave me a NoneType exception.
This code does what I want at the DataFrame level, but when it gets to
data_table.edit_features(updates=updates)
it runs for about a minute and prints the message that the feature service table was updated, but the calculation didn't actually happen. The API documentation says the edit_features method should be avoided when the layer has more than 250 records. I tried truncating first and then adding the features, but no luck. Is there another method I should use?
Thanks in advance.
Regards,
Tauhidul Islam
GIS Consultant
TT Coffey Australia