Hi Josh,
I created the items using group.content() as a test on a small sample (9 items).
In reality, though, I need to collect usage for the last year, the last 6 months, and the last 60 days for 4K items. When I do this synchronously, it takes almost 1 minute per item, which means more than 2 days to finish all 4K items.
This is the code snippet:
import dask.bag as dbag
from arcgis.gis import GIS, Group
from datetime import datetime
# user defined function
# basically it returns a dictionary of usage metrics for each item
def get_usage_dct(i):
    today = datetime.now().strftime('%Y-%m-%d')
    i_usage_df_1y = i.usage(date_range='1Y', as_df=True)
    i_usage_df_1y = i_usage_df_1y[i_usage_df_1y['Date'] < today]
    i_usage_df_6m = i.usage(date_range='6M', as_df=True)
    i_usage_df_6m = i_usage_df_6m[i_usage_df_6m['Date'] < today]
    i_usage_df_2m = i.usage(date_range='60D', as_df=True)
    i_usage_df_2m = i_usage_df_2m[i_usage_df_2m['Date'] < today]
    return {
        'item_name': i.title,
        'usage_count_1Y': i_usage_df_1y['Usage'].sum(),
        'usage_count_6m': i_usage_df_6m['Usage'].sum(),
        'usage_count_2m': i_usage_df_2m['Usage'].sum(),
    }
# define the GIS object
mapit = GIS("home")
# getting items of a specific group (9 items for this example)
g = Group(mapit,'xxxxxxxxx')
itms = g.content()
# create items bag from the itms list
itms_bag = dbag.from_sequence(itms, npartitions = 9)
# this line is causing the error
final_bag = dbag.map(lambda x: get_usage_dct(x), itms_bag).compute()
I then passed the scheduler argument to compute() in the last line:
final_bag = dbag.map(lambda x: get_usage_dct(x), itms_bag).compute(scheduler="threads")
But now it gives me:
Exception: Too many requests. Please try again later.
(Error Code: 400)
I believe this is related to API limits, since the item usage method is basically an API call. Please correct me if I am wrong.
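If the error really is a rate limit, one possible workaround is to run fewer requests at once and retry with a growing delay whenever the "Too many requests" message comes back. The following is only a minimal sketch under that assumption: call_with_backoff and collect_usage are hypothetical helper names (not part of arcgis or dask), the worker count and delays are guesses, and detecting throttling from the message text is based solely on the error shown above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_with_backoff(fn, *args, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args); on a 'Too many requests' error, wait and try again."""
    for attempt in range(retries):
        try:
            return fn(*args)
        except Exception as exc:
            # Assumption: throttling is recognisable from the message text,
            # as in the error above. Any other error is re-raised at once.
            throttled = "too many requests" in str(exc).lower()
            if not throttled or attempt == retries - 1:
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)


def collect_usage(items, fetch, max_workers=3, **backoff_kwargs):
    """Run fetch(item) for every item on a small, bounded thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(call_with_backoff, fetch, item, **backoff_kwargs)
            for item in items
        ]
        # Results come back in the same order as the input items.
        return [f.result() for f in futures]
```

With the code above, this would be called as collect_usage(itms, get_usage_dct, max_workers=3); the right number of workers depends on whatever limit the server actually enforces, which I do not know.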
I hope you have some good advice 🙂
Thank you