Content Item class .__dict__ built-in method doesn't return 'size' attribute

James_Whitacre_PGC · ‎09-09-2024

I am creating a table of all of my content items in ArcGIS Online and ArcGIS Enterprise. As a starting point, I am using the built-in vars() function to return all the Item attributes as a dictionary (.__dict__ can also be used with the same results). Here is my code (which is fast and works well):

from arcgis.gis import GIS

# Log into ArcGIS Online
gis = GIS('home')

# Get all content items
qe = 'NOT owner:esri*'
items = gis.content.search(query=qe, max_items=-1, outside_org=False)

# Convert all Item class attributes to dictionary
items_dict = [vars(i) for i in items]

However, vars() does not return all the Item attributes listed in the ArcGIS REST API documentation for Item. Most notably, the 'size' attribute is not returned, and for my purposes, this attribute is important. Why doesn't vars() return 'size'?

I was able to use the following code to get it to work, but it went from less than three seconds to 6 minutes(!):

items_dict = [vars(i) for i in items if i.size >= 0]

Accessing the 'size' attribute first causes it to be included in the dictionary, but only when it is accessed like this.

I also have other code (see below; sourced from Managing ArcGIS Online Content with ArcGIS Dashboards and ArcGIS Notebooks) that returns 'size', but only as part of a smaller subset of Item properties. However, it doesn't work on ArcGIS Enterprise. Any help would be appreciated.

import io
import pandas as pd
import requests

# From the REST URL, request all items in the organization as CSV.
url = f'{gis.url}/sharing/rest/content/portals/{gis.properties.id}'

params = {
    'f': 'csv',
    'token': gis._portal.con.token
}

# Get a string response from the request and construct a DataFrame
csv_out = requests.get(url, params=params).text
df = pd.read_csv(io.StringIO(csv_out))

EarlMedina · ‎09-09-2024

A similar question was asked a while back: Solved: Populating dataframe with item search results - va... - Esri Community

vars() did not work well for them either (it also took a long time). I can't say what the problem might be, but a list comprehension seemed to do the trick for them.

[{"id": item.id, "size": item.size, "type": item.type} for item in items]

James_Whitacre_PGC · ‎09-10-2024

@EarlMedina this doesn't really solve the problem at its core. It is just a workaround, and as both @JillianStanford and I note, it takes way longer to get the size attribute, which is frankly unacceptable. There must be a quicker solution to get all the Item class attributes.

In the Python documentation for __dict__, it mentions that the dictionary will only return 'an object’s (writable) attributes'. So, size isn't 'writable', apparently? Why might this be? Maybe because size likely refers to the size of the 'data' attribute, so it may take an extra step to initialize that value, and it therefore isn't 'natively' writable on the Item class? I am not sure, I'm just reasoning about how the class might be set up. It would be nice to get an ArcGIS API for Python developer to shed some light on this...
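To illustrate the kind of behavior being speculated about here, below is a minimal sketch of a class that loads some fields only on first access. LazyRecord, its loader argument, and the sample values are hypothetical and are not taken from the ArcGIS API for Python source; the point is only that vars() reflects whatever is currently stored in the instance __dict__, so a lazily loaded field is missing until something touches it.

```python
class LazyRecord:
    """Hypothetical stand-in for an object that loads some fields on demand."""

    def __init__(self, summary, loader):
        # Fields from the cheap "search" response become instance attributes.
        self.__dict__.update(summary)
        self._loader = loader
        self._hydrated = False

    def __getattr__(self, name):
        # Python calls this only when normal lookup fails,
        # i.e. when the name is not already in the instance __dict__.
        if name.startswith('_') or self._hydrated:
            raise AttributeError(name)
        self.__dict__.update(self._loader())  # the expensive extra call
        self._hydrated = True
        try:
            return self.__dict__[name]
        except KeyError:
            raise AttributeError(name) from None


calls = []

def load_details():
    # Stands in for one extra request per record.
    calls.append(1)
    return {'size': 2048}

rec = LazyRecord({'id': 'abc', 'title': 'Demo'}, load_details)
before = sorted(k for k in vars(rec) if not k.startswith('_'))
size = rec.size  # first access triggers the loader
after = sorted(k for k in vars(rec) if not k.startswith('_'))
print(before)  # ['id', 'title']
print(after)   # ['id', 'size', 'title']
```

If Item works along these lines, that would be consistent with both observations in this thread: 'size' is absent from vars() at first, and touching it on every item costs one extra call per item.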

EarlMedina · ‎09-10-2024

I would recommend you log an issue here: Issues · Esri/arcgis-python-api · GitHub

Myself, I've never encountered this problem in the wild and don't know why it's slow to populate/missing for some people. I do know that the _hydrate method is involved in the population of that property and yes, calling this method should involve an additional call to get item information. It's possible the logic in the _hydrate method needs to be adjusted for the size property to be properly set in all items, or perhaps the REST endpoint is not reliably returning the size information for certain items. It looks like the supporting get_item_data method points to this endpoint: /data: Item Data | ArcGIS REST APIs | ArcGIS Developers

Do you get similar results when making a request to that endpoint? If you don't, then that at least narrows down the problem to the code.
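One way to run that kind of check outside the Python API is to request an item's JSON directly and look at what comes back. The sketch below is an assumption-laden example: it queries the item information resource (<portal>/sharing/rest/content/items/<id>) rather than the /data endpoint mentioned above, on the assumption that this is where a 'size' field would appear, and the helper names item_info_url and fetch_item_size are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def item_info_url(portal_url, item_id):
    """Build the item information URL (assumed pattern, see lead-in)."""
    return f"{portal_url.rstrip('/')}/sharing/rest/content/items/{item_id}"


def fetch_item_size(portal_url, item_id, token=None, timeout=30):
    """Request the item's JSON and return its 'size' value, or None if absent."""
    params = {'f': 'json'}
    if token:
        # Passed as a query parameter, as in the CSV request earlier in this thread.
        params['token'] = token
    url = f"{item_info_url(portal_url, item_id)}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload.get('size')
```

Calling fetch_item_size(gis.url, some_item_id, token) for a handful of items and comparing the result with item.size would show whether the REST response itself carries the value.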

James_Whitacre_PGC · ‎09-12-2024

Below is the final code for my best shot at a higher-performing script to do what I was looking for regarding this issue. Note that I have only tested this in an ArcGIS Online Notebook and not in a Notebook in ArcGIS Pro. I did have issues getting the 'token' parameter to work in ArcGIS Pro, but was able to get it to work when I used the arcpy.GetSigninToken() function instead (not shown in script). This still does not fix the root issue, but it has far better performance than my earlier versions and the other suggestions. For me, this code takes less than 5 seconds to execute on about 1,300 AGO items. I also added some helper code to clean things up and add some other values.

# Imports (only the modules this script actually uses)
from arcgis.gis import GIS
import io
import pandas as pd
import requests

# Connect to ArcGIS Online
gis = GIS("home")

# Create the '.../sharing/rest/content/portals/...' REST URL to request all items in the organization as a CSV 
gis_url = gis.url if gis.properties.isPortal else f'https://{gis.properties.urlKey.lower()}.maps.arcgis.com'
content_url = f'{gis_url}/sharing/rest/content/portals/{gis.properties.id}'
params = {'f': 'csv', 'token': gis._portal.con.token}

# Get the request string response
csv_out = requests.get(content_url, params=params).text

# Create Pandas DataFrame from the CSV data
df_csv = pd.read_csv(io.StringIO(str(csv_out)))

# Get all content items
qe = 'NOT owner:esri*'
items = gis.content.search(query=qe, max_items=-1, outside_org=False)

# Convert all Item class attributes to dictionary
items_dict = [vars(i) for i in items]

# Create a Pandas DataFrame from the attribute dictionaries built above
df_items = pd.DataFrame(items_dict)

# Convert date fields from UNIX millisecond units to datetime
date_fields = ['created', 'modified', 'lastViewed']
for field in date_fields:
    if field in df_items.columns:
        df_items[field] = pd.to_datetime(df_items[field], unit='ms', origin='unix')

# Add and calculate the GIS URL field
df_items['gisUrl'] = gis_url

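# Merge data frames; 'fullname' and 'size' come from the CSV export.
# errors='ignore' avoids a KeyError when vars() returned no 'size' column.
df_merge = pd.merge(df_items.drop(columns='size', errors='ignore'), df_csv[['id', 'fullname', 'size']], on='id', how='left')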

df_merge.fillna('None', inplace=True)

items_dict = df_merge.to_dict(orient='records')
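For anyone who wants to see the join step in isolation, here is a small pandas-free sketch of the same idea using only the standard library: parse the CSV text, build an id-to-size lookup, and attach the size to each item dictionary (a left join, so unmatched items get None). The function names attach_sizes and ms_to_datetime and the sample data are made up for illustration; the 'id' and 'size' column names are taken from the merge above and assumed to be present in the CSV.

```python
import csv
import io
from datetime import datetime, timezone


def attach_sizes(item_dicts, csv_text, id_field='id', size_field='size'):
    """Return copies of item_dicts with size filled in from the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    size_by_id = {
        row[id_field]: int(row[size_field])
        for row in reader
        if row.get(size_field)
    }
    merged = []
    for d in item_dicts:
        row = dict(d)  # copy so the input dictionaries are left untouched
        row[size_field] = size_by_id.get(row.get(id_field))  # None when no match
        merged.append(row)
    return merged


def ms_to_datetime(ms):
    """Convert UNIX epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# Made-up sample data standing in for the CSV export and the vars() output
sample_csv = "id,title,size\nabc,Roads,2048\ndef,Parcels,512\n"
sample_items = [{'id': 'abc', 'created': 1725840000000}, {'id': 'zzz', 'created': 0}]

merged = attach_sizes(sample_items, sample_csv)
print(merged[0]['size'])                          # 2048
print(merged[1]['size'])                          # None
print(ms_to_datetime(merged[0]['created']).date())  # 2024-09-09
```

The pandas version above does the same thing in fewer lines; this variant is only meant to make the lookup explicit.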
