Troubleshooting why a Python script that searches our organization's content for feature class usage takes 3.5 hours to run when scanning web apps

MDB_GIS · 05-05-2023

I've been doing some spring cleaning of our ArcGIS Online organization. In doing so, I kept wanting a way to find out which maps and apps a feature service was used in. That led me to this thread, which has been very helpful. I am attempting to use @Katie_Clark's excellent version from page two, which is set up to run like a GP tool.

Here is the code:

import arcpy  # needed for GetParameterAsText and AddMessage below
from arcgis.gis import GIS
import pandas as pd

# Log in to portal; 'home' uses the credentials used to login within Pro
gis = GIS('home')

# Set up input parameters to use in the GUI
find_id = arcpy.GetParameterAsText(0)
search_type = arcpy.GetParameterAsText(1)

find_url = gis.content.get(find_id).url

if search_type == 'Web Map':
    arcpy.AddMessage("Searching for Web Maps. This could take a few minutes...")
    
    # Pull list of all web maps in portal
    webmaps = gis.content.search('', item_type='Web Map', max_items=-1)

    # Return subset of map IDs which contain the service URL we're looking for
    matches = [m.id for m in webmaps if str(m.get_data()).find(find_url) > -1]

    # Create empty list to populate with results
    map_list = []

    # Check each web map for matches
    for w in webmaps:

        try:
            # Get the JSON as a string
            wdata2 = str(w.get_data())

            criteria = [
                wdata2.find(find_url) > -1,  # Check if URL is directly referenced
                any([wdata2.find(i) > -1 for i in matches])  # Check if any matching maps are in app
            ]

            # If layer is referenced directly or indirectly, append map to list
            if any(criteria):
                map_list.append(w)

        # Some apps don't have data, so we'll just skip them if they throw a TypeError
        except:
            continue

    output = pd.DataFrame([{'title': m.title, 'id': m.id, 'type': m.type} for m in map_list])
    arcpy.AddMessage(f"OUTPUT TABLE: \n \n {output}")

if search_type == 'Web Application':
    arcpy.AddMessage("Searching for Web Applications. This could take a few minutes...")
    
    # Pull list of all web apps in portal
    arcpy.AddMessage("1")
    webapps = gis.content.search('', item_type='Application', max_items=-1)

    # Create empty list to populate with results
    arcpy.AddMessage("2")
    app_list = []

    # Return subset of map IDs which contain the service URL we're looking for
    arcpy.AddMessage("3")
    matches = [a.id for a in webapps if str(a.get_data()).find(find_url) > -1]
    
    # Check each web app for matches
    arcpy.AddMessage("4")
    for w in webapps:

        try:
            # Get the JSON as a string
            wdata = str(w.get_data())

            criteria = [
                wdata.find(find_url) > -1, # Check if URL is directly referenced
                any([wdata.find(i) > -1 for i in matches]) # Check if any matching maps are in app
            ]

            # If layer is referenced directly or indirectly, append app to list
            if any(criteria):
                app_list.append(w)

        # Some apps don't have data, so we'll just skip them if they throw a TypeError
        except:
            continue

    output = pd.DataFrame([{'title':a.title, 'id':a.id, 'type':a.type} for a in app_list])
    arcpy.AddMessage(f"OUTPUT TABLE:  \n \n {output}")

The tool does work. When searching for web maps, it takes less than a minute on average to run. The problem is that if I use the option to search for web applications, it takes 3.5 hours to run. We are a pretty small organization with fewer than 500 items in our org, so that seems a bit excessive. Is there something I have done wrong in the code that could lead to this? Is there some way I can track what is causing this slowdown? I'm very much a Python journeyman, so implementing an error reporting method is outside my area of knowledge. Any help you can provide would be appreciated!
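
One way to track down a slowdown like this is to time each get_data() call and record any exception instead of silently skipping it. The sketch below is only an illustration: time_get_data and its fetch parameter are hypothetical names, not part of the ArcGIS API for Python, and it assumes each item exposes get_data(), title, id, and type the way the items in the script above do.

```python
import time


def time_get_data(items, fetch=lambda item: item.get_data()):
    """Time one fetch per item and record any exception instead of hiding it.

    `fetch` is a hypothetical hook; by default it calls item.get_data().
    Returns a list of dicts sorted with the slowest item first.
    """
    rows = []
    for item in items:
        start = time.perf_counter()
        error = None
        size = 0
        try:
            # Same conversion the script uses: the data as a string
            size = len(str(fetch(item)))
        except Exception as exc:
            # Keep the error text so skipped items are visible afterwards
            error = repr(exc)
        rows.append({
            "title": getattr(item, "title", None),
            "id": getattr(item, "id", None),
            "type": getattr(item, "type", None),
            "seconds": time.perf_counter() - start,
            "chars": size,
            "error": error,
        })
    return sorted(rows, key=lambda r: r["seconds"], reverse=True)
```

In the tool, something like `for row in time_get_data(webapps)[:10]: arcpy.AddMessage(str(row))` would list the ten slowest items, which should show whether the time is spread evenly or concentrated in a few of them.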

MDB_GIS · 05-05-2023

So I swapped to @jcarlson's version, just to see if I experienced the same issue, and I do. Tool is still taking a long time to run while seemingly looking through our web applications. I did add in some messages so I could see where it was stalling out. Here is the updated script:

import arcpy  # needed for GetParameterAsText and AddMessage below
from arcgis.gis import GIS
import pandas as pd

# Log in to portal; 'home' uses the credentials used to log in within Pro
gis = GIS('home')
username = gis.properties.user.username
arcpy.AddMessage("Logged in as: " + username)

# Set up input parameters to use in the GUI
find_id = arcpy.GetParameterAsText(0)
find_url = gis.content.get(find_id).url
arcpy.AddMessage("Looking for all instances of " + find_url + " in organization.")

# Pull list of all web maps in portal
arcpy.AddMessage("Pulling list of all web maps in organization")
webmaps = gis.content.search('', item_type='Web Map', max_items=-1)

# Return subset of map IDs which contain the service URL we're looking for
arcpy.AddMessage("Grabbing maps that contain target feature class.")
matches = [m.id for m in webmaps if str(m.get_data()).find(find_url) > -1]

# Pull list of all web apps in portal
arcpy.AddMessage("Compiling all web apps in organization.")
webapps = gis.content.search('', item_type='Application', max_items=-1)

# Create empty list to populate with results
arcpy.AddMessage("Generating empty list to load results.")
app_list = []

# Check each web app for matches
arcpy.AddMessage("Checking each web app for matching feature class")
for w in webapps:
    
    try:
        # Get the JSON as a string
        wdata = str(w.get_data())

        criteria = [
            wdata.find(find_url) > -1, # Check if URL is directly referenced
            any([wdata.find(i) > -1 for i in matches]) # Check if any matching maps are in app
        ]

        # If layer is referenced directly or indirectly, append app to list
        if any(criteria):
            app_list.append(w)
    
    # Some apps don't have data, so we'll just skip them if they throw a TypeError
    except:
        continue

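# Report the results; a bare DataFrame expression shows nothing when run as a GP tool
output = pd.DataFrame([{'title':a.title, 'id':a.id, 'type':a.type} for a in app_list])
arcpy.AddMessage(f"OUTPUT TABLE: \n \n {output}")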

It is stalling on the "Checking each web app for matching feature class" message.
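
Because the stall happens somewhere inside that loop, a per-item progress message can show which item the tool is sitting on while it appears hung. This is a minimal sketch, not part of either original script: with_progress and its log parameter are hypothetical names, and it assumes the items are in a list and carry title and id attributes.

```python
def with_progress(items, log=print):
    """Yield each item, logging its position and title before it is processed.

    `log` is any callable that takes a string; print is used by default.
    """
    total = len(items)
    for index, item in enumerate(items, start=1):
        title = getattr(item, "title", "<no title>")
        item_id = getattr(item, "id", "<no id>")
        # Log before the item is handed to the loop body, so a hang
        # leaves this item's message as the last one shown
        log(f"Checking item {index} of {total}: {title} ({item_id})")
        yield item
```

In the script, the loop header would become `for w in with_progress(webapps, log=arcpy.AddMessage):` and the rest of the loop body would stay the same.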

Katie_Clark · 05-05-2023

Hmmm, I'm not completely sure right away what might be causing the delay when searching for web apps specifically (since you say it performs normally when searching for web maps). However, I wanted to let you know that my "normal" performance time is ~4-7 minutes when searching for web maps, and less than a minute when searching for web applications. My organization has 10,000+ items (~1600 web maps, ~350 web apps).

I'll keep thinking about this and let you know if I can think of anything else that might be contributing to the issue. However, Josh Carlson is super knowledgeable and might be able to offer some more insight. 🙂

Good luck!

Best,
Katie

MDB_GIS · 05-05-2023

Thank you! I've pinged him as well because I tried his version of the script and got the same issue. Since his defaults to searching apps, it also takes about 3.5 hours to run. We have fewer than 100 actual apps in our org, so I genuinely have no clue what could be causing this.
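
Given the small number of actual apps, it may be worth checking how many items the 'Application' search really returns and what types they are. The sketch below is an assumption-laden illustration: count_item_types is a hypothetical helper, and it only relies on each item having a type attribute, as used in the scripts above.

```python
from collections import Counter


def count_item_types(items):
    """Return (type, count) pairs for the items, most common first."""
    return Counter(getattr(item, "type", "<unknown>") for item in items).most_common()
```

In the tool this could be reported with `for item_type, n in count_item_types(webapps): arcpy.AddMessage(f"{n:5d}  {item_type}")`. If the list turns out to be longer than expected, or to contain types other than the web apps being looked for, that might help narrow down where the time is going.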