How can I query more than 10,000 users at a time from my Portal?

11-04-2021 03:16 PM
EricaPfister
New Contributor III

I need to pull a list of all our Portal users. There are a couple of Python tools we have built that use this full list, and then iterate through each user to check for various characteristics.

However, now that our Portal has over 11k users we have noticed that the previous code (snippet below) is no longer pulling the complete list. It pulls only 10,000 records, even though the max_users parameter is set to 99,999.

user_list = gis.users.search(query='!admin', max_users=99999)

The documentation at https://developers.arcgis.com/python/api-reference/arcgis.gis.toc.html#arcgis.gis.UserManager.search does not mention any sort of start parameter that I could use to paginate, just the max_users option. Is there something I can do that will let me query the full list of more than 10,000 users? I can switch over to the REST API if I absolutely must, but that would be surprising and disappointing, given that I can do almost anything else through the ArcGIS API for Python.

4 Replies
LongDinh
Occasional Contributor II

Hi @EricaPfister,

It seems that the arcgis module has not implemented pagination for this search yet. According to the User Search API, you can search from an index using the start parameter, which is used to paginate the search results.

The API reference notes are a bit vague, but to retrieve up to 100,000 users, or every user if there are fewer, it would go something like this:

import requests

# Get your token
token = getToken()  # Placeholder: a function you write to retrieve an admin Portal token

# Payload init values
query = ""
start = 1
num = 50
sort_field = 'username'

total_users_to_return = 100000

user_search_api = "https://.../arcgis/sharing/rest/community/users"

# Page through the portal's users until there is no next page
# or the requested total has been collected
results = []
while start > 0 and len(results) < total_users_to_return:
    payload = {
        'q': query,
        'start': start,
        'sortField': sort_field,
        'num': num,
        'f': 'json',
        'token': token
    }
    try:
        response = requests.post(user_search_api, data=payload)
        response.raise_for_status()
        resp_json = response.json()
    except Exception as e:
        print(e)
        print(f"Failed to get start index: {start}")
        break  # Stop instead of retrying the same page forever

    # The REST API reports its own errors in the JSON body
    if 'error' in resp_json:
        print(resp_json['error'])
        break

    # Add this page to the results
    results += resp_json.get('results', [])

    # nextStart is -1 when there are no more results, which ends the loop
    start = resp_json.get('nextStart') or -1

print(f"Retrieved {len(results)} users.")

jcarlson
MVP Esteemed Contributor

@LongDinh has the right kind of solution, I think. But to elaborate on the problem, this appears to be a limitation that's baked in, per the REST API docs:

... if the total number is greater than this value, 10000 will be returned. The top 10000 query results are available to retrieve via pagination.

- Josh Carlson
Kendall County GIS
LongDinh
Occasional Contributor II

Good catch @jcarlson. I misread that as a 10,000 limit on the num parameter rather than on the total query result.

So perhaps narrowing the query so that each search returns fewer than 10,000 results would work. For example, you could query one date range at a time (say, month by month) from DatetimeBefore until DatetimeNow, which would hopefully return fewer than 10,000 users at each iteration.
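
As a rough sketch of the month-by-month idea, the helper below splits a period into non-overlapping monthly windows expressed in epoch milliseconds. The names month_windows and created_query are made up for illustration, and the created:[... TO ...] range syntax is an assumption borrowed from item search; check your Portal's search reference to confirm it applies to user searches before relying on it.

```python
from datetime import datetime, timezone


def month_windows(first, last):
    """Yield (start_ms, end_ms) pairs, one per calendar month, from first up to last.

    Both bounds are timezone-aware UTC datetimes. Each pair is in epoch
    milliseconds, and a window ends one millisecond before the next begins,
    so inclusive range queries do not count a user twice.
    """
    cursor = datetime(first.year, first.month, 1, tzinfo=timezone.utc)
    while cursor < last:
        # Roll over to the first day of the following month
        if cursor.month == 12:
            following = cursor.replace(year=cursor.year + 1, month=1)
        else:
            following = cursor.replace(month=cursor.month + 1)
        yield int(cursor.timestamp() * 1000), int(following.timestamp() * 1000) - 1
        cursor = following


def created_query(start_ms, end_ms):
    # ASSUMPTION: the "created" field and range syntax are not confirmed for
    # user searches in this thread; verify against the search reference.
    return f"created:[{start_ms} TO {end_ms}]"
```

Each window's query string could then be passed as q to the paginated search, with the per-window results merged by username. If a single month still holds 10,000 or more users, the window would need to be split further.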

 

HenryLindemann
Esri Contributor

Hi @EricaPfister, this is how you do it in the ArcGIS API for Python.

I don't use users.search because if there is a corrupt profile the whole query fails.

This script iterates through each user and skips broken profiles.

import arcgis

url = "https://url.com/portal"
admin_user = 'username'
password = 'password'


con = arcgis.gis.GIS(url, admin_user, password, verify_cert=True)
user_manager = arcgis.gis.UserManager(con)

users_count = user_manager.advanced_search(query="0123456789ABCDEF AND !_esri", return_count=True)
for user_num in range(1, users_count+1):
    try:
        user = user_manager.advanced_search(query="0123456789ABCDEF AND !_esri", start=user_num, max_users=1)
        print(user)
        print(user['results'][0].username)
    except Exception as e:
        print(f"User {user_num} failed")
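
One request per user can be slow on a portal with more than 10,000 accounts. A possible variation, sketched below, fetches users in batches and only falls back to one-at-a-time requests for a batch that raises. The function name collect_users and the batch size of 100 are assumptions for illustration; it calls advanced_search only with the keywords used in the script above, and it assumes a corrupt profile makes the whole batch fail, as described for the regular search.

```python
def collect_users(user_manager, query, batch_size=100):
    """Return (users, failed_positions) for every match of `query`.

    `user_manager` is expected to behave like the UserManager in the script
    above; only advanced_search() with query, return_count, start and
    max_users is called.
    """
    total = user_manager.advanced_search(query=query, return_count=True)
    users, failed = [], []
    for start in range(1, total + 1, batch_size):
        # The last batch may be smaller than batch_size
        size = min(batch_size, total - start + 1)
        try:
            page = user_manager.advanced_search(query=query, start=start, max_users=size)
            users.extend(page['results'])
        except Exception:
            # Assumed cause: a broken profile in this batch. Retry one user at a time.
            for position in range(start, start + size):
                try:
                    one = user_manager.advanced_search(query=query, start=position, max_users=1)
                    users.extend(one['results'])
                except Exception:
                    failed.append(position)
    return users, failed
```

With the connection from the script above, this would be called as users, failed = collect_users(user_manager, "0123456789ABCDEF AND !_esri"), and failed would list the positions of the profiles that could not be read.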

Hope it helps.

Regards

Henry