
What is the best way to get Wildcard title searches from a GIS object? Trying to generate a report.

11-15-2022 08:38 AM
AaronManuel2
New Contributor III

I'm using the ArcGIS API for Python to generate a report that shows all of our content on ArcGIS Online. We are particularly interested in including the usage stats. It's a pretty basic script tbh.

from pathlib import Path

from arcgis.gis import GIS

# exp_time (not shown here) is a dict of extra keyword arguments
mygis = GIS("https://mygis.maps.arcgis.com", "Username", "Password", **exp_time)

content = mygis.content.search(query="", sort_field="title", sort_order="asc", max_items=100)

def main():

    # Raw string, so single backslashes are enough
    outpath = Path(r"C:\GIS\output\agol\testagolreport.csv")
    content_run = content

    with outpath.open("a", newline='') as outfile:

        # Loop through content items and write to CSV, etc.

Because we have so much content I am unable to run my script for all the content at once. I have to break up my script into chunks, so I was thinking something like [A*,B*,C*,D*], [E*,F*,G*,H*], etc.
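As a rough illustration of the chunking idea, the sketch below splits the alphabet into groups of four letters and buckets a list of titles by first letter. The function names `letter_chunks` and `bucket_titles` are hypothetical, and the titles are plain strings rather than real items from `content.search`.

```python
import string
from collections import defaultdict

def letter_chunks(size=4):
    """Split A-Z into consecutive groups, e.g. ['A', 'B', 'C', 'D'], ['E', ...]."""
    letters = string.ascii_uppercase
    return [list(letters[i:i + size]) for i in range(0, len(letters), size)]

def bucket_titles(titles, size=4):
    """Map each letter group (as a string like 'ABCD') to the titles that start with one of its letters."""
    lookup = {letter: "".join(chunk) for chunk in letter_chunks(size) for letter in chunk}
    buckets = defaultdict(list)
    for title in titles:
        first = title.strip()[:1].upper()
        # Titles starting with a digit, symbol, or nothing at all go to 'other'
        buckets[lookup.get(first, "other")].append(title)
    return dict(buckets)
```

Note that this only groups titles that have already been fetched; it does not reduce the number of search requests on its own.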

But because of the way AGOL / Portal use Elasticsearch, I can't search for items by doing a wildcard title search on the first letter or whatever. If you search for 'EAST*', it returns items with the word East anywhere in the title, not just at the beginning.
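To make the difference concrete, here is a small sketch in plain Python. `matches_any_word` is only a simplified stand-in for the behaviour described above (a hit on any word in the title); it is an assumption for illustration, not how the search backend is actually implemented. `matches_prefix` is the stricter "title starts with" test.

```python
def matches_any_word(title, term):
    """Simplified model: true if any whitespace-separated word in the title starts with term."""
    term = term.lower()
    return any(word.startswith(term) for word in title.lower().split())

def matches_prefix(title, term):
    """True only if the whole title starts with term (case-insensitive)."""
    return title.lower().startswith(term.lower())

# Hypothetical titles, for illustration only
titles = ["East Side Parcels", "Parcels East", "Northeast Roads", "Eastern Trails"]
loose = [t for t in titles if matches_any_word(t, "EAST")]
strict = [t for t in titles if matches_prefix(t, "EAST")]
```

Here `loose` keeps "Parcels East" while `strict` drops it, which is the filtering step a report script would need to apply after the search returns.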

I am guessing there is a way to solve this using list comprehension or filtering somehow. Should I get all the matching items from the initial elasticsearch query, then filter them out? I found this snippet on an older post:

searched_items = [item for item in gis.content.search(query="title: base_map", item_type="Map Service", max_items=20) if item.title == "base_map"]

 

Couldn't get it to work. It seems inefficient to have to do things this way. Fuzzy search queries work fine at the interface level, but for generating reports it's a real pain. Maybe I'm missing something. Has anyone figured this out and have a snippet they can share?

Thanks!


Accepted Solutions
JosephRhodes2
Occasional Contributor II

Hey @AaronManuel2,

The following should work, let me know if you run into issues or have questions. It searches by user, which helps sidestep the 10,000-item limitation (unless a single user has more than 10,000 items). It assumes that you only want to find items that *start with* your search term.

 

from arcgis import GIS
import csv
import time

agol_username = ''								# change
agol_password = ''								# change
search_term = 'EAST'							# change
output_csv = r'C:\my_directory\my_csv.csv'		# change
##############################

mygis = GIS('https://arcgis.com', agol_username, agol_password)

items = []

user_list = mygis.users.search(query='*', max_users=10000)

for user in user_list:
	print(f'Inventorying items for user {user.username}')
	user_items = mygis.content.search(query=f'owner:{user.username}', max_items=10000)
	for item in user_items:
		if item.title.lower().startswith(search_term.lower()):
			items.append(item)

with open(output_csv, 'w', encoding='utf-8') as file:
	csvfile = csv.writer(file, delimiter=',', lineterminator='\n')
	csvfile.writerow(["ID",  				# these are the headers; modify according to whatever properties you want in your report
					"Title",
					"Name",
					"Owner",
					"Sharing",
					"Type",
					"Created",
					"Modified",
					"Size",
					"Views",
					"URL",
					"Tags",
					"Description",
					"Summary",
					"Terms",
					"Spatial Reference"
					])	
	
	for item in items:
		csvfile.writerow([item.id,  		# modify according to whatever properties you want in your report
                        item.title,
                        item.name,
                        item.owner,
                        item.access,
                        item.type,
                        time.strftime('%m/%d/%Y', time.localtime(item.created/1000)),
                        time.strftime('%m/%d/%Y', time.localtime(item.modified/1000)),
                        round(item.size/1000),
                        item.numViews,
                        item.url,
                        item.tags,
                        item.description,
                        item.snippet,
                        item.licenseInfo,
                        item.spatialReference
                        ])	
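A couple of the columns above may need tidying before they go into a CSV. The script divides `item.created` and `item.modified` by 1000, which implies epoch milliseconds, and `item.tags` would otherwise be written as a Python list. The helpers below are a minimal sketch of that cleanup; `ms_to_date` and `clean_cell` are hypothetical names, and the sketch defaults to UTC rather than local time so the output is reproducible.

```python
from datetime import datetime, timezone

def ms_to_date(epoch_ms, tz=timezone.utc):
    """Convert an epoch timestamp in milliseconds to MM/DD/YYYY."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=tz).strftime('%m/%d/%Y')

def clean_cell(value):
    """Flatten lists (such as tags) and replace None with an empty string."""
    if value is None:
        return ''
    if isinstance(value, (list, tuple)):
        return '; '.join(str(v) for v in value)
    return value
```

For example, a row could be built with `[clean_cell(v) for v in (item.id, item.title, item.tags)]` before it is passed to `writerow`.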

 

 

 

 


4 Replies
AaronManuel2
New Contributor III

Hey Joseph, thanks for this. It seems to work, but I was not able to get it to complete. I got a 'too many requests' error once it had been running for about 30 seconds.

The reason I'm having to do this in the first place is that Esri temporarily blocked my account for making too many API calls, so I'm trying to find the best workaround possible by breaking things down into smaller pieces.

Your script has given me the idea, though, to just do a per-user report. I didn't want to have to do it that way, but that might be the best way to make this work. In addition to content we have a lot of users, but it's really only a few dozen that are creating most of the content.
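A per-user report could be sketched roughly as follows: group the rows by owner and write one CSV per owner. The rows here are plain dicts standing in for item properties, and `group_by_owner` and `write_owner_reports` are hypothetical helper names, not part of the ArcGIS API.

```python
import csv
from collections import defaultdict
from pathlib import Path

def group_by_owner(rows):
    """Group row dicts by their 'owner' key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["owner"]].append(row)
    return dict(grouped)

def write_owner_reports(rows, out_dir, fields=("id", "title", "owner")):
    """Write one CSV per owner into out_dir and return the paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for owner, owner_rows in group_by_owner(rows).items():
        # Assumes the username is safe to use as a file name
        path = out_dir / f"{owner}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(owner_rows)
        written.append(path)
    return written
```

Writing each owner's file as soon as that owner's items are fetched also means a blocked or interrupted run keeps the reports it has already finished.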

JosephRhodes2
Occasional Contributor II

Hey Aaron,

I'm not sure what Esri's usage limits are, but another option could be to sleep for a few minutes between users:

for user in user_list:
	print(f'Inventorying items for user {user.username}')
	user_items = mygis.content.search(query=f'owner:{user.username}', max_items=10000)
	for item in user_items:
		if item.title.lower().startswith(search_term.lower()):
			items.append(item)
	time.sleep(300)
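A different technique from the fixed sleep above is to retry a failed call with exponential backoff, so the script only slows down when a request actually fails. The sketch below assumes the 'too many requests' error surfaces as a Python exception; the exact exception type is not known here, so it catches `Exception` broadly. `call_with_backoff` is a hypothetical helper, not part of the ArcGIS API.

```python
import time

def call_with_backoff(func, retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception wait base_delay * 2**attempt seconds, then retry.

    Re-raises the last exception once all retries are used up.
    """
    for attempt in range(retries):
        try:
            return func()
        except Exception:
            if attempt == retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Inside the loop, the search could then be wrapped as `call_with_backoff(lambda: mygis.content.search(query=f'owner:{user.username}', max_items=10000), base_delay=30)`. The starting delay is a guess, since the actual usage limits are not stated in this thread.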

 

AaronManuel2
New Contributor III

Thanks Joseph, I'll try that as well. I think breaking it up by user and putting in some sleep time to slow down the requests will get me there. Appreciate the help.
