Notebook within ArcGIS Online times out before script completes

03-08-2021 03:08 AM
BenJackson
New Contributor III

I have a script that pulls a fairly large vector file and queries it to get at the contents of its table, so that I can run a series of pandas-based calculations. However, the dataset is so big that the script takes too long to run, and the notebook times out before it completes. Is there a way to increase the time before the notebook times out, or is there a more efficient way than my code below to extract the table from a vector file and convert it into a pandas dataframe?

# Pull the item and query the full table from its first layer
LU = gis.content.get("x")
Layer_LU = LU.layers[0].query()
Layer_LU = Layer_LU.sdf

# Insert pandas operations here
3 Replies
jcarlson
MVP Esteemed Contributor

If it's a server-based service (i.e., not hosted), you should be able to raise the timeout values on the service via Server Manager under Pooling.

[Screenshot: the service's Pooling settings page in ArcGIS Server Manager]
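If you'd rather script that change than click through Server Manager, the same pooling property can be edited through the ArcGIS Server Administrator REST API. Here's a rough sketch using plain requests; the URL, credentials, and service path are placeholders, and maxUsageTime is my understanding of the relevant property, so verify it against your own server's admin directory before relying on this:

import json
import requests

# Placeholders -- point these at your own server and service
admin_url = "https://yourserver.domain.com:6443/arcgis/admin"
service = "SomeFolder/SomeService.MapServer"

# Get an administrator token
token = requests.post(
    f"{admin_url}/generateToken",
    data={"username": "admin", "password": "secret",
          "client": "requestip", "f": "json"},
).json()["token"]

# Read the current service definition
svc = requests.get(
    f"{admin_url}/services/{service}",
    params={"token": token, "f": "json"},
).json()

# Raise the maximum time a client can use the service, in seconds --
# the same value the Pooling page in Server Manager exposes
svc["maxUsageTime"] = 1800

# Post the edited definition back (the service restarts on edit)
resp = requests.post(
    f"{admin_url}/services/{service}/edit",
    data={"service": json.dumps(svc), "token": token, "f": "json"},
)
print(resp.json())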

If it's a hosted service or you don't have admin access to the service, you might try either of the following:

 

Limiting the Query()

There are a few parameters you can include on your query to limit the amount of data being transferred. One easy refinement is to specify only those fields which you need to work with in Pandas, using out_fields=['field1', 'field2', ...].

I'm assuming you want the geometry, but if you don't need it, setting return_geometry=False will return results much faster.
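For instance, a trimmed-down query along those lines might look like this (the item ID and field names are placeholders, and as_df=True, which comes up again below, returns the results directly as a dataframe):

lyr = gis.content.get('x').layers[0]

# Pull only the columns you actually need and skip the geometry
df = lyr.query(
    out_fields=['field1', 'field2'],
    return_geometry=False,
    as_df=True)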

 

Working in Chunks

A couple of other parameters worth noting on the query are result_record_count and result_offset, which must be paired with return_all_records=False in order to work. You could perform multiple queries and then merge the resulting dataframes together. It might look something like this:

 

import pandas

lyr = gis.content.get('x').layers[0]
fields = ['the', 'fields', 'you', 'need']

# Return the total number of records
record_count = lyr.query(return_count_only=True)

# Set a chunk size; adjust as needed for performance / timeout
chunk_size = 10000

# Perform chunked queries, collecting each chunk as its own dataframe
chunks = []
n = 0

while n < record_count:
    chunks.append(
        lyr.query(
            out_fields=fields,
            result_record_count=chunk_size,
            result_offset=n,
            return_all_records=False,
            as_df=True))

    n += chunk_size

# Merge the chunks into a single dataframe
out_df = pandas.concat(chunks, ignore_index=True)
out_df

 

I'm also using as_df=True in the query, which returns the query results directly as a dataframe.

Use one or both of these methods to make your queries return results more quickly, and hopefully you won't run into any timeouts.

- Josh Carlson
Kendall County GIS
BenJackson
New Contributor III

Thank you, this is actually really useful. I don't need the geometry data, so that saves a lot of processing time. Everything seems to work nicely now.

MikeSlattery
Occasional Contributor

+1, great answer.
