Map performance and org query maximum request rate - assessing volume of requests?

ChristopherCounsell · 03-09-2023

Hi,

ArcGIS Online subscriptions have an org query maximum request rate. It is shared across the organization and ranges from 10K per minute for Standard organizations up to 96K per minute for Premium data stores (M4).

Are there service-specific rate limits?
Is there a way to identify when we reach this org max request rate? Is it indicated in the x-esri-org-request-units-per-min response header?
What exactly happens if the maximum request rate is reached?

Also looking for suggestions or tools that can assess map performance in the browser or in Field Maps. Anything generally useful would be great, but if they also report aggregated RU, and the RU incurred when the map is loaded (to interpret caching), that would be amazing.

@GlenShepherd if you can reach out to some gurus via Chatter that would be amazing 🙂

Cheers,

MarianneFarretta · 03-10-2023

Hello, I'm the Product Manager for Feature Data Store and Map Viewer; thank you for a great question!

This limit is not quite what it appears at face value--the "query" limit does not mean a specific number of queries. Rather, it means a specific number of Request Units, which is a system we invented to tell our backend systems when the database is being asked to perform more work than it can comfortably handle. It is not a one-to-one relationship to actual queries; different query types are weighted and the query rate limit is based on the cumulative "weight" with which the Feature Data Store is being tasked.

This limit is defensive--it's like the red line on your vehicle's engine. It's not a proactive design tool; we use it to avoid overwhelming the database to the point of shutting it down. You can watch a parallel metric in your Feature Data Store usage chart, which shows the draw on your database compute resources. This is the tool I recommend using for design and planning. We expect to bring this chart to orgs using Standard Feature Data Store later this year.
Tracking your ArcGIS Online Feature Data Store Key Health Indicators

Once the organization's query rate limit is reached, a one-minute "cooling off" period is triggered during which non-repeatable queries (i.e. those which cannot be answered by a cache) are responded to with a 429 error. This gives the database time to cool off, and interrupts clients with a cautionary response.
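To make the two mechanisms above concrete (a limit counted in weighted Request Units rather than raw queries, and a one-minute cooling-off once it is crossed), here is a toy Python model. The per-query weights in `HYPOTHETICAL_RU_WEIGHTS`, the class name `OrgRateLimiter` and the fixed one-minute accounting window are all illustrative assumptions; Esri does not publish its real weights or windowing, so this only sketches the shape of the behaviour described, not the actual implementation.

```python
from dataclasses import dataclass

# Hypothetical per-query weights, for illustration only (not Esri's values).
HYPOTHETICAL_RU_WEIGHTS = {
    "attribute": 1,
    "spatial": 3,
    "statistics": 5,
}


@dataclass
class OrgRateLimiter:
    limit_per_min: int                 # e.g. 96_000 for an M4 data store
    cooldown_s: float = 60.0           # one-minute cooling-off period
    window_start: float = 0.0          # start of the current accounting minute
    used: int = 0                      # weighted RU consumed in this minute
    cooling_until: float = float("-inf")

    def handle(self, now: float, query_type: str, cacheable: bool) -> int:
        """Return an HTTP-like status for a query arriving at `now` (seconds)."""
        # Repeatable queries answered by a cache never reach the database.
        if cacheable:
            return 200
        # During cooling-off, every non-repeatable query is rejected.
        if now < self.cooling_until:
            return 429
        # Start a fresh one-minute accounting window when the old one expires.
        if now - self.window_start >= 60:
            self.window_start, self.used = now, 0
        # Add this query's weight, not a count of 1.
        self.used += HYPOTHETICAL_RU_WEIGHTS[query_type]
        if self.used > self.limit_per_min:
            self.cooling_until = now + self.cooldown_s
            return 429
        return 200
```

For example, with `limit_per_min=10`, three uncached "statistics" queries (5 RU each in this toy model) return 200, 200, then 429; uncached queries are rejected for the next minute while cacheable ones still succeed.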

The limit is in the header--and though it does apply to the entire organization, it is defined by the size of the Feature Data Store you are using. The query rate limit increases with Feature Data Store size.
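A small Python helper for reading that header. The exact header format is an assumption: it treats the value as semicolon-separated `key=value` pairs such as `usage=52000;max=96000`, which fits the 'usage' value referenced later in this thread. The function names are hypothetical; check the format against real responses from your own org before relying on it.

```python
from __future__ import annotations


def parse_ru_header(value: str) -> dict[str, int]:
    """Parse an assumed 'usage=<n>;max=<n>' header value into integers."""
    parts: dict[str, int] = {}
    for item in value.split(";"):
        key, sep, num = item.strip().partition("=")
        # Skip anything that is not a simple key=<integer> pair.
        if sep and num.strip().isdigit():
            parts[key.strip()] = int(num)
    return parts


def headroom_pct(value: str) -> float | None:
    """Percentage of the per-minute limit still available, if usage and max are present."""
    parts = parse_ru_header(value)
    if "usage" not in parts or not parts.get("max"):
        return None
    return 100.0 * (parts["max"] - parts["usage"]) / parts["max"]
```

With the assumed format, `headroom_pct("usage=52000;max=96000")` gives roughly 45.8, meaning a little under half of the M4 per-minute budget remains in that minute.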

I'd like to understand more about your ideas regarding map performance. Feature Data Store is only one aspect of map performance, and its correlation to feature service performance is best reviewed using the usage chart. I'm interested to know what kind of performance you want to track, who in your org should see it, how it would be accessed--lots of questions!

I'd love to schedule a chat for follow-up if you're available; you're welcome to send me your email and we can set something up.

Thank you,
Marianne

Marianne Farretta

Product Manager | ArcGIS Online
Map Viewer | Premium Feature Data Store
mfarretta@esri.com

ChristopherCounsell · 03-15-2023

Hi Marianne,

I've sent you an email - please let me know if you've had any issues receiving it.

What I've found is that the health chart is not an accurate reflection of the usage limit.
If the 'usage' value in [x-esri-org-request-units-per-min] is the current total across the org, we occasionally see high values (50,000-70,000) on an M4 database. The health chart stays below 10% during these periods, which I assume is a result of the aggregation method and the low-usage responses after the minute resets.
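A quick illustration of how a coarser chart can sit below 10% while individual minutes spike: averaging per-minute samples over a longer bucket flattens short peaks. The sample values and the 15-minute mean are made up for illustration; how the Esri health chart actually aggregates is not documented in this thread.

```python
from statistics import mean

LIMIT = 96_000  # M4 per-minute limit quoted in this thread

# Hypothetical per-minute 'usage' samples over 15 minutes, with one spike.
samples = [2_000] * 14 + [70_000]

avg_pct = 100 * mean(samples) / LIMIT   # what a 15-minute average would show
peak_pct = 100 * max(samples) / LIMIT   # what the worst single minute reached
print(f"15-min mean: {avg_pct:.1f}% of limit, peak minute: {peak_pct:.1f}%")
# prints "15-min mean: 6.8% of limit, peak minute: 72.9%"
```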

To me, if we hit 50-70k, it's plausible to hit 96k, and there's no guarantee the requests will stop after the one-minute timeout. So high RU usage should be better reflected in the organization health chart, as it can effectively make the org intermittently unavailable.

It's clear that we need to reduce this usage to avoid reaching timeouts, but it's difficult to initiate and direct the required change when the org appears 'healthy' and there have been confirmed server-side outages (the tendency is to just blame Esri for everything, whether the issue is client- or server-side).

Moving forward:

I wrote a script to ping an empty service at the database level every minute and record the response headers. Will be interesting to compare this against our health chart and assess usage.
Am placing an enhancement request with support to better illustrate RU consumption across a service (ArcGIS Idea here)
Looking for a method of capturing a 429 response, preferably client-side so we know why a webhook failed or layer failed to load in a web map (not sure if my script will capture this properly)
Looking for tools to assess RU usage of Web Maps and other service interactions. Currently loading a map and loosely counting the queries + RU incurred through network traffic. Surely there's a better way to capture this + do a refresh to assess caching.
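A minimal sketch of the kind of polling script described in the first item above, in Python with the standard library only. `SERVICE_URL` is a hypothetical placeholder for an empty hosted feature layer in your own org, and logging only headers whose names start with `x-esri` is an assumption; the fetch and row-writing functions are passed in so the logging can be checked without a network.

```python
from __future__ import annotations

import csv
import time
import urllib.error
import urllib.request
from datetime import datetime, timezone

# Hypothetical placeholder: an empty hosted feature layer in your own org.
SERVICE_URL = (
    "https://services.arcgis.com/<orgId>/arcgis/rest/services/"
    "<empty_layer>/FeatureServer/0/query?where=1%3D1&f=json"
)


def fetch_headers(url: str) -> dict[str, str]:
    """Request the URL and return its response headers with lower-cased names."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return {k.lower(): v for k, v in resp.headers.items()}
    except urllib.error.HTTPError as err:
        # Keep logging on HTTP errors (e.g. a 429) instead of crashing the loop.
        return {k.lower(): v for k, v in err.headers.items()}


def poll_once(fetch, write_row, url: str = SERVICE_URL) -> dict[str, str]:
    """Fetch once and write one timestamped row per x-esri-* response header."""
    headers = fetch(url)
    esri = {k: v for k, v in headers.items() if k.startswith("x-esri")}
    stamp = datetime.now(timezone.utc).isoformat()
    for name, value in sorted(esri.items()):
        write_row([stamp, name, value])
    return esri


def run_forever(log_path: str = "ru_log.csv", interval_s: float = 60.0) -> None:
    """Poll once per interval and append the results to a CSV file."""
    with open(log_path, "a", newline="") as f:
        writer = csv.writer(f)
        while True:
            poll_once(fetch_headers, writer.writerow)
            f.flush()
            time.sleep(interval_s)
```

Calling `run_forever()` appends rows to `ru_log.csv` until interrupted; having a scheduler (cron, Task Scheduler) call `poll_once` once a minute would work just as well and survives restarts.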
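One partial alternative to counting requests by eye in the network tab is to export a HAR file from the browser's developer tools after loading the map, then summarise it. This Python sketch counts hosted feature layer `query` requests per layer and collects any RU header values it sees. Matching URLs on `.../FeatureServer/<n>/query` and the header name are assumptions based on this thread, and the function name is hypothetical. It cannot tell you the RU cost of each individual request, but comparing a first load with a refresh gives a rough view of caching.

```python
from __future__ import annotations

import re
from collections import Counter

# Assumed URL shape for hosted feature layer queries: "<Service>/FeatureServer/<n>/query".
QUERY_RE = re.compile(r"([^/]+/FeatureServer/\d+)/query", re.IGNORECASE)
RU_HEADER = "x-esri-org-request-units-per-min"


def summarise_har(har: dict) -> tuple[Counter, list[str]]:
    """Count query requests per layer and collect RU header values from a parsed HAR."""
    per_layer: Counter = Counter()
    ru_values: list[str] = []
    for entry in har.get("log", {}).get("entries", []):
        match = QUERY_RE.search(entry["request"]["url"])
        if not match:
            continue  # not a feature layer query (items, tiles, scripts, ...)
        per_layer[match.group(1)] += 1
        for header in entry.get("response", {}).get("headers", []):
            if header.get("name", "").lower() == RU_HEADER:
                ru_values.append(header.get("value", ""))
    return per_layer, ru_values
```

Usage would look like `summarise_har(json.load(open("map_load.har", encoding="utf-8")))`, run once on a cold load and once after a refresh.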

For others interested, check out this resource (thanks to Linder from support for the latest link):

https://mediaspace.esri.com/media/t/1_oznrvijl

Thanks

tmatiques_bouyant · 04-06-2023

Hi Marianne,

I would like to know whether these request units apply only to external API calls (i.e., via FME HTTPCaller), or whether they are also triggered when a user simply pans around a Map Viewer map, selects a feature, or edits an attribute. Also, do they apply to feature services used in ArcGIS Pro as well?

We have a fairly small / moderate size user base (maybe 50 - 80 people). Most of these users are browsing a few different map viewer maps, around 20 are creating and editing content in ArcGIS Pro and around 20-30 are editing attributes in ArcGIS Field Maps.

We get this error response a lot, or performance is abysmal during peak times.

ChristopherCounsell · 04-06-2023

It's from any source querying feature layers hosted in your ArcGIS Online organisation. Depending on a few factors, a single interaction could make more than one query. For example, panning a map makes several queries across spatial tiles. FME paginates requests when loading records, so pulling 3000 records makes 3 requests.
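To see why pagination multiplies the request count, here is a Python sketch that builds the per-page query parameters a client would send. `where`, `outFields`, `resultOffset`, `resultRecordCount` and `f` are standard ArcGIS REST query parameters; the page size of 1,000 is an assumption chosen to match the 3,000-records-in-3-requests example (in practice the page size is capped by the layer's `maxRecordCount`), and `page_params` is a hypothetical helper name.

```python
from __future__ import annotations

import math


def page_params(total_records: int, page_size: int, where: str = "1=1") -> list[dict]:
    """Build one set of query parameters per page (resultOffset/resultRecordCount paging)."""
    pages = math.ceil(total_records / page_size)
    return [
        {
            "where": where,
            "outFields": "*",
            "resultOffset": i * page_size,
            "resultRecordCount": page_size,
            "f": "json",
        }
        for i in range(pages)
    ]


params = page_params(3000, 1000)
print(len(params), [p["resultOffset"] for p in params])  # 3 [0, 1000, 2000]
```

Each dictionary is one request to the database, so requesting only the fields you need (`outFields`) and fewer records keeps the count down.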

How request units are incurred also varies heavily. You can use caching and the CDN so that no units are used. Other requests, like 'past 24 hours', always hit the database.
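One way to make a "recent records" filter repeatable is to replace a moving "now minus 24 hours" expression with an absolute timestamp rounded down to the hour, so every client asking within the same hour sends an identical request. This is a sketch under assumptions: the field name `EditDate` is hypothetical, it assumes the date field is stored in UTC and that the layer accepts standardized SQL `TIMESTAMP` literals, and whether the CDN actually serves the repeated request from cache depends on how the layer is shared and configured.

```python
from datetime import datetime, timedelta, timezone


def last_24h_where(now: datetime, field: str = "EditDate") -> str:
    """Build an 'at least the past 24 hours' filter whose text changes only once per hour."""
    # Round down to the start of the current UTC hour, then step back 24 hours.
    hour = now.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    start = hour - timedelta(hours=24)
    return f"{field} >= TIMESTAMP '{start.strftime('%Y-%m-%d %H:%M:%S')}'"
```

The window covers between 24 and 25 hours, which is usually acceptable for dashboards; the trade-off is slightly more data per request in exchange for identical, cacheable query strings.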

I'd strongly recommend watching the Developer Summit videos on hosted feature layers and map optimisation. The content on highly scalable or viral apps is also really good, although largely focused on publicly shared items.

https://mediaspace.esri.com/media/t/1_oznrvijl

Edits do not contribute to this max org query rate limit.

I'd recommend reducing automated interactions - i.e. FME - during data collection activities. You can also optimise workbenches. Put in delays, run asynchronously, etc.
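As a sketch of "put in delays" for a scripted job (Python here, though the same idea applies to an FME workbench): pause between requests, and when a 429 comes back, wait out at least the one-minute cooling-off period before retrying, doubling the wait on repeated 429s. The `send` callable, the delay values and the retry cap are hypothetical choices, not Esri guidance.

```python
from __future__ import annotations

import time
from typing import Callable, Iterable


def run_throttled(
    requests: Iterable[dict],
    send: Callable[[dict], int],
    delay_s: float = 1.0,
    cooloff_s: float = 60.0,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> list[int]:
    """Send requests one at a time with a fixed pause; back off after a 429."""
    statuses = []
    for req in requests:
        for attempt in range(max_retries + 1):
            status = send(req)
            # Stop retrying on success/other errors, or when retries are exhausted.
            if status != 429 or attempt == max_retries:
                break
            # Wait out the cooling-off period, doubling on repeated 429s.
            sleep(cooloff_s * 2 ** attempt)
        statuses.append(status)
        sleep(delay_s)  # spread requests out instead of bursting
    return statuses
```

Passing `sleep` in makes the pacing easy to test; in a real job the default `time.sleep` is used.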

ChristopherCounsell · 04-02-2023

To wrap this up:

A 429 error is returned within the response data
Inconclusive outcome for our org: it's strongly evident we should have been getting 429s in responses over a 1-2 year period, but we did not until a week ago.
Esri improved the AGOL feature data store health charts to reflect query volumes
The only service-specific tool is usage (the item details page or the .usage() method in the Python API). However, it's not clear whether these requests incur RU, or whether they show requests made while the max org query rate is limited.
Esri did a great job applying enhancements and bug fixes around this, and have been open to feedback.
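Since the 429 arrives within the response data, a client should check the JSON body as well as the HTTP status. ArcGIS REST endpoints often report errors as an `error` object inside an otherwise successful HTTP response. This is a Python sketch rather than browser JavaScript, and `is_rate_limited` is a hypothetical helper name.

```python
import json


def is_rate_limited(status: int, body: str) -> bool:
    """True if a response signals the org query rate limit (429) via status or JSON body."""
    if status == 429:
        return True
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    # ArcGIS-style error payload: {"error": {"code": 429, "message": ...}}
    error = payload.get("error") if isinstance(payload, dict) else None
    return isinstance(error, dict) and error.get("code") == 429
```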

For most readers, this will only be relevant if you have public or viral apps. Use the CDN, properly!