resultOffset and resultRecordCount queries timing out

by p-sin (New Contributor), 03-19-2024 04:52 AM

Hi,

I'm very new to using these services. I am building a data pipeline in Python that sources data from the UK ONS geoportal (https://geoportal.statistics.gov.uk/), specifically their geography boundary data, such as LSOAs in England and Wales (https://geoportal.statistics.gov.uk/datasets/bb427d36197443959de8a1462c8f1c55_0/explore).

I've used the ArcGIS API to grab data for all the different geographies, most of which have fewer than 1,000 records. However, there are 32,000 LSOAs, which exceeds the max record count, and the download falls foul of the hardcoded 60-second timeout. Some other datasets also took longer than 60 seconds to download.

To resolve this, I used the resultOffset and resultRecordCount parameters to collect the data in batches (configured separately for each dataset). This has worked fine for every dataset except the LSOAs above.
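A minimal sketch of this kind of offset batching, for anyone trying to reproduce the problem. It is not the exact pipeline code: `offset_pages`, `http_fetch` and the `OBJECTID` sort field are hypothetical names chosen for illustration, and it assumes the layer's query endpoint returns `f=json` responses with a `features` list and an `exceededTransferLimit` flag.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen


def http_fetch(query_url: str, params: dict) -> dict:
    """Send one query request and decode the JSON body (60 s client timeout)."""
    with urlopen(f"{query_url}?{urlencode(params)}", timeout=60) as resp:
        return json.load(resp)


def offset_pages(fetch: Callable[[dict], dict], batch_size: int = 1000) -> Iterator[list]:
    """Yield batches of features using resultOffset / resultRecordCount."""
    offset = 0
    while True:
        params = {
            "where": "1=1",
            "outFields": "*",
            "f": "json",
            # Assumed ID field name; a fixed sort order keeps pages consistent.
            "orderByFields": "OBJECTID",
            "resultOffset": offset,
            "resultRecordCount": batch_size,
        }
        page = fetch(params)
        features = page.get("features", [])
        if features:
            yield features
        # Stop on an empty page or when the service reports nothing is left.
        if not features or not page.get("exceededTransferLimit", False):
            break
        offset += len(features)
```

To run it against a real layer, pass something like `lambda p: http_fetch(query_url, p)` as `fetch`, where `query_url` is the layer's `/query` endpoint.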

What I have noticed is that, whether I grab the data in batches of 10, 100, 1,000 or whatever, each subsequent query takes longer. It's as if the query actually reads records 0 to X+Y and then returns only records X to X+Y (where X is the offset and Y is the record count). So as I iterate and X gets bigger, each query ends up pulling most of the records anyway, and the connection times out.
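If the slowdown really does come from the server reading past all the skipped rows (which is only a guess from the timings), one alternative worth testing is to page on the ID field instead of on an offset, so every request filters directly to the next block. The sketch below is untested against this service and rests on assumptions: the ID field is called `OBJECTID`, its values are positive integers, and the layer accepts `where` and `orderByFields` on that field. `keyset_pages` is a hypothetical helper name, and `fetch` is any function that takes the query parameters and returns the decoded JSON response.

```python
from typing import Callable, Iterator


def keyset_pages(
    fetch: Callable[[dict], dict],
    batch_size: int = 1000,
    id_field: str = "OBJECTID",  # assumed name of the layer's ID field
) -> Iterator[list]:
    """Yield batches of features, resuming each query after the last ID seen."""
    last_id = 0  # assumes IDs are positive integers
    while True:
        params = {
            "where": f"{id_field} > {last_id}",
            "outFields": "*",
            "f": "json",
            "orderByFields": id_field,
            "resultRecordCount": batch_size,
        }
        features = fetch(params).get("features", [])
        if not features:
            break  # nothing beyond last_id, so every record has been read
        yield features
        # The next request starts just after the highest ID in this batch.
        last_id = max(f["attributes"][id_field] for f in features)
```

Timing the same batch size with both approaches at a large offset (say around record 20,000) would show whether the offset itself is what makes the later queries slow.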

Am I doing something wrong? I simply cannot find a way to query the later records in the dataset. This is the link to the API page: https://services1.arcgis.com/ESMARspQHYMw9BZ9/ArcGIS/rest/services/Lower_layer_Super_Output_Areas_20...

I can set the offset to 500 and the record count to 4,500, and that completes in about 20 seconds. But setting them to 20,000 and 24,000 respectively results in a timeout.
