I have a large dataset on AGOL, shared on Hub as a hosted feature layer. When I attempt to download it as a filtered shapefile, the download processes for hours. The result should be less than 5 MB zipped. Even at the current rate of 360 bytes per second (which is absurdly slow), the download should complete in roughly four hours, yet mine has been processing for at least 10 hours.
Would someone mind testing the download? Unfortunately the URL with the geometry parameter won't zoom you to the location, so if you test, please pick a small area, such as a city block.
Wow, I didn't realize ArcGIS Open Data had been renamed! Neato.
As for your issue, are you sure the app supports filtering by spatial location? It might be taking a long time because it's trying to download all 600,000+ features.
We've just hit a backlog in our download processing queue. We've added more capacity, and regular service should be restored shortly. Please note that even if you are requesting a small portion of a large file, the filtered download can still take a long time to complete: the time to create a filtered download is proportional to the size of the original dataset AND the size of the filter.
Software Engineer | ArcGIS Hub
Thanks for your reply, Daniel.
I'm trying to understand what a reasonable amount of time to wait for a download is. When I request JSON from the AGOL API with the same extent, I get a 24 MB file in 22 seconds; you can test from the URL below. A zipped shapefile with the same extent is 4-5 MB, so this should be a sub-second operation.
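For anyone who wants to reproduce this kind of extent-filtered request, here is a minimal sketch of building an ArcGIS REST `query` URL with an envelope (bounding-box) spatial filter. The layer URL and the coordinates are placeholders, not the actual service from this thread:

```python
from urllib.parse import urlencode

def build_extent_query(layer_url, xmin, ymin, xmax, ymax, out_format="json"):
    """Build an ArcGIS REST API query URL that filters features by a
    bounding-box extent (esriGeometryEnvelope spatial filter)."""
    params = {
        "where": "1=1",                           # no attribute filter
        "geometry": f"{xmin},{ymin},{xmax},{ymax}",
        "geometryType": "esriGeometryEnvelope",
        "inSR": "4326",                           # extent given in WGS84
        "spatialRel": "esriSpatialRelIntersects",
        "outFields": "*",
        "f": out_format,
    }
    return f"{layer_url}/query?{urlencode(params)}"

# Hypothetical hosted feature layer URL -- substitute your own.
layer = "https://services.arcgis.com/XYZ/arcgis/rest/services/parcels/FeatureServer/0"
print(build_extent_query(layer, -122.42, 37.77, -122.41, 37.78))
```

Fetching that URL returns the features intersecting the envelope directly from the feature service, which is why the JSON path is fast: the service itself does the spatial filtering.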
Imagining myself as a user in this scenario: I'm probably leaving the page and contacting the data provider for a custom download option, or worse, abandoning the download request completely.
I'll check for improved speeds tomorrow. I'm still seeing 360 bytes / sec, which isn't acceptable.
You're running into a tradeoff we made in order to serve full downloads to customers at scale and update them in the background, so that a download is always available.
The filtered files are produced by scanning over every feature that we have extracted from the underlying service and stored on S3. There's no database here, just streams of data flowing to and from S3. The operation is therefore equivalent to a full table scan, plus the network time to move the data back and forth from S3, plus the time it takes to convert the filtered data into the requested format. This clock starts only after the job to create the file reaches the front of the queue.
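The scan described above can be sketched in a few lines. This is an illustrative toy, not Hub's actual code: the feature records, the `bbox` field, and the filtering logic are all assumptions chosen to show why cost scales with the full dataset rather than with the filter result:

```python
def bbox_intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def filtered_stream(features, filter_bbox):
    """Full-table scan: every stored feature is read and tested,
    so the cost grows with the size of the original dataset even
    when the filter matches only a handful of features."""
    for feature in features:
        if bbox_intersects(feature["bbox"], filter_bbox):
            yield feature

# Toy stand-in for the stream of features pulled back from S3.
features = [
    {"id": 1, "bbox": (0, 0, 1, 1)},
    {"id": 2, "bbox": (5, 5, 6, 6)},
    {"id": 3, "bbox": (0.5, 0.5, 2, 2)},
]
matches = list(filtered_stream(features, (0, 0, 1.5, 1.5)))
print([f["id"] for f in matches])  # → [1, 3]
```

With 600,000+ stored features, the loop runs 600,000+ times regardless of how small the requested extent is, which is the proportionality described above.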
Once the download starts you should receive the data with plenty of bandwidth.
I understand why you would want to receive a filtered download more quickly, but the current system is not optimized for that. With future work, we may be able to support faster filters.
I hope this helps you understand why things are the way they are. If not, I'm happy to try to explain further.
Software Engineer | ArcGIS Hub
Thanks for providing an explanation.
I hope you are able to prioritize improving filter performance. The reason we went with Hub was to cut down on delivery time for public data requests against these large datasets. If it can't reduce the time it takes to deliver data, there is much less reason to use the product. Current performance is far from ideal, and it may be part of the reason some customers have opted to embed old-school, self-hosted FTP links in the description alongside the file geodatabase download.