We recently built and launched a new Hub and Open Data Portal site for our agency (https://geohub-vadeq.hub.arcgis.com/), and so far we have really enjoyed both the user and administrative experience it offers. However, we have encountered some strange behavior with the Open Data Portal and its data downloads. I have a case open with Esri Support Services, who are assisting, but I thought I'd post here to take another approach as well.
Our basic deployment / environment is:
The weird behavior we have seen lately is:
1) The underlying file geodatabase and REST service data is updated nightly, but the updates don't appear on the Open Data downloads page even more than 8 hours after the underlying data has changed. One example where we noticed this: https://geohub-vadeq.hub.arcgis.com/datasets/e00ca5e58a8b4c9e9c47e61348d2f040
2) Even after the Open Data page updated, the data downloads (shapefile) were still stale. I tried clearing my local browser cache, using incognito mode, etc., and nothing worked. This behavior persisted 24 hours after the underlying data updated.
3) Even after we refreshed the download files, I was still pulling the outdated downloads until I cleared my browser cache and cookies. So there appears to be a client-side layer of caching causing issues. Is there a workaround to prevent local browser caching of data downloads, so that users aren't unknowingly pulling outdated data?
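If the stale files really are coming from the browser's HTTP cache, one common workaround (a sketch, not an official Hub feature) is to append a cache-busting query parameter to the download links you hand out, so each request looks like a new URL to the browser. The download URL below is a placeholder, not one of our real datasets:

```python
from time import time
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

def cache_busted(url: str) -> str:
    """Append a timestamp query parameter so the browser treats the
    request as a new URL and bypasses its local cache."""
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query))
    query["_cb"] = str(int(time()))  # cache-buster value changes every second
    return urlunparse(parts._replace(query=urlencode(query)))

# Example with a placeholder download URL:
print(cache_busted("https://example.com/datasets/abc123.zip"))
```

This only helps links you control (e.g. on a custom Hub page); it can't force a fresh fetch for users clicking the built-in download buttons.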
Sorry for the heap of questions. As a recap, it appears there are at least three levels of complexity/issues here:
Thanks for any help and information anyone can provide!
1) Yes, you can script that call. We're working on API documentation as well, but we don't see much risk if you call it the way the Hub's client does, since it's a versioned API.
2) I'm a bit unclear on exactly what the issue is, but for private items the download system leverages the export functionality, and the exports themselves are created as items. If you have Viewer-level users trying to download something previously downloaded by a Creator, there is a possibility they get the file from the last time the private download was created instead of the latest data. Customers in this setup have scripted, and do script, the creation of these download items while we look for longer-term solutions for private data extract across the platform.
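For the private-item case described above, the export-item creation can be scripted with the ArcGIS API for Python via `Item.export()`. A minimal sketch, assuming a named-user login; the org URL, credentials, and item ID are placeholders, not values from this thread:

```python
def make_export_params(title: str, fmt: str = "Shapefile") -> dict:
    """Build the arguments for Item.export(); a pure helper so it can be
    sanity-checked without a Portal connection."""
    allowed = {"Shapefile", "CSV", "File Geodatabase", "GeoJson", "KML"}
    if fmt not in allowed:
        raise ValueError(f"unsupported export format: {fmt}")
    return {"title": title, "export_format": fmt}

def recreate_download(item_id: str, fmt: str = "Shapefile"):
    """Export a hosted layer to a fresh download item (makes network calls).
    Requires the `arcgis` package and valid credentials."""
    from arcgis.gis import GIS  # lazy import: optional dependency
    gis = GIS("https://your-org.maps.arcgis.com", "username", "password")  # placeholders
    item = gis.content.get(item_id)
    params = make_export_params(f"{item.title} download", fmt)
    return item.export(**params)  # returns the newly created export item
```

Running `recreate_download` nightly would keep the export item roughly in step with the underlying data, at the cost of accumulating export items unless old ones are deleted.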
3) This doesn't sound right. Can you send us your case/BUG numbers so we can look at the details?
Thank you for the response Graham! I'll try to reply in line:
Thanks again for the assistance!
We don't have any starter scripts, but the "check for updates" and "recreate download" options trigger the calls you'd need to make. As Graham mentioned, we're going to work on our API documentation once we're sure the API is ready for higher traffic from programmatic access. In your case, your datasets are public, not private, which means they use a download system native to Hub. We have a ticket in our backlog to allow customers to schedule update checks (and download file generation). My plan is to start this work sometime in the first half of 2021. Regarding browser caching, I'll take a look at the support issue and get back to you if there are any easy changes you can make.
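Until official documentation lands, one practical approach is to capture the request the Hub UI issues when you click "check for updates" (browser dev tools, Network tab) and replay it on a schedule. A sketch using only the Python standard library; the base URL and path below are assumptions to be replaced with whatever you actually capture, not a documented API:

```python
import json
import urllib.request

HUB_API = "https://hub.arcgis.com/api/v3"  # placeholder base, observed via dev tools

def update_check_url(dataset_id: str) -> str:
    """Build the URL to replay; the path is a placeholder captured from
    the Hub UI, not a documented endpoint."""
    return f"{HUB_API}/datasets/{dataset_id}"

def check_for_updates(dataset_id: str) -> dict:
    """Replay the captured request (makes a network call)."""
    with urllib.request.urlopen(update_check_url(dataset_id)) as resp:
        return json.load(resp)
```

A script like this could then be run overnight with cron or Windows Task Scheduler ahead of any built-in scheduling feature, with the caveat that an undocumented endpoint may change between Hub releases.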
Can you tell me whether allowing customers to schedule update checks (and thus generate download files) has been completed? We have files that are updated periodically and have a difficult time getting new download packages, even when doing it manually. We would love to script this to run during off hours.
Hi guys! I am the chief officer of the Brno city ArcGIS Hub, and we are encountering the very same issues with some services hosted on our Portal 10.8.1. Some of them are OK, but some can't be downloaded at all, or if they can, the download file is not up to date. It is very strange behavior: for some identical services that are updated through the API, everything works fine (though that's a small minority of them). We tried to identify with our customer support whether there is any pattern that would explain the behavior, but we were unsuccessful, and they have recently filed the problem with Esri US. I don't expect an answer; I just wanted to let you know that Rex is not alone with this problem. Thanks a lot, and merry Christmas to all! Stay safe!
I wanted to post a quick update on something I noticed today that I'm sure is related to the behavior noted in the first post, in case it's helpful. While reviewing a couple of our downloadable Data Hub layers, I noticed that the dataset metadata (last updated date and record count, along with, I presume, the map) is out of sync with the underlying data, both at the service REST endpoint and even within the attribute table on the Hub data page. Below are a couple of examples. For both of these datasets, manually going into the item and checking for updates (I also recreated the download files, because why not?) resolved the issue. However, this gets back to the manual vs. automated update problem. I'm guessing this will be addressed in the 2021 enhancement you noted, but I thought it was worth mentioning just in case.
Here, we can see that a record was updated on 12/31/20 and the actual record count should be 2,159. The layer metadata shows it was last updated 2 months ago (incorrect) and is missing at least the latest record (showing 2,158 records). Again, this can be pretty misleading both to our internal editors/data users and to external consumers of our datasets.
The next example is also a little perplexing. In this case, there were several updates on 12/30/20, and the metadata shows a last update of 4 days ago, which is at least close, but the record count is seriously off, by almost 60. Again, a manual update seems to have refreshed things:
Thanks again for continuing to look into the Hub / Open Data update issues and any help you can provide!
Thank you for your excellent post. I appreciate the level of detail you've used to describe this issue. I am experiencing everything you've mentioned and would also love to see API access.
This is maybe the 3rd or 4th major site change since 2018 that negatively impacts performance in one way or another. While I appreciate frequent updates to the Hub framework, it gets frustrating to update maintenance scripts every year. Stability on this platform would be a pleasant surprise. API documentation would go a long way toward helping users navigate framework updates. Keeping AGOL items and the Open Data page in sync shouldn't be this difficult or error-prone.