Programmatic Method to Update Open Data Portal Item Pages and Data Download Items?

12-23-2020 06:37 AM
RexRobichaux
Occasional Contributor

Hello Hubbers,

 

  We recently built and launched a new Hub and Open Data Portal site for our agency (https://geohub-vadeq.hub.arcgis.com/) and so far have really enjoyed both the user and administrative experience it offers. However, we have run into some strange behavior with the Open Data Portal and Data Downloads. I have a case open with Esri Support Services, who are assisting, but I thought I'd post here as well to take another approach.

Our basic deployment / environment is:

  • Open Data Content is pointing to AGOL Feature Layers, that are created from on-premise ArcGIS Server 10.7.1 web services (actually all come from individual layers within this service: https://apps.deq.virginia.gov/arcgis/rest/services/public/EDMA/MapServer)
  • The underlying data source for the web service is a set of file geodatabases that are updated nightly via Python scripts (primarily truncate and join/appends from Oracle 12c enterprise geodatabases).

The weird behavior we have seen lately is:

1) The underlying file geodatabase data and REST service data are updated at night, but the updates do not appear on the Open Data Downloads page even more than 8 hours later. One example where we noticed this: https://geohub-vadeq.hub.arcgis.com/datasets/e00ca5e58a8b4c9e9c47e61348d2f040

  • I resolved this issue by manually updating the AGOL feature layer item tag, which seems to have caused Hub to search for item updates.
  • From speaking with Esri Support, it looks like you can also force Hub to look for updates on all content through the Content Library (upper right, "Check for Content Updates"), which seems to send a harvest call to opendata.arcgis.com: https://opendata.arcgis.com/api/v3/jobs/site/53f0be9030ac4298814b66f4a8c7faa8/harvest
  • Would it be possible to script this harvest call using the Python API and schedule it to run, say, 1-2 times daily, so that we know the Hub / Open Data pages are always reasonably current with our REST data?
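To make the question concrete, here is a rough sketch of the kind of script I have in mind, using only the Python standard library rather than the ArcGIS API for Python. The only thing taken from what I observed is the harvest URL pattern. The HTTP method (POST), the way the token is passed (a bare Authorization header), and the function names are all assumptions on my part, since I haven't found documentation for this endpoint.

```python
import urllib.request

# URL pattern observed when clicking "Check for Content Updates" in Hub.
HARVEST_URL = "https://opendata.arcgis.com/api/v3/jobs/site/{site_id}/harvest"


def build_harvest_request(site_id, token=None, method="POST"):
    """Build (but do not send) a request to the site's harvest endpoint.

    ASSUMPTIONS: the HTTP method and the Authorization header format are
    guesses; the endpoint is undocumented as far as I can tell.
    """
    req = urllib.request.Request(HARVEST_URL.format(site_id=site_id), method=method)
    if token:
        req.add_header("Authorization", token)
    return req


def trigger_harvest(site_id, token=None, timeout=60):
    """Send the harvest request and return the HTTP status code."""
    req = build_harvest_request(site_id, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Example (not run here): call from a scheduled task once or twice a day.
# status = trigger_harvest("53f0be9030ac4298814b66f4a8c7faa8", token="<AGOL token>")
```

If something like this is supported, it could be run from Windows Task Scheduler or cron right after our nightly geodatabase update finishes.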

2) Even after the Open Data Page updated, the data downloads (shapefile) were still stale / outdated. I tried clearing local browser cache, incognito, etc., and nothing worked. This behavior was still seen 24hrs after the underlying data updated.

  • In speaking with Esri Support, we found that at the dataset level, you can go in and manually choose "Recreate Download Files". However, we have ~55 layers that change daily, weekly, or ad hoc (and I'm a shop of one), so I can't go in every day and manually run this update for all of our data.
  • When I choose to recreate the download files, I get a "File creation failed" error, but I have a feeling it may be a false flag and that it's actually working, as only the Delete call is getting a 403 server error (screenshot).
  • Would it be possible to script this "Recreate Download Files" and run daily via the Python API to ensure our data downloads are fresh / current as well?

3) Even after we refreshed the Data Download Files, I kept pulling the outdated downloads until I went in and cleared my browser cache and cookies. So there appears to be a client-side level of caching that is causing issues. Is there a workaround to prevent local browser caching of data downloads so that users aren't unknowingly pulling outdated data?
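One generic workaround I can think of, at least for download links we publish ourselves, is appending a changing query parameter so the browser treats each refresh as a new URL. The sketch below is untested against Hub download URLs; the parameter name `_ts` and the function name are made up for illustration, and I don't know whether Hub ignores unknown query parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_cache_buster(url, stamp, param="_ts"):
    """Return url with param=stamp set, replacing any earlier value.

    ASSUMPTION: the server ignores the extra parameter and serves the
    same file; the parameter name "_ts" is hypothetical.
    """
    parts = urlsplit(url)
    # Keep existing query parameters, dropping any previous cache buster.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, str(stamp)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Example: stamp the link with the date of the last nightly update.
link = with_cache_buster("https://example.com/data.zip", 20201223)
```

This would not help users who reach the download through the Hub dataset page itself, which is the case I'm most concerned about.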

 

Sorry for the heap of questions. To recap, it appears there are at least three levels of complexity/issue here:

  1. Open Data Item Update caching / frequency
  2. Open Data Download Updates caching / frequency
  3. Client-side local browser caching.

Thanks for any help and information anyone can provide!

-Rex

 

12 Replies
PhilLarkin1
Occasional Contributor III

@RexRobichaux 
Thank you for your excellent post. I appreciate the level of detail you've used to describe this issue. I am experiencing everything you've mentioned and would also love to see API access.

This is maybe the 3rd or 4th major site change since 2018 that negatively impacts performance in one way or another. While I appreciate frequent updates to the Hub framework, it gets frustrating to update maintenance scripts once a year. Stability on this platform would be a pleasant surprise. API documentation would go a long way toward helping users navigate framework updates. Keeping AGOL Items and the Open Data page in sync shouldn't be this difficult or this prone to issues.

young_mossy
New Contributor

Any movement here from the Esri team?

AlisonMynsberge
New Contributor III

Also hoping to learn whether this has been resolved and, if so, whether others have scripts that reliably update the downloadable files and sync metadata.
