
Programmatic Method to Update Open Data Portal Item Pages and Data Download Items?

12-23-2020 06:37 AM
RexRobichaux
Occasional Contributor

Hello Hubbers,

 

  We recently built and launched a new Hub and Open Data Portal site for our agency (https://geohub-vadeq.hub.arcgis.com/) and so far have really enjoyed both the user and administrative experience they offer. However, we have encountered some strange behavior with the Open Data Portal and data downloads. I have a case open with Esri Support Services, who are assisting, but I thought I'd post here to take another approach as well.

Our basic deployment / environment is:

  • Open Data content points to AGOL feature layers created from on-premises ArcGIS Server 10.7.1 web services (all of them are individual layers within this service: https://apps.deq.virginia.gov/arcgis/rest/services/public/EDMA/MapServer)
  • The underlying data source for the web service is a set of file geodatabases that are updated nightly via Python scripts (primarily truncate and join/append operations from Oracle 12c enterprise geodatabases).

The weird behavior we have seen lately is:

1) The underlying file geodatabase and REST service data are updated overnight, but we don't see updates on the Open Data downloads page even more than 8 hours after the underlying data has changed (one example where we noticed this): https://geohub-vadeq.hub.arcgis.com/datasets/e00ca5e58a8b4c9e9c47e61348d2f040

  • I resolved this by manually updating the AGOL feature layer item's tags, which seems to have prompted Hub to check for item updates.
  • From speaking with Esri Support, it looks like you can also force Hub to check for updates on all content through the Content Library (upper right > Check for Content Updates), which appears to send a harvest call to opendata.arcgis.com: https://opendata.arcgis.com/api/v3/jobs/site/53f0be9030ac4298814b66f4a8c7faa8/harvest
  • Would it be possible to script this harvest call using the Python API and schedule it to run, say, once or twice daily, so we know the Hub / Open Data pages are always roughly current with our REST data? (A rough sketch of what I have in mind follows this list.)
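For reference, here's the kind of thing I'm picturing. The site ID is the one from the harvest URL above, the credentials are placeholders, and I'm only assuming the endpoint accepts a plain AGOL token in the Authorization header, so the exact headers would need to be confirmed against what the Hub client actually sends (browser network tab):

```python
# Rough sketch only: replay the harvest call that "Check for Content Updates" makes.
# Assumptions: placeholder credentials, and that a plain AGOL token in the
# Authorization header is accepted -- verify against the real request first.
import requests
from arcgis.gis import GIS

SITE_ID = "53f0be9030ac4298814b66f4a8c7faa8"   # from the harvest URL above
HARVEST_URL = f"https://opendata.arcgis.com/api/v3/jobs/site/{SITE_ID}/harvest"

gis = GIS("https://www.arcgis.com", "admin_username", "password")  # placeholder credentials
token = gis._con.token                                             # current AGOL token

resp = requests.post(HARVEST_URL, headers={"Authorization": token}, timeout=60)
resp.raise_for_status()
print("Harvest job submitted, HTTP", resp.status_code)
```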

2) Even after the Open Data page updated, the data downloads (shapefile) were still stale / outdated. I tried clearing the local browser cache, incognito mode, etc., and nothing worked. This behavior was still present 24 hours after the underlying data updated.

  • In speaking with Esri Support, we found that at the dataset level you can manually choose to "Recreate Download Files". However, we have ~55 layers that change daily, weekly, or ad hoc (and I'm a shop of one), so I can't go in every day and manually run this update for all of our data.
  • It looks like when I choose to recreate the download files, I get a "File creation failed" error, but I have a feeling it may be a false flag and it's actually working, as only the Delete call is returning a 403 server error (screenshot).
  • Would it be possible to script this "Recreate Download Files" action and run it daily via the Python API to ensure our data downloads are fresh / current as well?

3) Even after we refreshed the data download files, I was still pulling the outdated downloads until I cleared my browser cache and cookies. So there appears to be a client-side layer of caching causing issues. Is there a workaround to prevent local browser caching of data downloads so that users aren't unknowingly pulling outdated data?

 

Sorry for the heap of questions. As a recap, it appears there are at least three levels of complexity / issues here:

  1. Open Data Item Update caching / frequency
  2. Open Data Download Updates caching / frequency
  3. Client-side local browser caching.

Thanks for any help and information anyone can provide!

-Rex

 

13 Replies
by Anonymous User
Not applicable

1) Yes, you can script that call. We're working on API documentation as well, but we don't see too much risk in calling it the way the Hub client does, since it's a versioned API.

2) I'm a bit unclear on exactly what the issue is, but for private items the download system leverages the export functionality, and the exports themselves are created as items. If you have Viewer-level users trying to download things previously downloaded by a Creator, there is a possibility they get the download from the last time it was created rather than the most recently updated data. Customers in this setup have scripted, and do script, the creation of these download items while we look for longer-term solutions for private data extract across the platform.
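A minimal sketch of that pattern with the ArcGIS API for Python might look like the block below. It assumes the dataset is a hosted feature layer (Item.export() only works on hosted content), and the credentials, item ID, title pattern, and sharing level are placeholders to adapt, not a prescribed workflow:

```python
# Minimal sketch of scripting download-item (re)creation, per the pattern above.
# Assumes a hosted feature layer item; credentials, IDs, and titles are placeholders.
from arcgis.gis import GIS

gis = GIS("https://www.arcgis.com", "owner_username", "password")  # placeholder credentials
source = gis.content.get("your_hosted_layer_item_id")              # placeholder item ID

# Remove the previous export item (if any) so downloads don't point at stale data.
stale = gis.content.search(
    f'title:"{source.title} download" AND owner:{gis.users.me.username}',
    item_type="Shapefile",
)
for old_item in stale:
    old_item.delete()

# Create a fresh export item from the current data and share it as needed.
export_item = source.export(f"{source.title} download", "Shapefile", wait=True)
export_item.share(everyone=True)
print("Recreated download item:", export_item.id)
```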

3) This doesn't sound right. Can you send us your case/BUG numbers so we can look at the details?

RexRobichaux
Occasional Contributor

Thank you for the response, Graham! I'll try to reply inline:

 

  1. That's great to hear. Do you (or does anyone) have any starter scripts I could use as a basis for further customization? I found a few online, but I believe they predate the Hub / Open Data integration. If not, no worries; I just didn't want to reinvent the wheel if there is already a documented Hub / Open Data API-based script out there.
  2. This may be what we are encountering, but I think I need to understand the "private items" aspect better. All of our Open Data Portal items are shared publicly and available via anonymous REST access, and the AGOL items are shared publicly as well. Folks can download any dataset they need without logging in to the site, so I'm not sure whether that comes into play with our data.
  3. Sure, my ESS case is #02704584. The local caching issue was reproduced by several of my data editors / data owners as well, on various browsers (Mozilla and Chrome at least). Once we cleared the local browser cache and cookies (not sure if the cookies were related), the updated / current download pulled successfully. I noticed this because I thought to try my personal laptop, which had never accessed the dataset before, and voila! It worked, while my work PC simultaneously pulled an outdated download.

Thanks again for the assistance!

ThomasHervey1
Esri Contributor

Hi @RexRobichaux,

We don't have any starter scripts, but the check for content updates and the recreate download options trigger the calls you'd need to make. As Graham mentioned, we're going to work on our API documentation once we're sure the API is ready for higher traffic from programmatic access. In your case, your datasets are public, not private, which means they use a download system native to Hub. We have a ticket in our backlog to allow customers to schedule update checks (and download file generation); my plan is to start this work sometime in the first half of 2021. Regarding browser caching, I'll take a look at the support issue and get back to you if there are any easy changes you can make.

RexRobichaux
Occasional Contributor

This is great information- thank you again for your help with this @ThomasHervey1 !

TheresaMulroney1
New Contributor II

Thomas,

Can you tell me if allowing customers to schedule update checks, and thus generate download files, has been completed? We have files that are updated periodically and have a difficult time getting new download packages, even when doing it manually. I would love to script this to run during off hours.

AlisonMynsberge
New Contributor III

I haven't seen anything in the API documentation for this; is it there and I'm just missing it, or is the API not ready for higher traffic for download file regeneration?  Thanks!

AndrigoSalvador
New Contributor

Hi @ThomasHervey1 

How are you? 

Do you have any news about "Recreate Download Files" and running it daily via the Python API?
I use AGO 11.2 and have published some hosted feature services so the download button appears in my ArcGIS Hub (Sites). I developed a Python script to overwrite the feature services daily, but the downloads (shapefile, CSV, etc.) are not refreshed along with the feature services. I need to refresh the download formats every day so anonymous users can download current data; a stripped-down sketch of my script is below.
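Roughly, the script does the following (the portal URL, credentials, item ID, and file path are placeholders, and the final export step is only my guess at how to also refresh the download files):

```python
# Stripped-down sketch: overwrite the hosted feature layer from a fresh upload,
# then (my guess) re-export so the Hub download formats are current.
# Portal URL, credentials, item ID, and file path are placeholders.
from arcgis.gis import GIS
from arcgis.features import FeatureLayerCollection

gis = GIS("https://myportal.example.com/portal", "publisher_user", "password")

item = gis.content.get("my_feature_layer_item_id")
flc = FeatureLayerCollection.fromitem(item)
flc.manager.overwrite(r"C:\data\daily_update.gdb.zip")  # same schema/name as the original upload

# Guess: recreate the download item after the overwrite so it is not stale.
export_item = item.export(f"{item.title} shapefile", "Shapefile", wait=True)
export_item.share(everyone=True)
```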

Sorry for my bad English. If you can help me, I would appreciate it.

 

RobertSpál1
New Contributor III

Hi guys! I am the chief officer of the Brno city ArcGIS Hub, and we are encountering the very same issues with some services hosted on our Portal 10.8.1. Some of them are fine, but some can't be downloaded at all, or if they can be, the download file is not up to date. It is very strange behaviour, because for some identical services that are updated through the API everything works fine (though those are a minority of them). We tried to identify with our customer support whether there is any pattern that would explain the behaviour, but we were unsuccessful, and they have recently filed the problem with Esri US. I don't expect an answer; I just wanted to let you know that Rex is not alone with this problem. Thanks a lot and merry Christmas to all! Stay safe!

Rob

by Anonymous User
Not applicable

Thanks for sharing! We're looking into that case.