Auto Update of Datasets on OpenData

03-02-2017 10:52 AM
CraigSwadner
Occasional Contributor

We have had several issues with our data updating. After talking to other municipalities, it seems to be a huge flaw in the software that customers do not have the ability to choose an automatic nightly update. ESRI appears to have fixed the issue where downloading a dataset creates a new cache of the data; however, it could go days, weeks, months, maybe years before someone downloads that dataset again. There really needs to be a way customers can set their sites to initiate a nightly update of their dataset caches ....

30 Replies
by Anonymous User
Not applicable

Hi Craig,

Just wanted to let you know that with our release for search last week, we made a significant update to the indexing system. Here's a bit more info about how the new download system works:

When a download request is made, one of two things happens:

  1. For hosted services, we check for changes before refreshing the cache. If a dataset has been changed or updated since the cache was made, we refresh the cache; otherwise the download comes from the previous cache.
  2. For services hosted elsewhere (e.g. on-premises servers), we refresh the cache if it has been more than 24 hours since the last refresh. If it is within the 24-hour window, the user gets the data as it was last cached.
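
In rough pseudocode, the decision looks something like this (an illustrative sketch only, not the actual implementation; the dataset/cache objects and their attributes are hypothetical):

    from datetime import datetime, timedelta

    CACHE_TTL = timedelta(hours=24)

    def should_refresh_cache(dataset, cache):
        """Hypothetical sketch of the refresh rules described above."""
        if dataset.is_hosted:
            # Hosted services: refresh only if the data changed after
            # the cache was written.
            return dataset.last_modified > cache.created_at
        # Non-hosted services: refresh once the cache is older than 24 hours.
        return datetime.utcnow() - cache.created_at > CACHE_TTL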

Hopefully the new download system helps resolve the issue, but be sure to let us know if you observe any issues with download freshness going forward.

Best,

Patrick

ZahirIbrahim
New Contributor II

Hi Patrick,

This is definitely not the case. We have services in use on our Open Data site where we know the data, when downloaded, is not up to date or, in some cases, does not even download properly (in spreadsheet format, for instance).

We also use a script to force-update all items used in our Open Data site every night, to ensure that the cache is updated. Even then, we see erratic and inconsistent behaviour. It's only when we manually go into the back end and update the cache that, in some instances, the issue is resolved. We seem to hit this issue with two of our large datasets: one with over 12,000 rows and another with 120,000 rows. We don't consider this big data, so we are unsure why this is occurring.

Like the poster above, we are thinking of creating offline versions of our data to ensure our users get current data, which is not ideal.

If you could provide best practice methods for these large datasets, that would be great.

by Anonymous User
Not applicable

Hi Zahir,

I'm sorry to hear the caching system isn't behaving on your site the way it was designed to. Are the out-of-date downloads happening for all items, or just the two large items you refer to? Could you provide a link to your site, as well as to the two large items specifically, so we can dive in a bit?

Thanks,

Patrick

ZahirIbrahim
New Contributor II

Hi Patrick,

Just to confirm: since the changes made recently to Hub, we are no longer experiencing the cache issue.

PhilLarkin1
Occasional Contributor III

Patrick-

Post-April update, I'm still seeing an issue that has persisted since Open Data (Hub) was first made public.

In the scenario of on-premises services, how should I trigger a change to the Updated date? It is my understanding that the date displayed on the item's Hub page matches the one on the AGOL item page. The date shown on the AGOL item page does not change when the service is republished or when new data is available at the service. The date only seems to track when the item's properties (description, tags, etc.) are updated.

This leads users to think the data is out of date. Ideally, the updated date would reflect changes to all properties, data included.

by Anonymous User
Not applicable

Hi Phil,

Since we released the new search, there are now three places Hub looks when writing the "Last Updated" date. If an admin has explicitly set a revised date in the metadata editor in ArcGIS Online, we use that date. If not, then depending on whichever was updated more recently, we use either a) the last time the item was edited or “touched” in ArcGIS Online, or b) the last time an update was made to the service (if editor tracking is enabled).
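
As a rough sketch of that precedence (illustrative only; the attribute names here are made up for the example, not Hub's actual code):

    def last_updated(item):
        # 1) An explicitly set revised date always wins.
        if item.metadata_revised_date is not None:
            return item.metadata_revised_date
        # 2) Otherwise, whichever is more recent: the item's last edit in
        #    ArcGIS Online, or the service's last data edit (available
        #    only when editor tracking is enabled).
        candidates = [item.agol_modified_date]
        if item.editor_tracking_last_edit is not None:
            candidates.append(item.editor_tracking_last_edit)
        return max(candidates)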

Long story short, you can either explicitly set the revised date when you push your updates or enable editor tracking for the service. If neither option is viable, you could add a block to your update scripts that "touches" the item in ArcGIS Online, which pushes a change to the item without actually changing anything.
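
For example, a minimal "touch" with the ArcGIS API for Python might look like the following (a sketch only; the credentials and item ID are placeholders, and re-saving the existing description is just one way to push a no-op change):

    from arcgis.gis import GIS

    # Sign in to the organization that owns the item (placeholders).
    gis = GIS("https://www.arcgis.com", "username", "password")

    item = gis.content.get("<item-id>")
    # Re-save an existing property without changing it; this bumps the
    # item's modified date, which Hub picks up on its next index.
    item.update(item_properties={"description": item.description})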

Hope that helps,

Patrick

PhilLarkin1
Occasional Contributor III

Is the assumption with option b that editor tracking fields are exposed in the service? We don’t expose editor tracking fields publicly. In any case editor tracking is not employed on all data published to this platform. The assumption with these options is that either someone is going to manually update items, or editor tracking is enabled, correct? I had hoped that in the process of indexing these services the data would be hashed for nightly comparison.

It would be extremely helpful if your team published an official code example for "touching" AGOL items shared on Open Data. If one exists, please reference it here. Ideally this example would target this exact case and be written from the perspective of someone trying to solve this problem. Providing examples that use both the ArcGIS Python API and arcpy would be beneficial.

by Anonymous User
Not applicable

Hi Phil,

We are constantly trying to improve the way we index your data. Before the most recent change to search, the only way we knew a change had happened to an item was by checking the update date in ArcGIS Online. From your comment higher in the thread, it appears you're already quite familiar with touching an item:

"I found that if I updated a property of a dataset (disclaimer, description, ...) the 'Updated' date would update."

That is exactly what touching is: updating a property to trigger our system to know a change came through.

Editor tracking, as you point out, has some downsides, but apart from admins explicitly giving us the date, it is currently the best way our system can know that an update came through. 

That said, your idea of hashing at index time is one we are working to add to the logic. There are changes coming to make large downloads faster and more reliable, which will necessitate better change detection on our part. Once that work is done in the coming weeks, the last-updated logic will get better as well.
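
Conceptually, hash-based change detection amounts to something like this (a sketch of the general technique, not our implementation):

    import hashlib
    import json

    def content_hash(features):
        # Hash a stable serialization of the service's features so any
        # data change (not just a metadata edit) changes the digest.
        payload = json.dumps(features, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def needs_reindex(features, stored_hash):
        # Refresh the cache only when the digest differs from the one
        # recorded at the last index.
        return content_hash(features) != stored_hash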

Sorry it's not working perfectly, but hopefully your experience today is better than yesterday.

Best,

Patrick

PhilLarkin1
Occasional Contributor III

Yes, in mid-2017 I employed the ArcGIS API in an attempt to overcome this limitation. However, at API v1.5 I experienced some unfortunate side effects with tags. I've attempted to rectify this issue with Premium Support, to no avail. I've also attempted to use arcpy, which resulted in other unintended consequences.

Ultimately, I'm looking for authoritative educational content that addresses this issue. Please let me know whether you've heard, and will respond to, my request for updated documentation. Please advise if this would be more effectively sent as a Premium Support request.

I appreciate the upgrades made to date and am glad ESRI is putting resources behind this platform.

CatherineWendland
New Contributor II

I'm having the same issue. My Open Data services are refreshed weekly, but they do not "touch" AGOL or make changes there, so it looks like they haven't been updated in months or years (according to the date on the item's page). Ideally, hashing at index would be the solution, but as a workaround we need to "touch" the AGOL item whenever the service refreshes.

If ESRI developers are working on this documentation, as Phil requested above, maybe they could also write a script to update the "revision date" in the item's metadata whenever the service refreshes.
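
For what it's worth, an untested sketch of such a script might look like this. It assumes the ArcGIS API for Python and the dataIdInfo/idCitation/date/reviseDate element of the ArcGIS metadata XML format, so both should be checked against your actual items:

    import tempfile
    import xml.etree.ElementTree as ET
    from datetime import date

    from arcgis.gis import GIS

    gis = GIS("https://www.arcgis.com", "username", "password")
    item = gis.content.get("<item-id>")

    # item.metadata holds the item's metadata XML, if any has been set.
    tree = ET.ElementTree(ET.fromstring(item.metadata))
    revise = tree.find("./dataIdInfo/idCitation/date/reviseDate")
    if revise is not None:
        revise.text = date.today().isoformat()
        with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
            tree.write(tmp, encoding="utf-8", xml_declaration=True)
        # Pushing the edited metadata back also counts as a "touch".
        item.update(metadata=tmp.name)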

I'll be watching this topic and hoping for a solution, too.