
Open Data and large datasets

02-06-2020 06:42 AM
DaveWatson
Occasional Contributor

We have been having issues with Hub and Open Data related to large datasets. Once we add large datasets to Open Data, they fail to download. These include our parcel and contour data. We have had an incident open with Esri for almost a year with no resolution.

Has anyone else experienced this?

Thanks!

6 Replies
by Anonymous User
Not applicable

Can you send us a link to the problematic dataset? 

If you aren't comfortable putting it here feel free to email me at ghudgins@esri.com

DaveWatson
Occasional Contributor

https://gis-countyofdane.opendata.arcgis.com/datasets/parcels-2016

I just attempted to pull 518 records and it is still working. We had to unshare the Contours and current Parcels layers because the cache refresh was overwhelming our data server.

Thanks,

Dave

by Anonymous User
Not applicable

Hi Dave,

Sorry to hear you've been having long-standing issues with downloads. We're making a few improvements to the download experience, both on the backend and in the UI. First, we're wrapping up a months-long project to improve our support for large datasets, which we plan to release within the next 4-6 weeks. Second, we're updating the dataset page to surface more information about the caching process; when caching takes longer than a few seconds, we'll add alerts so that users can come back when downloads are ready.

In the meantime, we've looked at the dataset you linked above and were wondering if you could try some adjustments to the service to see if that helps mitigate the issue. The error message we're observing is typically addressed by reducing the max record count for your service. You currently have it set to 1000, but given the complexity of your geometries and the number of fields, I would recommend trying 250 or 100.
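If it helps, here is a minimal sketch of changing that value with the ArcGIS API for Python rather than through the service settings UI. The portal URL, credentials, and item ID are placeholders, and it assumes the parcels are the first layer in the hosted feature layer item; adjust to your setup.

```python
# Sketch: lowering maxRecordCount on a hosted feature layer with the
# ArcGIS API for Python. Portal URL, credentials, and item ID below are
# placeholders -- substitute your own values.
from arcgis.gis import GIS
from arcgis.features import FeatureLayerCollection

gis = GIS("https://your-org.maps.arcgis.com", "your_admin_user", "your_password")

item = gis.content.get("YOUR_ITEM_ID")            # the Parcels 2016 feature layer item
flc = FeatureLayerCollection.fromitem(item)

parcels_layer = flc.layers[0]                      # assumes parcels are the first layer
parcels_layer.manager.update_definition({"maxRecordCount": 100})

print(parcels_layer.properties.maxRecordCount)     # confirm the new value
```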

Try that and let me know what you see.

Thanks,

Patrick

DaveWatson
Occasional Contributor

Okay, I changed the max record count on historical parcels to 250 and changed the max sample size to 50K. I'll try 2016 parcels again.

I am working on my morning email to Don.

Here is what I am thinking: they 'reset' our cloud store again. The largest dataset we have on Open Data is parcels; the rest will have to wait or go to the zip file solution.

by Anonymous User
Not applicable

We tested your data on one of our servers and the caching failed when maxRecordCount was set to 250. A value of 100 worked much better and cached successfully in under 30 minutes, which seems like an improvement.

As others have probably mentioned, this dataset, while large in both record count and number of fields, also has extremely complex geometry. Have you considered simplifying the geometry a bit? In most of our tests things slow down around 245k records in, which leads us to think there is one record with a very large number of vertices. A rough sketch of how you might find and thin those geometries is below.
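This is only a sketch, assuming you maintain the source parcels in a file geodatabase and have access to arcpy; the paths, layer names, and tolerance are placeholders, and you would want to review the simplified output before republishing.

```python
# Sketch: find the highest-vertex parcels, then generalize the polygons
# with arcpy's Simplify Polygon tool before publishing. Paths and the
# tolerance are placeholders -- pick values appropriate for your data.
import arcpy

parcels = r"C:\data\parcels.gdb\Parcels2016"

# List the ten features with the most vertices (likely caching offenders).
counts = [
    (row[0], row[1].pointCount)
    for row in arcpy.da.SearchCursor(parcels, ["OBJECTID", "SHAPE@"])
]
print(sorted(counts, key=lambda c: c[1], reverse=True)[:10])

# Thin vertices while keeping shapes within a stated positional tolerance.
arcpy.env.overwriteOutput = True
arcpy.cartography.SimplifyPolygon(
    in_features=parcels,
    out_feature_class=r"C:\data\parcels.gdb\Parcels2016_simplified",
    algorithm="POINT_REMOVE",        # Douglas-Peucker style vertex thinning
    tolerance="1 Feet",              # maximum allowable offset; adjust as needed
    error_option="RESOLVE_ERRORS",   # repair any topology errors introduced
)
```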

DaveWatson
Occasional Contributor

Ha, I was just testing the max count at 250. I will make the change to 100 and run a test. Would there be any benefit in modifying the Max sample size variable?
