What is the workflow for using a file geodatabase feature class as an input for a GeoAnalytics Server tool?

03-23-2018 02:40 PM
DalindaDamm
Occasional Contributor

I am curious about the typical workflow for processing data stored as file geodatabase feature classes with GeoAnalytics Server tools. It doesn't appear to be possible through a big data file share. Would services have to be created for these feature classes and then registered? I understand that the output will always be a feature service stored in the spatiotemporal big data store.

1 Solution

Accepted Solutions
SarahAmbrose
Esri Contributor

Hi Dalinda,

I have two recommendations, and they depend on the size of the data. Option 1 works if you are using Pro, the portal Map Viewer, the ArcGIS REST API, or the ArcGIS API for Python. Option 2 is only available if you are running the analysis from ArcGIS Pro.

  1. Share it as a feature service first - in most cases I would recommend this!
    1. You can do this through Pro (creating a hosted or non-hosted layer) using the Sharing pane.
    2. You can do this in Portal by uploading a zipped file geodatabase and publishing a hosted layer from it. Once you have a feature service, you can analyze it (see the sketch after this list).
  2. If the data in the file geodatabase is pretty small, you can add it to the map in Pro and run the analysis on it directly. Only do this if the dataset is small (fewer than a few hundred features). A good example would be joining a big dataset to a small dataset, or aggregating a lot of data into a small polygon dataset. In both of these examples, the file geodatabase is the "small" dataset!
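
If you go the Python route for option 1, the end-to-end flow is roughly the sketch below. Treat it as an outline rather than copy-paste code; the portal URL, credentials, paths, and item names are placeholders, and the exact tool arguments depend on your version of the ArcGIS API for Python.

    from arcgis.gis import GIS
    from arcgis.geoanalytics.summarize_data import aggregate_points

    # Placeholder portal URL and credentials
    gis = GIS("https://myportal.example.com/portal", "analyst", "password")

    # 1. Add the zipped file geodatabase as a portal item, then publish it
    #    as a hosted feature layer
    gdb_item = gis.content.add(
        {"title": "MyData", "type": "File Geodatabase"},
        data=r"C:\temp\MyData.gdb.zip")
    hosted_item = gdb_item.publish()

    # 2. Run a GeoAnalytics tool (Aggregate Points is just an example)
    #    against the published layer
    result = aggregate_points(
        hosted_item.layers[0],
        bin_type="Hexagon",
        bin_size=1,
        bin_size_unit="Kilometers",
        output_name="MyData_hex_bins")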

If this is a feature you’re interested in, I would recommend adding an enhancement request through Esri support services. I’ve made a note of it, but with an official enhancement request, we’re able to see when multiple users want the same functionality, and prioritize the most popular ones first.

A note on this comment:

By default, GeoAnalytics results are stored in the spatiotemporal big data store.

At 10.5.1 or later in the portal Map Viewer, and in Pro 2.0 and later, you also have the option of saving your results to the relational data store.
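
As a small, hedged example with the ArcGIS API for Python (assuming your API version exposes the output_datastore environment setting), you can switch the output target before running a tool:

    import arcgis

    # Assumption: arcgis.env.output_datastore controls where GeoAnalytics
    # results are written; "spatiotemporal" is the default and "relational"
    # sends results to the relational data store. The REST equivalent is the
    # tool's context parameter, e.g. "context": {"dataStore": "relational"}.
    arcgis.env.output_datastore = "relational"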

 

Thanks for the question and let me know if I missed anything.

Sarah Ambrose

GeoAnalytics Product Engineer


9 Replies

JoshuaBixby
MVP Esteemed Contributor

Sarah Ambrose, good information, thanks for sharing.  A couple of questions came to mind while reading your response.

In discussing Option 2, you put in bold "Only do this if the dataset is small (< a few hundred features)."  I understand why you are emphasizing that fact, but I also wonder what Esri is doing to make users aware they are not following that guidance when they try Option 2 using a huge dataset.

The organization I work for has thousands, some may argue tens of thousands, of GIS users with a wide range of knowledge and skills.  You might have your bold text here, and the warning may even be in some documentation somewhere, but I can guarantee 90% plus of the users in my organization will not see such a warning in the documentation or here.  What safeguards are going to be in place to at least notify the user they are doing something ill-advised if they try Option 2 using 50,000 records or 500,000 records?

Regarding Option 1, I see that becoming problematic, even in the short term.  Typically if people want to use distributed GIS, they are working with fairly large datasets.  If everyone working with large datasets needs to copy them to an AGS data store and publish a service, and the derivative products are also going to be stored and published the same way, storage management is going to be a nightmare for IT staffs on the backend.  It won't take long for even 100 users working with large data sets and generating lots of derivative products to burn through hundreds of terabytes of storage in the data center hosting ArcGIS Enterprise.

SarahAmbrose
Esri Contributor

Hi Joshua,

We currently document this limitation in all GeoAnalytics tools in Pro. We have iterated on a few different ways to warn users not to do this, but haven't come across a "best" solution that we want to ship in released software yet.

I’m hoping this isn’t too hidden:

It is recommended to use feature layers hosted on your ArcGIS Enterprise portal or a big data file share when running GeoAnalytics Tools through ArcGIS Pro. Other data sources may perform slowly when there are more than 1000 features.

We do find that most users of very large data are working with either:

  • A collection of delimited files, in which case big data file shares are the solution (see the sketch after this list)
  • Data ingested from GeoEvent Server, in which case the data is already exposed as a layer in their portal
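
For the delimited-file case, registering the folder as a big data file share with the ArcGIS API for Python is roughly the sketch below (the portal URL, credentials, share name, and path are made up):

    from arcgis.gis import GIS
    from arcgis.geoanalytics import get_datastores

    # Placeholder credentials for a portal with a federated GeoAnalytics Server
    gis = GIS("https://myportal.example.com/portal", "admin", "password")

    # Register a shared folder of delimited files as a big data file share;
    # each top-level subfolder becomes one dataset you can analyze
    datastores = get_datastores()
    bdfs = datastores.add_bigdata("SensorData", r"\\fileserver\gis\sensor_csvs")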

So far, what we've seen and heard from user feedback is that having lots of very large datasets in file geodatabases isn't that common – although it's obviously something that you have! I'll work with the team to assess this use case.

Sarah Ambrose

Product Engineer, GeoAnalytics Server

JoshuaBixby
MVP Esteemed Contributor

Sarah, I hope our users have a similar experience to the other users you are hearing back from, i.e., most of the very large datasets will work with big data file shares or GeoEvent Server.  We are standing up ArcGIS Enterprise 10.6.x in our development environments this week, and there are some users already prepped to start testing their workflows and our infrastructure/implementation.  I will have more definitive feedback for you in a couple of months.

DalindaDamm
Occasional Contributor

Thanks Sarah! Very informative as usual. I will work with my colleagues to log an enhancement request for the option to use file geodatabase feature classes more directly with the GeoAnalytics Server. When working with large data sources (the kind of data that typically requires the processing power of the GeoAnalytics Server), file geodatabases are the best option in most cases. The fact that the shapefile format is supported within big data file shares is interesting to me, since shapefiles are hard to use with large or complex data given the 2 GB component file size limit.

SarahAmbrose
Esri Contributor

Hi Dalinda,

 

One of the big advantages of big data file shares is that multiple datasets can be represented as a single dataset. For example, I could have a CSV of lightning strikes for January, a separate CSV for February, and another for March. If I put them all in the same folder (let's call it "lightningStrikes") and they have a matching schema, GeoAnalytics analyzes those three files as a single "lightningStrikes" dataset. This is a pattern we've followed from the "big data" world. The same applies to shapefiles, so the 2 GB limit isn't that limiting here (see the sketch below).
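
To make that concrete, here's a rough sketch of how such a folder surfaces through the ArcGIS API for Python (the portal URL, credentials, and share name are made up):

    from arcgis.gis import GIS

    # Assumed layout inside the registered big data file share:
    #   lightningStrikes/
    #       strikes_jan.csv
    #       strikes_feb.csv
    #       strikes_mar.csv   (all three share the same schema)
    gis = GIS("https://myportal.example.com/portal", "analyst", "password")

    # The registered share shows up as a "Big Data File Share" item, and each
    # top-level folder is exposed as a single layer
    bdfs_item = gis.content.search("bigDataFileShares_LightningData",
                                   item_type="Big Data File Share")[0]
    lightning = bdfs_item.layers[0]   # the combined "lightningStrikes" dataset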

Thanks,

Sarah

JoshuaBixby
MVP Esteemed Contributor

Sarah Ambrose, we got our 10.6.x development environment stood up a few weeks back.  Just this week I finally found some time to kick the tires, and I am running into this issue head-on. While working through the workarounds for getting my data into GeoAnalytics Server for analysis, I started to wonder whether everyone is talking about the same thing when they say "big" data, i.e., what qualifies as "big" data for Esri?  Is it the number of features, the size of the feature class, or some combination?  Is a 1 million point feature class considered "big" if it is only 35 MB in size?

SarahAmbrose
Esri Contributor

Hey Joshua Bixby‌,

We're getting a bit off topic from the original question. Can you create a new post/thread? If you have questions about your specific use case, please add more details there. The more we know, the more help we're able to give.

- Sarah

JoshuaBixby
MVP Esteemed Contributor

Sounds good, will post a new question later today.
