GeoAnalytics Performance: Feature Layer or BigData Store

03-08-2018 05:00 AM
deleted-user-4ympgHKYfQv_
New Contributor II

Hi,

I am new to GeoAnalytics Server. 

My question is: which data source is more effective for running analysis? The same data published as a (hosted) Feature Layer, or, for example, as a CSV file in a BigData Store?

Thanks for your help.

9 Replies
SarahAmbrose
Esri Contributor

Hi Carsten_Nexiga

I usually recommend moving the data around as little as possible, especially with big data. In your case, I would guess it's going to be faster to run the analysis directly against the big data file share than to publish the data and then run the tool. But there may be other factors, for example if your CSV is in a shared location with slow network speeds, so that it takes a while for the server to access it. Ultimately, I think a big data file share is usually more advantageous because it offers more flexibility: you can easily modify the data, use more time and geometry formats, and add multiple CSVs to represent one dataset.

For this response I assumed "effective" meant, “Which is faster?”. Let me know if that’s not what you mean and I’ll change up my answer a bit.

Thanks,

Sarah Ambrose
Product Engineer, GeoAnalytics Server

deleted-user-4ympgHKYfQv_
New Contributor II

Hi Sarah Ambrose,

I ran some performance tests (so far only with hosted FLs). I used the GeoAnalytics "Aggregate Points" tool. My setup is as follows.

1. I had a FL stored in a Postgres geodatabase (around 800,000 features, EPSG 31493).

2. I created two Portal FL services from that data: 

   a.) a FL with data referenced to the DBMS and

   b.) a FL with data copied to Relational DS of the portal.

Running the tool takes around 7 minutes for a.) and around 1 minute for b.).

- The strange thing is that if I use EPSG 3857 instead of 31493, both tests take only around 2 minutes with the same dataset. Why is that? I could use 3857 instead, but in northern Europe the sizes of the squares or hexagons are heavily distorted.

- When running the tool with the referenced data (a.)) the GeoAnalytics Server logs show the following: 

WARNING     05.04.2018 09:45:29     Internal error: Failed to get serviceDefinition for https://<portalURL>:6443/arcgis/rest/services/bussgeld_31493_pg/FeatureServer/0     System/GeoAnalyticsTools.GPServer
SEVERE     05.04.2018 09:45:29     REST call to 'https://<portalURL>:6443/arcgis/rest/admin/services/bussgeld_31493_pg/FeatureServer' failed (reason=Server returned error: [Service 'bussgeld_31493_pg' does not exist or is not started.])     System/GeoAnalyticsTools.GPServer

Might that be the reason why running the tool with a.) is so slow? Is there any way to fix that?
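On the EPSG 3857 distortion mentioned above: in spherical Web Mercator the linear scale factor grows as 1/cos(latitude), so a bin size given in projected units covers less ground the further north you go. The following is only a rough sketch of that relationship. The function names and the 51° N example latitude are assumptions for illustration and do not come from the tool or from this thread, and it says nothing about how GeoAnalytics itself builds its bins.

```python
import math


def web_mercator_scale(lat_deg: float) -> float:
    """Linear scale factor of spherical Web Mercator (EPSG 3857) at a latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


def ground_size_km(projected_km: float, lat_deg: float) -> float:
    """Approximate ground length of a distance measured in EPSG 3857 units."""
    return projected_km / web_mercator_scale(lat_deg)


# The latitudes below are assumed example values, not taken from the thread.
for lat in (0.0, 51.0, 60.0):
    print(
        f"lat {lat:4.1f}: scale {web_mercator_scale(lat):.3f}, "
        f"50 km projected is about {ground_size_km(50.0, lat):.1f} km on the ground"
    )
```

At 60° N the scale factor is exactly 2, so a 50 km bin measured in projected units spans only about 25 km on the ground.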

Thanks,

Carsten

SarahAmbrose
Esri Contributor

Hi Carsten Hogertz‌,

Given your other question (Copy to Data Store - GeoAnalytics), I have a few questions for you on how you are running the tool:

1. Are you running this tool through ArcGIS Pro? 

2. We expect a hosted layer to run faster than a layer in an EGDB. For a hosted layer, the data is read directly from the database. With an EGDB, we read the data over REST.

3. How are you setting the spatial reference for running the job? Do you mean you are creating a new dataset, or are you setting the output spatial reference (Pro) or the processing spatial reference (Portal, REST), or something else?

Thanks,

Sarah

deleted-user-4ympgHKYfQv_
New Contributor II

Hi Sarah Ambrose,

Thanks for your help. I guess something went wrong or I had some strange settings. I've set up the EPSG the correct way now, and the results give the correct lengths for the calculated squares/hexagons.

But I've still got a performance question:

Setting: 

Two FLs from an EGDB, made available in Portal. Both layers share the same database. The difference is that one of the FLs is time-enabled (Share As Web Layer --> Configuration --> Date Fields --> Time Zone) and the other is not (same data without Time Zone enabled).

If I use the GeoAnalytics tool "Aggregate Points" in Pro without any time settings (no time step interval etc.), it takes 2 min to generate the result for the non-time-enabled layer and 7(!!) min for the time-enabled one.

Is this a bug? This is a no-go for my client!

2nd question:

You said that GeoAnalytics tools run faster on a hosted layer in Esri's data store because the data is read directly from the data store's database. Okay. But as far as I know, one shouldn't connect to the data store directly via SQL to UPDATE, INSERT, etc. So what would be the best solution if production data from an enterprise DBMS needs to be updated nightly? Overwrite the web layer every night? With around 25 million features, that is impossible. Or is there a way to give GeoAnalytics a direct connection to the EGDB? What would you suggest?

Thanks for your help!

Carsten

SarahAmbrose
Esri Contributor

Hi Carsten Hogertz‌,

To try to reproduce the issue, I'll need to follow the exact steps you took. Can you please explain what you did to "set up the EPSG the correct way now"? Can you also let me know how you ran the tool through Pro? For example:

1. Added the hosted layer from the content pane to the map.

2. Ran Aggregate points with the following parameters {list parameters}, and selected the input layer by {browsing to the layer, typing it in, using the drop down, other?}

3. Copy the logs from the tool run.

Thanks,

Sarah

deleted-user-4ympgHKYfQv_
New Contributor II

Hi Sarah Ambrose,

Instead of focusing on how the EPSG was set up the correct way, I chose a totally different dataset. I did that because I thought the data I used might be corrupt. But... the ratio got even worse after running some performance tests.

My results:

Same FL (2,200,000 rows, point layer), both referencing registered data

a. published as web layer without time enabled

b. published as web layer with time enabled

1. Added each FL from the content pane (Content --> Portal --> My content --> Right click) to a different, empty new map.

2. Opened Aggregate Points --> Select Point Layer from DropDown --> Set Output Name --> Bin --> Hexagon --> Bin Size: 50 Kilometers. No further parameters were set (no time steps etc.).

Result with 

a. 3 minutes 21 seconds

b. 19 minutes 57 seconds

That's a ratio of roughly 1:6!! None of my clients would accept this result.
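For reference, the two run times reported above can be compared directly. This is just a small sketch of the arithmetic. The helper `to_seconds` and the dictionary labels are hypothetical names made up for illustration, and only the two durations come from this post.

```python
import re


def to_seconds(text: str) -> int:
    """Convert a duration such as '3 minutes 21 seconds' to whole seconds."""
    minutes = re.search(r"(\d+)\s*minute", text)
    seconds = re.search(r"(\d+)\s*second", text)
    total = int(minutes.group(1)) * 60 if minutes else 0
    total += int(seconds.group(1)) if seconds else 0
    return total


# Durations as reported in this post; the labels are illustrative only.
runs = {
    "a (not time-enabled)": "3 minutes 21 seconds",
    "b (time-enabled)": "19 minutes 57 seconds",
}

baseline = to_seconds(runs["a (not time-enabled)"])
slow = to_seconds(runs["b (time-enabled)"])
slowdown = slow / baseline

print(f"a: {baseline} s, b: {slow} s, slowdown: {slowdown:.2f}x")
```

With these two values the time-enabled run comes out slightly under six times slower than the run without time enabled.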

You can find the log files in the attachments.

It would be very important for the client to know why that happens.

Thanks for your help.

Carsten

SarahAmbrose
Esri Contributor

Hi Carsten_Nexiga‌,

Can you please let me know the following:

1. Do you see the same difference with a hosted layer?

2. In the Pro map, do you have time enabled, and is it set for the full extent of your data? Or a subset?

3. Can you please copy the text that is in the toolbox for each tool run? I'm interested in looking at the time breakdown (I think you did this before... but I can't seem to find them). I'll update if I do end up finding them.

Thanks!

Sarah

deleted-user-4ympgHKYfQv_
New Contributor II

Hi SAmbrose-esristaff‌,

Thanks for your answer.

1. I cannot create a hosted layer, as it always fails when trying to copy data to the portal/relational DS.

2. I tried with time enabled in Pro and without time enabled. Full extent.

3. You can find the logs in the comment above. I appended them.

I would be very thankful if you could find the error.

By the way, if I export the layer to a CSV and put it in a BigDataFileShare, the whole dataset is analyzed in seconds (even when using the date field as well).

BR,

Carsten

SarahAmbrose
Esri Contributor

Carsten_Nexiga‌,

Can you please create a hosted layer by adding local data to your map, then going to the Share tab > Web Layer > Publish as Web Layer, filling out the name, summary, and tags, and selecting Copy all data and Layer type = Feature? Then try to run the analysis on that hosted layer.

- Sarah
