Daily update Experience apps

10-15-2021 01:18 AM
Arnaud_Leidgens
New Contributor II

Dear All,

I'm looking for high-level recommendations. We've recently purchased ArcGIS Enterprise (10.9, single-machine deployment). I've read the documentation and threads, but I'm still a bit lost with my project.

We plan to build a system to support industrial agricultural plantations. We have a set of non-changing polygons (around 1,500 plots). Dozens of clerks update our company ERP system (SQL Server hosted on another server) day by day. Most of our data (production, workforce, inputs, phyto data, etc.), around 100 columns, is linked to those plots. We want to share this quantitative and qualitative data in a user-friendly, automatic and dynamic way (Experience Builder apps and/or ArcGIS Sites, mainly through web maps, dashboards and charts). A daily update of the attribute tables would be sufficient (e.g. a query of cumulative data for the current month), with a refresh of the symbology. We expect a maximum of 1 to 5 simultaneous viewers ("read-only" layers).

In terms of performance and ease of deployment, what is the best approach? Should we create an enterprise geodatabase from our remote SQL Server? Should we buy a new SQL Server license to create a dedicated instance on our GIS server and duplicate only the relevant data? Is it better to host pre-queried views of individual tables (CSV)? Is it easier to work with ETL solutions? We don't need versioning, and we cannot put too much load on our ERP system.

I guess we'll have 4 or 5 central hosted feature layers containing all the information, updated with Python (an overwrite script) and Windows Task Scheduler. From each of those layers, dozens of views would be derived to maintain their symbology and legends. With a daily refresh interval, all our applications, dashboards and web maps would then be updated.

Is that the correct approach? Should I dig more seriously into Python/API courses? Thank you in advance for your help or advice :)

Arnaud


8 Replies
berniejconnors
Occasional Contributor II
Arnaud,

With such a small user base and no need for editing, just viewing, I would recommend that you store your polygons in a file geodatabase on your ArcGIS Server machine. You can use Python scripts to do nightly updates: the scripts read the data from SQL Server and update the polygon attributes in the file geodatabase, which then serves as the data source for the map services you publish with ArcGIS Server. This configuration should be uncomplicated and easy to set up and maintain.

Good luck.
Bernie.
jcarlson
MVP Frequent Contributor

I would dig into some Python. We have a similar situation with our cadastral data, in that a lot of other tables on another server feed into our hosted polygons layer, and updating them regularly is important for our users to see the latest ownership, assessments, etc. Ours update nightly via Python and the Task Scheduler, and we then use view layers to create more focused layers for particular uses.

The key tools in this process are the ArcGIS API for Python, pandas, and the spatially enabled dataframe. The latter is really just Esri's own custom extension of pandas itself.

The steps, in plain English:

  1. Query other server(s) into a non-spatial table using pandas, specifically pandas.read_sql.
  2. Query the existing hosted feature layer using arcgis.features.FeatureLayer
    1. Only bring in the primary key and geometry, not the other fields
  3. Join the two dataframes together using DataFrame.merge
  4. Apply the merged table to the existing hosted feature layer using FeatureLayer.edit_features

And here it is in some simplified code:

import os
import sys
import pandas as pd
from arcgis.gis import GIS
from arcgis.features import GeoAccessor  # registers the .spatial accessor on dataframes

# log in to portal
gis = GIS('portal-url', 'user', 'password')

# get hosted layer; not actually using arcgis.features.FeatureLayer directly in this case
lyr = gis.content.get('itemID').layers[0]  # use the index of the sublayer you need

# SQL DB connection string
constr = 'mssql+pymssql://user:pass@server-location/database'

# Read standalone SQL file to query variable
with open(os.path.join(sys.path[0], 'query.sql'), mode='r') as q:
    sql_query = q.read()

## If you are simply copying an entire table, there's no need to establish a query string.
## read_sql can take either a query string or a table name as its parameter

# Query SQL into dataframe
df = pd.read_sql(sql_query, constr)

# Pull hosted feature layer into spatial dataframe, only pulling primary key(s) needed for joining
sdf = lyr.query(out_fields=['primaryID'], as_df=True)

# Merge dataframes
merged = sdf.merge(df, how='inner', on='primaryID')

## If your primary keys have different names, you can use left_on= and right_on= to specify the matching keys from the two dataframes.

## You may need to reshape the data here, depending on the needs of your output.

# Update hosted feature layer
## We do this in 100-row chunks. API docs suggest limiting the number of features being added/edited in a single go.
## Also includes some feedback messages to let you know if the features are updating properly
## For automated scripts, you can have these messages go to an external log rather than simply printing to the console

n = 0
err = False
while n < len(merged):
    fs = merged.iloc[n:n+100].spatial.to_featureset()
    updt = lyr.edit_features(updates=fs)
    msg = updt['updateResults'][0]
    print(f"Rows {n:4} - {min(n+100, len(merged)):4} : {msg['success']}")
    if 'error' in msg:
        print(msg['error']['description'])
        err = True
    n += 100  # advance to the next chunk; without this the loop never ends

if err:
    print('Some or all updates did not apply successfully.')
elif len(merged) > 0:
    print(f'{len(merged)} features updated.')

 

You can extend this even further with logging / notifications, or using pandas' compare function to selectively update only those features that have been updated in the SQL database. We have nearly 60k rows in a table we update nightly, but only a few hundred update on any given day, so being able to isolate the update to actual changes is a big plus.
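A minimal sketch of that selective-update idea, assuming hypothetical plot data (the `primaryID`, `status` and `yield_t` columns are invented for illustration): compare the previous snapshot of the layer's attributes against the fresh ERP pull, and push only the rows that differ.

```python
import pandas as pd

# Previous snapshot of the hosted layer's attributes (hypothetical data)
old = pd.DataFrame({
    'primaryID': [1, 2, 3],
    'status': ['active', 'fallow', 'active'],
    'yield_t': [10.0, 0.0, 7.5],
}).set_index('primaryID')

# Fresh pull from the ERP database
new = pd.DataFrame({
    'primaryID': [1, 2, 3],
    'status': ['active', 'active', 'active'],
    'yield_t': [10.0, 2.5, 7.5],
}).set_index('primaryID')

# compare() requires identically labeled frames; any row with at
# least one differing value appears in the result's index
diff = old.compare(new)
changed_ids = diff.index

# Only these rows would need to go through edit_features()
to_update = new.loc[changed_ids].reset_index()
```

With 60k rows but only a few hundred daily changes, feeding `to_update` (rather than the whole merged frame) into the chunked `edit_features` loop above keeps each nightly run small.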

- Josh Carlson
Kendall County GIS


Arnaud_Leidgens
New Contributor II

Thank you very much Bernie and Josh, this is really helpful to start with ! 

So I understand that I could work directly with hosted feature layers, hosted as items through my Portal. Should I instead create layers that reference the source data (e.g. a map image layer)? End users are likely to browse those apps on low-end phones. Is this relevant for performance?

MichaelGinzburg
Occasional Contributor

I'd opt for an enterprise geodatabase on the same SQL Server, with the layers updated by SQL jobs. Another option is to use spatial views joining the layers with the ERP's SQL tables. Doing all the data work in SQL Server is, IMO, a more sustainable solution.


jcarlson
MVP Frequent Contributor

I'd love to go for a solution like this. In our situation, unfortunately, we have zero control over how the other departments store and use their data, and we only have read access to the other DB servers involved.

Even if we could put an EGDB on the other databases, due to the version of SQL Server being used by the other departments, we'd be unable to utilize the latest versions of the desktop software. Hopefully the OP has a little more say in how their systems are organized, though.

- Josh Carlson
Kendall County GIS
MichaelGinzburg
Occasional Contributor

Hello,

If you have read-only access to the ERP SQL Server, or it's an outdated version, you can opt for your own dedicated DB server. You can buy a SQL Server license and fetch data from the ERP SQL Server through a linked server, build SSIS packages to replicate the relevant data, or use SQL Server's replication features. Another money-saving option is to use an open-source database such as PostgreSQL/PostGIS to set up an EGDB on a dedicated server; however, replicating data from SQL Server to PostGIS may require more effort than SQL Server to SQL Server. Once you have the data on your own server, you can still use spatial views, etc.
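For what it's worth, the "replicate the relevant data to your own server" step can also be done in plain Python with pandas, without SSIS. A hedged sketch, using in-memory SQLite as a stand-in for both servers (the `production` table and its columns are invented for illustration; a real source would use a connection string like `mssql+pymssql://user:pass@host/db`):

```python
import pandas as pd
from sqlalchemy import create_engine

# Illustration only: SQLite stands in for the ERP and GIS databases
erp = create_engine('sqlite://')
gis_db = create_engine('sqlite://')

# Pretend the ERP holds a per-plot production table
with erp.begin() as con:
    con.exec_driver_sql(
        "CREATE TABLE production (plot_id INTEGER, tonnes REAL)")
    con.exec_driver_sql(
        "INSERT INTO production VALUES (1, 10.5), (2, 3.2)")

# Pull only the relevant columns from the ERP side...
df = pd.read_sql("SELECT plot_id, tonnes FROM production", erp)

# ...and replace the copy on the GIS-side database
df.to_sql('production', gis_db, if_exists='replace', index=False)

copied = pd.read_sql("SELECT * FROM production", gis_db)
```

Scheduled nightly, a script like this keeps a read-only copy on the GIS server without touching the ERP beyond a SELECT.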

jcarlson
MVP Frequent Contributor

Interesting ideas. But reading a query into pandas and pushing targeted updates to a feature layer requires no additional DBs and no duplicated or replicated data, and is flexible enough to work with any SQL platform (and even other sources like standalone DBF files). My "outside" fields (those originating from the other departments' DBs) can sync minutely with the hosted layer. Plus it's free!

- Josh Carlson
Kendall County GIS
Arnaud_Leidgens
New Contributor II

We use MS SQL Server 2016, and everything seems compatible with Enterprise 10.9. It is located on another, non-federated server, though, without any ArcGIS components installed. Therefore, to create an EGDB, I understand we should buy a new SQL Server license for our own server and replicate the data, but I guess it's not worth it since we only want to read the data.

My ERP administrators won't give me any administrator rights to modify data or create views. However, they could prepare the different queries/views in advance. Those would not change over time. I have very poor knowledge of SQL, Python and database maintenance, so this solution seems easy for me.

Another option is to use spatial views joining layers with SQL ERP tables.

Please, could somebody develop this a bit more? With ArcGIS Pro, it is straightforward to publish a copy of the data as a hosted feature layer on the default Data Store (hosting server). However, I understand that this situation is different and would at least require registering the data source of the ERP SQL tables with Portal; otherwise the joins would be static. Is that correct? Is this Python workflow really the best method for me, if the SQL views can be joined directly with my polygons? Thank you in advance.
