Are GP services useless on ArcGIS server with versioned data?

Robertvan_Gilst · ‎01-19-2017

The setup:

Versioned data in an SQL Server database
GP service that connects directly to the SDE data.

Running my service the first time is fine, and the service responds as I expect.

Make an update to the data the GP service uses, either via ArcMap or by using a feature service on the same data.

Running the service the second time, returns the same result at running it the first time (if the service is just reading data, when editing it will just fail).

The service "hangs" in the state the version had the first time it ran. And there does not seem to be a programmatic equivalent to "Refresh version" as there is in ArcMap.

Our Danish distributor says that this is "by design", but I cannot accept this to be true, can it?

The easiest way to reproduce this is to create a model that returns the count of a feature class and publish this to ArcGIS server. Then run the service, add or remove a feature, than run the service again.

My real problem is that I have to update features in multiple feature classes in a GP service, look at the result, make some changes, and run the GP service again. Then I run into error when updating, because the version that the GP services runs in is outdated.

I hope someone has a solution for me.

Friendly regards,

Robert

Robertvan_Gilst · ‎10-11-2018

I finally found a solution to my problem.

It turned out to be very simple, I should clear the workspace cache at the end of my script:

arcpy.ClearWorkspaceCache_management()‍

This way all the connections to the sde workspace are disconnected, and the next time the service task is called it runs on fresh data.

I do not understand why this does not happen automatically when a service task finishes. Services are supposed to be state-less. I would suggest that esri adds this this logic automatically to all GP service tasks when they are published to the server.

View solution in original post

ScottFierro2 · ‎01-19-2017

Based on what you are describing it sounds like you are missing a reconcile & post task in your GP or as a manual step in your workflow.

Essentially, your first edits are stored in a change log and in SQL server these tables are identified as <schema name>.a_1234 and <schema name>.d_1234, where the A table holds appends (this includes both new records and edits to existing records) and the D table holds deletes. Additionally, the 4 digits in the table name will be consistent for a single Feature Class (i.e. FC = PLANS.Test which has related tables of a_3754, d_3754 and i_3754).

In order to take the edits you made in your first edits and have them show as the updated values for your second pass you must run the reconcile & post (to default) process explained here (after making your first edits before starting second): http://desktop.arcgis.com/en/arcmap/latest/manage-data/geodatabases/reconciling-a-version.htm

Won't dive too far into options for conflict resolution and some of the other pieces of versioning because it sounds as though you are acting as the admin authority on the data and your workflow means you accept your edits to always be correct, therefore taking precedence over any other version edits that would produce conflict, which means you can do a post to default directly skipping conflict resolution.

Robertvan_Gilst · ‎01-19-2017

Hi Scott

Thank you for your answer.

Unfortunately this is not the case. The data is just stored in the default version, no reconcile and post required.

The, extremely simplified, GP service in the post only returns the number of features, no editing done in the GP service.

If I run the GP service, the service will return the same number of features until the GP service is stopped and started again.

Robert

ScottFierro2 · ‎01-19-2017

If all you are running is a GET Count statement then yes I'd expect the GP to be dynamic with recognition of your edits. So starting with some overly simplified checks:

1) ensure the database is registered with the server you have the GP hosted on

2) not sure how you are getting your count back (json, txt, etc.) but check cache settings on servers and web browsers

3) modify the published GP processes refresh interval (default is once every 24 hours) to be shorter window

4) review models environment variables to ensure nothing in there is causing this

Just a side note, if it's only a process to perform a GET Count and you are using SQL Server SDE do you have the ability to install SQL Server Management Studio (SSMS)? It's included as part of the SQL Server suite so no extra cost and writing the SQL statement for the SELECT and COUNT is really simple and is a query that can be easily saved for recall needs. Not to mention there is a world of options you can do behind the scenes of ESRI directly in the DB with this tool.

Robertvan_Gilst · ‎01-19-2017

Thanks again, I really appreciate the effort you put into figuring out my problem.

I've done the checks you mention

1. The database is registered.

2. I am just using the default rest endpoint, from a browser. I have checked the caching settings, and it there is no browser cache in the way.

3. I tried setting all values to the minimum, but this still does not work.

4. I cannot see any environment variables that could be the cause of this behavior.

To be clear, this is not my real problem. But this example was the most simple way to explain the, in my opinion, faulty behavior of a GP Service which is using data from a version feature class.

In the real life scenario, I have created a python script that has to add features and update features in a couple of feature classes. The script works fine every time when running it from my IDE. But I experience problems when running it from ArcGIS server when there have been edits after the GP service has ran.

Then I boiled it down to the fact that somehow the GP Service remains on the "version state" the GP service was the last time it was executed.

You can see the same behavior when you have two ArcMap instances open at the same time, that connect to the same feature class. I you make edits in one instance, you cannot see these updates unless you press the Refresh button on the Versioning toolbar.

I really is that functionality that I would like to call in my python script before running, so I could force it to run on the latest data.

ScottFierro2 · ‎01-19-2017

OK, so I will give you an overview of the general workflow we have tied to over 100 ETL's we use for our biggest public facing front end application all built in model builder and python.

1) Input Feature Class

2) Create new timestamped named version [python script]

3) Change version (model is inherently using default version so must change model to use version created in #2) [python script]

4) Perform slew of different GP/python processes as each ETL is different

5) Rec & Post (timestamp named version back to default) [deprecated tool version from 10.0]

6) Change version (changes model back to using default from new version) [python script]

7) Delete version (deletes the timestamp named version from the SQL Server DB)

We perform the equivalent of trunk and loads, simple edits and more all within this framework and many of which are automated tasks via batch files run in Windows Task scheduler and can say we have never hit the hanging version issue you are referencing. I can't say that deploying a similar strategy is best or works based on your needs and workflows. For us, this method was key because there is a built-in rollback functionality that occurs during rec & post when errors occur and we would rather push out day old stale data until we can resolve any issues than to use a non-versioned direct to default method that could leave empty, corrupted or partial data.

If you do decide you want to try that route then I suggest establishing the workflow using your current versions tools for create new version, change version, rec/post, delete version before venturing down paths like ours.

Robertvan_Gilst · ‎01-19-2017

Thanks I will try the create version, do updates in that version, rec/post and delete the version again approach and see if that might help my initial problem.

Just for your information, I haven't had problems with "version hanging" before i published my python code to an ArcGIS server GP service before. I think the hanging occurs because the service is not shut down after running like an ordinary python job, but stays in memory for performance optimization, which in my versioned cases causes all the trouble.

Thank you for your input.

ScottFierro2 · ‎01-19-2017

Yeah I understand and why I suggested the workflow we use as an option because it works to systematically step through the use of the versioning process. As I said might be overkill for your workflow and needs.

Another few base checks to look at as well:

1) Are there un-reconciled versions floating out at the DB level on the data set you are working with?

2) Look into those A_xxxx and D_xxxx tables for the FC in use, if there are records in there but no versions appear at the DB level that are needing to be versioned then this is the only time I have hit a true break in the process and seems to occur when 1 of our automated processes crashes completely mid run leaving the edits in these tables. When this occurs though there is a version left out at the DB that must be deleted along with the need to run a compress to clean up the SDE_States and SDE_State_Lineages tables.

Robertvan_Gilst · ‎01-20-2017

To extend my question a little bit.

When I call a GP service, how do I force it to always run on the latest version of the data?

The versioned data is very simple, just the default version. But when the data is updated in another process, it still runs on the state the database had the last time i ran the service.

I can get to run on the last version by stopping and starting the GP service before calling it again. But this cannot be the solution.

I cannot find documentation that versioned data should not be used when using GP services on ArcGIS for Server.

Robertvan_Gilst · ‎10-11-2018

I finally found a solution to my problem.

It turned out to be very simple, I should clear the workspace cache at the end of my script:

arcpy.ClearWorkspaceCache_management()‍

This way all the connections to the sde workspace are disconnected, and the next time the service task is called it runs on fresh data.

I do not understand why this does not happen automatically when a service task finishes. Services are supposed to be state-less. I would suggest that esri adds this this logic automatically to all GP service tasks when they are published to the server.