Re-using scratchGDB from a GP service job as input to another GP service

10-11-2015 07:24 PM
ChrisPedrezuela
Occasional Contributor III

Hi guys,

I have a Python toolbox that I developed and that runs fine on my desktop. It uses the scratch folder and scratchGDB for writing out intermediate and final data. The toolbox has Tool1 and Tool2. In my desktop testing, after running Tool1, intermediate data is written to the scratchGDB (the scratch folder is used to create temporary SDE connection files). Tool2 then needs to look at my scratch GDB and folder to do further processing, and it writes the final data back to the scratchGDB.

On the desktop this runs smoothly, but my end goal is to publish the toolbox as a GP service, and as I understand it, each run of a GP service (or tool) creates its own job folder and its own scratchGDB. So once my two tools are both GP services, they will have independent scratch folders and GDBs. Is there a way to use GP service 1's results as inputs to a GP service 2 job? Or should I just merge my tools into one and save the trouble of figuring out this issue?
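For illustration, the desktop pattern looks roughly like this (the dataset names are placeholders, not my actual code):

```python
import arcpy
import os

# Tool1 (sketch): extract records and write intermediate features to the
# scratch geodatabase; the scratch folder holds the temporary .sde files.
scratch_gdb = arcpy.env.scratchGDB
scratch_folder = arcpy.env.scratchFolder      # temporary SDE connection files go here
intermediate_fc = os.path.join(scratch_gdb, "extracted")

# Tool2 (sketch): expects Tool1's output in the same scratchGDB and
# writes the final result back to it.
final_fc = os.path.join(scratch_gdb, "analyzed")
```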

Thanks again,

Thanos 

5 Replies
VinceAngelo
Esri Esteemed Contributor

Do not attempt to use the ScratchGDB assigned to one service from a later service -- this can cause the tables in the scratch folder to disappear before you need them, or locks on the folder can cause the background deletion to fail.

The end result of a service should be data in a "safe" location (I've created my own "JobFolder" object to manage this externally), usually passed into the service as a parameter.
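A minimal sketch of that pattern (the parameter layout is illustrative; the JobFolder object is my own and not shown here):

```python
import arcpy
import os

# Sketch of a script-tool body: the caller supplies a "safe" output
# workspace as a parameter instead of relying on the job's scratchGDB.
out_workspace = arcpy.GetParameterAsText(0)  # e.g. a file GDB outside the jobs directory
out_name = arcpy.GetParameterAsText(1)

# Intermediate work can still use the per-job scratch workspace...
tmp_fc = os.path.join(arcpy.env.scratchGDB, "tmp_extract")
# ... (processing that produces tmp_fc goes here) ...

# ...but the final result is copied somewhere that outlives the job.
result_fc = os.path.join(out_workspace, out_name)
arcpy.CopyFeatures_management(tmp_fc, result_fc)
arcpy.SetParameterAsText(2, result_fc)
```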

If you have two cascading services, you can merge them if it makes sense architecturally. Generally, smaller, self-contained services make better building blocks than large, complex services.

- V

ChrisPedrezuela
Occasional Contributor III

Thanks a lot for the response Vince.

I could easily merge my two tools into one service when published. I figured that would remove my concern about re-using scratch workspaces, which, as you pointed out, is not an ideal direction to take.

However, the one concern I'd have if my two tools act as one service is that if the second process (the second tool) fails, re-doing it means going through the first process again. My first tool's function is to connect to a DB, extract records, and write them as features into the scratchGDB; the second tool then does analysis on those features. So if they are a single service and something fails in the second process, you'd have to re-extract the records all over again.

Looking forward to your thoughts on this.

Regards,

Thanos

ModyBuchbinder
Esri Regular Contributor

Hi

You have to remember that it is a server that gets many requests.

Say the server gets 5 requests and two of them fail. If you send a request for the second part again, is there any way for the server to know which data to use?

From the server's point of view, each request is separate.

The only way is to handle it yourself (not simple): maybe send a time stamp with each request and remember it for the second tool.

If the second tool fails the first time, you can call it again with the old time stamp.
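A rough sketch of that idea (the staging share, folder layout, and parameter order are all illustrative, not anything built into the server):

```python
import arcpy
import os
import time

STAGING_ROOT = r"\\server\gp_staging"  # assumed persistent share, outside the jobs directory

# Tool 1 (sketch): the caller sends a job id (e.g. a time stamp), or one
# is generated here; the extracted data is staged under that id.
job_id = arcpy.GetParameterAsText(0) or time.strftime("%Y%m%d%H%M%S")
job_dir = os.path.join(STAGING_ROOT, job_id)
if not os.path.isdir(job_dir):
    os.makedirs(job_dir)
if not arcpy.Exists(os.path.join(job_dir, "staging.gdb")):
    arcpy.CreateFileGDB_management(job_dir, "staging.gdb")
# ... extract records into staging.gdb here ...
arcpy.SetParameterAsText(1, job_id)  # the client remembers this for the second tool
```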

VinceAngelo
Esri Esteemed Contributor

When you're working with services you need to think asynchronously. You can't control processing order or prevent jobs from being issued at the same time, so a unique job assignment mechanism should be used (UUIDs are great for this purpose).
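For example (a minimal sketch; the jobs directory is illustrative):

```python
import os
import uuid

# Each request gets its own identifier, so concurrent jobs can never
# collide on a folder or geodatabase name.
job_id = uuid.uuid4().hex
job_dir = os.path.join(r"\\server\gp_jobs", job_id)  # assumed persistent location
os.makedirs(job_dir)
```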

I don't generally consider processing failure to be a real threat, but I do have to consider the possibility that the server will be shut down before processing completes, and architect a solution that permits an "ACK/NACK" to be sent to the controller with ternary logic (success, retry, failure).

If extraction time is a significant cost, you'll need to come up with a mechanism to persist the extracted data until a successful processing job is completed.  Many servlet frameworks have built-in capabilities to manage, persist, and retry jobs; you should review the options for your environment before deciding to create your own queue management solution.  But the extracted data should not be placed in the scratchGDB that is managed by ArcGIS Server.
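Continuing the earlier sketch, the second tool's side might look like this (the status values and layout are illustrative, in the spirit of the ternary ACK/NACK above):

```python
import arcpy
import os
import shutil

STAGING_ROOT = r"\\server\gp_staging"  # assumed persistent share (see the earlier sketch)

# Tool 2 (sketch): analyze previously staged data, report a ternary
# outcome, and clean up the staging area only on success.
job_id = arcpy.GetParameterAsText(0)
extracted = os.path.join(STAGING_ROOT, job_id, "staging.gdb", "extracted")

try:
    if not arcpy.Exists(extracted):
        arcpy.SetParameterAsText(1, "failure")  # staging lost; tool 1 must be re-run
    else:
        # ... run the analysis against `extracted` here ...
        shutil.rmtree(os.path.join(STAGING_ROOT, job_id), ignore_errors=True)
        arcpy.SetParameterAsText(1, "success")
except arcpy.ExecuteError:
    # Leave the staged extraction in place so the job can be retried with
    # the same job id, without re-querying the source database.
    arcpy.SetParameterAsText(1, "retry")
```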

- V

ChrisPedrezuela
Occasional Contributor III

Thanks Vince for all the feedback. I'm not really that familiar with server architecture or that experienced in publishing geoprocessing services, so most of the server jargon is pretty new to me, sorry. We think the extracted records, which would probably range from 10k to 50k records per transaction, do not need to be kept and could just stay in the scratchGDB until it gets cleaned up somewhere in the process. But given the current environment setup, the database and table I'm connecting to and extracting those records from sometimes has another process or service running against it, driven by jobs from other (non-GIS) teams, so just connecting and querying it can take anywhere from 5 to 15 minutes depending on the time of day. Alternatively, I could write the feature-converted records to an SDE database so they persist, and separate the second tool to fetch the data from there to run the analysis.
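As a sketch, that hand-off might look like this (the connection file and table names are illustrative):

```python
import arcpy
import os

sde = r"C:\connections\staging.sde"  # assumed pre-created connection file

# Tool 1 (sketch): persist the feature-converted records to SDE, keyed by
# a job id, instead of leaving them in the job's scratchGDB.
job_id = arcpy.GetParameterAsText(0)
staged = os.path.join(sde, "EXTRACT_{0}".format(job_id))
arcpy.CopyFeatures_management(os.path.join(arcpy.env.scratchGDB, "extracted"), staged)

# Tool 2 (sketch): fetch the persisted features back by job id and analyze.
arcpy.MakeFeatureLayer_management(staged, "to_analyze")
```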
