Scratch Workspace?

02-27-2014 09:16 AM
MikeMacRae
Occasional Contributor III
I am writing a script and started reading through the docs on creating/setting a scratch workspace. So far, all I am getting out of them is that it is a place for temporary data and that it guarantees the scratch workspace will exist if the tool is ported to another machine. It also defaults to the user's temp directory, which is defined in the OS when Arc was installed (unless you have changed that location). Before I came across the documentation, I was doing this:

if arcpy.Exists(r"C:\GIS\temp_data.gdb"):
    arcpy.Delete_management(r"C:\GIS\temp_data.gdb")       

arcpy.CreateFileGDB_management(r"C:\GIS\temp_data.gdb")


This seems to me to do the same thing as creating the scratch workspace, and it also clears out the temp data (which, in my case, is what I want).

So in a workflow, let's say I wanted:


  1. Clip some feature classes and send output to a temporary workspace

  2. Maybe do some analysis on the clipped data (areas of clipped polygons, buffering, summarizing fields, etc)

  3. Output analysis to a report (excel, word, pdf, etc)

  4. And then have the temp clipped data go away. In my case, the original feature classes are quite dynamic and change on a regular basis, so my report will be considered a "Snapshot in Time"



Can anyone enlighten me as to why I would use a scratch workspace over this method? Does it speed up performance in some way?

Documentation here

Blog here
7 Replies
curtvprice
MVP Esteemed Contributor
So far, all I am getting out of it is that it is a place for temporary data and that it guarantees that the scratch workspace will exist if the tool is to be ported.

The scratchWorkspace is not guaranteed to exist. The scratchFolder and scratchGDB (which are derived from the scratchWorkspace) are.

In ModelBuilder, intermediate datasets can be given paths like %scratchFolder%\xx0.shp. These are guaranteed to work: if the workspace and scratch workspace are not defined, the scratch folder is either determined or created depending on the existence and type of the current workspace.

In Python you need to clean up after yourself...

import arcpy
from arcpy import env
env.scratchWorkspace = r"C:\GIS\temp_data.gdb"
env.scratchWorkspace = env.scratchFolder
#  the call env.scratchFolder will locate (or create) scratchFolder
# - based on the current scratch workspace. guaranteed writable
# Since the workspace is a gdb, it will create a folder parallel to the gdb named "scratch"
tmpFC  = arcpy.CreateScratchName("", "", "featureclass", env.scratchFolder)
# Then CreateScratchName creates a unique name for you to use for your temp
# feature class. Since you used a folder, it's a shapefile path (including the .shp)
arcpy.CopyFeatures_management(inFC, tmpFC)
## at the end of your script, always clean up
arcpy.Delete_management(tmpFC)
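
As a rough sketch of that clean-up habit applied to the clip-then-report workflow from the question: the helper below is hypothetical (`snapshot_areas` and its parameter names are not from this thread). It takes the geoprocessing module as an argument (pass `arcpy`), assumes polygon inputs, and wraps the work in try/finally so the temporary clips are deleted even if a step fails.

```python
def snapshot_areas(gp, source_fcs, clip_fc, scratch_ws):
    """Clip each source to clip_fc, total the clipped areas, then delete the clips.

    gp is the geoprocessing module (pass arcpy). Hypothetical helper.
    """
    temp_paths = []
    totals = {}
    try:
        for fc in source_fcs:
            # Unique, not-yet-existing name in the scratch workspace
            tmp = gp.CreateScratchName("clip", "", "FeatureClass", scratch_ws)
            temp_paths.append(tmp)
            gp.Clip_analysis(fc, clip_fc, tmp)
            # Sum polygon areas of the clipped features
            with gp.da.SearchCursor(tmp, ["SHAPE@AREA"]) as rows:
                totals[fc] = sum(row[0] for row in rows)
    finally:
        # Runs on success and on failure
        for tmp in temp_paths:
            if gp.Exists(tmp):
                gp.Delete_management(tmp)
    return totals
```

Something like `snapshot_areas(arcpy, my_feature_classes, my_clip_fc, arcpy.env.scratchGDB)` would return the totals to write into the report; the two input names there are placeholders.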


Can anyone enlighten me as to why I would use a scratch workspace over this method? Does it speed up performance in some way?


Tools that you call may use the scratch workspace to write (and delete) their own intermediate files.

The most effective way to speed up performance with scratch is to use the "in_memory" workspace. This is very very fast, but only recommended if you know the temporary files will not be large. Do not set the scratch workspace to in_memory, just use it explicitly:

tmpFC = arcpy.CreateScratchName("xx1", "", "featureclass", "in_memory")
...
arcpy.Delete_management(tmpFC)


If the files are expected to be large, use the OS TEMP folder (this ensures that your temp files are local, so you aren't processing over the network):

import os
TEMP = os.getenv("TEMP") # this is a folder Windows promises will exist
env.scratchWorkspace = TEMP
tmpFC = arcpy.CreateScratchName("x1", "", "featureclass", env.scratchFolder) # %TEMP%\x10.shp

# to use file GDB instead (not a bad plan) do this:
tmpFC = arcpy.CreateScratchName("x1", "", "featureclass", env.scratchGDB) # %TEMP%\scratch.gdb\x10
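
One caveat that may be worth a sketch: `os.getenv("TEMP")` returns `None` when that variable is not set. The snippet below is an alternative under my own assumptions, not part of the code above: it falls back to the standard library's `tempfile.gettempdir()`. The name `local_temp_dir` is hypothetical.

```python
import os
import tempfile

def local_temp_dir():
    """Return an existing local temp folder, preferring %TEMP% when it is set."""
    temp = os.getenv("TEMP")
    if temp and os.path.isdir(temp):
        return temp
    # Fallback: Python's own choice of temp directory
    return tempfile.gettempdir()
```

The result could then be assigned with `env.scratchWorkspace = local_temp_dir()` in place of the bare `os.getenv` call.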


Hope this helps.
MikeMacRae
Occasional Contributor III
Tools that run will use the scratch workspace to write (and delete) their own intermediate files.


Yes, but isn't that implied by using the tool? You don't have to set a scratch workspace for them to place data in the default scratch workspace. They are designed to do that, are they not? Why would I want to manually set a scratch workspace outside of the method I implied above?
ClintDow
Occasional Contributor
I see the main benefit (in addition to portability) of using the scratch environments as being that users set their own path in their environment variables and therefore know where to look if they need that data. If you set the path yourself, it may be in an unfamiliar location, and the user may not be code-savvy enough to open your source to find the location, so they have to either track you down or hope you print output file paths to the results window.

Also, the arcpy.CreateScratchName('spam', workspace=arcpy.env.scratchGDB/Folder/Workspace) method guarantees a unique name with a digit appended to it, which is useful so users don't have to remember to delete what's in the scratch workspace on every run of the script or, conversely, worry that data is being overwritten if you enabled overwrite output. Sure, you can use arcpy.CreateUniqueName with your own workspace, but compared with creating, deleting and checking for the gdb you created yourself, you're shaving off a few extra lines of code and increasing readability.

To clear out temp data created within my own script, I simply append all paths generated from arcpy.CreateScratchName to a list and, at the end of the script, loop through the list calling arcpy.Delete_management, unless the user has checked a box in the tool parameters to keep intermediate data.
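
A minimal sketch of that list-and-delete approach, written as a context manager. The class name `ScratchTracker` and its parameters are hypothetical; the name-making and delete functions are passed in, so nothing here depends on a particular workspace.

```python
class ScratchTracker:
    """Remember temp dataset paths and delete them on exit unless keep is True."""

    def __init__(self, make_name, delete, keep=False):
        self.make_name = make_name  # callable: prefix -> unique path
        self.delete = delete        # callable: path -> None
        self.keep = keep            # True keeps the intermediate data
        self.paths = []

    def new(self, prefix):
        """Generate a unique path and record it for later clean-up."""
        path = self.make_name(prefix)
        self.paths.append(path)
        return path

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if not self.keep:
            for path in self.paths:
                self.delete(path)
            self.paths = []
        return False  # never swallow exceptions
```

With arcpy, `make_name` could be `lambda prefix: arcpy.CreateScratchName(prefix, "", "FeatureClass", arcpy.env.scratchGDB)` and `delete` could be a small wrapper that calls `arcpy.Delete_management` only when `arcpy.Exists(path)` is true, since a recorded path may never have been written. Reading `keep` from a boolean tool parameter is left to the caller.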
curtvprice
MVP Esteemed Contributor
Why would I want to manually set a scratch workspace outside of the method I implied above?


The user of your tool has control of the workspace and scratchWorkspace and should (ideally) set them. Then they can control where to look for scratch data to clean up in the (unlikely I'm sure) scenario in which your tool fails and does not delete your temp GDB.

Going with the system's current GP environment values is also important if publishing a tool as a service, as the server sets up a unique scratch workspace for each process so you don't have multiple arcpy instances writing to the same workspace, which can corrupt data. This may also be an issue when background 64-bit processing conflicts with what you're doing in the foreground. Danger!

Note that if you do want to create your own scratch GDB, you may, but you should create it in the scratchFolder, using the current GP environment settings, whatever they are.

import os
import arcpy
from arcpy import env
# CreateScratchName only generates a unique path; it does not create anything
scrWS = arcpy.CreateScratchName("", ".gdb", "workspace", env.scratchFolder)
arcpy.CreateFileGDB_management(os.path.dirname(scrWS), os.path.basename(scrWS))
env.scratchWorkspace = scrWS
tmpFC = arcpy.CreateScratchName("xx", "", "featureclass", scrWS) # path for new temp FC
# ...
# then at the end of your script, clean up. 
# Note that you better delete any layers based on your temp data first
# or the delete will not work (file locking)
arcpy.Delete_management(scrWS)
MikeMacRae
Occasional Contributor III

The most effective way to speed up performance with scratch is to use the "in_memory" workspace. This is very very fast, but only recommended if you know the temporary files will not be large. Do not set the scratch workspace to in_memory, just use it explicitly:

tmpFC = arcpy.CreateScratchName("xx1", "", "featureclass", "in_memory")
...
arcpy.Delete_management(tmpFC)




This is actually cool; I hadn't seen the CreateScratchName function at all. Very nice. The workspace parameter can be set to "in_memory"? Really interesting. The help says nothing about it, and I've never seen that before. Can other tools that have a workspace parameter do this too? I'm also guessing that to use this, I need to have a better understanding of my RAM allocation?

If the files are expected to be large, use the OS TEMP folder (this ensures that your temp files are local, so you aren't processing over the network):



import os
TEMP = os.getenv("TEMP") # this is a folder
env.scratchWorkspace = TEMP
tmpFC = arcpy.CreateScratchName("x1", "", "featureclass") # %TEMP%\x10.shp

# to use file GDB instead (not a bad plan) do this:
env.workspace = TEMP
env.scratchWorkspace = env.scratchGDB
tmpFC = arcpy.CreateScratchName("x1", "", "featureclass") # %TEMP%\scratch.gdb\x10


Hope this helps.


Yes, this helps a lot. Cool approach!
MikeMacRae
Occasional Contributor III
I see the main benefit (in addition to portability) of using the scratch environments as being that users set their own path in their environment variables and therefore know where to look if they need that data. If you set the path yourself, it may be in an unfamiliar location, and the user may not be code-savvy enough to open your source to find the location, so they have to either track you down or hope you print output file paths to the results window.


Great point!


Also, the arcpy.CreateScratchName('spam', workspace=arcpy.env.scratchGDB/Folder/Workspace) method guarantees a unique name with a digit appended to it, which is useful so users don't have to remember to delete what's in the scratch workspace on every run of the script or, conversely, worry that data is being overwritten if you enabled overwrite output. Sure, you can use arcpy.CreateUniqueName with your own workspace, but compared with creating, deleting and checking for the gdb you created yourself, you're shaving off a few extra lines of code and increasing readability.

To clear out temp data created within my own script, I simply append all paths generated from arcpy.CreateScratchName to a list and, at the end of the script, loop through the list calling arcpy.Delete_management, unless the user has checked a box in the tool parameters to keep intermediate data.


It's not a bad idea to give the user some control over maintaining outputted data. I may use that myself.
curtvprice
MVP Esteemed Contributor
ArcGIS 10.1 Help: Using in-memory workspace
http://resources.arcgis.com/en/help/main/10.1/index.html#//002w0000005s000000

Note I've fixed my code above. By default, CreateScratchName uses the current workspace, not the scratch workspace, so if you want to create temp files in the scratch workspace, you need to specify it.