Windows power options for long geoprocessing?

SpencerMeyer · ‎10-22-2012

Would someone please point me to a basic overview of best practices for setting Windows (64-bit) power options for running long ArcGIS ModelBuilder processes? I have some that take hours or even days, and I fear Windows sleep and hibernate modes will interfere. On my laptop, I've been using "Presentation Mode", but I have also just set up a desktop to run processes and want to be sure I don't lose hours of work to Windows power management. I think I'm missing the basics.

thanks!

KimOllivier · ‎11-08-2012

I have some that take hours or even days

That would seem unreasonable in my experience. No single geoprocessing step or model should take days.

My rule is: "If a process takes longer than a cup of coffee, then interrupt the process and find a better way."

Things to check:
Do not run processes on a desktop with the data across a network
Ensure that all key fields are indexed
Use file geodatabases rather than shapefiles; if you must use shapefiles, index them
If you are running geoprocessing tools inside a loop over features, recode it as a single pass, like using SQL set commands
Optimise your disk, keep 30% free space and plenty of virtual memory
Make your scratch workspace a file geodatabase located on a local disk; this is critical for ModelBuilder
Abandon ModelBuilder and recode the processes in Python scripts
Avoid known slow tools, such as calculating field values with joined tables
Use selections to minimise the unnecessary movement of data
Partition the process to keep within memory limits
Split the task and run parallel tasks on a multiprocessor machine
Add a faster disk with a RAID0 array
Maybe look for some different software to do the task, e.g. FME, numpy, PIL, gdal
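
As a rough illustration of the "partition the process" item above, here is a minimal sketch that splits a bounding box into equal tiles so each tile can be processed on its own. The helper name `tile_extent` and the extent values are hypothetical, and the loop body is only a placeholder for whatever tool would be run per tile.

```python
def tile_extent(xmin, ymin, xmax, ymax, nx, ny):
    """Split a bounding box into nx * ny equal tiles (hypothetical helper)."""
    width = (xmax - xmin) / float(nx)
    height = (ymax - ymin) / float(ny)
    tiles = []
    for row in range(ny):
        for col in range(nx):
            tiles.append((
                xmin + col * width,
                ymin + row * height,
                xmin + (col + 1) * width,
                ymin + (row + 1) * height,
            ))
    return tiles


# Made-up extent: 100 x 50 units split into 4 columns and 2 rows
tiles = tile_extent(0, 0, 100, 50, 4, 2)
for tile in tiles:
    # Placeholder: clip the inputs to this tile and run the tool on the subset
    pass
```

For analyses where results depend on neighbouring cells (viewsheds, for example), the tiles would probably need some overlap so that edges are not cut off; that is left out here to keep the sketch short.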

SpencerMeyer · ‎11-08-2012

Thanks, kimo.

I have most of those covered (file GDB, local data, non-looping, etc.). These models have many geoprocessing steps in them, not just one. Think multiple viewshed analyses across millions of acres. I'm no whiz, but I think I've optimized them fairly well. I haven't tried running them in a Python IDE, but that might help.

One other thought: I use both SugarSync and Carbonite for backup and syncing. Those services attempt to back up files as soon as they are created, so I wonder if my machine is slowing down a bit because temporary files (e.g., intermediate data) are being synced. Some of my models use the primary file GDB as the scratch workspace because I need to review some intermediate data as I'm debugging steps of the model. Is it a big no-no to have final outputs and intermediate data go to the same GDB?

Is there such a thing as a profiler for Model Builder? I really think it's just that I'm doing very intensive geoprocessing steps over very large areas, but maybe there is something else going on.

By the way, I'm on an i7 2.2 GHz quad-core.
I don't know much about splitting tasks to run on multiple processors, but my understanding is that ArcGIS 10 on Windows 7 64-bit is already making use of my 8 threads (4 cores).

thanks for the advice!

KimOllivier · ‎11-09-2012

Maybe reduce the resolution of your DTM? Does it need to be so dense?
I found that a background hillshade for my maps works much better at 80m instead of 20m cells
even though the contours can create a 20m DTM.

I have tasks that take a day to complete, they contain many steps too.
I am never happy if a tool takes more than a few minutes. Many tools need to be more scalable.
If it takes too long I look carefully to see why and change it if possible.
Often it is inconvenient to use a different package inside a workflow, but Python enables that more easily.
Esri developers are aware of the worst areas and you can see their efforts to fix some by building-in partitioning at 10.1
In the meantime you just have to abandon slow tools if you have large datasets to process.

Yes, I do think that on-the-fly syncing will cause the processes to be disk-bound and, even worse, network-bound.
Similarly, many people find that a virus checker (Norton comes to mind) that scans each newly created file will cripple geoprocessing.

Using modelbuilder style of chaining tools creates a lot of intermediate featureclasses which require a lot of disk read/writes.
That is something that you can often minimise by recoding in Python. There are lots of list processing functions that run in memory that can be substituted for temporary geoprocessing featureclasses that are being used as a temporary set for selection.

Dictionaries, Lists, Sets are all very efficient, as is the numpy module for raster and other analysis using matrices. There are many modules that can be used instead of tools. ET GeoTools are my favourite for alternative geoprocessing tools.
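
A minimal sketch of that idea, using plain Python structures in place of a temporary selection featureclass. The records and IDs below are made up for illustration; in practice they would be read once from the real tables.

```python
# Hypothetical records standing in for rows read once from a table
parcels = [
    {"id": 1, "owner": "A"},
    {"id": 2, "owner": "B"},
    {"id": 3, "owner": "C"},
    {"id": 4, "owner": "D"},
]

# Hypothetical IDs produced by an earlier step (duplicates are possible)
visible_ids = [2, 3, 3, 5]

# A set collapses duplicates and makes each membership test cheap
visible = set(visible_ids)

# In-memory "selection": keep only the parcels whose id is in the set
selected = [p for p in parcels if p["id"] in visible]

# A dictionary keyed on id replaces a table join for attribute lookups
owner_by_id = dict((p["id"], p["owner"]) for p in parcels)
```

The point is that the selection and the lookup never touch the disk, whereas the chained-tool version would write an intermediate featureclass for each.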

How to profile modelbuilder?
I would run each tool interactively and look at the elapsed time.
In my python scripts I print out the time after each gp tool, sometimes to a log file using the logging module.

import datetime
starttime = datetime.datetime.now()
# .. do something
print(datetime.datetime.now() - starttime)
# this does all the date arithmetic to print an elapsed time
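
For the log-file variant mentioned above, a minimal sketch with the standard logging module might look like this. The helper `run_step`, the log file name and location, and the placeholder step are all assumptions for illustration; a real script would pass a geoprocessing call instead of `sum`.

```python
import datetime
import logging
import os
import tempfile

# Hypothetical log location; any writable path would do
log_path = os.path.join(tempfile.gettempdir(), "geoprocessing.log")
logging.basicConfig(
    filename=log_path,
    level=logging.INFO,
    format="%(asctime)s %(message)s",
)


def run_step(name, func, *args, **kwargs):
    """Run one step and log how long it took (hypothetical helper)."""
    start = datetime.datetime.now()
    result = func(*args, **kwargs)
    elapsed = datetime.datetime.now() - start
    logging.info("%s took %s", name, elapsed)
    return result


# Placeholder step standing in for a geoprocessing tool call
total = run_step("sum placeholder", sum, [1, 2, 3])
```

Each call appends one timestamped line to the log, so a long overnight run leaves a record of which step was slow.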

When a tool takes too long I try to fix it or otherwise replace it with a different process.

Why should we accept ArcGIS processes to be slower than equivalent ARC/INFO workstation tools?
Surely it is just poor programming? Maybe brought on by object oriented methods rather than geo-relational operations.
If I can accelerate a process in an interpreted Python substitute that must say something about Microsoft .NET structures.
The COM interface environment can be very flexible, but not if used to call each feature.
Similarly if each write to a database is committed that will slow performance dramatically, but with simple tools you have no choice.

SpencerMeyer · ‎11-09-2012

There are some great tips in there. Thanks, kimo. I just implemented the time profiler on some other modules and that has been handy.

You went over my head there at the end, but I think I have a good sense of when to redo the analyses via python and numpy.

thanks for all your help!
YS

KimOllivier · ‎11-09-2012

I have just had a thought about how to use the datetime stamp in a model.

You could export the model to a script and paste in a message after each tool.
This is not the way I recommend learning Python scripting, because the machine-generated code is hard to read.

import datetime
import arcpy

begin = datetime.datetime.now()
# the tool
arcpy.AddMessage("tool took " + str(datetime.datetime.now() - begin))

The code may need a bit of tweaking to make it run as a script; watch for layers, rather than featureclasses, being used as inputs.
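
To avoid pasting the same two lines around every tool, the timing could be wrapped in a small context manager. This is only a sketch: the name `timed` and its `report` parameter are hypothetical, and inside ArcGIS `arcpy.AddMessage` could be passed as `report` so the text shows up in the tool messages.

```python
import contextlib
import datetime


@contextlib.contextmanager
def timed(label, report=None):
    """Report the elapsed time of the enclosed block (hypothetical helper)."""
    begin = datetime.datetime.now()
    try:
        yield
    finally:
        text = "%s took %s" % (label, datetime.datetime.now() - begin)
        if report is None:
            print(text)
        else:
            report(text)


# Placeholder usage: collect the messages in a list instead of printing them
collected = []
with timed("placeholder step", report=collected.append):
    sum(range(1000))
```

Because the message is emitted in a `finally` clause, the elapsed time is still reported if the enclosed tool raises an error.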

DaleHoneycutt · ‎11-18-2012

If you're running 10.1 Service Pack 1, try 64-bit background processing. This isn't a 'magic bullet' to use in place of careful optimization (all good points above).