Geoprocessing History in Metadata

11-08-2011 02:14 PM
Status: Under Consideration
I recommend 2 changes related to the geoprocessing history stored in the metadata:

1. Allow users to enable/disable storing geoprocessing history in the metadata
2. In the metadata editor, allow the geoprocessing history items to be deleted en masse and one by one
Agree this needs to be implemented as a normal tool, rather than...

Workaround for removing gp history from metadata:
Thanks for posting the workaround, Captain.

We have several Python scripts that perform daily geoprocessing tasks.  Since geoprocessing history is automatically logged, the GDB_ITEMS table in our geodatabases bloats considerably and affects performance significantly.  I discovered the effect on performance by first taking the workaround code suggested by ESRI and modifying it so that I could run it at the geodatabase level to affect all objects in the geodatabase (the original workaround would take a VERY long time to actually perform if I had to go to each feature class individually).  Once I had cleaned up over two years of geoprocessing history from the GDB_ITEMS table of all of my geodatabases, I found that:

1) The reserved disk space for all of the SDE databases on my SQL Server dropped by nearly 11 GB.
2) My automation scripts run MUCH faster.  For instance, one job that performs a DB compression and updates statistics went from taking 4 hours and 13 minutes to taking 8 minutes!  Another script that deletes several point feature classes and recreates them using X/Y coordinates downloaded from a business database went from taking 2 hours and 25 minutes to taking 15 minutes.

Having some method to manage logging to the GDB_ITEMS table would be an excellent enhancement!

I would also recommend a way to programmatically turn off metadata logging for scripts that do things like run Analyze on the entire set of SDE features. It's a huge problem for us to have all this garbage in the metadata.
ArcGIS is now 10.2.1 and there is still no way of deleting geo-processing logs from the metadata when in the Data Source Item Description editor.

I do think that updating the metadata when a geo-processing tool is run is a great way of keeping track of what was done, but sometimes you do something that is logged and you simply want to remove that entry without removing the rest of the history.

For example, you add a field and do a calculation on it; this is logged. You realise the calculation was incorrect, so you re-run it, and this gets logged too. I would want to go back into the metadata editor and remove the first calculation so that the geo-processing log is synchronised with the values actually in the field.

You can edit/remove other items in the metadata (usually with a red cross button), so why can't the Geoprocessing History work the same way?
This definitely should be a property of a featureclass.
GP History ON or OFF.
There is a file that can be used with the XSLT Transformation tool to zap the GP history:

C:\ArcGIS\Desktop10.1\Metadata\Stylesheets\gpTools\remove geoprocessing history.xslt

So you could build a model or script to do this for you on a regular basis when your use case demands it. Unfortunately, this does not cover Hornbydd's use case, where he wants to delete just a selected few entries.
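For anyone who wants to see roughly what that stylesheet does, here is a minimal sketch in plain Python that works on an exported metadata XML string. It assumes the geoprocessing history lives under /metadata/Esri/DataProperties/lineage as Process elements; that path and the function name strip_gp_history are my assumptions for illustration, so check them against your own exported metadata before relying on this.

```python
import xml.etree.ElementTree as ET


def strip_gp_history(metadata_xml: str) -> str:
    """Return the metadata XML with all geoprocessing history removed.

    Assumption: history is stored in Esri/DataProperties/lineage
    directly under the <metadata> root element.
    """
    root = ET.fromstring(metadata_xml)
    for data_props in root.findall("Esri/DataProperties"):
        # findall returns a list, so removing while looping is safe
        for lineage in data_props.findall("lineage"):
            data_props.remove(lineage)
    return ET.tostring(root, encoding="unicode")


# Made-up sample document, only for demonstration
sample = (
    "<metadata><Esri><DataProperties><lineage>"
    '<Process ToolSource="tbx" Date="20140101">CalculateField ...</Process>'
    "</lineage></DataProperties></Esri>"
    "<dataIdInfo><idCitation><resTitle>Roads</resTitle></idCitation></dataIdInfo>"
    "</metadata>"
)
cleaned = strip_gp_history(sample)
```

The rest of the document (title, abstract and so on) is left alone; only the lineage block is dropped.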
This is most annoying especially when having to release data to the public. My metadata document is 15 pages, with an additional 24 pages of geoprocessing steps that users do not need or care about. All they want is the data with a concise metadata document.

Having realised a need for a more or less opposite case, as mentioned in Allow IN_MEMORY datasets to store metadata generated by geo-processing tools, I think there could be an intermediate option instead of an on/off switch that lets users control the amount of history kept with the data items, say the last 20 geoprocessing actions. This history is just auxiliary most of the time, but sometimes it becomes critical, particularly if a data item has a lineage tied to its source items and their locations (say an intersect operation was applied: the history lets you check which source dataset versions were used, instead of rerunning a long intersect operation to make sure you used the correct source datasets).
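Until something like that exists, the "keep the last 20" idea could be approximated on exported metadata XML. This is only a sketch under assumptions: that the history is stored as Process elements under /metadata/Esri/DataProperties/lineage, and that they appear oldest first. The function name trim_gp_history and its keep_last parameter are hypothetical.

```python
import xml.etree.ElementTree as ET


def trim_gp_history(metadata_xml: str, keep_last: int = 20) -> str:
    """Keep only the most recent `keep_last` geoprocessing entries.

    Assumptions: entries are <Process> elements under
    Esri/DataProperties/lineage, listed oldest first.
    """
    root = ET.fromstring(metadata_xml)
    for lineage in root.findall("Esri/DataProperties/lineage"):
        processes = lineage.findall("Process")
        # Everything except the trailing keep_last entries is excess
        excess = processes[:-keep_last] if keep_last > 0 else processes
        for old in excess:
            lineage.remove(old)
    return ET.tostring(root, encoding="unicode")


# Made-up sample with five history entries
steps = "".join(f"<Process>step {i}</Process>" for i in range(1, 6))
sample = (
    "<metadata><Esri><DataProperties><lineage>"
    + steps
    + "</lineage></DataProperties></Esri></metadata>"
)
trimmed = trim_gp_history(sample, keep_last=2)
remaining = [p.text for p in ET.fromstring(trimmed).iter("Process")]
```

With the sample above, only the last two entries survive. If your metadata lists entries newest first, the slice would need to be reversed.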


Great idea. I believe the default should be on, though. I have figured out the what and where of hundreds of files just from the GP history. Most of the field calc history is a waste of time to read, but the appends are useful. None of these files had real metadata (documentation), and you can't count on most people to create any metadata, especially non-GIS, non-IT people.