Modifying Permanent Sort script by Chris Snyder

RichardFairhurst · ‎08-02-2010

Chris:

I am not sure how best to contact you directly, but I have been trying to modify a script you wrote and posted on the public scripts page about a year ago. Your script permanently sorts a feature class or table to a new output feature class or table.

I have been trying to modify the script to optionally add a valid additional field of type Long to the output that will store a copy of the original data source's OID field values. Your script by default drops the original OID values from the output. However, I need these OID values where I want to join the output back to the source, but the data source has no other unique ID than the OID field and I cannot alter the source data schema. I have some code that basically works if I can assume that all OID fields are actually named "OBJECTID", but I do not think that is the case.

In VBA there are methods that can query a data source to confirm it has an OID and then get the OID field's name (which tells me there is more than one way these fields can be named). Is there anything equivalent in Python? Or how would you approach checking these two aspects of a data source to add appropriate error checking routines to your Python script to handle a subroutine like the one I want? If the new subroutine works, I would always be able to optionally include a join field in my sorted output that would work with any source that has an OID value without ever having to modify the source schema.
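For reference, the geoprocessor's Describe object is the usual place to look for this in Python. The following is a minimal sketch, assuming the 9.3-style property names HasOID and OIDFieldName on the object returned by gp.Describe. FakeDescribe and get_oid_field_name are hypothetical names used only so the example runs without ArcGIS installed.

```python
def get_oid_field_name(desc):
    """Return the OID field name of a described dataset, or None if it has no OID.

    `desc` is assumed to expose HasOID and OIDFieldName, as the object
    returned by gp.Describe(dataset) is expected to.
    """
    if not getattr(desc, "HasOID", False):
        return None
    return desc.OIDFieldName


class FakeDescribe(object):
    """Hypothetical stand-in for a Describe object (not part of ArcGIS)."""
    def __init__(self, has_oid, oid_field_name=""):
        self.HasOID = has_oid
        self.OIDFieldName = oid_field_name


# A shapefile-style source, a geodatabase-style source, and one with no OID.
shp_name = get_oid_field_name(FakeDescribe(True, "FID"))
gdb_name = get_oid_field_name(FakeDescribe(True, "OBJECTID"))
no_oid_name = get_oid_field_name(FakeDescribe(False))
```

With a real geoprocessor the call would presumably look like get_oid_field_name(gp.Describe(inputLayer)), and a None result is the cue to skip the optional ORIG_OID field or raise an error.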

Thanks for the script and I hope you can help me.

Rich

RichardFairhurst · ‎08-04-2010

Chris:

You're right, I like it very much. :cool: With the geometry capabilities from the shape this starts looking like a real search cursor alternative, especially for when embedded cursors would normally be used. I don't have any background in using code like this, so I will have to play with it to start really understanding the possibilities (but hopefully I can approach it as being similar to a cursor on steroids).

How well do you think it would do with 120,000 features (my main dataset)? If not that well, I could do a query selection first and just use the records I needed.

I am very glad I started this dialog, since we seem to be aiming at many of the same things.

Rich

RichardFairhurst · ‎08-05-2010

Chris:

I have implemented ValidateFieldName to make sure the user-specified field name will work with the output table. See code below:

    #Process: Add the specified ORIG_OID field name
    tempOidFieldName = originalOidFieldName
    if originalOidFieldName not in ["","#"," "]:
        originalOidFieldName = gp.ValidateFieldName(originalOidFieldName, os.path.dirname(outputLayer))
        if outShpDbfFlag:
            gp.AddField_management(outputLayer, originalOidFieldName[0:10], "LONG")
        else:
            gp.AddField_management(outputLayer, originalOidFieldName, "LONG")

#...... In Warnings section.
    if tempOidFieldName != originalOidFieldName:
        message = "WARNING: The Original OID field was changed from " + tempOidFieldName + " to " + originalOidFieldName + "!"
        showPyWarning()

I would like to modify the inputLayerFieldList to be a two-dimensional array that holds the original field name and the output field name as a pair. The output field name would be processed through the ValidateFieldName tool before inclusion in the array. To my mind, an array associating each original field name with its validated output field name would be better than simply truncating each original field name when a .dbf or .shp file is specified for the output. The array could also be processed outside of the loop against the actual output feature class generated, to verify that no two validated field names are identical and to make them unique in the same way that the CreateFeatureClass_management tool did. Since the field list of the actual output feature class should be in the same basic relative order as the array, it should be relatively easy to get the actual field name created for the output into the array.

The loop would have to be altered to get the upper bound of the field name array and use a for loop accessing the array with index values to get both input and output field name associations in a single pass within the loop. Each new record read within the loop would reset the field name index to 0. I believe this would improve the ability of the code to take an input that is not a .dbf or .shp file and directly output it to a .dbf or .shp file without loss of data or placing data in the wrong field in the output due to truncated field name duplications. (I would like this tool to be worthy of consideration for inclusion in the ESRI releases of their software, and the current code is too fragile on this type of transformation to qualify for that.) The loop should also no longer need to test if the output is a .dbf or .shp file and alter the field names within the loop, since the input/output field name pairings and alterations would already have occurred outside the loop.
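The pairing described above could be sketched roughly as follows. This is only an illustration: build_field_pairs is a hypothetical helper, and the plain 10-character truncation with a numeric suffix is a stand-in for gp.ValidateFieldName and for whatever renaming the output workspace actually applies, neither of which is reproduced here.

```python
def build_field_pairs(input_names, max_len=10):
    """Return [[original_name, output_name], ...] in input order.

    Output names are truncated to max_len characters and, where two
    truncated names would collide (case-insensitively), a numeric
    suffix is appended so every output name stays unique.
    """
    pairs = []
    used = set()
    for name in input_names:
        candidate = name[:max_len]
        suffix = 1
        while candidate.upper() in used:
            tail = "_" + str(suffix)
            candidate = name[:max_len - len(tail)] + tail
            suffix += 1
        used.add(candidate.upper())
        pairs.append([name, candidate])
    return pairs


# Built once, outside the record loop.
field_pairs = build_field_pairs(["PARCEL_NUMBER_A", "PARCEL_NUMBER_B", "OWNER"])

# Inside the record loop, index access gives both names in a single pass.
copied = []
for i in range(len(field_pairs)):
    in_name = field_pairs[i][0]
    out_name = field_pairs[i][1]
    copied.append((in_name, out_name))
```

Because the pairs are built before the loop starts, the loop body no longer needs to know whether the output is a shapefile or .dbf.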

How does Python handle multi-dimensional arrays and index access of arrays? I know the syntax in VBA or VB Script to do what I want, but not in Python.

One other question. I used your tool in a ModelBuilder model. When I attached other tools to the output, the tools did not know the schema of the output to choose fields in tools like the Calculate Field tool until I ran the script one time. Do you know of a way to notify tools connected to the script output what the output schema will be before the tool is run once? Thanks for your help.
Rich

ChrisSnyder · ‎08-05-2010

Damn! I found that while it appears that you can load a geometry object straight into a dictionary, it gets corrupted somehow in the process! The tabular data, however, is perfect. I am guessing it has something to do with the stupid geometry object being incompatible with the dictionary. If you really need the geometry in the dictionary, you could just loop through all the coordinate pairs in the searchcursor and add them to the dictionary as raw values rather than the stupid ESRI geometry thing. But that makes things more complicated. Oh well - maybe in v10, since things seem more Pythonic (although I suspect it's just a wrapper of some sort)... Tables are easy at least.
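A rough sketch of the raw-values idea, with the cursor left out: the records list below is made-up sample data standing in for what a search cursor loop would yield, and build_lookup is a hypothetical helper name. Only plain Python tuples and floats go into the dictionary, never the geometry object itself.

```python
def build_lookup(records):
    """Key each record by OID, storing attributes and plain (x, y) tuples."""
    lookup = {}
    for oid, attributes, coords in records:
        lookup[oid] = {
            "attributes": dict(attributes),
            # Copy each vertex out as raw floats instead of keeping the geometry.
            "coords": [(float(x), float(y)) for x, y in coords],
        }
    return lookup


# Made-up sample data: (oid, attribute dict, list of vertex pairs).
records = [
    (1, {"NAME": "MAIN ST"}, [(100.0, 200.0), (150.0, 250.0)]),
    (2, {"NAME": "OAK AVE"}, [(150.0, 250.0), (175.5, 300.25)]),
]
lookup = build_lookup(records)
first_vertex_of_2 = lookup[2]["coords"][0]
```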

Rich, you could handle a multi-dimensional array in Python in several ways (embedded lists or a dictionary). Python doesn't really have an "array" like VB, but I think Python array-type objects are better:

>>> listObj = [["ORIG_NAME1","NEW_NAME1"],["ORIG_NAME2","NEW_NAME2"]]
>>> #print the original and new name of the 2nd field
>>> print str(listObj[1][0]) + ", " + str(listObj[1][1])
ORIG_NAME2, NEW_NAME2

>>> #or my favorite, a dictionary keyed from the original name:
>>> dictionaryObj = {"ORIG_NAME1":["NEW_NAME1", "String"],"ORIG_NAME2":["NEW_NAME2","DOUBLE"]}
>>> print str(dictionaryObj["ORIG_NAME2"][0]) + ", " + str(dictionaryObj["ORIG_NAME2"][1])
NEW_NAME2, DOUBLE

RichardFairhurst · ‎08-07-2010

Chris:

On the Ideas page I see that as of Version 10 ESRI has added a Permanently Sort Features tool. See the link below:

http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Sort/001700000057000000/

ESRI's tool requires an ArcInfo license to use it, which has caused a lot of requests to make it work with an ArcView license. Obviously it is not a Python script.

It differs from your tool in that it takes a comma-delimited list of fields rather than the prevalidated field names of your parameters (but their tool supports unlimited sort fields). I don't know whether their tool preserves the OIDs or not, since that aspect is not mentioned (I have used your latest version with an unalterable table that only had OIDs as a unique join field, and the new tool has solved the problems I have had trying to preserve the original OID values in an unsummarized copy of a Standalone Table). The most interesting thing about ESRI's tool is that it does Spatial Sorting (UR, UL, LR, LL, and PEANO) when the Shape field is chosen as one of the sort fields (or the only one?). You may have already known about this tool, but I wanted to make sure.

A few other notes. Per the Geoprocessing Model diagram, GetValue and SetValue are the only methods that take a variable for a field to access field values in Python. Not sure if my suspicions about what happens under the hood are correct, but it looks likely at this point based on how these functions are described. The help for GetValue/SetValue says that cursors can also use a field name or a field index directly to get and set values where they are known, but the examples and Geoprocessing diagram only indicate the field name method in the form: cursor.fieldname = value. Nothing shown is similar to ArcObjects' use of field index values.

I sort of get the examples of Lists and Dictionaries you posted, but I have not figured out exactly how best to loop a paired list/dictionary (like embedded cursors) to do the validation checks I want. Still working on it, but for now a two-step process to a file geodatabase and then an export to .shp/.dbf is not that bad where I cannot risk data loss. The transformation works just fine when I already know all field names will still be unique after the 10-character truncation, but I would like it to be a bit more bulletproof for users that may not consider the effects of the field name conversion.
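One way such a check could look, as a sketch only: a single pass over a list of [original, output] pairs, grouping the original names by output name in a dictionary, so no nested loop is needed. find_collisions and the sample field names are hypothetical, and the plain truncation stands in for whatever validation is really applied.

```python
def find_collisions(field_pairs):
    """Return {output_name: [original names...]} for output names used more than once."""
    grouped = {}
    for in_name, out_name in field_pairs:
        # Compare case-insensitively, since shapefile field names are not case-sensitive.
        grouped.setdefault(out_name.upper(), []).append(in_name)
    collisions = {}
    for out_name, originals in grouped.items():
        if len(originals) > 1:
            collisions[out_name] = originals
    return collisions


# Sample pairs as they would look after a plain 10-character truncation.
names = ["STREET_NAME_PREFIX", "STREET_NAME_SUFFIX", "CITY"]
pairs = [[name, name[:10]] for name in names]
collisions = find_collisions(pairs)
```

An empty result means the direct export is safe; anything else is worth a warning before records are written.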

Lastly, have you noticed that a significant amount of time gets spent creating the output feature layer table with the Create_Managment tool? At least last night it was taking 3 minutes on that step, and there was not anything special I could see that accounted for it. I don't recall that amount of lag the first times I used your tool. Have you ever experienced that issue, and do you have any ideas on what may cause it?

KimOllivier · ‎08-07-2010

A very interesting thread which is raising a number of topics.

1. Storing a shape object in a dictionary

I wondered how to do this too, to be able to compare shapes to create a 'delta' list. I eventually used a simple property such as centroid, length or area. Bruce Harold has gone the whole hog and converted the shape to GML, after rounding all vertices, then compressing it using binascii. See his just-posted ChangeDetector.
http://resources.arcgis.com/gallery/file/geoprocessing/details?entryID=351BEE10-1422-2418-8815-82074...
(By the way, I challenge anyone to find this script themselves using the search or by browsing the circular menus.)

2. Sorting is all very well but getting the sort field populated is the hard part.

I need to have a sort field for a polyline featureclass that is to be exported to KMZ. Otherwise the tree is randomly exported after edits.

So I need a route system first to be able to traverse, and even then I need to get access to the measures. This is very clumsy. My method is to build a new route (which goes into a NEW layer with no attributes -aagh) then create a new temp midpoints layer, then find the measures from the temporary route, then join that back to the original sections, so I finally have a sort field.

There may be a few shortcuts at 10 to use featuresets.

RichardFairhurst · ‎08-08-2010

Kim:

The script under your first item looks interesting. Still being new to Python, I will have to take some time to absorb it, but it gives me some additional insight into what can be done with a dictionary.

Up to now, for my comparison of lines I maintain a copy with a 70-character text field in the format "{from.X}{From.Y}{To.X}{To.Y}{Length}". Each value is rounded to 4 digits, and in state plane feet coordinates each bracket set is 14 characters long. I have a Model that renames the last download with stored FIDs, creates a new download with preserved FIDs, and recalculates the shape comparison field. A join on FID allows me to do the comparisons. That has worked well enough to do the change detection for my purposes (which is to make sure there are date stamps for all updates). But I am always looking for speed improvements, and this script looks promising as an alternative for my model.
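In Python, a comparison string along those lines might be built as below. This is a sketch under assumptions: shape_key is a hypothetical helper, each of the five values is padded to a fixed width of 14 characters with 4 decimal places, and no literal braces are written into the key (the post does not say whether they are).

```python
def shape_key(from_xy, to_xy, length, width=14, decimals=4):
    """Build a fixed-width comparison key from a line's end points and length."""
    values = [from_xy[0], from_xy[1], to_xy[0], to_xy[1], length]
    # "%*.*f" takes the width and the number of decimals from the arguments.
    return "".join("%*.*f" % (width, decimals, v) for v in values)


# Two versions of the same line that differ only below the rounding threshold.
old_key = shape_key((6234567.12341, 2234567.5), (6234600.0, 2234600.25), 46.8912)
new_key = shape_key((6234567.12339, 2234567.5), (6234600.0, 2234600.25), 46.8912)
moved_key = shape_key((6234567.12341, 2234567.5), (6234601.0, 2234600.25), 47.4)
unchanged = (old_key == new_key)
changed = (old_key != moved_key)
```

Keys stored in a dictionary under the FID would allow the comparison to be done in memory instead of through a join.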

For your second item, you're talking over my head and outside my experience. I am unfamiliar with KMZ format and not sure that I followed your description to understand what the actual sort field you are deriving actually represents. Interested to know a bit more though.

Rich

AlekseyNaumov · ‎06-21-2012

Chris and Richard, thank you for what looks like a useful tool, and for an interesting thread. Is there a link for the current version of the script (or toolbox)?

Many thanks,
Aleksey

RichardFairhurst · ‎06-21-2012

Attached is the last version of the tool I had revised for ArcGIS 9.3. I have ArcInfo at ArcGIS 10.0, so I can use the Sort tool ESRI provides. I just wish it would preserve the ObjectID values without having to first calculate a field to store them and including that field in the Sort output. I hope you find it useful.