Get unique combinations of multiple field values

9721
17
Jump to solution
03-02-2016 02:00 AM
DanEvans
Occasional Contributor II

Hi,

I've written a geoprocessing script in Arcpy v10.2 that merges contiguous polylines into multipart polylines. At the moment it can take an SQL expression as an input that selects certain features of the input feature class to apply the algorithm to (e.g. I only want to merge intersecting pipes if they operate at the same pressure range, so I'd use the selection query to only select pipes of that pressure range).

I'm trying to modify this so instead of a selection query, you can select multiple fields from the input feature class, similar to the Dissolve tool. So for example, I could merge all intersecting pipes of the same material, diameter, and pressure range into separate, contiguous, multi-part polyline features.

I'm struggling to find a good way to identify all the unique combinations of values for the selected fields, and loop through each of these combinations, selecting them in turn and applying the algorithm to each of those selections.

The usual way to get unique values from one field is to add all the values to a list and then convert it into a set, but this gets tricky when you instead have multiple fields and you don't know how many fields the user will select! It doesn't seem possible to convert a list of lists to a set, as a set won't accept a list as an element as it's mutable. If I try converting the inner lists to tuples I get an error about tuples not being callable.

Generally the code I've written to try to achieve this is a hacked together mess! There has to be a better way of doing this!

Can anyone suggest how I can do this?

Finally, once I have each combination in a list (or maybe a dictionary, with the field name as the key?) is there a nice/efficient way to generate a SQL query to select the features with that combination of values?

Thanks

Dan

0 Kudos
1 Solution

Accepted Solutions
DanEvans
Occasional Contributor II

Thanks all,

Looks like I've solved it by using Summary Statistics to get the unique combinations, then looping over the unique combinations with a SearchCursor. Within this SearchCursor I have another SearchCursor looping over all my input features and an InsertCursor to write only the matches into a temporary feature class.

Thanks again for the help.

Dan

View solution in original post

0 Kudos
17 Replies
Luke_Pinner
MVP Regular Contributor

You can use the Frequency tool (if you have an Advanced/ArcInfo license) otherwise use the Summary Statistics tool with the fields you want to get unique combinations for as the "Case Fields".

You can then access the output with a search cursor.

DanEvans
Occasional Contributor II

Thanks, that'll work for getting the unique combinations, but I'm having some difficulty using those to select out one combination at a time to apply my process to.

At the moment I'm trying to use the SearchCursor to loop through each combination and build a SQL selection query for that combination by concatenating strings together. This isn't ideal though as I have to have lots of if statements to handle different data types and when it runs into NULL values it can't cope either - I could write more string concatenations to deal with NULLS but surely there must be a better way?

Here's what I have so far (is there a 'code' tag in this new forum??):

# Use a SearchCursor to loop through each combination, build a selection query, and apply the algorithm to that selection
with arcpy.da.SearchCursor(inputLines, dissolveFields) as cursor:

  selectionQuery = ''

  for row in cursor:


    for field in row:
     
      arcpy.AddMessage(field)
      count = 0
     
      if count < len(row):
        if type(field) == 'str':
          selectionQuery += arcpy.AddFieldDelimiters(arcpy.env.scratchGDB, dissolveFields[count]) + " = '" + field + "' AND "
        elif type(field) == 'int':
          selectionQuery += arcpy.AddFieldDelimiters(arcpy.env.scratchGDB, dissolveFields[count]) + " = " + int(field) + " AND "
        elif type(field) == 'float':
          selectionQuery += arcpy.AddFieldDelimiters(arcpy.env.scratchGDB, dissolveFields[count]) + " = " + float(field) + " AND "
      else:
        if type(field) == 'str':
          selectionQuery += arcpy.AddFieldDelimiters(arcpy.env.scratchGDB, dissolveFields[count]) + " = '" + field + "'"
        elif type(field) == 'int':
          selectionQuery += arcpy.AddFieldDelimiters(arcpy.env.scratchGDB, dissolveFields[count]) + " = " + int(field)
        elif type(field) == 'float':
          selectionQuery += arcpy.AddFieldDelimiters(arcpy.env.scratchGDB, dissolveFields[count]) + " = " + float(field)

      count += 1


  arcpy.AddMessage(selectionQuery)

  # Apply selection query and create working copy
  arcpy.FeatureClassToFeatureClass_conversion(inputLines, arcpy.env.scratchGDB, 'SuperStrings_temp_in', selectionQuery)
0 Kudos
Luke_Pinner
MVP Regular Contributor

Dan Evans​ see Code Formatting... the basics++

I edited your post to add the formatting.

RichardFairhurst
MVP Honored Contributor

I wrote the tool in this Blog specifically for use with the output of the Frequency Tool or Summary Statistics tool when multiple fields are used for the unique case values.  It creates a new Long field in both the original source and the frequency output to generate a single unique Long number to represent each multi-field unique combination.  It should handle all normal data field types and Null values.  The sort order of the unique field values should be correct for each field type (i.e., strings sort as strings, numbers sort as numbers, dates sort as dates, etc.) and the fields can be prioritized in any order.  The 10.3 version is better than the 10.2 version, but both work.

DarrenWiens2
MVP Honored Contributor
The usual way to get unique values from one field is to add all the values to a list and then convert it into a set, but this gets tricky when you instead have multiple fields and you don't know how many fields the user will select!

Can't you use the above method to build and filter a list of unique concatenated values, like [field1_field2_field3, field1_field2_field3, etc.]?

DanPatterson_Retired
MVP Emeritus

I do not understand why esri continues to make 'basic' tools only available at the Advanced level, yet provide the opportunity to do so via other means...

Free Frequency ... join the cause

code sample included, using a different approach than those already given

Luke_Pinner
MVP Regular Contributor

It's not unheard of that ESRI would change a tools license level Dan Patterson​, back at ArcGIS 9.0, the 'Add Join' tool was Advanced/ArcInfo only... it's now available with all licenses.

But that was a slightly different case where the same functionality was available in the menus/toolbars at any license level, but not as a GP tool and ESRI was sorting that out. Whereas with 'Frequency', I assume the license level is an artifact from its origin as the Arc/INFO (workstation) frequency command.

That said, it's still what I would consider a 'basic' tool, with the same functionality available via other methods even without resorting* to code...

* I say resorting, but code is often my first choice. That count_combos_demo.py is a neat script btw

DanPatterson_Retired
MVP Emeritus

the quest to change/uncover artifacts continues .... split layer by attributes was my first foray a decade or so ago... numpy now makes it so much easier.   Alternatives for 'basic' tools, have also been revealed... Voronoi/Delauney, Near, Statistics, Kappa coeff ... to name a  few, have been given.

0 Kudos
Luke_Pinner
MVP Regular Contributor

HolyGrail051.jpg