Get unique combinations of multiple field values

DanEvans · ‎03-02-2016

Hi,

I've written a geoprocessing script in Arcpy v10.2 that merges contiguous polylines into multipart polylines. At the moment it can take an SQL expression as an input that selects certain features of the input feature class to apply the algorithm to (e.g. I only want to merge intersecting pipes if they operate at the same pressure range, so I'd use the selection query to only select pipes of that pressure range).

I'm trying to modify this so instead of a selection query, you can select multiple fields from the input feature class, similar to the Dissolve tool. So for example, I could merge all intersecting pipes of the same material, diameter, and pressure range into separate, contiguous, multi-part polyline features.

I'm struggling to find a good way to identify all the unique combinations of values for the selected fields, and loop through each of these combinations, selecting them in turn and applying the algorithm to each of those selections.

The usual way to get unique values from one field is to add all the values to a list and then convert it into a set, but this gets tricky when you instead have multiple fields and you don't know how many fields the user will select! It doesn't seem possible to convert a list of lists to a set, as a set won't accept a list as an element as it's mutable. If I try converting the inner lists to tuples I get an error about tuples not being callable.

Generally the code I've written to try to achieve this is a hacked together mess! There has to be a better way of doing this!

Can anyone suggest how I can do this?

Finally, once I have each combination in a list (or maybe a dictionary, with the field name as the key?) is there a nice/efficient way to generate a SQL query to select the features with that combination of values?

Thanks

Dan

DanPatterson_Retired · ‎03-03-2016

yes... the quest... many great images there on that link

DanEvans · ‎03-04-2016

Thanks all,

Looks like I've solved it by using Summary Statistics to get the unique combinations, then looping over the unique combinations with a SearchCursor. Within this SearchCursor I have another SearchCursor looping over all my input features and an InsertCursor to write only the matches into a temporary feature class.

Thanks again for the help.

Dan

RichardFairhurst · ‎03-04-2016

Embedded cursors only works for small sets of data. The larger the data grows the inefficiency becomes exponentially bad. Loading a dictionary from the table with the data you want to read only with the unique combinations as the key and values of the fields to write in a list and then using an insert cursor based on the dictionary matches is incredibly fast. I discourage everyone from using embedded cursors. They are a poor solution.

DanEvans · ‎03-07-2016

I know it's inefficient (I got it working and it took 13 hours to run on my data!) but I spent several days last week tearing my hair out trying to get it to work the efficient way, without success!

DanPatterson_Retired · ‎03-07-2016

I didn't see a copy of sample data you are trying to use, is it possible to get one, so I can try the alternative approach

DanEvans · ‎03-08-2016

Hi Dan,

Sorry but I can't share any of the actual data. I can give you an idea of the schema though if that helps?

It's a pipe network (just a standard line feature class though, no geometric network stuff), each feature is a section of pipe (split at fairly arbitrary points). Each section has an 8-digit unique ID, and a number of attributes such as material (2-letter text code), diameter (floating point number), diameter units (either inches or millimetres), pressure tier (Low, Medium and High), etc.

What I want is to combine contiguous sections of pipes with the same attributes into multi-part features. I will then assign each one a unique 'subsystem' ID, and spatial join this back on to the original features.

The Dissolve tool isn't suitable even with 'create multipart features' or 'unsplit lines' checked, as this either combines ALL features with the selected attributes the same (not just those that are connected) in the former case, and will only combine lines that can be combined into single-part features in the latter case.

The main issue I'm having trying to do it with dicts etc. is building a SQL query to select out each unique combination of attributes. It gets very messy when I have to start dealing with different data types.

Don't know if that helps or makes sense. Let me know if you'd like any more info. Thanks for helping.

Dan

DanPatterson_Retired · ‎03-08-2016

Lets take it out of your field into concept stage... If you can extrapolate to this, then I think I have a solution at least for the reclassification to facilitate querying: Consider

Fields: prefix, state, pets,

you have 2 prefixes: (A, The)

4 states: (wet, dry, sleep, hungry)

3 pets: (cat, dog, fish)

you want all the combinations of the concatenation of the fields values. This will enable a reclassification of the data so that it makes it easier to query and do "stuff" with the results of the query. For "A hungry cat" or "A wet fish" may be class 1 and class 2, but you don't want to query the full slate and you may have numbers in one field and text in another. You just want the data reclassified so that you don't miss any possible combination.

RichardFairhurst · ‎03-08-2016

Dan:

Actually, the Dissolve tool will work to solve this problem, but not directly on the lines.

1. Buffer the lines a very small distance (0.01 meter/feet for instance).

2. Dissolve the buffers with the unique Case fields set for all fields that you want to group and uncheck the multi-part feature option.

3. Add a Long field called SUBSYSTEM to the dissolved buffers and calculate the ObjectID/FID into it.

4. Spatial Join the lines as the target and the dissolved buffer as the join features with the One to Many option.

5. Select all lines where the case fields of the original lines match the case fields of the dissolved buffers.

6. Dissolve the selected lines with the case fields including all of the original fields plus the SUBSYSTEM field and check the create multi-part feature option.

The lines will be combined into a single multi-part feature that all have the same case field values, but only as long as they touch within the small distance you specified. Separate features will be created for any set of case field values where the line groups do not touch each other within the small distance you specified. Each multi-part line feature will have a unique SUBSYSTEM field value you can use for your relate.

There is no need to loop through the separate case field value sets, since all sets are handled at once by the process above.