unique values returns only numbers not text

Nicole_Ueberschär · ‎03-07-2019

I have a python script that is supposed to return a unique value of a field.

def unique_values(table, field):
    with arcpy.da.SearchCursor(table, [field]) as cursor:
        return sorted({row[0] for row in cursor})‍‍‍

That works also very fine as long as the values are only numbers. Somehow when values contain also text then it only returns the values that are numbers not the text ones. I don't see why.

I'm using python 2.7 with ArcGIS 10.6.1.

DanPatterson_Retired · ‎03-07-2019

how can a field contain text and numbers? or are you talking about field being either numbers or text?

You can have a field with both or it would fail.

a = ['a', 'a', 'c', 'b', 1, 2]  # cursor equivalent

{i for i in a}  # a set comprehension
{1, 2, 'a', 'b', 'c'}

sorted({i for i in a})  # you can't sort a mixed field
Traceback (most recent call last):

  File "<ipython-input-9-78c584075683>", line 1, in <module>
    sorted({i for i in a})

TypeError: '<' not supported between instances of 'int' and 'str'‍‍‍‍‍‍‍‍‍‍‍‍

JoshuaBixby · ‎03-07-2019

I agree with Dan about having numbers and text in the same field. Assuming that was possible, sorted cannot work on a sequence containing strings and numbers because comparison operators are not defined between those data types.

>>> l = [1, 2, "hi"]
>>> sorted(l)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'str' and 'int'
>>>‍‍‍‍‍‍

DanPatterson_Retired · ‎03-07-2019

like in my lines 6 -12

JoshuaBixby · ‎03-07-2019

Ah, right. For some reason I missed what was in the code block even though it was right in front of me. I guess my speed reading lessons aren't working so well. : )

Nicole_Ueberschär · ‎03-07-2019

Thank you both for your replies.

I have a field of the type text that has profile numbers which can be 1, 2, 3 but also 2a, 2b, 2c, etc. that's why I store them as text. My script worked before for one file with 1, 2, 3b, 4, 5b, 6, 7, 8 so I guess it is working as long there is only once the number with a letter but not if the same number has many letters.

The script works when I just leave away the "sorted". Thanks!

Nicole_Ueberschär · ‎03-07-2019

In case someone else is tumbling over this problem: When I store now my unique values I get a set instead of a list which can't be indexed.

DanPatterson_Retired · ‎03-08-2019

So your numbers aren't numbers, but even if there was text, you can't have a mixed dtype, which a set accepts as does lists, set will handle the duplicates, but only if they are of the same dtype going in.

In short you want to remove duplicates and sort. The best way to sort text which contains numbers is to use a 'natural sort'

lst = {'1', 3, '3', '3', '7', '7', '8', '8a', 2, '3', '2a', '2b', '2c'}

import re
def natsort(lst):
    """natural sort"""
    lst = [str(i) for i in lst]
    import re
    convert = lambda text: int(text) if text.isdigit() else text
    a_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(lst, key=a_key)
    

natsort(lst)
Out[20]: ['1', '2', '2a', '2b', '2c', '3', '3', '7', '8', '8a']‍‍‍‍‍‍‍‍‍‍‍‍‍‍

notice that there are 2 '3's in the result. That is because the first 3 in the input is a number and the remaining are text representations. They would be retained since the input list was of a mixed dtype which can't occur in gdb tables, so that is a moot point.

But the sorting is done in a natural progression