Python to compare values for ranking

05-29-2022 08:05 PM
CPoynter
Frequent Contributor

Hi All,

I have a dataset with four 'types' and their 'counts' for example:

A = 4, B = 2, C = 0, D = 0

In order of ranking, D (is the most important) > C > B > A (lowest ranking).

If I have an output with:

A = 0, B = 1, C = 0 , D = 1

How do I write a Python script that compares these values so that, when B and D have the same count, the variable I capture is D, based on the ranking?

Regards,

Craig

 

 

 


Accepted Solutions
JohannesLindner
MVP Frequent Contributor
  • what is the type of your output? string, list, dict, something else?
  • are the types actually just letters or are you simplifying things?

Assuming that you return a string and that the types are just letters (getting more important in ascending alphabetical order):

 

output = "A = 0, B = 1, C = 0 , D = 1"

# change to [ [type, count] ]
converted_output = output.replace(" ", "").split(",")
converted_output = [tc.split("=") for tc in converted_output]
print(f"converted output: {converted_output}")

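# sort by count and type (both descending), return first element
# note: counts are compared as strings here, which is fine for single digits; use int(r[1]) for larger counts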
sorted_output = sorted(converted_output, key=lambda r: (r[1], r[0]), reverse=True)
print(f"sorted output: {sorted_output}")
most_significant_output = sorted_output[0]
print(f"most significant output: {most_significant_output}")


#converted output: [['A', '0'], ['B', '1'], ['C', '0'], ['D', '1']]
#sorted output: [['D', '1'], ['B', '1'], ['C', '0'], ['A', '0']]
#most significant output: ['D', '1']

 

 

If your types are actually not letters but e.g. species, you have to define how to rank them and then use the list.index(element) method in the sort:

 

output = "Pig = 1, Lamb = 1, Chicken = 0, Duck = 1, Cow = 0"

# specify the ranking of the types, starting from lowest
ranked_types = ["Cow", "Pig", "Duck", "Horse", "Lamb", "Chicken"]

# change to [ [type, count] ]
converted_output = output.replace(" ", "").split(",")
converted_output = [tc.split("=") for tc in converted_output]
print(f"converted output: {converted_output}")

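# sort by count and rank of type (both descending), return first element
# note: counts are compared as strings here, which is fine for single digits; use int(r[1]) for larger counts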
sorted_output = sorted(converted_output, key=lambda r: (r[1], ranked_types.index(r[0])), reverse=True)
print(f"sorted output: {sorted_output}")
most_significant_output = sorted_output[0]
print(f"most significant output: {most_significant_output}")


#converted output: [['Pig', '1'], ['Lamb', '1'], ['Chicken', '0'], ['Duck', '1'], ['Cow', '0']]
#sorted output: [['Lamb', '1'], ['Duck', '1'], ['Pig', '1'], ['Chicken', '0'], ['Cow', '0']]
#most significant output: ['Lamb', '1']
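
If only the top entry is needed, a full sort is not required. The following is a minimal sketch, assuming the same output string format as above. The helper name pick_most_significant and the rank dictionary are illustrative and not part of the original answer. It converts the counts to integers before comparing, so multi-digit counts are compared numerically.

```python
def pick_most_significant(output, ranked_types):
    """Return (type, count) with the highest count; ties go to the higher-ranked type.

    ranked_types lists the types from lowest to highest rank.
    """
    # Map each type to its rank position once, instead of calling list.index repeatedly.
    rank = {name: position for position, name in enumerate(ranked_types)}

    # Parse "Pig = 1, Lamb = 1" into [("Pig", 1), ("Lamb", 1)].
    pairs = []
    for item in output.split(","):
        name, count = item.split("=")
        pairs.append((name.strip(), int(count)))

    # Compare by count first, then by rank.
    return max(pairs, key=lambda pair: (pair[1], rank[pair[0]]))


output = "Pig = 1, Lamb = 1, Chicken = 0, Duck = 1, Cow = 0"
ranked_types = ["Cow", "Pig", "Duck", "Horse", "Lamb", "Chicken"]
print(pick_most_significant(output, ranked_types))
```

A type that is missing from ranked_types raises a KeyError in this sketch, which may be preferable to silently assigning it a rank.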

 


Have a great day!
Johannes

5 Replies
CPoynter
Frequent Contributor

@Johannes,

Your script interprets exactly what I am after, but I have one final issue.

My list of [type, count] pairs, 'lst', works for the majority of returned values. Occasionally, however, the sorted and ranked result is incorrect, as in the example below:

lst = [['Rabbit', '7'], ['Dog', '3'], ['Bird', '17'], ['Cat', '0']]
rnk_lst = ['Cat', 'Bird', 'Dog', 'Rabbit']
sorted_output = sorted(lst, key=lambda r: (r[1], rnk_lst.index(r[0])), reverse=True)
print(f"sorted output: {sorted_output}")

most_significant_output = sorted_output[0]
print(most_significant_output[0])

sorted output: [['Rabbit', '7'], ['Dog', '3'], ['Bird', '17'], ['Cat', '0']]
Rabbit

The maximum value is ['Bird', '17'], which is the result I expected, yet the script returns Rabbit.

I'm not sure why it works for most of the analysis process but trips up on a few responses. Any ideas?

Craig

 

JohannesLindner
MVP Frequent Contributor

That's because Python sorts the counts as strings (which is what they are), not as numbers. Strings are compared character by character, so "7" is greater than "17" because "7" is greater than "1".

To solve that, convert the string to int in the key function passed to sorted().

 

 

lst = [['Rabbit', '7'], ['Dog', '3'], ['Bird', '17'], ['Cat', '0']]
rnk_lst = ['Cat', 'Bird', 'Dog', 'Rabbit']
sorted_output = sorted(lst, key=lambda r: (int(r[1]), rnk_lst.index(r[0])), reverse=True)
print(f"sorted output: {sorted_output}")

most_significant_output = sorted_output[0]
print(most_significant_output[0])

 

sorted output: [['Bird', '17'], ['Rabbit', '7'], ['Dog', '3'], ['Cat', '0']]
Bird
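
The difference between string and numeric comparison can also be seen in isolation. This is a small illustrative sketch using the counts from the example above; the variable name counts is an assumption made for the demonstration.

```python
counts = ["7", "3", "17", "0"]

# Strings compare character by character, so "7" > "17" because "7" > "1".
print("7" > "17")            # True
print(max(counts))           # 7

# Comparing as integers gives the numeric result.
print(7 > 17)                # False
print(max(counts, key=int))  # 17
```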

Have a great day!
Johannes
CPoynter
Frequent Contributor

That solves my issue perfectly. Thank you.

JoshuaBixby
MVP Esteemed Contributor

You are asking about the situation when two of the types have the same value, but it is hard for people to provide code samples when they don't know what is supposed to be returned in general. Are you trying to return the type with the highest count from the data? And when the highest count is shared between types, do you want only the most important type?
