Hi folks,
I know how to use the inbuilt python function to derive the maximum value across a number of input fields.
What I am now trying to do is to derive the second largest value across a number of input fields, and to retain the name of that field.
For example if I have a set of fields as below I'd like to return "Area 3" and 15 from this list...
Area 1 = 1
Area 2 = 5
Area 3 = 15
Area 4 = 16
Area 5 = 10
Any ideas? Is there some sort of combination of rank or sort I can use that will give me the answer?
Cheers,
Dave
If you know python and can adapt this, follow the example
>>> a = [3,5,2,4,1]  # take your list
>>> a.sort()         # sort it
>>> a                # have a look-see
[1, 2, 3, 4, 5]
>>> a[-1]            # max: get the max by indexing from the end
5
>>> a[-2]            # 2nd largest...same old idea
4
>>>
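Since the question also asks to keep the field name, here is a minimal sketch of the same sort-and-index idea applied to name/value pairs. The dictionary `areas` and the function name `second_largest_field` are made up for illustration; the values come from the example in the question.

```python
# Hypothetical field values, copied from the example in the question.
areas = {"Area 1": 1, "Area 2": 5, "Area 3": 15, "Area 4": 16, "Area 5": 10}


def second_largest_field(fields):
    """Return (name, value) for the field holding the second largest value."""
    if len(fields) < 2:
        raise ValueError("need at least two fields")
    # Sort the (name, value) pairs by value, ascending, then index from the end.
    ranked = sorted(fields.items(), key=lambda item: item[1])
    return ranked[-2]


name, value = second_largest_field(areas)
print(name, value)  # Area 3 15
```

Sorting the pairs rather than the bare values is what keeps the name attached to its value.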
Thanks Dan - will give this a whirl.
It looks straightforward enough!
Dave
Regarding Dan Patterson's suggestion, it is likely the most straightforward or simplest since it relies on using built-in list methods and slicing, but there are still some things you should think about.
First, can the second-largest item be the same as the maximum?
>>> a = [5, 3, 5, 4, 1, 2]
>>> a.sort()
>>> a[-1]
5
>>> a[-2]
5
If not, collapsing the list into a set is one way to address the issue.
>>> a = [5, 3, 5, 4, 1, 2]
>>> a = list(set(a))
>>> a.sort()
>>> a[-1]
5
>>> a[-2]
4
Second, sorting a list in Python is generally O(n log n), while getting the max or min is O(n) (TimeComplexity - Python Wiki). If you are working with large lists, especially extremely large ones, the overhead of sorting the list just to find the second-highest value won't be trivial. If performance matters, there is a good discussion thread over at Stack Overflow on finding the second largest value: Get the second largest number in a list in linear time.
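To illustrate the linear-time point, here is a rough single-pass sketch. It is not taken from the linked thread; the function name `second_largest` is made up, and it treats duplicates the same way the sort-and-index approach does (a repeated maximum counts as the second largest).

```python
import heapq


def second_largest(values):
    """Return the second largest item in one pass, counting duplicates."""
    largest = second = None
    for v in values:
        if largest is None or v > largest:
            # New maximum: the old maximum becomes the runner-up.
            largest, second = v, largest
        elif second is None or v > second:
            second = v
    if second is None:
        raise ValueError("need at least two values")
    return second


print(second_largest([3, 5, 2, 4, 1]))         # 4
print(second_largest([5, 3, 5, 4, 1, 2]))      # 5
# The standard library can do much the same without a full sort:
print(heapq.nlargest(2, [3, 5, 2, 4, 1])[-1])  # 4
```

If the second largest must differ from the maximum, changing the `elif` test to skip values equal to `largest` would be one way to adapt it.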
Joshua Bixby, it is the statistician in me...one never removes duplicates from a list since all observations are equal...I was simply accessing, via slicing, the second largest in a list without changing the length of the list...which, if you did, would mean that you would be working with a different list and not the one in question. And your question would have to be re-posed.
>>> a = [5,5,5,5,5,5]
>>> a = [5,5,5,5,5]
>>> a[-2]
5
>>> N = len(a)
>>> b = list(set(a))
>>> b[-2]
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
IndexError: list index out of range
>>> a[-2]
5
>>> b == a
False
>>>
I can see the need to deal with duplicate "second highest" values being found. If you need to maintain those (i.e. keep the duplicates), then the pandas DataFrame might be of use.
Just for an example I created a .csv from the OP's data source sample and added an extra row that is a duplicate of Area 3 (the 2nd highest value):
Area 1 = 1
Area 2 = 5
Area 3 = 15
Area 4 = 16
Area 5 = 10
Area 6 = 15
Edit: this is far simpler than what I had originally posted. I had overlooked some of the powerful capabilities of this library. Simply determine the second-highest value, then return the rows equal to that value:
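import pandas as pd

data = r'H:\RankData.csv'
df = pd.read_csv(data)
# Second-highest distinct value (don't assume it is exactly max - 1)
secondval = df['Values'].drop_duplicates().nlargest(2).iloc[-1]
# Keep every row whose value equals it, duplicates included
new_frame = df[df['Values'] == secondval]
print(new_frame)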
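For anyone without pandas to hand, a plain-Python sketch of the same idea might look like the following. The dictionary `areas` (the sample data plus the duplicate Area 6 row) and the function name `fields_with_second_highest` are made up for illustration.

```python
# Hypothetical data: the OP's sample plus the duplicate "Area 6" row.
areas = {
    "Area 1": 1, "Area 2": 5, "Area 3": 15,
    "Area 4": 16, "Area 5": 10, "Area 6": 15,
}


def fields_with_second_highest(fields):
    """Return (value, names) for the second-highest distinct value."""
    distinct = sorted(set(fields.values()))
    if len(distinct) < 2:
        # Every field holds the same value, so there is no second highest.
        return None, []
    second = distinct[-2]
    # Keep every field that ties on that value.
    names = sorted(name for name, value in fields.items() if value == second)
    return second, names


print(fields_with_second_highest(areas))  # (15, ['Area 3', 'Area 6'])
```

The set is used only to find the value; the original fields are left untouched, so tied fields are all reported.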