I am new to panda dataframes and would really appreciate some guidance on how to transpose the GrothYrAbr for each Unit (3 records everytime) into the first row of each unit based on the dataframe structure shown below. Thanks kindly.
Unit | GrowthYr | GrowYrAbr | AvgHt | MaxHt | WstHt | AvgHt21 | MaxHt21 | WstHt21 | AvgHt22 | MaxHt22 | WstHt22 | AvgHt23 | MaxHt23 | WstHt23 |
65110260 | 2021 | 21 | 0.59 | 0.94 | 9.58 | 0.59 | 0.94 | 9.58 | 0.85 | 1.30 | 10.05 | 1.12 | 1.68 | 10.52 |
65110260 | 2022 | 22 | 0.85 | 1.30 | 10.05 | |||||||||
65110260 | 2023 | 23 | 1.12 | 1.68 | 10.52 | |||||||||
65110290 | 2021 | 21 | 1.41 | 2.05 | 5.61 | 1.41 | 2.05 | 5.61 | 1.71 | 2.43 | 6.14 | 2.01 | 2.81 | 6.67 |
65110290 | 2022 | 22 | 1.71 | 2.43 | 6.14 | |||||||||
65110290 | 2023 | 23 | 2.01 | 2.81 | 6.67 |
Hi @ejuser ,
Pivot method in pandas might do what you are looking for. See below:
import pandas as pd
df = pd.read_csv(path to csv file)
df.pivot(index='Unit', columns='GrowYrAbr', values=['AvgHt','MaxHt','WstHt','AvgHt21','MaxHt21','WstHt21','AvgHt22','MaxHt22','WstHt22','AvgHt23','MaxHt23','WstHt23'])
you can change the index and columns if this is not the result you are after.
Hope that's helpful.
Hi,
Thanks for the suggestion. When I attempt to run it, I get the following:
ValueError: Index contains duplicate entries, cannot reshape
This is due to the Unit not being unique. Each unit is repeated three times in the dataframe.
Thanks for the suggestion. When I attempt to run it, I get the following:
ValueError: Index contains duplicate entries, cannot reshape
Try to remove duplicate using drop_duplicates(). Pandas drop_duplicates() returns only the dataframe's unique values.
drop_duplicates(subset=None, keep="first", inplace=False)
Not sure the efficaccy of the solution you are after but this is how you go about it.
import pandas as pd
#Pivot df to send GrowthYrs to columns and come up with new frame
df1 = pd.pivot_table(df, values=['GrowYrAbr', 'AvgHt', 'MaxHt'], index=['Unit'],columns=['GrowthYr'])
#Step above results in mutliindex, so collapse it by renaming column using a list comprehension with the help of fstrings
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
#merge new frame to old
df2=pd.merge(df,df1, how ='left', on='Unit')
print(df2)
Unit GrowthYr GrowYrAbr AvgHt MaxHt AvgHt_21 AvgHt_22 AvgHt_23 \
0 2021 21 0.59 0.94 9.58 1.495 NaN NaN
1 2022 22 0.85 1.30 10.05 NaN 1.865 NaN
2 2023 23 1.12 1.68 10.52 NaN NaN 2.245
3 2021 21 1.41 2.05 5.61 1.495 NaN NaN
4 2022 22 1.71 2.43 6.14 NaN 1.865 NaN
5 2023 23 2.01 2.81 6.67 NaN NaN 2.245
GrowYrAbr_21 GrowYrAbr_22 GrowYrAbr_23 MaxHt_21 MaxHt_22 MaxHt_23
0 1.0 NaN NaN 7.595 NaN NaN
1 NaN 1.28 NaN NaN 8.095 NaN
2 NaN NaN 1.565 NaN NaN 8.595
3 1.0 NaN NaN 7.595 NaN NaN
4 NaN 1.28 NaN NaN 8.095 NaN
5 NaN NaN 1.565 NaN NaN 8.595
Hi,
I appreciate your insight. My goal is to flatten the table such that each unit has growth model heights for year 2021, 2022, 2023 on one row so it can be joined with a feature class in a one-to-one relationship.
I wonder if I can store the attributes for the three years of growth for each unit and write it out in another file as one record as opposed to three? I will continue to explore this. Thanks for taking the time to respond.
Depending on what aggregation function you want the 3 years for one unit code in the output, try this out. I've used sum and you can change the aggfunc to mean, ....
df.pivot_table(values=['GrowYrAbr','AvgHt','MaxHt','WstHt','AvgHt21','MaxHt21','WstHt21','AvgHt22','MaxHt22','WstHt22','AvgHt23','MaxHt23','WstHt23'],
columns='Unit',
aggfunc='sum',
fill_value=0,
margins=False,
dropna=True,)
Hi MehdiPira1,
It is extremely helpful being made aware of the aggregation function. I feel as though the groupby function might be of value to me also. I will continue to explore these approaches and update the thread with my findings - thank you.