Updating records in a dataframe

ejuser · ‎11-18-2020

I am new to panda dataframes and would really appreciate some guidance on how to transpose the GrothYrAbr for each Unit (3 records everytime) into the first row of each unit based on the dataframe structure shown below. Thanks kindly.

Unit	GrowthYr	GrowYrAbr	AvgHt	MaxHt	WstHt	AvgHt21	MaxHt21	WstHt21	AvgHt22	MaxHt22	WstHt22	AvgHt23	MaxHt23	WstHt23
65110260	2021	21	0.59	0.94	9.58	0.59	0.94	9.58	0.85	1.30	10.05	1.12	1.68	10.52
65110260	2022	22	0.85	1.30	10.05
65110260	2023	23	1.12	1.68	10.52
65110290	2021	21	1.41	2.05	5.61	1.41	2.05	5.61	1.71	2.43	6.14	2.01	2.81	6.67
65110290	2022	22	1.71	2.43	6.14
65110290	2023	23	2.01	2.81	6.67

MehdiPira1 · ‎11-18-2020

Hi @ejuser ,

Pivot method in pandas might do what you are looking for. See below:

import pandas as pd

df = pd.read_csv(path to csv file)
df.pivot(index='Unit', columns='GrowYrAbr', values=['AvgHt','MaxHt','WstHt','AvgHt21','MaxHt21','WstHt21','AvgHt22','MaxHt22','WstHt22','AvgHt23','MaxHt23','WstHt23'])

you can change the index and columns if this is not the result you are after.

Hope that's helpful.

ejuser · ‎11-18-2020

Hi,

Thanks for the suggestion. When I attempt to run it, I get the following:

ValueError: Index contains duplicate entries, cannot reshape

This is due to the Unit not being unique. Each unit is repeated three times in the dataframe.

ricslator · ‎05-10-2021

Thanks for the suggestion. When I attempt to run it, I get the following:

ValueError: Index contains duplicate entries, cannot reshape

Try to remove duplicate using drop_duplicates(). Pandas drop_duplicates() returns only the dataframe's unique values.

drop_duplicates(subset=None, keep="first", inplace=False)

wwnde · ‎11-18-2020

Not sure the efficaccy of the solution you are after but this is how you go about it.

import pandas as pd
#Pivot df to send GrowthYrs to columns and come up with new frame
df1 = pd.pivot_table(df, values=['GrowYrAbr', 'AvgHt', 'MaxHt'], index=['Unit'],columns=['GrowthYr'])

#Step above results in mutliindex, so collapse it by renaming column using a list comprehension with the help of fstrings
df1.columns = [f'{a}_{b}' for a, b in df1.columns]

#merge new frame to old
df2=pd.merge(df,df1, how ='left', on='Unit')

print(df2)

 Unit  GrowthYr  GrowYrAbr  AvgHt  MaxHt  AvgHt_21  AvgHt_22  AvgHt_23  \
0  2021        21       0.59   0.94   9.58     1.495       NaN       NaN   
1  2022        22       0.85   1.30  10.05       NaN     1.865       NaN   
2  2023        23       1.12   1.68  10.52       NaN       NaN     2.245   
3  2021        21       1.41   2.05   5.61     1.495       NaN       NaN   
4  2022        22       1.71   2.43   6.14       NaN     1.865       NaN   
5  2023        23       2.01   2.81   6.67       NaN       NaN     2.245   

   GrowYrAbr_21  GrowYrAbr_22  GrowYrAbr_23  MaxHt_21  MaxHt_22  MaxHt_23  
0           1.0           NaN           NaN     7.595       NaN       NaN  
1           NaN          1.28           NaN       NaN     8.095       NaN  
2           NaN           NaN         1.565       NaN       NaN     8.595  
3           1.0           NaN           NaN     7.595       NaN       NaN  
4           NaN          1.28           NaN       NaN     8.095       NaN  
5           NaN           NaN         1.565       NaN       NaN     8.595

ejuser · ‎11-19-2020

Hi,

I appreciate your insight. My goal is to flatten the table such that each unit has growth model heights for year 2021, 2022, 2023 on one row so it can be joined with a feature class in a one-to-one relationship.

I wonder if I can store the attributes for the three years of growth for each unit and write it out in another file as one record as opposed to three? I will continue to explore this. Thanks for taking the time to respond.

MehdiPira1 · ‎11-19-2020

@ejuser

Depending on what aggregation function you want the 3 years for one unit code in the output, try this out. I've used sum and you can change the aggfunc to mean, ....

df.pivot_table(values=['GrowYrAbr','AvgHt','MaxHt','WstHt','AvgHt21','MaxHt21','WstHt21','AvgHt22','MaxHt22','WstHt22','AvgHt23','MaxHt23','WstHt23'],
    columns='Unit',
    aggfunc='sum',
    fill_value=0,
    margins=False,
    dropna=True,)

ejuser · ‎11-20-2020

Hi MehdiPira1,

It is extremely helpful being made aware of the aggregation function. I feel as though the groupby function might be of value to me also. I will continue to explore these approaches and update the thread with my findings - thank you.