Updating records in a dataframe

1590
7
11-18-2020 04:40 PM
ejuser
by
New Contributor III

I am new to panda dataframes and would really appreciate some guidance on how to transpose the GrothYrAbr for each Unit (3 records everytime) into the first row of each unit based on the dataframe structure shown below.  Thanks kindly.

UnitGrowthYrGrowYrAbrAvgHtMaxHtWstHtAvgHt21MaxHt21WstHt21AvgHt22MaxHt22WstHt22AvgHt23MaxHt23WstHt23
651102602021210.590.949.580.590.949.580.851.3010.051.121.6810.52
651102602022220.851.3010.05         
651102602023231.121.6810.52         
651102902021211.412.055.611.412.055.611.712.436.142.012.816.67
651102902022221.712.436.14         
651102902023232.012.816.67         
Tags (3)
0 Kudos
7 Replies
MehdiPira1
Esri Contributor

Hi @ejuser ,

Pivot method in pandas might do what you are looking for. See below:

import pandas as pd

df = pd.read_csv(path to csv file)
df.pivot(index='Unit', columns='GrowYrAbr', values=['AvgHt','MaxHt','WstHt','AvgHt21','MaxHt21','WstHt21','AvgHt22','MaxHt22','WstHt22','AvgHt23','MaxHt23','WstHt23'])

MehdiPira1_0-1605750163573.png

you can change the index and columns if this is not the result you are after.

Hope that's helpful.

 

ejuser
by
New Contributor III

Hi,

Thanks for the suggestion. When I attempt to run it, I get the following:

ValueError: Index contains duplicate entries, cannot reshape

 

This is due to the Unit not being unique. Each unit is repeated three times in the dataframe. 

0 Kudos
ricslator
New Contributor

Thanks for the suggestion. When I attempt to run it, I get the following:

ValueError: Index contains duplicate entries, cannot reshape

Try to remove duplicate using drop_duplicates(). Pandas drop_duplicates() returns only the dataframe's unique values.

drop_duplicates(subset=None, keep="first", inplace=False)

 

0 Kudos
wwnde
by
Occasional Contributor

Not sure the efficaccy of the solution you are after but this is how you go about it.

 

 

import pandas as pd
#Pivot df to send GrowthYrs to columns and come up with new frame
df1 = pd.pivot_table(df, values=['GrowYrAbr', 'AvgHt', 'MaxHt'], index=['Unit'],columns=['GrowthYr'])

#Step above results in mutliindex, so collapse it by renaming column using a list comprehension with the help of fstrings
df1.columns = [f'{a}_{b}' for a, b in df1.columns]

#merge new frame to old
df2=pd.merge(df,df1, how ='left', on='Unit')

print(df2)

 Unit  GrowthYr  GrowYrAbr  AvgHt  MaxHt  AvgHt_21  AvgHt_22  AvgHt_23  \
0  2021        21       0.59   0.94   9.58     1.495       NaN       NaN   
1  2022        22       0.85   1.30  10.05       NaN     1.865       NaN   
2  2023        23       1.12   1.68  10.52       NaN       NaN     2.245   
3  2021        21       1.41   2.05   5.61     1.495       NaN       NaN   
4  2022        22       1.71   2.43   6.14       NaN     1.865       NaN   
5  2023        23       2.01   2.81   6.67       NaN       NaN     2.245   

   GrowYrAbr_21  GrowYrAbr_22  GrowYrAbr_23  MaxHt_21  MaxHt_22  MaxHt_23  
0           1.0           NaN           NaN     7.595       NaN       NaN  
1           NaN          1.28           NaN       NaN     8.095       NaN  
2           NaN           NaN         1.565       NaN       NaN     8.595  
3           1.0           NaN           NaN     7.595       NaN       NaN  
4           NaN          1.28           NaN       NaN     8.095       NaN  
5           NaN           NaN         1.565       NaN       NaN     8.595 

 

ejuser
by
New Contributor III

Hi,

I appreciate your insight. My goal is to flatten the table such that each unit has growth model heights for year 2021, 2022, 2023 on one row so it can be joined with a feature class in a one-to-one relationship.

I wonder if I can store the attributes for the three years of growth for each unit and write it out in another file as one record as opposed to three? I will continue to explore this. Thanks for taking the time to respond.

 

 

0 Kudos
MehdiPira1
Esri Contributor

@ejuser 

Depending on what aggregation function you want the 3 years for one unit code in the output, try this out. I've used sum and you can change the aggfunc to mean,  .... 

 

df.pivot_table(values=['GrowYrAbr','AvgHt','MaxHt','WstHt','AvgHt21','MaxHt21','WstHt21','AvgHt22','MaxHt22','WstHt22','AvgHt23','MaxHt23','WstHt23'],
    columns='Unit',
    aggfunc='sum',
    fill_value=0,
    margins=False,
    dropna=True,)

 

MehdiPira1_0-1605855269765.png

 

ejuser
by
New Contributor III

Hi MehdiPira1,

It is extremely helpful being made aware of the aggregation function. I feel as though the groupby function might be of value to me also. I will continue to explore these approaches and update the thread with my findings - thank you.

0 Kudos