Cumulative Sum by year?

4560
13
01-09-2013 06:27 AM
SpencerMeyer
New Contributor
Hello,

I have a table that is something like:

Object    Year    Acres
1            1980   5
2            1979   3
3            1978   2
4            1980   8
5            1979   2
6            1976   6

I would like to use python to create a cumulative sum so that I end up with a table that looks like:
1976       6
1978       2
1979       5
1980       13

I'm going to be graphing this using matplotlib, showing trends in acres over time.

I'm a beginner with python. So far I've tried to use FeatureClasstoNumPyArray and numpy's cumsum, but I can't get the array structured correctly to sum by year. I think part of my problem is that I don't really understand how an attribute table gets converted to an array.

If numpy is the wrong approach, pleases let me know. My requirements are that I can graph cumulative acres over time from a dataset that has polygons of various sizes occuring in specific years. If it matters for processing time, this dataset has about 140,000 records. Also, if you have any suggestions for python graphing modules or sample code that will do with kind of thing well, I'd like to know about them.

thanks!
Tags (2)
0 Kudos
13 Replies
T__WayneWhitley
Frequent Contributor
You're making this harder than it needs to be... use the Summary Statistics tool (or the Frequency tool if you prefer and licensed for it).

You can simply execute the tool from Toolbox or script it, up to you, but it's the only tool you will need to perform this function...could also use a cursor, but it likely isn't necessary...

EDIT-  ...so Year would be your 'case' and Acres would be your 'statistic' to sum for.
MinhazulIslam919
New Contributor II

Thanks  a lot for the tip. It worked for me. Easy and neat.

0 Kudos
SpencerMeyer
New Contributor
Hello,


I would like to use python to create a cumulative sum so that I end up with a table that looks like:
1976       6
1978       2
1979       5
1980       13



Actually, I meant THIS is what I want:
1976       6
1978       8
1979       13
1980       26

Summary stats gets me the first part, but then I need to do a cumulative sum by year so I can graph such that the x-axis is the year, and y-axis is total, cumulative acres.

I'm sorry I didn't explain it correctly the first time. Thanks for your quick response.
0 Kudos
T__WayneWhitley
Frequent Contributor
Oh I see, looks like you are sorting by year (which would make sense on an axis)....this is a bit of a workaround, but it will work if you take the table result of the Summary Statistics output and input that into a sorted update cursor (by year), then in a new field calculate the previous record acreage + the current record acreage... make sense?

In other words, use a variable to 'hold' the last record's acreage value + the cummulative acreage to this point in the processing --- so in your example initiating the update cursor to the 1st record, you can fetch the 1st acreage value:  6  (initiating the variable to 0, setvalue for the new 'cummulative value' field to 0)

Iterate over the records, the next record value is for 1978 is 2 (using getvalue) --- add that to your current var value of 6, that's 6 + 2 = 8.  setvalue, update the var to 8, go to the next row...

Next row is for 1979 is 5, so 8 + 5 = 13...

Get it?  The code is easy to write, see the webhelp examples for cursor processing...post back if you get stuck.

-Wayne
0 Kudos
T__WayneWhitley
Frequent Contributor
...in short, see the code below (also the webhelp link below that will help explain).  Note that comments to aid you in following the code are preceded by the '#' symbol.  Sorry, I forgot you said you were just beginning to use Python.
# import the module, arcpy...
import arcpy

# Set the current workspace environment variable.
arcpy.env.workspace = r'C:\Documents and Settings\whitley-wayne\My Documents\ArcGIS\Default.gdb'

# I set up a 'mock' table representing the summary output table,
# called 'SummaryStatsOutput'.  Set the table variable 'tbl' to that...
tbl = 'SummaryStatsOutput'

# Define another variable 'accumVal' to represent accumulated ACREAGE.
# Initiate it to zero (0).
accumVal = 0

# Establish the 'rows' cursor variable on tbl, which takes 5 parameters -
# The 3 middle ones are 'blank'...
# The last one is the sort parameter - sorting by YEAR...
# (since my mock table was not in order).
rows = arcpy.UpdateCursor(tbl, '', '', '', 'YEAR A')

# looping on 'row' objects contained in the 'rows' cursor object...
for row in rows:
     # Adding the current 'getValue' fetch from the ACREAGE field to accumVal.
     # (This is '0' for 1st record).
     accumVal = accumVal + row.getValue('ACREAGE')
     # Setting the ACREAGE field to the 'new' accumVal.
     # (a new field wasn't necessary)
     row.setValue('ACREAGE', accumVal)
     # Updating the row object within the cursor object
     # (committing the changes)
     rows.updateRow(row)

# Outside the loop, delete the objects to remove lock reference on table
del row, rows


Here is the 10 webhelp for the update cursor:
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//000v0000003m000000
0 Kudos
JamesCrandall
MVP Frequent Contributor
Check out the pandas library -- it is very slick for time-series work.  Plus, you can integrate with your matplotlib work too.

http://pandas.pydata.org/pandas-docs/dev/timeseries.html
0 Kudos
JamesCrandall
MVP Frequent Contributor
Tip: if you plan to use matplotlib to generate many figures from inside of a loop, make sure to close the figure each time you are finished saving it out! 

plt.figure()

#...plotting code here

plt.savefig(out)
plt.close()



If you don't you will get severe memory leak and the ArcGIS software you are running the python script from will bomb out.  In my case I am running this as an ArcToolbox script and ArcCatalog would shutdown after a few hundred figures were generated.
0 Kudos
ChrisSnyder
Regular Contributor III
In Python (and using the data access cursor model in ArcGIS v10.1) the code would look like this... Basically a re-write of the 'SUM' functionality of summary statistics tool using da cursors and a Python Dictionary. You would then use an insertCursor to write the values from 'summaryDict' back to a table on disk (not shown) or maybe you could hand the values in the dictionary straight to matplotlib?


myTable = r"C:\test.gdb\my_table"
summaryDict = {}
searchRows = arcpy.da.SearchCursor(myTable, ["YEAR","ACRES"])
for searchRow in searchRows:
    yearValue, acresValue = searchRow
    if yearValue in summaryDict:
        summaryDict[yearValue] = summaryDict[yearValue] + acresValue
    else:
        summaryDict[yearValue] = acresValue
yearKeys = summaryDict.keys()
yearKeys.sort() #sort the years in ascending order
#Print some output
for yearValue in yearKeys:
    print str(yearValue) + " = " + str(summaryDict[yearValue]) + " acres"
MatthewHowe
New Contributor III

Great stuff thanks! I'm hoping to use this to create creaming curves for oil and gas exploration. What I have noticed is that it creates totals of a value per yer but I think the main aim was to create cumulative totals. As an example I used this code on my data:

2011 = 35.0 mmboe

2012 = 616.666667 mmboe

2013 = 389.166667 mmboe

2014 = 348.333333 mmboe

2015 = 107.5 mmboe

2016 = 433.333333 mmboe

Of which is correct as it creates totals for each year. Is it possible to create a cumulative total of this so each previous year is added to the total of the following year? I then want to plot cumulative curves using matplotlib.

0 Kudos