I am working with COVID19 case data and created a dashboard. I have a Jupyter Notebook inside ArcGIS Pro to process the heath dept.'s CSV file everyday. That said, I am a complete novice with Python and fumbled my way though but got something working.
I now have a request to show the daily change in cases from the previous day. The source data table just lists the cumulative cases each day over time and lumps the dates together:
ID Date FIPS Cases
1 5/7/20 001 25
2 5/7/20 002 13
3 5/6/20 001 23
4 5/6/20 002 9
5 5/5/20 001 21
6 5/5/20 002 8
7 5/4/20 001 21
8 5/4/20 002 6
I would like to add a field where it contains the change in value from the previous day:
ID Date FIPS Cases Difference
1 5/7/20 001 25 2
2 5/6/20 001 23 2
3 5/5/20 001 21 0
4 5/4/20 001 21 0 (because this is the starting value)
5 5/7/20 002 13 4
6 5/6/20 002 9 1
7 5/5/20 002 8 2
8 5/4/20 002 6 0 (start)
The goal is a time series chart showing the sum of the daily changes for all FIPS by date (but might need to show them by FIPS as well). I know others are doing it but maybe their source data is supplied that way.
This seems like it should be fairly simple but I don't know where to start. Right now I am downloading the csv, truncating the table in my GDB, then appending the csv data to the table to refresh it every day. I think I need a bit of code to run daily to recalculate the difference after I grab the new day's values.
Appreciate any direction. Thanks!
Sara, if you have Data Interoperability extension then change detection between next/previous values in a series is available in the AttributeManager. I don't want to send you down this path if its all new to you though as you're working on response data. If you need to pursue this let me know.
Thanks for the quick reply, but I'm afraid I do not have that extension. Sounds ideal though.
You could do it with dictionaries. Use the FIPS as the key. Each value could be a list of tuples with 2 values (date, case_num) After you build the dictionaries sort each list by the date part of the tuple. Then take the top two list elements and subtract the case number. Output data done!
Maybe something like this... (not jupyter notebook)
code:
import csv with open('case.csv', 'r') as f: reader = csv.reader(f) data = list(reader) data.pop(0) data = [x[1:] for x in data] data_dict = {x[1]: [] for x in data} for row in data: data_list = data_dict[row[1]] data_list.append((row[0], row[2])) data_dict[row[1]] = data_list for key, value in data_dict.items(): value.sort(reverse=True) for i in range(0, len(value) - 1): diff = int(value[1]) - int(value[i + 1][1]) print '{},{},{},{}'.format(key, value[0], value[1], diff) print '{},{},{},{}'.format(key, value[-1][0], value[-1][1], 0)
csv:
ID,Date,FIPS,Cases 7,5/4/20,001,21 8,5/4/20,002,6 1,5/7/20,001,25 2,5/7/20,002,13 3,5/6/20,001,23 4,5/6/20,002,9 5,5/5/20,001,21 6,5/5/20,002,8
output:
002,5/7/20,13,4 002,5/6/20,9,1 002,5/5/20,8,2 002,5/4/20,6,0 001,5/7/20,25,2 001,5/6/20,23,2 001,5/5/20,21,0 001,5/4/20,21,0
Thanks for providing this sample - I'll see if I can make it work. I had some real hope today that Esri's new Coronavirus Recovery Dashboard was going to take care of this for me but alas, they require that the data already has the daily increase to calculate the trends.
you're welcome... I updated the code sample so the output will do the calc for all dates, not just the top two.