Calculate daily change between unique values

786
5
05-07-2020 09:50 AM
SaraKidd
Occasional Contributor

I am working with COVID19 case data and created a dashboard. I have a Jupyter Notebook inside ArcGIS Pro to process the heath dept.'s CSV file everyday. That said, I am a complete novice with Python and fumbled my way though but got something working.

I now have a request to show the daily change in cases from the previous day. The source data table just lists the cumulative cases each day over time and lumps the dates together:

ID   Date   FIPS   Cases

1   5/7/20   001   25

2   5/7/20   002   13

3   5/6/20   001   23

4   5/6/20   002   9

5   5/5/20   001   21

6   5/5/20   002   8

7   5/4/20   001   21

8   5/4/20   002   6

I would like to add a field where it contains the change in value from the previous day:

ID   Date   FIPS   Cases   Difference

1   5/7/20   001   25            2

2   5/6/20   001   23            2

3   5/5/20   001   21            0

4   5/4/20   001   21            0 (because this is the starting value)

5   5/7/20   002   13            4

6   5/6/20   002   9              1

7   5/5/20   002   8              2

8   5/4/20   002   6              0 (start)

The goal is a time series chart showing the sum of the daily changes for all FIPS by date (but might need to show them by FIPS as well). I know others are doing it but maybe their source data is supplied that way. 

This seems like it should be fairly simple but I don't know where to start. Right now I am downloading the csv, truncating the table in my GDB, then appending the csv data to the table to refresh it every day. I think I need a bit of code to run daily to recalculate the difference after I grab the new day's values. 

Appreciate any direction. Thanks!

5 Replies
BruceHarold
Esri Regular Contributor

Sara, if you have Data Interoperability extension then change detection between next/previous values in a series is available in the AttributeManager.  I don't want to send you down this path if its all new to you though as you're working on response data.  If you need to pursue this let me know.

0 Kudos
SaraKidd
Occasional Contributor

Thanks for the quick reply, but I'm afraid I do not have that extension. Sounds ideal though.

0 Kudos
forestknutsen1
MVP Regular Contributor

You could do it with dictionaries. Use the FIPS as the key. Each value could be a list of tuples with 2 values (date, case_num)  After you build the dictionaries sort each list by the date part of the tuple. Then take the top two list elements and subtract the case number. Output data done!

Maybe something like this... (not jupyter notebook)

code:

import csv

with open('case.csv', 'r') as f:
    reader = csv.reader(f)
    data = list(reader)
    data.pop(0)
    data = [x[1:] for x in data]
    data_dict = {x[1]: [] for x in data}
    for row in data:
        data_list = data_dict[row[1]]
        data_list.append((row[0], row[2]))
        data_dict[row[1]] = data_list
    for key, value in data_dict.items():
        value.sort(reverse=True)
        for i in range(0, len(value) - 1):
            diff = int(value[1]) - int(value[i + 1][1])
            print '{},{},{},{}'.format(key, value[0], value[1], diff)
        print '{},{},{},{}'.format(key, value[-1][0], value[-1][1], 0)

csv:

ID,Date,FIPS,Cases
7,5/4/20,001,21
8,5/4/20,002,6
1,5/7/20,001,25
2,5/7/20,002,13
3,5/6/20,001,23
4,5/6/20,002,9
5,5/5/20,001,21
6,5/5/20,002,8

output:

002,5/7/20,13,4
002,5/6/20,9,1
002,5/5/20,8,2
002,5/4/20,6,0
001,5/7/20,25,2
001,5/6/20,23,2
001,5/5/20,21,0
001,5/4/20,21,0
SaraKidd
Occasional Contributor

Thanks for providing this sample - I'll see if I can make it work. I had some real hope today that Esri's new Coronavirus Recovery Dashboard was going to take care of this for me but alas, they require that the data already has the daily increase to calculate the trends.

0 Kudos
forestknutsen1
MVP Regular Contributor

you're welcome... I updated the code sample so the output will do the calc for all dates, not just the top two.

0 Kudos