Data Management Blog


Latest Activity

(37 Posts)
MVP Esteemed Contributor

I've been wrestling with the Address Data Management Solution for some time, and today, I finally got the toughest rule to work with my attributes: Split Intersecting Roads.  This rule is a really powerful editing tool as it will split an existing centerline into two segments when you add a new street that intersects it.  In essence it:

  1. Copies attributes from the original existing street to the 'new segment'
  2. Does a proportional division of the address ranges so you don't have to
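The proportional division in step 2 works roughly like this. Below is a plain-Python sketch of the idea, not the actual Arcade logic; the function name, rounding, and parity handling are my own assumptions:

```python
def split_address_range(from_addr, to_addr, fraction):
    """Proportionally divide an address range at a split point.

    fraction is how far along the original segment the new
    intersection falls (0.0-1.0).  Returns the range for the
    first segment and the range for the second.  The split
    address is nudged to keep the even/odd parity of the
    original range endpoints.
    """
    total = to_addr - from_addr
    split = from_addr + round(total * fraction)
    # Keep the split address on the same parity as from_addr
    if (split - from_addr) % 2 != 0:
        split += 1
    return (from_addr, split), (split + 2, to_addr)

# A 100-198 range split 40% of the way along:
first, second = split_address_range(100, 198, 0.4)
print(first, second)  # (100, 140) (142, 198)
```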

To begin with, you need to create a list of the attributes you want to copy during the split.  This is done near the top of the script.

The original looks like this:

var centerline_field_names = ["rclnguid", "discrpagid", "rangeprefixleft", "fromleft",
"toleft", "parityleft", "rangeprefixright", "fromright", "toright", "parityright",
"fullname","fedroute", "fedrtetype", "afedrte", "afedrtetype", "stroute", "strtetype",
"astrte", "astrtetype", "ctyroute", "onewaydir", "roadlevel", "inwater", "roadclass",
"countryleft", "countryright", "stateleft", "stateright", "countyleft",
"countyright","munileft", "muniright", "zipleft", "zipright", "msagleft", "msagright",
"esnleft", "esnright"]

Mine looks like this:

var centerline_field_names = ["FROMADDR_L","TOADDR_L","FROMADDR_R","TOADDR_R","PREDIR",

I used the arcpy.ListFields() function and a little text formatting in Python to generate the list.  Note that the attribute names provided in the solution are all lower case, while mine are a mix of upper case and proper case. More on this further down.
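The list generation can be sketched like this. It is a hypothetical example: the field list below is hard-coded in place of the actual arcpy.ListFields() call so it runs anywhere, and the excluded system fields are my own guesses:

```python
# In ArcGIS this list would come from:
#   field_names = [f.name for f in arcpy.ListFields(feature_class)]
# Hard-coded here so the sketch runs without arcpy.
field_names = ["OBJECTID", "SHAPE", "FROMADDR_L", "TOADDR_L",
               "FROMADDR_R", "TOADDR_R", "PREDIR"]

# Skip system-maintained fields that shouldn't be copied on a split
skip = {"OBJECTID", "SHAPE", "SHAPE_Length", "SHAPE_Area"}
quoted = ['"{}"'.format(name) for name in field_names if name not in skip]
print("var centerline_field_names = [" + ",".join(quoted) + "]")
```

This prints the array literal in the same form as the snippet above, ready to paste into the rule.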

Any reference to the feature class name also needs to be adjusted.  There are two references. The first is near the top of the script:

var intersectingRoads = Intersects(FeatureSetByName($datastore, "RoadCenterline"), geom);

and down at the bottom of the script:

'edit': [{'className': 'RoadCenterline', 'adds': adds, 'updates': updates}]

Mine looks like this:

var intersectingRoads = Intersects(FeatureSetByName($datastore, "MSD.SLCOMSD.CenterlinesMSD"), geom);


'edit': [{'className': 'MSD.SLCOMSD.CenterlinesMSD', 'adds': adds, 'updates': updates}]

Both of my versions show the full feature class name with Database.Owner as part of the name, since I'm doing the edits in an enterprise geodatabase (aka SDE); if you are using a file geodatabase, you'll only need the feature class name.

About midway down in the script, a series of variables are set to the field names where your address range info is stored:

The original references these attributes:

// Get the address range of the intersecting road
var fromRight = road.fromright;
var toRight = road.toright;
var fromLeft = road.fromleft;
var toLeft = road.toleft;

While I reference:

var fromRight = road.FROMADDR_R;
var toRight = road.TOADDR_R;
var fromLeft = road.FROMADDR_L;
var toLeft = road.TOADDR_L;

A little further down in the script, there are references again to the address range variables that get written to a dictionary:

The original:

var attributes = {}
if (newToFromRight[0] != null) attributes['toright'] = newToFromRight[0];
if (newToFromLeft[0] != null) attributes['toleft'] = newToFromLeft[0];

And mine:

var attributes = {}
if (newToFromRight[0] != null) attributes['TOADDR_R'] = newToFromRight[0];
if (newToFromLeft[0] != null) attributes['TOADDR_L'] = newToFromLeft[0];

As mentioned earlier, the case of your attribute names (upper, lower, or proper) is something you need to deal with.  If all your field names are lower case, this isn't an issue for you.  But it is for me:

The original script uses a Lower() function three times:

for (var k in featureAttributes) {
if (Lower(k) == "fromright" && newToFromRight[1] != null) {
newAttributes['fromright'] = newToFromRight[1];
} else if (Lower(k) == "fromleft" && newToFromLeft[1] != null) {
newAttributes['fromleft'] = newToFromLeft[1];
} else if (IndexOf(centerline_field_names, Lower(k)) > -1 && featureAttributes != null) {
newAttributes = featureAttributes;
} else {

But the Lower() function trips up my schema, so I just remove those calls while again referencing my address range fields:

var newAttributes = {};
for (var k in featureAttributes) {
if (k == "FROMADDR_R" && newToFromRight[1] != null) {
newAttributes['FROMADDR_R'] = newToFromRight[1];
} else if (k == "FROMADDR_L" && newToFromLeft[1] != null) {
newAttributes['FROMADDR_L'] = newToFromLeft[1];
} else if (IndexOf(centerline_field_names, k) > -1 && featureAttributes != null) {
newAttributes = featureAttributes;
} else {

Finally, I don't have a field called centerlineid, but I do have a field used in a similar fashion called UNIQUE_ID.  In the solution, the centerlines feature class is related to the Alias Road Name table through the centerlineid field, and that centerlineid field is updated with a Database Sequence.  If you plan to use centerlineid in your application, you are just fine.  However, if you are using another field name, you'll need to perform a search and replace on centerlineid in the script and replace it with your attribute name.  You'll also need to add that attribute name to the Alias Road Name table so the relationship class works.

None of this would have been possible without a boatload of help and even more patience from Chris Fox.  Thanks Chris!

MVP Honored Contributor

The process described in this Blog requires an Advanced license.

Some polygon feature classes contain overlapping features that accurately represent the boundaries of the attribute(s) each polygon contains.  For example, each polygon boundary could represent the area covered by a different policy; if more than one policy applies to a property, then multiple policy area boundaries overlap. Typically these features can't be symbolized with solid fills and have to use hatch symbols, and any classification of the features as overlapped cannot identify which portion is overlapped.


This configuration is not useful if you want to select just the overlapping portions of the policy areas or you need to intersect this feature class with another feature class without duplicating portions of the other feature class where these features overlap.

This Blog provides a method for transforming overlapping polygons like this into a new polygon feature class that contains separate contiguous polygons with no overlaps and combines any attribute values that originally overlapped into a list that defines each new feature.  The steps of the process are:


  1. Run the Union tool in the Analysis toolbox under the Overlay toolset on just the feature class that contains the overlapping polygons
  2. Run the Multipart to Singlepart tool in the Data Management toolbox under the Features toolset on the Union output to separate all multipart polygons into individual polygons for each part.
  3. Run the Feature to Point tool in the Data Management toolbox under the Features toolset on the Multipart to Singlepart output with the Inside option checked to extract the attributes to a set of points that fall inside the polygons they came from.  NOTE: If slivers are extremely small or narrow, the tool can fail to run.  You may want to select only features that have large areas, or small features with a larger thinness ratio.  If your length field is in feet and your area field is in square feet, you could use 1000 square feet as the cutoff for larger features and do the selection with this SQL:  SHAPE_Area >= 1000 OR (4 * 3.14 * Shape_Area / (Shape_Length * Shape_Length) > 0.3 AND SHAPE_Area < 1000)
  4. Run the Feature to Line tool in the Data Management toolbox under the Features toolset on the Multipart to Singlepart output with the no relationship option and do not preserve attributes.
  5. Run the Feature to Polygon tool in the Data Management toolbox under the Features toolset on the Feature to Line output to create unique single-part, contiguous polygons for all the areas enclosed by the lines.
  6. Run the Spatial Join tool in the Analysis toolbox under the Overlay toolset with the following settings: 
    1. Make the Feature to Polygon output the Target Features
    2. Make the Feature to Point output the Join Features. 
    3. Use the JOIN_ONE_TO_ONE option
    4. Use the Keep all target features option.
    5. For the attributes you want to transfer to the Polygons from the Point, in the field map right click each field and access its properties.  Change the following in the Output Field Properties:
      1. Make sure the field Type is set to Text if it is not Text already
      2. Change the field Length to 255 characters (possibly more if the tool fails because the output field is too small and you are outputting to a geodatabase)
      3. Change the field Merge Rule to Join
      4. Set the Delimiter to a character or set of characters not contained in the original attributes, like a semicolon (";").
  7. The final output features will have the following characteristics:
    1. Features with attributes that do not contain the delimiter character(s) were not overlapping, and the attribute is identical to the original feature.  These features can be selected with an SQL expression similar to:
      POLICY_NAME NOT LIKE '%;%'
    2. Features with attributes that contain the delimiter character(s) were originally overlapping, but now are a single feature that has a list of all the attributes of the original overlapping features separated by the delimiter.  These features can be selected with an SQL expression similar to:
      POLICY_NAME LIKE '%;%'
    3. Features with Null attributes were not originally features; they were empty spaces fully enclosed by the original features, and can be eliminated or merged into surrounding features if they are slivers.  These features can be selected with an SQL expression similar to:
      POLICY_NAME IS NULL
  8. The features in the picture below with solid fills are just the portions of the original polygons that overlapped.  The features are contiguous without any overlaps and the attributes associated with each portion list all of the overlapping values in a single feature.  They can be symbolized or selected separately and won't cause feature duplication due to overlaps when they are intersected with other feature classes. 
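The thinness-ratio cutoff from step 3 is easy to sanity-check outside of SQL. Here is a small Python sketch of the same test; the function name is mine, and math.pi stands in for the 3.14 in the SQL:

```python
import math

def keep_feature(area, length, min_area=1000.0, min_thinness=0.3):
    """Mirror of the selection SQL: keep large polygons outright,
    and keep small ones only if they aren't thin slivers.
    Thinness ratio = 4*pi*area/perimeter^2 (1.0 for a circle,
    approaching 0 for long narrow slivers)."""
    thinness = 4 * math.pi * area / (length * length)
    return area >= min_area or thinness > min_thinness

# A 100 sq ft square (10 ft sides, 40 ft perimeter) is compact enough:
print(keep_feature(100, 40))   # True  (thinness ~0.785)
# A 100 sq ft sliver ~100 ft long (perimeter ~202 ft) is not:
print(keep_feature(100, 202))  # False (thinness ~0.031)
```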


I hope you find this useful.

Esri Contributor



After upgrading a SQL Server database should you leave the compatibility level at the lowest level?




No, after upgrading a SQL Server database you should upgrade the database compatibility level.



"A Note About Compatibility Level"


For example, if upgrading the database to SQL Server 2014, change the following setting on each database:


SQL Server Management Studio > Databases > Right-click your database > Properties > Options > Compatibility Level > SQL Server 2014 (120)




Esri Esteemed Contributor

This blog rolls out version 2.0 of my utility.

The folks at Johns Hopkins University have done an awesome job of maintaining their Ops Dashboard site.  They even share a folder of data snapshots as CSV files captured at a minute before midnight, Greenwich time (2359 UTC).  However, the data itself is anything but static, and anyone who needs the CSV files at a more frequent cadence, if not real-time, can be frustrated by the snapshot frequency.

Fortunately, the data is all available with real-time updates -- it's in the feature service layers hosted on ArcGIS Online (ncov_cases and ncov_cases_US). But there's one complication: the file format changed with the 2020-03-23.csv file, so anyone who had tools to read the old format was left in the lurch.

I've written a Python utility which can:

  • Export feature service data in either the old (Province/State,...) or new (FIPS,...) CSV format
  • Export near real-time updates every 2 minutes, or as slowly as once every 12 hours
  • Choose to skip export if the numbers of confirmed cases, deaths, or recoveries don't change in any particular time slice (sometimes the Last_Update value changes, but the values reported don't)
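The change-detection idea in that last bullet can be sketched as a simple dictionary comparison. This is my own reconstruction of the concept, not the utility's actual code; the field names follow the JHU schema:

```python
def significant_change(prev_rows, new_rows):
    """Return True if any case-count field changed between two
    snapshots, keyed by location.  Last_Update alone is ignored."""
    def counts(rows):
        return {r["Combined_Key"]: (r["Confirmed"], r["Deaths"], r["Recovered"])
                for r in rows}
    return counts(prev_rows) != counts(new_rows)

prev = [{"Combined_Key": "New York, US", "Confirmed": 75795,
         "Deaths": 1550, "Recovered": 0, "Last_Update": "18:31:52"}]
# Only the timestamp changed -- not worth writing a new file:
same = [dict(prev[0], Last_Update="19:31:52")]
print(significant_change(prev, same))    # False
# A count changed -- write it:
bumped = [dict(prev[0], Confirmed=75798)]
print(significant_change(prev, bumped))  # True
```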

Attached to this blog post is a zipfile of the utility.  The usage looks like this:

D:\covid-19>python -h
usage: [-h] [--adminLevel {0,1,2}] [--verbose VERBOSE]
                      [--interval INTERVAL] [--folder FOLDER]
                      [--csvFormat CSVFORMAT] [--stopPath STOPPATH]
                      [--skipTrivial SKIPTRIVIAL] [--usOnly USONLY]
                      [--confirmedOnly CONFIRMEDONLY]
                      [--topStates {5,10,15,20,25,30,35,40,45,all}]

Emulate JHU COVID-19 data file (v2.0)

positional arguments:
                        Execution mode

optional arguments:
  -h, --help            show this help message and exit
  --adminLevel {0,1,2}  (default = 2)
  --verbose VERBOSE     Verbose reporting flag (default = False)
  --interval INTERVAL   Data retrieval interval (default = 60m)
  --folder FOLDER       Folder path for data files (default = 'data')
  --csvFormat CSVFORMAT
                        strftime format for data files
  --stopPath STOPPATH   File that indicates loop execution (default =
  --skipTrivial SKIPTRIVIAL
                        Defer writing insignificant changes flag (default =
  --usOnly USONLY       Only export US data (default = False)
  --confirmedOnly CONFIRMEDONLY
                        Only export rows with confirmed cases (default =
  --topStates {5,10,15,20,25,30,35,40,45,all}
                        Display sorted Confirmed/Deaths by US state (default =

Can generate both CSV formats (before/after) 23-Mar-2020

The simplest use is a one-time execution (ONCE) mode:

D:\covid-19>python ONCE
  1910:   2369 rows written (  838061 /  41261 /  175737 )

D:\covid-19\demo>head data\2020-03-31_1910Z.csv
45001,Abbeville,South Carolina,US,2020-03-31 18:31:52,34.223334,-82.461707,3,0,0,0,"Abbeville, South Carolina, US"
22001,Acadia,Louisiana,US,2020-03-31 18:31:52,30.295065,-92.414197,11,1,0,0,"Acadia, Louisiana, US"
51001,Accomack,Virginia,US,2020-03-31 18:31:52,37.767072,-75.632346,7,0,0,0,"Accomack, Virginia, US"
16001,Ada,Idaho,US,2020-03-31 18:31:52,43.452658,-116.241552,163,3,0,0,"Ada, Idaho, US"
19001,Adair,Iowa,US,2020-03-31 18:31:52,41.330756,-94.471059,1,0,0,0,"Adair, Iowa, US"
29001,Adair,Missouri,US,2020-03-31 18:31:52,40.190586,-92.600782,1,0,0,0,"Adair, Missouri, US"
40001,Adair,Oklahoma,US,2020-03-31 18:31:52,35.884942,-94.658593,4,0,0,0,"Adair, Oklahoma, US"
08001,Adams,Colorado,US,2020-03-31 18:31:52,39.874321,-104.336258,152,0,0,0,"Adams, Colorado, US"
17001,Adams,Illinois,US,2020-03-31 18:31:52,39.988156,-91.187868,2,0,0,0,"Adams, Illinois, US"

The default is --adminLevel=2 (new-style), but levels 1 and 0 are also supported (zero is the same format as one, but without any state/province data for the US, Canada, China, or Australia):

D:\covid-19>python ONCE --adminLevel=1
  1911:    315 rows written (  838061 /  41261 /  175737 )

D:\covid-19>head data\2020-03-31_1910Z.csv
Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Latitude,Longitude
New York,US,2020-03-31T18:31:52,75795,1550,0,42.165726,-74.948051
,United Kingdom,2020-03-31T18:31:40,25150,1789,135,55.378100,-3.436000
New Jersey,US,2020-03-31T18:31:52,17126,198,0,40.298904,-74.521011

D:\covid-19>python ONCE --adminLevel=0
  1911:    180 rows written (  838061 /  41261 /  175737 )

Note that successive executions in the same time window will overwrite the output file!

The IMMEDIATE and DELAY modes are nearly the same, except that IMMEDIATE makes a new snapshot without delay, while DELAY only operates at regular intervals (both with a 1-20 second random delay, to prevent slamming the service with synchronized queries).  The next two quote blocks were collected from two sessions running at the same time (in different directories):


D:\Projects\covid-19>python IMMEDIATE --interval=15m --adminLevel=1
  1917:    315 rows written (  838061 /  41261 /  175737 )
  1930:    315 rows written (  838061 /  41261 /  175737 )
  1945:    315 rows written (  838061 /  41261 /  175737 )
  2000:    315 rows written (  846156 /  41494 /  176171 )
  2015:    315 rows written (  846156 /  41494 /  176171 )
  2030:    315 rows written (  846156 /  41494 /  176171 )
  2045:    315 rows written (  846156 /  41494 /  176171 )
  2100:    315 rows written (  846156 /  41494 /  176171 )
  2115:    315 rows written (  846156 /  41494 /  176171 )
  2130:    315 rows written (  850583 /  41654 /  176714 )
  2145:    315 rows written (  850583 /  41654 /  176714 )
  2200:    315 rows written (  850583 /  41654 /  176714 )
  2215:    315 rows written (  850583 /  41654 /  176714 )
  2230:    315 rows written (  855007 /  42032 /  177857 )

D:\covid-19>python DELAY --interval=15m --adminLevel=1 --skipTrivial=True
  1930:    315 rows written (  838061 /  41261 /  175737 )
  2000:    315 rows written (  846156 /  41494 /  176171 )
  2130:    315 rows written (  850583 /  41654 /  176714 )
  2230:    315 rows written (  855007 /  42032 /  177857 )
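The timing behind these sessions (an even interval grid plus a small random offset) can be sketched like this; it is my own reconstruction, not the utility's actual scheduling code:

```python
import random
from datetime import datetime, timedelta

def next_run(now, interval_minutes):
    """Compute the next even interval boundary after 'now', plus a
    1-20 second random offset so synchronized clients don't all hit
    the service in the same instant."""
    minutes_today = now.hour * 60 + now.minute
    next_slot = (minutes_today // interval_minutes + 1) * interval_minutes
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(minutes=next_slot,
                                seconds=random.randint(1, 20))

# A session started at 19:17 with --interval=15m fires next at 19:30:
print(next_run(datetime(2020, 3, 31, 19, 17), 15))
```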

Note that the --skipTrivial=True flag is the mechanism for skipping the data export when no significant changes have occurred (changes to the Last_Update field alone will not write a new file, but changes to any case count field will).

Assembling the new-style data format is tricky, because it needs to aggregate thousands of records from two different services (and remove the US duplicates from the ncov_cases service).  The verbose mode gives an indication of what's happening:

D:\covid-19>python ONCE --verbose=True

Querying 'ncov_cases_US' service (1/5)...
   382.2 KB retrieved (476ms elapsed)
Querying 'ncov_cases_US' service (2/5)...
   281.9 KB retrieved (250ms elapsed)
Querying 'ncov_cases_US' service (3/5)...
   200.1 KB retrieved (176ms elapsed)
Querying 'ncov_cases_US' service (4/5)...
   262.2 KB retrieved (204ms elapsed)
Querying 'ncov_cases_US' service (5/5)...
   169.5 KB retrieved (161ms elapsed)
Querying 'ncov_cases' service...
   163.6 KB retrieved (169ms elapsed)
Creating datafile '2020-03-31_1936Z.csv'...
     2369 rows written (  838061 /  41261 /  175737 )

For the US-centric audience, I added a --topStates flag (which must be used with --verbose=Y) to print a summary:

D:\covid-19>python ONCE --verbose=True --topStates=15

Querying 'ncov_cases_US' service (1/5)...
   390.0 KB retrieved (429ms elapsed)
Querying 'ncov_cases_US' service (2/5)...
   286.2 KB retrieved (168ms elapsed)
Querying 'ncov_cases_US' service (3/5)...
   200.7 KB retrieved (189ms elapsed)
Querying 'ncov_cases_US' service (4/5)...
   268.2 KB retrieved (163ms elapsed)
Querying 'ncov_cases_US' service (5/5)...
   172.5 KB retrieved (157ms elapsed)

                      ===== Top 15 States =====

                State     Confirmed   Deaths  Counties
                New York     75798     1550        56
              New Jersey     18696      267        22
              California      8077      163        48
                Michigan      7615      259        69
           Massachusetts      6620       89        14
                 Florida      6338       77        54
                Illinois      5994       99        54
              Washington      5305      222        35
               Louisiana      5237      239        61
            Pennsylvania      4963       63        60
                 Georgia      3815      111       138
                   Texas      3726       53       131
             Connecticut      3128       69         9
                Colorado      2627       51        48
               Tennessee      2391       23        82
                 *Others     25475      462      1247

Querying 'ncov_cases' service...
   163.6 KB retrieved (132ms elapsed)
Creating datafile '2020-03-31_2220Z.csv'...
     2405 rows written (  855007 /  42032 /  177857 )

D:\covid-19>python ONCE --verbose=True --topStates=15 --adminLevel=1

Querying 'ncov_cases' service...
   163.6 KB retrieved (442ms elapsed)

                 ===== Top 15 States =====

                State     Confirmed   Deaths
                New York     75798     1550
              New Jersey     18696      267
              California      8077      163
                Michigan      7615      259
           Massachusetts      6620       89
                 Florida      6338       77
                Illinois      5994       99
              Washington      5305      222
               Louisiana      5237      239
            Pennsylvania      4963       63
                 Georgia      3815      111
                   Texas      3726       53
             Connecticut      3128       69
                Colorado      2627       51
               Tennessee      2391       23
                 *Others     25475      462

Creating datafile '2020-03-31_2221Z.csv'...
      315 rows written (  855007 /  42032 /  177857 )

The format difference is due to the fact that Admin2 reporting is by county, but doesn't populate Recovered and Active, while the Admin1 reporting includes Recovered (which would allow Active to be computed, but Recovered is now a lump-sum record in the ncov_cases service, so in practice, Active and Recovered are not available).

So, how do I know this script produces the same data shared on GitHub?  Well, I wrote a validator utility that creates two CovidSummary objects, iterates over one searching the other for duplicate keys, and reports missing rows, data mismatches, and unmatched rows.  The output was:


0 errors / 3429 lines

Okay, I cheated a bit.  Since the daily JHU CSV reports that the Diamond Princess and Grand Princess cruise ships are docked on Null Island, I treated 0.0 degrees latitude/longitude as a wildcard that matches any coordinate, and I only compare coordinates to 5 places, since string comparison doesn't work well with floating-point values.  But everything else has aligned perfectly.
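That comparison logic can be sketched as follows; this is a reconstruction of the idea, and the function name and tolerance handling are mine:

```python
def coords_match(lat1, lon1, lat2, lon2, places=5):
    """Compare coordinates numerically rather than as strings.
    (0.0, 0.0) on either side is a wildcard that matches anything,
    since JHU parks the cruise ships on Null Island."""
    if (lat1, lon1) == (0.0, 0.0) or (lat2, lon2) == (0.0, 0.0):
        return True
    return (round(lat1, places) == round(lat2, places) and
            round(lon1, places) == round(lon2, places))

print(coords_match(42.165726, -74.948051, 42.1657260, -74.9480510))  # True
print(coords_match(0.0, 0.0, 35.4437, 139.638))                      # True (wildcard)
print(coords_match(42.16573, -74.94805, 42.16580, -74.94805))        # False
```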

UPDATE @ 2100 EST: Unfortunately, sometime today, the US Admin2 jurisdictions without any confirmed cases were deleted from the ncov_cases_US service feed, the FIPS code displayed for the Northern Mariana Islands and US Virgin Islands disappeared, and the US territory without any confirmed cases (American Samoa) also disappeared (along with its FIPS code).  I've tweaked the exporter to conform to this, and it runs cleanly now:

sum1 = data2\2020-03-31.csv
sum2 = data2\2020-04-01_0110Z.csv

0 errors / 2434 lines

But I'm expecting Guam to have its FIPS code removed in the coming days.

Since the validation tool implements Python classes to parse and search both old-style and new-style CSV files, I've attached that as well (as

My next task is to write some code to exploit this near real-time data resource, and use it to maintain PostgreSQL tables to produce clones of the ncov_cases* services' data.

Now attached is -- Changelog:

  • Implemented adminLevel=2 with multiple queries to replicate
    three-level results in GitHub CSV files
  • Added --confirmedOnly flag to skip counties without
    Confirmed cases (obviated by feed change)
  • Added --topStates to report ordered state-wide impact
    (requires --verbose=True and non-zero --adminLevel)
  • Added FIPS lookup for US Possessions (Note: Puerto Rico = None)
  • Ignored US admin1 summary values in admin2 feed (2-digit FIPS)
  • Added validateKey to correct Combined_Key with missing spaces
    after commas
  • Removed FIPS from US possessions and suppressed US possessions
    without confirmed cases (as per 2020-03-31.csv)

== Update 01-Apr @ 0100 EST ==

Long-running service execution is great for finding bugs...

Now attached is -- Changelog:

  • Fixed significantChange field comparison error with Admin1
    field names in Admin2 service

== Update 01-Apr @ 0920 EST ==

Overnight service execution is great for finding bugs, but errors that arise after several hours are less fun...

Now attached is -- Changelog:

  • Fixed significantChange field comparison error for *all*
    Admin1 field names in the Admin2 service
  • Fixed paren alignment defect in error handling in pullData

Also attached is -- Changelog:

  • Tightened loop
  • Fixed range check defect in main

== Update 01-Apr @ 2030 EST ==

The server glitched, dropping all US data, and I got to exercise some error code that doesn't normally see traffic, which I've now tweaked to be more resilient.

Attached is -- Changelog:

  • Fixed TypeError when service result is empty (None)

== Update 08-Apr @ 1600 EST ==

I tweaked the exporter to always write data at UTC midnight, so that the validator has the same Last_Update value for comparison with the JHU published file. I had to improve date parsing in the CovidReport class (the corrected 2020-04-06.csv has a different date format). I'm still seeing some JHU data with incorrect 000xx FIPS codes, and the Admin1 and Admin2 feeds are out of sync for the Northern Mariana Islands, but otherwise looking good.

Attached is -- Changelog:

  • Tweak to force 0000Z download, even when --skipTrivial is
    enabled (for comparison with GitHub published files)

Also attached is -- Changelog:

  • Cleaned up CovidReport.extract() to handle bad formatting
    gracefully and to support the 'm/d/y HH:MM' date used in
    the revised 2020-04-06.csv datafile
  • Loosened the Last_Update comparison code so that second-
    truncated timestamps can compare successfully


== Update 12-Apr @ 2320 EST ==

The JHU daily CSV format changed again, adding five more fields (and moving FIPS later in the display order). I've got it handling the new format, but am not yet generating it...

Attached is -- Changelog:

  • Count, but don't display, Combined_Key mismatches
  • Added parsing (but not yet validation) of fields added
    to 04-12-2020.csv

- V

Esri Esteemed Contributor

Like many, I've been using the daily data from the Johns Hopkins University Ops Dashboard site, but I've been frustrated by the periodicity and timing -- Midnight UTC is all well and good, but then you've got 12+ hour old data before folks start their workday on the US East Coast.

But there is another route to the data, since it's published as a feature service. So Sunday night I whipped up a Python script to pull the full feature feed from AGOL and write an hourly file in the same format.  This script is plain-vanilla Python (no arcpy), and has been tested on both Python 2.7.16 and 3.8.2. Execution looks something like this:

D:\covid-19>python DELAY --verbose=True --interval=2h

Sleeping 35.67 minutes...

Querying feature service...
155.6 KB retrieved (310ms elapsed)
Creating datafile '2020-03-25_0600Z.csv'...
300 rows written ( 422989 / 18916 / 108578 )

Sleeping 2.00 hours...

Querying feature service...
155.6 KB retrieved (179ms elapsed)
Creating datafile '2020-03-25_0800Z.csv'...
300 rows written ( 423670 / 18923 / 108629 )

Sleeping 2.00 hours...

Querying feature service...
155.6 KB retrieved (180ms elapsed)
Creating datafile '2020-03-25_1000Z.csv'...
300 rows written ( 425493 / 18963 / 109191 )

Sleeping 2.00 hours...

Querying feature service...
156.1 KB retrieved (146ms elapsed)
Creating datafile '2020-03-25_1200Z.csv'...
301 rows written ( 435006 / 19625 / 111822 )

Sleeping 2.00 hours...

Exiting on '' request

D:\covid-19>dir/b data

D:\covid-19>head data\2020-03-25_1200Z.csv
Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Latitude,Longitude
New York,US,2020-03-25T07:24:54,26376,271,0,42.165726,-74.948051
,Korea, South,2020-03-25T07:24:40,9137,126,3730,35.907757,127.766922

Then I worried about system load (One ought not pound the free service that one wants to use into oblivion) and update frequency (Wouldn't it be nice to get updates as they happen?), so I started working on tweaks:

  • Added argparse command-line switches to change execution parameters (vice hacking the source)
  • Added 0-20 random seconds delay for execution of the query (so synchronized servers don't overload the feature service, yet with plenty of time to finish within the targeted minute)
  • Permitted a directory tree archiving style (creating directories if necessary)

I started working on a way to only write a dataset that has had a substantive change since the previous timestamp (so the timing could be shortened to, say, 5 minutes, without needlessly transmitting files), but I discovered I can't use the Last_Update field for this (since the timestamp changes without case count changes).  Then the CSV data format posted on GitHub changed Monday night to include Admin2 (county-level) data in the US (at lower resolution -- no Recovered/Active values), so I'll post this as-is for reference (or for those who now need an alternate source for data in the old format) and figure out how to model the new format...

Note that there are three modes (you need to choose one):

  • ONCE - Create one file, now, and exit (suitable for a cron job or other batch-oriented invocation at an interval greater than one hour)
  • IMMEDIATE - Create one file now, and then continue file creation based on the interval modulus (default hourly)
  • DELAY - Create first file in next expected time window (if the old job was killed, and you just want to pick up where left off)
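A sketch of how those three modes can drive scheduling (the function and parameter names here are hypothetical, not the tool's actual code):

```python
import random
import time

def compute_delay(mode, interval_s, now=None):
    """Seconds to wait before the next pull, for the three modes.

    ONCE and IMMEDIATE pull right away; DELAY waits for the next even
    interval boundary (e.g. the top of the hour).  A 0-20 second random
    jitter keeps synchronized servers from hitting the feature service
    at the same instant, while leaving time to finish within the minute.
    """
    now = time.time() if now is None else now
    jitter = random.uniform(0, 20)
    if mode in ("ONCE", "IMMEDIATE"):
        return jitter
    # DELAY: sleep until the next even multiple of the interval
    return (interval_s - now % interval_s) + jitter
```

Using even divisions of the hour or day for the interval keeps the boundary arithmetic this simple.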

It's possible to place files in a directory tree based on the time format.  For example, if you specify

Then the yYYYY\mMM\dDD folder will be created if necessary, and the "tHHMMZ.csv" data file added to that.
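That expansion can be sketched with a small helper (hypothetical, not the tool's actual code; the default format string is an assumed example of the yYYYY\mMM\dDD layout):

```python
import os
from datetime import datetime, timezone

def archive_path(root, fmt=r"y%Y\m%m\d%d\t%H%MZ.csv", when=None):
    """Expand a strftime-style pattern into a dated file path,
    creating any missing intermediate folders along the way."""
    when = when or datetime.now(timezone.utc)
    # Backslashes in the pattern mark folder boundaries
    parts = when.strftime(fmt).split("\\")
    path = os.path.join(root, *parts)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path
```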

You can gracefully exit the application by creating a "" file in the active directory (or creating the file referenced by the optional --stopPath flag).
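The stop-file check can be sketched like this (a hypothetical helper; the real tool's internals may differ):

```python
import os
import time

def sleep_with_stop(seconds, stop_path, poll_s=60):
    """Sleep in short slices, returning True early if the stop file
    appears -- so the app can exit gracefully within about a minute
    instead of waiting out the full interval."""
    deadline = time.time() + seconds
    while time.time() < deadline:
        if os.path.exists(stop_path):
            return True
        time.sleep(min(poll_s, max(0, deadline - time.time())))
    return os.path.exists(stop_path)
```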

== Update 25-Mar @ 0800 EST ==

Now attached is -- Changelog:

  • Parameterized URL assembly for future US feed support
  • Added mapping for missing fields (to None value)
  • Fixed stopPath utilization defect
  • Fixed sys.exit() defect
  • Reduced wait for stopPath detection
  • Expanded interval to even divisions of hour and day

I should probably re-emphasize that this tool can be used to preserve the "old format" feed until your tools have the ability to process the new format with FIPS and Admin2 fields.

== Update 25-Mar @ 0840 EST ==

Now attached is -- Changelog:

  • Fixed IMMEDIATE computeDelay() defect

Note that the stop file will now (as of v11)  gracefully exit the app within a minute of creation.

== Update 25-Mar @ 1520 EST ==

Now attached is -- Changelog:

  • Detect low-volume data pulls and skip write operation
  • Fixed defect in byVolume sorting (for/else)
  • Fixed defect in pull error logic
  • 0.4 millisecond/existence test timing compensation
  • Made Interrupt handling in sleep cleaner

Working on nominal change detection and US counties feed support next...

== Update 25-Mar @ 1820 EST ==

Now attached is -- Changelog:

  • Added substantive change detection (only write if case
    counts or Lat/Lon are different, not if just Last_Update
    changed) - Uses new --skipTrivial flag (default False)
  • Fixed naming defect in main loop
  • Self-initializing record count expectation

The --skipTrivial flag is what I had been looking for to achieve near-realtime feed support. With it, you can set a short interval (though not less than 2 minutes) and have CSV files written only when "substantive change" is detected (row count or changes to Confirmed, Deaths, or Recovered field values, mostly).  Executing it with --interval=2m (which is probably way too often -- 5, 6, 10, 12, or 15 would be better) results in output like this:

D:\covid-19>python IMMEDIATE ^
More? --verbose=True --skipTrivial=True --interval=2m

Querying feature service...
156.6 KB retrieved (232ms elapsed)
Creating datafile '2020-03-25_2207Z.csv'...
302 rows written ( 464026 / 20946 / 113691 )

Sleeping 0.15 minutes...

Querying feature service...
156.6 KB retrieved (152ms elapsed)
Skipping write of trivial change to '2020-03-25_2208Z.csv'...

Sleeping 2.07 minutes...

Querying feature service...
156.6 KB retrieved (195ms elapsed)
Skipping write of trivial change to '2020-03-25_2210Z.csv'...

Sleeping 2.16 minutes...

Querying feature service...
156.6 KB retrieved (176ms elapsed)
Skipping write of trivial change to '2020-03-25_2212Z.csv'...

Sleeping 1.80 minutes...

Querying feature service...
156.6 KB retrieved (187ms elapsed)
Skipping write of trivial change to '2020-03-25_2214Z.csv'...

Sleeping 2.20 minutes...

Querying feature service...
157.2 KB retrieved (185ms elapsed)
Creating datafile '2020-03-25_2216Z.csv'...
303 rows written ( 466836 / 21152 / 113769 )

Sleeping 1.88 minutes...

Querying feature service...
157.2 KB retrieved (179ms elapsed)
Skipping write of trivial change to '2020-03-25_2218Z.csv'...
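
The substantive-change test behind --skipTrivial can be sketched as follows (a hypothetical function; the real implementation may differ in details):

```python
def is_substantive_change(prev_rows, new_rows):
    """True if the new pull differs in row count or in any watched
    field value -- changes to Last_Update alone are ignored.  Field
    names follow the CSV header shown earlier in this post."""
    watched = ("Confirmed", "Deaths", "Recovered", "Latitude", "Longitude")
    if len(prev_rows) != len(new_rows):
        return True
    for old, new in zip(prev_rows, new_rows):
        if any(old.get(f) != new.get(f) for f in watched):
            return True
    return False
```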

My next goal is to write data in the new (post 23-Mar) JHU CSV format from the ..._US service.  I'll likely set it up to be both global and US-only.

Please let me know if you use this or encounter problems...

== Update 25-Mar @ 2200 EST ==

Blog title updated.

== Update 26-Mar @ 1800 EST ==

Now attached is -- Changelog:

  • Added time prefix to non-verbose output
  • Added --adminLevel=[0,1] flag to permit country summaries
  • Added --usOnly support to restrict results to US data
  • Fixed ONCE defect when retrieval fails
  • Fixed CSV quoting bug with Python 3
  • Note: adminLevel=2 is partially functioning, but disabled
    until a work-around for the 1000 feature limit is added

Good news: jhuEmulator has code to export new-style format files

Bad news:   Only 1000 rows can be received per query

I did add the --adminLevel flag, but right now it only functions to summarize the countries that are usually distributed with province/state level data (US, Canada, China, and Australia).

Python 3 users: Note that v1.5 fixes a formatting defect that failed to place double-quotes around values with embedded quotes.
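For reference, Python 3's csv module handles this automatically when you let it do the quoting -- embedded quotes are doubled and the field is wrapped:

```python
import csv
import io

buf = io.StringIO()
# A field containing both a delimiter and embedded quotes
csv.writer(buf).writerow(['Washington, "D.C."', 'US', 123])
# -> "Washington, ""D.C.""",US,123
```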

The next snapshot should be v2.0, which will assemble multiple queries into a single file which approximates the new-format CSV style.

== Update 29-Mar @ 2330 EST ==

Now attached is -- Changelog:

  • Fixed timezone defect in datetime.datetime.fromtimestamp call
    (Last_Update had been captured in local timezone, not UTC)
  • Added date display in non-verbose operation at midnight UTC

I'm still working on the Admin2 format support. It's nearly done, but not ready for release. I discovered an ugly bug in the v15-and-earlier extraction -- the dates in Last_Update were recorded in the local timezone, not UTC -- so I wanted to get this fix out there, BUT I'm now showing far more cases than the JHU Ops Dashboard, so the code that was supposed to accumulate Admin2 data into an Admin1 summary seems to be double-counting.  The same issue exists in v15 and earlier (and v20, right now), so I'll need to figure out what went wrong...
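For anyone hitting the same trap: datetime.fromtimestamp() returns local wall-clock time unless you pass a timezone explicitly:

```python
from datetime import datetime, timezone

ts = 1585121094  # epoch seconds for 2020-03-25T07:24:54Z

naive = datetime.fromtimestamp(ts)                 # local wall-clock -- wrong for Last_Update
utc = datetime.fromtimestamp(ts, tz=timezone.utc)  # timezone-aware UTC -- the fix
```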

== Update 29-Mar @ 2330 EST ==

Heh. My Chrome session needed a refresh. The total on the website correlates with the extraction feed.

For what it's worth, here's the log of my last 26 hours of data extraction, pulling data every 15 minutes but only saving when there was non-trivial change:

D:\covid-19>python DELAY --interval=15m --skipTrivial=True
0245: 312 rows written ( 663828 / 30822 / 139451 )
0300: 312 rows written ( 664608 / 30846 / 140156 )
0400: 312 rows written ( 664695 / 30847 / 140156 )
0515: 312 rows written ( 664924 / 30848 / 140222 )
0630: 312 rows written ( 665164 / 30852 / 140225 )
0745: 312 rows written ( 665616 / 30857 / 141746 )
0845: 312 rows written ( 666211 / 30864 / 141789 )
0945: 312 rows written ( 669312 / 30982 / 142100 )
1100: 312 rows written ( 678720 / 31700 / 145609 )
1200: 312 rows written ( 679977 / 31734 / 145625 )
1315: 312 rows written ( 681706 / 31882 / 145696 )
1430: 312 rows written ( 684652 / 32113 / 145696 )
1530: 312 rows written ( 685623 / 32137 / 145706 )
1645: 312 rows written ( 691867 / 32988 / 146613 )
1815: 312 rows written ( 704095 / 33509 / 148824 )
1900: 312 rows written ( 710918 / 33551 / 148900 )
2000: 312 rows written ( 713171 / 33597 / 148995 )
2130: 312 rows written ( 716101 / 33854 / 149071 )
2230: 312 rows written ( 718685 / 33881 / 149076 )
2345: 312 rows written ( 720117 / 33925 / 149082 )
----> Mon 2020-03-30 UTC
0045: 312 rows written ( 721584 / 33958 / 149122 )
0200: 312 rows written ( 721817 / 33968 / 151204 )
0300: 312 rows written ( 722289 / 33984 / 151901 )
0400: 312 rows written ( 722435 / 33997 / 151991 )

== Update 31-Mar @ 2120 EST ==

The Admin2 format emulator is now published in the new blog post: 

There were output-only changes today, so v2.0 is likely to have some updates to keep in sync. The most significant change for anyone using the v1.x tools is that the new default file format is the post-03/23 format, but that can be overridden with --adminVersion=1.

- V

New Contributor II

Esri's 39th User Conference is around the corner! As you make travel plans and schedule your time, consider sessions offered by the Geodatabase team. We've organized them into several learning paths that you can explore in this story map: Data Management Learning Paths at UC2019

For Enterprise-related workshops, check out their blog:  ArcGIS Enterprise at 2019 UC 

For Utility users, visit  Electric Gas and Telecom Industry Activities UC2019 

New Contributor III

Ran into something interesting that might be worth sharing with the group. One of the developers on staff hit an issue where he couldn't pull lat/long from the database, because the shape fields are stored as Geometry (not Geography) and we didn't store lat/long in the tables. For those who weren't aware, you can use a SQL query to generate X,Y from the shape fields present in our databases. The syntax will differ a bit based on database platform (not SDE version). For MS SQL, reading from a Geometry column, you would use something like Example 1.

So how do you pull that information out for an application to use/reference? Here is what I came up with for him.

Option 1 Using Arcade:

For those using ArcGIS Pro and/or ArcGIS Online, an increasingly useful capability is the Arcade scripting language. In this case, build a function. As a proof of concept, I looped it into a URL that could be easily parsed by Google Maps. They can change the URL to match Collector, Survey123, or another application at will. See Examples 2A and 2B (accidentally labeled 2A again, to keep everyone on their toes).

Option 2 Using SQL:

With ArcGIS you can create query layers, but when published via ArcGIS Server (as this developer was doing), there is a performance hit on the server each time it pulls one of those query layers. Sure, it is more of a pain, but your Database Administrator can create and adjust views just as fast, and views put the burden of rendering on the database server, which usually has capacity to spare. That said, you can add columns in queries and even do math along the way; see Example 3. Keep in mind the syntax of Example 3 is for MS SQL. As most of you are using Oracle, it may be slightly different for some of the operators like PI -- though that is a bad example, as PI is called the same way in Oracle. The moral of the story: double-check that all the operators (functions) are correct.

declare @shift numeric
declare @x numeric
declare @y numeric

set @shift = 2.0 * PI() * 6378137.0 / 2.0

select structure.OBJECTID -- (opening SELECT line reconstructed; it was lost in formatting)
,structure.DSTYPE AS [Type]
,structure.shape.STY as Y_Coord
,structure.shape.STX as X_Coord
, (
180.0 / PI() * (2.0 * Atan( Exp( (((select structure2.shape.STY FROM WebGIS.UTIL.swDrainageStructure as structure2 where structure2.objectid = structure.OBJECTID) / @shift) * 180.0) * PI() / 180.0)) - PI() / 2.0)
) as [Latitude]
,((select structure1.shape.STX FROM WebGIS.UTIL.swDrainageStructure as structure1 where structure1.OBJECTID = structure.objectid) / @shift) * 180.0 as [Longitude]
FROM WebGIS.UTIL.swDrainageStructure as structure
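For sanity-checking the view's output outside the database, the same spherical-Mercator math can be written in Python (a sketch mirroring the SQL above; function name is mine):

```python
import math

SHIFT = 2.0 * math.pi * 6378137.0 / 2.0  # half the Web Mercator extent, in metres

def mercator_to_lonlat(x, y):
    """Convert EPSG:3857 metres to WGS84 degrees -- the same formula
    as the [Latitude]/[Longitude] expressions in the SQL."""
    lon = (x / SHIFT) * 180.0
    lat = 180.0 / math.pi * (
        2.0 * math.atan(math.exp((y / SHIFT) * 180.0 * math.pi / 180.0))
        - math.pi / 2.0)
    return lon, lat
```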

MVP Honored Contributor

I work for an agency that maintains a Land Management System (LMS) for tracking land use cases, like subdivisions and permits, processed by our Planning and Building and Safety Departments.  For 20 years our LMS has been integrated with GIS based on the Assessor’s Parcel Numbers entered by our users.  The parcel shapes associated with these parcel numbers are used to transfer information from GIS into our LMS and to create GIS features representing our LMS cases.  However, our reliance on Assessor’s Parcel Numbers alone has produced far from perfect results, due to many factors.  We recently replaced our LMS and now have a system that also integrates the Address Points feature class that my agency maintains.  This will help us solve many problems for our new cases as we go forward, but we did not attempt to create a direct association of our historic LMS address data with our GIS address points during conversion.  Fortunately, I have found that there is a third integration option for solving these historic LMS/GIS problems: subdivision and lot descriptions.

When it applies, the Assessor of my jurisdiction enters subdivision and lot information that describes each parcel, and normally my agency’s LMS users have entered the subdivision and lot number descriptions of the cases that fall on these parcels.  I have been discovering ways to efficiently apply this data relationship to dramatically improve my agency’s historic LMS case shapes on a large scale.  This blog provides an overview of one of the processes I have developed for leveraging this data relationship, where it exists, to significantly transform my LMS case shapes in GIS.

An example of what can be accomplished in an afternoon using my techniques is shown below.

Case shapes based on original parcel numbers - 26,105 parcels countywide


Case shapes based on subdivision and lot description parcels - 124,643 parcels countywide


A Basic Overview of the Process

Without going into a lot of detail, here are the broad steps I followed to transform the case shapes below that were created from parcels that existed at the time of application before the property was subdivided.  None of the shapes below represent the true boundaries of the case applications.

1.   For the Assessor’s parcel feature class, which must be within a geodatabase, I concatenated the separate fields that make up the subdivision and lot description of each parcel into a single field using a standardized description format

2.   Extract the LMS case number and field(s) containing each case’s subdivision and lot descriptions into a file geodatabase table

3.   For the LMS data select the sets of records that contain the most common forms of a subdivision and lot description from within a field that may contain one.

4.   Calculate the subdivision and lot portions of the description into separate fields using Python parsing methods.

5.    Summarize all unique values in the field containing the lot portions of the description into a new summary table in the geodatabase containing the LMS data.

6.    Add the set of pairs of fields required to store all of the lot ranges that can be parsed from the original lot description and parse the range values into them. This is probably the hardest step, depending on the number of lot descriptions you have and the format variations you encounter.

7.   Run a script (see the end of this post for the script code) that populates a new table stored within the LMS geodatabase with the expanded set of records required to populate a new field with each individual lot number defined in the ranges of every lot description field pair in the summary table.  This step makes all of the time spent doing the previous step well worth the effort.

8.   Summarize the original LMS case data into a new table using the Summary Statistics tool so that it only has one record per case containing the case number and the legal description fields added in step 4 at minimum, although you can include as many fields in the output that you want so long as they do not cause more than one record per case to be created in the output.

9.   Make the summarized LMS case data created in step 8 into a table view and join its lot description field populated in step 4 with the same field contained in the expanded table populated by the script in step 7.

10.    Export the joined LMS case data table into a new table stored in the same geodatabase as the parcel feature class. With ArcGIS 10.3.1 and above this will cause the case records to expand to cover all of the records required to describe each individual lot for all of the case subdivision and lot descriptions.

11.   Add a new field to the export output of step 10 and calculate the concatenation of the subdivision name with the field containing the individual lot values of each record into a new subdivision and lot concatenation field. These concatenated descriptions must match the format of the Assessor’s subdivision and lot concatenation field populated in step 1.

12.   Make the Assessor’s parcel feature class into a layer and join its concatenated subdivision and lot field to the new concatenated subdivision and lot field of the expanded case record calculated in step 11.

13.   Export the joined Parcel features to a new feature class to expand the set of parcel features to cover every individual lot in every joined case.

14.   Here is the final result.  This is a true representation of the parcels and property each case actually should have covered.

The outcome is an LMS case feature class that has features in the shape of each and every parcel that matched the lots described in the case’s original overall subdivision and lot description. The more cases I need to process, the more efficient this process becomes.  I have applied it to nearly 1 million LMS cases in an afternoon and created a parcel output that would have required approximately 5 years of man-hours to enter manually into the LMS system.
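The heart of step 7 -- expanding start/end lot ranges into one record per individual lot -- can be illustrated with plain Python (a simplified sketch without the cursors; names are mine):

```python
def expand_ranges(lot_desc, ranges):
    """Expand (start, end) lot-range pairs into one record per lot,
    so each record can later be joined to an individual parcel."""
    records = []
    for start, end in ranges:
        for lot in range(int(start), int(end) + 1):
            records.append((lot_desc, str(lot)))
    return records
```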

Based on questions and requests posted to this blog, I will add sections to this post giving further details on how I approached any step I have listed.  There is a great deal more I could say about each aspect of this process, but for now I will post these general steps as a template that can at least help me make the process even more efficient or automated in the future.

Below is the script I use in step 7.

from time import strftime  
print( "Start script: " + strftime("%Y-%m-%d %H:%M:%S")  )
import arcpy 
sourceFC = r"\\agency\AgencyDFS\Tran\FILES\GISData\rfairhur\Layers\Plus_Conversion_Data\Plus_Conversion_Data.gdb\PLUS_LOT_FIELD"
sourceFieldsList = ["LOT","START_1","END_1","START_2","END_2","START_3","END_3","START_4","END_4","START_5","END_5","START_6","END_6","START_7","END_7","START_8","END_8","START_9","END_9","START_10","END_10","START_11","END_11"]

insertFC = r"\\agency\AgencyDFS\Tran\FILES\GISData\rfairhur\Layers\Plus_Conversion_Data\Plus_Conversion_Data.gdb\PLUS_LOT_RANGES_FULL" 
insertFieldsList = ["LOT","RANGE_COUNT","EACH_LOT","START_1","END_1","START_2","END_2","START_3","END_3","START_4","END_4","START_5","END_5","START_6","END_6","START_7","END_7","START_8","END_8","START_9","END_9","START_10","END_10","START_11","END_11"]


# Insert one expanded record per individual lot, tracking per-LOT record counts
icursor = arcpy.da.InsertCursor(insertFC, insertFieldsList)

countDict = {}
with arcpy.da.SearchCursor(sourceFC, sourceFieldsList) as sourceRows:
    for sourceRow in sourceRows:
        for j in range(0, 22, 2):
            if j == 0 and sourceRow[1] is None:
                # No ranges were parsed: pass the original lot value through as-is
                l = [sourceRow[0], 0, sourceRow[0]] + list(sourceRow[1:])
                icursor.insertRow(l)
                if sourceRow[0] in countDict:
                    countDict[sourceRow[0]] += 1
                else:
                    countDict[sourceRow[0]] = 1
            elif sourceRow[j+1] is not None:
                # Expand each start/end pair into one record per lot number
                for i in range(int(sourceRow[j+1]), int(sourceRow[j+2]) + 1):
                    l = [sourceRow[0], 0, str(i)] + list(sourceRow[1:])
                    icursor.insertRow(l)
                    if sourceRow[0] in countDict:
                        countDict[sourceRow[0]] += 1
                    else:
                        countDict[sourceRow[0]] = 1

del icursor

print( "Expanded Lot Records: " + strftime("%Y-%m-%d %H:%M:%S")  )

# Write the per-LOT record counts back to both tables
with arcpy.da.UpdateCursor(insertFC, insertFieldsList) as updateRows:
    for updateRow in updateRows:
        updateRow[1] = countDict[updateRow[0]]
        updateRows.updateRow(updateRow)

sourceFieldsList = ["LOT","RANGE_COUNT"]

with arcpy.da.UpdateCursor(sourceFC, sourceFieldsList) as updateRows:
    for updateRow in updateRows:
        updateRow[1] = countDict[updateRow[0]]
        updateRows.updateRow(updateRow)

print( "Set Range_Count field: " + strftime("%Y-%m-%d %H:%M:%S")  )

print( "Finish script: " + strftime("%Y-%m-%d %H:%M:%S")  )

Esri Esteemed Contributor

I had a project where I needed to publish data at a service provider location, with the publishing to be done by a novice Desktop user.  My solution was to make a Python toolbox to simplify the data import and ArcGIS Server service publishing steps, and then I just went ahead and made the export and cleanup steps tools in that same toolbox as well. This was looking like a great solution until network issues at the end-user site caused the data import step to take seven hours (instead of the usual 20 minutes).  Since this was an Amazon solution, the utility could have been run on a VM from inside the mission, but there were issues accessing the license server from the VM, and from there, well, let's just say it didn't work out.

Since my ArcGIS Server node was actually on a Linux box, I didn't have the option of running a graphical utility like a toolbox tool, but the code didn't really need graphical access (just a working ArcPy, a path to the source file geodatabase, and an enterprise connection file (.sde)). I could have ported the app to a command-line utility, then invoked the command-line from the toolbox UI, but this additional development would take time and, as I recently discovered, it wasn't strictly necessary, because Python toolbox ( .pyt ) files can be invoked from Python!

For example, let's say we have this trivial toolbox:

import arcpy

class Toolbox(object):
    def __init__(self):
        """Define the toolbox (the name of the toolbox is the name of the
        .pyt file)."""

        self.label = "Toolbox"
        self.alias = ""

        # List of tool classes associated with this toolbox
        self.tools = [BlogTool]

class BlogTool(object):
    def __init__(self):
        """Define the tool (tool name is the name of the class)."""
        self.label = "BlogTool"
        self.description = "Trivial tool example"
        self.canRunInBackground = False

    def getParameterInfo(self):
        """Define parameter definitions"""
        # (parameter details were truncated in the post; a minimal
        #  string parameter stands in here)
        param0 = arcpy.Parameter(
            displayName="Input value",
            name="in_value",
            datatype="GPString",
            parameterType="Required",
            direction="Input")
        return [param0]

    def isLicensed(self):
        """Set whether tool is licensed to execute."""
        return True

    def updateParameters(self, parameters):
        """Modify the values and properties of parameters before internal
        validation is performed.  This method is called whenever a parameter
        has been changed."""


    def updateMessages(self, parameters):
        """Modify the messages created by internal validation for each tool
        parameter.  This method is called after internal validation."""


    def execute(self, parameters, messages):
        # invoke helper
        return doExecute(parameters[0].valueAsText,messages)

def doExecute(param1,messages):
    messages.addMessage("Tool: Executing with parameter '{:s}'".format(param1))
    return None

It works like you'd expect:




Now, if you run it from the command line, despite it not having a .py suffix, you get no error, but it doesn't do anything either:

toolbox execution

Ahh, but if you tweak the toolbox to add an extra two lines of code to the end:

if (__name__ == '__main__'):
    arcpy.AddMessage("Whoo, hoo! Command-line enabled!")‍‍

Then you can use the toolbox from a console:


"Still, they're not connected, and what about messaging?" you ask.  Well, this is pretty cool:  You can fake the messages parameter in the tool's execute method with a stub class that does everything you need messages to do, and you can grab parameters from the sys.argv array, which makes for a command-line capable toolbox:

if (__name__ == '__main__'):
    arcpy.AddMessage("Whoo, hoo! Command-line enabled!")

    class msgStub:
        def addMessage(self, text):
            print(text)
        def addErrorMessage(self, text):
            print("ERROR: " + text)
        def addWarningMessage(self, text):
            print("WARNING: " + text)

    import sys

    # (invocation restored: pass the first command-line argument and the stub to the helper)
    doExecute(sys.argv[1], msgStub())


Obviously, you'd want to do the sort of validation/verification for arguments that Desktop provides before invoking the helper with parameters, but that is, as they say, "A small matter of coding."

- V
