Data Management Blog

Latest Activity

(43 Posts)
MarceloMarques
Esri Contributor

Recommendations for database administrators on establishing the product workspaces in an Enterprise Geodatabase (Oracle®, SQL Server®, PostgreSQL®).

Production Mapping (Topographic Mapping / Defense Mapping), Data Reviewer, Workflow Manager, Maritime, Aviation

VinceAngelo
Esri Esteemed Contributor

arcpy.da.InsertCursor into an unregistered PostgreSQL table can function without the dreaded "No support for this geometry type" error if you keep ArcGIS in the loop (or keep it in the dark).

AndresCastillo
MVP Regular Contributor

This article is useful for the Oracle Enterprise Geodatabase Admin who needs to work closely with the Oracle Database Admin when configuring an Esri Enterprise Geodatabase.

Diana_Muresan
Esri Contributor

Check out the questions users had during the Improve Data Quality with Attribute Rules and Branch Versioning session at the Esri User Conference 2021.

AndresCastillo
MVP Regular Contributor

Here are some resources for achieving this in ArcGIS Desktop, Pro, and Python:

How To: Calculate latitude and longitude for point features
https://support.esri.com/en/technical-article/000008944

Calculate Geometry / Add Geometry Attributes tools:
https://desktop.arcgis.com/en/arcmap/10.6/tools/data-management-toolbox/calculate-geometry-attribute...
https://desktop.arcgis.com/en/arcmap/10.6/tools/data-management-toolbox/add-geometry-attributes.htm
Note that if you are trying to script it out, it appears that you cannot specify the desired units.

Via ArcPy:
https://gis.stackexchange.com/questions/114416/calculating-geometry-area-in-arcgis-desktop-using-arcpy
https://gis.stackexchange.com/questions/8163/getting-latitude-and-longitude-of-projected-point-using-arcpy

From the National Wildfire Coordinating Group:
https://www.nwcg.gov/sites/default/files/publications/936-1-calculate-geometry-using-python.pdf

Other community and Stack Exchange threads:
https://community.esri.com/t5/python-questions/how-to-automate-quot-calculate-geometry-quot-degrees-...
https://gis.stackexchange.com/questions/155476/using-python-to-field-calculate-lat-long-with-decimal...
https://community.esri.com/t5/python-questions/calculate-field-with-code-block-python-script/td-p/71...
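
If you need to script the calculation without the unit limitation noted above, here is a minimal sketch along the lines of the ArcPy threads listed here (the feature class path and the LON_DD/LAT_DD field names are placeholders, not from any of the linked posts):

import arcpy

points_fc = r"C:\data\demo.gdb\Points"   # hypothetical point feature class
wgs84 = arcpy.SpatialReference(4326)     # WGS 1984, decimal degrees

with arcpy.da.UpdateCursor(points_fc, ["SHAPE@", "LON_DD", "LAT_DD"]) as cursor:
    for shape, lon, lat in cursor:
        point = shape.projectAs(wgs84).firstPoint   # reproject, then read the coordinates
        cursor.updateRow([shape, point.X, point.Y])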

 

VickyWang
Esri Contributor

A workflow for using a SQL Server backup file to restore the database to a new SQL Server instance and upgrade the geodatabase: SQL Server 2012 to SQL Server 2019, geodatabase 10.3 to 10.9.x.

JoeBorgione
MVP Esteemed Contributor

I've been wrestling with the Address Data Management Solution for some time, and today, I finally got the toughest rule to work with my attributes: Split Intersecting Roads.  This rule is a really powerful editing tool as it will split an existing centerline into two segments when you add a new street that intersects it.  In essence it:

 

  1. Copies attributes from the original existing street to the 'new segment'
  2. Does a proportional division of the address ranges so you don't have to

 

To begin with, you need to create a list of the attributes you want to copy during the split.  This is done near the top of the script:

The original looks like this:

var centerline_field_names = ["rclnguid", "discrpagid", "rangeprefixleft", "fromleft",
 "toleft", "parityleft", "rangeprefixright", "fromright", "toright", "parityright",
 "fullname","fedroute", "fedrtetype", "afedrte", "afedrtetype", "stroute", "strtetype",
 "astrte", "astrtetype", "ctyroute", "onewaydir", "roadlevel", "inwater", "roadclass",
 "countryleft", "countryright", "stateleft", "stateright", "countyleft",
 "countyright","munileft", "muniright", "zipleft", "zipright", "msagleft", "msagright",
 "esnleft", "esnright"]

Mine looks like this:

var centerline_field_names = ["FROMADDR_L","TOADDR_L","FROMADDR_R","TOADDR_R","PREDIR",
"NAME","POSTTYPE","POSTDIR","FULLNAME","AN_NAME","AN_POSTDIR","ALIAS_NAME",
"LABEL","CITYCODE_L","CITYCODE_R","ZIPCODE_L","ZIPCODE_R","INCMUNI_L",
"INCMUNI_R","UNINCCOM_L","UNINCCOM_R","ONEWAY","SPEED_LMT","STATUS",
"CARTOCODE","DOT_RD","DOT_HWYNAM","DOT_SRFTYP","DOT_CLASS","VERT_LEVEL",
"SOURCE","COMMENTS","asset_id","Condition","Year_Installed","Carto_ID",
"Pave_Area","Pave_Width"]
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

I used the arcpy.ListFields() function and a little text formatting in Python to generate the list.  Something to take note of: the attribute names provided in the solution are all lower case, while you can see mine are upper case as well as proper case. More on this further down.
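
For reference, a minimal sketch of generating such a list (the connection path and the set of fields to skip are placeholders, not the solution's code):

import arcpy

fc = r"C:\data\connection.sde\MSD.SLCOMSD.CenterlinesMSD"   # hypothetical path
skip = {"OBJECTID", "SHAPE", "SHAPE_Length"}                # fields you don't want copied

names = [f.name for f in arcpy.ListFields(fc) if f.name not in skip]
print('var centerline_field_names = ["' + '","'.join(names) + '"]')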

 

Any reference to the feature class name needs to be adjusted: 

There are two references to the feature class name:

var intersectingRoads = Intersects(FeatureSetByName($datastore, "RoadCenterline"), geom);

and down at the bottom of the script:

'edit': [{'className': 'RoadCenterline', 'adds': adds, 'updates': updates}]

Mine looks like this:

var intersectingRoads = Intersects(FeatureSetByName($datastore, "MSD.SLCOMSD.CenterlinesMSD"), geom);

and

'edit': [{'className': 'MSD.SLCOMSD.CenterlinesMSD', 'adds': adds, 'updates': updates}]

My lines 11 and 15 show the full feature class name with Database.Owner as part of the name since I'm doing the edits in an Enterprise GDB (aka SDE); if you are using a file gdb, you'll only need the feature class name.

 

About midway down in the script, a series of variables are set to the field names where your address range info is stored:

 

The original references these attributes:


        // Get the address range of the intersecting road
        var fromRight = road.fromright;
        var toRight = road.toright;
        var fromLeft = road.fromleft;
        var toLeft = road.toleft;

While I reference:

        var fromRight = road.FROMADDR_R;
        var toRight = road.TOADDR_R;
        var fromLeft = road.FROMADDR_L;
        var toLeft = road.TOADDR_L;

 

A little further down in the script, there are references again to the address range variables that get written to a dictionary:

 

The original:

  var attributes = {}
  if (newToFromRight[0] != null) attributes['toright'] = newToFromRight[0];
  if (newToFromLeft[0] != null) attributes['toleft'] = newToFromLeft[0];

And mine:

  var attributes = {}
  if (newToFromRight[0] != null) attributes['TOADDR_R'] = newToFromRight[0];
  if (newToFromLeft[0] != null) attributes['TOADDR_L'] = newToFromLeft[0];

 

As mentioned earlier, how you have your attribute names with respect to upper, lower, or proper case is an issue you need to deal with.  If all your field names are lower case, this isn't an issue for you.  But it is for me:

 

The original script uses a Lower() function three times:

        for (var k in featureAttributes) {
            if (Lower(k) == "fromright" && newToFromRight[1] != null) {
                newAttributes['fromright'] = newToFromRight[1];
            } else if (Lower(k) == "fromleft" && newToFromLeft[1] != null) {
                newAttributes['fromleft'] = newToFromLeft[1];
            } else if (IndexOf(centerline_field_names, Lower(k)) > -1 && featureAttributes != null) {
                newAttributes = featureAttributes;
            } else {
    continue;

But the Lower() function trips things up with my schema since my field names aren't all lower case, so I just remove those calls while again referencing my address range fields:

        var newAttributes = {};
        for (var k in featureAttributes) {
            if (k == "FROMADDR_R" && newToFromRight[1] != null) {
                newAttributes['FROMADDR_R'] = newToFromRight[1];
            } else if (k == "FROMADDR_L" && newToFromLeft[1] != null) {
                newAttributes['FROMADDR_L'] = newToFromLeft[1];
            } else if (IndexOf(centerline_field_names, k) > -1 && featureAttributes != null) {
                newAttributes = featureAttributes;
            } else {
    continue;

 

Finally, I don't have a field called centerlineid, but I do have a field that is used in a similar fashion called UNIQUE_ID.  In the solution, the centerlines feature class is related to the Alias Road Name table through the centerlineid field, and that centerlineid field is updated with a Database Sequence.  If you plan to use centerlineid in your application, you are just fine.  However, if you are using another field name, you'll need to perform a search and replace on centerlineid in the script and replace it with your attribute name.  You'll also need to add that attribute name to the Alias Road Name table so the relationship class works.

 

None of this would have been possible without a boatload of help and even more patience from Chris Fox.  Thanks Chris!

RichardFairhurst
MVP Honored Contributor

The process described in this Blog requires an Advanced license.

Some polygon feature classes contain a set of overlapping features that accurately represent the boundaries of the attribute(s) that each polygon contains.  For example, polygon boundaries could each represent the area covered by a different policy, and if more than one policy applied to a property then multiple policy area boundaries would overlap. Typically such features can't be symbolized with solid fills and have to use hatch symbols, and any classification of the features as overlapping cannot identify which portion is actually overlapped.

 

This configuration is not useful if you want to select just the overlapping portions of the policy areas or you need to intersect this feature class with another feature class without duplicating portions of the other feature class where these features overlap.

This Blog provides a method for transforming overlapping polygons like this into a new polygon feature class that contains separate contiguous polygons with no overlaps, combining any attribute values that originally overlapped into a list that defines each new feature.  The steps of the process are listed below (a scripted sketch of steps 1 through 5 follows the list):

 

  1. Run the Union tool in the Analysis toolbox under the Overlay toolset on just the feature class that contains the overlapping polygons
  2. Run the Multipart to Singlepart tool in the Data Management toolbox under the Features toolset on the Union output to separate all multipart polygons into individual polygons for each part.
  3. Run the Feature to Point Tool in the Data Management toolbox under the Features toolset on the Multipart to Singlepart output with the Inside option checked to extract the attributes to a set of points that fall inside of the polygon that they came from.  NOTE: If slivers are extremely small or narrow the tool can fail to run.  You may want to select features that have large areas or that are small with a larger thinness ratio.  If your length field is in feet and your area field is in square feet, you could use 1000 square feet as the cutoff for larger features and do the selection with this SQL:  (SHAPE_Area >= 1000 OR (4 * 3.14 * Shape_Area / (Shape_Length * Shape_Length) > 0.3 AND SHAPE_Area < 1000))
  4. Run the Feature to Line tool in the Data Management toolbox under the Features toolset on the Multipart to Singlepart output with the no relationship option and do not preserve attributes.
  5. Run the Feature to Polygon tool in the Data Management toolbox under the Features toolset on the Feature to Line output to create unique single-part, contiguous polygons for all the areas enclosed by the lines.
  6. Run the Spatial Join tool in the Analysis toolbox under the Overlay toolset with the following settings: 
    1. Make the Feature to Polygon output the Target Features
    2. Make the Feature to Point output the Join Features. 
    3. Use the JOIN_ONE_TO_ONE option
    4. Use the Keep all target features option.
    5. For the attributes you want to transfer to the Polygons from the Point, in the field map right click each field and access its properties.  Change the following in the Output Field Properties:
      1. Make sure the field Type is set to Text if it is not Text already
      2. Change the field Length to 255 characters (possibly more if the tool fails because the output field is too small and you are outputting to a geodatabase)
      3. Change the field Merge Rule to Join
      4. Set the Delimiter to a character or set of characters not contained in the original attributes, like a semicolon (";").
  7. The final output features will have the following characteristics:
    1. Features with attributes that do not contain the delimiter character(s) were not overlapping and the attribute is identical to the original feature.  These features can be selected with an SQL expression similar to:
      NOT POLICY_NAME LIKE '%;%'
    2. Features with attributes that contain the delimiter character(s) were originally overlapping, but now are a single feature that has a list of all the attributes of the original overlapping features separated by the delimiter.  These features can be selected with an SQL expression similar to:
      POLICY_NAME LIKE '%;%'
    3. Features with Null attributes were not originally features, but they were empty spaces fully enclosed by the original features that can be eliminated or merged into surrounding features if they are slivers.  These features can be selected with an SQL expression similar to:
      POLICY_NAME IS NULL  
  8. The features in the picture below with solid fills are just the portions of the original polygons that overlapped.  The features are contiguous without any overlaps and the attributes associated with each portion list all of the overlapping values in a single feature.  They can be symbolized or selected separately and won't cause feature duplication due to overlaps when they are intersected with other feature classes. 
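
For anyone who prefers to script the geoprocessing portion, here is a minimal sketch of steps 1 through 5 (assuming ArcGIS Pro with an Advanced license; the workspace path and the "PolicyAreas" input name are placeholders).  The Spatial Join field map settings in step 6 are easier to configure interactively, so they are not scripted here:

import arcpy

arcpy.env.workspace = r"C:\data\overlaps.gdb"   # hypothetical file geodatabase
policies = "PolicyAreas"                        # the overlapping input polygons

arcpy.analysis.Union([policies], "policy_union")                              # step 1
arcpy.management.MultipartToSinglepart("policy_union", "policy_single")       # step 2
arcpy.management.FeatureToPoint("policy_single", "policy_points", "INSIDE")   # step 3
arcpy.management.FeatureToLine("policy_single", "policy_lines",
                               attributes="NO_ATTRIBUTES")                    # step 4
arcpy.management.FeatureToPolygon("policy_lines", "policy_polys")             # step 5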

 

I hope you find this useful.

mdonnelly
Esri Contributor

Hello,

Problem

After upgrading a SQL Server database, should you leave the compatibility level at the lowest level?

 

Solution

 

No, after upgrading a SQL Server database you should upgrade the database compatibility level.

 

Reference:

https://bousteadgis.sharepoint.com/teams/au-ps/SupportCenter/Knowledge%20Library/Administering%20SQL...

"A Note About Compatibility Level"

 

For example, if upgrading the database to SQL Server 2014, change the following setting on each database:

 

SQL Server Management Studio > Databases > Right-click your database > Properties > Options > Compatibility Level -> SQL Server 2014 (120)
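
The equivalent T-SQL, per the ALTER DATABASE compatibility level reference below, would be (YourDatabase is a placeholder):

ALTER DATABASE YourDatabase SET COMPATIBILITY_LEVEL = 120;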

 

Reference:

https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-database-transact-sql-compatibility-leve...

Mark

VinceAngelo
Esri Esteemed Contributor

This blog rolls out version 2.0 of my jhuEmulator.py utility.

The folks at Johns Hopkins University have done an awesome job at maintaining their Ops Dashboard site.  They even share a folder of data snapshots as CSV files at a minute before midnight, Greenwich time (2359 UTC).  However, the data itself is anything but static, and anyone who needs the CSV files at a more frequent cadence, if not in real time, can be frustrated by the snapshot frequency.

Fortunately, the data is all available with real-time updates -- It's in the feature service layers hosted on services1.arcgis.com (ncov_cases and ncov_cases_US). But there's one complication: The file format changed with the 2020-03-23.csv file, so anyone who had tools to read the old format was left in the lurch.
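
For anyone rolling their own, a minimal sketch of pulling rows from a hosted feature layer's query endpoint with just the standard library looks like this (this is not the attached utility, and the layer URL is a placeholder -- substitute the real ncov_cases_US layer URL on services1.arcgis.com):

import json
import urllib.parse
import urllib.request

LAYER_URL = "https://services1.arcgis.com/<org>/arcgis/rest/services/ncov_cases_US/FeatureServer/0"  # placeholder

params = urllib.parse.urlencode({"where": "1=1", "outFields": "*", "f": "json"})
with urllib.request.urlopen(LAYER_URL + "/query?" + params) as response:
    payload = json.load(response)

# Large layers exceed the service's maxRecordCount, so a real client pages through
# results with resultOffset / resultRecordCount (hence the 1/5 ... 5/5 queries in
# the verbose output further down).
rows = [feature["attributes"] for feature in payload.get("features", [])]
print(len(rows), "rows retrieved")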

I've written a Python utility which can:

  • Export feature service data in either the old (Province/State,...) or new (FIPS,...) CSV format
  • Export near real-time updates every 2 minutes, or as slowly as once every 12 hours
  • Choose to skip export if the counts of confirmed cases, deaths, or recoveries don't change in any particular time-slice (sometimes the Last_Update value changes, but the values reported don't)

Attached to this blog post is a zipfile of the jhuEmulator.py utility.  The usage looks like this:

D:\covid-19>python jhuEmulator.py -h
usage: jhuEmulator.py [-h] [--adminLevel {0,1,2}] [--verbose VERBOSE]
                      [--interval INTERVAL] [--folder FOLDER]
                      [--csvFormat CSVFORMAT] [--stopPath STOPPATH]
                      [--skipTrivial SKIPTRIVIAL] [--usOnly USONLY]
                      [--confirmedOnly CONFIRMEDONLY]
                      [--topStates {5,10,15,20,25,30,35,40,45,all}]
                      {IMMEDIATE,DELAY,ONCE}

Emulate JHU COVID-19 data file (v2.0)

positional arguments:
  {IMMEDIATE,DELAY,ONCE}
                        Execution mode

optional arguments:
  -h, --help            show this help message and exit
  --adminLevel {0,1,2}  (default = 2)
  --verbose VERBOSE     Verbose reporting flag (default = False)
  --interval INTERVAL   Data retrieval interval (default = 60m)
  --folder FOLDER       Folder path for data files (default = 'data')
  --csvFormat CSVFORMAT
                        strftime format for data files
  --stopPath STOPPATH   File that indicates loop execution (default =
                        'stop.now')
  --skipTrivial SKIPTRIVIAL
                        Defer writing insignificant changes flag (default =
                        False)
  --usOnly USONLY       Only export US data (default = False)
  --confirmedOnly CONFIRMEDONLY
                        Only export rows with confirmed cases (default =
                        False)
  --topStates {5,10,15,20,25,30,35,40,45,all}
                        Display sorted Confirmed/Deaths by US state (default =
                        0)

Can generate both CSV formats (before/after) 23-Mar-2020

The simplest use is a one-time execution (ONCE) mode:

D:\covid-19>python jhuEmulator.py ONCE
  1910:   2369 rows written (  838061 /  41261 /  175737 )

D:\covid-19\demo>head data\2020-03-31_1910Z.csv
FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key
45001,Abbeville,South Carolina,US,2020-03-31 18:31:52,34.223334,-82.461707,3,0,0,0,"Abbeville, South Carolina, US"
22001,Acadia,Louisiana,US,2020-03-31 18:31:52,30.295065,-92.414197,11,1,0,0,"Acadia, Louisiana, US"
51001,Accomack,Virginia,US,2020-03-31 18:31:52,37.767072,-75.632346,7,0,0,0,"Accomack, Virginia, US"
16001,Ada,Idaho,US,2020-03-31 18:31:52,43.452658,-116.241552,163,3,0,0,"Ada, Idaho, US"
19001,Adair,Iowa,US,2020-03-31 18:31:52,41.330756,-94.471059,1,0,0,0,"Adair, Iowa, US"
29001,Adair,Missouri,US,2020-03-31 18:31:52,40.190586,-92.600782,1,0,0,0,"Adair, Missouri, US"
40001,Adair,Oklahoma,US,2020-03-31 18:31:52,35.884942,-94.658593,4,0,0,0,"Adair, Oklahoma, US"
08001,Adams,Colorado,US,2020-03-31 18:31:52,39.874321,-104.336258,152,0,0,0,"Adams, Colorado, US"
17001,Adams,Illinois,US,2020-03-31 18:31:52,39.988156,-91.187868,2,0,0,0,"Adams, Illinois, US"

The default is --adminLevel=2 (new-style), but levels 1 and 0 are also supported (zero is the same format as one, but without any state/province data for the US, Canada, China, or Australia):

D:\covid-19>python jhuEmulator.py ONCE --adminLevel=1
  1911:    315 rows written (  838061 /  41261 /  175737 )

D:\covid-19>head data\2020-03-31_1910Z.csv
Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Latitude,Longitude
,Italy,2020-03-31T18:31:40,105792,12428,15729,41.871940,12.567380
,Spain,2020-03-31T18:31:40,94417,8269,19259,40.463667,-3.749220
New York,US,2020-03-31T18:31:52,75795,1550,0,42.165726,-74.948051
,Germany,2020-03-31T18:31:40,68180,682,15824,51.165691,10.451526
Hubei,China,2020-03-31T01:06:37,67801,3187,63153,30.975600,112.270700
,France,2020-03-31T18:31:40,52128,3523,9444,46.227600,2.213700
,Iran,2020-03-31T18:31:40,44605,2898,14656,32.427908,53.688046
,United Kingdom,2020-03-31T18:31:40,25150,1789,135,55.378100,-3.436000
New Jersey,US,2020-03-31T18:31:52,17126,198,0,40.298904,-74.521011

D:\covid-19>python jhuEmulator.py ONCE --adminLevel=0
  1911:    180 rows written (  838061 /  41261 /  175737 )

Note that successive executions in the same time window will overwrite the output file!

The IMMEDIATE and DELAY options are nearly the same, except that IMMEDIATE makes a new snapshot without delay, while DELAY mode only operates at regular intervals (all with a 1-20 second random delay, to prevent slamming the service with synchronized queries).  The next two quote blocks were collected from two sessions running at the same time (in different directories):

 

D:\Projects\covid-19>python jhuEmulator.py IMMEDIATE --interval=15m --adminLevel=1
  1917:    315 rows written (  838061 /  41261 /  175737 )
  1930:    315 rows written (  838061 /  41261 /  175737 )
  1945:    315 rows written (  838061 /  41261 /  175737 )
  2000:    315 rows written (  846156 /  41494 /  176171 )
  2015:    315 rows written (  846156 /  41494 /  176171 )
  2030:    315 rows written (  846156 /  41494 /  176171 )
  2045:    315 rows written (  846156 /  41494 /  176171 )
  2100:    315 rows written (  846156 /  41494 /  176171 )
  2115:    315 rows written (  846156 /  41494 /  176171 )
  2130:    315 rows written (  850583 /  41654 /  176714 )
  2145:    315 rows written (  850583 /  41654 /  176714 )
  2200:    315 rows written (  850583 /  41654 /  176714 )
  2215:    315 rows written (  850583 /  41654 /  176714 )
  2230:    315 rows written (  855007 /  42032 /  177857 )

D:\covid-19>python jhuEmulator.py DELAY --interval=15m --adminLevel=1 --skipTrivial=True
  1930:    315 rows written (  838061 /  41261 /  175737 )
  2000:    315 rows written (  846156 /  41494 /  176171 )
  2130:    315 rows written (  850583 /  41654 /  176714 )
  2230:    315 rows written (  855007 /  42032 /  177857 )

Note that the --skipTrivial=True flag is the mechanism to skip data export if no significant changes have occurred (changes to the Last_Update field alone will not write a new file, but changes to any case count field will).

Assembling the new-style data format is tricky, because it needs to aggregate thousands of records from two different services (and remove the US duplicates from the ncov_cases service).  The verbose mode gives an indication of what's happening:

D:\covid-19>python jhuEmulator.py ONCE --verbose=True

Querying 'ncov_cases_US' service (1/5)...
   382.2 KB retrieved (476ms elapsed)
Querying 'ncov_cases_US' service (2/5)...
   281.9 KB retrieved (250ms elapsed)
Querying 'ncov_cases_US' service (3/5)...
   200.1 KB retrieved (176ms elapsed)
Querying 'ncov_cases_US' service (4/5)...
   262.2 KB retrieved (204ms elapsed)
Querying 'ncov_cases_US' service (5/5)...
   169.5 KB retrieved (161ms elapsed)
Querying 'ncov_cases' service...
   163.6 KB retrieved (169ms elapsed)
Creating datafile '2020-03-31_1936Z.csv'...
     2369 rows written (  838061 /  41261 /  175737 )

For the US-centric audience, I added a --topStates flag (which must be used with --verbose=Y) to print a summary:

D:\covid-19>python jhuEmulator.py ONCE --verbose=True --topStates=15

Querying 'ncov_cases_US' service (1/5)...
   390.0 KB retrieved (429ms elapsed)
Querying 'ncov_cases_US' service (2/5)...
   286.2 KB retrieved (168ms elapsed)
Querying 'ncov_cases_US' service (3/5)...
   200.7 KB retrieved (189ms elapsed)
Querying 'ncov_cases_US' service (4/5)...
   268.2 KB retrieved (163ms elapsed)
Querying 'ncov_cases_US' service (5/5)...
   172.5 KB retrieved (157ms elapsed)

                      ===== Top 15 States =====

                State     Confirmed   Deaths  Counties
                New York     75798     1550        56
              New Jersey     18696      267        22
              California      8077      163        48
                Michigan      7615      259        69
           Massachusetts      6620       89        14
                 Florida      6338       77        54
                Illinois      5994       99        54
              Washington      5305      222        35
               Louisiana      5237      239        61
            Pennsylvania      4963       63        60
                 Georgia      3815      111       138
                   Texas      3726       53       131
             Connecticut      3128       69         9
                Colorado      2627       51        48
               Tennessee      2391       23        82
                 *Others     25475      462      1247

Querying 'ncov_cases' service...
   163.6 KB retrieved (132ms elapsed)
Creating datafile '2020-03-31_2220Z.csv'...
     2405 rows written (  855007 /  42032 /  177857 )

D:\covid-19>python jhuEmulator.py ONCE --verbose=True --topStates=15 --adminLevel=1

Querying 'ncov_cases' service...
   163.6 KB retrieved (442ms elapsed)

                 ===== Top 15 States =====

                State     Confirmed   Deaths
                New York     75798     1550
              New Jersey     18696      267
              California      8077      163
                Michigan      7615      259
           Massachusetts      6620       89
                 Florida      6338       77
                Illinois      5994       99
              Washington      5305      222
               Louisiana      5237      239
            Pennsylvania      4963       63
                 Georgia      3815      111
                   Texas      3726       53
             Connecticut      3128       69
                Colorado      2627       51
               Tennessee      2391       23
                 *Others     25475      462

Creating datafile '2020-03-31_2221Z.csv'...
      315 rows written (  855007 /  42032 /  177857 )

The format difference is due to the fact that Admin2 reporting is by county, but doesn't populate Recovered and Active, while the Admin1 reporting includes Recovered (which would allow Active to be computed, but Recovered is now a lump-sum record in the ncov_cases service, so in practice, Active and Recovered are not available).

So, how do I know this script produces the same data shared on GitHub?  Well, I wrote a validator utility that creates two CovidSummary objects, then iterates over one, searching the other for matching keys and reporting missing rows, data mismatches, and unmatched rows.  The output was:

D:\covid-19>python validate.py
========================================================================

0 errors / 3429 lines

Okay, I cheated a bit: since the daily JHU CSV reports that the Diamond Princess and Grand Princess cruise ships are docked on Null Island, I treated 0.0 degrees latitude/longitude as a wildcard that matches any coordinate, and I only compare coordinates to 5 places (since string comparison doesn't work well with floating-point values), but everything else has aligned perfectly.
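
For the curious, the comparison logic boils down to something like this minimal sketch (not the attached validator; the file names are placeholders and only three count fields are checked):

import csv

def load(path, key="Combined_Key"):
    with open(path, newline="", encoding="utf-8") as f:
        return {row[key]: row for row in csv.DictReader(f)}

jhu = load("data/2020-03-31.csv")          # published JHU snapshot
mine = load("data/2020-03-31_2359Z.csv")   # emulator output

errors = 0
for key, row in jhu.items():
    other = mine.get(key)
    if other is None:
        errors += 1        # missing row
    elif any(row[c] != other[c] for c in ("Confirmed", "Deaths", "Recovered")):
        errors += 1        # data mismatch
print(errors, "errors /", len(jhu), "lines")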

UPDATE @ 2100 EST: Unfortunately, sometime today, the US Admin2 jurisdictions without any confirmed cases were deleted from the ncov_cases_US service feed, the FIPS code displayed for the Northern Mariana Islands and US Virgin Islands disappeared, and the US territory without any confirmed cases (American Samoa) also disappeared (along with its FIPS code).  I've tweaked the exporter to conform to this, and it runs cleanly now:

D:\covid-19>python validate.py
========================================
sum1 = data2\2020-03-31.csv
sum2 = data2\2020-04-01_0110Z.csv

0 errors / 2434 lines

But I'm expecting Guam to have its FIPS code removed in the coming days.

Since the validation tool implements Python classes to parse and search both old-style and new-style CSV files, I've attached that as well (as validator-v10.zip).

My next task is to write some code to exploit this near real-time data resource, and use it to maintain PostgreSQL tables to produce clones of the ncov_cases* services' data.

Now attached is jhuEmulator-v20.zip -- Changelog:

  • Implemented adminLevel=2 with multiple queries to replicate
    three-level results in GitHub CSV files
  • Added --confirmedOnly flag to skip counties without
    Confirmed cases (obviated by feed change)
  • Added --topStates to report ordered state-wide impact
    (requires --verbose=True and non-zero --adminLevel)
  • Added FIPS lookup for US Possessions (Note: Puerto Rico = None)
  • Ignored US admin1 summary values in admin2 feed (2-digit FIPS)
  • Added validateKey to correct Combined_Key with missing spaces
    after commas
  • Removed FIPS from US possessions and suppressed US possessions
    without confirmed cases (as per 2020-03-31.csv)

== Update 01-Apr @ 0100 EST ==

Long-running service execution is great for finding bugs...

Now attached is jhuEmulator-v21.zip -- Changelog:

  • Fixed significantChange field comparison error with Admin1
    field names in Admin2 service

== Update 01-Apr @ 0920 EST ==

Overnight service execution is great for finding bugs, but errors that arise after several hours are less fun...

Now attached is jhuEmulator-v22.zip -- Changelog:

  • Fixed significantChange field comparison error for *all*
    Admin1 field names in the Admin2 service
  • Fixed paren alignment defect in error handling in pullData

Also attached is covidValidate_v11.zip -- Changelog:

  • Tightened CovidReport.compare() loop
  • Fixed range check defect in main

== Update 01-Apr @ 2030 EST ==

The server glitched, dropping all US data, and I got to exercise some error code that doesn't normally see traffic, which I've now tweaked to be more resilient.

Attached is jhuEmulator-v23.zip -- Changelog:

  • Fixed TypeError when service result is empty (None)

== Update 08-Apr @ 1600 EST ==

I tweaked the exporter to always write data at UTC midnight, so that the validator has the same Last_Update value for comparison with the JHU published file. I had to improve date parsing in the CovidReport class (the corrected 2020-04-06.csv has a different date format). I'm still seeing some JHU data with incorrect 000xx FIPS codes, and the Admin1 and Admin2 feeds are out of sync for the Northern Mariana Islands, but otherwise looking good.

Attached is jhuEmulator-v24.zip -- Changelog:

  • Tweak to force 0000Z download, even when --skipTrivial is
    enabled (for comparison with GitHub published files)

Also attached is covidValidate_v12.zip -- Changelog:

  • Cleaned up CovidReport.extract() to handle bad formatting
    gracefully and to support the 'm/d/y HH:MM' date used in
    the revised 2020-04-06.csv datafile
  • Loosened the Last_Update comparison code so that second-
    truncated timestamps can compare successfully

 

== Update 12-Apr @ 2320 EST ==

The JHU daily CSV format changed again, adding five more fields (and moving FIPS later in the display order). I've got the covidValidate.py handling the new format, but am not yet generating it...

Attached is covidValidate_v30.zip -- Changelog:

  • Count, but don't display, Combined_Key mismatches
  • Added parsing (but not yet validation) of fields added
    to 04-12-2020.csv

- V
