
Reporting data quality using Attribute Rules and Python in ArcGIS Pro 3.3

BaileyAOhlson
Esri Contributor
07-10-2024 12:52 PM

Summary

In 2021, the Data Reviewer team authored a Python Toolbox that leverages ArcPy and OpenPyXL to generate an Excel data quality report by extracting data from attribute rule errors and feature classes. In the 2024 release of ArcGIS Pro 3.3, this Python script has been updated to support not only file but also mobile geodatabases and to report data quality metrics for Visual Review Rules, among other enhancements. This reporting tool can be leveraged to quickly assess data quality and inform stakeholders or other non-GIS professionals using an easy-to-read table report. 

This blog post explains the updates made to the data quality reporting script and defines key terms used in the report. To learn more about how data is extracted from the feature classes and error layers in your geodatabases, please visit this blog post detailing our methodology. 

 

Updates to the Data Quality Reporting Script 

The original data quality reporting script was written for the ArcGIS Pro 3.0 release. Since then, ArcGIS Data Reviewer has added new functionality, and with this update the stand-alone script and Python Toolbox now incorporate it. The updates and enhancements are as follows: 

  • The reporting script now accepts mobile geodatabases in addition to file geodatabases. 
  • Find Polygons with Holes check is accounted for in the Automated Review Rules error report. 
  • Visual Review errors have been added to the report in a separate worksheet. 
  • Users can choose the level of detail they want to see in the Visual Review report by choosing to hide or display the severity rows. 
  • With the additional .xml file, users of the Python Toolbox can access tool tips for the Data Quality Report tool.

Output from running the Data Quality Report script from the stand-alone script or from the Python Toolbox.

 

Automated Review vs Visual Review in the Data Quality Report

Since Semiautomated Review workflows behave slightly differently from Automated Review workflows, their data quality report is stored separately and uses different metrics to summarize your data. The Visual Review Report tab stores the Semiautomated Review data quality summary while the Automated Check Report stores the Automated Review data quality summary. 

With the evaluation of Attribute Rules, at maximum, each unique rule can flag one error per feature in a feature class. Therefore, the number of errors associated with a unique Attribute Rule cannot exceed the number of features in a feature class. However, when a user commits Visual Review errors in a Semiautomated Review workflow, the errors written are not necessarily associated with a singular feature from a feature class. Errors written using the Browse Features tool can have multiple Browse Features errors written to the same feature, and errors written using the Flag Missing Features tool are not associated with any specific feature from a feature class.

Notice how there are two Browse Features errors written to Feature ObjectID 20 in the USAStates layer. Also, notice how the Feature ObjectID and Feature GlobalID for the Flag Missing Features Error are 0.

 

Automated Review Attribute Rules 

Since Automated Review Rules checks are tied to features in a feature class, the Data Quality Report worksheet for Automated Review reports data quality for each check based on the Validation Status field of the feature itself and the Error Status field of the error layers. The Automated Check Report reports the following data quality metrics: 

Column Header 

Description 

Total Validated Features 

The number of features in a feature class that have been evaluated. 

  • Have Validation Statuses where no validation is required (codes 0, 1, 4, and 5).

Total Error Features 

The number of features that are in error for a given Attribute Rule.  

  • Must have Error Statuses of Reviewed or Unacceptable (codes 1 or 6 respectively).  
  • Error features that have Error Statuses of Acceptable, Resolved, or Exception are not included in this count. 

Percentage Accuracy 

(1 - (Total Error Features / Total Number of Features)) x 100

Total Unvalidated Features 

The number of features in a feature class that have not yet been evaluated. 

  • Have Validation Statuses where validation is required (codes 2, 3, 6, and 7).

Percent Unvalidated 

 (Total Unvalidated Features / Total Number of Features) x 100

* Does not include records associated with Visual Review errors 
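As a plain-Python sketch of the arithmetic above (this is an illustration of the metric definitions, not the shipped script, and the counts used are hypothetical):

```python
def automated_review_metrics(total_features, total_validated, total_error):
    """Compute the Automated Check Report metrics described above.

    total_features  -- all features in the feature class
    total_validated -- features whose Validation Status requires no
                       validation (codes 0, 1, 4, and 5)
    total_error     -- features with Error Status Reviewed (1) or
                       Unacceptable (6)
    """
    total_unvalidated = total_features - total_validated
    percentage_accuracy = (1 - total_error / total_features) * 100
    percent_unvalidated = (total_unvalidated / total_features) * 100
    return total_unvalidated, percentage_accuracy, percent_unvalidated

# For example, a class with 200 features, 150 validated, and 10 in error
# yields 50 unvalidated features, 95.0% accuracy, and 25.0% unvalidated.
```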

 

Semi-Automated Review – Visual Review  

Since Visual Review checks are not necessarily tied to a feature in a feature class, the Data Quality Report worksheet for Visual Review reports slightly different metrics for data quality based solely on the Error Status fields from the error layers. The Visual Review Report worksheet reports the following data quality metrics: 

Column Header 

Description 

Total Error Records 

The total number of Visual Review error records found. 

Total Errors Unresolved

The total number of errors that have Error Statuses of Reviewed or Unacceptable (codes 1 or 6 respectively). 

Total Errors Resolved 

The total number of errors that have Error Statuses of Resolved, Acceptable, or Exception (codes 2, 4, or 9 respectively). 

Percentage Resolved 

 

(Total Errors Resolved / Total Error Records) x 100

* Does not include records associated with Attribute Rule errors 
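The Visual Review metrics can be sketched the same way from a list of Error Status codes (again, a minimal illustration rather than the shipped script; the input list is hypothetical):

```python
UNRESOLVED_CODES = {1, 6}   # Reviewed, Unacceptable
RESOLVED_CODES = {2, 4, 9}  # Resolved, Acceptable, Exception

def visual_review_metrics(error_statuses):
    """Summarize Visual Review error records by their Error Status codes."""
    total = len(error_statuses)
    unresolved = sum(1 for s in error_statuses if s in UNRESOLVED_CODES)
    resolved = sum(1 for s in error_statuses if s in RESOLVED_CODES)
    percentage_resolved = (resolved / total) * 100 if total else 0.0
    return total, unresolved, resolved, percentage_resolved

# Example: five error records, three of which are resolved
# visual_review_metrics([1, 6, 2, 4, 9]) -> (5, 2, 3, 60.0)
```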

 

Script Instructions & Download 

The Data Quality Reporting scripts can be downloaded from this blog and run as a Python Toolbox in ArcGIS Pro (DataReviewerDataQuality.pyt) or from the command line interface (CLI) using the stand-alone Python script (AttributeRulesReport.py). The following CLI options are supported when calling the Python script: 

Option 

Name 

Description 

Required/Optional 

-h or --help 

Help 

Show this help message and exit 

Optional 

-i 

INPUT 

File location of your geodatabase containing attribute rules (only FGDB and MGDB are currently supported) 

Required 

-o 

OUTPUT 

File location to save the data quality report .xlsx file 

Required 

-hide {True,False} 

HIDDEN 

Enter True/False to choose if you would like the severity rows hidden from the Visual Review Rules worksheet 

Required 

-rn 

REPORTNAME 

Enter the name you would like to give the data quality report, otherwise the default format will be used (AttributeRulesReport_MMDDYYYY-TTTTTT) 

Optional 

 

List of required and optional inputs in Command Prompt.
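Putting the options above together, a typical invocation of the stand-alone script might look like the following (the geodatabase path, output folder, and report name are hypothetical placeholders):

```shell
python AttributeRulesReport.py -i "C:\Data\Parcels.gdb" -o "C:\Reports" -hide False -rn ParcelsQualityReport
```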

Python Toolbox UI in ArcGIS Pro 3.3. Note: The tooltips and metadata will not appear for the custom Python Toolbox Data Quality Report tool unless the DataReviewerDataQualityReport.ReviewerErrorReport.pyt.xml is stored in the same folder as the DataReviewerDataQuality.pyt.

To learn more about how ArcPy and OpenPyXL were used to write this script please read this blog post detailing our methodology. 

Contents of DQReport_Pro3.3.zip: 

  • AttributeRulesReport.py – The stand-alone Python script.
  • DataReviewerDataQuality.pyt – The Python Toolbox. 
  • DataReviewerDataQualityReport.ReviewerErrorReport.pyt.xml – Contains the metadata and tooltips for the Python Toolbox. Must be stored in the same folder as DataReviewerDataQuality.pyt for tool tips to appear.
2 Comments
SumitMishra_016
New Contributor III

Hi @BaileyAOhlson ,

If a geodatabase contains feature datasets, the code will throw an error at line 660 in your script tool:

# Check for VALIDATIONSTATUS field in dataset
for field in arcpy.ListFields(feature_class):  # this line will throw an error
    if "VALIDATIONSTATUS" in field.name.upper():

To correct this, ensure you provide the full path to the feature class as shown below:

[Screenshot: code passing the full feature class path to arcpy.ListFields]

 

BaileyAOhlson
Esri Contributor

Hi @SumitMishra_016,

Thank you for reaching out with your observation and suggestion. I was not able to reproduce an error message from the script when using feature datasets. Perhaps you can share the specific error message thrown when you run the script to help me pinpoint the issue you are observing?

Regardless, in trying to replicate your error I did observe an edge case issue with the report that occurs when two feature layers in a database are given the same alias. I have updated both scripts posted in this blog to utilize full file pathways and fix the edge case observation. I hope this update resolves the issue you are observing.

About the Author
Bailey Ohlson is a Product Engineer on the Data Reviewer team in the Esri Professional Services R&D Center. In her role, she develops and maintains QA/QC automation scripts, identifies software requirements, defines software test cases, and validates software repairs. Bailey earned a BS in Geology from Texas A&M University and a MA in Geography from the University of Texas at Austin.