Help writing a Python script to validate domain use (in ArcGIS 10)

09-19-2012 09:12 AM
DianaUmpierre
New Contributor II
Ok, I could use help coming up with the best way to write a Python script to identify table fields that violate domain values. I have over 100 feature classes using about 60 domains in a file geodatabase to check. My feature classes are NOT using subtypes (nor am I planning to use them in the near future).

Here are my limitations:
> No, we can't afford Data Reviewer extension at this time (which I know has functionality to do this).
> I'm still in ArcGIS 10 and it doesn't look like my organization will upgrade to 10.1 until "maybe" early 2013.
> No, using ArcSDE is not an option at this time. (Nor does our IT group allow me to use personal SQL server instances). So, I'm stuck with using File geodatabases and my geodb is too big for a personal (Access) geodb.


Below are some of my thoughts, but I'm wondering if there's something simpler I could do.
> I already generated a script that exports all the domains to tables in a file geodb. E.g., a domain called CodeStatus is exported to a table called "tb_CodeStatus". (Let's say these are my lookup tables.)
> I could generate a table that stores the feature class, field and domain being used. (Let's pretend I did that and the table is called "tb_DomainUsage".)

I'm just not sure of the best way to do what I want next.
> Should I cycle thru each row in the tb_DomainUsage table, run something like the Frequency tool to create a temp table of the unique values, and then use that temp table to see if any of those values do NOT exist in the corresponding lookup table for that domain? And store the validation results in another table, similar to tb_DomainUsage, but listing the domain values that did not match?
> Other ideas... My idea above seems like a lot of steps to try to come up with a gp tool that I wish ESRI had thought of in the first place.
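The core of the check being described here -- comparing the distinct values found in a field against a domain's coded values -- can be sketched in plain Python. The lists below are hypothetical stand-ins for what a cursor or an exported lookup table would return:

```python
def find_invalid_values(field_values, domain_codes):
    """Return the sorted distinct field values that are not valid domain codes."""
    # set() collapses duplicates, much like a SELECT DISTINCT query
    distinct = set(field_values)
    # set difference keeps only the values missing from the domain
    return sorted(distinct - set(domain_codes))

# Hypothetical example: values pulled from a feature class field,
# codes exported from a CodeStatus-style coded-value domain
values = ["A", "A", "B", "X", "B", "Z"]
codes = ["A", "B", "C"]
print(find_invalid_values(values, codes))  # ['X', 'Z']
```

The same comparison works no matter how the values were gathered (cursor, Frequency output, or a lookup table), which is why the rest of the thread focuses on how to collect the distinct values efficiently.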

Thanks...
4 Replies
JakeSkinner
Esri Esteemed Contributor
Hi Diana,

Here is a script that I was able to get to work:

import arcpy
from arcpy import env
env.overwriteOutput = 1
env.workspace = r"C:\temp\python\test.gdb"

lstFCs = arcpy.ListFeatureClasses("*")
for fc in lstFCs:
    lstFields = arcpy.ListFields(fc)
    for field in lstFields:
        if field.domain:
            print fc + ": " + field.name + ": " + field.domain
            arcpy.DomainToTable_management(env.workspace, field.domain, "tb_" + field.domain, "code", "description")
            rows = arcpy.SearchCursor(fc, "", "", field.name)
            listValues = []
            for row in rows:
                fieldValues = row.getValue(field.name)
                listValues.append(fieldValues)
            del row, rows
            rows = arcpy.SearchCursor("tb_" + field.domain, "", "", "code")
            listCodes = []
            for row in rows:
                fieldCodes = row.getValue("code")
                listCodes.append(fieldCodes)
            del row, rows
            for n in listValues:
                if n not in listCodes:
                    # str() in case the domain codes are numeric
                    print fc + " contains field " + field.name + " with domain " + field.domain + " containing value " + str(n) + " that is not a coded value"


lstTables = arcpy.ListTables("tb_*")
for table in lstTables:
    arcpy.Delete_management(table)


What the script does is create a table for any domain found in use by a feature class within the geodatabase. It then appends the field values to a list and compares them to the codes in the table previously created. If a field contains a value that is not a coded domain value, it reports the feature class, field name, domain, and the offending value.
DianaUmpierre
New Contributor II
Hi Jake, thanks for your help. Using cursors and Python lists is a good idea. I got your code ALMOST to work, but let me explain what I had to do and what I still need to figure out.

Basically, a lot of my fc's are inside feature datasets, so I need to cycle thru those. I added that and it works OK. Then I realized I needed to exclude nulls and zero-length strings from the results, which I did by adding a SQL statement to the SearchCursor and then, in the for-row loop, telling it not to append values that are zero-length strings, using an expression I found in another forum post: len(fieldValues.strip()) != 0.

The code itself works, but I still need to figure out how to avoid duplicate messages for invalid domain values. Since the first cursor (the one that populates listValues) goes thru each row, if an invalid domain value is used more than once, I get duplicate messages (and I have a lot of that in my feature classes, as they were populated by consultants with limited experience using geodatabases). I suppose I could test whether a value already exists in listValues before appending it, but that seems like a lot of repeated checking of the same value as it cycles row to row. Instead, what I think might be "cleaner" and faster is to populate listValues the way a SELECT DISTINCT query would in SQL; I'm just not sure how to do that in ArcGIS Python code. The other idea is to extract the results of the Frequency gp tool, but I think Frequency only exports to actual tables, not virtual tables like table views, so I would have to add and delete lots of temp tables in the process.
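For what it's worth, the "check before appending" worry about repeated lookups goes away if the values are collected in a Python set instead of a list: set membership tests are hash-based and effectively constant-time, whereas `value in someList` scans the whole list each time. A minimal sketch, with a plain list standing in for the cursor rows:

```python
# Hypothetical rows, standing in for what a SearchCursor would yield
rows = ["A", "B", "A", "X", "B", "X", "X"]

seen = set()          # set membership/insertion is O(1) on average
for value in rows:
    seen.add(value)   # adding a duplicate is a harmless no-op

# 'seen' now holds each value exactly once, like SELECT DISTINCT
print(sorted(seen))   # ['A', 'B', 'X']
```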

Any other thoughts? Below is the code with the modifications I made so far. Besides figuring out how to avoid dup messages, I need to re-modify the code so it also cycles thru fc's that are NOT inside datasets, but that part is easy.

thanks again...

import arcpy
from arcpy import env
env.overwriteOutput = 1
gdb = "C:/MyTempTossDir/test.gdb"
env.workspace = gdb

lstFDs = arcpy.ListDatasets("*", "Feature")
for fd in lstFDs:
    env.workspace = gdb + "/" + fd
    lstFCs = arcpy.ListFeatureClasses("*")
    for fc in lstFCs:
        lstFields = arcpy.ListFields(fc)
        for field in lstFields:
            if field.domain:
                print fc + ": " + field.name + ": " + field.domain
                arcpy.DomainToTable_management(gdb, field.domain, gdb + "/" + "tb_" + field.domain, "code", "description")
                rows = arcpy.SearchCursor(fc, field.name + " IS NOT NULL", "", field.name)
                listValues = []
                for row in rows:
                    fieldValues = row.getValue(field.name)
                    if len(fieldValues.strip()) != 0:
                        listValues.append(fieldValues)
                del row, rows
                rows = arcpy.SearchCursor(gdb + "/" + "tb_" + field.domain, "", "", "code")
                listCodes = []
                for row in rows:
                    fieldCodes = row.getValue("code")
                    listCodes.append(fieldCodes)
                del row, rows
                for n in listValues:
                    if n not in listCodes:
                        print fc + " contains field " + field.name + " with domain " + field.domain + " containing value " + str(n) + " that is not a coded value"
DianaUmpierre
New Contributor II
Ok, I got it to work. I found a really easy solution: to ensure my list has only unique values, I put the list into a Python set and then back into a list (and then sorted it). It sounds silly, but it works fast, and I avoided having to create a bunch of temp tables each time I ran the Frequency tool on a particular field. (My thanks to JSkinn3 🙂 for getting me unstuck; after seeing his example with cursors and Python lists, I was able to figure out the rest.)

Although the code below works perfectly for me and meets my immediate needs, I recommend doing a few extra things.

> This code doesn't take into account feature classes with subtypes. I am not fond of subtypes, but that's just me. Since I don't use them, I didn't add any code to test domains in subtypes. You are welcome to enhance the code and add that test.
> In reality, I already have a fgdb with all the domains as lookup tables. That's because I have a fgdb with over 100 feature classes using over 60 domains (and I plan to maintain the lookup tables, not the domains). The code below recreates all of those tables and then deletes them; if you have a lot of domains, re-doing that process in the code is overkill. I plan to modify the code later so I can do the comparison against the lookup tables.
> The code can be easily modified to export the results into a table in your geodb or another workspace. I opted to just export the invalid values to a log file, using tabs to make it easy to copy into Excel.
> There's minimal error handling in this. My intent was not to create a stand-alone gp tool for sharing, but a quick way to do some QA checks, since a lot of our data is maintained by consultants.
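As a side note, the same tab-delimited layout can be produced with Python's built-in csv module, which handles the delimiters for you. A small sketch with made-up result rows, written to an in-memory buffer here (a real file object opened with open() works the same way):

```python
import csv
import io

# Hypothetical result rows: (dataset, feature class, field, domain, invalid value)
results = [
    ("Geodetic", "AirportControlPoint", "GPSSUIT", "CodeBoolean", "S"),
]

buf = io.StringIO()  # stand-in for a log file object
writer = csv.writer(buf, delimiter="\t")
writer.writerow(["Dataset", "FeatureClass", "Field", "Domain", "InvalidValue"])
writer.writerows(results)

print(buf.getvalue())
```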

Using my test.gdb, the code populates a text file with the following:

******************************************************************
Checking for Invalid Domain Values in Feature Classes Inside Datasets
******************************************************************
Dataset FeatureClass Field Domain InvalidValue
Geodetic AirportControlPoint GPSSUIT CodeBoolean S
Geodetic AirportControlPoint SPCSZONE CodeCoordinateZone FLHP_EF
******************************************************************
Checking for Invalid Domain Values in Standalone Feature Classes
******************************************************************
FeatureClass Field Domain InvalidValue

-------------------------------------
Below is my working code:

#==========================
#**** Check_InvalidDomains *****
# This code creates a text log to identify feature classes
# that have fields that violate domain values
#---------------------------------------------------
# Limitations/ What to Improve:
# > It does not account for subtypes.
# > There's always room to improve error handling
#==========================
import arcpy, os
from arcpy import env

banner = "******************************************************************"
basepath = "C:/MyTempTossDir"
gdb = basepath + "/test.gdb"
env.overwriteOutput = 1

try:

    env.workspace = gdb

    logbasepath = basepath
    logname = "GeoDB_DomainValueCheck.txt"
    logpath = logbasepath + "/" + logname
    if os.path.exists(logpath):
        os.remove(logpath)
    logfile = open(logpath, "w")

    #==========================
    logfile.write(banner + "\n")
    msgline = "Checking for Invalid Domain Values in Feature Classes Inside Datasets"
    logfile.write(msgline + "\n")
    logfile.write(banner + "\n")
    print banner
    print msgline

    msgline = "Dataset" + "\t" + "FeatureClass" + "\t" + "Field" + "\t" + "Domain" + "\t" + "InvalidValue"
    logfile.write(msgline + "\n")

    lstFDs = arcpy.ListDatasets("*", "Feature")
    for fd in lstFDs:
        env.workspace = gdb + "/" + fd
        lstFCs = arcpy.ListFeatureClasses("*")
        for fc in lstFCs:
            lstFields = arcpy.ListFields(fc)
            for field in lstFields:
                if field.domain:
                    arcpy.DomainToTable_management(gdb, field.domain, gdb + "/" + "tb_" + field.domain, "code", "description")
                    rows = arcpy.SearchCursor(fc, field.name + " IS NOT NULL", "", field.name)
                    listValues = []
                    for row in rows:
                        fieldValues = row.getValue(field.name)
                        if len(fieldValues.strip()) != 0:
                            listValues.append(fieldValues)
                    del row, rows
                    # set() removes duplicates, like a SELECT DISTINCT
                    listUniqValues = list(set(listValues))
                    listUniqValues.sort()
                    rows = arcpy.SearchCursor(gdb + "/" + "tb_" + field.domain, "", "", "code")
                    listCodes = []
                    for row in rows:
                        fieldCodes = row.getValue("code")
                        listCodes.append(fieldCodes)
                    del row, rows
                    if len(listUniqValues) > 0:
                        for n in listUniqValues:
                            if n not in listCodes:
                                msgline = fd + "\t" + fc + "\t" + field.name + "\t" + field.domain + "\t" + n
                                logfile.write(msgline + "\n")
                                msgline = fd + "/" + fc + " contains field " + field.name + " with domain " + field.domain + " with the following invalid domain value: " + n
                                print msgline

    #==========================
    logfile.write(banner + "\n")
    msgline = "Checking for Invalid Domain Values in Standalone Feature Classes"
    logfile.write(msgline + "\n")
    logfile.write(banner + "\n")
    print banner
    print msgline

    msgline = "FeatureClass" + "\t" + "Field" + "\t" + "Domain" + "\t" + "InvalidValue"
    logfile.write(msgline + "\n")

    env.workspace = gdb
    lstFCs = arcpy.ListFeatureClasses("*")
    for fc in lstFCs:
        lstFields = arcpy.ListFields(fc)
        for field in lstFields:
            if field.domain:
                arcpy.DomainToTable_management(env.workspace, field.domain, "tb_" + field.domain, "code", "description")
                rows = arcpy.SearchCursor(fc, field.name + " IS NOT NULL", "", field.name)
                listValues = []
                for row in rows:
                    fieldValues = row.getValue(field.name)
                    if len(fieldValues.strip()) != 0:
                        listValues.append(fieldValues)
                del row, rows
                listUniqValues = list(set(listValues))
                listUniqValues.sort()
                rows = arcpy.SearchCursor("tb_" + field.domain, "", "", "code")
                listCodes = []
                for row in rows:
                    fieldCodes = row.getValue("code")
                    listCodes.append(fieldCodes)
                del row, rows
                if len(listUniqValues) > 0:
                    for n in listUniqValues:
                        if n not in listCodes:
                            msgline = fc + "\t" + field.name + "\t" + field.domain + "\t" + n
                            logfile.write(msgline + "\n")
                            msgline = fc + " contains field " + field.name + " with domain " + field.domain + " with the following invalid domain value: " + n
                            print msgline

    # clean up the temporary domain tables
    env.workspace = gdb
    lstTables = arcpy.ListTables("tb_*")
    for table in lstTables:
        arcpy.Delete_management(table)

    del lstFCs
    del lstFDs
    del lstFields
    del listValues
    del listUniqValues
    del listCodes

    # ISSUE: RefreshCatalog is not working, I don't know why.. Kept this to remind me to figure it out later...
    arcpy.RefreshCatalog(env.workspace)

except:
    print arcpy.GetMessages(2)
finally:
    env.overwriteOutput = ""
    env.workspace = ""
    logfile.close()
HarshalGharat
New Contributor III

Diana Umpierre, I'm worried that you missed including a check for whether a particular domain is a Range type or a Coded Values type. Without that check, your script will flag valid values as invalid for range domains. For example, say you have a range domain on a field running from 500 to 1000: the values in that field may be anything between 500 and 1000, but when you export a range domain, the exported table has only two rows (500 and 1000), so every value except 500 and 1000 will be marked as invalid.

In short, once you identify that a particular domain is a Range domain, you simply need an if statement checking whether the given value falls within that domain's range.
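The range check described above comes down to an inclusive bounds test, sketched here in plain Python with hypothetical min/max values (how you obtain them depends on your ArcGIS version; arcpy.da.ListDomains, available at 10.1+, exposes a domain's type and range):

```python
def is_valid_for_range(value, range_min, range_max):
    """Range domains allow any value between min and max, inclusive."""
    return range_min <= value <= range_max

# Hypothetical range domain from 500 to 1000
print(is_valid_for_range(750, 500, 1000))   # True  (in between)
print(is_valid_for_range(500, 500, 1000))   # True  (boundary values are valid)
print(is_valid_for_range(1200, 500, 1000))  # False (outside the range)
```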

Cheers!
