Parse Invalid Characters Arcpy

06-24-2015 08:32 AM
RusselKlueg
New Contributor III

Hi all,

I'm back at it again. I've written a script to iterate through an MXD and export all features within 50 miles of my state to a feature class. There are over 550 FCs to process, and it keeps failing due to invalid characters. Does arcpy offer a way to strip out invalid characters when naming a conversion output?

Below is the code I'm referencing:

import arcpy
import glob
import os
files = glob.glob(r'C:\Users\JOC-001\Documents\GIS\HSIP\IL_Infrastructure2015\*')
for f in files:
    os.remove(f)

mxd = arcpy.mapping.MapDocument(r'C:\Users\JOC-001\Documents\GIS\HSIP\Infrastructure\HSIP_Gold_2015_Infrastructure.mxd')  
  
layers = arcpy.mapping.ListLayers(mxd)  
  
for lyr in layers:  
    if lyr.isGroupLayer:
        pass
    else:
        print lyr
        arcpy.SelectLayerByLocation_management(lyr, "WITHIN_A_DISTANCE", r"C:\Users\JOC-001\Documents\ArcGIS\Default.gdb\Illinois", "50 Miles", "NEW_SELECTION")  
        arcpy.FeatureClassToFeatureClass_conversion(lyr, r'C:\Users\JOC-001\Documents\GIS\HSIP\IL_Infrastructure2015', str(lyr.name))

Accepted Solutions
DarrenWiens2
MVP Honored Contributor

Could you post the entire error message? What is an example of an invalid character? Is it always the same character that you could simply replace or ignore?

You might try Copy Features rather than Feature Class to Feature Class, as well.

12 Replies
RusselKlueg
New Contributor III

I actually found a pretty nice method for reducing a string to its alphanumeric characters:

''.join(ch for ch in str(lyr) if ch.isalnum())

As for Copy Features, that's a much better idea than FC2FC.

JoshuaBixby
MVP Esteemed Contributor

Since you didn't give a specific example, I am guessing you are running into layer names that contain Windows file system reserved characters, e.g., a colon or a double quote.  Feature Class to Feature Class fails because the underlying call to create the new feature class fails at the OS level.

If you are willing to limit output names to only alphanumeric ASCII characters, you could also try the sub method in the regular expression module.

re.sub('[^A-Za-z0-9]+', '', lyr.name)

XanderBakker
Esri Esteemed Contributor

That is a nice method that Joshua Bixby shows for obtaining a valid file name. In special cases it may cause problems, but I like it (clean and short code). This is probably going into too much detail, but there is a slugify project on GitHub that covers the topic more thoroughly (also for URLs):

python-slugify/slugify.py at master · un33k/python-slugify · GitHub 

and some discussions here:

Turn a string into a valid filename in Python - Stack Overflow

python - Create (sane/safe) filename from any (unsafe) string - Stack Overflow

Validate a filename in python - Stack Overflow

RusselKlueg
New Contributor III

I'm still fairly new to Python and I'm not sure how that method works. I would say I know just enough to cause myself a lot of problems! Would you mind giving a quick explanation? If not, I'll dive into the Python help pages! I did get my code working, though. I'll post it below if you want to see it.

import arcpy
import glob
import os

# Clear the output folder: glob expands the wildcard to every file in it
files = glob.glob(r'C:\Users\JOC-001\Documents\GIS\HSIP\IL_Infrastructure2015\*')
for f in files:
    os.remove(f)
# Open the map document to be processed
mxd = arcpy.mapping.MapDocument(r'C:\Users\JOC-001\Documents\GIS\HSIP\Infrastructure\HSIP_Gold_2015_Infrastructure.mxd')
# List every layer in the map document's table of contents
layers = arcpy.mapping.ListLayers(mxd)
# Iterate over each layer in the list
for lyr in layers:
    # Group layers hold no features of their own, so skip them
    if lyr.isGroupLayer:
        pass
    # Everything else goes through the geoprocessing steps
    else:
        # Keep only the alphanumeric characters so the output name is valid
        lyr2 = ''.join(ch for ch in str(lyr) if ch.isalnum())
        print lyr2

        # Select the features within 50 miles of Illinois, then export the selection
        arcpy.SelectLayerByLocation_management(lyr, "WITHIN_A_DISTANCE", r"C:\Users\JOC-001\Documents\ArcGIS\Default.gdb\Illinois", "50 Miles", "NEW_SELECTION")
        arcpy.FeatureClassToFeatureClass_conversion(lyr, r'C:\Users\JOC-001\Documents\GIS\HSIP\IL_Infrastructure2015', lyr2)
JoshuaBixby
MVP Esteemed Contributor

If you plan on continuing with any programming/scripting, I strongly encourage you to learn more about regular expressions.  They are an extremely powerful form of pattern matching, and regular expressions are either built-in or available through libraries in many programming languages, including Python.  : )

Regarding the specific code snippet I gave, let's look at the Python syntax first:  re.sub(pattern, repl, string, count=0, flags=0).  We are not using count or flags, so we need three things:  1) a regular expression pattern, 2) a replacement string, and 3) a string to apply the regular expression to.

Looking next at the regular expression:  [^A-Za-z0-9]+.

  • A bracket expression, [ ], matches a single character that is within the brackets, sort of an implicit or operator.
  • A bracket expression that starts with a caret, [^ ], matches a single character that is not within the brackets.
  • Bracket expressions allow for ranges, e.g., A-Z means any uppercase character from A through Z.
  • The plus sign, +, matches the preceding occurrence/element one or more times.

So, the bracket expression I provided, [^A-Za-z0-9], matches any character that isn't A through Z, a through z, or 0 through 9.  The plus sign is used to match one or more occurrences.  Finally, the matched characters are replaced with an empty string to remove them from the original string.
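To make that concrete, here is a minimal standalone sketch of the same pattern applied to a few made-up layer names. It is written for Python 3 and does not need arcpy; the helper name strip_non_alnum and the sample names are only illustrative assumptions.

```python
import re

# Same pattern as above: one or more characters that are NOT
# A-Z, a-z, or 0-9.
NON_ALNUM = re.compile(r"[^A-Za-z0-9]+")


def strip_non_alnum(name):
    """Remove every run of non-alphanumeric ASCII characters from name."""
    return NON_ALNUM.sub("", name)


# Hypothetical layer names, chosen only to exercise the pattern.
samples = [
    "Fire Stations: 2015",
    "Roads_Major-East",
    "Co\u00f6perative r\u00e9sum\u00e9",  # accented letters are stripped too
]

for name in samples:
    print(repr(name), "->", repr(strip_non_alnum(name)))
```

Note that spaces, underscores, hyphens, and accented letters all fall outside the three ranges, so they are removed along with the colon.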

The regular expression I provided is rather simplistic in that valid non-alphanumeric characters will also be removed, e.g., hyphens and underscores.  If one knew his/her code was only going to be run on Windows, an assumption I wouldn't make myself, a regular expression could be made to remove only Windows file system reserved characters.
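As a rough sketch of that Windows-only variant (Python 3, no arcpy needed): the character class below lists the characters Windows rejects in file names, namely < > : " / \ | ? * and the control characters 0 through 31. The function name and the optional replacement argument are assumptions for illustration. It only addresses file names on disk; geodatabase feature class names may follow stricter rules that this does not cover.

```python
import re

# Characters Windows does not allow in file names:
# < > : " / \ | ? * and the control characters 0-31.
WINDOWS_RESERVED = re.compile(r'[<>:"/\\|?*\x00-\x1f]+')


def strip_windows_reserved(name, replacement=""):
    """Replace each run of Windows reserved characters in name.

    replacement defaults to an empty string, which simply removes them.
    """
    return WINDOWS_RESERVED.sub(replacement, name)


print(strip_windows_reserved('Fire Stations: "2015"'))
print(strip_windows_reserved("Roads_Major-East"))
print(strip_windows_reserved("a/b\\c", "_"))
```

Unlike the alphanumeric-only pattern, this leaves spaces, hyphens, underscores, and non-ASCII letters untouched, and the replacement character can be chosen by the caller.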

As Grant Herbert pointed out, you could also use ArcPy's ValidateTableName function.  Using that approach, though, the invalid characters are replaced with an underscore and there is no way to change the replacement character.

RusselKlueg
New Contributor III

Oh sweet, that's pretty nice and super easy too. I was lost at the bracket. I caught on that you were using ranges but was still kind of confused. This is pretty sweet! I think I'll update my code to use this! Thanks :D

XanderBakker
Esri Esteemed Contributor

Hi Russel Klueg, I do have some considerations. Have a look at the TOC of a dummy document that I created:

It contains nested group layers, some feature layers, a raster layer, and layer names with special characters.

If I run the following code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import re
import arcpy

fldr = r"C:\Forum"

mxd = arcpy.mapping.MapDocument(r"C:\Forum\ValidName\test.mxd")
df = arcpy.mapping.ListDataFrames(mxd)[0]
for lyr in arcpy.mapping.ListLayers(mxd, '*', df):
    if not lyr.isGroupLayer:
        try:
            print "  - using str(lyr)  : ", ''.join(ch for ch in str(lyr) if ch.isalnum())
        except:
            print "... produced an error"
        print "  - using lyr.name  : ", ''.join(ch for ch in lyr.name if ch.isalnum())
        print "  - using re        : ", re.sub('[^A-Za-z0-9]+', '', lyr.name)
        print "  - using validate TN: ", arcpy.ValidateTableName(lyr.name, fldr)

It will output this:

  - using str(lyr)  :  ... produced an error
  - using lyr.name  :  wielokąty
  - using re        :  wielokty
  - using validate TN:  wielokąty
  - using str(lyr)  :  ... produced an error
  - using lyr.name  :  інфармацыя
  - using re        : 
  - using validate TN:  інфармацыя
  - using str(lyr)  :  ... produced an error
  - using lyr.name  :  Coöperativerésumé
  - using re        :  Coperativersum
  - using validate TN:  Coöperative_résumé
  - using str(lyr)  :  ContinentACountryEThisisaraster
  - using lyr.name  :  Thisisaraster
  - using re        :  Thisisaraster
  - using validate TN:  This_is_a_raster
  - using str(lyr)  :  ContinentACountryEProvinceFLimits
  - using lyr.name  :  Limits
  - using re        :  Limits
  - using validate TN:  Limits
  - using str(lyr)  :  ContinentACountryEProvinceFMinicipalityGConstructions
  - using lyr.name  :  Constructions
  - using re        :  Constructions
  - using validate TN:  Constructions
  - using str(lyr)  :  ContinentACountryBProvinceCMinicipalityDConstructions
  - using lyr.name  :  Constructions
  - using re        :  Constructions
  - using validate TN:  Constructions

What I want to show with this (which covers only some of the problems one can encounter) is:

  • Using str(lyr) instead of lyr.name will include the names of the (nested) group layer(s) the layer resides in.
  • Using str(lyr) will produce an error when the name contains special characters (diacritical marks).
  • A name can be valid even though it contains special characters. The re expression, however, strips those characters and can end up eliminating every character in the name.
  • When a layer name is used twice in your TOC, you will have to decide how to create a unique file name.

These are just some considerations. Maybe none of them apply to your specific case: if you know your data and you are the only one running the code, this will not be a problem. However, when you create a tool that others will run, you should account for all of these cases.
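One possible way to handle the last two points, sketched in plain Python 3 without arcpy: keep Unicode letters and digits, replace everything else with an underscore, and add a numeric suffix when a name repeats. The function names, the "layer" fallback, and the suffix scheme are all assumptions for illustration, not something arcpy provides. Under Python 2 (as used with arcpy.mapping), the same idea would need unicode strings and the re.UNICODE flag.

```python
import re


def safe_name(name):
    """Replace runs of non-word characters with a single underscore.

    For str patterns in Python 3, \\W is Unicode-aware, so letters
    with diacritics and non-Latin letters are kept.
    """
    cleaned = re.sub(r"\W+", "_", name).strip("_")
    # Hypothetical fallback for names with no usable characters at all.
    return cleaned or "layer"


def unique_safe_names(names):
    """Return sanitized names, adding _2, _3, ... to repeated ones.

    Comparison is case-insensitive because Windows file names are.
    """
    used = set()
    result = []
    for name in names:
        base = safe_name(name)
        candidate = base
        counter = 1
        while candidate.lower() in used:
            counter += 1
            candidate = "{0}_{1}".format(base, counter)
        used.add(candidate.lower())
        result.append(candidate)
    return result


layer_names = ["Constructions", "Limits", "Constructions", "This is a raster"]
print(unique_safe_names(layer_names))
```

With the layer names from the example TOC, the second "Constructions" would become "Constructions_2" instead of overwriting or colliding with the first.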

Kind regards, Xander

RusselKlueg
New Contributor III

Xander Bakker, that is part of the reason I chose to use str(lyr) the way my code is written. I'm not exactly a strong scripter, and using the group layer name to create unique files was easiest for me. I'm self-taught, and only over the last couple of months.

I've still got a lot to learn, especially in arcpy. I really appreciate your advice :D
