Unicode error

07-29-2015 02:22 PM
AmyKlug
Occasional Contributor III

Hi,

When I run this code I get a Unicode error halfway through. I'm not sure which mxd/layer file name it is hanging up on (line 33), and I'm not sure where to put the fix (line 7, if that is even the correct fix) either.

unicodeError.JPG

import arcpy, os

#code adds mxd name and layer path name to text file separated by a comma
arcpy.env.overwriteOutput = True


#def Utf8EncodeArray(oldArray):
    #newArray = []
    #for element in oldArray:
        #if isinstance(element, unicode):
            #newArray.append(element.encode("utf-8"))
        #else:
            #newArray.append(element)
    #return newArray


path = "////serverpath"
#path2 =
mxdlst = []
txt = open("text file path", 'w')
print "making mxd list"
for root, dirs, files in os.walk(path):
    for fname in files:
        if fname.endswith(".mxd"):
            mxd = root + '\\' + fname
            mxdlst.append(mxd)
del mxd, fname
for mapdoc in mxdlst:
    mxd = arcpy.mapping.MapDocument(mapdoc)
    for df in arcpy.mapping.ListDataFrames(mxd, "*"):
        for lyrlst in arcpy.mapping.ListLayers(mxd, "*", df):
            if lyrlst.supports("DATASOURCE"):
                txt.write(mapdoc + "," + lyrlst.workspacePath + "\\" + lyrlst.name + "\n")
                print "adding" + mapdoc + "," + lyrlst.workspacePath + "\\" + lyrlst.name + "\n"
            else:
                txt.write(mapdoc + "," + lyrlst.name + "\n")
                print "adding" + mapdoc + "," + lyrlst.name + "\n"
txt.close()
del mxd, df, lyrlst, mapdoc, mxdlst
11 Replies
DanPatterson_Retired
MVP Emeritus

What is the layer name, etc.? If it contains characters that need to be converted, then you will have to convert them.

AmyKlug
Occasional Contributor III

I need code to check for that and fix it; I'm just not sure where to put it.

DanPatterson_Retired
MVP Emeritus

I haven't run into Unicode issues yet, but apparently I will have to deal with them. One suggestion is to specify the encoding at the top of the script with

# -*- coding: utf-8 -*-

but someone who works with Unicode should step up, since it appears that you have a character that can't be represented by the ASCII characters in the range 0-127.
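
To find out which name is the culprit, one option is to flag any string containing a character outside the ASCII range before writing it. This is only a minimal sketch, not part of the original script: `non_ascii_chars` and the sample `names` list are hypothetical stand-ins for the `mapdoc` and `lyrlst.name` values, and the `__future__` imports are there only so the same code runs on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def non_ascii_chars(text):
    """Return the characters in `text` whose code point is above 127."""
    return [ch for ch in text if ord(ch) > 127]


# Hypothetical layer names standing in for lyrlst.name values.
names = ["Roads", "Parcels_2015", "Cañon_Trails"]

for name in names:
    bad = non_ascii_chars(name)
    if bad:
        # repr() keeps the message printable even on an ASCII-only console.
        print("non-ASCII name:", repr(name), "->", bad)
```

Calling the same check on `mapdoc` and `lyrlst.name` just before `txt.write` should point at the map document or layer that triggers the error.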

AmyKlug
Occasional Contributor III

I have seen that before. Does it work with the # sign in front?

DanPatterson_Retired
MVP Emeritus

Apparently that is what is supposed to be done, on the first line. But I seriously haven't played with any encoding other than ASCII. I really should, given accented characters etc., but so far I haven't had to deal with them. My only suggestion is to find someone who works with such data and/or look at one of the files and see what characters are present there. Sorry I can't help more, but searching for your error message on GeoNet may turn up more.

XanderBakker
Esri Esteemed Contributor

An interesting article to read would be:

Solving Unicode Problems in Python 2.7 | Azavea Labs

And to give an example of what works and fails:

# -*- coding: utf-8 -*-
myText = u"example of únì¢ødë"

# this works:
print myText

# UnicodeEncodeError: 'ascii' codec can't encode character u'\xfa' in position 11: ordinal not in range(128)
print "{0}".format(myText)
# inserting unicode in a str

# This works 
print u"{0}".format(myText)
# inserting unicode into a unicode

# UnicodeEncodeError: 'ascii' codec can't encode character u'\xfa' in position 11: ordinal not in range(128)
print u"{0}".format(myText.decode('utf-8'))
# calling .decode('utf-8') on a unicode object makes Python 2 encode it to ASCII first, which fails
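
Applied to the script in the question, and assuming the error comes from `txt.write` receiving a unicode string, one possible approach is to open the output file with `io.open` and an explicit encoding so that the names are encoded as UTF-8 on write. This is a sketch that has not been run against arcpy: `write_rows`, `rows` and `out_path` are hypothetical names, and the sample values stand in for `mapdoc` and `lyrlst.name`. The `__future__` import lets it run on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import io
import os
import tempfile


def write_rows(out_path, rows):
    """Write (map document, layer path) pairs as comma-separated UTF-8 lines."""
    # io.open encodes unicode text on write instead of falling back to ASCII.
    with io.open(out_path, "w", encoding="utf-8") as out:
        for mapdoc, layer in rows:
            out.write("{0},{1}\n".format(mapdoc, layer))


# Hypothetical values standing in for mapdoc and lyrlst.name.
rows = [("C:\\maps\\roads.mxd", "Cañon_Trails"), ("C:\\maps\\base.mxd", "Parcels")]
out_path = os.path.join(tempfile.mkdtemp(), "layers.txt")
write_rows(out_path, rows)

# Read the file back with the same encoding to confirm the round trip.
with io.open(out_path, encoding="utf-8") as check:
    lines = check.read().splitlines()
```

In the original script this would mean replacing `open("text file path", 'w')` with `io.open("text file path", 'w', encoding='utf-8')` and building each line as a unicode string before writing it.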
DanPatterson_Retired
MVP Emeritus

what a kludge...

# -*- coding: utf-8 -*-
chars = [unichr(i) for i in range(0,256) if (32 < i < 128) or (i > 161)]
# the format string must itself be unicode, otherwise the non-ASCII characters raise UnicodeEncodeError
print "Unicode characters 33-127 and 162-255\n" + (u"{:5}"*len(chars)).format(*chars)

Unicode characters 33-127 and 162-255

!    "    #    $    %    &    '    (    )    *    +    ,    -    .    /    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~        ¢    £    ¤    ¥    ¦    §    ¨    ©    ª    «    ¬    ­    ®    ¯    °    ±    ²    ³    ´    µ    ¶    ·    ¸    ¹    º    »    ¼    ½    ¾    ¿    À    Á    Â    Ã    Ä    Å    Æ    Ç    È    É    Ê    Ë    Ì    Í    Î    Ï    Ð    Ñ    Ò    Ó    Ô    Õ    Ö    ×    Ø    Ù    Ú    Û    Ü    Ý    Þ    ß    à    á    â    ã    ä    å    æ    ç    è    é    ê    ë    ì    í    î    ï    ð    ñ    ò    ó    ô    õ    ö    ÷    ø    ù    ú    û    ü    ý    þ    ÿ 

anything not visible, isn't...can't wait to learn all of them

Luke_Pinner
MVP Regular Contributor

Dan Patterson:

...one suggestion is to specify encoding at the top of the script with # -*- coding: utf-8 -*-

This only applies to literal characters in the Python script file itself, not to any string variables when the script is run.

test_noenc.py

somestr = u'über còól'
print somestr

test_enc.py

# -*- coding: utf-8 -*-
somestr = u'über còól'
print somestr

C:\Temp>python test_noenc.py
  File "test_noenc.py", line 1
SyntaxError: Non-ASCII character '\xfc' in file test_noenc.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

C:\Temp>python test_enc.py
über còól
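
For strings that only exist at run time (layer names, paths and so on), the conversion has to be done explicitly, much like the commented-out `Utf8EncodeArray` in the question. A minimal sketch, assuming UTF-8 is the desired output encoding; `to_utf8` is a hypothetical helper, and the `__future__` import lets the same code run on both Python 2.7 and Python 3:

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals


def to_utf8(value):
    """Encode text to UTF-8 bytes; leave values that are already bytes unchanged."""
    if isinstance(value, bytes):
        return value
    return value.encode("utf-8")


# The literal below is covered by the coding declaration; the explicit
# encode/decode calls are what handle the value once the script is running.
encoded = to_utf8("über còól")
roundtrip = encoded.decode("utf-8")
```

The design choice here is to encode only at the boundary (when writing to a file or console) and keep everything as unicode inside the script.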

DanPatterson_Retired
MVP Emeritus

Thanks Luke...fixed