'ascii' codec can't encode character u'\u201c'

02-12-2019 02:22 PM
JoeBorgione
MVP Emeritus

Database qa/qc would be so much easier when no data is entered....

Last week I was dealing with newline characters (see Where clause for '\n').  Today I'm getting the following error:

Runtime error 
Traceback (most recent call last):
  File "<string>", line 26, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 10: ordinal not in range(128)

Okay...  Best I can tell, u201c is a left double quote (http://www.fileformat.info/info/unicode/char/201C/index.htm). I put

# -*- coding: utf-8 -*-   

as the first line of my script and it still errors out.  Am I doomed, or is there a way past these special characters?

That should just about do it....
Luke_Pinner
MVP Regular Contributor

Mixing and matching is going to cause you grief and confuse us.  I said it was Python 2.7 as I'd read you were running your code in 10.6.1.

You can install Spyder for Desktop (Python 2.7) and Spyder for Pro (Python 3).

JoeBorgione
MVP Emeritus

Agreed.  It can be confusing and often is. We have a rather large collection of 2.7 scripts that we are in the process of migrating, finding gotchas all along the way.

This particular exercise may be one in futility, as it seems every time I check for one embedded special character, I error out on another regardless of Python version...

JoeBorgione
MVP Emeritus

Nice.  Thanks.  I can use a distraction...

JoshuaBixby
MVP Esteemed Contributor

If you are working in Python 3, try specifying the encoding when you open the file.  From Open - Built-in Functions — Python 3.7.2 documentation:

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.

In Python 2, you can use the io module (15.2. io — Core tools for working with streams — Python 2.7.15 documentation), which allows specifying the text file encoding.
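
As a rough sketch of the above (not tested against ArcGIS itself): io.open takes an encoding argument in both Python 2.7 and Python 3, so the same lines should work in either. The sample text and the output path are made up for illustration.

```python
import io
import os
import tempfile

# Hypothetical sample value standing in for a field read from a table.
note = u"Substrate \u201cother\u201d = bedrock"

# Hypothetical output location; any writable path would do.
path = os.path.join(tempfile.gettempdir(), "sitenotes_utf8.txt")

# io.open accepts an encoding argument in both Python 2.7 and Python 3,
# so the unicode text is encoded to UTF-8 as it is written.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(note + u"\n")

# Reading with the same encoding gives back the original unicode text.
with io.open(path, "r", encoding="utf-8") as f:
    round_trip = f.read().rstrip(u"\n")
```

With the encoding given explicitly, the write no longer depends on the platform default.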

RandyBurton
MVP Alum

On occasion, I've used a conversion dictionary as a "hammer":

conversion = {
    u"\u2019": "'", # apostrophe
    u"\u201c": "\"", # double quote
    u"\u00a0": " ", # non breaking space
    "\n" : "=>" # newline
    }

def convert(data):
    for k, v in conversion.items():
        # https://stackoverflow.com/questions/14156473
        data = data.replace(k,v)
    return data.encode('ascii')

print convert("hello\nworld" )
# hello=>world

print convert(u"hello \u201cworld\u201c" )
# hello "world"
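
A variation on the same hammer, offered only as a sketch: the table below is an assumed, extended set of stand-ins (it adds the right double quote and the left single quote), and anything not in the table is turned into '?' instead of raising an error. The names ASCII_STAND_INS and to_ascii are made up for this example.

```python
# Assumed replacement table: Unicode code point -> ASCII stand-in.
ASCII_STAND_INS = {
    0x2018: u"'",   # left single quote
    0x2019: u"'",   # right single quote / apostrophe
    0x201C: u'"',   # left double quote
    0x201D: u'"',   # right double quote
    0x00A0: u" ",   # non-breaking space
}

def to_ascii(text):
    """Swap known characters, then replace anything left over with '?'."""
    swapped = text.translate(ASCII_STAND_INS)
    return swapped.encode("ascii", "replace")

result = to_ascii(u"Substrate \u201cother\u201d = bedrock, 10\u00b0C")
# result is the byte string: Substrate "other" = bedrock, 10?C
```

The degree sign is not in the table, so it comes out as '?'. That makes the gaps in the table visible without stopping the script.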
JoeBorgione
MVP Emeritus

Hammers... Fixing the world's problems one smack at a time!    😉

Luke_Pinner
MVP Regular Contributor
# -*- coding: utf-8 -*-   

This only tells the Python interpreter how the script file itself is encoded, i.e. that string literals in your script are UTF-8. It doesn't apply to string data that you use in the script.

For example, without the declaration the script fails:

status = "It's -10°C here in Überwald and I'm sitting outside at the Café"
print status
C:\Python27\ArcGIS10.3\python.exe test.py
  File "test.py", line 1
SyntaxError: Non-ASCII character '\xc2' in file test.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

With the declaration it runs:

# -*- coding: utf-8 -*-

status = "It's -10°C here in Überwald and I'm sitting outside at the Café"
print status
C:\Python27\ArcGIS10.3\python.exe test.py
It's -10°C here in Überwald and I'm sitting outside at the Café

Process finished with exit code 0
JoeBorgione
MVP Emeritus

I took Randy's hammer idea and got the following.  I can actually get a handle on which records are the offenders:

""" Run in an ArcMap 10.6.1 Python window.  I got the same output in
    a Spyder console
"""
>>> import arcpy
arcpy.env.workspace = r'J:\WaterQuality\test_tables.gdb'
#arcpy.env.workspace = r'I:\GIS\ArcSDE\SuperUser\pwengfc\SLCOen@pweng.sde'
fields = ['OBJECTID', 'SITENOTES']
table = 'MacrosSamples'

with arcpy.da.SearchCursor(table, fields) as cursor:
    for row in cursor:
        # Skip NULL notes; report rows that contain a left double quote.
        if row[1] is not None and u"\u201c" in row[1]:
            print(row)
(3, u'Substrate \u201co\u201d = bedrock\nUpper reaches definitely take caution, high flow at this time. Lower reaches nice and easy to sample. Slick rocks. Better with multiple samplers.')
(16, u'Park at Spruces campground lot.\nX site (A) is just downstream of bridge. Work downstream from \u201cA\u201d site.\nSome rocks are very slippery\nSeveral large logs/debris jams in reach')
(33, u'\u201cBedrock\u201d is a calcified structure.\nSafe for 1-2 samplers\nGolfers utilizing course that river runs through\nCulvert at start and middle of reach\nEasy parking in golf course lot')
(65, u'Substrate \u201cother\u201d = bedrock')
(66, u'Substrate \u201cother\u201d = bedrock')
(69, u'Substrate \u201co\u201d = bedrock\nSafe for 1 sampler')
(81, u'Lambs Canyon Creek enters Parleys at beginning of \u201cA\u201d transect. Long hike to x site.  Safe for 1 person.\n')
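
A possible generalization of the check above, as a sketch only: instead of testing for one character at a time, list every non-ASCII character in a value. The helper name non_ascii_chars is hypothetical, and the sample string is copied from record 3 above.

```python
import unicodedata

def non_ascii_chars(text):
    """Return (position, code point, Unicode name) for each non-ASCII character."""
    if text is None:
        return []
    return [
        (i, u"U+%04X" % ord(ch), unicodedata.name(ch, u"UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]

sample = u"Substrate \u201co\u201d = bedrock"
found = non_ascii_chars(sample)
# found lists the left double quote at position 10
# and the right double quote at position 12
```

Inside the cursor loop, non_ascii_chars(row[1]) could take the place of the single-character test, so each OBJECTID is reported together with whatever special characters it holds.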
Luke_Pinner
MVP Regular Contributor

Can you explain a bit more about what you want to do with this text data? Your errors are happening when you print it out, and this is expected in Python 2. In Python 3 you don't need to worry as much, as strings are unicode objects.

For example: I can read in some data with non-ascii characters and write it out to another table with no issues, but if I try to print it (or write it to a text file/spreadsheet) without encoding it, I get the dreaded UnicodeEncodeError.  If you want to print it/output it you need to encode it.

table,field = 'c:/temp/default.gdb/test', 'testfield'

with arcpy.da.SearchCursor(table,field) as rows:
    for row in rows:
        data = row[0]
        break
        
print('{} = tablename; {} = fieldvalue'.format(table,data))

# Runtime error 
# Traceback (most recent call last):
#   File "<string>", line 4, in <module>
# UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 8: ordinal not in range(128)

with open('c:/temp/test.txt', 'w') as t:
    t.write(data)

# Runtime error
# Traceback (most recent call last):
#   File "<string>", line 2, in <module>
# UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 8: ordinal not in range(128)

print('{} = tablename; {} = fieldvalue'.format(table,data.encode('utf-8')))
# c:/temp/default.gdb/test = tablename; It's -10°C here in Überwald and I'm sitting outside at the Café = fieldvalue

Hammers sometimes miss and hit you on the thumb.  You can try and replace most of the usual non-ASCII characters, but you'll always run across another...

Your best bet is to deal with it as unicode, then encode it on output.

If you absolutely must force it to ASCII, try the hammer (manual replacement); if that fails, just strip them out:

data.encode('ascii', 'ignore')
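
For comparison, 'ignore' is one of several standard error handlers that encode() accepts. A small sketch with a made-up sample string shows how three of them differ:

```python
# Made-up sample with a curly apostrophe, a degree sign and an accented e.
data = u"It\u2019s -10\u00b0C at the Caf\u00e9"

ignored = data.encode("ascii", "ignore")            # drops the characters
replaced = data.encode("ascii", "replace")          # substitutes '?'
escaped = data.encode("ascii", "backslashreplace")  # keeps them as escapes
```

'ignore' loses information silently, while 'backslashreplace' keeps enough to work out afterwards what the original character was.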

You could also look at a third-party library that takes Unicode data and tries to represent it in ASCII characters: Unidecode · PyPI