Database QA/QC would be so much easier if no data were ever entered....
Last week I was dealing with newline characters (see Where clause for '\n'). Today I'm getting the following error:
Runtime error
Traceback (most recent call last):
File "<string>", line 26, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 10: ordinal not in range(128)
Okay... Best I can tell, u201c is a left double quote (http://www.fileformat.info/info/unicode/char/201C/index.htm). I put
# -*- coding: utf-8 -*-
as the first line of my script and it still errors out. Am I doomed, or is there a way past these special characters?
Mixing and matching is going to cause you grief and confuse us. I said it was Python 2.7 as I'd read you were running your code in 10.6.1.
You can install Spyder for Desktop (Python 2.7) and Spyder for Pro (Python 3).
Agreed. It can be confusing, and oftentimes is. We have a rather large collection of 2.7 scripts that we are in the process of migrating, finding gotchas all along the way.
This particular exercise may be one of futility, as it seems every time I check for one embedded special character, I error out on another, regardless of Python version...
Some distractions for you, Joe:
http://ptgmedia.pearsoncmg.com/imprint_downloads/informit/promotions/python/python2python3.pdf
Nice. Thanks. I can use a distraction...
If you are working in Python 3, try specifying the encoding when you open the file. From open() - Built-in Functions — Python 3.7.2 documentation:
encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
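A minimal Python 3 sketch of that (the filename here is just a placeholder):

# Read a text file with an explicit encoding instead of the platform default.
with open('notes.txt', encoding='utf-8') as f:
    text = f.read()  # text is a str; no surprises from a wrong default encoding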
In Python 2, you can use the io module (15.2. io — Core tools for working with streams — Python 2.7.15 documentation), which allows specifying the text file encoding.
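For example, a Python 2 sketch (again with a placeholder filename):

import io

# io.open() accepts an encoding argument, unlike Python 2's built-in open().
with io.open('notes.txt', encoding='utf-8') as f:
    text = f.read()  # text is a unicode object, not a byte string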
On occasion, I've used a conversion dictionary as a "hammer":
conversion = {
    u"\u2019": "'",   # apostrophe
    u"\u201c": "\"",  # double quote
    u"\u00a0": " ",   # non-breaking space
    "\n": "=>",       # newline
}

def convert(data):
    for k, v in conversion.items():
        # https://stackoverflow.com/questions/14156473
        data = data.replace(k, v)
    return data.encode('ascii')

print convert("hello\nworld")
# hello=>world
print convert(u"hello \u201cworld\u201c")
# hello "world"
Hammers... Fixing the world's problems one smack at a time! 😉
# -*- coding: utf-8 -*-
This only tells the Python interpreter that string literals in your script are UTF-8; it doesn't apply to string data that you use in the script.
For example, without the encoding declaration:
status = "It's -10°C here in Überwald and I'm sitting outside at the Café"
print status
C:\Python27\ArcGIS10.3\python.exe test.py
File "test.py", line 1
SyntaxError: Non-ASCII character '\xc2' in file test.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
With the declaration added:
# -*- coding: utf-8 -*-
status = "It's -10°C here in Überwald and I'm sitting outside at the Café"
print status
C:\Python27\ArcGIS10.3\python.exe test.py
It's -10°C here in Überwald and I'm sitting outside at the Café
Process finished with exit code 0
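One caveat worth adding (my note, not from the docs): even with the declaration, status above is a UTF-8 byte string, not a unicode object. A hedged sketch of the safer pattern, using a unicode literal and encoding explicitly on output:

# -*- coding: utf-8 -*-
status = u"It's -10°C here in Überwald and I'm sitting outside at the Café"
print status.encode('utf-8')  # encode explicitly instead of relying on the default ascii codec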
I took Randy's hammer idea and got the following. I can actually get a handle on which records are the offenders:
""" Run in an ArcMap 10.6.1 Python window. I got the same output in
a Spyder console
"""
>>> import arcpy
arcpy.env.workspace = r'J:\WaterQuality\test_tables.gdb'
#arcpy.env.workspace = r'I:\GIS\ArcSDE\SuperUser\pwengfc\SLCOen@pweng.sde'
fields = ['OBJECTID', 'SITENOTES']
table = 'MacrosSamples'
with arcpy.da.SearchCursor(table,fields)as cursor:
for row in cursor:
if row[1] == None:
pass
elif u"\u201c" in row[1]:
print(row)
(3, u'Substrate \u201co\u201d = bedrock\nUpper reaches definitely take caution, high flow at this time. Lower reaches nice and easy to sample. Slick rocks. Better with multiple samplers.')
(16, u'Park at Spruces campground lot.\nX site (A) is just downstream of bridge. Work downstream from \u201cA\u201d site.\nSome rocks are very slippery\nSeveral large logs/debris jams in reach')
(33, u'\u201cBedrock\u201d is a calcified structure.\nSafe for 1-2 samplers\nGolfers utilizing course that river runs through\nCulvert at start and middle of reach\nEasy parking in golf course lot')
(65, u'Substrate \u201cother\u201d = bedrock')
(66, u'Substrate \u201cother\u201d = bedrock')
(69, u'Substrate \u201co\u201d = bedrock\nSafe for 1 sampler')
(81, u'Lambs Canyon Creek enters Parleys at beginning of \u201cA\u201d transect. Long hike to x site. Safe for 1 person.\n')
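If you want to catch any non-ASCII character rather than testing one code point at a time, a hedged variation on the same cursor (reusing table and fields from above):

# Flag every row whose SITENOTES contains any character outside the ASCII range.
with arcpy.da.SearchCursor(table, fields) as cursor:
    for row in cursor:
        if row[1] and any(ord(ch) > 127 for ch in row[1]):
            print(row)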
Can you explain a bit more about what you want to do with this text data? Your errors are happening when you print it out, and this is expected in Python 2 (in Python 3 you don't need to worry as much, since strings are unicode objects).
For example: I can read in some data with non-ASCII characters and write it out to another table with no issues, but if I try to print it (or write it to a text file/spreadsheet) without encoding it, I get the dreaded UnicodeEncodeError. If you want to print it or output it, you need to encode it.
import arcpy

table, field = 'c:/temp/default.gdb/test', 'testfield'
with arcpy.da.SearchCursor(table, field) as rows:
    for row in rows:
        data = row[0]
        break

print('{} = tablename; {} = fieldvalue'.format(table, data))
# Runtime error
# Traceback (most recent call last):
#   File "<string>", line 4, in <module>
# UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 8: ordinal not in range(128)

with open('c:/temp/test.txt', 'w') as t:
    t.write(data)
# Runtime error
# Traceback (most recent call last):
#   File "<string>", line 2, in <module>
# UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 8: ordinal not in range(128)

print('{} = tablename; {} = fieldvalue'.format(table, data.encode('utf-8')))
# c:/temp/default.gdb/test = tablename; It's -10°C here in Überwald and I'm sitting outside at the Café = fieldvalue
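And a hedged fix for the text-file failure above: open the output file with an explicit encoding so unicode data (the data variable from the cursor) can be written directly.

import io

with io.open('c:/temp/test.txt', 'w', encoding='utf-8') as t:
    t.write(data)  # no UnicodeEncodeError; the text is written out as UTF-8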
Hammers sometimes miss and hit you on the thumb. You can try to replace most of the usual non-ASCII characters, but you'll always run across another...
Your best bet is to deal with it as unicode, then encode it on output.
If you absolutely must force it to ASCII, try the hammer (manual replacement); if that fails, just strip them out:
data.encode('ascii', 'ignore')
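A quick sketch of what the 'ignore' handler does (any non-ASCII character is simply dropped):

print(u"Caf\xe9 \u201cquote\u201d".encode('ascii', 'ignore'))
# Caf quote   <- the e-acute and the curly quotes are gone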
You could also look at a third-party library that takes Unicode data and tries to represent it in ASCII characters - Unidecode · PyPI
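A minimal sketch, assuming Unidecode has been installed (e.g. via pip install unidecode):

from unidecode import unidecode

# Transliterate unicode text to a plain-ASCII approximation.
print(unidecode(u"It's -10\xb0C here in \xdcberwald at the Caf\xe9"))
# Something close to: It's -10degC here in Uberwald at the Cafe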