'ascii' codec can't encode character u'\u201c'

02-12-2019 02:22 PM
JoeBorgione
MVP Emeritus

Database qa/qc would be so much easier when no data is entered....

Last week I was dealing with newline characters. (see Where clause for '\n' ).  Today I'm getting the following error:

Runtime error 
Traceback (most recent call last):
  File "<string>", line 26, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 10: ordinal not in range(128)

Okay...  Best I can tell, u201c is a left double quote (http://www.fileformat.info/info/unicode/char/201C/index.htm). I put

# -*- coding: utf-8 -*-   

as the first line of my script and it still errors out.  Am I doomed, or is there a way past these special characters?
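For anyone wanting to reproduce this outside ArcMap, here is a minimal sketch (an illustration only, not the script from this post) that looks up the name of the offending character and triggers the same kind of error by forcing an ASCII encode. The sample string and the variable names are made up for the example.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Hypothetical sample value; the real data comes from a comments field.
note = u'Substrate \u201cother\u201d = bedrock'

# Look up the official Unicode name of the character from the traceback.
char_name = unicodedata.name(u'\u201c')

# Forcing the text through the ASCII codec fails on the curly quote.
try:
    note.encode('ascii')
    error_text = None
except UnicodeEncodeError as exc:
    error_text = str(exc)

print(char_name)
print(error_text)
```

The point of the sketch is that the failure comes from the data being pushed through the ASCII codec, which is a separate matter from how the .py file itself is encoded.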

That should just about do it....
28 Replies
JoeBorgione
MVP Emeritus

You can try and replace most of the usual non-ASCII characters, but you'll always run across another... And that's exactly what I encountered.  The field I need to edit is a 'comments/field notes' field so just about anything and everything is 'allowed'.  

Ultimately, I just need to replace the newline with another symbol in the database field.  I'm not at work today so I don't have the code in front of me but I tried either the decode or encode method with an ignore argument. It didn't quite work out; I must still have a flaw in my business logic.   Thankfully this project got pushed to a back burner in favor of some others.

I'll give the encode('ascii', 'ignore') approach another try when I get back to work.  Thanks!
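A rough sketch of what the 'ignore' error handler does, assuming the value is already a unicode string. The sample text is made up; 'replace' is shown alongside only for comparison.

```python
# -*- coding: utf-8 -*-
# Hypothetical field value containing curly quotes and a newline.
note = u'Work downstream from \u201cA\u201d site.\nSome rocks are very slippery'

# 'ignore' silently drops anything outside ASCII (here, the curly quotes).
ascii_ignore = note.encode('ascii', 'ignore').decode('ascii')

# 'replace' substitutes a question mark so the loss is at least visible.
ascii_replace = note.encode('ascii', 'replace').decode('ascii')

print(repr(ascii_ignore))
print(repr(ascii_replace))
```

Either handler throws information away, so this only makes sense if losing the special characters is acceptable for the field in question.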

DanPatterson_Retired
MVP Emeritus

I suspect a \ in the field turns a lot of stuff into unicode.

JoeBorgione
MVP Emeritus

Could be.  This exercise just screams my disdain for free-text, anything-goes fields in a database...

RandyBurton
MVP Alum

And autocorrect can be so helpful, like adding an accented e at the end of "café", etc., and thus sending it into extended ASCII or Unicode.

DanPatterson_Retired
MVP Emeritus

python 3.6.8

uni = [u"\u2019",  u"\u201c", u"\u00a0", u"Überwald",
 u'Substrate \u201co\u201d = bedrock\nUpper reaches definitely take caution, high flow at this time. Lower reaches nice and easy to sample. Slick rocks. Better with multiple samplers.',
 u'Park at Spruces campground lot.\nX site (A) is just downstream of bridge. Work downstream from \u201cA\u201d site.\nSome rocks are very slippery\nSeveral large logs/debris jams in reach',
 u'\u201cBedrock\u201d is a calcified structure.\nSafe for 1-2 samplers\nGolfers utilizing course that river runs through\nCulvert at start and middle of reach\nEasy parking in golf course lot',
 u'Substrate \u201cother\u201d = bedrock',
 u'Substrate \u201cother\u201d = bedrock',
 u'Substrate \u201co\u201d = bedrock\nSafe for 1 sampler',
 u'Lambs Canyon Creek enters Parleys at beginning of \u201cA\u201d transect. Long hike to x site.  Safe for 1 person.\n']

for i in uni:
    print(i.encode().decode())
    
---   
-  Überwald
-  Substrate “o” = bedrock
Upper reaches definitely take caution, high flow at this time. Lower reaches nice and easy to sample. Slick rocks. Better with multiple samplers.
-  Park at Spruces campground lot.
X site (A) is just downstream of bridge. Work downstream from “A” site.
Some rocks are very slippery
Several large logs/debris jams in reach
-  “Bedrock” is a calcified structure.
Safe for 1-2 samplers
Golfers utilizing course that river runs through
Culvert at start and middle of reach
Easy parking in golf course lot
-  Substrate “other” = bedrock
-  Substrate “other” = bedrock
-  Substrate “o” = bedrock
Safe for 1 sampler
-  Lambs Canyon Creek enters Parleys at beginning of “A” transect. Long hike to x site.  Safe for 1 person.

JoeBorgione
MVP Emeritus

Finally got resolution on this one, thanks to Joshua at ESRI tech support.  He had me add a couple of lines to my script and all is well...

# -*- coding: utf-8 -*-   """ This has always been my very first line """

import arcpy,time,datetime,os,sys
""" added the two lines below """
reload(sys)
sys.setdefaultencoding('utf-8')

I would have thought that line 1 would have taken care of things, but I guess not..

There is a gotcha though: I've added lines 5 and 6 to my script that is the basis of a script tool used in ArcMap or Catalog 10.6.1. If I use those two lines in a standalone script and execute from a Spyder console (associated with python 3.x) it fails.  See this stackoverflow post  that states:

Also, the use of sys.setdefaultencoding() has always been discouraged, and it has become a no-op in py3k. The encoding of py3k is hard-wired to "utf-8" and changing it raises an error.

 

So what's the answer?  I guess it depends.....
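One way to keep a single script usable on both sides is to guard the workaround by Python version. This is only a sketch under that assumption; the PY2 name is made up for the example, and the setdefaultencoding call remains the discouraged workaround described above.

```python
# -*- coding: utf-8 -*-
import sys

# Hypothetical flag name; True only under Python 2 (ArcMap/ArcCatalog).
PY2 = sys.version_info[0] == 2

if PY2:
    # Python 2 only: reload() is a builtin there and restores
    # sys.setdefaultencoding, which site.py removes at startup.
    reload(sys)  # noqa: F821
    sys.setdefaultencoding('utf-8')

# Under Python 3 the branch above is skipped, so nothing fails.
print(sys.getdefaultencoding())
```

The guarded branch is never executed under Python 3, so the missing reload builtin does not cause an error there.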

DanPatterson_Retired
MVP Emeritus

quit using arcmap I suppose

JoeBorgione
MVP Emeritus

Yeah but.....  Our customer base is pretty broad, with lots of ArcMap and ArcCatalog users, and a lot of them have their heels dug in pretty deep.  So as sys admin types, the others on my team and I have to play to both sides of the fence. We run tons of overnight scheduled tasks for individual agency eGDBs and this will be one of them, now that I know how to avoid the errors I was encountering. It's a coin toss which way to go.

The real issue though is this interim time where python 2.x and 3.x are overlapping along with respective Arcpy functionalities.  Job security through the remainder of my career I suppose...  

Luke_Pinner
MVP Regular Contributor

Joe Borgione wrote:

# -*- coding: utf-8 -*-   """ This has always been my very first line """

I would have thought that line 1 would have taken care of things, but I guess not..

 

So what's the answer?  I guess it depends.....

As I mentioned earlier, the special coding: comment line only applies to string literals in your code, i.e. those literally written in the .py file, not to any data/variables.

The real answer is to treat your strings as unicode (cos they are when read from a cursor) and do any string munging with unicode strings, i.e. univar = row[0].replace(u'\n', u'==>'), then encode on output with univar.encode('utf8').
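A minimal sketch of that approach, using a plain list to stand in for the rows a cursor would return. The replace_newlines helper, the marker default and the sample rows are made up for illustration; in a real script the same call would presumably sit inside the cursor loop.

```python
# -*- coding: utf-8 -*-
def replace_newlines(value, marker=u'==>'):
    """Swap newlines for a marker, keeping the value as unicode text."""
    if value is None:  # assumed: empty (NULL) field values arrive as None
        return None
    return value.replace(u'\n', marker)

# Hypothetical rows standing in for what a cursor would return.
rows = [
    u'\u201cBedrock\u201d is a calcified structure.\nSafe for 1-2 samplers',
    None,
]

# Do all the string munging on unicode values.
cleaned = [replace_newlines(value) for value in rows]

# Encode only at the output boundary, e.g. when writing bytes to a file.
encoded = [value.encode('utf8') for value in cleaned if value is not None]

print(cleaned)
```

Because both the search string and the marker are unicode literals, nothing in the replace step forces an implicit ASCII conversion, and the curly quotes pass through untouched.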