But your data does contain some "unexpected data". The error message says so:
AttributeError: <unprintable AttributeError object>
<type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode characters in position 29-37: ordinal not in range(128)
It even tells you where the characters are. It is very easy these days to enter Unicode by mistake, either from the keyboard or by cutting and pasting from a web page.
Not all Python functions handle unicode strings as if they were ASCII. If you have tried to convert a unicode string that contains a character outside the ASCII set, then you need to encode it explicitly to a byte string (UTF-8, for example) rather than let Python fall back on the ASCII codec. Just doing
>>> str(u'\u0100')
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u0100' in position 0: ordinal not in range(128)
will upset Python, whereas encoding explicitly works:
>>> str(u'\u0100'.encode('utf-8'))
'\xc4\x80'
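The error message gives the position of the offending characters, and the same information is available on the exception object. Here is a minimal sketch, assuming a made-up sample string and a hypothetical helper called find_non_ascii (neither comes from the original data). It should run unchanged on Python 2.6+ and Python 3.3+:

```python
# Sketch: report where the non-ASCII characters sit in a unicode string.
# `find_non_ascii` and `sample` are hypothetical names used for illustration.

def find_non_ascii(text):
    """Return a list of (position, character) pairs for non-ASCII characters."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]

sample = u"M\u0101ori and T\u0101ne"

first_bad_run = u""
try:
    sample.encode("ascii")
except UnicodeEncodeError as err:
    # err.start and err.end bound the first run of unencodable characters
    first_bad_run = sample[err.start:err.end]

# Every non-ASCII character, not just the first run the codec complained about
positions = find_non_ascii(sample)
```

As far as I know, the "position 29-37" in the message corresponds to err.start and err.end - 1, so slicing with [err.start:err.end] picks out the whole run.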
Those raw UTF-8 bytes ('\xc4\x80') will probably upset the reader, but they explain the problem.
Here is what I had to work out to add macrons to a database:
# unicode.py This Python file uses the following encoding: utf-8
import encodings.utf_8_sig
comment = """
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
To make your pages display properly in Unicode, you need to change this to:
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
import codecs
f = codecs.open("file","w",encoding='utf-8')
f.write(stuff)
f.close()
"""
##Ā = A-
##Ē = E-
##Ī = I-
##Ō = O-
##Ū = U-
##ā = a-
##ē = e-
##ī = i-
##ō = o-
##ū = u-
dMacron = {
"A" : u'\u0100',
"a" : u'\u0101',
"E" : u'\u0112',
"e" : u'\u0113',
"I" : u'\u012A',
"i" : u'\u012B',
"O" : u'\u014C',
"o" : u'\u014D',
"U" : u'\u016A',
"u" : u'\u016B'
}
f1 = open("e:/project/training/unicode/unicode_sample.txt","w")
print
# read-only f1.encoding # = "utf8"
# print >> f1,dMacron # cannot do this to ascii encoder
mO = u"\N{LATIN CAPITAL LETTER O WITH MACRON}"
word = "Maori".replace("o",dMacron["o"])
print >> f1,word.encode('utf-8')
# PyUnicode.__print__ word.encode('utf-8')
lstKey = dMacron.keys()
lstKey.sort()
for k in lstKey :
    # look up each value by key; the dict itself has no encode() method
    print >> f1,dMacron[k].encode('utf-8'),
    print dMacron[k],
##mA = u'\u0100'
##ma = u'\u0101'
##mE = u'\u0112'
##me = u'\u0113'
##mI = u'\u012A'
##mi = u'\u012B'
##mO = u'\u014C'
##mo = u'\u014D'
##mU = u'\u016A'
##mu = u'\u016B'
# word = "Maori".replace("o",dMacron["o"])
mO = u"\N{LATIN CAPITAL LETTER O WITH MACRON}"
word = "MAORI".replace("O",mO)
print >> f1,word.encode('utf-8')
print word
f1.close()
print "success"
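The notes inside the script mention codecs.open as a way to write UTF-8 without calling .encode() on every value. Here is a minimal sketch of that idea. It uses io.open, which I have swapped in because it behaves the same on Python 2.6+ and Python 3, and a temporary file instead of the e:/project path. The names out_path and macron_a and the sample word are assumptions for illustration:

```python
import io
import os
import tempfile

# Hypothetical output path in a temporary directory, so the sketch runs anywhere.
out_path = os.path.join(tempfile.mkdtemp(), "unicode_sample.txt")

macron_a = u"\u0101"  # LATIN SMALL LETTER A WITH MACRON
word = u"Maori".replace(u"a", macron_a)

# The file object encodes on write, so no explicit .encode('utf-8') is needed.
with io.open(out_path, "w", encoding="utf-8") as f:
    f.write(word)

# Read the raw bytes back to see what actually landed on disk.
with open(out_path, "rb") as f:
    raw = f.read()
```

The file object only accepts unicode strings here, so a stray byte string shows up as an error at the write call instead of as garbage in the file.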
I once reinstalled the entire operating system on a Sun 3, including the printer drivers, when the new PostScript printer stopped working. It had stopped because the printer had run out of paper and the Sun driver did not understand PostScript error messages. 🙂
I don't ever reinstall a software package when I hit a bug now; life is too short.