...a little late, but I think the demo below will help. Character encoding is interesting, although my own understanding of it is still a work in progress, so I apologize up front for the limited explanation (and for my limited French).

To cut to the chase: the 'raw' entry of your accented character is not stored by Python as a unicode character. Without the 'u' prefix, Python 2 keeps the literal as a plain byte string (type 'str'), here the UTF-8 bytes from the source file. So when you try to get it back (as with a print statement) on a console that uses a different code page, you typically get nonsense. With my standard output (stdout) set to cp1252, I get: Ã©

If you prefix the literal with 'u' (to mark it as a unicode string rather than a byte string), you should be in business with the declared utf-8 source encoding.

Not sure if you have the unicodedata module in your Python version, but I think you'll get the idea just from reading the script and the corresponding output (included further below). Try this short demo (my system is different from yours, so the script is also attached for you to run on your own system):
# -*- coding: utf-8 -*-
# Python 2 script: compares unicode literals (u"...") with plain byte strings.
import sys, unicodedata
print 'The default encoding is {0}'.format(sys.getdefaultencoding())
print 'The standard output encoding is {0}\n'.format(sys.stdout.encoding)
# Trial 1: unicode literal written with an escape sequence
a= u"\xe9"
print 'Trial 1- variable \'a\' is:', a
print '...and this is the type: ', type(a)
print 'This character is {0}\n'.format(unicodedata.name(a))
# Trial 2: the same character typed directly, with the 'u' prefix
a= u"é"
print 'Trial 2- variable \'a\' is:', a
print '...and this is the type: ', type(a)
print 'This character is {0}\n\n'.format(unicodedata.name(a))
# Trial 3: a longer unicode string with several accented characters
a= u"Wow caractères Unicode peuvent être difficiles à manipuler, ce qui avec le codage et le décodage en cours."
print 'Trial 3- (a bunch of French with accented characters):\n'
print a
print '\n...and this is the type: ', type(a)
# Trial 4: no 'u' prefix, so 'a' holds the raw UTF-8 bytes from this file
a="é"
print '\n\nTrial 4- variable \'a\' is now becomes gibberish, improperly encoded utf-8:', a
print '...and this is the type: ', type(a)
On my system, I get back this (from the print statements):
>>>
The default encoding is ascii
The standard output encoding is cp1252

Trial 1- variable 'a' is: é
...and this is the type: <type 'unicode'>
This character is LATIN SMALL LETTER E WITH ACUTE

Trial 2- variable 'a' is: é
...and this is the type: <type 'unicode'>
This character is LATIN SMALL LETTER E WITH ACUTE


Trial 3- (a bunch of French with accented characters):

Wow caractères Unicode peuvent être difficiles à manipuler, ce qui avec le codage et le décodage en cours.

...and this is the type: <type 'unicode'>


Trial 4- variable 'a' is now becomes gibberish, improperly encoded utf-8: Ã©
...and this is the type: <type 'str'>
>>>
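For anyone on Python 3, where the script above will not run as written (print is a function there, and every plain string literal is already unicode), here is a rough sketch of the same distinction between Trials 1-2 and Trial 4. This is my own illustration rather than part of the attached script; the variable names text and raw are just placeholders, and the accented character is written as an escape so the file itself stays plain ASCII.

```python
import unicodedata

# One Unicode character (what the u"..." literals hold in Trials 1 and 2).
text = u"\xe9"

# The same character as raw UTF-8 bytes (what the un-prefixed literal
# holds in Trial 4 when the source file is saved as UTF-8).
raw = text.encode("utf-8")

# Text: one character, and unicodedata can name it.
print(type(text).__name__, len(text), unicodedata.name(text))

# Bytes: two separate byte values, with no character identity of their own.
print(type(raw).__name__, len(raw), repr(raw))
```

The thing to notice is the length: the text value is one character, while the byte value is two bytes, and it is those two bytes that the console ends up misreading.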
-Wayne

PS- By the way, if I feed in the French from Trial 3 above and forget the 'u' prefix, this is what gets printed, which should now be no surprise:
Wow caractÃ¨res Unicode peuvent Ãªtre difficiles Ã  manipuler, ce qui avec le codage et le dÃ©codage en cours.
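
The gibberish in the PS (and in Trial 4) can be reproduced on purpose, which I think makes the mechanism easier to see: take the text, encode it as UTF-8, then read those bytes back as if they were cp1252. Below is a minimal sketch in Python 3 syntax, assuming cp1252 is the console code page as it is on my system. The helper names garble() and repair() are hypothetical, made up for this illustration.

```python
# Python 3 sketch; garble() and repair() are hypothetical helper names.

def garble(text, wrong="cp1252"):
    """Encode text as UTF-8, then misread those bytes with the wrong code page."""
    return text.encode("utf-8").decode(wrong)


def repair(garbled, wrong="cp1252"):
    """Undo garble(): recover the original bytes, then decode them as UTF-8."""
    return garbled.encode(wrong).decode("utf-8")


original = u"d\xe9codage"   # the word "decodage" with an e-acute, 8 characters
broken = garble(original)   # the e-acute turns into two characters

print(len(original), len(broken))
print(repair(broken) == original)
```

One caveat: a few byte values are unassigned in cp1252, so garble() can raise UnicodeDecodeError for some characters. Treat this as a way to see what happened, not as a general-purpose fix.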