
Unicode Errors

07-07-2020 08:02 AM
JoeBorgione
MVP Emeritus

More than once I've come across Unicode errors when using a da cursor (see 'ascii' codec can't encode character u'\u201c') or some other type of field-value manipulation (Where clause for '\n'). My latest adventure with these errors comes from using Python to execute a SQL query through pyodbc and writing the results to a csv file. The end game is to get the data out of the SQL Server db and into a fgdb.

Life is all about trade-offs. I prefer the python-odbc-to-csv approach because it preserves date-type fields even when those fields contain null or no values. The trade-off is that you can run into Unicode errors like this one:

UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 226: character maps to <undefined>
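
For what it's worth, here's a rough reproduction of the problem (my own scratch work, assuming the default file encoding on my Windows machine is cp1252, which Python reports as 'charmap'): U+FFFD is the Unicode replacement character, and cp1252 simply has no slot for it.

# cp1252 is implemented by Python's charmap codec, hence the 'charmap' in the message
'\ufffd'.encode('cp1252')
# UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 0: character maps to <undefined>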

Another tactic is to use SQL Server Management Studio (SSMS) to execute the query and then export the results to a csv; the trade-off with this approach is that date fields with null or empty values get automagically converted into text fields, rendering them useless for date-range queries.

Admittedly, I'm not the sharpest knife in the drawer when it comes to codecs and related issues, but I wanted to see if I could figure out a way past the Unicode errors and keep the date fields intact. In my googling I stumbled upon a couple of resources: codecs – String encoding and decoding - Python Module of the Week and Unicode HOWTO — Python 3.8.4rc1 documentation, both of which have helped put a little better edge on my otherwise dull blade.
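
Here's a bit of scratch work based on those resources (my own experiments, not copied from either page), showing why the target codec matters:

text = 'curly \u201cquotes\u201d plus the replacement character \ufffd'

text.encode('utf-8')               # fine: UTF-8 can represent any code point
text.encode('utf-16')              # also fine
text.encode('cp1252', 'replace')   # succeeds, but lossy: unmappable characters become b'?'
# text.encode('cp1252')            # raises the UnicodeEncodeError shown above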

Here is the code I'm working with:

import pyodbc as odbc
import csv, os

# connect to the SQL Server database (server name redacted)
conn = odbc.connect('Driver={SQL Server};'
                    'Server=xxxxx;'
                    'Database=HANSEN;')

outFolder = r'C:\Hansen'

query = """select *
           from imsv7.APCASE
           """

cursor = conn.cursor()
cursor.execute(query)

# grab every row plus the column names from the cursor description
rows = cursor.fetchall()
column_names = [i[0] for i in cursor.description]

# write the header row and the data rows out to a csv
dotcsv = 'APCASE_py.csv'
outFile = open(os.path.join(outFolder, dotcsv), 'w', newline='')
myFile = csv.writer(outFile)
myFile.writerow(column_names)
myFile.writerows(rows)

outFile.close()
cursor.close()
conn.close()

Focus your attention on the line that opens the output csv file. By making a slight adjustment to it, I have been able to overcome the error shown above:

outFile = open(os.path.join(outFolder, dotcsv), 'w', newline='', encoding='utf-16')

The addition of encoding='utf-16' is apparently all it takes. And, as with all of life's trade-offs, opinions are a dime a dozen and even cheaper now with internet forums: What are reasons to use UTF-16 instead of UTF-8 in some situations? - Quora.
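
One trade-off I can at least quantify with a quick interpreter check (my own, not from the Quora thread): UTF-16 sidesteps the error just like UTF-8 would, but it writes a byte-order mark and two bytes for every ASCII character, so the csv files come out roughly twice as big.

s = 'value with \ufffd'
len(s.encode('utf-8'))    # 14 bytes
len(s.encode('utf-16'))   # 26 bytes, including the 2-byte BOM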

The bottom line for me is that I can get past the error, keep my date fields intact, and get the results I'm after. In the future, I'll probably use encoding='utf-16' as my default to avoid these errors.
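
And since the end game is a fgdb, here is a minimal sketch of that last step using arcpy's Table To Table tool. The geodatabase path and output table name below are hypothetical, and I haven't dug into how the tool treats the csv's encoding or field types:

import arcpy

csv_path = r'C:\Hansen\APCASE_py.csv'
fgdb = r'C:\Hansen\Hansen.gdb'      # hypothetical fgdb; must already exist

arcpy.conversion.TableToTable(csv_path, fgdb, 'APCASE')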

That should just about do it....