Unicode issue with Python and cx_Oracle module...

StephenMitchell · ‎04-02-2010

Synopsis

I am developing an integration process, using Python Win 2.5.4, that performs both geo- and tabular manipulations, and the process must work with standard Oracle 10g R2 databases (non-SDE databases) from across the enterprise. It works well in this regards.

However, we also have an Oracle 10g R 2 'ASP' GDB in an enterprise SDE, and cx_Oracle query results appear to be returning unicode when in fact the unicode is part of the output string.

Specifics:

Here's the crux of how 'cx_Oracle' is used to return results from an Oracle table within our ASP GDB:

>>> import os
>>> os.environ["NLS_LANG"] = "AMERICAN_AMERICA.WE8MSWIN1252"
>>> import cx_Oracle
>>> import string
>>> oraConnectAsp =          cx_Oracle.Connection(oraUserAsp + "/" + oraPwAsp + "@" + oraDbAsp)
>>> oraCursorAsp =              oraConnectAsp.cursor()
>>> whereCol =                        "a2a2_" + string.lower(envArg)
>>> sql =                                 "SELECT sys_acct FROM " + tblAspNotify + " WHERE " + whereCol + " = 'Y'"
>>> print sql
SELECT sys_acct FROM ASP.ASP_Notification WHERE a2a2_dev = 'Y'
>>> oraCursorAsp.execute(sql)
<__builtin__.OracleCursor on <cx_Oracle.Connection to asp@SDEPID>>
>>> statusMsgList =             oraCursorAsp.fetchall()
>>> print statusMsgList
[(u'nmassaro',), (u'smitchel',)]
>>> print statusMsgList[0]
(u'nmassaro',)

The output looks like unicode but in fact it is part of the string...

>>> x = statusMsgList[0]
>>> x
(u'nmassaro',)
>>> print x
(u'nmassaro',)

...whereas unicode behaves as follows:

>>> x = u"nmassaro"
>>> x
u'nmassaro'
>>> print x
nmassaro

As stated, using 'cx_Oracle' to connect to a standard Oracle 10g R2 database (ie, a non SDE database) it is not an issue. Here's an example of the results of a query on a table in an AGED database:

>>> oraConnectAge =     cx_Oracle.Connection(oraUserAge + "/" + oraPwAge + "@" + oraDbAge)
>>> oraCursorAge =   oraConnectAge.cursor()
>>> itmStr =                             "upunttime, uphour, uphour, upminute, usleep, udependency"
>>> sql =                                "SELECT " + itmStr + " FROM " + tblDb_UpdtParam + " WHERE uname = 'A2A2'"
>>> print sql
>>> oraCursorAge.execute(sql)
<__builtin__.OracleCursor on <cx_Oracle.Connection to geocode@AGED>>
>>> oraResultAge =           oraCursorAge.fetchall()
>>> print oraResultAge
[('01:09:00', 1, 1, 9, 30, 'A2A1_FINISHED SW_FINISHED')]
>>> updtDependency =     string.split(str(oraResultAge[5]))
>>> print updtDependency
['A2A1_FINISHED', 'SW_FINISHED']
>>> print str(updtDependency[0]) + " " + str(updtDependency[1])
A2A1_FINISHED SW_FINISHED

If the query results were truly Unicode I could convert to ASCII, but when what appears to be Unicode is part of the output then the onus is a lot of string manipulation- not desirable.

Ideas how to return either simple ASCII or, at the worst, true Unicode?

Thanks-

JasonScheirer · ‎04-04-2010

Looks like it's giving you a list of tuples of unicodes instead of a list of unicodes. Try x = statusMsgList[0][0] instead, and if you REALLY need a Windows/Western European encoded string and not the unicode (ie you know you're only going to be dealing with American and/or Western European characters in your text), use x = statusMsgList[0][0].encode("cp1252", "replace")

StephenMitchell · ‎04-05-2010

Thank you- your suggestion works fine.