ArcPy Encoding Problem

02-23-2019 02:34 AM
Nicole_Ueberschär
Esri Regular Contributor

Hello folks, 

I have to work with txt files that contain special (non-ASCII) characters, as in "Temperature (°C)" or "Chl_A (µg/L)".

When I read the field names directly from the table with utf-8 encoding and list them in Python I get 'Temperature (\xc2\xb0C)' and 'Chl_A (\xb5g/L)'. 

I receive the names of the fields to work on from a script tool: the tool's validation code reads the field names (with utf-8 encoding) and offers them as a list to choose from, where they look "properly spelled". My Python script then receives the chosen names with GetParameterAsText and splits them up.

fieldnames=arcpy.GetParameterAsText(3)
fieldnameslist=fieldnames.split(";")

When I read those and write them out I get [u"'Chl_A (\xb5g/L)'", u"'Temperature (\xb0C)'"]

Note also the difference between the first and the second output, (\xc2\xb0C) and (\xb0C), which seem to be two different encodings of the same character. When I read them again into a new list with utf-8 encoding I get ["'Chl_A (\xc2\xb5g/L)'", "'Temperature (\xc2\xb0C)'"]
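
For reference, the two spellings are not competing encodings: \xb0 is the Unicode code point of the degree sign (U+00B0) as it appears inside a unicode string, while \xc2\xb0 is the two-byte UTF-8 encoding of that same character. A minimal sketch, written so that it should run on both Python 2.7 and Python 3 (the variable names are only illustrative):

```python
# -*- coding: utf-8 -*-
# Unicode string: one element per character; \xb0 is code point U+00B0 (degree sign).
as_text = u"Temperature (\xb0C)"

# Byte string: the same name after UTF-8 encoding; the degree sign becomes two bytes.
as_utf8 = as_text.encode("utf-8")

print(as_utf8 == b"Temperature (\xc2\xb0C)")   # True
print(as_utf8.decode("utf-8") == as_text)      # True
print(len(as_text), len(as_utf8))              # 16 characters vs. 17 bytes
```

So a list that shows \xb0 holds unicode strings, and a list that shows \xc2\xb0 holds UTF-8 byte strings of the same names.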

Now when I try to use the field name again to read the values I need for further calculations from the table, the field name is not recognised, and I guess that is because of the encoding. Interestingly (at least for me), when I output the field name from either the utf-8 list or the other one, I get "Temperature (°C)" or "Chl_A (µg/L)" again. So I assume I have to "translate" the symbols back into their hex spelling to be able to communicate with the txt file.

When I hardcode the field name as 'Chl_A (\xb5g/L)', the arcpy message still shows Chl_A (µg/L), but the search cursor then complains that "'utf8' codec can't decode byte 0xb5 in position 7: invalid start byte". When I use u'Chl_A (\xb5g/L)' instead, the field name is recognized.
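
The error message can be reproduced outside ArcGIS. In Python 2.7 a plain '...' literal is a byte string, so 'Chl_A (\xb5g/L)' holds the single byte 0xb5, which is not valid UTF-8 on its own, whereas the u'...' literal holds the character µ itself. The sketch below assumes that the cursor tries to decode byte strings as UTF-8, which is what the message suggests; it uses b'...' literals so that it behaves the same on Python 3, and the names are illustrative only:

```python
raw = b"Chl_A (\xb5g/L)"          # byte 0xb5 on its own is not valid UTF-8

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc                    # "... can't decode byte 0xb5 in position 7 ..."

print(error.start)                 # 7, the index of the offending byte

text = u"Chl_A (\xb5g/L)"         # unicode string containing U+00B5 (micro sign)
print(text.encode("utf-8") == b"Chl_A (\xc2\xb5g/L)")   # True: the valid UTF-8 form
print(raw.decode("latin-1") == text)                    # True
```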

How can I give the field names from my list so that the script can read them and they are recognized in the table again for the search cursor?

I'm using Python 2.7.14 with ArcMap 10.6.1

PS: While looking for further information I stumbled upon this read, which might give you something to laugh about today: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Cha... (plus some well-explained information, I think, but it didn't help solve my problem...).

7 Replies
DanPatterson_Retired
MVP Emeritus

I don't have Python 2.7, but in Python 3.6.8:

u"'Chl_A (\xb5g/L)'".encode().decode()

"'Chl_A (µg/L)'"

u"'Temperature (\xb0C)'".encode().decode()

"'Temperature (°C)'"
Nicole_Ueberschär
Esri Regular Contributor

Hi Dan, 

At which point would you suggest I need to do the decoding?

fieldnames=arcpy.GetParameterAsText(3)
fieldnamesliste=fieldnames.split(";")
fieldnames_decoded=[field.encode("utf-8").decode("utf-8") for field in fieldnamesliste]
fieldnames_encoded=[field.encode("utf-8") for field in fieldnamesliste]

From what I have seen so far, I assume I somehow have to use the field name including the u"..." prefix, which I get either without any encoding and decoding or with both:

decoded: [u"'Chl_A (\xb5g/L)'"] (which is actually the same as not using encode+decode, which makes sense to me; I don't know why it doesn't print the u for you, but that might be down to Python 3)
encoded: ["'Chl_A (\xc2\xb5g/L)'"]

But when I read the field name again to use it in my search cursor, I get:

decoded: field name: 'Chl_A (µg/L)'

encoded: field name: 'Chl_A (µg/L)'

Somehow I am stuck between this encoding and decoding. It seems like the Python script automatically interprets the hex codes...
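
One way to see why the "decoded" list matches the untouched one: encoding followed by decoding is a round trip that returns the original unicode string, and only the encoded form is a different object (a byte string). A short sketch with one of the names above, assuming it arrives as a unicode string as the u"..." output suggests:

```python
field = u"'Chl_A (\xb5g/L)'"                 # as shown in the list output above

encoded = field.encode("utf-8")               # byte string, micro sign as \xc2\xb5
decoded = encoded.decode("utf-8")             # back to a unicode string

print(decoded == field)                       # True: the round trip changes nothing
print(encoded == b"'Chl_A (\xc2\xb5g/L)'")    # True
print(len(field), len(encoded))               # 14 characters vs. 15 bytes
```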

DanPatterson_Retired
MVP Emeritus

I think it is the GetParameterAsText thing

gpat = "Chl_A (\xb5g/L);Temperature (\xb0C)" # GetParameterAsText ??????

[i.encode().decode() for i in gpat.split(";")]

['Chl_A (µg/L)', 'Temperature (°C)']

But I can't be sure, because everything reads fine in Python 3, where "string" is unicode

RandyBurton
MVP Alum

Can you supply a small test file?

Nicole_Ueberschär
Esri Regular Contributor

Attached is a zip file that contains a toolbox with a script tool, the corresponding script and a test file. 

When you run the script tool it might throw the bad descriptor error again (as reported in my other posting) but you can just click ok and run the script. Sometimes you have to press ok twice. 

As the directory, select the one where you saved the txt file (probably the main folder of the zip). I left you three parameters with a couple of values, two with special characters and one simple one.

In the script at lines 105 and 106 I write min and max values to a directory where I would like to have proper (readable) names; that's why I am using an extra variable here (field_text; the list is listfields_text). I need these values again from line 130 on, where I am creating a graph from the table values. Here I will need to loop through the field names again but cannot figure out how to reference them so that they are recognized as columns in the table. In line 70 I found a workaround that works as long as the number of parameters with special characters is limited (normally there are two more in the table), but it is not very elegant. I could do the same in the loop for the graphs, but I feel like there must be a better way of dealing with these characters.

Thanks for looking into it!

RandyBurton
MVP Alum

Thanks for attaching the files. From those I noticed that you wanted the option to select multiple fields. I modified the validator code in your other post for this option.

I believe that the encoding issues arise because the tool interface appears to pass values that contain special characters in single quotes. In the tool printout below, note the last line ("fieldnamesliste from parameter"): the two fields with special characters are enclosed in single quotes inside the double quotes.

Executing: encodingtest C:\path\to\folder\encoding 'Chl_A (µg/L)';'Temperature (°C)';pH
Start Time: Sun Feb 24 19:55:53 2019
Running script encodingtest...

Table view field names: ['Original_file_&_sheet', 'Campaign', 'Profile_No', 'Date', 'Longitude (degrees_east)', 'Latitude (degrees_north)', 'Depth (m)', 'Temperature (\xc2\xb0C)', 'pH', 'Chl_A (\xc2\xb5g/L)', 'UID', 'UIDGraph']

fieldnamesliste from parameter: ["'Chl_A (\xc2\xb5g/L)'", "'Temperature (\xc2\xb0C)'", 'pH']

By trimming the quotes, I was able to get your script to work. Here's the section I modified; it starts around line 142 in the original code. The .decode("utf-8") call on the line that trims the quotes may only be needed because of some of the encoding/decoding you were doing earlier in the script. I'm not sure how much of that code can safely be removed.

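def PrintPlot(gs, gsNr, table, fieldname, min, max, xlabelname, colorL):
    # The tool wraps names that contain special characters in single quotes
    if fieldname[0] == "'":
        arcpy.AddMessage("apostrophe found")
        # Trim the quotes and decode the UTF-8 bytes to a unicode string
        fieldname = fieldname[1:-1].decode("utf-8")
        xlabelname = fieldname
    fields = [fieldname, "Depth (m)"]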
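
The same trimming could also be done once, right after reading the parameter, rather than inside PrintPlot. The sketch below is only an illustration: clean_field_names is a hypothetical helper, not part of arcpy, and it assumes the multivalue text is separated by semicolons with single quotes around names that contain special characters, as in the printout above. It would not cope with a field name that itself contains a semicolon.

```python
def clean_field_names(param_text):
    """Split multivalue parameter text and return unquoted unicode names.

    Hypothetical helper; param_text stands in for the value returned by
    arcpy.GetParameterAsText(3) in the script.
    """
    # Byte strings (Python 2 str) are assumed to be UTF-8 and are decoded first.
    if isinstance(param_text, bytes):
        param_text = param_text.decode("utf-8")
    names = []
    for name in param_text.split(";"):
        # Drop the single quotes the tool puts around names with special characters.
        if len(name) >= 2 and name[0] == "'" and name[-1] == "'":
            name = name[1:-1]
        names.append(name)
    return names


sample = u"'Chl_A (\xb5g/L)';'Temperature (\xb0C)';pH"
expected = [u"Chl_A (\xb5g/L)", u"Temperature (\xb0C)", u"pH"]
print(clean_field_names(sample) == expected)   # True
```

The returned names are unicode strings, which is the form the question reports as being recognized (u'Chl_A (\xb5g/L)').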

Hope this helps.

Nicole_Ueberschär
Esri Regular Contributor

Awesome, thanks a lot, Randy! Works great!
