End-of-line (EOL) Problem

NickJacob · ‎02-15-2012

Came across a weird EOL Error with Arcpy's CalculateField_management() function today and I was wondering if anyone out there could point me towards a work around?

I previewed one of my tables in Catalog and at first glance the offending cells seemed fine. However, when I copy/pasted one over to a text file a lot more text became visible. My current theory is that the Enter key is being hit when my customers edit cells in MS Excel (my typical source data) and invisible newline characters are being carried into my geodatabase tables. Tricky thing is I can't see them in Excel or Arc, so I'm not sure how to strip or replace the backslashes via Python.

Any advice would be much appreciated,
- Nick

curtvprice · ‎08-06-2014

This method seems a little dangerous as I can envision where the string "0a" could occur in your hex string by accident, for example the two byte hex "b0 ac" represented this way would be an invalid match.

Here's a function implementation of your approach, not using hex codes. You could paste this function into the ArcMap python window and use it there. Note I'm using chr(10) and chr(13) for "\n" and "\r" so this function can also be used inside the Calculate Value tool in modelbuilder... as the usual use of "\n" breaks the geoprocessing messaging string representation in the code...

I'm also using arcpy.da.UpdateCursor because it is very fast compared to the 10.0 flavor.... and the "with" construct helps you out by closing the cursor even if the code fails -- avoiding the possibility of a nasty hanging file lock!

import arcpy
def strip_newlines(tbl, field, eolchar=""):
  with arcpy.da.UpdateCursor(tbl, [field]) as rows:
    for row in rows:  
      row[0] = row[0].replace(chr(10), eolchar).replace(chr(13), eolchar) 
      rows.updateRow(row)

View solution in original post

markdenil · ‎02-15-2012

Try dumping the table to a text file, if you think there is stuff going on in there you cannot see.

There is an 're' module in python that handles regular expressions. That is the tool set you want for weeding out pesky newlines ('\n').

It may be easier to weed them in the table or in the text dump (which could then be re-imported to a table).

NickJacob · ‎02-16-2012

Try dumping the table to a text file, if you think there is stuff going on in there you cannot see.

There is an 're' module in python that handles regular expressions. That is the tool set you want for weeding out pesky newlines ('\n').

It may be easier to weed them in the table or in the text dump (which could then be re-imported to a table).

Thanks for the quick response! I copied a cell over to Notepad++ and it turns out each line is proceeded with a carriage return and line feed (CR LF). Ordinarily, I would just clean-up data by hand, but in this case I need a script to filter this sort of thing. Have you ever seen this done via field calculator by chance? I've been experimenting with regular expressions, but no luck so far. I keep getting "EOL while scanning string literal" related errors. One of my many attempts below:

#ESRI Codeblock
codeblock="""def trimNewline(val):
    import re
    newVal = re.sub('(?m)[\r\n]',"",val)
    return newVal"""

#Expression parameter
expression = "trimNewline(str(!FIELD!))

# CalculateField_management(in_table,field,expression,{expression_type},{code_block})
arcpy.CalculateField_management(mytable,"FIELDNAME",expression,"PYTHON",codeblock)

curtvprice · ‎02-16-2012

There's a nice python string method that trims whitespace around strings:

>>> x
'\r\nhere is my real text\n'
>>> x.strip()
'here is my real text'

So one approach you could use would be:

arcpy.CalculateField_management(mytable,"FIELDNAME","!FIELDNAME!.strip()","PYTHON")

One could also use VBScript:

arcpy.CalculateField_management(mytable,"FIELDNAME","Trim([FIELDNAME]")

NickJacob · ‎02-17-2012

-------
Update
-------

Just a quick update to anyone that may come across this post down the road (and big thank you to mdenil and curtvprice). The short-term conclusion I've come to is that it's not possible to feed hidden newline characters ('\n') into the Field Calculator. Also, because carriage returns and newlines are special, some functions can't access them like I had anticipated, such as strip().

Workaround was to convert the problem strings to hexadecimal and swap out values with an Update Cursor. See example below:

rows = arcpy.UpdateCursor(fc)
for row in rows:
    if len(row.NAME) >= 255:
        hexString = str(row.NAME).encode("hex")
        if "0a" in hexString: # "0a" is hex equivalent of '\n'
            hexString = hexString.replace("0a","")
    row.NAME = hexString.decode("hex")
    rows.updateRow(row)

Definitely not ideal, but seems reliable so far. Another interesting thing is that my problem carriage returns magically disappeared somewhere earlier in my script, when converting an Excel worksheet to an in-memory feature class. This might suggest another arcpy function stripped them out, but not sure which one.

ZacharyOrdo__GISP · ‎08-06-2014

This is the solution that worked for me, though I removed the len(row.NAME) >= 255 requirement.

Here's the working example:


rows = arcpy.UpdateCursor("Assets\Welds")


for row in rows:


 hexString = str(row.REMARKS).encode("hex")


 if "0a" in hexString:


 hexString = hexString.replace("0a","")


 row.REMARKS = hexString.decode("hex")


     rows.updateRow(row)

Replace "Assets\Welds" with the appropriate fc name and replace row.REMARKS with row.(insert field name here)

You may or may not need to run:

import arcpy
import string

I just wanted to make this idiotproof because I struggled for a bit (I copied and pasted your example expecting it to work). I'm still somewhat new to (and still learning) Python so I'm sure I'm not the only one who will forget to change the appropriate variables to their ArcGIS data.

curtvprice · ‎08-06-2014

This method seems a little dangerous as I can envision where the string "0a" could occur in your hex string by accident, for example the two byte hex "b0 ac" represented this way would be an invalid match.

Here's a function implementation of your approach, not using hex codes. You could paste this function into the ArcMap python window and use it there. Note I'm using chr(10) and chr(13) for "\n" and "\r" so this function can also be used inside the Calculate Value tool in modelbuilder... as the usual use of "\n" breaks the geoprocessing messaging string representation in the code...

I'm also using arcpy.da.UpdateCursor because it is very fast compared to the 10.0 flavor.... and the "with" construct helps you out by closing the cursor even if the code fails -- avoiding the possibility of a nasty hanging file lock!

import arcpy
def strip_newlines(tbl, field, eolchar=""):
  with arcpy.da.UpdateCursor(tbl, [field]) as rows:
    for row in rows:  
      row[0] = row[0].replace(chr(10), eolchar).replace(chr(13), eolchar) 
      rows.updateRow(row)

ZacharyOrdo__GISP · ‎11-06-2014

This worked for me in 10.2. For some reason the old methods that worked in 10.0 no longer worked in 10.2 so I'm glad I found your solution Curtis. Thanks for sharing!

Waan · ‎07-27-2012

This is a useful thread. I've run in to related problems in how to find and correct them (VB vs Python). When performing Python field calculations on fields containing \n, \x, etc, I get the same errors as Nick described--but I don't get them when using VB (because \n et al aren't special in VB?). So a few points:

1) I didn't even realize string fields could support carriage returns! Apparently this is new as of 9.2? For example, my address string field can contain:

568 N Courier Ct
and I can type more here
and here if I use "ctrl+enter"
or if the incoming table
had \n or \r for line breaks

... but when I visually inspect the cell, all I see is "568 N Courier Ct" since it's on the first line. It is not visually apprent without starting an edit session, highlighting text, and dragging and/or arrowing. This is a real hazard when importing from text files or Excel; this could really drive somebody nuts when geocoding imported addresses.

2) Besides \n, I've also run in to problems with \x in the fields as well. I guess any of the reserved characters would be problematic. ArcGIS SQL is a little helpful for spotting these, in that

SELECT * FROM street_features WHERE
"street" LIKE '%
%'

finds all instances of carriage returns in the "street" field ... but I have no idea how to write a similar query to find instances of \x, \b, etc.

3) Nick's hexidecimal conversion seems to work for \n but, again, how can we apply it to other special characters? Or is it just easier to use VB instead of Python for field calculations?

NickJacob · ‎07-28-2012

Cool idea using SQL to search for carriage returns. Later on, I came to the conclusion that I shouldn't bother stripping out the carriage returns, or any of the other random problems with the data for that matter. At some point I needed to shift the responsibility of data maintenance over to the end user of whatever model or script I was writing. So, in the end I think I included some exception handling to specifically watch for them.

As for the never ending programming language of preference debate, does it really matter? Who honestly writes more than a handful of scripts or models per year anyways?