ArcMap can't read string in TXT file

YichiLiu1 · 12-13-2013

Hi,

I have a bunch of TXT files that I need to convert to shapefiles. I first wrote a simple script to format the data as comma delimited. But when I opened the converted files in ArcMap, some of the characters didn't show up. For example, "COUSOU" would show up as <NULL>, and "88F6" would show as "886". But in some files, it worked fine. Does anyone know why, and how to deal with it?

Here is my code for converting the files. Any thoughts would help me a lot. Thank you very much!

import arcpy

inFile = open(arcpy.GetParameterAsText(0), 'r')   # input: space-delimited TXT
outFile = open(arcpy.GetParameterAsText(1), 'w')  # output: comma-delimited text
buffer = []
for line in inFile:
    line = line.strip()
    buffer = line.split(' ')
    buffer = filter(None, buffer)  # drop the empty strings left by repeated spaces
    outFile.write(",".join(buffer) + '\n')
inFile.close()
outFile.close()

WilliamCraft · 12-14-2013

OK, give this a try in a 10.1 environment. You can run it through Python IDLE, or you can double-click it to run it through the Windows Command Prompt. Either way, it asks for two user inputs: the first is the full path and file name of the input TXT file; the second is the full path to the directory where the output files should be written.

[Script attached]

Once the script completes, you will see that I decided to retain a CSV in addition to the Shapefile you asked for. I did this for a bit of QA, so you can make sure what got parsed from the input file matches the Shapefile. Now, here's the thing... your input source files must all be in the same format in terms of field name sequencing. If they differ, then the field names in the Shapefile may not match what the values really represent in the input file. Also, I've not set a spatial reference on your data for the Make XY Event Layer GP tool in the script. You may need to do that for overlay accuracy purposes. Lastly, you're welcome to modify this to create FGDB feature classes rather than Shapefiles. Using a Shapefile means that your field names are limited to 10 characters, so they may look a bit funny.
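A rough preview of how the long field names in this thread could end up once squeezed into a shapefile's 10-character limit is sketched below. `shapefile_names` is a hypothetical helper, and its renaming rule (truncate, then replace the tail with _1, _2, ... on a clash, comparing case-insensitively) is an assumption for illustration, not ArcGIS's documented algorithm, so the real tool output may differ.

```python
def shapefile_names(names, limit=10):
    """Hypothetical helper: shorten field names to `limit` characters, keeping them unique.

    Assumption: clashes are resolved by replacing the tail with _1, _2, ...
    and names are compared case-insensitively. ArcGIS's own renaming may differ.
    """
    seen = set()
    result = []
    for name in names:
        candidate = name[:limit]
        counter = 1
        while candidate.upper() in seen:
            suffix = "_%d" % counter
            candidate = name[:limit - len(suffix)] + suffix
            counter += 1
        seen.add(candidate.upper())
        result.append(candidate)
    return result


# Field names used by the scripts in this thread
header = ("X,Y,Z,Station,AboveGround,AboveGroundOK,BackWeatherStrCase1,"
          "BackWeatherStrCase2,VertorRadialMargin,HorzMargin,MinDistToWireGrow,"
          "OK,RadialMargin,MinDistToWireFall,DistToWire,OK").split(",")
print(shapefile_names(header))
```

Pairs such as BackWeatherStrCase1 and BackWeatherStrCase2 share their first 10 characters, which is why a clash rule is needed at all; a file geodatabase avoids the problem entirely.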

Let me know what you think! The only thing I'll ask for is, if you are happy with the script and my efforts, please mark this as the correct answer for the thread and then vote up other helpful responses in the thread as well.

Anonymous User · 12-14-2013

William had some very good points in his feedback. As he suggested, you will want to add a spatial reference for your output files, and with the lengthy field names, file gdb feature classes are the way to go (shapefile field names have a 10-character limit). Here is my attempt at it:

import arcpy, os, glob
arcpy.env.overwriteOutput = True

# Parameter 0: folder containing the TXT files; parameter 1: spatial reference
arcpy.env.workspace = ws = arcpy.GetParameterAsText(0)
SR = arcpy.GetParameterAsText(1)
gdb = str(arcpy.CreateFileGDB_management(ws, 'VegetationAnalysis.gdb').getOutput(0))
csv_fold = os.path.join(ws, 'CSV')
if not os.path.exists(csv_fold):
    os.makedirs(csv_fold)

for txt in glob.glob(os.path.join(ws, '*.txt')):
    # File name without folder or extension (safe even if folder names contain dots)
    name = os.path.splitext(os.path.basename(txt))[0]
    # Keep only the lines that contain ' NG'
    with open(txt) as rd:
        lines = [line for line in rd if ' NG' in line]
    arcpy.AddMessage("Creating CSV file for %s...." % os.path.basename(txt))
    outFile = os.path.join(csv_fold, name + '.csv')
    with open(outFile, 'w') as csv:
        csv.write("X,Y,Z,Station,AboveGround,AboveGroundOK,BackWeatherStrCase1,BackWeatherStrCase2,VertorRadialMargin,HorzMargin,MinDistToWireGrow,OK,RadialMargin,MinDistToWireFall,DistToWire,OK\n")
        # split() with no argument collapses each run of whitespace, so join with commas
        csv.write('\n'.join(",".join(line.strip().split()) for line in lines))
    arcpy.AddMessage("Generating Feature Class...")

    # Build Z-enabled XY points from the CSV and copy them into the file geodatabase
    arcpy.MakeXYEventLayer_management(outFile, 'X', 'Y', 'XY_event_layer', SR, 'Z')
    arcpy.CopyFeatures_management('XY_event_layer', os.path.join(gdb, 'SurveyPoints_%s' % name))
arcpy.AddMessage("Complete.  Files located at " + gdb)

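Also, a few notes:
1. Built-in names (e.g. buffer, which is a built-in function in Python 2) should not be used as variable names, even though it doesn't hurt anything in this situation.
2. The filter() call is not necessary when split() is called with no arguments, as in my code: that splits on runs of whitespace and never returns empty strings. Your split(' ') does return empty strings between consecutive spaces, and those empty strings (not None values) are what filter(None, ...) was removing.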
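To see the difference behind note 2, here is a standalone snippet (plain Python, no arcpy; the sample line is made up from coordinate values that appear later in the thread):

```python
line = "751274.78   1484273.55  20.82"

# split(' ') splits on every single space, so runs of spaces leave empty strings
parts = line.split(' ')
print(parts)  # ['751274.78', '', '', '1484273.55', '', '20.82']

# filter(None, ...) drops falsy items such as ''; list() is needed on Python 3,
# where filter returns an iterator instead of a list
print(list(filter(None, parts)))

# split() with no argument splits on runs of whitespace and skips empty strings
print(line.split())
```

Both of the last two calls give the same three fields, so either version of the conversion produces the same CSV rows.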

I have attached a 10.0 toolbox. Mine takes a folder containing all the text files as input; it iterates through them and creates a CSV and a feature class for each TXT file.

However, if you end up using my code, you should still mark William's answer as the answer, not mine, since he has done a lot of work on this and I just borrowed some of his code.

YichiLiu1 · 12-15-2013

Thanks for your help! However, neither method seemed to solve my problem; the result is still not right. The attached screenshot shows what I got: 'COUSOU' became <NULL> and '88F4' became '884'. Do you guys have any idea why that happened?

I am actually thinking about using a cursor to create a shapefile/feature class directly. But some of the files might have 3 or 4 million data entries or more. I don't know if a cursor will be able to handle that. Does anyone know?
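On the size question, one thing that helps with millions of rows no matter how the output is written is to stream the input instead of loading it all with readlines(). A minimal sketch, assuming the same ' NG' line filter and X, Y, Z as the first three columns, as in the scripts above; `iter_points` is a hypothetical name. Each yielded tuple could then be handed to an insert cursor (arcpy.da.InsertCursor is available from 10.1 on), which is not shown here.

```python
import io


def iter_points(lines, marker=" NG"):
    """Yield (x, y, z, other_fields) for each line containing `marker`.

    `lines` can be an open file object, which Python reads one line at a
    time, so memory use stays flat even for files with millions of rows.
    """
    for line in lines:
        if marker not in line:
            continue  # skip lines without the marker, like the ' NG' filter above
        fields = line.split()
        x, y, z = (float(value) for value in fields[:3])
        yield x, y, z, fields[3:]


# Stand-in for open("survey.txt"): one non-matching line, one data line
sample = io.StringIO(u"report header\n751274.78 1484273.55 20.82 270.52 NG\n")
for point in iter_points(sample):
    print(point)
```

Because it is a generator, only the current line is held in memory; the same loop works unchanged on a real file handle.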

Thank you!

WilliamCraft · 12-15-2013

For whatever reason, it looks like the field containing the altered values may be getting created as numeric. I'll take another look at my output to see if I'm seeing similar behavior in the field's data type, but when I checked the results from converting the example.txt you provided, all of the values were output correctly. That is, I didn't see any NULL or truncated values like you're seeing.

YichiLiu1 · 12-15-2013

I used this file to test.

YichiLiu1 · 12-15-2013

The CSV file is output correctly. The problem is in the conversion from CSV to shapefile.

WilliamCraft · 12-15-2013

Well, I've gone back and re-run the script I wrote using both the example.txt and the example2.txt files as input. Not only are my CSV files correct, but so are my output shapefiles. Below is a screenshot of what I am getting:

[Screenshot attached]

As you can see in my output, I don't get any NULL or truncated values. I have, however, highlighted the differences I can see between what I'm getting and what you're getting. First, you have an OBJECTID field and I have an FID field. My guess is that you may have adopted Caleb's code or some modification of it (it writes to a file geodatabase rather than a shapefile), but I don't think this difference is relevant to the issue you're describing. Nonetheless, it's a difference, so I noted it. Second, and probably more importantly, the field that seems to be giving you trouble is left-justified in my output and right-justified in yours. That led me to check the field type; in my output, the field was created as a string. Can you check yours to see if it's getting created as numeric (e.g., Long Integer or Double)? The fact that COUSOU becomes NULL and 88F4 becomes 884 makes me think the letters aren't recognized as part of the value because the field is not being created as a string, as it should be.
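One way to check which columns really are numeric, independent of what ArcGIS guesses, is to try converting every value in each column. The sketch below is hypothetical (`classify_columns` is not part of arcpy), and the sample rows are built from values shown in the thread. A column that mixes values like COUSOU and 88F4 comes out as text, which is the type that field should end up with.

```python
def classify_columns(header, rows):
    """Return a list of (column_number, name, kind), kind being 'numeric' or 'text'.

    A column counts as numeric only if every non-blank value parses as a float.
    Column numbers start at 1, matching the ColN numbering used by schema.ini.
    """
    result = []
    for index, name in enumerate(header):
        kind = "numeric"
        for row in rows:
            value = row[index].strip() if index < len(row) else ""
            if not value:
                continue  # blanks do not decide the type
            try:
                float(value)
            except ValueError:
                kind = "text"
                break
        result.append((index + 1, name, kind))
    return result


header = ["Z", "Station", "BackWeatherStrCase1"]
rows = [["20.82", "270.52", "COUSOU"],
        ["20.82", "270.52", "88F4"]]
print(classify_columns(header, rows))
```

Scanning every row is slower than looking at a few, but it cannot be fooled by a column whose first values happen to look numeric.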

So, what's different in our environments? Our scripts are basically the same, and our inputs are the same, yet our results are different. I'm using 10.1 SP1 build 3143 with Python 2.7. Can you check the exact build number of ArcGIS and the version number of Python on your machine? For ArcMap, you can use About ArcMap on the Help menu or run Patchfinder.

EDIT: One last difference in our outputs (a minor one, I think) is that my output is M-aware in addition to being Z-aware. Yours seems to be only Z-aware.

YichiLiu1 · 12-15-2013

I think the version of ArcGIS made the difference. I was using ArcGIS 10.0, and when I ran the same file in ArcGIS 10.1, it didn't seem to have that problem. My field type came out as Double in 10.0. I checked and found that only strings with 'F' in them have this problem. I can run the data on computers with ArcGIS 10.1 for now. However, I still want to know why 10.0 does this. Just curious!

And again, thank you very much for your help!

T__WayneWhitley · 01-14-2014

Apologies for the delayed response, but if you're still waiting for an answer about this ArcGIS 10.0 behavior and the workaround, I can fill you in...

The source CSV text files (part of the output of the script) were written correctly; the error occurred in how ArcGIS read that 'intermediate' CSV output to create the final gdb feature class. A simple entry in the schema.ini file can 'override' registry settings that ArcGIS may use in reading these files. On my 10.0 machine, I duplicated your error, then modified the companion schema.ini file (which resides in the same directory as the CSV), reran the Make XY Event Layer and Copy Features tools, and that corrected the error. This was the ini entry for the test file I ran (example.csv):

[example.csv]
Format=CSVDelimited
ColNameHeader=True
MaxScanRows=1
Col7=BackWeatherStrCase1 Char

Note the 'Col7=BackWeatherStrCase1 Char' line: it means read that field (column 7) as character. A field width is optional after 'Char' (in the form 'Char Width n'). Of course, it would be wise to add the formatting specs for the other fields to the ini as well, but for this test I was only interested in overriding this behavior:

IDLE 2.6.5      ==== No Subprocess ====
>>> outFile = r'C:\mapdocs\temp\textINI\CSV\example.csv'

>>> arcpy.MakeXYEventLayer_management(outFile,'X','Y','XY_event_layer')
<Result 'XY_event_layer'>

>>> arcpy.CopyFeatures_management('XY_event_layer', r'C:\mapdocs\temp\textINI\VegetationAnalysis.gdb\test1')
<Result 'C:\\mapdocs\\temp\\textINI\\VegetationAnalysis.gdb\\test1'>

>>> # Let's test read the file gdb results:
>>> rows = arcpy.SearchCursor(r'C:\mapdocs\temp\textINI\VegetationAnalysis.gdb\test1')

>>> # ...get the 1st record
>>> row = rows.next()

>>> # ...get a field list to call by name with getValue:
>>> fields = arcpy.ListFields(r'C:\mapdocs\temp\textINI\VegetationAnalysis.gdb\test1')

>>> # For the 1st row, print the field values read via the cursor:
>>> for field in fields:
        print field.name, '...', row.getValue(field.name)

OBJECTID ... 1
Shape ... <geoprocessing describe geometry object object at 0x096D9050>
X ... 751274.78
Y ... 1484273.55
Z ... 20.82
Station ... 270.52
AboveGround ... 16.97
AboveGroundOK ... OK
BackWeatherStrCase1 ... None  # ah, this is the Null value, of course incorrect
BackWeatherStrCase2 ... WCD-0in-0psf-100degC-WCD
VertorRadialMargin ... -0.01
HorzMargin ... 0.0
MinDistToWireGrow ... 9.99
OK ... NG
RadialMargin ... 0.0
MinDistToWireFall ... 0.0
DistToWire ... 9.99
OK_1 ...
>>>  

>>> row.Station
270.51999999999998

>>> # what is the BackWeatherStrCase1 88F4 value read as?
>>> while row:
        if row.Station > 300.0:
            print row.BackWeatherStrCase1
            break
        row = rows.next()

884.0  # clearly wrong!
>>>

Then, after the ini file 'fix', the result (for this file and this field only) tests out okay:

>>> print outFile
C:\mapdocs\temp\textINI\CSV\example.csv

>>> arcpy.MakeXYEventLayer_management(outFile,'X','Y','XY_event_layer')
<Result 'XY_event_layer'>

>>> arcpy.CopyFeatures_management('XY_event_layer', r'C:\mapdocs\temp\textINI\VegetationAnalysis.gdb\test1')
<Result 'C:\\mapdocs\\temp\\textINI\\VegetationAnalysis.gdb\\test1'>

>>> rows = arcpy.SearchCursor(r'C:\mapdocs\temp\textINI\VegetationAnalysis.gdb\test1')

>>> # this time, just printing all the vals in the BackWeatherStrCase1 field:
>>> for row in rows:
        print row.BackWeatherStrCase1

COUSOU
COUSOU
COUSOU
COUSOU
COUSOU
88F4
88F4
88F4
88F6
88F10
88F10
88F10
88F10
88F10
88F15
88F15
88F15
88F15
88F15
88F15
88F15
88F15
88F15
88F15
88F15
88F16
88F16
88F16
>>>  # read error corrected.
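If several CSV files need this override, the schema.ini entries could be generated instead of typed by hand. A minimal sketch follows; `schema_ini_section` and `write_schema_ini` are hypothetical helpers, and the entry format simply mirrors the [example.csv] entry above (the Microsoft schema.ini page linked in this post has the full syntax). Because a schema.ini applies to every text file in its folder, all sections go into one file. Note that this sketch overwrites any schema.ini already in that folder, so merge existing entries by hand if needed.

```python
import os
import tempfile


def schema_ini_section(csv_name, header, text_columns, max_scan_rows=1):
    """Build one [csv_name] section that forces the given columns to Char.

    Column numbers start at 1, so the 7th header field becomes Col7.
    """
    lines = ["[%s]" % csv_name,
             "Format=CSVDelimited",
             "ColNameHeader=True",
             "MaxScanRows=%d" % max_scan_rows]
    for number, name in enumerate(header, 1):
        if name in text_columns:
            lines.append("Col%d=%s Char" % (number, name))
    return "\n".join(lines) + "\n"


def write_schema_ini(folder, sections):
    """Write all sections into folder/schema.ini, replacing any existing file."""
    path = os.path.join(folder, "schema.ini")
    with open(path, "w") as ini:
        ini.write("\n".join(sections))
    return path


header = ("X,Y,Z,Station,AboveGround,AboveGroundOK,BackWeatherStrCase1,"
          "BackWeatherStrCase2,VertorRadialMargin,HorzMargin,MinDistToWireGrow,"
          "OK,RadialMargin,MinDistToWireFall,DistToWire,OK").split(",")
section = schema_ini_section("example.csv", header, {"BackWeatherStrCase1"})

folder = tempfile.mkdtemp()  # stand-in for the CSV output folder
ini_path = write_schema_ini(folder, [section])
with open(ini_path) as ini:
    print(ini.read())
```

The file has to be in place before Make XY Event Layer reads the CSVs, which matches the order used above (edit the ini, then rerun the tools).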

The info contained here was vital:
http://msdn.microsoft.com/en-us/library/ms709353(VS.85).aspx

There is a hint about schema.ini usage in this excerpt from the Make XY Event Layer web help:

"The standard delimiter for tabular text files with extensions .csv or .txt is a comma, and for files with a .tab extension, a tab. To use an input table with a nonstandard delimiter, you must first specify the correct delimiter used in the table using a schema.ini file."
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00170000006z000000

...again here:

"ArcGIS uses the Microsoft OLE DB provider for Open Database Communication (ODBC) drivers and the Microsoft ODBC Text Driver for text files to access tabular data in text files. The driver stores data description (schema) information about each text file in a file named schema.ini so the data can be accessed properly. This file refers only to the text data files in the directory in which it resides."
http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//005s00000010000000

Hope that helps...
Wayne