
Problem importing CSV files into ArcGIS Desktop

Question asked by geonetadmin on Dec 1, 2011
Latest reply on Dec 1, 2011 by BrianAltheim
Original User: BrianAltheim

tl;dr:  ArcGIS Desktop handles CSV imports inconsistently, and has no way to manually assign field types for imported data


I have a pretty specific and somewhat inconsistent problem that comes up when I'm importing CSV data into ArcGIS Desktop.  The data that I'm importing has only four fields (columns): Line, Shot, Latitude, Longitude.  This data is seismic shotpoint data that is in the public domain.

The workflow I use is as follows (a scripted sketch of steps 3-6 follows the list):

1)  Download the data from a third-party source (Basin database) as an Excel document.
2)  Save the Excel file as a CSV file (I'm old school).
3)  Add the CSV data into ArcGIS Desktop.
4)  Display XY data by right-clicking on the data entry in the Table of Contents.
5)  Right-click on the displayed data (in the ToC) and choose Data -> Export Data to save the points as a Shapefile.
6)  From here, I use the "Points To Line" tool from the Spatial Ecology toolset to convert the point shapefile into a line shapefile, using the Line field to group the points together.
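For reference, here is roughly what steps 3-6 look like when scripted with arcpy. The paths are placeholders, WGS 1984 is an assumption on my part, and I've swapped in the standard Data Management "Points To Line" tool rather than the Spatial Ecology one, since that's the signature I can vouch for:

```python
import arcpy

# Placeholder paths -- substitute your own CSV and output locations.
csv_file = r"C:\data\8624-T7-2E.csv"
points_shp = r"C:\data\shots.shp"
lines_shp = r"C:\data\seismic_lines.shp"

# Steps 3-4: add the CSV and display the XY data as an event layer.
# WGS 1984 is assumed; older releases may need a .prj path or a
# spatial reference name instead of the 4326 factory code.
sr = arcpy.SpatialReference(4326)
arcpy.MakeXYEventLayer_management(csv_file, "Longitude", "Latitude",
                                  "shot_events", sr)

# Step 5: export the event layer to a point shapefile.
arcpy.CopyFeatures_management("shot_events", points_shp)

# Step 6: group the points into lines on the Line field, using the
# standard Data Management Points To Line tool.
arcpy.PointsToLine_management(points_shp, lines_shp, Line_Field="Line")
```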

This procedure works fine, as long as the file I'm using doesn't have any non-numeric characters in it.  Unfortunately, some of the datasets have seismic lines with non-numeric characters, and this is where things get weird.  I noticed pretty quickly that some of the datasets would end up with <null> entries in the Line field.  It turned out that the import action was autodetecting the format of each column of the CSV and assigning field properties based on the first few entries found within each field.  If the first few lines of a dataset were 100% numeric characters, the field would be assigned as a Double field; if the first few lines of the Line field had a mix of characters, it would be assigned as a String field.  This is done in the background, and I can find no way of controlling how the Field Type gets assigned.  If the Line field contains a mix of purely numeric and non-numeric entries (such as a line '1' followed by a line '1A'), the importer will automatically assign that field type as Double and enter the non-numeric entries as <null> in the Line field, which leads to problems if there are a number of non-numeric lines in a given dataset.
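For what it's worth, the one workaround I've seen suggested for forcing field types on delimited text is a schema.ini file placed in the same folder as the CSV; as far as I understand it, ArcGIS reads text files through the Microsoft text driver, which honors ColN=&lt;name&gt; &lt;type&gt; entries there, though I haven't confirmed this myself.  A small sketch that writes one (file names and paths are made up):

```python
import os

# Hypothetical path -- point this at your own CSV.
csv_path = r"C:\data\8624-H7-1E.csv"
folder, name = os.path.split(csv_path)

# schema.ini entries are per-file sections; column types follow the
# Microsoft text driver's "ColN=<name> <type>" convention.
ini_lines = [
    "[%s]" % name,
    "Format=CSVDelimited",
    "ColNameHeader=True",
    "Col1=Line Text",        # force Line to Text so '1A' survives
    "Col2=Shot Long",
    "Col3=Latitude Double",
    "Col4=Longitude Double",
]

# Append, since one schema.ini can hold sections for several files
# (running this twice would duplicate the section).
with open(os.path.join(folder, "schema.ini"), "a") as f:
    f.write("\n".join(ini_lines) + "\n")
```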

I was able to fool the importer for some files by rearranging the line order within the CSV file to put the non-numeric lines at the top.  This trick didn't work for one of the datasets, presumably because there weren't enough non-numeric line entries, and I'm uncomfortable calling it a 'solution' because it still leads to inconsistency between these similar datasets, as some datasets will have Double fields and others will have String fields for what is essentially identical information.  This is where things get weirder!  Attached are two CSV files in which I tried this trick of moving the non-numeric lines to the top of the file.  For the first file (8624-T7-2E.xlsx), which has 141 non-numeric entries and ~500 numeric entries, it worked: the Line field was assigned as a String type.  For the second file (8624-H7-1E.xlsx), which has 28 non-numeric entries and ~2750 numeric entries, it did not: the Line field was assigned as a Double type.  So whatever stage is autodetecting the Field Type is obviously looking deeper than just the first line of data.
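In case anyone wants to reproduce the trick, here it is automated in Python (paths are placeholders, and it assumes the Line values sit in the first column).  As the second file shows, it's no guarantee, since the type-guessing scan evidently looks past the first rows:

```python
import csv

def is_numeric(value):
    """True for Line values the importer could coerce to Double."""
    try:
        float(value)
        return True
    except ValueError:
        return False

# 'rb'/'wb' because ArcGIS Desktop ships Python 2, where the csv
# module wants binary mode on Windows.
with open(r"C:\data\8624-H7-1E.csv", "rb") as f:
    reader = csv.reader(f)
    header = next(reader)
    rows = list(reader)

# Stable sort: rows whose Line value (first column) is non-numeric
# move to the top, so the type-guessing scan meets strings first.
rows.sort(key=lambda row: is_numeric(row[0]))

with open(r"C:\data\8624-H7-1E_reordered.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)
```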

Is there some other way of importing this sort of data that allows the user to manually assign Field types?  That would be the ideal solution.  If manually assigning Field types is impossible, though, an explanation of how Field types are automatically assigned might at least help me fool ArcGIS in the future.  I do not have a licence for the Data Interoperability extension.

Note:  I've uploaded the files as .xlsx files because I'm not allowed to upload the .csv versions.
