Are there secret tags in CSV files?

1952
2
Jump to solution
01-15-2017 10:22 AM
MaribethPrice
Occasional Contributor

I would love an Esri programmer (or anyone for that matter) to answer this for me.  I encountered some unusual behavior in an Excel table named ORprecipnormals (97-2003 .xls format) containing precipitation normals for climate stations. ArcMap versions 10.2 through 10.5 read it fine; Pro insists on reading the numeric values as text fields, rendering them useless. I've always known that the Excel interpreter can have some gremlins in it. I tried updating the Excel table to .xlsx format--still got text.  I tried saving it as a CSV file--still got text.

In desperation, I copied the data in Excel and pasted them (as values only) into a new spreadsheet (named simply precipnormals), saved the spreadsheet as .xlsx, and also saved it as a CSV file.  Pro now correctly interprets both  spreadsheet and CSV files to contain numbers (Double). 

That the new spreadsheet works doesn't surprise me; there must have been something funky in the original spreadsheet. But I am mystified why one CSV file gets interpreted correctly and one does not.  Side by side in Notepad, they appear identical. Isn't that the whole point of a CSV file?  They are simple ASCII and direct and transportable.  Yet it appears that Excel dumped some invisible tag into the first CSV, and Pro is heeding it to interpret the values as text.  Why oh why would this be? Note that ArcMap does NOT heed those invisible tags; it interprets any of the four files correctly.

The attachments include the two CSV files shown together in Notepad, the Fields view showing how they are interpreted differently in Pro, and finally, the files themselves.

I await enlightenment.

Meanwhile, for those with more pressing practical concerns, if Pro is making hash of your spreadsheets, try the copy/paste-special-values trick.

Maribeth

0 Kudos
1 Solution

Accepted Solutions
JoshuaBixby
MVP Esteemed Contributor

There aren't any secret tags, but there isn't a standard either, so there are lots of different implementations.  Looking at the two files, they are not identical, not even visually/textually.

What is noticeable in ORprecipnormals.csv is that every numeric is exactly 2 decimal places, even when the number ends in a zero, i.e., 18.40 instead of 18.4.  In and of itself, that is not causing the issue you are seeing, but it does make me believe that the Excel table had number formatting turned on when the CSV was exported.

When number formatting is turned on, it has been my experience that Excel treats formatted "numbers" as text when exporting to certain formats, CSV included.  In this case, Excel has added an extra line to the end of the file.  If you open up and look at ORprecipnormals.csv  in a text editor, you will notice the last line doesn't contain actual data, but a series of underscores separated by commas.  It is this last row that is being used to convey formatting information back to Excel, and also Pro, when opening up CSV files.

If you open ORprecipnormals.csv in a text editor and remove the last line, it will load the "numbers" as numbers.  But, and this is a big but, the overly aggressive caching mechanism of Pro will continue to see those numbers as text even if you remove and add the file back into the application.  If you remove the last line of ORprecipnormals.csv  and rename the file and then load it, you will see it load as numbers.

Personally, I consider the caching issue/behavior I described above as a bug, I just haven't gotten around to chasing it with Esri Support.

View solution in original post

2 Replies
JoshuaBixby
MVP Esteemed Contributor

There aren't any secret tags, but there isn't a standard either, so there are lots of different implementations.  Looking at the two files, they are not identical, not even visually/textually.

What is noticeable in ORprecipnormals.csv is that every numeric is exactly 2 decimal places, even when the number ends in a zero, i.e., 18.40 instead of 18.4.  In and of itself, that is not causing the issue you are seeing, but it does make me believe that the Excel table had number formatting turned on when the CSV was exported.

When number formatting is turned on, it has been my experience that Excel treats formatted "numbers" as text when exporting to certain formats, CSV included.  In this case, Excel has added an extra line to the end of the file.  If you open up and look at ORprecipnormals.csv  in a text editor, you will notice the last line doesn't contain actual data, but a series of underscores separated by commas.  It is this last row that is being used to convey formatting information back to Excel, and also Pro, when opening up CSV files.

If you open ORprecipnormals.csv in a text editor and remove the last line, it will load the "numbers" as numbers.  But, and this is a big but, the overly aggressive caching mechanism of Pro will continue to see those numbers as text even if you remove and add the file back into the application.  If you remove the last line of ORprecipnormals.csv  and rename the file and then load it, you will see it load as numbers.

Personally, I consider the caching issue/behavior I described above as a bug, I just haven't gotten around to chasing it with Esri Support.

MaribethPrice
Occasional Contributor

Thanks.  I still can't see the difference in the files except that last line, but that would certainly cause the problem I'm seeing.  Always helps to get someone else's eyes on your data!

0 Kudos