CSV data source documentation

07-17-2018 11:02 AM
TimMinter
Occasional Contributor III

ArcGIS Pro (and ArcMap before it) would make some assumptions about the column values in a CSV file. Too often in my experience, the assumptions were just plain wrong, causing speed bumps, gnashing of teeth, and general furrowing of brows. It turns out that Esri's customers who use ArcGIS Pro can remedy the assumptions being made by using a schema.ini file to tell ArcGIS Pro how to perceive the column values. It also turns out that as of 7/17/2018, it's really hard to find this information, which can lead to time spent interacting with the Esri support teams. So, I'm sharing some info here, where I might be able to find it when I need it again.

Background

  1. This becomes a problem when you need to import text values that are composed of:
    • numerals (0009198045), and keep the leading zeros
    • numerals and dots (1.0), where the resulting value needs to stay "1.0" as text rather than being converted to the number 1
    • and probably some other cases I don't remember
  2. GP tools and other functions don't seem to expose any control over how ArcGIS Pro should perceive column values. For example, in the Append GP tool, specify a CSV file as input, click on a field name in the "Field Map" area of the dialog, click "Source", and hover over the field name. ArcGIS Pro will show you a "tooltip" (or something) that tells you what type of value it has assumed is in the field.
  3. There is very little info about this situation in the ArcGIS Pro Help documentation. Esri support referred me to this page, which appears to be related: http://resources.arcgis.com/en/help/main/10.1/index.html#//005s00000010000000. I just didn't think to look there for help with ArcGIS Pro. I probably need to up my game.
  4. Schema.ini info (random search result): Schema.ini File (Text File Driver) | Microsoft Docs
  5. Edit:  Esri Support Services reports that Esri does not document how to import CSV files in the ArcGIS Pro help because, in Esri's view, the issue is not with the tool; rather, "the issue is with the CSV" (maybe they mean that the issue is in how they have chosen to design and develop their tools to work with CSV files... maybe not).
    • I don't buy this notion, because from the user's perspective, Esri's tools read the correct value and write an incorrect one.  As a user, I don't care whether the developers use a stick to scratch in the sand to make their function work correctly or use the Microsoft ODBC Text File driver.
    • Maybe my idea posting will get some traction, or maybe this is an edge case that no one cares about.  https://community.esri.com/ideas/15278 


Workaround 

- from an Esri support request for ArcMap, tested and adjusted for ArcGIS Pro

  1. Add the csv file to the Contents of an ArcGIS Pro map.
  2. Export the csv file to a new folder, using the same file name as the input. ArcGIS Pro will write out the CSV file along with a "schema.ini" file.
  3. Using Windows Explorer, go to the output location and copy the schema.ini file that was created.
  4. Paste it into the original location (original folder).
  5. Right-click and edit the schema.ini file to specify the correct field type. Note that the header specifies the target CSV file name.
    • Edit:  Note that you'll need to make the column number indicators and specified fields line up with your source CSV.  During step 2 above, ArcGIS Pro adds its OBJECTID field and writes that specification to the schema.ini file.  If your CSV file does not have an OBJECTID field, then things may not go well when you use it, unless you fix that entry.
  6. Now, when you use the CSV file with ArcGIS Pro, it should (hopefully - haven't tested all cases) notice the schema.ini file and use it to override the assumptions it has made about your input.
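As an illustration (not taken from the support case), here is what a schema.ini might look like for a CSV with the three columns from the sample data later in this thread. The file name "ReleaseVersions.csv" is hypothetical, and the types follow Microsoft's Text File Driver format; adjust both to match your own CSV:

```ini
[ReleaseVersions.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=RowID Long
Col2=PKID Text
Col3=ReleaseVersion Text
```

The [section] header must match the target CSV file name exactly, and each ColN line gives the column name followed by its type.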

ArcGIS Ideas

  1. Expose the ability for the user to override ArcGIS Pro assumptions and specify CSV file input field definitions in all ArcGIS Pro tools and functions that accept CSV as input
  2. Document the capability and/or the workaround in the ArcGIS Pro Help instead of causing support requests and Geonet posts.

Alright, enough of that...  back to work!

tim

6 Replies
DanPatterson_Retired
MVP Esteemed Contributor

Tim... could you post a few rows of a sample csv that you are dealing with or having trouble with?

There are more refined tools for reading csv files that could perhaps be implemented, and I am always looking for things to work on (i.e. see my Table Tools for Pro...)

TimMinter
Occasional Contributor III

Hi Dan - I cannot post rows from an actual CSV, but here are rows that can be used to demonstrate the user experience I describe:

RowID,PKID,ReleaseVersion
1,0009198045,1.0
2,0009198046,1.1
3,0009198047,1.2
4,0009198048,1.3
5,0009198049,1.4
6,0009198050,1.5
7,0009198051,1.6
8,0009198052,1.7
9,0009198053,1.8
10,0009198054,1.9
11,0009198055,1.10
12,0009198056,1.11
13,0009198057,1.12
14,0009198058,1.13
15,0009198059,1.14
16,0009198060,1.15
17,0009198061,1.16
18,0009198062,1.17
19,0009198063,1.18
20,0009198064,1.19

DanPatterson_Retired
MVP Esteemed Contributor

Tim, with your setup, are you treating the 2nd column as text/string, and the 3rd as a float or text/string?

There are several ways of structuring and reading the format.

The first column is just read as an integer while the 2nd and 3rd are read as strings (Unicode).

Even though the last two columns are read as strings, they can be 'viewed' and worked with as though they were numbers.

Basically, what sorts of things do you need to do with the data besides save and load text in a format that can be consistently read by you?

sample csv file read with one data type specification

a2
array([( 1, '0009198045', '1.0'), ( 2, '0009198046', '1.1'),
       ( 3, '0009198047', '1.2'), ( 4, '0009198048', '1.3'),
       ( 5, '0009198049', '1.4'), ( 6, '0009198050', '1.5'),
       ( 7, '0009198051', '1.6'), ( 8, '0009198052', '1.7'),
       ( 9, '0009198053', '1.8'), (10, '0009198054', '1.9'),
       (11, '0009198055', '1.10'), (12, '0009198056', '1.11'),
       (13, '0009198057', '1.12'), (14, '0009198058', '1.13'),
       (15, '0009198059', '1.14'), (16, '0009198060', '1.15'),
       (17, '0009198061', '1.16'), (18, '0009198062', '1.17'),
       (19, '0009198063', '1.18'), (20, '0009198064', '1.19')],
      dtype=[('RowID', '<i4'), ('PKID', '<U10'), ('ReleaseVersion', '<U5')])
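One way to get a result like the array above (not necessarily how it was actually produced) is numpy.genfromtxt with an explicit structured dtype; the sample rows are inlined and truncated here to keep the sketch self-contained:

```python
import io
import numpy as np

# A few of the sample rows from this thread, inlined for a self-contained demo
csv_text = """RowID,PKID,ReleaseVersion
1,0009198045,1.0
2,0009198046,1.1
11,0009198055,1.10
"""

# Explicit per-column dtypes: integer RowID, string PKID and ReleaseVersion,
# matching the dtype shown in the array dump above
dtype = [('RowID', '<i4'), ('PKID', '<U10'), ('ReleaseVersion', '<U5')]
a2 = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                   skip_header=1, dtype=dtype)

print(a2['PKID'][0])            # '0009198045' - leading zeros preserved
print(a2['ReleaseVersion'][2])  # '1.10' - not collapsed to 1.1
```

Because the dtype is supplied instead of inferred, the leading zeros and the two-digit minor versions survive the round trip.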

TimMinter
Occasional Contributor III

Hi Dan,

I essentially need control over the data types that Esri's tools write to their geodatabase structure after reading from a CSV file.  This control is not provided in Esri's tools or in their documentation.  The workaround suggested by technical services gets me where I'm going, albeit with a time-consuming detour.  So, I've posted the workaround as my own online documentation that I can find later when I need it again, and maybe help out others who discover they have the same need for control.

The two examples I've shared above are cases where Esri's tools make incorrect assumptions about the data types in the CSV and write different, unhelpful, and incorrect values into their geodatabase structure.
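Outside of ArcGIS, the same inference problem is easy to reproduce. Here's a sketch in pandas (purely illustrative; Esri's tools don't use pandas for this) showing default type inference mangling both columns, while explicit dtypes preserve them:

```python
import io
import pandas as pd

# Two of the sample rows from this thread
csv_text = """RowID,PKID,ReleaseVersion
1,0009198045,1.0
11,0009198055,1.10
"""

# Default inference: PKID becomes an integer (leading zeros lost) and
# ReleaseVersion becomes a float (1.10 collapses to 1.1)
inferred = pd.read_csv(io.StringIO(csv_text))

# Explicit dtypes keep the values exactly as written in the file
typed = pd.read_csv(io.StringIO(csv_text),
                    dtype={'RowID': 'int64', 'PKID': str,
                           'ReleaseVersion': str})

print(inferred['PKID'][0])   # 9198045 - leading zeros gone
print(typed['PKID'][0])      # '0009198045' - preserved
```

The tool reads the correct characters either way; it's the inferred type that writes the incorrect value.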

tim

DanPatterson_Retired
MVP Esteemed Contributor

Ok... I agree that even the csv readers in numpy and pandas can get quite complicated when reading csv files (commas and quotes inside a text field, for instance).

BruceHarold
Esri Regular Contributor

Bouncing data through a CSV incarnation is a bugbear; like Dan, I did a sample to help with schema handling:

http://pm.maps.arcgis.com/home/item.html?id=d887241f6908466a984c94631fd1974f 

The Data Interoperability extension handles CSV schema files intelligently; remember that with many CSV files in a folder, the single schema.ini file must be maintained to reference them all.