ArcGIS Pro (and ArcMap before it) would make some assumptions about the column values in a CSV file. Too often in my experience, the assumptions were just plain wrong, causing speed bumps, gnashing of teeth, and general furrowing of brows. It turns out that Esri's customers who use ArcGIS Pro can override those assumptions by using a schema.ini file to tell ArcGIS Pro how to interpret the column values. It also turns out that, as of 7/17/2018, this information is really hard to find, which can lead to time spent interacting with the Esri support teams. So, I'm sharing some info here, where I might be able to find it when I need it again.
Background
Workaround
- from an Esri support request for ArcMap, tested and adjusted for ArcGIS Pro
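The mechanism behind the workaround is the Microsoft Text Driver's schema.ini convention, which ArcGIS honors when reading a CSV: a plain-text file named schema.ini sitting in the same folder as the CSV, with a section named for the file and one line per column declaring its type. A minimal sketch (the file name sample.csv and the column choices are assumptions, based on the sample rows shared further down this thread):

[sample.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=RowID Long
Col2=PKID Text
Col3=ReleaseVersion Text

With PKID and ReleaseVersion declared as Text, zero-padded IDs and version strings like 1.10 survive the import instead of being coerced to numbers.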
ArcGIS Ideas
Alright, enough of that... back to work!
tim
Tim... could you post a few rows of a sample csv that you are dealing with or having trouble with?
There are more refined tools for reading csv files that could perhaps be implemented, and I am always looking for things to work on (e.g., see my Table Tools for Pro...)
Hi Dan - I cannot post rows from an actual CSV, but here are rows that can be used to demonstrate the user experience I describe:
RowID,PKID,ReleaseVersion
1,0009198045,1.0
2,0009198046,1.1
3,0009198047,1.2
4,0009198048,1.3
5,0009198049,1.4
6,0009198050,1.5
7,0009198051,1.6
8,0009198052,1.7
9,0009198053,1.8
10,0009198054,1.9
11,0009198055,1.10
12,0009198056,1.11
13,0009198057,1.12
14,0009198058,1.13
15,0009198059,1.14
16,0009198060,1.15
17,0009198061,1.16
18,0009198062,1.17
19,0009198063,1.18
20,0009198064,1.19
Tim, with your setup, are you treating the 2nd column as a text/string format, and the 3rd as a float or text/string?
There are several ways of structuring and reading the format.
The first column is just read as an integer while the 2nd and 3rd are read as strings (Unicode).
Even though the last two columns are read as strings, they can be 'viewed' and worked with as though they were numbers.
Basically, what sorts of things do you need to do with the data besides saving and loading text in a format that you can read consistently?
Sample csv file read with one data type specification:
a2
array([( 1, '0009198045', '1.0'), ( 2, '0009198046', '1.1'),
( 3, '0009198047', '1.2'), ( 4, '0009198048', '1.3'),
( 5, '0009198049', '1.4'), ( 6, '0009198050', '1.5'),
( 7, '0009198051', '1.6'), ( 8, '0009198052', '1.7'),
( 9, '0009198053', '1.8'), (10, '0009198054', '1.9'),
(11, '0009198055', '1.10'), (12, '0009198056', '1.11'),
(13, '0009198057', '1.12'), (14, '0009198058', '1.13'),
(15, '0009198059', '1.14'), (16, '0009198060', '1.15'),
(17, '0009198061', '1.16'), (18, '0009198062', '1.17'),
(19, '0009198063', '1.18'), (20, '0009198064', '1.19')],
dtype=[('RowID', '<i4'), ('PKID', '<U10'), ('ReleaseVersion', '<U5')])
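For reference, a sketch of one way such an array could be produced (not necessarily the code used above; the file name sample.csv is an assumption):

import numpy as np

# Explicit dtype: RowID as a 4-byte integer, PKID and ReleaseVersion as
# unicode text, so the leading zeros and '1.10'-style versions survive.
dt = [('RowID', '<i4'), ('PKID', '<U10'), ('ReleaseVersion', '<U5')]
a2 = np.genfromtxt('sample.csv', dtype=dt, delimiter=',',
                   skip_header=1, encoding='utf-8')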
Hi Dan,
I essentially need control over the data types that Esri's tools write to their geodatabase structure after reading from a CSV file. This control is not provided in Esri's tools or in their documentation. The workaround suggested by technical services gets me where I'm going, albeit with a time-consuming detour. So, I've posted the workaround as my own online documentation that I can find later when I need it again, and maybe help out others who discover they have the same need for control.
The two examples I've shared above are cases where Esri's tools make incorrect assumptions about the data types in the CSV and write different, unhelpful, and incorrect values into their geodatabase structure.
tim
Ok... I agree that even the csv readers in numpy and pandas can get quite complicated when reading csv files (commas and quotes inside a text field, for instance).
Bouncing data through a CSV incarnation is a bugbear; like Dan, I put together a sample to help with schema handling:
http://pm.maps.arcgis.com/home/item.html?id=d887241f6908466a984c94631fd1974f
The Data Interoperability extension handles CSV schema files intelligently; remember that with many CSV files in a folder, the one schema.ini file must be maintained to reference them all.
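For example, a single schema.ini carries one section per CSV in the folder (a sketch; the file and column names here are hypothetical):

[parcels.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=ParcelID Text

[owners.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=OwnerName Text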
I am trying to upload a table to my project which has a field padded with leading zeros:
acct
0020720000014
0021440000001
0021440000003
0021440000008
0021440000008
0021480000001
0021480000002
0021520000004
0021650000007
ArcGIS Pro keeps assigning a Double to acct. I tried editing the schema.ini to set the field to Long, but nothing happens (checking in the fields view).
Can someone please help me understand what I am doing wrong?
I have tried closing and reopening the txt file after editing the schema, making sure it retained my edits.
I even tried to export by defining the field.
I need the zeros to join the data to another data set.
Any help would be greatly appreciated - many thanks in advance.
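For what it's worth: Long and Double are both numeric types, so either one drops the leading zeros; declaring the column as Text is what preserves them. A sketch, assuming the file is named accounts.csv and the schema.ini sits in the same folder:

[accounts.csv]
ColNameHeader=True
Format=CSVDelimited
Col1=acct Text Width 13

After editing schema.ini, the table may also need to be removed from the project and re-added before the new field type shows up.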
Thank you very much. That was very clear.