I am using the "Table To Table" geoprocessing tool to import the following CSV into a geodatabase for genetic data.
The sequence for virus1 is about 31,000 characters long.
The default text field length in the tool is 8000, and I increased it to 50000.
The import ran successfully, and I added a field to find the maximum sequence length.
However, when I looked at the attribute table,
the imported string was truncated to a length of 8001, when it should be ~31,000.
Question: How do I fix this?
I have attached the required testing file.
Thanks to anyone who can help with this issue.
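Before blaming the tool, it can help to confirm the CSV itself holds the full-length sequences. A minimal stdlib sketch, assuming a two-column strain,sequence layout like the attached file (the 31752-character sequence below is made up for illustration):

```python
import csv
import io

# -- build an in-memory stand-in for the attached CSV (made-up data)
sample = "strain,sequence\nvirus1," + "ACGT" * 7938 + "\n"  # 4 * 7938 = 31752 chars

with io.StringIO(sample) as f:
    reader = csv.DictReader(f)
    lengths = [len(row["sequence"]) for row in reader]

print(lengths)  # -> [31752], well past the 8000-character truncation point
```

If the lengths printed here are correct but the attribute table shows 8001, the truncation is happening inside the geoprocessing tool, not in the source file.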
Looks like a known bug, with no solution.
Current Status: Not in Product Plan
However, in ArcGIS Pro, copy-pasting from Excel into the attribute table has worked for me.
It's not the best way, but it just might work for you.
@Felix10546 if you are good with numpy, then you can just use
import numpy as np
from arcpy.da import NumPyArrayToTable

# -- set dtype to string (bytes) rather than unicode
dt = np.dtype([('strain', 'S6'), ('sequence', 'S50000')])
# -- f is the path to the posted csv file
ar2 = np.genfromtxt(f, dtype=dt, delimiter=",", names=True, autostrip=True, encoding='utf-8')
ar2.dtype
dtype([('strain', 'S6'), ('sequence', 'S50000')])

fc1 = r"C:\arcpro_npg\npg\Project_npg\tests.gdb\strain2"
NumPyArrayToTable(ar2, fc1)
If you use unicode strings, it doubles your field widths; if you stick with string type 'S', the field widths are retained.
Here is the fields view of the data you posted.
BTW, NumPyArrayToTable skips all the hoops needed to get a table if you are working with csv data.
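To see what the 'S' vs 'U' choice costs, compare numpy's in-memory item sizes: a byte-string dtype stores 1 byte per character, while numpy's unicode dtype stores 4 (UCS-4). The exact field-width doubling in the output table is arcpy's conversion behavior; this sketch only shows the dtype side:

```python
import numpy as np

# -- byte-string dtype: 1 byte per character
s = np.dtype('S50000')
# -- unicode dtype: 4 bytes per character (UCS-4) in numpy's memory layout
u = np.dtype('U50000')

print(s.itemsize, u.itemsize)  # -> 50000 200000
```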
I have done some further testing on my original data. I found that the combination of the "Table To Geodatabase" GP tool and input type = xlsx works.
| # | Input file type | Geoprocessing tool   | Result                              |
| 1 | csv             | Table to Table       | 418 records, truncated len = 8001   |
| 2 | csv             | Table to Geodatabase | 0 records                           |
| 3 | xlsx            | Table to Table       | 418 records, truncated len = 8001   |
| 4 | xlsx            | Table to Geodatabase | 418 records, len ~29000 preserved   |
The tool might be getting confused, since the separator on your first line is comma + space, while for the next 2 lines it is just a comma. I checked things in numpy to confirm the structure of those two sequence lines.
import numpy as np
# -- array datatype created using your information and array info
dt = np.dtype([('strain', 'U6'), ('sequence', 'U50000')])
# -- f is the path to the sample csv file
ar = np.genfromtxt(f, dtype=dt, delimiter=",", names=True, autostrip=True, encoding='utf-8')
# -- checks
ar.dtype.names
('strain', 'sequence')
ar.dtype
dtype([('strain', '<U6'), ('sequence', '<U50000')])
[len(i) for i in ar['sequence']]
[31752, 31752]
# -- both records in the sequence column are 31752 chars in length
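As a self-contained check of that autostrip behavior (same two-column layout, with made-up short sequences): the stray space after the comma in the header line is absorbed by autostrip=True, so the field names still come out clean.

```python
import io
import numpy as np

# -- header uses ", " while the data rows use plain "," (as in the posted file)
txt = "strain, sequence\nv1,ACGT\nv2,TTTT\n"
dt = np.dtype([('strain', 'U6'), ('sequence', 'U10')])
ar = np.genfromtxt(io.StringIO(txt), dtype=dt, delimiter=",", names=True, autostrip=True)

print(ar.dtype.names)        # -> ('strain', 'sequence')
print(list(ar['sequence']))  # -> ['ACGT', 'TTTT']
```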
Hi DanPatterson,
Yes, numpy and pandas Python scripts can read my original sequence at its full length of 31752. However, it gets truncated in some geoprocessing tools even when the text length is set to >8000. Thank you for viewing my sample data.
Check the following steps.
1. Create a new File Geodatabase table with Create Table (Data Management). Add the required fields with appropriate Field Length (see Create and manage fields).
2. Use Append (Data Management).
Input Dataset: CSV File
Target Dataset: File GDB Table
Field Matching Type: "Use the field map to reconcile field differences".
*Under Field Map, ensure the CSV fields are mapped appropriately to the File GDB fields.
Hi JayantaPoddar,
Thank you for your attempt. I tried the workflow and still get a truncated sequence of length 8001 when using "Append".
Could you share a sample CSV with us? A couple of records would be fine.
I attached one sample, fail_example.csv, in my 1st post. Here is another, sample2.csv, for your reference.
Thanks. In fact, I found that "Table To Geodatabase" works with Excel data, but I don't know whether anything will be done about the bug or whether it only affects my case.