Join csv to shapefile with leading zeros using arcpy

07-16-2021 02:48 PM
JonathanButtle
New Contributor II

Hi -

I'm trying to join a csv file to a shapefile using arcpy in ArcGIS Pro 2.8. The problem is that the join fields have leading zeros: the field in the shapefile has the correct type (text), but the field in the csv imports as integer.

Here is my code:

import arcpy

arcpy.env.workspace = "myPath"
spatial_import = "pathToShapeFile"
tab_import = "pathToCSV"
LAYER = arcpy.management.JoinField(in_data=spatial_import, in_field="county",
                                   join_table=tab_import, join_field="county")

The join works for counties with codes in the "100" series, but not for counties whose codes have leading zeros.

I've tried importing the csv file into Pandas as a dataframe (formatting the "county" field as a string) and then redoing the join, but arcpy threw an error (apparently you can't pass a Pandas dataframe to JoinField).

I have a lot of maps with joins like this, so I'm trying to streamline the workload. Any ideas on how to import the csv as a table while preserving the leading zeros so the join will work?

Or on how to manipulate the data with Pandas and then join it to a shapefile?

Thanks.

5 Replies
DanPatterson
MVP Esteemed Contributor

You may have to use a schema.ini file to define the csv format:

Add an ASCII or text file table—ArcGIS Pro | Documentation
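
For example, a minimal schema.ini placed in the same folder as the csv could force the join field to text; the file name and column layout here are hypothetical:

    [counties.csv]
    Format=CSVDelimited
    ColNameHeader=True
    Col1=county Text Width 3
    Col2=value Long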

numpy has tools to read delimited data with defined field types. I am sure pandas does too; however, numpy can be used in conjunction with arcpy.da.NumPyArrayToTable and/or arcpy.da.ExtendTable to interact with ArcGIS Pro and geodatabase tables.
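
A rough sketch of that route, assuming a csv whose header is "county,value" and hypothetical paths:

    import arcpy
    import numpy as np

    # read the csv with explicit field types so '001' stays text
    dt = np.dtype([('county', 'U3'), ('value', 'f8')])
    arr = np.genfromtxt(r"C:\data\counties.csv", dtype=dt, delimiter=",",
                        names=True, autostrip=True, encoding='utf-8')

    # write the typed array out as a geodatabase table...
    arcpy.da.NumPyArrayToTable(arr, r"C:\data\work.gdb\county_tbl")

    # ...or append its fields directly onto an existing table/feature class
    arcpy.da.ExtendTable(r"C:\data\work.gdb\counties", "county", arr, "county")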

PS: skip shapefiles; if you need to export them, do so at the end of processing.


... sort of retired...
curtvprice
MVP Esteemed Contributor

Hi @JonathanButtle  If you don't want to mess with schema.ini files, a kludgy but handy way to do this is to add a dummy record with text data to force the text type (and field width). After the data are loaded in, select the dummy row and delete it. I agree with Dan: use a geodatabase, and export to shapefile at the end (search for "shapefile limitations" in the help for more info on that). Here's an example csv to demonstrate; it will load with Long, String, Double field types.

ID, TXTFIELD, NUMFIELD
999,A234567890,999.999
1,050,050
2,060,060
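After the conversion, the dummy record can be deleted by its known ID; a minimal sketch, assuming the csv was loaded into a hypothetical geodatabase table:

    import arcpy

    # select the dummy row (ID = 999) and remove it
    arcpy.management.MakeTableView(r"C:\data\work.gdb\county_tbl", "tv", "ID = 999")
    arcpy.management.DeleteRows("tv")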

JonathanButtle
New Contributor II

Thanks for the responses. I actually tried a different tack: I imported the csv as a Pandas dataframe and was able to convert it to a NumPy array with the proper formatting. I exported the array to a file geodatabase (along with the shapefile) and did the join in the geodatabase. I'm still sorting out the workflow, but it seems I can do whatever processing I need to the data in Python and then import it into a project or a geodatabase.
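
Roughly along these lines; the paths, field names, and widths below are hypothetical:

    import arcpy
    import numpy as np
    import pandas as pd

    # read the csv, keeping the join field as text so leading zeros survive
    df = pd.read_csv(r"C:\data\counties.csv", dtype={"county": str})

    # build a typed structured array (arcpy can't consume pandas object columns)
    arr = np.empty(len(df), dtype=[("county", "U3"), ("value", "f8")])
    arr["county"] = df["county"]
    arr["value"] = df["value"]

    # load it into the file geodatabase and join there
    tbl = r"C:\data\work.gdb\county_tbl"
    arcpy.da.NumPyArrayToTable(arr, tbl)
    arcpy.management.JoinField(r"C:\data\work.gdb\counties", "county", tbl, "county")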

Thanks.

-jb

DanPatterson
MVP Esteemed Contributor

Specifying the data types really simplifies things, since you don't have to rely on behind-the-scenes conversions, especially for zero-padded values.

    sample csv ::

        "IDs","Float_vals","Text_vals","Int_vals","Int_as_text","Lead_int"
        1,2.98,a,38,38,0038
        2,9.99,b,74,74,0074
        3,1.23,c,35,35,0035
        4,3.45,d,9,9,0009
        5,4.56,e,10,10,0010

    >>> import numpy as np
    >>> tbl = "sample.csv"  # path to the csv shown above
    >>> dt = np.dtype([('IDs', 'i8'), ('Float_vals', 'f8'),
    ...                ('Text_vals', 'U5'), ('Int_vals', 'i8'),
    ...                ('Int_as_text', 'U8'), ('Lead_int', 'U8')])
    >>> a = np.genfromtxt(tbl, dtype=dt, delimiter=",", names=True,
    ...                   autostrip=True, encoding='utf-8')
    >>> a
    array([(1,   2.98, 'a', 38, '38', '0038'),
           (2,   9.99, 'b', 74, '74', '0074'),
           (3,   1.23, 'c', 35, '35', '0035'),
           (4,   3.45, 'd',  9, '9', '0009'),
           (5,   4.56, 'e', 10, '10', '0010')],
          dtype=[('IDs', '<i8'), ('Float_vals', '<f8'),
                 ('Text_vals', '<U5'), ('Int_vals', '<i8'),
                 ('Int_as_text', '<U8'), ('Lead_int', '<U8')])

... sort of retired...
JonathanButtle
New Contributor II

Hi Dan -

Thanks. I did something similar: I imported the csv into Pandas for analysis and manipulation, then converted it to a NumPy recarray (being sure to convert object fields to float, integer, or unicode types) and exported it to a file geodatabase table.
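
For reference, one way to make that object-to-unicode conversion explicit; the column names and widths are hypothetical:

    import numpy as np
    import pandas as pd

    df = pd.read_csv("counties.csv", dtype={"county": str})
    # recarray with fixed-width unicode instead of pandas object dtype
    arr = np.rec.fromarrays(
        [df["county"].to_numpy(dtype="U3"), df["value"].to_numpy(dtype="f8")],
        names=["county", "value"])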
