gregorym

dynamic join of csv data to raster

Discussion created by gregorym on Sep 9, 2011
Latest reply on Sep 9, 2011 by gregorym
I'm trying to dynamically join a CSV file to a raster using geoprocessing tools.  I'm not sure my workflow is correct, but I'm coming across an uninformative error in my script.  I'm using an external library (matplotlib) to inspect my csv file for correct data types before bringing it in to a table that Arc can use.

Based on the type information, I create a table and then add fields based on a lookup of numpy data type to ESRI data type.


import arcgisscripting
from matplotlib import mlab

numpy_to_esri_type = {
    ('b', 1): 'SHORT',
    ('i', 1): 'SHORT',
    ('i', 2): 'SHORT',
    ('i', 4): 'LONG',
    ('f', 4): 'FLOAT',
    ('f', 8): 'DOUBLE',
}

gp = arcgisscripting.create(9.3)
gp.workspace = 'D:/matt/hfa'

# CSV file
in_csv = 'stand_attr.csv'

# Temporary INFO file
temp_table = 'xxtmp'

# Test raster
raster = 'nn_test'

# Join field
join_field = 'FCID'

# Read the CSV file in to a recarray and get the data types
ra = mlab.csv2rec(in_csv)
col_names = [x.upper() for x in ra.dtype.names]
col_types = \
    [(ra.dtype[i].kind, ra.dtype[i].itemsize) for i in xrange(len(ra.dtype))]

# Sanitize column names
#   No field name may be longer than 16 chars
#   No field name can start with a number
for i in xrange(len(col_names)):
    col_names[i] = col_names[i][0:16]
    col_names[i] = col_names[i].lstrip('0123456789')

# Create the INFO table using the data type information to add fields
gp.createtable(gp.workspace, temp_table) 
for (n, t) in zip(col_names, col_types):
    try:
        esri_type = numpy_to_esri_type[t]
        gp.addfield(temp_table, n, esri_type)
    except KeyError:
        if t[0] == 'S':
            gp.addfield(temp_table, n, 'TEXT', '#', '#', t[1]) 
        else:
            err_msg = 'Type not found for ' + str(t)
            print err_msg
            continue

# Append the records from the CSV file to the temporary INFO table
gp.append_management(in_csv, temp_table, 'NO_TEST')

# Strip out the join field from the names to avoid adding it to the raster
col_names.remove(join_field)

# Create a semi-colon delimited string of the fields we want to join
field_list = ';'.join(col_names)

# Join from INFO to GRID 
gp.joinfield(raster, 'VALUE', temp_table, join_field, field_list)

# Delete the temporary table
gp.delete(temp_table)


This works for a very small CSV file with 'safe' column names, but on a large table (e.g. 200 fields x 7000 records), it's raising an ExecuteError telling me it can't open the csv file.  Obviously the csv file gets opened to be read into the recarray, so I know that's not the problem (as well as the same code working on small test cases). 

Questions:

1) Is there some kind of size limitation that I'm bumping up against with large CSV files
2) Is this the best workflow or are there better (dynamic) ways of doing this?

thanks, matt

Outcomes