
dynamic join of csv data to raster

09-09-2011 10:52 AM
MattGregory
Deactivated User
I'm trying to dynamically join a CSV file to a raster using geoprocessing tools.  I'm not sure my workflow is correct, but I'm hitting an uninformative error in my script.  I'm using an external library (matplotlib) to inspect the CSV file for correct data types before bringing it into a table that Arc can use.

Based on that type information, I create a table and then add fields using a lookup from numpy data type to ESRI data type.


import arcgisscripting
from matplotlib import mlab

numpy_to_esri_type = {
    ('b', 1): 'SHORT',
    ('i', 1): 'SHORT',
    ('i', 2): 'SHORT',
    ('i', 4): 'LONG',
    ('f', 4): 'FLOAT',
    ('f', 8): 'DOUBLE',
}

gp = arcgisscripting.create(9.3)
gp.workspace = 'D:/matt/hfa'

# CSV file
in_csv = 'stand_attr.csv'

# Temporary INFO file
temp_table = 'xxtmp'

# Test raster
raster = 'nn_test'

# Join field
join_field = 'FCID'

# Read the CSV file in to a recarray and get the data types
ra = mlab.csv2rec(in_csv)
col_names = [x.upper() for x in ra.dtype.names]
col_types = \
    [(ra.dtype[i].kind, ra.dtype[i].itemsize) for i in xrange(len(ra.dtype))]

# Sanitize column names
#   No field name may be longer than 16 chars
#   No field name can start with a number
for i in xrange(len(col_names)):
    col_names[i] = col_names[i][0:16]
    col_names[i] = col_names[i].lstrip('0123456789')

# Create the INFO table using the data type information to add fields
gp.createtable(gp.workspace, temp_table) 
for (n, t) in zip(col_names, col_types):
    try:
        esri_type = numpy_to_esri_type[t]
        gp.addfield(temp_table, n, esri_type)
    except KeyError:
        if t[0] == 'S':
            gp.addfield(temp_table, n, 'TEXT', '#', '#', t[1]) 
        else:
            err_msg = 'Type not found for ' + str(t)
            print err_msg
            continue

# Append the records from the CSV file to the temporary INFO table
gp.append_management(in_csv, temp_table, 'NO_TEST')

# Strip out the join field from the names to avoid adding it to the raster
col_names.remove(join_field)

# Create a semi-colon delimited string of the fields we want to join
field_list = ';'.join(col_names)

# Join from INFO to GRID 
gp.joinfield(raster, 'VALUE', temp_table, join_field, field_list)

# Delete the temporary table
gp.delete(temp_table)


This works for a very small CSV file with 'safe' column names, but on a large table (e.g. 200 fields x 7000 records), it raises an ExecuteError telling me it can't open the CSV file.  The CSV file obviously does get opened when it's read into the recarray, and the same code works on small test cases, so I know the file itself isn't the problem.

Questions:

1) Is there some kind of size limitation that I'm bumping up against with large CSV files?
2) Is this the best workflow, or are there better (dynamic) ways of doing this?

thanks, matt
1 Reply
MattGregory
Deactivated User
Sorry for the noise, but I think I figured this out.  The ExecuteError message saying it couldn't open the CSV file was a bit of a red herring.  What had happened was that some column names in the original CSV file were illegal (too long or starting with a number), so there was a mismatch between the column names in the CSV file and the fields I had added to the INFO table.

By writing out the sanitized data (the recarray with the amended, legal column names) and overwriting the existing CSV file, you can avoid the mismatch.  I'd still like to hear of other ways folks have accomplished this task.
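
For anyone curious, here is a minimal sketch of that write-back step, assuming the same recarray (ra), CSV path (in_csv), and sanitization rules from the script above; rec2csv is the matplotlib.mlab counterpart to csv2rec:

from matplotlib import mlab

# Apply the same sanitization rules used when adding fields to the INFO table:
# uppercase, truncate to 16 characters, strip leading digits
clean_names = [n.upper()[0:16].lstrip('0123456789') for n in ra.dtype.names]

# Rename the recarray columns in place so they match the INFO table fields
ra.dtype.names = tuple(clean_names)

# Overwrite the original CSV so the Append step sees the same legal names
mlab.rec2csv(ra, in_csv)

(If sanitization produces duplicate names, the rename will fail, so you'd want to check for that first.)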

matt