Numpy: adding a field to a structured array fails in Python Toolbox, but not in IDE, or Python window

NathanielRoth · ‎05-29-2015

I've got a script that I'm trying to get into a Python toolbox for use and distribution. It adds a new column to a dataset containing the quantile that a data field falls into (the number of quantiles is an input, as is the field, along with an option to invert the quantile numbers).

Basically:

It extracts the needed fields (OID, and the selected field) from a feature class using arcpy.da.FeatureClassToNumPyArray.

When I try to add a new field to the resulting structured array using the code in a Python Toolbox, I get the following error:

Traceback (most recent call last):

File "<string>", line 133, in execute

TypeError: data type not understood

What's odd is that the same code, differing (as far as I can see) only in that the parameter values are provided externally, works fine in an IDE (Eclipse/PyDev using the same Python interpreter). So does cutting and pasting the lines of code into the Python window in ArcCatalog.

If I run the same section of code for creating the copy of the structured array using only the existing fields, it also works (at least that far) in the Python Toolbox.

I have tried multiple formulations of the dtype values, all with the same result: <i4, int, integer

ArcGIS 10.2.2, with the default installation of Python (2.7.5) and numpy (v1.7.1).

If anyone wants to take a look, my toolbox is available at: CenterForRegionalChange/QuantileCalc · GitHub

Thanks,

Nate

DanPatterson_Retired · ‎05-29-2015

Hard to follow with all that toolbox stuff. In short, maybe your arcpy.da.ExtendTable(in_features,"OBJECTID" ,nparray3,"OID@") just isn't cutting it with your dtype and/or environment.

When I am working with numpy arrays and wish to concatenate or join arrays or columns together, I use recfunctions, which is housed in the numpy.lib folder. Its initial use was for matplotlib, and its functionality has never made it into mainstream numpy since the functions are in essence shells for basic numpy array operations, with the ugliness of reshaping and reformulating arrays hidden from the gentle user. Arcpy has some of this functionality in the arcpy.da section, like Extend Table, as well as the conversions between arc* and numpy.

Simple arrays of the same dtype are easy to put together.

>>> # plain arrays of the same data type
>>> a = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]],dtype='float64')
>>> b = np.array([[0, 1],[2, 3]],'float64')
>>>
>>> # concatenate the columns, at least 2 methods
>>> np.c_[a, b]
array([[ 0.,  1.,  2.,  3.,  4.,  0.,  1.],
      [ 5.,  6.,  7.,  8.,  9.,  2.,  3.]])
>>> np.concatenate((a, b),axis = 1)
array([[ 0.,  1.,  2.,  3.,  4.,  0.,  1.],
      [ 5.,  6.,  7.,  8.,  9.,  2.,  3.]])

However, as you will agree, when you have mixed dtypes, things need to be addressed a tad more carefully.

So via example, I will let you explore the options.

Assume I have two arrays that I want to put together to form one...say this is a 'join' in arc* terminology.

Procedure:

1. import recfunctions (rfn for short)
2. take your 2 arrays, which can be of different dtypes as shown
3. use the merge_arrays function in rfn
4. examine your resultant

>>> import numpy as np
>>> import numpy.lib.recfunctions as rfn
>>> a = np.array([(1, 2, 3.0),(4, 5, 6.0)], dtype=[('X','int32'), ('Y','int32'),('Z','float64')])
>>> b = np.array([('a','b'),('c','d')], dtype=[('Var_1','|S10'),('Var_2','|S5')])
>>> c = rfn.merge_arrays((a, b), asrecarray=True, flatten=True)
>>>
>>> a                #  input array 1
array([(1, 2, 3.0), (4, 5, 6.0)],
      dtype=[('X', '<i4'), ('Y', '<i4'), ('Z', '<f8')])
>>> b                # input array 2
array([('a', 'b'), ('c', 'd')],
      dtype=[('Var_1', 'S10'), ('Var_2', 'S5')])
>>> c                # joined at last...
rec.array([(1, 2, 3.0, 'a', 'b'), (4, 5, 6.0, 'c', 'd')],
      dtype=[('X', '<i4'), ('Y', '<i4'), ('Z', '<f8'), ('Var_1', 'S10'), ('Var_2', 'S5')])

recfunctions also has a bunch of other tools that you will find interesting.

>>> dir(rfn)

['MaskedArray', 'MaskedRecords', ... snip ..., 'append_fields', 'drop_fields', 'find_duplicates', 'flatten_descr', 'get_fieldstructure', 'get_names', 'get_names_flat', 'itertools', 'izip_records', 'join_by', 'ma', 'merge_arrays', 'ndarray', 'np', 'rec_append_fields', 'rec_drop_fields', 'rec_join', 'recarray', 'recursive_fill_fields', 'rename_fields', 'stack_arrays', 'sys', 'zip_descr']

Hope this helps .... otherwise, for general info.

NathanielRoth · ‎05-29-2015

Thank you Dan,

That's very good information, but that doesn't actually get at my issue.

I've reformatted the Python toolbox at the github link above to try to more clearly identify the problem.

I've added a snippet at the bottom of the .pyt module so that I can execute it directly against the same function, and it works fine.

I have a function that does exactly what I need it to. It works if I run it in a separate IDE, calling the same Python installation (32-bit, ver 2.7.5) that ArcGIS uses (not just the same version, but the default ArcGIS installation of Python). That same code works if I paste it line by line into the Python window in ArcGIS (a painful process).

def Quantiles(in_features, in_field, in_quant, in_qdir):
    print("converting to numpy")
    nparray = da.FeatureClassToNumPyArray(in_features,["OID@",in_field],skip_nulls = True)
    
    print("calculating quantiles")
    n = 1.0/float(in_quant)
    qs = [n*x*100 for x in xrange(1,int(in_quant)+1)]
    print(qs)
    
    print("calculating percentiles")
    flcol = np.array(nparray[[in_field]], np.float)
    ps = np.percentile(flcol, qs)
    print(ps)
    
    print("Adding new numpy field")
    newfldname = "".join(["Q",in_field])
    fldtype = (newfldname,'int32',)
    dtype=nparray.dtype.descr
    dtype.append(fldtype)
    dtype2 = np.dtype(dtype)
    nparray2 = np.empty(nparray.shape, dtype=dtype2)
    for name in nparray.dtype.names:
        nparray2[name] = nparray[name]
    
    print("Assign Quantiles")
    out = AssignQuant(flcol,ps)
    if in_qdir == "Reverse":
        out = (int(in_quant) + 1) - out
    
    nparray2[newfldname] = out
    nparray3 = nparray2[['OID@',newfldname]]
   
    print("Extend table to include the new values")
    da.ExtendTable(in_features,"OBJECTID" ,nparray3,"OID@")


    print("Done")

When the function is called from within a Python Toolbox, it has problems with the dtype definition (line 41 on github, 20 above).

dtype2 = np.dtype(dtype)

I can see quite clearly that the parameters are making it into the function. I could do this in several other ways, but the performance is (much) better with Numpy than any of the cursor based, or join, addField, and calculateField methods I've looked at. If it were just for me, I'd run it just as the standalone script and import the function as needed. I suspect this'll be a starting point for publishing a geoprocessing service for our use, so I need to keep it within that realm.
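For readers following along: the AssignQuant helper called in the function above is not shown in this thread. The following is only a minimal sketch of what such a helper might look like, assuming it receives a 1-D array of values and the upper bound of each quantile as returned by np.percentile. The name assign_quant, the sample values, and the use of np.searchsorted are all assumptions for illustration, not the author's actual code.

```python
import numpy as np

def assign_quant(values, upper_bounds):
    """Hypothetical stand-in for AssignQuant: 1-based quantile number per value."""
    values = np.asarray(values, dtype='float64')
    # side='left' places a value equal to a bound into that bound's quantile
    idx = np.searchsorted(upper_bounds, values, side='left') + 1
    # Clamp in case rounding pushes a value past the last bound
    return np.minimum(idx, len(upper_bounds)).astype('int32')

# Made-up sample data, split into 4 quantiles
values = np.array([10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])
n_quant = 4
qs = [100.0 * k / n_quant for k in range(1, n_quant + 1)]  # 25, 50, 75, 100
bounds = np.percentile(values, qs)

quant = assign_quant(values, bounds)
# Same inversion as the "Reverse" option in the function above
reversed_quant = (n_quant + 1) - quant
```

With the "Reverse" option, the highest values land in quantile 1 instead of quantile n_quant, which is all the subtraction does.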

DanPatterson_Retired · ‎05-29-2015

I don't know, but using variable names that are the same as Python objects may be sketchy. I don't see anything obvious, but why don't you follow my example and let numpy do the heavy lifting, determining the data type from the data provided? Sorry for another example...but

>>> a
array([(0, 342004.0, 5023921.0, 0), (1, 342056.0, 5023947.0, 0),
      (2, 341674.0, 5023846.0, 0), (3, 341547.0, 5023635.0, 0),
      (4, 341936.0, 5023331.0, 0)],
      dtype=[('IDs', '<i4'), ('X', '<f8'), ('Y', '<f8'), ('Results', '<i4')])
>>> new_fld = np.arange(5)
>>> new_fld
array([0, 1, 2, 3, 4])
>>> rfn.append_fields(a,"NewStuff",new_fld,usemask=False)
array([(0, 342004.0, 5023921.0, 0, 0), (1, 342056.0, 5023947.0, 0, 1),
      (2, 341674.0, 5023846.0, 0, 2), (3, 341547.0, 5023635.0, 0, 3),
      (4, 341936.0, 5023331.0, 0, 4)],
      dtype=[('IDs', '<i4'), ('X', '<f8'), ('Y', '<f8'), ('Results', '<i4'), ('NewStuff', '<i4')])
>>>

In the above, an ndarray existed (because I was answering another question).

I decided to create some critical data (aka line 06)...5 lowly numbers.

I have already imported recfunctions as in my first example.

Using append_fields (rather than my example for merging arrays), I create a NewStuff field, whip in the aforementioned data, and indicate not to create a masked array. NOTE: you can create a recarray if you want...check the help for rfn.append_fields. You will notice that the dtype has been determined from the input data.

NathanielRoth · ‎05-29-2015

Thanks again,

rfn.append_fields definitely simplifies things, and you're correct about my use of object names as variable names.

I've made use of the append_fields method from recfunctions and receive a different error, but one that seems to indicate the same problem.

Traceback (most recent call last):
  File "<string>", line 154, in execute
  File "<string>", line 52, in Quantiles
  File "C:\Python27\ArcGIS10.2\lib\site-packages\numpy\lib\recfunctions.py", line 616, in append_fields
    data = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
TypeError: data type not understood


Failed to execute (QuantileCalc).

Like before, I can run the function in an IDE without error, generating the desired result.

I remain puzzled by what's going on. It might be time for an Esri tech support call.

DanPatterson_Retired · ‎05-29-2015

Hmmmm you might, but I would have a real hard look at the data that you are trying to append. To test, try appending some other data of the same type...then a different type...ensuring that there are no missing values in the field. If it can be handled outside in an IDE, then it is an Esri problem

DanPatterson_Retired · ‎05-29-2015

I can't generate any errors in Python, as you have noted, but your use of a.view... is giving me some issues. Your way gives me a list, if I am not mistaken, and the alternate way of viewing a gives me a different one...then you create the array from the view. I suggest you skip trying to create the dtype altogether and just merge the two arrays to produce the new output.

>>> import numpy as np
>>> x = np.arange(10)
>>> Q = np.arange(-5,5)
>>>
>>> a = np.array([x,Q],dtype=[('X','i4'),('Q','int32')])
>>>
>>> names = ['X','Q']
>>> data = [x,Q]
>>> test = [a.view([(name, a.dtype)]) for (name, a) in zip(names, data)]
>>> test
[array([(0,), (1,), (2,), (3,), (4,), (5,), (6,), (7,), (8,), (9,)],
      dtype=[('X', '<i4')]), array([(-5,), (-4,), (-3,), (-2,), (-1,), (0,), (1,), (2,), (3,), (4,)],
      dtype=[('Q', '<i4')])]
>>> test2 = [(name, a.dtype) for (name, a) in zip(names, data)]
>>> test2
[('X', dtype('int32')), ('Q', dtype('int32'))]
>>> a.view(test2)
array([(-5, -4), (-3, -2), (-1, 0), (1, 2), (3, 4)],
      dtype=[('X', '<i4'), ('Q', '<i4')])
>>>

NathanielRoth · ‎05-29-2015

Thanks again,

The irony is that the a.view call is actually made inside the recfunctions module as part of the append_fields function.

I think I'm going to try to run this up the flagpole with Esri support to see what they've got to say. I will report back.

AlexanderNohe1 · ‎06-03-2015

Hey Nathaniel Roth,

I was playing with your script here and got it to work when I cast newfldname to a string:

(Line 52 in the Github script)

nparray2 = rfn.append_fields(nparray, str(newfldname), out, usemask = False)

I hope this helps!

Update: I did a little more digging and found that when you get the in_field from the tool GUI, it is passed as type unicode. That might be why it was failing as a Python toolbox tool but not when you hard-code the data or run it in an IDE.

Also, I made a pull request to your repo on GitHub.
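A minimal sketch of the fix described above, assuming the field name arrives as unicode text from the tool dialog. The helper name append_quantile_field and the sample array are made up for illustration. On Python 2, str() yields the byte string that numpy 1.7 expects for field names; on Python 3 it is a harmless no-op.

```python
import numpy as np
import numpy.lib.recfunctions as rfn

def append_quantile_field(arr, field_name, values):
    """Append an int32 column, coercing the field name to a native str first."""
    # Hypothetical helper: str() turns a Python 2 unicode name into a plain str
    values = np.asarray(values, dtype='int32')
    return rfn.append_fields(arr, str(field_name), values, usemask=False)

# Made-up stand-in for the output of FeatureClassToNumPyArray
arr = np.array([(1, 2.5), (2, 3.5)],
               dtype=[('OBJECTID', '<i4'), ('VAL', '<f8')])

# u"QVAL" mimics a name built from a toolbox parameter
out = append_quantile_field(arr, u"QVAL", [1, 2])
```

Note that on Python 2, str() would raise UnicodeEncodeError for a name with non-ASCII characters, so this sketch assumes ASCII field names.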

NathanielRoth2 · ‎07-13-2015

I just wanted to stop back by to thank everyone for chiming in.

In short, it looks like you have to be very cautious, because strings passed into code from Python toolboxes seem to get converted to unicode, which fails several sets of equality checks when compared to standard Python strings, including attempts to match them for doing joins. It also comes up when numpy is expecting a regular string for field naming.
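One way to guard against this, sketched here under assumptions rather than taken from the toolbox itself, is to normalize text parameters once at the point where they enter the code, and only then build the dtype by hand. The names to_native_str and add_int_field are hypothetical, and the sketch assumes ASCII field names.

```python
import numpy as np

def to_native_str(value):
    """Coerce text to the native str type (bytes on Python 2, text on Python 3)."""
    return value if isinstance(value, str) else str(value)

def add_int_field(arr, new_name):
    """Copy a structured array into a new one with an extra int32 field."""
    new_name = to_native_str(new_name)  # normalize before numpy sees the name
    descr = arr.dtype.descr + [(new_name, '<i4')]
    out = np.zeros(arr.shape, dtype=np.dtype(descr))
    for name in arr.dtype.names:
        out[name] = arr[name]  # copy the existing columns across
    return out

# Made-up sample array; u"QVAL" mimics a value coming from a toolbox parameter
arr = np.array([(1, 2.5), (2, 3.5)],
               dtype=[('OBJECTID', '<i4'), ('VAL', '<f8')])
out = add_int_field(arr, u"QVAL")
```

Normalizing at the boundary keeps the rest of the function free of scattered str() calls, whether the name is later used for a dtype, a comparison, or a join key.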