arcpy.da.NumPyArrayToTable() returns "RuntimeError: create table"

4195
18
12-27-2018 12:03 PM
Arne_Gelfert
Regular Contributor

After spending a few months working through a Python for Data Science curriculum, I have gotten to like woriking with pandas for data prep in Python. Naturally, I wanted to try using it with arcpy. Usually, I would attempt large batch GIS data manipulation or data moving with FME. But FME, while it is powerful, often takes too many settings or mouse-clicks to remember. Seems like I can never remember some parameter that is only visible after drilling down some UI tree. If I script it, I see all my settings. I found a great example of getting a pandas Dataframe into a geodatabase here

import arcpy
from arcpy import env

env.workspace = "my_favorite_workspace" #pointing to SDE connection file

#I already have a dataframe df that is loaded from SQL Server using pyodbc

#import pandas as pd
#import pyodbc

#sql_conn = pyodbc.connect('''DRIVER={ODBC Driver 13 for SQL Server};
#                                                SERVER=<myserver>;
#                                                DATABASE=mydatabase>;
#                                                Trusted_Connection=yes''')

#query = "<myquery>"
#df = pd.read_sql(query, sql_conn,columns=['count'])


This all works like a charm. I can now do all kinds of data manipulation.Then I create a structured array in numpy.

#first you turn your dataframe into a simple np series

nparray = df.values

#then you create a structured numpy array, see here for documentation

npstruct = np.rec.fromrecords(nparray)
#then you create a list of field names from the dataframe

colnames = df.dtypes.index.tolist()

#then you use that list to update column names in structured array

npstruct.dtype.names = tuple(colnames)

Again, everything works fine. I get my field names and I get a structured array that looks like it should based on the documentation.The final step is to use that array in arcpy.da.NumPyArrayToTable() to create a table.  ESRI has additional documentation here.

arcpy.da.NumPyArrayToTable(np_struct, "newtable")

---------------------------------------------------------------------------RuntimeError                              Traceback (most recent call last)<ipython-input-31-14d9b2719bd8> in <module>----> 1 arcpy.da.NumPyArrayToTable(np_struct, "newtable")RuntimeError: create table

Need help figuring out what's the problem. I have my env.workspace pointing to an SDE connection. I've also tried using the same code to create "newtable.dbf" in a local directory. Don't think it's permissions either because the following is successful:

arcpy.CreateTable_management(env.workspace, 'new_table')

0 Kudos
18 Replies
DanPatterson_Retired
MVP Esteemed Contributor

make sure you are using a full path to the table in the geodatabase... skip dbf altogether

arcpy.da.NumPyArrayToTable(array, "c:/data/gdb.gdb/out_table")

where array is the structured array, and out_table is the table name to use in the *.gdb.

PS overwriteOutput has no affect. It will fail miserably even if set, so use a different name if you have saved it before.

Arne_Gelfert
Regular Contributor

Thanks, Dan. I tried that and get the exact same error. Could it be that I have to actually create the table before loading data? Ha! I will try that next.

0 Kudos
DanPatterson_Retired
MVP Esteemed Contributor
tbl = r"c:\path\to_npy_file\sample_20.npy"  # save this to a volder

a = np.load(tbl)

a.dtype
dtype([('OBJECTID', '<i4'), ('f0', '<i4'), ('County', '<U2'), ('Town', '<U6'), ('Facility', '<U8'), ('Time', '<i4')])

a

array([( 1,  0, 'B', 'B_', 'Hall', 11), ( 2,  1, 'A', 'A_', 'Hall', 24),
       ( 3,  2, 'C', 'C_', 'Hosp', 43), ( 4,  3, 'A', 'B_', 'Hall', 43),
       ( 5,  4, 'B', 'B_', 'Hall', 16), ( 6,  5, 'B', 'A_', 'Hall',  8),
       ( 7,  6, 'A', 'C_', 'Hall', 26), ( 8,  7, 'B', 'C_', 'Hall', 31),
       ( 9,  8, 'C', 'C_', 'Hall',  7), (10,  9, 'A', 'A_', 'Hall', 58),
       (11, 10, 'A', 'A_', 'Hosp', 20), (12, 11, 'C', 'A_', 'Hosp', 37),
       (13, 12, 'C', 'B_', 'Hall', 36), (14, 13, 'A', 'B_', 'Hosp', 33),
       (15, 14, 'C', 'C_', 'Hosp', 51), (16, 15, 'B', 'C_', 'Hosp', 53),
       (17, 16, 'C', 'A_', 'Hosp', 21), (18, 17, 'C', 'C_', 'Hosp', 42),
       (19, 18, 'A', 'B_', 'Hosp', 43), (20, 19, 'A', 'C_', 'Hall',  5)],
      dtype=[('OBJECTID', '<i4'), ('f0', '<i4'), ('County', '<U2'), ('Town', '<U6'), ('Facility', '<U8'), ('Time', '<i4')])‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Load that then try NumPyArrayToTable.  put it into a gdb and give it a name

I know it works, I use this as a test file all the time.

Arne_Gelfert
Regular Contributor

Thanks, Dan, and happy new year! - I tried your above example and it works for me just fine. I had to use the full path to the SDE or GDB table. Don't understand why using only table name and env.workspace doesn't work but I can create a new table with correct field names and data types of the structured array. 

However, I am still not getting anywhere with my own example. The whole purpose of this exercise was to be able to work with a dataframe in pandas, and get the pandas datatypes converted to numpy so that the arcgis function can create a table from it. By now, I suspect this method may be the culprit - scipy.org search returns no results for it at all.

  • np.rec.fromrecords

In your case, the array you created "manually" looked like this:

  • array([ (A1, B1, C1),(A2, B2, C3) ....], dtype = [('FieldA', 'DatatypeA'),('FieldB', 'DatatypeB')...])

When I use the above method on the df.values and convert back into ndarray, I get :

  1. x = nd.array(np.rec.fromrecords(df.values))
  2. x
  3. >>array([ (A1, B1, C1),(A2, B2, C3) ....], dtype = (numpy.record, [('FieldA', 'DatatypeA'),('FieldB', 'DatatypeB')...]))

So although np.array() is supposed to turn the record array back into a regular nd.array, there is still that "numpy.record" in the dtype. 

The more I think about it, this probably isn't a question for Geonet any longer but rather a numpy question. But if you have any suggestions, let me know.

0 Kudos
DanPatterson_Retired
MVP Esteemed Contributor

Arne, I just don't use pandas, There is really no need to if you are just bringing arrays (aka tables and/or rasters) to get pandas involved at all.  It really doesn't offer much more to numpy except for a gentler 'shell' to the interface, but for the most part, I haven't found anything that pandas offers that I can't already handle (including pivot tables, groupby operations and joins).

I have dozens of blog posts on working with numpy and arcpy for both raster and tabular data

0 Kudos
Arne_Gelfert
Regular Contributor

Dan, thanks for chiming in again... I've been a little side tracked from looking at this again. I hear you. If you have a toolkit that does the trick, why learn another one that accomplishes the same. Especially in a world of wrappers around wrappers. But I kinda feel that way about pandas... I know it better than numpy and wanted to figure out how to do that. --- I did take a look at your blog just now and will be sure to return to it soon. Thanks.

0 Kudos
MarkZito
New Contributor III

version .24 of pandas will have a .to_numpy() method to convert a data frame to a numpy array. ArcGIS Pro 2.3.2 ships with version .23 so the method is not available yet. In the meantime you can use the following.

#df is the pandas dataframe

x = df.reset_index()
z = np.rec.fromrecords(x.values, names=x.columns.tolist())
arcpy.da.NumPyArrayToTable(z, out_table)

more info on this thread

https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array

0 Kudos
ThomasTaggart2
New Contributor III

Mark,

I am attempting to use your proposed solution and I get an error from Arcpy telling me that the the in_array is not a numpy array (TypeError: 'in_array' is not numpy array).  Now my IDE recognizes it as an 'recarry'.

0 Kudos
MarkZito
New Contributor III

change the 2nd line to the following. You need to pass in the values of the dataframe, not the dataframe itself. 

z = np.rec.fromrecords(x.values, names=x.columns.tolist())

0 Kudos