NumPyArrayToTable error "RuntimeError: create table"

04-15-2020 09:27 PM
davedoesgis
Occasional Contributor III

Hi - I have a web request that returns JSON that I'm trying to get into a file geodatabase table. When I execute arcpy.da.NumPyArrayToTable, I get the following error:

RuntimeError: create table

This doesn't give me much to go on, so I'm not sure what to try next. Here is my code:

from pathlib import Path
import arcpy 
import numpy
import pandas
import requests

proxyDict = {
    "http":"<http_URL>",
    "https":"<https_URL>"
}
url = 'https://api.census.gov/data/2018/acs/acs5?get=NAME,group(B18102)&for=county:*&in=state:53'
fgdb = Path('C:/Temp/ACS2018/ACS2018.gdb')
table = fgdb / 'test_table'

r = requests.get(url)
# r = requests.get(url, proxies=proxyDict) # if you have a proxy server
df = pandas.read_json(r.text)

headers = df[0:1] # first row is headers
header_list = headers.values.tolist()[0] # convert to list
header_list[0] = 'FULL_NAME' # de-duplicate "NAME"
new_columns = dict(zip(list(range(0,len(header_list))), header_list)) #make dict

df2 = df[1:] # data in rows 2+
df3 = df2.rename(columns=new_columns) # add column names back in

# Convert to numpy array, then to table
numpy_array = numpy.array(numpy.rec.fromrecords(df3.values))
names = df3.dtypes.index.tolist()
numpy_array.dtype.names = tuple(names)
arcpy.da.NumPyArrayToTable(numpy_array, str(table))

I also tried df3.to_numpy(), but got the same result. 

7 Replies
DanPatterson_Retired
MVP Emeritus
(Accepted Solution)

You'd better add some print statements to check that your array is properly formed.

You have passed on the dtype names, but there is no indication of the type format (i.e., text, numbers, etc.).

Here is an example.

d[:3]
 
array([('alabaster', '0.7.12', 'py_0', 'python'),
       (... snip ... ),
       ('argh', '0.26.2', 'py36_0', 'python')],
      dtype=[('Name', '<U30'), ('Version', '<U15'), ('Build', '<U15'),
             ('Requires', '<U293')])

Note that 'Uxx' is for Unicode (i.e., text fields), and you can use

('Float_fld', '<f8') for double/float fields, and

('Int_fld', '<i4') for integer fields (or '<i8' if you need large integers).

The help topic uses a dictionary format to form the dtype, which is also permissible, but I prefer being explicit, and you can use the short forms to cast the text/numeric format:

NumPyArrayToTable—Data Access module | Documentation 
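
For instance, a minimal sketch (the table path and values here are hypothetical, not from this thread) of building the array with an explicit dtype and handing it to NumPyArrayToTable:

import numpy
import arcpy

# Explicit dtype: no 'O' (object) entries, so arcpy knows each field's type.
dt = [('Name', '<U30'), ('Version', '<U15'), ('Count', '<i4'), ('Score', '<f8')]
recs = [('alabaster', '0.7.12', 3, 1.5),
        ('argh', '0.26.2', 7, 2.25)]
arr = numpy.array(recs, dtype=dt)

arcpy.da.NumPyArrayToTable(arr, r'C:/Temp/scratch.gdb/dtype_demo')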

PS: I have tons of blog posts on numpy.

davedoesgis
Occasional Contributor III

Hi Dan - Regarding print statements, I've been developing this in a notebook, and my code blocks end by evaluating variable names so their values are displayed. I normally develop in an IDE (Eclipse/PyDev) with debugging breakpoints, and I also use logging, but I was giving Jupyter a try for this. It's new to me, but I really like it for noodling on new things.

Anyhow, I created a list of numeric fields 'sum_names' and cast them all to int in Pandas:

df3[sum_names] = df3[sum_names].astype(int)

I'm not sure why that helped, because I still have other fields of the default 'object' type, but the error went away.
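
In case it helps, here is a sketch of what I think the general fix looks like (the helper is mine, untested beyond my own data): cast every remaining object column so the resulting array carries no 'O' dtypes.

import numpy

def df_to_structured(df):
    # Object columns become fixed-width unicode; numeric columns keep
    # their numpy type string (e.g. '<i8', '<f8').
    dt = []
    for col, kind in df.dtypes.items():
        if kind == object:
            width = int(df[col].astype(str).str.len().max())
            dt.append((col, f'<U{width}'))
        else:
            dt.append((col, kind.str))
    return numpy.array(list(df.itertuples(index=False, name=None)), dtype=dt)

numpy_array = df_to_structured(df3)  # then NumPyArrayToTable as before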

After creating the table, I was going to join it to a feature class. In all my Googling, I also found arcpy.da.ExtendTable, which is really what I needed to do it all in one step. Interestingly, that was only partially working: it added the fields to my table, but all the values were NULL, and it returned another cryptic error. Once I got NumPyArrayToTable working, ExtendTable started working too.
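
For reference, the one-step join is just this (paths and field names are from my setup; the full script is in my later reply):

arcpy.da.ExtendTable(str(counties), 'GEOID', numpy_array, 'fips_append',
                     append_only=False)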

Are your blog posts here on GeoNet or somewhere else? I'm loving the power of numpy, but I find Pandas a bit more intuitive.

VHolubec
Esri Regular Contributor

Dear Dan Patterson,

Thank you for your post. I have the same problem. Everything seems fine in my settings; I just don't know whether one of the dtypes could be causing the problem.

dtype=[('MistoMEreniID', '<i4'), ('Provozovatel', '<U3'), ('Datum', '<U10'), ('MaximalniHodnota', '<f8'), ('PrumernaHodnota', '<f8'), ('MistoMereni', '<U10'), ('Popis', 'O')]

Are all these types supported? 

Thank you for any idea.

Regards,

Vladimir
JoshuaBixby
MVP Esteemed Contributor

It is likely choking on the object dtype of 'Popis'. Assuming Popis is a string field, you will need to convert it to string before passing it to Esri to create a table or feature class.
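
A sketch of one way to do the conversion (assuming 'arr' is your structured array and the width calculation suits your data):

import numpy

# Recast the object field to fixed-width unicode so the dtype contains
# no 'O' entries before handing the array to arcpy.
width = max(len(str(v)) for v in arr['Popis'])
new_dt = [(n, f'<U{width}') if arr.dtype[n] == object else (n, arr.dtype[n].str)
          for n in arr.dtype.names]
arr = numpy.array(arr.tolist(), dtype=new_dt)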

VHolubec
Esri Regular Contributor

Dear Joshua Bixby,

Thank you, that was the case!

Vladimir
DanPatterson
MVP Esteemed Contributor

The whole array will be of object dtype; hence, convert to string as Joshua suggested if the structure is simple. Otherwise, you have to use a compound dtype specifying the sequence of dtypes within it.

Numpy can use compound dtypes as in the following example

dt = [('OID_', '<i4'), ('XYZ', [('X', '<f8'), ('Y', '<f8'),('Z', '<f8')])]

So instead of X, Y and Z being separate entities, you can access the Y values using

XYZ['Y'] (structured array)  or

XYZ.Y    (recarray)

To simplify your data structure, you can convert the compound dtype above into separate 'fields':

dt = [('OID_', '<i4'), ('X', '<f8'), ('Y', '<f8'), ('Z', '<f8')]
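
A minimal runnable sketch of both forms (values made up):

import numpy
from numpy.lib.recfunctions import structured_to_unstructured

dt = [('OID_', '<i4'), ('XYZ', [('X', '<f8'), ('Y', '<f8'), ('Z', '<f8')])]
a = numpy.array([(1, (7.0, 8.0, 9.0)), (2, (4.0, 5.0, 6.0))], dtype=dt)

print(a['XYZ']['Y'])                  # structured array access -> [8. 5.]
print(a.view(numpy.recarray).XYZ.Y)   # recarray access -> [8. 5.]

# Flatten the compound XYZ field into a plain (n, 3) float array,
# the moral equivalent of the separate-fields dtype above.
print(structured_to_unstructured(a['XYZ']))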

... sort of retired...
davedoesgis
Occasional Contributor III

I did a LOT of tinkering to get this working, so I'm not 100% sure what fixed the problem, but just to close the loop on this, here is the Jupyter notebook I got working, exported to Python. Notes:

  • I think the solution was setting the columns as int, but I also removed a bunch of columns I wasn't interested in.
  • It writes to a table and also does a join to an existing feature class at the end.
  • This is just prototype code I was thrashing on, and my next task is to clean it up. You've been warned: it's a bit rough, but it runs (for me, anyhow).
  • I did update to the latest Pandas using pip.
  • There are some places where I took out my environment settings. They should be pretty obvious.
  • As long as the Census keeps this URL working and you have their County feature class with the field GEOID, this should work with very little modification. It requests data on language ability (group B16005) and sums certain columns (a.k.a. names) that I'm interested in.

#!/usr/bin/env python
# coding: utf-8
# In[222]:

from pathlib import Path
import arcpy 
import numpy
import pandas
import requests

# In[223]:

proxyDict = {
    "http":"<url>",
    "https":"<url>"
}
url = 'https://api.census.gov/data/2018/acs/acs5?get=NAME,group(B16005)&for=county:*&in=state:*'
sum_names = ['B16005_007E', 'B16005_008E', 'B16005_012E', 
       'B16005_013E', 'B16005_017E', 'B16005_018E',
       'B16005_022E', 'B16005_023E', 'B16005_029E',
       'B16005_030E', 'B16005_034E', 'B16005_035E',
       'B16005_039E', 'B16005_040E', 'B16005_044E',
       'B16005_045E']
fgdb = Path('C:/<folder1>/<folder2>/<your_FGDB>.gdb')
table = fgdb / 'test_table'
counties = fgdb / 'County'
print(sum_names)

# In[224]:

r = requests.get(url)
# r = requests.get(url, proxies=proxyDict) # if you have a proxy server
print('response received')
df = pandas.read_json(r.text)
df

# In[225]:

headers = df[0:1] # first row is headers
header_list = headers.values.tolist()[0] # convert to list
header_list[0] = 'FULL_NAME' # de-duplicate "NAME"
new_columns = dict(zip(list(range(0,len(header_list))), header_list)) #make dict
new_columns

# In[230]:

df2 = df[1:].copy() # data in rows 2+ (copy so later column assignments don't warn)
df3 = df2.rename(columns=new_columns) # add column names back in
df3['fips_append'] = df3['state'] + df3['county'] # state+county = GEOID join key
print(len(df))
print(len(df3.columns))
df3[sum_names] = df3[sum_names].astype(int)
# verify type set
#for name in sum_names:
#    print(name, df3.dtypes[name])
#print(df3.dtypes) # prints all, but truncated
df3['sumvalue'] = df3[sum_names].sum(axis=1)
out_cols = ['fips_append'] + sum_names + ['sumvalue']
print(out_cols)
df3 = df3[out_cols]
df3

# In[227]:

numpy_array = numpy.array(numpy.rec.fromrecords(df3.values))
names = df3.dtypes.index.tolist()
numpy_array.dtype.names = tuple(names)
numpy_array

# In[228]:

arcpy.da.NumPyArrayToTable(numpy_array, str(table))

# In[229]:

arcpy.da.ExtendTable(str(counties), 'GEOID', numpy_array, 'fips_append', append_only=False)

