After spending a few months working through a Python for Data Science curriculum, I have gotten to like woriking with pandas for data prep in Python. Naturally, I wanted to try using it with arcpy. Usually, I would attempt large batch GIS data manipulation or data moving with FME. But FME, while it is powerful, often takes too many settings or mouse-clicks to remember. Seems like I can never remember some parameter that is only visible after drilling down some UI tree. If I script it, I see all my settings. I found a great example of getting a pandas Dataframe into a geodatabase here.
import arcpy
from arcpy import envenv.workspace = "my_favorite_workspace" #pointing to SDE connection file
#I already have a dataframe df that is loaded from SQL Server using pyodbc
#import pandas as pd
#import pyodbc#sql_conn = pyodbc.connect('''DRIVER={ODBC Driver 13 for SQL Server};
# SERVER=<myserver>;
# DATABASE=mydatabase>;
# Trusted_Connection=yes''')#query = "<myquery>"
#df = pd.read_sql(query, sql_conn,columns=['count'])
This all works like a charm. I can now do all kinds of data manipulation.Then I create a structured array in numpy.
#first you turn your dataframe into a simple np series
nparray = df.values
#then you create a structured numpy array, see here for documentation
npstruct = np.rec.fromrecords(nparray)
#then you create a list of field names from the dataframecolnames = df.dtypes.index.tolist()
#then you use that list to update column names in structured array
npstruct.dtype.names = tuple(colnames)
Again, everything works fine. I get my field names and I get a structured array that looks like it should based on the documentation.The final step is to use that array in arcpy.da.NumPyArrayToTable() to create a table. ESRI has additional documentation here.
arcpy.da.NumPyArrayToTable(np_struct, "newtable")
---------------------------------------------------------------------------RuntimeError Traceback (most recent call last)<ipython-input-31-14d9b2719bd8> in <module>----> 1 arcpy.da.NumPyArrayToTable(np_struct, "newtable")RuntimeError: create table
Need help figuring out what's the problem. I have my env.workspace pointing to an SDE connection. I've also tried using the same code to create "newtable.dbf" in a local directory. Don't think it's permissions either because the following is successful:
arcpy.CreateTable_management(env.workspace, 'new_table')
Mark,
Thanks for the help, but it isn't working. Here is what I have (with some things Xed out):
import pandas as pd import pyodbc import arcpy import numpy as np cnctStr='Driver={SQL Server Native Client 11.0};Server=XXXXXX;Database=XXXXXX;Trusted_Connection=yes;' conn = pyodbc.connect(cnctStr) SQL_Query = pd.read_sql_query( '''SELECT FILE_NUMBER, TYPE_CODE, STATUS_CODE, REVIEWER_ID, PREVIOUS_REVIEWER, OWNERS, WATERBODY_NAME, PROJECT_CATEGORY, PRELIMINARY_CATEGORY, APPLICANTS_REQUESTED_IMPACT, CALCULATED_AREA_OF_IMPACT, IMPACT_WETLAND_TYPE_CODE, CITY_NAME, COUNTY, LAST_ACTION_DATE, DATE_LAST_INFO_LETTER_SENT, IS_MORE_INFO_LETTER_SENT, DATE_MORE_INFO_LETTER_RETURNED, DATE_RECEIVED FROM WETLANDS_PERMIT_V WHERE (STATUS_CODE IN ('REVIEW EXT', 'PENDING', 'EA_ISSUED', 'MI RECEIVE', 'MI REQUEST')) AND (DATE_RECEIVED >= '01/01/2014')''', conn) df = pd.DataFrame(SQL_Query, columns=['FILE_NUMBER','TYPE_CODE','STATUS_CODE','REVIEWER_ID','PREVIOUS_REVIEWER','OWNERS', 'WATERBODY_NAME','PROJECT_CATEGORY','PRELIMINARY_CATEGORY','APPLICANTS_REQUESTED_IMPACT', 'CALCULATED_AREA_OF_IMPACT','IMPACT_WETLAND_TYPE_CODE','CITY_NAME','COUNTY','LAST_ACTION_DATE', 'DATE_LAST_INFO_LETTER_SENT','IS_MORE_INFO_LETTER_SENT','DATE_MORE_INFO_LETTER_RETURNED','DATE_RECEIVED']) arcpy.env.overwriteOutput = True arcpy.env.workspace = 'XXXXXXXXX.gdb' x = df.reset_index() z = np.rec.fromrecords(x.values, names=x.columns.tolist()) arcpy.da.NumPyArrayToTable(x, 'EA_Dash')
Which produces this error:
TypeError: 'in_array' is not numpy array
Snapshot of the 'z' object:
I think my code matches you're exactly? So, any idea what gives?
Tom
I think I see the issue, in the last line where you call arcpy.da.NumPyArrayToTable(x, 'EA_Dash'), change the variable 'x' to 'z'.
Okay, well that was a big 'Duh!' on my part.
Fixing that though, it seems there is still some sort of problem. I then get this error:
RuntimeError: create table
And I think it has to do with Python 'Object' types in the array not able to be handled by the arcpy function:
Here are my dtypes - with 'O' being python 'Objects':
(numpy.record, [('index', '<i4'), ('FILE_NUMBER', '<U10'), ('TYPE_CODE', '<U10'), ('STATUS_CODE', '<U10'), ('REVIEWER_ID', 'O'), ('PREVIOUS_REVIEWER', 'O'), ('OWNERS', '<U80'), ('WATERBODY_NAME', 'O'), ('PROJECT_CATEGORY', 'O'), ('PRELIMINARY_CATEGORY', 'O'), ('APPLICANTS_REQUESTED_IMPACT', '<f8'), ('CALCULATED_AREA_OF_IMPACT', '<f8'), ('IMPACT_WETLAND_TYPE_CODE', 'O'), ('CITY_NAME', '<U17'), ('COUNTY', '<U3'), ('LAST_ACTION_DATE', 'O'), ('DATE_LAST_INFO_LETTER_SENT', 'O'), ('IS_MORE_INFO_LETTER_SENT', 'O'), ('DATE_MORE_INFO_LETTER_RETURNED', 'O'), ('DATE_RECEIVED', 'O')])
Numpy also added two more functions to facilitate array conversion of ndarrays to structured arrays and from structured arrays to ndarrays.
structured_to_unstructured
unstructured_to_structured
from numpy.lib.recfunctions import structured_to_unstructured as stu
from numpy.lib.recfunctions import unstructured_to_structured as uts
Which I have named 'stu' and cousin 'uts'
Cool - now I just need to get back to where I was stu+n months ago. Thanks for recording this though. Great to see someone in GIS is keeping up with what's happening in the numpy world.
I am having this exact same problem - has anyone worked it out?
This worked for me.
Basically, I breakdown the pandas Dataframes to multiple simple arrays. Use the arrays to get their numpy datatype, because arcpy.da.NumPyArrayToTable() is having trouble handling pandas object dtype. Then create a new Structured Array with np.rec.fromarrays function
Get the columns names:
names = sms_new.columns
Get the data in the columns:
arrays = [ sms_new[col].get_values() for col in names ]
Get the data type for each data array:
formats = [f'{array.astype(str).dtype}' for array in arrays ]
Create new Structured Array with .fromarrays function:
sms_array = np.rec.fromarrays( arrays, dtype={'names': names, 'formats': formats} )
Hi,
Just putting this out here in case someone has the same "RuntimeError: create table" and tried all above input as well . I suddenly got this error while the exact same script was working well before. Realized that my dataframe had some NaN's in it. When filled the NaN's with empty strings (if field is a text field..)
df = df.fillna('')
the script suddenly worked again. If specific fields are e.g. numbers, fill them with a 0.
As noted by someone else earlier, I find that sometimes the conversion from numpy array to pandas converts some data types to Object 'O' data types. This might occur if you're converting from FeatureClass to numpy ndarray and then the ndarray to a pandas DataFrame.
arr = arcpy.da.FeatureClassToNumPyArray(data, fields_list)
df = pandas.DataFrame(arr)
You could just grab the data types from the original array as a dict and supply them to df.to_records, like this (assuming you haven't changed the fields and data types, otherwise you'll have to create your own dict or edit the one obtained from arr):
recs = df.to_records(index=False, column_dtypes=dict(arr.dtype.descr))
Then converting back to database table or featureclass should work. Here's example converting to database table.
arcpy.da.NumPyArrayToTable(recs, out_table)