How can I pass along null (None) values through NumPyArrayToTable?

14122
10
Jump to solution
03-30-2018 08:51 AM
BlakeTerhune
MVP Frequent Contributor

I am querying data from another Oracle database, doing some work on the data, geocoding it, then appending it to a feature class in our enterprise geodatabase. I use ArcSDESQLExecute(), then NumPyArrayToTable() to pass into GeocodeAddresses_geocoding(). However, during the NumPyArrayToTable() process, null values always come through as a text string 'None'. I tried changing None values to numpy.NaN but those just come through as a string 'nan'. My datatypes for my numpy array fields are all string. How do you deal with null values with NumPyArrayToTable()?

0 Kudos
1 Solution

Accepted Solutions
DanPatterson_Retired
MVP Emeritus

You can't... since the concept of <null> in the table is purely visual in a sense.

What I do when I have to bring stuff back into Pro or arcmap, then the first thing is to query for "None" the string... and do a field calculation to set them to <null>.

None is python is object, of which there is only one type and it has no real properties.

None.__bool__()
False

None.__class__
<class 'NoneType'>

None.__doc__  
# ---- there is a blank line beneath‍‍‍‍‍‍‍

None.__repr__()
'None'

None.__str__()
'None'

If you are unsure about None, you will notice that the array that is returned is an Object array and not an array of strings.

That is because None is NOT a subclass of string or anything else!

a = np.array([
    ("one", "first"),
    ("two", None),
    ("three", "third"),
    (None, "fourth")
])

a.dtype
dtype('O')

a
array([['one', 'first'],
       ['two', None],
       ['three', 'third'],
       [None, 'fourth']], dtype=object)

# ---- you will notice that None is not in quotes!!!

Just remember... you have to keep track of what you are doing.  I personally hate the <null> thing since it removes some of the link between the user and their data.  <null>  should really be replaced with

< ..... hey you !!!! .... > 

perhaps people will then not take None-ness for granted

View solution in original post

10 Replies
DanPatterson_Retired
MVP Emeritus

None is correct for strings... there is only one None.

NaN is correct for floating point numbers, integers have no concept of null so they are either assigned depending on the int type (see below)

NaT is for null times

np.iinfo(np.int8)
iinfo(min=-128, max=127, dtype=int8)

np.iinfo(np.int16)
iinfo(min=-32768, max=32767, dtype=int16)

np.iinfo(np.int32)
iinfo(min=-2147483648, max=2147483647, dtype=int32)

np.iinfo(np.int64)
Out[12]: iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)
BlakeTerhune
MVP Frequent Contributor

Sorry Dan, I don't quite follow. So if I have a structured NumPy array like this:

[
    ("one", "first"),
    ("two", None),
    ("three", "third"),
    (None, "fourth")
]

and I specify the structured NumPy array field data types all as '|S25', how can I ensure that the ArcGIS table that is created will have null values instead of a string 'None'?

0 Kudos
DanPatterson_Retired
MVP Emeritus

You can't... since the concept of <null> in the table is purely visual in a sense.

What I do when I have to bring stuff back into Pro or arcmap, then the first thing is to query for "None" the string... and do a field calculation to set them to <null>.

None is python is object, of which there is only one type and it has no real properties.

None.__bool__()
False

None.__class__
<class 'NoneType'>

None.__doc__  
# ---- there is a blank line beneath‍‍‍‍‍‍‍

None.__repr__()
'None'

None.__str__()
'None'

If you are unsure about None, you will notice that the array that is returned is an Object array and not an array of strings.

That is because None is NOT a subclass of string or anything else!

a = np.array([
    ("one", "first"),
    ("two", None),
    ("three", "third"),
    (None, "fourth")
])

a.dtype
dtype('O')

a
array([['one', 'first'],
       ['two', None],
       ['three', 'third'],
       [None, 'fourth']], dtype=object)

# ---- you will notice that None is not in quotes!!!

Just remember... you have to keep track of what you are doing.  I personally hate the <null> thing since it removes some of the link between the user and their data.  <null>  should really be replaced with

< ..... hey you !!!! .... > 

perhaps people will then not take None-ness for granted

DanPatterson_Retired
MVP Emeritus

Almost forgot the discussion about 'none'ness

https://community.esri.com/blogs/dan_patterson/2016/11/03/before-i-forget-18-those-pesky-null-things

and there is one on null geometry... but that can be left for later

0 Kudos
BlakeTerhune
MVP Frequent Contributor

Thanks, I also went looking at your other NumPy snippets #6 blog post.

0 Kudos
DanPatterson_Retired
MVP Emeritus

Ahhhh yes... which reminds me, I should do something with np.NaT for those that work with time

GIS_Spellblade
Frequent Contributor

Hi Dan,

Speaking of time, I'm trying to deal with NaT after consuming a google sheet into a pandas dataframe. Is this something that has to be handled after the table has been created? Or is there something that I can do before I pull it in?

I'm trying to run something like this:

import pandas
import numpy as np

google_sheet = "https://docs.google.com/spreadsheets/d/GOOGLEUNIQUECODE/export?format=csv"
#this is the standard way of making a Google Sheet a link-downloadable CSV after the unique key type: /export?format=csv
df = pandas.read_csv(google_sheet)
#read_csv can ingest a csv from a web link
#the following two lines are trying to correct a mixed-type column from the google sheet, because people like "TBD"
df['Project Start Date'] = pandas.to_datetime(df['Project Start Date'], errors='coerce')
df['Estimated Project Completion Date'] = pandas.to_datetime(df['Estimated Project Completion Date'], errors='coerce')

for i, row in enumerate(df.itertuples(),1):
print(row[1])
x = np.array(np.rec.fromrecords(df.values))
names = df.dtypes.index.tolist()
#names has to be names, I’m not sure why but it broke when I tried to rename it
x.dtype.names = tuple(names)
arcpy.da.NumPyArrayToTable(x, r'%scratchGDB%\testTable')

0 Kudos
DanPatterson_Retired
MVP Emeritus

Mixing text in with datetimes in a spreadsheet is obviously an issue, so I presume that is what ... errors='coerce' is trying to do.  

What does it produce? and if it was formatted as string first, would converting to datetime be easier after?

0 Kudos
GIS_Spellblade
Frequent Contributor

Yes coerce forces non-dates to NaT, the mixed-type column comes from a project management system managed by non-dates folks.

I did some more testing and even after making all the dates into dummy dates (ex 1/1/1950) I still ran into a table creation error when using the to_date function within pandas.

I think that I'll bring it in as a string and run more processes within arcgis, but I would have preferred to bring it straight in since none of the other fields need post-processing.

0 Kudos