How can I pass along null (None) values through NumPyArrayToTable?

BlakeTerhune · ‎03-30-2018

I am querying data from another Oracle database, doing some work on the data, geocoding it, then appending it to a feature class in our enterprise geodatabase. I use ArcSDESQLExecute(), then NumPyArrayToTable() to pass into GeocodeAddresses_geocoding(). However, during the NumPyArrayToTable() process, null values always come through as a text string 'None'. I tried changing None values to numpy.NaN but those just come through as a string 'nan'. My datatypes for my numpy array fields are all string. How do you deal with null values with NumPyArrayToTable()?

DanPatterson_Retired · ‎03-30-2018

You can't... since the concept of <null> in the table is purely visual in a sense.

What I do when I have to bring stuff back into Pro or arcmap, then the first thing is to query for "None" the string... and do a field calculation to set them to <null>.

None is python is object, of which there is only one type and it has no real properties.

None.__bool__()
False

None.__class__
<class 'NoneType'>

None.__doc__  
# ---- there is a blank line beneath‍‍‍‍‍‍‍

None.__repr__()
'None'

None.__str__()
'None'‍‍‍‍‍‍‍‍‍‍‍‍‍‍

If you are unsure about None, you will notice that the array that is returned is an Object array and not an array of strings.

That is because None is NOT a subclass of string or anything else!

a = np.array([
    ("one", "first"),
    ("two", None),
    ("three", "third"),
    (None, "fourth")
])

a.dtype
dtype('O')

a
array([['one', 'first'],
       ['two', None],
       ['three', 'third'],
       [None, 'fourth']], dtype=object)

# ---- you will notice that None is not in quotes!!!‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Just remember... you have to keep track of what you are doing. I personally hate the <null> thing since it removes some of the link between the user and their data. <null> should really be replaced with

< ..... hey you !!!! .... >

perhaps people will then not take None-ness for granted

View solution in original post

DanPatterson_Retired · ‎03-30-2018

None is correct for strings... there is only one None.

NaN is correct for floating point numbers, integers have no concept of null so they are either assigned depending on the int type (see below)

NaT is for null times

np.iinfo(np.int8)
iinfo(min=-128, max=127, dtype=int8)

np.iinfo(np.int16)
iinfo(min=-32768, max=32767, dtype=int16)

np.iinfo(np.int32)
iinfo(min=-2147483648, max=2147483647, dtype=int32)

np.iinfo(np.int64)
Out[12]: iinfo(min=-9223372036854775808, max=9223372036854775807, dtype=int64)‍‍‍‍‍‍‍‍‍‍‍

BlakeTerhune · ‎03-30-2018

Sorry Dan, I don't quite follow. So if I have a structured NumPy array like this:

[
    ("one", "first"),
    ("two", None),
    ("three", "third"),
    (None, "fourth")
]‍‍‍‍‍‍

and I specify the structured NumPy array field data types all as '|S25', how can I ensure that the ArcGIS table that is created will have null values instead of a string 'None'?

DanPatterson_Retired · ‎03-30-2018

You can't... since the concept of <null> in the table is purely visual in a sense.

What I do when I have to bring stuff back into Pro or arcmap, then the first thing is to query for "None" the string... and do a field calculation to set them to <null>.

None is python is object, of which there is only one type and it has no real properties.

None.__bool__()
False

None.__class__
<class 'NoneType'>

None.__doc__  
# ---- there is a blank line beneath‍‍‍‍‍‍‍

None.__repr__()
'None'

None.__str__()
'None'‍‍‍‍‍‍‍‍‍‍‍‍‍‍

If you are unsure about None, you will notice that the array that is returned is an Object array and not an array of strings.

That is because None is NOT a subclass of string or anything else!

a = np.array([
    ("one", "first"),
    ("two", None),
    ("three", "third"),
    (None, "fourth")
])

a.dtype
dtype('O')

a
array([['one', 'first'],
       ['two', None],
       ['three', 'third'],
       [None, 'fourth']], dtype=object)

# ---- you will notice that None is not in quotes!!!‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Just remember... you have to keep track of what you are doing. I personally hate the <null> thing since it removes some of the link between the user and their data. <null> should really be replaced with

< ..... hey you !!!! .... >

perhaps people will then not take None-ness for granted

DanPatterson_Retired · ‎03-30-2018

Almost forgot the discussion about 'none'ness

https://community.esri.com/blogs/dan_patterson/2016/11/03/before-i-forget-18-those-pesky-null-things

and there is one on null geometry... but that can be left for later

BlakeTerhune · ‎04-02-2018

Thanks, I also went looking at your other NumPy snippets #6 blog post.

DanPatterson_Retired · ‎04-02-2018

Ahhhh yes... which reminds me, I should do something with np.NaT for those that work with time

GIS_Spellblade · ‎11-13-2018

Hi Dan,

Speaking of time, I'm trying to deal with NaT after consuming a google sheet into a pandas dataframe. Is this something that has to be handled after the table has been created? Or is there something that I can do before I pull it in?

I'm trying to run something like this:

import pandas
import numpy as np

google_sheet = "https://docs.google.com/spreadsheets/d/GOOGLEUNIQUECODE/export?format=csv"
#this is the standard way of making a Google Sheet a link-downloadable CSV after the unique key type: /export?format=csv
df = pandas.read_csv(google_sheet)
#read_csv can ingest a csv from a web link
#the following two lines are trying to correct a mixed-type column from the google sheet, because people like "TBD"
df['Project Start Date'] = pandas.to_datetime(df['Project Start Date'], errors='coerce')
df['Estimated Project Completion Date'] = pandas.to_datetime(df['Estimated Project Completion Date'], errors='coerce')

for i, row in enumerate(df.itertuples(),1):
print(row[1])
x = np.array(np.rec.fromrecords(df.values))
names = df.dtypes.index.tolist()
#names has to be names, I’m not sure why but it broke when I tried to rename it
x.dtype.names = tuple(names)
arcpy.da.NumPyArrayToTable(x, r'%scratchGDB%\testTable')

DanPatterson_Retired · ‎11-13-2018

Mixing text in with datetimes in a spreadsheet is obviously an issue, so I presume that is what ... errors='coerce' is trying to do.

What does it produce? and if it was formatted as string first, would converting to datetime be easier after?

GIS_Spellblade · ‎11-13-2018

Yes coerce forces non-dates to NaT, the mixed-type column comes from a project management system managed by non-dates folks.

I did some more testing and even after making all the dates into dummy dates (ex 1/1/1950) I still ran into a table creation error when using the to_date function within pandas.

I think that I'll bring it in as a string and run more processes within arcgis, but I would have preferred to bring it straight in since none of the other fields need post-processing.