Arrays
The case against <null>
The scenario: some data. Simple, but a combination of singlepart and multipart features, a couple of holes, and tabular data with nulls.
Time for the SearchCursor. An easy couple of steps when you know what you want: we will just blow up the polygons to points and get all the attributes in the table.
Any questions? Pretty straightforward. But cursors have some interesting properties.
Wow! Almost looks numpy-array-like. There is even an _as_narray method... I wonder.
Well, that didn't go well! Translation? It looks like an attempt was made to convert some bit of data to an integer, and it couldn't be because of those pesky <null> values... aka... None in the table!
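You don't need arcpy to reproduce that complaint. A minimal sketch of the same failure in plain numpy (the column values here are made up for illustration):

```python
import numpy as np

# A Long/Short column with a <null> in it, as a plain Python list
values = [1, None, 3]

# None has no integer representation, so the conversion blows up
try:
    np.array(values, dtype=np.int32)
except (TypeError, ValueError) as err:
    print(type(err).__name__, err)
```

The float dtypes dodge this because None can be mapped to nan, but there is no integer nan.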
Yes, those things that you have to exclude from calculations, that you have to query for, and for which you have to remember it isn't good to check whether something is equal to None, but whether something IS None.
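A quick illustration of why identity beats equality here (the Sneaky class is contrived, purely to show that == can be overridden while identity cannot):

```python
# Tests against None are a trap: use "is", not "==".
x = None
print(x == None)   # works, but frowned upon (PEP 8) and can be fooled
print(x is None)   # True: the right way to test for a null

# equality can be overridden by a class, identity cannot
class Sneaky:
    def __eq__(self, other):
        return True

s = Sneaky()
print(s == None)   # True!  equality lies
print(s is None)   # False: identity tells the truth
```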
Hence, nulls are evil. They are a lazy way out of not recording a value in a table to represent the fact that you have no observation for that field.
You have to get rid of nulls, so if you were too lazy to do it at the beginning during data preparation and Q/A work, there is still a way out. If you ignore the verbosity in the code below, there are a few tips.
import numpy as np

def _make_nulls_(in_flds, int_null=-999):
    """Return null values for a list of field objects, excluding objectid
    and geometry-related fields.  Throw in whatever else you want.

    Parameters
    ----------
    in_flds : list of field objects
        arcpy field objects; use arcpy.ListFields to get a list of
        featureclass fields.
    int_null : integer
        A default to use for integer nulls since there is no ``nan``
        equivalent.  Other options include

        >>> np.iinfo(np.int32).min  # -2147483648
        >>> np.iinfo(np.int16).min  # -32768
        >>> np.iinfo(np.int8).min   # -128
    """
    if not isinstance(in_flds, (list, tuple)):
        in_flds = [in_flds]
    nulls = {'Double': np.nan, 'Single': np.nan, 'Float': np.nan,
             'Short': int_null, 'SmallInteger': int_null, 'Long': int_null,
             'Integer': int_null, 'String': str(None), 'Text': str(None),
             'Date': np.datetime64('NaT')}
    bad = ['Guid', 'FID', 'OID', 'OBJECTID', 'Geometry', 'Shape',
           'Shape_Length', 'Shape_Area', 'Raster', 'Blob']
    good = [f for f in in_flds if f.editable and f.type not in bad]
    fld_dict = {f.name: f.type for f in good}
    fld_names = list(fld_dict.keys())
    null_dict = {f: nulls[fld_dict[f]] for f in fld_names}
    return null_dict, fld_names
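Since the function only reads a field's name, type and editable attributes, you can see what it returns without ArcGIS at all by feeding it stand-in field objects (the Field namedtuple below is a hypothetical mock, and the function is repeated so the snippet runs on its own):

```python
from collections import namedtuple
import numpy as np

def _make_nulls_(in_flds, int_null=-999):
    # same function as above, docstring trimmed for brevity
    if not isinstance(in_flds, (list, tuple)):
        in_flds = [in_flds]
    nulls = {'Double': np.nan, 'Single': np.nan, 'Float': np.nan,
             'Short': int_null, 'SmallInteger': int_null, 'Long': int_null,
             'Integer': int_null, 'String': str(None), 'Text': str(None),
             'Date': np.datetime64('NaT')}
    bad = ['Guid', 'FID', 'OID', 'OBJECTID', 'Geometry', 'Shape',
           'Shape_Length', 'Shape_Area', 'Raster', 'Blob']
    good = [f for f in in_flds if f.editable and f.type not in bad]
    fld_dict = {f.name: f.type for f in good}
    fld_names = list(fld_dict.keys())
    null_dict = {f: nulls[fld_dict[f]] for f in fld_names}
    return null_dict, fld_names

# Stand-in for arcpy field objects -- just the attributes the function reads
Field = namedtuple('Field', ['name', 'type', 'editable'])
flds = [Field('OBJECTID', 'OID', False),     # skipped: not editable
        Field('Shape', 'Geometry', True),    # skipped: geometry type
        Field('Long_1', 'Long', True),
        Field('Float_1', 'Float', True),
        Field('Text_1', 'String', True),
        Field('Date_time', 'Date', True)]

null_dict, fld_names = _make_nulls_(flds)
print(null_dict)    # {'Long_1': -999, 'Float_1': nan, 'Text_1': 'None', 'Date_time': ...NaT...}
print(fld_names)    # ['Long_1', 'Float_1', 'Text_1', 'Date_time']
```

Each field type maps to a sentinel its column dtype can actually hold: -999 for the integers, nan for the floats, the string 'None' for text, and NaT for dates.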
So what happens when you try to use the shortcut _as_narray?
Row values in a list format...
Notice that the integer columns have nulls in them, as do the other columns. Each column should have one data type. We get too complacent about this because of spreadsheets.
To the rescue...
# null_dict comes from _make_nulls_ above
b = arcpy.da.FeatureClassToNumPyArray(
    in_fc, "*", "", spatial_reference=SR, explode_to_points=False,
    skip_nulls=False, null_value=null_dict)
b
array([(1, [ 300015. , 5000005. ], 1, 1, 4, 1., 100., 'A 10 chars', 40. , 100., '2019-03-28T00:00:00.000000', '2019/03/28 00:00:00'),
(2, [ 300005. , 5000015. ], 2, -999, -999, nan, nan, 'None', 64. , 64., '2019-03-28T00:00:00.000000', 'None'),
(3, [ 300010.495, 5000010.593], 3, 3, 6, 3., 300., 'C not null', 99.416, 182., '2019-03-30T00:00:00.000000', '2019/03/30 00:00:00')],
dtype=[('OBJECTID', '<i4'), ('Shape', '<f8', (2,)), ('Id', '<i4'), ('Long_1', '<i4'),
('Short_1', '<i4'), ('Float_1', '<f4'), ('Double_1', '<f8'), ('Text_1', '<U10'),
('Shape_Length', '<f8'), ('Shape_Area', '<f8'), ('Date_time', '<M8[us]'), ('DT_str', '<U20')])
Do the scrolly thing and you will notice that the null_dict I created previously now converts integer <null> values to -999, floating-point values to true NaNs, and text to 'None'.
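One caveat with integer sentinels: -999 happily participates in arithmetic, so mask it out before calculating. A sketch on made-up column values:

```python
import numpy as np

# Integer column with -999 standing in for <null>
long_1 = np.array([1, -999, 3], dtype=np.int32)

# -999 silently poisons a naive calculation...
print(long_1.mean())                 # -331.66...

# ...so mask the sentinel out first
good = long_1[long_1 != -999]
print(good.mean())                   # 2.0

# float columns use nan, which numpy's nan-aware functions handle directly
float_1 = np.array([1.0, np.nan, 3.0])
print(np.nanmean(float_1))           # 2.0
```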
Lesson... don't use <null>; define a true null value that represents what is meant by it. There can be more than one value representing conditions like no data collected, versus no data available, versus no observation possible... don't lump them into one category at the beginning. You can aggregate AFTER data collection; you shouldn't do it before.