Spatial Dataframe to FC: fill np.nan values with ArcGIS 'null' values, not 0

05-19-2019 05:21 PM
NicholasRomano1
New Contributor II

Using the operation "to_featureclass" from a spatial dataframe, np.nan values in int64 and float64 columns are converted to 0. It would save me a lot of trouble if they went in as ArcGIS NULL values instead of zero. Is it possible to do this, or could you add support in a later release?

4 Replies
DanPatterson_Retired
MVP Esteemed Contributor

Integers have no 'null' equivalent the way floats do with 'NaN'.  It would be wise to do the conversion ahead of time in the sdf, since '0' can be a valid value, as you well know. 

Some suggestions for several integer types and floats.  Mins or maxs work well as sentinels, and there are unsigned equivalents for the integers if needed.

import numpy as np

np.iinfo(np.int32).min     # -2147483648
np.iinfo(np.int16).min     # -32768
np.iinfo(np.int8).min      # -128
np.finfo(np.float64).min   # -1.7976931348623157e+308
np.finfo(np.float64).max   # 1.7976931348623157e+308
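That pre-conversion step could be sketched like this; the frame and column name are hypothetical stand-ins for the sdf's attribute table:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the sdf's numeric columns
df = pd.DataFrame({"pop": [100.0, np.nan, 250.0]})

# Swap NaN for a sentinel before export, e.g. the int32 minimum,
# so the column can be cast to an integer type at all
SENTINEL = np.iinfo(np.int32).min
df["pop"] = df["pop"].fillna(SENTINEL).astype(np.int64)

print(df["pop"].tolist())  # [100, -2147483648, 250]
```

The sentinel survives the trip into the feature class, where it can later be swapped back to <Null>.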
NicholasRomano1
New Contributor II

The point I'm making is that when I save a SpatialDataFrame to a feature class, nan values in numeric columns need to go into the feature class as <Null>.

Currently the API takes any value where pd.isna(x) is True and writes it as 0 instead of <Null>.

This is an issue for many business workflows which require null values. 

DanPatterson_Retired
MVP Esteemed Contributor

Nicholas... I understand the point, but <Null> is an artifact of how a feature class table displays floating point values, and it doesn't even exist for integers, since there is no integer equivalent to NaN or NaT (for time). 

The point I am making is that it isn't a 'bug' but a deficiency (?), and an extra step that either Esri or you will have to take until this behaviour is altered.
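The "no integer NaN" point can be demonstrated directly in numpy and pandas:

```python
import numpy as np
import pandas as pd

# A float array holds NaN without complaint
print(np.isnan(np.array([1.0, np.nan])))  # [False  True]

# But NaN cannot be cast to an integer dtype at all
try:
    np.array([1, np.nan], dtype=np.int64)
except ValueError as err:
    print("no integer NaN:", err)

# pandas reflects this: a single NaN forces the whole column to float64
s = pd.Series([1, 2, np.nan])
print(s.dtype)  # float64
```

So any int64 column that ever held a NaN has already been upcast to float before `to_featureclass` sees it.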

I would definitely put it into ArcGIS Ideas, or better still, raise it directly on

GitHub - Esri/arcgis-python-api: Documentation and samples for ArcGIS API for Python

I have not seen this issue raised on GitHub, perhaps because people are oblivious to the issue, or they don't allow null values and instead classify things into no-value, not-measured, not-available, -999, -998 (in other words, there is no such thing as a null, just different ways of categorizing a lack of a value).

This is the same behaviour as shapefiles, where 'null-ness' is represented by substituting 0 for numbers or "" for strings.

Perhaps they are trying to keep things on an equal footing, since shapefiles (through pyshp, I think) are supported when arcpy isn't available.

If you know where the code lines are that do this, then you can replace them in the output with one or more of my suggestions, then change them to None in the field calculator until a proper workaround is provided.

On a technical note, np.nan values are masked in numpy and pandas (and hence the sdf) by using 'nan functions' or masking to exclude them from calculations.

i.e.

import numpy as np

a = [1., 2., np.nan, 4.]
np.nansum(a)    # 7.0

# the sums happen to agree, but replacing nan with 0 skews the mean
b = [1., 2., 0., 4.]
np.nanmean(a)   # 2.3333333333333335
np.mean(b)      # 1.75
NicholasRomano1
New Contributor II

Dan,

Understood. I guess it's not a bug, but I'd agree with labelling it a deficiency.

I brought it up because it has come up in some ETL workflows recently. This specific case requires me to run an update cursor after the call to to_featureclass just to set the values of the Null cells. With UpdateCursor, arcpy takes Python's None object and uses that to derive its <Null> value. 

It seems arcpy knows how to map Python's None to <Null>. I think it'd be relevant if the to_featureclass method took the same approach and applied it to np.nan, or even None for that matter.
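A minimal sketch of that cleanup pass, assuming a sentinel was written in place of NaN at export time. The field name and sentinel are illustrative, and the arcpy cursor is shown in comments since it needs an ArcGIS install:

```python
SENTINEL = -2147483648  # hypothetical stand-in written for NaN at export

def null_if_sentinel(value, sentinel=SENTINEL):
    """Map the sentinel back to None, which arcpy writes as <Null>."""
    return None if value == sentinel else value

# With arcpy available, the post-processing step would look like:
# with arcpy.da.UpdateCursor(fc, ["pop"]) as cursor:
#     for row in cursor:
#         cursor.updateRow([null_if_sentinel(row[0])])

print(null_if_sentinel(SENTINEL))  # None
print(null_if_sentinel(100))       # 100
```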
