Spatial DataFrame... deprecated or just shuffled a bit?

10-14-2018 06:44 PM
DanPatterson_Retired
MVP Emeritus

Line 221 in

C:\ArcGISPro\bin\Python\envs\arcgispro-py3\Lib\site-packages\arcgis\features\_data\geodataset\geodataframe.py

#warnings.warn("SpatialDataFrame has been deprecated.  Please switch to the GeoAccessor/GeoSeriesAccessor.")

Then it appears that a pandas dataframe and series are now a substitute?  For example, the *.py scripts in 

C:\ArcGISPro\bin\Python\envs\arcgispro-py3\Lib\site-packages\arcgis\features\geo

Since numpy object arrays are supported natively with FeatureClassToNumPyArray and its siblings, what was the reason to wrap some of the basic array functions into Pandas-ish classes rather than provide direct, or even twin, functionality with numpy?  A simple example is using ExtendTable to join.
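For readers following along, the kind of join mentioned above can be sketched with numpy alone. This is only an illustration under stated assumptions, not Esri's implementation: the two arrays stand in for what FeatureClassToNumPyArray might return, the field names and values are invented, and numpy.lib.recfunctions.join_by is swapped in for ExtendTable (which writes the joined fields back to a table rather than returning an array).

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Stand-in for a structured array read from a feature class
# (hypothetical field names and values).
features = np.array(
    [(1, 10.0, 20.0), (2, 11.0, 21.0), (3, 12.0, 22.0)],
    dtype=[("OID", "i4"), ("X", "f8"), ("Y", "f8")],
)

# New attribute values keyed on the same id field (also hypothetical).
extra = np.array(
    [(1, 0.5), (2, 1.5), (3, 2.5)],
    dtype=[("OID", "i4"), ("score", "f8")],
)

# Inner join on the shared key; usemask=False returns a plain
# structured array instead of a masked array.
joined = rfn.join_by("OID", features, extra, jointype="inner", usemask=False)

print(joined.dtype.names)
print(joined["score"])
```

No DataFrame is involved at any point; the result is still a structured array that could be handed to a cursor or to ExtendTable.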

Any insights would be useful.  

Do you have plans to enable subclassing for anything other than Pandas 'arrays' ?

DanPatterson_Retired
MVP Emeritus

Anything?  It appears this place isn't monitored much

JoshuaBixby
MVP Esteemed Contributor

I have never had much success using this place to meaningfully engage the Esri staff involved with this API. I don't know if Jay Theodore has ever commented on GeoNet. It has been months since Rohit Singh was around. Atma Mani has made one or two comments in the past month. John Yaist commented a couple of days ago, but that was the first time in a month. The takeaway for me: don't expect an answer from any Esri staff here for the type of question you are asking. Unfortunate....

DanPatterson_Retired
MVP Emeritus

Joshua, I posted on the GitHub site as well...

The people you have identified are the current owners of the place. Michelle Mathias is currently looking into whether rarely monitored places need to be merged or have their ownership transferred.

simoxu
MVP Regular Contributor

According to the Esri documentation, Esri has stopped active development of SpatialDataFrame:

"New at version 1.5, the Spatially Enabled DataFrame is an evolution of the SpatialDataFrame object that you may be familiar with. While the SDF object is still available for use, the team has stopped active development of it and is promoting the use of this new Spatially Enabled DataFrame pattern. The SEDF provides you better memory management, ability to handle larger datasets and is the pattern that Pandas advocates as the path forward."

From <https://developers.arcgis.com/python/guide/introduction-to-the-spatially-enabled-dataframe/>

Extending pandas to add spatial functionality seems like a good move. I hope it will make the data analysis experience seamless for pandas users.

DanPatterson_Retired
MVP Emeritus

Simo... importing pandas seems like an awful lot of trouble to go to just to get a numpy-backed geometry array

(i.e. NumPyBackedExtensionArrayMixin from _array.py in C:\… install_folder...\bin\Python\envs\arcgispro-py3\Lib\site-packages\arcgis\features\geo)

For those who work natively with numpy, splitting the geometry from the attributes and working with each as a separate array (simple arrays for points, object arrays for poly* features) isn't a problem. Even rebuilding featureclasses from the arrays afterwards (i.e. with cursors) isn't a problem.

Structured array/recarray operations handle most attribute-based work without the need to wrap the data in a pandas DataFrame. It seems that native numpy arrays, whether for featureclass or raster operations, have been skipped altogether. People who use the SciPy stack for their work might want a more streamlined approach to working in Jupyter notebooks (or JupyterLab). I will grant that Pandas is 'friendly', but it sure isn't needed for most operations. Maybe a 'lite' version is needed.
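As a rough illustration of the point about structured arrays, the sketch below splits geometry from attributes and then runs a query, a sort, and a grouped summary using numpy only. The table is invented for the example (field names, values, and the idea that it came from a point featureclass are all assumptions).

```python
import numpy as np

# Hypothetical point featureclass flattened to a structured array.
recs = np.array(
    [(1, "road", 5.0, 0.0, 0.0),
     (2, "river", 12.5, 1.0, 2.0),
     (3, "road", 7.5, 3.0, 1.0),
     (4, "river", 2.5, 4.0, 4.0)],
    dtype=[("OID", "i4"), ("kind", "U10"), ("length", "f8"),
           ("X", "f8"), ("Y", "f8")],
)

# Split geometry from attributes: a plain (N, 2) float array for the
# coordinates and a structured array for everything else.
xy = np.column_stack((recs["X"], recs["Y"]))
attrs = recs[["OID", "kind", "length"]]

# Attribute query: rows with length greater than 5.
long_ones = recs[recs["length"] > 5.0]

# Sort the whole table by one field.
by_length = np.sort(recs, order="length")

# Grouped summary: total length per kind.
kinds, inverse = np.unique(recs["kind"], return_inverse=True)
totals = np.bincount(inverse, weights=recs["length"])

for kind, total in zip(kinds, totals):
    print(kind, total)
```

The grouped summary is the step where pandas is noticeably more convenient (groupby), but np.unique with np.bincount covers the simple cases.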

AndrewChapkowski
Esri Regular Contributor

Dan,

I responded to your GitHub issue, "documentation on moving from spatialdataframe" (Issue #315, Esri/arcgis-python-api).

Let me know if you have any other questions.

Thanks

Andrew

RohitSingh2
Esri Contributor

Pandas is the de facto standard for working with tabular data using Python. This is the primary reason for extending Pandas to work with spatial data. Users who are already familiar with Pandas can use pandorable code to work with feature data. Pandas is also a common denominator between ArcGIS Feature Layers / spatial data (from shapefiles and file geodatabases) and many Python libraries like scikit-learn. It provides a higher level API than numpy and is the preferred way to work with non-numeric data, whereas numpy might be considered better for numeric data.

Finally, it's a matter of taste - I like Pandas but Dan apparently has a preference for numpy.  I don't quite understand how "import pandas as pd" is more trouble than "import numpy as np". Both are available in the standard anaconda environment. 

DanPatterson_Retired
MVP Emeritus

A point seems to be missed.

FeatureClassToNumPyArray, TableToNumPyArray, RasterToNumPyArray and their counterparts going back the other way involve no pandas, I presume.

*.npy/npz files can't be read directly by pandas (pickle variants and csv can).
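For anyone who does need to move between the two, the gap is one np.load call. The sketch below is an assumption-laden example (invented field names and values, a temporary file in place of real data): it saves a structured array to .npy, reloads it with numpy, and only then hands it to pandas.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical structured array, e.g. the kind of thing
# FeatureClassToNumPyArray might produce.
arr = np.array(
    [(1, 10.0, 20.0), (2, 11.0, 21.0)],
    dtype=[("OID", "i4"), ("X", "f8"), ("Y", "f8")],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "points.npy")
    np.save(path, arr)        # write the .npy file
    loaded = np.load(path)    # numpy reads it back; pandas is not involved

# A structured array converts directly; columns come from the field names.
df = pd.DataFrame(loaded)

# And back to a record array when pandas is no longer needed.
back = df.to_records(index=False)

print(list(df.columns))
print(back["X"])
```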

When you work in the mixed world of vector and raster data analysis, I guess the geometry and the spatial side are more important than the attributes, so there is little use for Pandas in that environment.

Also, import pandas as pd does import numpy right away, of course, and checks for compatibility with it (from pandas.compat.numpy import * in the __init__.py of pandas).

As long as the move isn't going to break working with numpy/arcpy geometry and rasters, I have no problem and will just subclass if the need arises.

It seems you can't get away without having pandas imported anyway if you want to work with the arcgis package
