How to find which features in an SEDF cause .to_featureset() to fail?

A_Schwab · 03-27-2024

I have a spatially enabled DataFrame (SEDF) with 20,000 features.

When I run `sedf.spatial.to_featureset()`, it fails with this error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\User\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\lib\site-packages\arcgis\features\geo\_accessor.py", line 3680, in to_featureset
d = self.__feature_set__
File "C:\Users\User\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\lib\site-packages\arcgis\features\geo\_accessor.py", line 3339, in __feature_set__
geom_type = _geom_types[
KeyError: <class 'pandas.core.series.Series'>

When I run the same command on random samples from the dataset, it works fine, e.g.:

>>> sedf.sample(200).spatial.to_featureset()
<FeatureSet> 200 features

Is there any way to find out which feature(s) are causing the error with to_featureset(), to narrow down what the issue might be?
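One clue is in the traceback itself: the key in the `KeyError` is a class, `pandas.core.series.Series`, which suggests the `_geom_types[...]` lookup in `__feature_set__` was handed a whole Series where it expected a single geometry. Under that assumption, a cheap first check is to tally the Python types in the geometry column. If every value is one Geometry type (or `NoneType` for null shapes), the Series probably came from the lookup itself rather than from one bad row. The helper names below are hypothetical, and `sedf.spatial.name` is assumed to return the geometry column's name (usually `SHAPE`).

```python
from collections import Counter
from typing import Iterable, List


def tally_types(values: Iterable) -> Counter:
    """Count how many values of each Python type appear in a column.

    Hypothetical helper. Intended usage (assumes GeoAccessor.name returns
    the geometry column name): tally_types(sedf[sedf.spatial.name])
    """
    return Counter(type(v).__name__ for v in values)


def first_rows_of_type(values: Iterable, type_name: str, limit: int = 5) -> List[int]:
    """Return the positions of up to `limit` values whose type name matches."""
    hits: List[int] = []
    for pos, value in enumerate(values):
        if type(value).__name__ == type_name:
            hits.append(pos)
            if len(hits) >= limit:
                break
    return hits
```

For example, if `tally_types(sedf[sedf.spatial.name])` reports a handful of `float` values next to thousands of `Polygon`s, `first_rows_of_type(sedf[sedf.spatial.name], "float")` gives their row positions to inspect.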

A_Schwab · 03-28-2024

After some testing, I found that the issue is in the first 30 features. to_featureset() works fine for all rows from row 30 onward:

>>> sedf[30:20000].spatial.to_featureset()
<FeatureSet> 19970 features

Running to_featureset() on the first 30 rows still gives an error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\User\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\lib\site-packages\arcgis\features\geo\_accessor.py", line 3680, in to_featureset
d = self.__feature_set__
File "C:\Users\User\AppData\Local\ESRI\conda\envs\arcgispro-py3-clone\lib\site-packages\arcgis\features\geo\_accessor.py", line 3339, in __feature_set__
geom_type = _geom_types[
KeyError: <class 'pandas.core.series.Series'>

However, running the same command on subsets of the first 30 rows produces no error:

>>> sedf[0:10].spatial.to_featureset()
<FeatureSet> 10 features
>>> sedf[10:20].spatial.to_featureset()
<FeatureSet> 10 features
>>> sedf[20:30].spatial.to_featureset()
<FeatureSet> 10 features
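Because each 10-row slice converts but the 30-row slice does not, the failure seems to depend on a combination of rows rather than on any single record, which would also explain why nothing stands out in the CSV. One combination-dependent cause worth ruling out (an assumption, not confirmed in this thread) is duplicate index labels: in pandas, a label lookup such as `s[0]` on a Series whose index contains the label `0` twice returns a two-element Series instead of a single value. If `__feature_set__` fetches a geometry by label (not verified here), that would produce exactly a `KeyError` keyed on `Series`. The hypothetical helper below lists repeated labels with their row positions; `sedf.index.is_unique` is the built-in pandas yes/no check.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List


def duplicate_labels(labels: Iterable[Hashable]) -> Dict[Hashable, List[int]]:
    """Map each index label that occurs more than once to its row positions.

    Hypothetical helper. Intended usage: duplicate_labels(sedf.index[:30])
    An empty dict means every label in the slice is unique.
    """
    positions: Dict[Hashable, List[int]] = defaultdict(list)
    for pos, label in enumerate(labels):
        positions[label].append(pos)
    return {label: pos for label, pos in positions.items() if len(pos) > 1}
```

If duplicates turn up, `sedf.reset_index(drop=True)` gives every row a unique label; whether that alone lets `to_featureset()` succeed is untested here.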

I've exported the 30 records to CSV and looked through them, and can't see anything obvious that would cause them to trigger an error.

Anyone have any suggestions?

EarlMedina · 04-01-2024

Sorry, I had to reread what you did and said earlier. I am curious what happens if you split the DataFrame into chunks. There are a number of ways to do this; one fairly simple way might be:

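n = 10
# Break the SEDF into chunks of n rows each (the last chunk may be shorter)
df_list = [df.iloc[i:i + n] for i in range(0, len(df), n)]

for i, chunk in enumerate(df_list):
    start_idx = i * n
    print(f"Records {start_idx} through {start_idx + len(chunk) - 1}")
    try:
        chunk.spatial.to_featureset()
    except Exception as e:
        # Report the failing chunk and keep going instead of stopping
        print(f"  to_featureset() failed: {e!r}")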

I have seen that method fail because of illegal values, but never have I seen it behave in the manner you describe.
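When every chunk converts on its own, as in the 10-row tests earlier in the thread, chunking alone cannot isolate the culprit. A variation is to grow a prefix one row at a time until the call first fails; the last row added is then one member of the offending combination. A second pass can pair that row with each earlier row to find its partner, assuming the failure involves just two rows. Both helpers are hypothetical and take a callable, so they make no assumptions about the arcgis API beyond what the thread already uses.

```python
from typing import Callable, List, Optional


def first_failing_prefix(n_rows: int, check: Callable[[int], object]) -> Optional[int]:
    """Return the smallest k for which check(k) raises, or None if none do.

    check(k) should attempt the operation on the first k rows, e.g.
        lambda k: sedf.iloc[:k].spatial.to_featureset()
    The row at position k - 1 is then part of the failing combination.
    """
    for k in range(1, n_rows + 1):
        try:
            check(k)
        except Exception:
            return k
    return None


def failing_partners(last: int, check_rows: Callable[[List[int]], object]) -> List[int]:
    """Return every position j < last for which rows [j, last] together fail.

    Assumes the failure is triggered by a pair of rows. check_rows receives
    a list of row positions, e.g.
        lambda pos: sedf.iloc[pos].spatial.to_featureset()
    """
    partners: List[int] = []
    for j in range(last):
        try:
            check_rows([j, last])
        except Exception:
            partners.append(j)
    return partners
```

With the first 30 rows failing, `k = first_failing_prefix(30, lambda k: sedf.iloc[:k].spatial.to_featureset())` needs at most 30 calls, and `failing_partners(k - 1, lambda pos: sedf.iloc[pos].spatial.to_featureset())` at most another 29.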