Hey all,
I need to drop duplicate geometries within a spatially enabled DataFrame. To do this, I first convert the SHAPE column into a new column holding each geometry's WKT string:
sdf['SHAPE_WKT'] = sdf['SHAPE'].apply(lambda geom: geom.WKT if geom else None)
However, this takes a long time to run. Is there any way to limit the size of my sdf by only running this on the geometries that intersect other geometries? I know you can run intersections between different feature layers, but what about within the SHAPE column of the sdf I'm currently working with?
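One way to shrink the work, assuming exact (vertex-for-vertex) duplicates are what you are after: two identical geometries necessarily share the same extent, so you can group rows by a cheap bounding-box key first and only compute the expensive string key for rows whose box collides with another row's. The sketch below is plain Python with geometries modeled as tuples of (x, y) vertices; `envelope`, `expensive_key`, and `dedupe_candidates` are hypothetical helpers standing in for the geometry's extent and its WKT conversion on real arcgis geometries.

```python
from collections import Counter


def envelope(coords):
    """Cheap key: the bounding box (xmin, ymin, xmax, ymax) of the vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def expensive_key(coords):
    """Hypothetical stand-in for the slow WKT conversion: a string of the vertices."""
    return ";".join(f"{x} {y}" for x, y in coords)


def dedupe_candidates(geoms):
    """Return indices of rows to keep (first occurrence wins, like keep='first').

    The expensive key is only computed for geometries whose bounding box
    is shared with at least one other geometry.
    """
    boxes = [envelope(g) for g in geoms]
    counts = Counter(boxes)
    keep, seen = [], set()
    for i, (g, box) in enumerate(zip(geoms, boxes)):
        if counts[box] == 1:
            # A unique extent means this geometry cannot be an exact duplicate.
            keep.append(i)
            continue
        key = expensive_key(g)
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return keep
```

The returned positions could then be used to select rows (for example with `sdf.iloc[keep]`). Like the WKT approach, this only catches duplicates whose vertices are listed in the same order.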
For anyone curious: I can't call drop_duplicates on the SHAPE column directly because it raises the following error:
sdf.drop_duplicates(subset='SHAPE', keep='first')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
In [46]:
Line 1: sdf.drop_duplicates(subset='SHAPE', keep='first')
File C:\Users\Eric\AppData\Local\ESRI\conda\envs\assemblage_env\Lib\site-packages\pandas\util\_decorators.py, in wrapper:
Line 311: return func(*args, **kwargs)
File C:\Users\Eric\AppData\Local\ESRI\conda\envs\assemblage_env\Lib\site-packages\pandas\core\frame.py, in drop_duplicates:
Line 6125: duplicated = self.duplicated(subset, keep=keep)
File C:\Users\Eric\AppData\Local\ESRI\conda\envs\assemblage_env\Lib\site-packages\pandas\core\frame.py, in duplicated:
Line 6262: labels, shape = map(list, zip(*map(f, vals)))
File C:\Users\Eric\AppData\Local\ESRI\conda\envs\assemblage_env\Lib\site-packages\pandas\core\frame.py, in f:
Line 6235: labels, shape = algorithms.factorize(vals, size_hint=len(self))
File C:\Users\Eric\AppData\Local\ESRI\conda\envs\assemblage_env\Lib\site-packages\pandas\core\algorithms.py, in factorize:
Line 749: codes, uniques = values.factorize(na_sentinel=na_sentinel)
File C:\Users\Eric\AppData\Local\ESRI\conda\envs\assemblage_env\Lib\site-packages\pandas\core\arrays\base.py, in factorize:
Line 1028: uniques_ea = self._from_factorized(uniques, self)
TypeError: _from_factorized() takes 2 positional arguments but 3 were given
---------------------------------------------------------------------------
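The traceback suggests pandas fails while factorizing the geometry column itself (the `_from_factorized` call on the column's extension array), so the usual workaround is to deduplicate on a hashable surrogate key instead of the geometry objects. WKT is one such key; a canonical JSON string of the Esri JSON geometry is another that may be cheaper to build, though that is an assumption worth timing on your data. The sketch below is plain Python and mimics `drop_duplicates(keep='first')`; `drop_duplicate_geometries` and the row layout (dicts with a `SHAPE` entry holding an Esri-JSON-style dict) are hypothetical.

```python
import json


def drop_duplicate_geometries(rows, geom_field="SHAPE"):
    """Keep the first row for each distinct geometry, like drop_duplicates(keep='first').

    Each row is a dict; rows[i][geom_field] is an Esri-JSON-style dict
    such as {"x": 1, "y": 2}, or None for a missing geometry.
    """
    seen = set()
    kept = []
    for row in rows:
        geom = row[geom_field]
        # sort_keys=True makes the string independent of dict key order.
        # All None geometries share one key, just as pandas treats missing values as equal.
        key = None if geom is None else json.dumps(geom, sort_keys=True)
        if key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept
```

Note that a JSON key is stricter than geometric equality: `1` and `1.0`, or the same ring started at a different vertex, produce different strings.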
I believe you can identify which records intersect one another by spatially joining the dataframe to a copy of itself. In my small test case, this works and returns the expected records:
from arcgis import GIS
from arcgis.features import FeatureLayer
gis = GIS("https://machine.domain.com/portal", "user", "pass")
fl_url = "https://src.domain.com/server/rest/services/test/FeatureServer/0"
fl = FeatureLayer(fl_url, gis)
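# Pull every feature into a spatially enabled DataFrame
df = fl.query(as_df=True)
# Self-join: match each record with every record (including itself) it intersects
joined = df.spatial.join(
df.copy(),
how="left",
op="intersects",
left_tag="left",
right_tag="right"
)
# Drop each record's match with itself; the remaining rows intersect a different record
intersecting_records = joined.loc[joined.OBJECTID_left != joined.OBJECTID_right]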
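The self-join returns each intersecting pair twice, as (A, B) and (B, A), so a small post-processing step helps turn it into the set of IDs worth converting to WKT. The plain-Python sketch below assumes the join result has been reduced to (left_id, right_id) tuples; `intersecting_ids` is a hypothetical helper. It also skips unmatched rows: with `how="left"`, a record that matches nothing (for example a null geometry) comes back without a right-hand ID, and in pandas that missing value is NaN, which compares as not equal to everything, so filtering with `.notna()` as well guards against it.

```python
def intersecting_ids(pairs):
    """Collect IDs that intersect at least one *other* record, plus unique unordered pairs.

    `pairs` is a list of (left_id, right_id) tuples, e.g. taken from the
    OBJECTID_left / OBJECTID_right columns of a self-join.
    """
    ids = set()
    unique_pairs = set()
    for left, right in pairs:
        # Skip self-matches and rows the left join could not match.
        if right is None or left == right:
            continue
        ids.update((left, right))
        # Normalize (B, A) to (A, B) so each pair is counted once.
        unique_pairs.add((min(left, right), max(left, right)))
    return ids, sorted(unique_pairs)
```

The resulting `ids` could then restrict the slow step, for example by running the WKT conversion only on `sdf[sdf.OBJECTID.isin(ids)]`, since a geometry that intersects nothing else cannot be a duplicate.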
Hope this helps! This was a really good question that I'm sure many will get value from.