Spatially Enabled DataFrame append

01-23-2019 05:40 AM
OisinSlevin
New Contributor III

I've created a blank Spatially Enabled DataFrame with columns.

I'm attempting to append a row to the DataFrame, which succeeds,

but each time I append a row it overwrites the SHAPE column of all previous rows.

sdf.append(some_row, ignore_index=True)

16 Replies
by Anonymous User
Not applicable

Hi oisins 

I'm seeing the same behavior you document. I'll log a bug into the system and keep you posted about progress and any workaround if I dig one up.

JoshuaBixby
MVP Esteemed Contributor

John Yaist‌, if you can reproduce the issue, why does this code work on my machine?

>>> import pandas as pd
>>> from arcgis.features import FeatureSet
>>> 
>>> esri_json = { 
...     "geometryType":"esriGeometryPolygon",
...     "spatialReference":{"wkid":102100, "latestWkid":3857},
...     "fields":[
...         {"name":"Field1", "type":"esriFieldTypeString"}
...     ],
...     "features":[
...         {
...             "attributes":{"Field1":"Hello"},
...             "geometry":{"rings":[[[0,0],[0,10],[10,10],[10,0],[0,0]]]}
...         },{
...             "attributes":{"Field1":"World"},
...             "geometry":{"rings":[[[11,11],[11,21],[21,21],[21,11],[11,11]]]}
...         }
...     ]
... }
>>> 
>>> sdf = FeatureSet.from_dict(esri_json).sdf
>>> print(sdf.to_string())
  Field1  OBJECTID                                              SHAPE
0  Hello         1  {"rings": [[[0, 0], [0, 10], [10, 10], [10, 0]...
1  World         2  {"rings": [[[11, 11], [11, 21], [21, 21], [21,...
>>> 
>>> rec = sdf.loc[0].copy(deep=True)
>>> rec.OBJECTID = None
>>> rec.Field1 = "Bye"
>>> rec.SHAPE = {"rings":[[[0,11],[5,11],[5,16],[0,16],[0,11]]]}
>>> 
>>> sdf = sdf.append(rec, ignore_index=True)
>>> print(sdf.to_string())
  Field1 OBJECTID                                              SHAPE
0  Hello        1  {'rings': [[[0, 0], [0, 10], [10, 10], [10, 0]...
1  World        2  {'rings': [[[11, 11], [11, 21], [21, 21], [21,...
2    Bye     None  {'rings': [[[0, 11], [5, 11], [5, 16], [0, 16]...
>>>

UPDATE:  If I change the rec.SHAPE = {...} assignment above to

rec.SHAPE["rings"] = [[[0,11],[5,11],[5,16],[0,16],[0,11]]]

the outcome becomes:

>>> print(sdf.to_string())
  Field1 OBJECTID                                              SHAPE
0  Hello        1  {'rings': [[[0, 11], [5, 11], [5, 16], [0, 16]...
1  World        2  {'rings': [[[11, 11], [11, 21], [21, 21], [21,...
2    Bye     None  {'rings': [[[0, 11], [5, 11], [5, 16], [0, 16]...
>>>

 Oisin Slevin‌, regarding:

but as I append a row it overwrites the SHAPE column of all previous rows.

All previous rows or just the previous row, or maybe the first row?

In terms of using pandas.DataFrame.copy(deep=True), the documentation notes are key to understanding what is happening:  pandas.DataFrame.copy — pandas 0.23.4 documentation :

Notes

When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).

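In short, pandas' copy(deep=True) isn't quite a deep copy: the copied row is a new container, but the SHAPE dictionary inside it is still the same Python object as in the original row.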
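To make the difference concrete, here is a minimal sketch using only the standard library. Plain dicts stand in for a DataFrame row (an assumption for illustration; no pandas or arcgis is involved), with copy.copy playing the role of a copy that only duplicates references and copy.deepcopy the recursive one.

```python
import copy

# A plain dict stands in for one row; SHAPE is a nested, mutable object.
row = {"Field1": "Hello",
       "SHAPE": {"rings": [[[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]]]}}

shallow = copy.copy(row)    # new outer dict, but the SAME inner SHAPE dict
deep = copy.deepcopy(row)   # new outer dict AND new inner objects

new_rings = [[[0, 11], [5, 11], [5, 16], [0, 16], [0, 11]]]

# Mutating the nested dict through the shallow copy also changes `row`.
shallow["SHAPE"]["rings"] = new_rings
shallow_leaked = row["SHAPE"]["rings"] is new_rings

# Restore the original rings, then repeat through the deep copy.
row["SHAPE"]["rings"] = [[[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]]]
deep["SHAPE"]["rings"] = new_rings
deep_leaked = row["SHAPE"]["rings"] is new_rings

print(shallow_leaked, deep_leaked)
```

The first flag shows the in-place edit reaching the original row; the second shows that a recursive copy keeps the two rows independent.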

UPDATE 2:  It appears deepcopy used to behave differently:  pd.DataFrame.__deepcopy__ is broken · Issue #17406 · pandas-dev/pandas · GitHub 

UPDATE 3:  I think a different approach is warranted overall.  That said, I think the most straightforward workaround is to use copy.deepcopy on a dictionary of the template record since pandas.DataFrame.append — pandas 0.23.4 documentation works with "dict-like" objects.  Using the following

>>> import copy
>>> rec = copy.deepcopy(sdf.loc[0].to_dict())
>>> rec["OBJECTID"] = None
>>> rec["Field1"] = "Bye"
>>> rec["SHAPE"]["rings"] = [[[0,11],[5,11],[5,16],[0,16],[0,11]]]

results in:

  Field1 OBJECTID                                              SHAPE
0  Hello        1  {'rings': [[[0, 0], [0, 10], [10, 10], [10, 0]...
1  World        2  {'rings': [[[11, 11], [11, 21], [21, 21], [21,...
2    Bye     None  {'rings': [[[0, 11], [5, 11], [5, 16], [0, 16]...
simoxu
MVP Regular Contributor

Thank you‌ for the code, and you are spot on.

It certainly looks like a deep-copy quirk of pandas.Series.copy. (Joshua, you pointed to the pandas.DataFrame.copy documentation, but a row is actually a pandas.Series. Nevertheless, copy(deep=True) behaves the same way for both Series and DataFrame.) It is important to know, so I am repeating it here:

pandas.Series.copy — pandas 0.24.0 documentation 

When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data 

As to the question you asked Oisin Slevin ("All previous rows or just the previous row, or maybe the first row?"):

My answer is that it would be the first row and all the appended rows. Why? Because for all these records, the following expression refers to the same object in memory:

record_to_append.SHAPE['rings']

In your experiment, the second row is not affected by the other rows, as its SHAPE['rings'] refers to a different object. Oisin does not have a second initial row, so for him the SHAPE['paths'] of all the records will be the same.
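One way to check which rows share a geometry is to compare object identity rather than value. The sketch below is an illustration only: shared_geometry_groups is a hypothetical helper, and a plain list of dicts stands in for the SHAPE column.

```python
from collections import defaultdict

def shared_geometry_groups(shapes):
    """Return lists of row positions whose SHAPE is the very same object."""
    groups = defaultdict(list)
    for position, shape in enumerate(shapes):
        groups[id(shape)].append(position)  # id() identifies the object itself
    return [positions for positions in groups.values() if len(positions) > 1]

first = {"rings": [[[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]]]}
second = {"rings": [[[11, 11], [11, 21], [21, 21], [21, 11], [11, 11]]]}
appended = first  # what a reference-only copy of row 0 hands back

shapes = [first, second, appended]
aliased = shared_geometry_groups(shapes)
print(aliased)  # [[0, 2]]
```

Rows 0 and 2 are reported together, which matches the output above where the first and the appended rows change in step while the second row stays untouched.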

I think you also provided the solution in your code:

rec.SHAPE = {"rings":[[...]]}

copy.deepcopy() might work as another solution as well.

by Anonymous User
Not applicable

Hey Oisin Slevin and Joshua Bixby 

It looks like the reason I was able to reproduce this behavior was because of this line:

record_to_append.SHAPE['paths']=[new_line]

The following code reproduced the behavior:

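# gis is assumed to be an existing, authenticated arcgis.gis.GIS connection
line_itm = gis.content.search('empty_line-lyr')[0]
line_lyr = line_itm.layers[0]
line_df_empty = line_lyr.query(where='1=1', out_sr=4326, as_df=True)

# Query the populated layer into a second Spatially Enabled DataFrame
line_df_with_data = gis.content.search('line_lyr_with_data')[0].layers[0].query(where='1=1', as_df=True)

# Use its first row as the template for new records
record_template = line_df_with_data.loc[0].copy(deep=True)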

new_line = [[13.393477, 52.556251], [13.427148, 52.552371], [13.454957, 52.541487],
                   [13.448451, 52.537313], [13.441882, 52.531468]]
record_template.SHAPE['paths'] = [new_line]
record_template.purpose = 'tourism'
record_template.roadway = 'Major Thoroughfare'

new_df = line_df_empty.append(record_template, ignore_index=True)
print(new_df.to_string())
   OBJECTID             roadway  purpose  Shape__Length  SHAPE
0         1  Major Thoroughfare  tourism       0.030771  {'paths': [[[13.393477, 52.556251], [13.427148, 52.552371], [13.454957, 52.541487], [13.448451, ...

second_line = [[13.430386, 52.510264], [13.434763, 52.508539], [13.440770, 52.504777], [13.446519, 52.502895]]
record_template.SHAPE['paths'] = [second_line]
record_template.purpose = 'development'
record_template.roadway = 'Business Access'

new_df = new_df.append(record_template, ignore_index=True)
print(new_df.to_string())
   OBJECTID             roadway      purpose  Shape__Length  SHAPE
0         1  Major Thoroughfare      tourism       0.030771  {'paths': [[[13.430386, 52.510264], [13.434763, 52.508539], [13.44077, 52.504777], [13.446519, 5...
0         1     Business Access  development       0.030771  {'paths': [[[13.430386, 52.510264], [13.434763, 52.508539], [13.44077, 52.504777], [13.446519, 5...

The geometry in the record_template.SHAPE['paths'] list overwrites the list in the SHAPE column. The append operation looks like it performs a shallow copy of the record_template to the SHAPE column.
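A minimal sketch of that behaviour, assuming the append keeps a reference to the record's SHAPE dictionary rather than copying it. Here a list of dicts stands in for the DataFrame, append_row is a hypothetical stand-in for the append call, and the coordinate lists are shortened from the ones above.

```python
def append_row(rows, record):
    # Copies only the top level of the record; nested objects stay shared.
    rows.append(dict(record))

first_line = [[13.393477, 52.556251], [13.427148, 52.552371]]
second_line = [[13.430386, 52.510264], [13.434763, 52.508539]]

# Case 1: mutate the template's SHAPE dict in place between appends.
template = {"purpose": None, "SHAPE": {"paths": []}}
table = []
template["SHAPE"]["paths"] = [first_line]
append_row(table, template)
template["SHAPE"]["paths"] = [second_line]  # also rewrites row 0's geometry
append_row(table, template)
overwritten = table[0]["SHAPE"]["paths"] == [second_line]

# Case 2: rebind SHAPE to a fresh dict before each append.
template = {"purpose": None, "SHAPE": {"paths": []}}
table_fixed = []
template["SHAPE"] = {"paths": [first_line]}
append_row(table_fixed, template)
template["SHAPE"] = {"paths": [second_line]}  # row 0 keeps its own dict
append_row(table_fixed, template)
preserved = table_fixed[0]["SHAPE"]["paths"] == [first_line]

print(overwritten, preserved)
```

Assigning a new dictionary gives every appended row its own geometry object, so later edits to the template cannot reach rows that are already in the table.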

When I changed the second assignment above, record_template.SHAPE['paths'] = [second_line], to:

record_template.SHAPE = {'paths' : [second_line]}

The append worked successfully:

   OBJECTID             roadway      purpose  Shape__Length  SHAPE
0         1  Major Thoroughfare      tourism       0.030771  {'paths': [[[13.393477, 52.556251], [13.427148, 52.552371], [13.454957, 52.541487], [13.448451, ...
0         1     Business Access  development       0.030771  {'paths': [[[13.430386, 52.510264], [13.434763, 52.508539], [13.44077, 52.504777], [13.446519, 5...

Oisin Slevin  - if you change record_to_append.SHAPE['paths']=[new_line] to record_to_append.SHAPE = {'paths' : [new_line]} does that work? Credit Joshua Bixby if so...his code referencing the SHAPE as a dictionary instead of referencing the `paths` value was the key.

OisinSlevin
New Contributor III

Hi, sorry for the late reply.
I've attempted to use your solution, which appears to work while modifying the records;

the original issue of overwriting no longer occurs.

  record_to_append.SHAPE = {'paths' : [new_line]}

record_to_append.SHAPE

{'paths': [[[322058.68479999993, 226573.08400000073], [322061.67789999954, 226573.96309999935], [322063.7945999997, 226569.7296999991], [322098.81620000023, 226533.6378000006], [322066.5378999999, 226500.82509999909], [322049.0729, 226482.97609999962]]], 'spatialReference': {'wkid': 29903, 'latestWkid': 29903}}

but attempting to use record_to_append.SHAPE as a geometry then throws an error, because the value is now a plain dict rather than a geometry object:

record_to_append.SHAPE.buffer(0.5)
AttributeError: 'dict' object has no attribute 'buffer'

The same issue occurs when attempting to export the SDF to a feature class.

Is there some way of creating a "geometry object" in the ArcGIS API that can then be used to set

record_to_append.SHAPE = "geometry object"

 

JoshuaBixby
MVP Esteemed Contributor
OisinSlevin
New Contributor III

The answer is to set a Geometry object, rather than a plain dict, when appending to the SDF (geometry here is the arcgis.geometry module):
record_to_append.SHAPE = geometry.Geometry({'paths': [new_line], 'spatialReference': {'wkid': 29903, 'latestWkid': 29903}})

This fixed the issue.
