I've created a blank Spatially Enabled DataFrame with columns.
I'm attempting to append a row to the DataFrame, which I'm doing successfully,
but as I append a row it overwrites the SHAPE column of all previous rows.
sdf.append(some_row, ignore_index=True)
Hi oisins
I'm seeing the same behavior you document. I'll log a bug into the system and keep you posted about progress and any workaround if I dig one up.
John Yaist, if you can reproduce the issue, why does this code work on my machine?
>>> import pandas as pd
>>> from arcgis.features import FeatureSet
>>>
>>> esri_json = {
... "geometryType":"esriGeometryPolygon",
... "spatialReference":{"wkid":102100, "latestWkid":3857},
... "fields":[
... {"name":"Field1", "type":"esriFieldTypeString"}
... ],
... "features":[
... {
... "attributes":{"Field1":"Hello"},
... "geometry":{"rings":[[[0,0],[0,10],[10,10],[10,0],[0,0]]]}
... },{
... "attributes":{"Field1":"World"},
... "geometry":{"rings":[[[11,11],[11,21],[21,21],[21,11],[11,11]]]}
... }
... ]
... }
>>>
>>> sdf = FeatureSet.from_dict(esri_json).sdf
>>> print(sdf.to_string())
Field1 OBJECTID SHAPE
0 Hello 1 {"rings": [[[0, 0], [0, 10], [10, 10], [10, 0]...
1 World 2 {"rings": [[[11, 11], [11, 21], [21, 21], [21,...
>>>
>>> rec = sdf.loc[0].copy(deep=True)
>>> rec.OBJECTID = None
>>> rec.Field1 = "Bye"
>>> rec.SHAPE = {"rings":[[[0,11],[5,11],[5,16],[0,16],[0,11]]]}
>>>
>>> sdf = sdf.append(rec, ignore_index=True)
>>> print(sdf.to_string())
Field1 OBJECTID SHAPE
0 Hello 1 {'rings': [[[0, 0], [0, 10], [10, 10], [10, 0]...
1 World 2 {'rings': [[[11, 11], [11, 21], [21, 21], [21,...
2 Bye None {'rings': [[[0, 11], [5, 11], [5, 16], [0, 16]...
>>>
UPDATE: If I change the rec.SHAPE = {"rings": ...} assignment above to
rec.SHAPE["rings"] = [[[0,11],[5,11],[5,16],[0,16],[0,11]]]
the outcome becomes:
>>> print(sdf.to_string())
Field1 OBJECTID SHAPE
0 Hello 1 {'rings': [[[0, 11], [5, 11], [5, 16], [0, 16]...
1 World 2 {'rings': [[[11, 11], [11, 21], [21, 21], [21,...
2 Bye None {'rings': [[[0, 11], [5, 11], [5, 16], [0, 16]...
>>>
Oisin Slevin, regarding:
but as I append a row it overwrites the SHAPE column of all previous rows.
All previous rows or just the previous row, or maybe the first row?
In terms of using pandas.DataFrame.copy(deep=True), the documentation notes are key to understanding what is happening: pandas.DataFrame.copy — pandas 0.23.4 documentation :
Notes
When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data (see examples below).
In short, pandas' copy(deep=True) isn't quite a deep copy: the nested geometry dictionary in SHAPE is shared between the original row and the copy.
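A minimal sketch of that behavior, using plain pandas only (no arcgis); the row contents and variable names here are made up for illustration. It compares Series.copy(deep=True) with copy.deepcopy applied to a dictionary of the row:

```python
import copy

import pandas as pd

# Hypothetical row: an object-dtype Series whose SHAPE value is a nested dict
row = pd.Series({
    "Field1": "Hello",
    "SHAPE": {"rings": [[[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]]]},
})

# pandas "deep" copy: the Series data is copied, but the dict inside is not
pandas_copy = row.copy(deep=True)
shares_shape = pandas_copy["SHAPE"] is row["SHAPE"]

# Standard-library deep copy of a dict of the row: nested objects are copied too
stdlib_copy = copy.deepcopy(row.to_dict())
independent_shape = stdlib_copy["SHAPE"] is not row["SHAPE"]

# Editing the shared dict in place through the pandas copy changes the original row
pandas_copy["SHAPE"]["rings"] = [[[0, 11], [5, 11], [5, 16], [0, 16], [0, 11]]]
original_changed = row["SHAPE"]["rings"][0][0] == [0, 11]

print(shares_shape, independent_shape, original_changed)
```

If the documentation note holds for the installed pandas version, all three flags should come out true: the pandas copy shares the SHAPE dict, while the dictionary copy does not.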
UPDATE 2: It appears deepcopy used to behave differently: pd.DataFrame.__deepcopy__ is broken · Issue #17406 · pandas-dev/pandas · GitHub
UPDATE 3: I think a different approach is warranted overall. That said, I think the most straightforward workaround is to use copy.deepcopy on a dictionary of the template record since pandas.DataFrame.append — pandas 0.23.4 documentation works with "dict-like" objects. Using the following
>>> import copy
>>> rec = copy.deepcopy(sdf.loc[0].to_dict())
>>> rec["OBJECTID"] = None
>>> rec["Field1"] = "Bye"
>>> rec["SHAPE"]["rings"] = [[[0,11],[5,11],[5,16],[0,16],[0,11]]]
results in:
Field1 OBJECTID SHAPE
0 Hello 1 {'rings': [[[0, 0], [0, 10], [10, 10], [10, 0]...
1 World 2 {'rings': [[[11, 11], [11, 21], [21, 21], [21,...
2 Bye None {'rings': [[[0, 11], [5, 11], [5, 16], [0, 16]...
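The same workaround can be wrapped in a small helper. This is only a sketch against a plain pandas DataFrame, not something tested on a Spatially Enabled DataFrame; append_record is a hypothetical name, and it uses pandas.concat rather than DataFrame.append because append was removed in pandas 2.0:

```python
import copy

import pandas as pd


def append_record(df, template, **changes):
    """Return a new DataFrame with one extra row built from a deep copy of `template`.

    `template` is a row (pandas.Series); `changes` replaces top-level values.
    Hypothetical helper for illustration only.
    """
    rec = copy.deepcopy(template.to_dict())  # nested dicts and lists are copied too
    rec.update(changes)
    return pd.concat([df, pd.DataFrame([rec])], ignore_index=True)


# Stand-in data: SHAPE is a plain dict here, not an arcgis Geometry
df = pd.DataFrame([
    {"Field1": "Hello", "OBJECTID": 1,
     "SHAPE": {"rings": [[[0, 0], [0, 10], [10, 10], [10, 0], [0, 0]]]}},
])

out = append_record(df, df.loc[0], Field1="Bye", OBJECTID=2)

# Editing the new row's geometry in place no longer touches the template row
out.loc[1, "SHAPE"]["rings"] = [[[0, 11], [5, 11], [5, 16], [0, 16], [0, 11]]]
print(out.to_string())
```

Because the record is deep-copied before it is edited, the in-place assignment to "rings" only affects the appended row.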
Thank you for the code, and you are spot on.
It certainly looks like a deep-copy quirk of pandas.Series.copy. (Joshua, you pointed to the pandas.DataFrame.copy documentation, but a row is actually a pandas.Series. Nevertheless, copy(deep=True) behaves the same way for both Series and DataFrame.) It is important to know, so I am repeating it here:
pandas.Series.copy — pandas 0.24.0 documentation
When deep=True, data is copied but actual Python objects will not be copied recursively, only the reference to the object. This is in contrast to copy.deepcopy in the Standard Library, which recursively copies object data
As to the question you asked Oisin Slevin: All previous rows or just the previous row, or maybe the first row?
My answer is that it would be the first row and all the appended rows. Why? Because for all of these records, the following key refers to the same object in memory:
record_to_append.SHAPE['rings']
In your experiment, the second row is not affected by the other rows, because its SHAPE['rings'] refers to a different object. Oisin does not have a second initial row, so for him SHAPE['paths'] will be the same for all the records.
I think you also provided the solution in your code:
rec.SHAPE = {"rings":[[...]]}
copy.deepcopy() might work as another solution as well.
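To make the "same memory address" point concrete, here is a small standard-library sketch. It does not use pandas or arcgis; shallow_row is a hypothetical stand-in that copies only the top level of a record, which is how the append appears to behave in this thread:

```python
def shallow_row(record):
    """Copy only the top level of a record; nested objects stay shared."""
    return dict(record)


first = [[0.0, 0.0], [1.0, 1.0]]
second = [[5.0, 5.0], [6.0, 6.0]]

# Case 1: edit the nested 'paths' value in place before each append
template = {"purpose": "tourism", "SHAPE": {"paths": []}}
rows_in_place = []
template["SHAPE"]["paths"] = [first]
rows_in_place.append(shallow_row(template))
template["SHAPE"]["paths"] = [second]
rows_in_place.append(shallow_row(template))

# Both rows hold the very same SHAPE dict, so the first geometry is overwritten
same_object = rows_in_place[0]["SHAPE"] is rows_in_place[1]["SHAPE"]

# Case 2: assign a brand-new dict to SHAPE before each append
template = {"purpose": "tourism", "SHAPE": None}
rows_rebound = []
template["SHAPE"] = {"paths": [first]}
rows_rebound.append(shallow_row(template))
template["SHAPE"] = {"paths": [second]}
rows_rebound.append(shallow_row(template))

# Each row now keeps its own SHAPE dict
distinct_objects = rows_rebound[0]["SHAPE"] is not rows_rebound[1]["SHAPE"]

print(same_object, distinct_objects)
```

Assigning a new dictionary to SHAPE gives each row its own object, while assigning to SHAPE['paths'] keeps editing the one object every row already references.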
Hey Oisin Slevin and Joshua Bixby
It looks like the reason I was able to reproduce this behavior was because of this line:
record_to_append.SHAPE['paths']=[new_line]
The following code reproduced the behavior:
# Assumes `gis` is an existing, authenticated arcgis.gis.GIS connection
line_itm = gis.content.search('empty_line-lyr')[0]
line_lyr = line_itm.layers[0]
line_df_empty = line_lyr.query(where='1=1', out_sr=4326, as_df=True)
line_df_with_data = gis.content.search('line_lyr_with_data')[0].layers[0].query(where='1=1', as_df=True)
# copy(deep=True) does not recursively copy the geometry stored in SHAPE
record_template = line_df_with_data.loc[0].copy(deep=True)
new_line = [[13.393477, 52.556251], [13.427148, 52.552371], [13.454957, 52.541487],
[13.448451, 52.537313], [13.441882, 52.531468]]
record_template.SHAPE['paths'] = [new_line]
record_template.purpose = 'tourism'
record_template.roadway = 'Major Thoroughfare'
new_df = line_df_empty.append(record_template, ignore_index=True)
print(new_df.to_string())
OBJECTID roadway purpose Shape__Length SHAPE
0 1 Major Thoroughfare tourism 0.030771 {'paths': [[[13.393477, 52.556251], [13.427148, 52.552371], [13.454957, 52.541487], [13.448451, ...
second_line = [[13.430386, 52.510264], [13.434763, 52.508539], [13.440770, 52.504777], [13.446519, 52.502895]]
record_template.SHAPE['paths'] = [second_line]
record_template.purpose = 'development'
record_template.roadway = 'Business Access'
new_df = new_df.append(record_template, ignore_index=True)
print(new_df.to_string())
OBJECTID roadway purpose Shape__Length SHAPE
0 1 Major Thoroughfare tourism 0.030771 {'paths': [[[13.430386, 52.510264], [13.434763, 52.508539], [13.44077, 52.504777], [13.446519, 5...
0 1 Business Access development 0.030771 {'paths': [[[13.430386, 52.510264], [13.434763, 52.508539], [13.44077, 52.504777], [13.446519, 5...
The geometry in the record_template.SHAPE['paths'] list overwrites the list in the SHAPE column. The append operation looks like it performs a shallow copy of the record_template to the SHAPE column.
When I changed the record_template.SHAPE['paths'] = [second_line] assignment above to:
record_template.SHAPE = {'paths' : [second_line]}
The append worked successfully:
OBJECTID roadway purpose Shape__Length SHAPE
0 1 Major Thoroughfare tourism 0.030771 {'paths':[[[13.393477, 52.556251], [13.427148, 52.552371], [13.454957, 52.541487], [13.448451, ...
0 1 Business Access development 0.030771 {'paths': [[[13.430386, 52.510264], [13.434763, 52.508539], [13.44077, 52.504777], [13.446519, 5...
Oisin Slevin - if you change record_to_append.SHAPE['paths']=[new_line] to record_to_append.SHAPE = {'paths' : [new_line]} does that work? Credit Joshua Bixby if so...his code referencing the SHAPE as a dictionary instead of referencing the `paths` value was the key.
Hi, sorry for the late reply.
I've attempted to use your solution, which appears to work while modifying the records;
the original issue of overwriting no longer occurs.
record_to_append.SHAPE = {'paths' : [new_line]}
record_to_append.SHAPE
{'paths': [[[322058.68479999993, 226573.08400000073], [322061.67789999954, 226573.96309999935], [322063.7945999997, 226569.7296999991], [322098.81620000023, 226533.6378000006], [322066.5378999999, 226500.82509999909], [322049.0729, 226482.97609999962]]], 'spatialReference': {'wkid': 29903, 'latestWkid': 29903}}
but attempting to use record_to_append.SHAPE as a geometry then raises an error, because it is now a plain dict:
record_to_append.SHAPE.buffer(0.5)
AttributeError: 'dict' object has no attribute 'buffer'
The same issue occurs when attempting to export the SDF to a feature class.
Is there some way of creating a "geometry object" in the ArcGIS API for Python that can then be used to set
record_to_append.SHAPE = "geometry object"
Yes, the Geometry class of arcgis.geometry module — arcgis 1.5.3 documentation
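The answer is to wrap the dictionary in a Geometry object when setting the geometry before appending to the SDF (with geometry imported via from arcgis import geometry):
record_to_append.SHAPE = geometry.Geometry({'paths': [new_line], 'spatialReference': {'wkid': 29903, 'latestWkid': 29903}})
This fixed the issue.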