
Duplicate Records Returned By Python API since version 2.3

10-15-2024 06:30 AM
Justin_Greco
Frequent Contributor

I have a premium support case open for this issue, but wanted to see if anyone else in the developer community has experienced duplicate features being returned when querying feature layers. I have only been able to replicate this issue with services published to ArcGIS Enterprise from an Oracle database; I have not been able to reproduce it with hosted feature layers or with services published from file geodatabases or SQL Server. The issue is also not reproducible on versions prior to 2.3.

The issue is that when a layer coming from Oracle is queried using layer.query(), the result contains duplicate features with duplicate OBJECTIDs. It returns the correct number of records, but after dropping the duplicates there are fewer features than there should be, so it is also not returning every record. I have confirmed that the duplicates are there whether you return the results as a feature set or as a dataframe.

Instead of returning all records in one query, I tried using the API with return_all_records set to False and paging through the requests with result_record_count and result_offset to get the data in batches of the layer's max record count. Doing so does not result in duplicate records, and there are no missing records. This indicates that the issue is not with the dataset itself, but with the Python API.

A major issue with there being duplicate OBJECTIDs is that when you convert a dataframe to a feature set using df.spatial.to_featureset() and there are duplicate OBJECTIDs, the OBJECTIDs in the resulting feature set are reset to start at 1 (this might be another bug). This has caused some records to be updated incorrectly, or caused the update to fail because an invalid OBJECTID was sent.

For now I have asked staff to page through the requests rather than get everything in a single query. However, this does require more lines of code. If they do use a single query, I have asked them to drop duplicates before converting to a feature set, so the OBJECTIDs are maintained.

The sample below queries for all records and returns just the OBJECTID with no geometry. The value_counts of the OBJECTID field of the dataframe shows that there are duplicate OBJECTIDs.

fs = layer.query(out_fields=['OBJECTID'], return_geometry=False)
fs.sdf.OBJECTID.value_counts()
#OBJECTID
#528193    2
#528306    2
#528313    2
#528312    2
#528311    2
#         ..
#525523    1
#525522    1
#525521    1
#525520    1
#528097    1
#Name: count, Length: 2198, dtype: Int64

 

10 Replies
Clubdebambos
MVP Regular Contributor

Sounds like gremlins in the machine 😅

Is upgrading to version 2.4 in a non-production environment and testing if the issue still persists an option? 

~ learn.finaldraftmapping.com
Justin_Greco
Frequent Contributor

I did install 2.4 in my local environment and it is still an issue. I've also tested on 2.2.0.1 and 1.8.2, where it's not an issue.

 

KyleGallagher16
Regular Contributor

I am getting this same result using 2.3 and a feature service published from SQL Server. Feature count output is correct, but two duplicate OBJECTID/GLOBALIDs exist (5110 & 5111 below) and two features are not being output.

I can see the two missing features back in the Pro attribute table but not when querying the feature service with API code.

Any help on this is greatly appreciated!

KyleGallagher16_0-1730926750229.png

 

Justin_Greco
Frequent Contributor

So I had assumed it was just Oracle, since I couldn't replicate it with services pointed to SQL Server. The only workaround I have found is to set return_all_records to False on the query request and page through all the data manually, 1000 records (or whatever the max record count is set to) at a time, storing the features returned in a list after each request, and then creating a feature set from all the features in the list.
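A minimal sketch of that paging workaround, for anyone who wants a starting point. The helper name query_in_pages and the page_size default are my own, not part of the API; the only thing it assumes about layer is that query() accepts return_all_records, result_offset and result_record_count and returns an object with a .features list, as a FeatureLayer does.

```python
def query_in_pages(layer, page_size=1000, **query_kwargs):
    """Collect all features from `layer` one page at a time.

    Hypothetical helper: `layer` is assumed to behave like
    arcgis.features.FeatureLayer (query() returning an object
    with a .features list).
    """
    features = []
    offset = 0
    while True:
        page = layer.query(
            return_all_records=False,       # do not let the API page for us
            result_offset=offset,           # skip records already fetched
            result_record_count=page_size,  # ask for at most one page
            **query_kwargs,
        )
        if not page.features:
            break  # an empty page means we are past the last record
        features.extend(page.features)
        # Advance by what was actually returned, in case the service
        # caps pages below page_size.
        offset += len(page.features)
    return features
```

Usage would look something like features = query_in_pages(layer, out_fields=['OBJECTID'], return_geometry=False, order_by_fields='OBJECTID'), followed by arcgis.features.FeatureSet(features) to build the feature set. Passing order_by_fields is my own precaution to keep the page order stable between requests; I have not confirmed it is required.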

I have submitted this to premium support and it was marked as a bug, but is still under review.

Justin_Greco
Frequent Contributor

And a warning: from this I learned that if you convert a dataframe with duplicate OBJECTIDs to a feature set, the OBJECTIDs are all reset to start at 1, which could really mess up your data if you send the feature set as an update. Luckily I was dropping duplicates in my script before converting to a feature set.
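For reference, a rough sketch of the kind of guard I mean, using plain pandas. The function name drop_duplicate_objectids is just illustrative, and it assumes the ID column is called OBJECTID. Note that this only protects the OBJECTIDs from being reset; it does not bring back the records that were missing from the query result.

```python
import pandas as pd

def drop_duplicate_objectids(df, id_field="OBJECTID"):
    """Return (deduplicated dataframe, sorted list of IDs that were duplicated).

    Hypothetical helper: keeps the first row seen for each ID.
    """
    # Rows whose ID appears more than once (all occurrences, not just repeats)
    dupes = df[df.duplicated(subset=id_field, keep=False)]
    duplicated_ids = sorted(dupes[id_field].unique().tolist())
    # Keep the first occurrence of each ID and rebuild a clean index
    deduped = df.drop_duplicates(subset=id_field, keep="first").reset_index(drop=True)
    return deduped, duplicated_ids
```

In a script this would be called on the queried dataframe before df.spatial.to_featureset(), and a non-empty duplicated_ids list is a sign the query result should not be trusted as complete.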

KyleGallagher16
Regular Contributor

I just had a colleague run the exact same code against the exact same feature services using Pro 3.1 and API 2.1, and it produced the desired results shown below.

I have two feature services with the exact same data; one is edited and the other only receives updates based on GLOBALID, lastupdate, etc. For the insert process, I generate a list and count of GIDs on the edit side (11341 below) and a list and count of GIDs on the non-edit side (11339). If a GID exists on the edit side but not on the non-edit side, the GID gets inserted using edit_features(adds). I would expect the two GIDs shown below to be inserted into the non-edit side.
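The comparison itself boils down to something like the sketch below. This is not my production script; compare_gids and its argument names are made up for illustration. The duplicate check is there so that a query result containing repeated GIDs gets flagged instead of silently producing a wrong insert list.

```python
from collections import Counter

def compare_gids(edit_gids, target_gids):
    """Return (GIDs to insert, GIDs that appear more than once on the edit side).

    Hypothetical helper: both arguments are plain lists of GlobalID strings.
    """
    counts = Counter(edit_gids)
    # Any GID seen more than once means the query result is suspect
    duplicated = sorted(g for g, n in counts.items() if n > 1)
    # GIDs present on the edit side but missing from the non-edit side
    to_insert = sorted(set(edit_gids) - set(target_gids))
    return to_insert, duplicated
```

If duplicated comes back non-empty, the edit-side list is missing as many distinct GIDs as it has repeats, so the to_insert list cannot be relied on.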

Output using Pro 3.1 and API 2.1:

KyleGallagher16_0-1730993645196.png

However, after recently updating to Pro 3.3 and API 2.3, the script did not output these two GIDs for insertion. Instead, the list included two specific duplicate GIDs and left out the two GIDs that should have been inserted.

Output using Pro 3.3 and API 2.3:

KyleGallagher16_1-1730994031273.png

KyleGallagher16_2-1730994147849.png

KyleGallagher16_3-1730994484946.png

We also have this script task scheduled on a machine with Pro 3.2.3 and API 2.2.0.1 and are encountering some odd behavior with massive amounts of inserts and deletes that shouldn't be occurring.

NathanWarmerdam2
New Contributor

Having the exact same issue.  Will be logging a case.

benedictmueller
New Contributor

Hey community.

Are there any updates on that one?

We're running into the same issue right now in a pretty big project. Could anyone provide a workaround? We also tested GeoAccessor.from_layer() and query(). Both return duplicate entries while other entries are missing. Every edited or added feature is duplicated and seems to replace random other features once the result is turned into a dataframe.

Thanks for your help.

Justin_Greco
Frequent Contributor

It was logged as BUG-000171499 and fixed at 2.4.1.  I haven't upgraded to 2.4.1 to test yet.
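If it helps anyone, here is a small sketch for flagging environments that fall in the range reported in this thread (2.3 up to, but not including, 2.4.1). The range comes only from what has been posted here, not from the bug's official description, and the function names are my own.

```python
def parse_version(version):
    """Turn a dotted version string such as '2.2.0.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())

def in_reported_range(version):
    """True if `version` is at least 2.3 and below 2.4.1.

    Assumption: the affected range is exactly what this thread reports.
    """
    return (2, 3) <= parse_version(version) < (2, 4, 1)
```

In a script this could be called as in_reported_range(arcgis.__version__) to decide whether to fall back to paging through the requests manually.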
