Updating an attribute table of a Feature Layer using Python Notebook and elements from 2 Surveys

SamanthaAPCC · ‎05-03-2024

Let me preface by saying that my experience with python is very limited and I am using elements from 2 codes to try to make this work. I am using AGOL Python Notebook to write to a hosted feature layer using attributes from two Survey123 surveys. Eventually I will need to schedule this code to run daily. The code seems to be working and is printing my final message, but it is not writing to the attribute table of my cyanoponds layer. Am I not using the correct python language? I am very novice at python but this is our only option for the type of updates we need. I am taking data from a survey and based on a unique ID (CCC_GIS_ID) I am looking for the most recent sample as well as the one with the highest cyano_status ranking. It also appears to be skipping CCC_GIS_IDs when there is more than 1 sample at the pond location, which means I'm not incorporating lines 46-57 correctly. I will also need to add a component that changes the status to "No Updated Information" if the most recent sample is older than 30 days, which I haven't quite figured out how to add in yet.

Also, when this code is running daily it will need to overwrite the table with the new information.

I took elements from this article in order to come up with the basis of this code. The rest I wrote or used from a code that was handed down to me for this project back when it used to be run on a local machine. Any help is greatly appreciated!!

from arcgis.gis import GIS
from arcgis.features import FeatureLayer
import pandas as pd
from arcgis.features import FeatureCollection
from datetime import datetime, timedelta

gis = GIS("home")

def update_CyanoData():
    
    one_day = datetime.today() - timedelta(days=1)
    string_day = one_day.strftime('%Y-%m-%d')
    where_query = f"fluorometry_date >= DATE '{string_day}'"
    where_querymf = f"microscopy_date >= DATE '{string_day}'"
    days_valid = 30 #How do I incorporate this so that the Cyano_Status field changes to 'No Updated Information' if all samples for that
    # pond are older than 30 days?
    
    cyanoponds = gis.content.get('a13f1b2e80ff4580a27620fc9d47f757')
    cyanoponds_lyr = cyanoponds.layers[0]
    cyanoponds_fset = cyanoponds_lyr.query()                              
    cyanoponds_features = cyanoponds_fset.features
    cyanoponds_sdf = cyanoponds_fset.sdf
       
    fluorsamples = gis.content.get('db342bf34b1742c0abb96d6ebff2f396') #Public facing View of the Fluorometry survey, final_review = yes
    fluorsamples_lyr = fluorsamples.layers[0]
    fluorsamples_fset = fluorsamples_lyr.query(where=where_query) 
    fluorsamples_features = fluorsamples_fset.features
    fluorsamples_sdf = fluorsamples_fset.sdf 
    
    #I will maybe need to use this survey to populate dominant genus, scum notes & pond temperatures if I can't do that with arcade and UniqueIDs in pop-ups
    microfieldsamples = gis.content.get('dc204479b5bf4f34a2efed96d0b2a340') #View of Joined Microscopy & Field samples
    microfieldsamples_lyr = microfieldsamples.layers[0]
    microfieldsamples_fset = microfieldsamples_lyr.query(where=where_querymf)
    microfieldsamples_features = microfieldsamples_fset.features
    microfieldsamples_sdf = microfieldsamples_fset.sdf    
    
    df = fluorsamples_sdf.sort_values(by=['fluorometry_date'], ascending=False)
    # Add cyano status rating to enable easy selection of the worst status for each pond later. This is relevant for cases where there
    # is more than one sampling location at a single pond. There are only a few ponds like this, but it won't cause any issues if
    # we add the rating for all ponds.
    df['cyano_rate'] = 0
    df.loc[df.cyano_risk_tier=="Acceptable", "cyano_rate"] = 1
    df.loc[df.cyano_risk_tier=="Potential for Concern", "cyano_rate"] = 2
    df.loc[df.cyano_risk_tier=="Use Restriction Warranted", "cyano_rate"] = 3
    
    # SELECT ONE MOST RECENT RECORD FOR EACH POND. First, create a function (get_maxdate) to get the most recent record from a group.
    def get_maxdate(group): return group[group.fluorometry_date==group.fluorometry_date.max()]
    # Group by pond ID and apply get_maxdate to get most recent record(s) for each pond.
    recent_pre = df.groupby("CCC_GIS_ID").apply(get_maxdate).reset_index(drop=True)
    # For ponds with more than one sample location, we'll assign them the worst worst status found at the pond that day.
    # First, create a function (get_worststatus) that gets the worst cyano status (based on cyano_rate) from a group.
    def get_worststatus(group): return group[group.cyano_rate==group.cyano_rate.max()]
    # Then, group by pond ID and apply get_worststatus function.
    recent_pre = df.groupby("CCC_GIS_ID").apply(get_worststatus).reset_index(drop=True)
    # Among the ponds with multiple sample locations, if a pond has the same cyano status at more than one location, it will have
    # subsetted more than one record for the pond. This code simply takes the first listed one.
    recent_pre.drop_duplicates(subset = ['CCC_GIS_ID'], keep = 'first', inplace = True)
    
    df["date_str"] = pd.to_datetime(df['fluorometry_date']).dt.strftime('%m/%d/%Y')
    
    #### FIND THE SECOND MOST RECENT RECORD FOR EACH POND. Create a new data frame that excludes the most recent records for 
    # each pond. Apply the above code to this subset, thereby finding the SECOND most recent record for each pond and collecting
    # them in a data frame (secondmostrecent).
    excludingrecent= df[~df.sampleID.isin(recent_pre.sampleID)]
    secmostrecent  = excludingrecent.groupby("CCC_GIS_ID").apply(get_maxdate).reset_index(drop=True)
    if secmostrecent.empty:
        secmostrecent=excludingrecent
    secmostrecent  = secmostrecent.groupby("CCC_GIS_ID").apply(get_worststatus).reset_index(drop=True)
    if secmostrecent.empty:
        secmostrecent=excludingrecent
    secmostrecent.drop_duplicates(subset = ['CCC_GIS_ID'], keep = 'first', inplace = True)
    # Create a field that lists the date and cyano_status from the second most recent record
    secmostrecent["secprior_sample"] = secmostrecent["date_str"] + ": " + secmostrecent["cyano_risk_tier"]
    secmostrecent = secmostrecent.loc[:,["CCC_GIS_ID","secprior_sample","sampleID"]]
    secmostrecent.rename(columns={"sampleID":"secsampleID"},inplace=True)
    
    #### FIND THE THIRD MOST RECENT RECORD FOR EACH POND. Create a new data frame that excludes the most recent and 2nd most recent
    # records for each pond. Apply the above code to this subset, thereby finding the THIRD most recent record for each pond and
    # collecting them in a data frame (thirdmost recent).
    excluding2recent = excludingrecent[~excludingrecent.sampleID.isin(secmostrecent.secsampleID)]
    thirdmostrecent  = excluding2recent.groupby("CCC_GIS_ID").apply(get_maxdate).reset_index(drop=True)
    if thirdmostrecent.empty:
        thirdmostrecent=excluding2recent
    thirdmostrecent  = thirdmostrecent.groupby("CCC_GIS_ID").apply(get_worststatus).reset_index(drop=True)
    if thirdmostrecent.empty:
        thirdmostrecent=excluding2recent
    thirdmostrecent.drop_duplicates(subset = ['CCC_GIS_ID'], keep = 'first', inplace = True)
    # Create a field that lists the date and cyano_status from the third most recent record
    thirdmostrecent["thirdprior_sample"] = thirdmostrecent["date_str"] + ": " + thirdmostrecent["cyano_risk_tier"]
    thirdmostrecent = thirdmostrecent.loc[:,["CCC_GIS_ID","thirdprior_sample","sampleID"]]
    thirdmostrecent.rename(columns={"sampleID":"thirdsampleID"},inplace=True)
    
    #### CREATE PRIOR ACTIVITY FIELD
    # Merge data frames recent_pre, secmostrecent, and thirdmostrecent together into "recent." Create field called prior_activity that
    # lists sample dates and statuses from the second and third most recent sampling dates for a pond.
    recent = pd.merge(recent_pre,secmostrecent,on="CCC_GIS_ID",how="outer"); recent = pd.merge(recent,thirdmostrecent,on="CCC_GIS_ID",how="outer")
    recent["prior_activity"] = recent["secprior_sample"] + ", " + recent["thirdprior_sample"].fillna("")
    recent["prior_activity"] = recent["prior_activity"].str.strip().str.rstrip(',') # <-- remove hanging comma and space at end, in cases where there is only 2nd most recent
    recent = recent.sort_values(by="fluorometry_date", ascending=False)
        
    overlap_rows = pd.merge(left = cyanoponds_sdf, right = recent, how = 'inner', on='CCC_GIS_ID' )
    cyanopond_features = cyanoponds_fset.features
    fluorsamples_updates = fluorsamples_fset.features
    fluorsamples_updates.reverse()
     
    def update(cyanopond, fluorometry):
        for CCC_GIS_ID in overlap_rows['CCC_GIS_ID']:
            try:
                cyanopond_feature = [f for f in cyanopond_features if f.attributes['CCC_GIS_ID']==CCC_GIS_ID][0]
                fluorsample_feature = [f for f in fluorsamples_features if f.attributes['CCC_GIS_ID']==CCC_GIS_ID][0]
                cyanopond_feature.attributes['Sample_Date'] = fluorsample_feature.attributes['fluorometry_date']
                cyanopond_feature.attributes['Cyano_Status']= fluorsample_feature.attributes['cyano_risk_tier']
                #cyanopond_feature.attributes['Recent_Activity']= I need this to populate with the 'prior_activity' column that we wrote to the 'recent' data frame 
                cyanopond_feature.attributes['Town_action']= fluorsample_feature.attributes['town_actions']
                cyanopond_feature.attrbutes['Notes']=fluorsample_feature.attributes['town_action_notes']
                #cyanopond_feature.attributes['Dominance']= eventually will need these three fields populated by micrcofieldsamples OR use UniqueID to populate in the pop-up
                #cyanopond_feature.attributes['Scum']=
                #cyanopond_feature.attirubtes['Water_Temp']= 
                cyanoponds_lyr.edit_features(updates=[cyanopond_feature])
                print(f"Updated {cyanopond_feature.attributes['CCC_GIS_ID']} cyano status to {cyanopond_feature.attributes['Cyano_Status']}",flush=True)
            except Exception as e:
                # Print exception details for debugging.
                print(f"Error updating feature with CCC_GIS_ID {CCC_GIS_ID}: {e}")
    update(cyanoponds_features, fluorsamples_updates)

update_CyanoData()

SamanthaAPCC · ‎05-14-2024

Thanks Joseph! I tried this but now I'm getting this error:

Error updating feature with CCC_GIS_ID OR-176: Circular reference detected
Error updating feature with CCC_GIS_ID OR-153: Circular reference detected
Error updating feature with CCC_GIS_ID BA-1171: Circular reference detected

I'm not sure if it has to do with cyanopond_features vs cyanopond_feature vs cyanoponds_features. In your article I noticed that the def update used a different name ('basins') that didn't appear to be used anywhere else. And you use catch_basin_features and catch_basins_features, so I tried to mimic this in my code. Just an idea, I'm not sure exaclty what that error means.

JosephRhodes2 · ‎05-14-2024

Hi Samantha, can you remove the try-except so it just errors out with a traceback, so we can see what line it's tripping up on?

Regarding the other post, I noticed that error. I'm reaching to out Esri to ask for permission to edit it (seems like you can't edit old posts).

SamanthaAPCC · ‎05-14-2024

Oh okay, good to know about that post!

Here is the full error:

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_19/2573390661.py in <cell line: 125>()
    123     update(cyanoponds_features, overlap_rows, cyanoponds_lyr)
    124 
--> 125 update_CyanoData()

/tmp/ipykernel_19/2573390661.py in update_CyanoData()
    121             # Print exception details for debugging.
    122             print(f"Error updating feature with CCC_GIS_ID {CCC_GIS_ID}: {e}")
--> 123     update(cyanoponds_features, overlap_rows, cyanoponds_lyr)
    124 
    125 update_CyanoData()

/tmp/ipykernel_19/2573390661.py in update(cyanoponds_features, overlap_rows, cyanoponds_lyr)
    114             #cyanopond_feature.attributes['Scum']=
    115             #cyanopond_feature.attributes['Water_Temp']=
--> 116             resp = cyanoponds_lyr.edit_features(updates=[cyanopond_feature])
    117             print (f"Attempted to update Cyano_Status for {CCC_GIS_ID} to {row['cyano_risk_tier']}. Response: {resp}")
    118             #print(f"Updated {cyanopond_feature.attributes['CCC_GIS_ID']} cyano status to {cyanopond_feature.attributes['Cyano_Status']}",flush=True)

/opt/conda/lib/python3.9/site-packages/arcgis/features/layer.py in edit_features(self, adds, updates, deletes, gdb_version, use_global_ids, rollback_on_failure, return_edit_moment, attachments, true_curve_client, session_id, use_previous_moment, datum_transformation, future)
   3397                 )
   3398             elif isinstance(updates[0], Feature):
-> 3399                 params["updates"] = json.dumps(   3400                     [f.as_dict for f in updates], default=_date_handler
   3401                 )

/opt/conda/lib/python3.9/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    232     if cls is None:
    233         cls = JSONEncoder
--> 234     return cls(    235         skipkeys=skipkeys, ensure_ascii=ensure_ascii,
    236         check_circular=check_circular, allow_nan=allow_nan, indent=indent,

/opt/conda/lib/python3.9/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

/opt/conda/lib/python3.9/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258 
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

ValueError: Circular reference detected

JosephRhodes2 · ‎05-14-2024

Try this:

import copy

def update_cyano_status(cyanoponds_features, overlap_rows, cyanoponds_lyr):
    for CCC_GIS_ID in overlap_rows['CCC_GIS_ID']:
        cyanopond_feature = [f for f in cyanoponds_features if f.attributes['CCC_GIS_ID'] == CCC_GIS_ID][0]
        row = overlap_rows[overlap_rows['CCC_GIS_ID'] == CCC_GIS_ID].iloc[0]
        cyanopond_feature_copy = copy.deepcopy(cyanopond_feature)
        cyanopond_feature_copy.attributes['Cyano_Status'] = row['cyano_risk_tier']
        resp = cyanoponds_lyr.edit_features(updates=[cyanopond_feature_copy])
        print(f"Attempted to update Cyano_Status for {CCC_GIS_ID} to {row['cyano_risk_tier']}. Response: {resp}")

SamanthaAPCC · ‎05-14-2024

I tried that and I'm still getting the circular reference detected error. I also tried a few suggestions from ChatGPT but I continue to get this same message.

SamanthaAPCC · ‎05-14-2024

I fixed the circular error by replacing blank values with NA in the below script:

def update(cyanoponds_features, overlap_rows, cyanoponds_lyr):
        overlap_rows['cyano_risk_tier'] = overlap_rows['cyano_risk_tier'].fillna('No Updated Information')
        overlap_rows['town_actions'] = overlap_rows['town_actions'].fillna('<NA>')
        overlap_rows['town_action_notes'] = overlap_rows['town_action_notes'].fillna('<NA>')
        overlap_rows['prior_activity'] = overlap_rows['prior_activity'].fillna('<NA>')

        for CCC_GIS_ID in overlap_rows['CCC_GIS_ID']:
            try:
                cyanopond_feature = [f for f in cyanopond_features if f.attributes['CCC_GIS_ID'] == CCC_GIS_ID][0]
                row = overlap_rows[overlap_rows['CCC_GIS_ID'] == CCC_GIS_ID].iloc[0]
            
                # Sanitize the content before updating the feature attributes
                notes_content = html.escape(row['town_action_notes']) if row['town_action_notes'] else '<NA>'
                prior_content = html.escape(row['prior_activity']) if row['prior_activity'] else '<NA>'
                
                # Create a copy of attributes only to avoid circular reference issues
                updated_attributes = {
                    'Sample_Date': row['fluorometry_date'],
                    'Cyano_Status': row['cyano_risk_tier'],
                    'Recent_Activity': prior_content,
                    'Town_action': row['town_actions'],
                    'Notes': notes_content,
                    # Add other fields as necessary
                }
              
                cyanopond_feature_copy = copy.deepcopy(cyanopond_feature)
                cyanopond_feature_copy.attributes.update(updated_attributes)
                
                resp = cyanoponds_lyr.edit_features(updates=[cyanopond_feature_copy])
                print (f"Attempted to update Cyano_Status for {CCC_GIS_ID} to {row['cyano_risk_tier']}. Response: {resp}")
                print (resp)
            except Exception as e:
                print(f"Error updating feature with CCC_GIS_ID {CCC_GIS_ID}: {e}")

    update(cyanoponds_features, overlap_rows, cyanoponds_lyr)

Now I just need to add in the 30 days code and the microscopy/field data fields and it appears like this may finally be working completely!

JosephRhodes2 · ‎05-14-2024

For Issue #2: try this to get a list of CCC_GIS_ID values where there has not been a sample in 30 days:

def query_for_no_inspection_in_thirty_days(layer):
    threshold_date = datetime.now() - timedelta(days=30)
    query_result = layer.query(where=f"fluorometry_date < '{threshold_date}'", out_fields='CCC_GIS_ID')
    ccc_gis_ids = [record.attributes['CCC_GIS_ID'] for record in query_result.features]
    return ccc_gis_ids

no_fluorometry_in_30 = query_for_no_inspection_in_thirty_days(fluorsamples_lyr)
print(no_fluorometry_in_30)

Then you can loop over those CCC_GIS_ID values and update the ponds:

for CCC_GIS_ID in no_fluorometry_in_30:
    cyanopond_feature = [f for f in cyanoponds_features if f.attributes['CCC_GIS_ID'] == CCC_GIS_ID]
    cyanopond_feature.attributes['Cyano_Status'] = 'No updated information'
    resp = cyanoponds_lyr.edit_features(updates=[cyanopond_feature])

Then do the same for the microscopy. Let me know how that works.