Python Notebook to update attributes to a feature layer

SamanthaAPCC · ‎05-02-2024

I am using AGOL Python Notebook to write to a hosted feature layer using attributes from two Survey123 surveys. Eventually I will need to schedule this code to run daily. The code seems to be working and is printing my final message, but it is not writing to the attribute table of my cyanoponds layer. Am I not using the correct python language? I am very novice at python but this is our only option for the type of updates we need. I am taking data from a survey and based on a unique ID (CCC_GIS_ID) I am looking for the most recent sample as well as the one with the highest cyano_status ranking. It also appears to be skipping CCC_GIS_IDs when there is more than 1 sample at the pond location, which means I'm not incorporating lines 46-57 correctly. I will also need to add a component that changes the status to "No Updated Information" if the most recent sample is older than 30 days, which I haven't quite figured out how to add in yet.

Also, when this code is running daily it will need to overwrite the table with the new information.

I took elements from this article in order to come up with the basis of this code.

from arcgis.gis import GIS
from arcgis.features import FeatureLayer
import pandas as pd
from arcgis.features import FeatureCollection
from datetime import datetime, timedelta

gis = GIS("home")

def update_CyanoData():
    
    one_day = datetime.today() - timedelta(days=1)
    string_day = one_day.strftime('%Y-%m-%d')
    where_query = f"fluorometry_date >= DATE '{string_day}'"
    where_querymf = f"microscopy_date >= DATE '{string_day}'"
    days_valid = 30 #How do I incorporate this so that the Cyano_Status field changes to 'No Updated Information' if all samples for that
    # pond are older than 30 days?
    
    cyanoponds = gis.content.get('a13f1b2e80ff4580a27620fc9d47f757')
    cyanoponds_lyr = cyanoponds.layers[0]
    cyanoponds_fset = cyanoponds_lyr.query()                              
    cyanoponds_features = cyanoponds_fset.features
    cyanoponds_sdf = cyanoponds_fset.sdf
       
    fluorsamples = gis.content.get('db342bf34b1742c0abb96d6ebff2f396') #Public facing View of the Fluorometry survey, final_review = yes
    fluorsamples_lyr = fluorsamples.layers[0]
    fluorsamples_fset = fluorsamples_lyr.query(where=where_query) 
    fluorsamples_features = fluorsamples_fset.features
    fluorsamples_sdf = fluorsamples_fset.sdf 
    
    #I will maybe need to use this survey to populate dominant genus, scum notes & pond temperatures if I can't do that with arcade and UniqueIDs in pop-ups
    microfieldsamples = gis.content.get('dc204479b5bf4f34a2efed96d0b2a340') #View of Joined Microscopy & Field samples
    microfieldsamples_lyr = microfieldsamples.layers[0]
    microfieldsamples_fset = microfieldsamples_lyr.query(where=where_querymf)
    microfieldsamples_features = microfieldsamples_fset.features
    microfieldsamples_sdf = microfieldsamples_fset.sdf    
    
    df = fluorsamples_sdf.sort_values(by=['fluorometry_date'], ascending=False)
    # Add cyano status rating to enable easy selection of the worst status for each pond later. This is relevant for cases where there
    # is more than one sampling location at a single pond. There are only a few ponds like this, but it won't cause any issues if
    # we add the rating for all ponds.
    df['cyano_rate'] = 0
    df.loc[df.cyano_risk_tier=="Acceptable", "cyano_rate"] = 1
    df.loc[df.cyano_risk_tier=="Potential for Concern", "cyano_rate"] = 2
    df.loc[df.cyano_risk_tier=="Use Restriction Warranted", "cyano_rate"] = 3
    
    # SELECT ONE MOST RECENT RECORD FOR EACH POND. First, create a function (get_maxdate) to get the most recent record from a group.
    def get_maxdate(group): return group[group.fluorometry_date==group.fluorometry_date.max()]
    # Group by pond ID and apply get_maxdate to get most recent record(s) for each pond.
    recent_pre = df.groupby("CCC_GIS_ID").apply(get_maxdate).reset_index(drop=True)
    # For ponds with more than one sample location, we'll assign them the worst worst status found at the pond that day.
    # First, create a function (get_worststatus) that gets the worst cyano status (based on cyano_rate) from a group.
    def get_worststatus(group): return group[group.cyano_rate==group.cyano_rate.max()]
    # Then, group by pond ID and apply get_worststatus function.
    recent_pre = df.groupby("CCC_GIS_ID").apply(get_worststatus).reset_index(drop=True)
    # Among the ponds with multiple sample locations, if a pond has the same cyano status at more than one location, it will have
    # subsetted more than one record for the pond. This code simply takes the first listed one.
    recent_pre.drop_duplicates(subset = ['CCC_GIS_ID'], keep = 'first', inplace = True)
    
    df["date_str"] = pd.to_datetime(df['fluorometry_date']).dt.strftime('%m/%d/%Y')
    
    #### FIND THE SECOND MOST RECENT RECORD FOR EACH POND. Create a new data frame that excludes the most recent records for 
    # each pond. Apply the above code to this subset, thereby finding the SECOND most recent record for each pond and collecting
    # them in a data frame (secondmostrecent).
    excludingrecent= df[~df.sampleID.isin(recent_pre.sampleID)]
    secmostrecent  = excludingrecent.groupby("CCC_GIS_ID").apply(get_maxdate).reset_index(drop=True)
    if secmostrecent.empty:
        secmostrecent=excludingrecent
    secmostrecent  = secmostrecent.groupby("CCC_GIS_ID").apply(get_worststatus).reset_index(drop=True)
    if secmostrecent.empty:
        secmostrecent=excludingrecent
    secmostrecent.drop_duplicates(subset = ['CCC_GIS_ID'], keep = 'first', inplace = True)
    # Create a field that lists the date and cyano_status from the second most recent record
    secmostrecent["secprior_sample"] = secmostrecent["date_str"] + ": " + secmostrecent["cyano_risk_tier"]
    secmostrecent = secmostrecent.loc[:,["CCC_GIS_ID","secprior_sample","sampleID"]]
    secmostrecent.rename(columns={"sampleID":"secsampleID"},inplace=True)
    
    #### FIND THE THIRD MOST RECENT RECORD FOR EACH POND. Create a new data frame that excludes the most recent and 2nd most recent
    # records for each pond. Apply the above code to this subset, thereby finding the THIRD most recent record for each pond and
    # collecting them in a data frame (thirdmost recent).
    excluding2recent = excludingrecent[~excludingrecent.sampleID.isin(secmostrecent.secsampleID)]
    thirdmostrecent  = excluding2recent.groupby("CCC_GIS_ID").apply(get_maxdate).reset_index(drop=True)
    if thirdmostrecent.empty:
        thirdmostrecent=excluding2recent
    thirdmostrecent  = thirdmostrecent.groupby("CCC_GIS_ID").apply(get_worststatus).reset_index(drop=True)
    if thirdmostrecent.empty:
        thirdmostrecent=excluding2recent
    thirdmostrecent.drop_duplicates(subset = ['CCC_GIS_ID'], keep = 'first', inplace = True)
    # Create a field that lists the date and cyano_status from the third most recent record
    thirdmostrecent["thirdprior_sample"] = thirdmostrecent["date_str"] + ": " + thirdmostrecent["cyano_risk_tier"]
    thirdmostrecent = thirdmostrecent.loc[:,["CCC_GIS_ID","thirdprior_sample","sampleID"]]
    thirdmostrecent.rename(columns={"sampleID":"thirdsampleID"},inplace=True)
    
    #### CREATE PRIOR ACTIVITY FIELD
    # Merge data frames recent_pre, secmostrecent, and thirdmostrecent together into "recent." Create field called prior_activity that
    # lists sample dates and statuses from the second and third most recent sampling dates for a pond.
    recent = pd.merge(recent_pre,secmostrecent,on="CCC_GIS_ID",how="outer"); recent = pd.merge(recent,thirdmostrecent,on="CCC_GIS_ID",how="outer")
    recent["prior_activity"] = recent["secprior_sample"] + ", " + recent["thirdprior_sample"].fillna("")
    recent["prior_activity"] = recent["prior_activity"].str.strip().str.rstrip(',') # <-- remove hanging comma and space at end, in cases where there is only 2nd most recent
    recent = recent.sort_values(by="fluorometry_date", ascending=False)
        
    overlap_rows = pd.merge(left = cyanoponds_sdf, right = recent, how = 'inner', on='CCC_GIS_ID' )
    cyanopond_features = cyanoponds_fset.features
    fluorsamples_updates = fluorsamples_fset.features
    fluorsamples_updates.reverse()
     
    def update(cyanopond, fluorometry):
        for CCC_GIS_ID in overlap_rows['CCC_GIS_ID']:
            try:
                cyanopond_feature = [f for f in cyanopond_features if f.attributes['CCC_GIS_ID']==CCC_GIS_ID][0]
                fluorsample_feature = [f for f in fluorsamples_features if f.attributes['CCC_GIS_ID']==CCC_GIS_ID][0]
                cyanopond_feature.attributes['Sample_Date'] = fluorsample_feature.attributes['fluorometry_date']
                cyanopond_feature.attributes['Cyano_Status']= fluorsample_feature.attributes['cyano_risk_tier']
                #cyanopond_feature.attributes['Recent_Activity']= I need this to populate with the 'prior_activity' column that we wrote to the 'recent' data frame 
                cyanoponds_lyr.edit_features(updates=[cyanopond_feature])
                print(f"Updated {cyanopond_feature.attributes['CCC_GIS_ID']} cyano status to {cyanopond_feature.attributes['Cyano_Status']}",flush=True)
            except Exception as e:
                # Print exception details for debugging.
                print(f"Error updating feature with CCC_GIS_ID {CCC_GIS_ID}: {e}")
    update(cyanoponds_features, fluorsamples_updates)

update_CyanoData()

AngelaSchirck · ‎05-03-2024

Just FYI you might get more response on this in the Developer's (Python) community.

Have you tried using arcpy api (Update cursor)?

SamanthaAPCC · ‎05-03-2024

Thanks Angela, I'll try posting there. I'm not a developer and my experience with python is limited so I didn't even think to post there. I'm trying to combine two codes (one of which is from the article I posted) so I'll look into the Update cursor. I was trying to avoid python altogether for this project but the abilities of using Arcade for symbology is too limited for what I need to accomplish. Unfortunately this is my only option!