Improving Expression Performance: A Custom Function

05-12-2023 02:09 PM
jcarlson
MVP Esteemed Contributor

I've been thinking a lot about ways to optimize Arcade expressions lately. A lot of our users need things that are just beyond the capabilities of a layer as it is built, but which can be accomplished through Arcade in some way, usually with a Data Expression.

Consider the following expression, though:

 

 

var fs1 = FeatureSetByPortalItem(
    Portal('https://arcgis.com'),
    'some itemID',
    0,
    ['shared_field'],
    false
)

var fs2 = FeatureSetByPortalItem(
    Portal('https://arcgis.com'),
    'another itemID',
    0,
    ['shared_field'],
    false
)

for (var f in fs1) {
    var match = Filter(fs2, `shared_field = '${f['shared_field']}'`)

    // do something with the matched feature
}

 

 

The way it feels when I write it is that I want to make two queries to the server, get two FeatureSets, and then loop through the first and pull out matching features in the second.

The way it works when you execute this expression is that for n features in the first FeatureSet, the browser sends n queries to the feature service for the second.

Why does this matter? Well, I don't know about you, but I have more RAM than bandwidth, and I prefer not to hammer my server with thousands of requests. I'd gladly just pull both FeatureSets into memory and work with them directly. As of writing this, there's nothing in Arcade that does this off the shelf, but a custom function can handle it nicely.

Memorize

I want to get FeatureSets into memory, so I thought Memorize was a fitting name. Pseudo-code:

  1. Take a FeatureSet
  2. Create a placeholder dictionary
  3. Loop through the FeatureSet, pushing each feature into the dictionary
  4. Use the dictionary to create a new FeatureSet

Voila!

Real code:

 

 

function Memorize(fs) {
    var temp_dict = {
        fields: Schema(fs)['fields'],
        geometryType: '',
        features: []
    }

    for (var f in fs) {
        var attrs = {}

        for (var attr in f) {
            attrs[attr] = Iif(TypeOf(f[attr]) == 'Date', Number(f[attr]), f[attr])
        }

        Push(
            temp_dict['features'],
            {attributes: attrs}
        )
    }

    return FeatureSet(Text(temp_dict))
}

 

 

In Practice

So, how does it do? I can tell you it works, but does it work better than just using the FeatureSets like normal?

Here's a test expression:

 

 

var start = Now()
Console(`Start time: ${Text(start, 'hh:mm:ss')}`)

Console(`Get States: ${DateDiff(Now(), start)} ms`)
var states = FeatureSetByPortalItem(
  Portal('https://arcgis.com'),
  '8c2d6d7df8fa4142b0a1211c8dd66903',
  0,
  ['STATE_FIPS', 'POPULATION'],
  false
)

Console(`Get Counties: ${DateDiff(Now(), start)} ms`)
var counties = FeatureSetByPortalItem(
  Portal('https://arcgis.com'),
  '3c164274a80748dda926a046525da610',
  0,
  ['NAME', 'STATE_FIPS', 'POPULATION'],
  false
)

// output dictionary
var out_dict = {
  fields: [
    {name: 'county_name', type: 'esriFieldTypeString'},
    {name: 'state_pop', type: 'esriFieldTypeInteger'},
    {name: 'county_pop', type: 'esriFieldTypeInteger'},
    {name: 'county_pct_state', type: 'esriFieldTypeDouble'}
  ],
  geometryType: '',
  features: []
}

// loop through counties, get parent state and compare populations
Console(`Begin Loop: ${DateDiff(Now(), start)} ms`)

var i = 0

for (var c in counties) {

  var the_state = First(Filter(states, `STATE_FIPS = '${c['STATE_FIPS']}'`))

  Push(
    out_dict['features'],
    {
      attributes: {
        county_name: c['NAME'],
        state_pop: the_state['POPULATION'],
        county_pop: c['POPULATION'],
        county_pct_state: c['POPULATION'] / the_state['POPULATION']
      }
    }
  )

  if (i % 100 == 0) { Console(`${i} loops: ${DateDiff(Now(), start)} ms`)}

  i ++
}

Console(`End time: ${Text(Now(), 'hh:mm:ss')}`)

Console(`Duration: ${DateDiff(Now(), start)} ms`)

return FeatureSet(Text(out_dict))

 

 

To be clear, there are ways this could be written better. But that's not the point! Here are my console logs:

Start time: 03:56:34
Get States: 0 ms
Get Counties: 132 ms
Begin Loop: 282 ms
0 loops: 500 ms
100 loops: 1986 ms
200 loops: 3389 ms
...
3100 loops: 48782 ms
End time: 03:57:23
Duration: 49424 ms

Now to compare: I am going to use the same script, but use my Memorize function at the top, then use "memorized" copies of those FeatureSets.
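A minimal sketch of that change, assuming the Memorize function from earlier in the post sits at the top of the expression. Only the two FeatureSet calls are wrapped; the output dictionary, the loop and the Console calls of the test script stay as they were.

```arcade
// Memorize, as defined earlier in the post
function Memorize(fs) {
    var temp_dict = {
        fields: Schema(fs)['fields'],
        geometryType: '',
        features: []
    }

    for (var f in fs) {
        var attrs = {}

        for (var attr in f) {
            // Dates are stored as numbers so they survive the round trip through Text()
            attrs[attr] = Iif(TypeOf(f[attr]) == 'Date', Number(f[attr]), f[attr])
        }

        Push(temp_dict['features'], {attributes: attrs})
    }

    return FeatureSet(Text(temp_dict))
}

// Same item IDs and field lists as the test script, now wrapped in Memorize
var states = Memorize(
  FeatureSetByPortalItem(
    Portal('https://arcgis.com'),
    '8c2d6d7df8fa4142b0a1211c8dd66903',
    0,
    ['STATE_FIPS', 'POPULATION'],
    false
  )
)

var counties = Memorize(
  FeatureSetByPortalItem(
    Portal('https://arcgis.com'),
    '3c164274a80748dda926a046525da610',
    0,
    ['NAME', 'STATE_FIPS', 'POPULATION'],
    false
  )
)
```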

Here are the console logs:

Start time: 04:02:30
Get States: 1 ms
Get Counties: 560 ms
Begin Loop: 2047 ms
0 loops: 2375 ms
100 loops: 2556 ms
200 loops: 2798 ms
...
3100 loops: 7319 ms
End time: 04:02:37
Duration: 7385 ms

Results

In the "traditional" model, that was nearly 50 seconds waiting for my Dashboard to load, and over 3000 pings to the Esri servers. Sorry!

In the "memorized" model, it's still over 7 seconds, but that is a huge improvement. Oh, and there are only 2 pings to the servers, so that's a pretty good improvement there, too.

To be clear, I don't think this is some panacea for bad Arcade expressions. But I think if you've got some inter-layer operations happening, you should check it out.
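For the specific states-and-counties test above, one of the "ways this could be written better" is a plain dictionary lookup instead of a Filter call per county. This is a different technique from Memorize, not a replacement for it, and it was not timed for this post. The names state_pops and fips are made up for illustration; the item IDs and fields are the ones from the test script.

```arcade
var states = FeatureSetByPortalItem(
  Portal('https://arcgis.com'),
  '8c2d6d7df8fa4142b0a1211c8dd66903',
  0,
  ['STATE_FIPS', 'POPULATION'],
  false
)

var counties = FeatureSetByPortalItem(
  Portal('https://arcgis.com'),
  '3c164274a80748dda926a046525da610',
  0,
  ['NAME', 'STATE_FIPS', 'POPULATION'],
  false
)

// Build the lookup once: state FIPS code -> state population
var state_pops = {}
for (var s in states) {
  state_pops[Text(s['STATE_FIPS'])] = s['POPULATION']
}

var out_dict = {
  fields: [
    {name: 'county_name', type: 'esriFieldTypeString'},
    {name: 'state_pop', type: 'esriFieldTypeInteger'},
    {name: 'county_pop', type: 'esriFieldTypeInteger'},
    {name: 'county_pct_state', type: 'esriFieldTypeDouble'}
  ],
  geometryType: '',
  features: []
}

for (var c in counties) {
  var fips = Text(c['STATE_FIPS'])

  // Skip counties whose state is missing from the lookup
  if (HasKey(state_pops, fips)) {
    var state_pop = state_pops[fips]

    Push(
      out_dict['features'],
      {
        attributes: {
          county_name: c['NAME'],
          state_pop: state_pop,
          county_pop: c['POPULATION'],
          county_pct_state: c['POPULATION'] / state_pop
        }
      }
    )
  }
}

return FeatureSet(Text(out_dict))
```

This only fits simple key-to-value joins. When the loop needs whole matching features, or FeatureSet functions on the second layer, Memorize is the more general tool.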

16 Comments
JohannesLindner
MVP Frequent Contributor

This is a very good point.

I don't really know how Arcade interacts with servers (and I think that's shared by most users). All we have to lead us are common tips like "don't load geometries if not absolutely necessary", but I often feel like excluding geometries and fields doesn't really have much impact on performance. But I 100% used Filter() in loops, thinking that it would only communicate with the server once. Same goes for other functions like Intersects(), which probably has the same outcome of hammering the server.

 

It would be good to have a native function to load Featuresets into RAM. Maybe even an optional argument in the FeaturesetBy*() functions.

var fs = FeatureSetByPortalItem(portalObject, itemId, layerId?, fields?, includeGeometry?, memorize?)

 

 

If you turn this blog into an Idea, I'd be sure to upvote.

JohannesLindner
MVP Frequent Contributor

I just helped a user with performance and tried your tip.

I realized that you ignore geometries here. This means that you can't call geometry related functions like Intersects() on the memorized Featureset anymore, because (as I understand it) these would normally be evaluated on the server, but now have to be evaluated locally, and that doesn't give the correct results if there are no geometries.

 

To remedy that, we can modify your function a little bit (lines 4 & 17):

function Memorize(fs) {
    var temp_dict = {
        fields: Schema(fs)['fields'],
        geometryType: Schema(fs).geometryType,
        features: []
    }

    for (var f in fs) {
        var attrs = {}

        for (var attr in f) {
            attrs[attr] = Iif(TypeOf(f[attr]) == 'Date', Number(f[attr]), f[attr])
        }

        Push(
            temp_dict['features'],
            {attributes: attrs, geometry: Geometry(f)}
        )
    }

    return FeatureSet(Text(temp_dict))
}

 

 

Care has to be taken to actually load the geometries:

var fs = FeaturesetByPortalItem(some_portal, some_id, some_subid, ["*"], true)
fs = Memorize(fs)

 

 

In your test script, I replaced line 41 with this:

var the_state = First(Intersects(states, c))

 

 

I actually cancelled the execution with the "raw" featuresets after 5 minutes. Poor server... The script with the memorized featuresets took 5 seconds, so that is an amazing time saver.

jcarlson
MVP Esteemed Contributor

@JohannesLindner  good point! I ignored geometries because in my example, I didn't need them, but your amendments to the expression are great.

Idea posted here: https://community.esri.com/t5/arcgis-online-ideas/arcade-featureset-functions-should-have-the-option...

JustinReynolds
Regular Contributor

@JohannesLindner @jcarlson I'll be looking at where I can implement this in some of my expressions. I have a similar request to help performance: an optional SQL parameter on the FeatureSetBy functions themselves. Filter is nice once you have a feature set, but I'd rather fetch a subset of features from the get-go.

https://community.esri.com/t5/arcgis-field-maps-ideas/arcade-s-featuresetby-functions-needs-an-optio...

 

jcarlson
MVP Esteemed Contributor

@JustinReynolds 

FYI, when an expression like Filter(FeatureSetBy(...)) executes, it is already fetching only a subset of the data.

There are times when moving the FeatureSet to RAM is not beneficial, and "filter from the get-go" (outside of a loop) is one of them.
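A small sketch of the "filter from the get-go" case, with a placeholder item ID and hypothetical field names (shared_field, status) used purely for illustration:

```arcade
// Placeholder item ID and hypothetical field names, for illustration only
var fs = FeatureSetByPortalItem(
  Portal('https://arcgis.com'),
  'some itemID',
  0,
  ['shared_field', 'status'],
  false
)

// One Filter, applied once and outside any loop.
// No Memorize here: per the comment above, the filtered FeatureSet
// already represents just the subset of the data.
var open_items = Filter(fs, "status = 'Open'")

return open_items
```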

DougBrowning
MVP Esteemed Contributor

We use arrays a lot to get around this also.

Here is an example that uses an array to collect the values of a key that appears in 2 different layers:

 

var p = 'https://arcgis.com/';
var tbl = FeatureSetByPortalItem(Portal(p),'7133b618',0,['Project','PointID','StreamName','PointType','OrderCode','EvalStatus','Trip'],true);

//This is the schema I want to append data into.
var Dict = {  
    'fields': [{ 'name': 'Project', 'type': 'esriFieldTypeString' },
            { 'name': 'PointID', 'type': 'esriFieldTypeString' },
            { 'name': 'StreamName', 'type': 'esriFieldTypeString' },
            { 'name': 'PointType', 'type': 'esriFieldTypeString' },
            { 'name': 'OrderCode', 'type': 'esriFieldTypeString' },
            { 'name': 'EvalStatus', 'type': 'esriFieldTypeString' },
            { 'name': 'Trip', 'type': 'esriFieldTypeString' },
            { 'name': 'CountUnresolved', 'type': 'esriFieldTypeString' }],  
    'geometryType': 'esriGeometryPoint',   
    'features': []};  
var index = 0;

var sql2 = "ResponseType = 'Log an issue' And (Resolved IS NULL Or Resolved = 'No')"
var tbl2All = Filter(FeatureSetByPortalItem(Portal(p),'713618',10,['PointID','ResponseType','Resolved'],false), sql2);
var tbl2text = []
for (var i in tbl2All) {
    Push(tbl2text,i.PointID)
}

var isUnresolved = ''
//Cycles through each record in the input table
for (var f in tbl) {
    if (Includes(tbl2text,f.PointID)) {
        isUnresolved = 'Yes'
    }
    else {
        isUnresolved = 'No'
    }
     //This section writes values from tbl into output table and then fills the variable fields
    Dict.features[index] = {
        'attributes': {   
            'Project': f.Project,
            'PointID': f.PointID,
            'StreamName': f.StreamName,
            'PointType': f.PointType,
            'OrderCode': f.OrderCode,
            'EvalStatus': f.EvalStatus,
            'Trip': f.Trip,
            'CountUnresolved': isUnresolved
        },
        'geometry': Geometry(f)};   
    ++index;
    
}
return FeatureSet(Text(Dict));
Vinzafy
Occasional Contributor

THANK YOU for this workaround @jcarlson!

One of my use cases is a single table join where I'm trying to get away from joined view layers and use data expressions instead so the feature layer can still be dynamic (i.e., new fields can be added if needed).

Another use case is a data expression where I joined three related tables, which is only possible programmatically. The current script I have works but is incredibly inefficient.

For the simpler use case, initial tests using this method resulted in load times that are 47 times faster than the previous method...awesome!

I haven't updated the script for the more complex use case, but I am excited to see just how significantly faster it loads with this new method.

RobertAnderson3
MVP Regular Contributor

This is incredible @jcarlson and exactly what I need, I've been having complaints on dashboards loading slowly so I can't wait to implement this.

For my 3 brain cells this week that can't seem to grasp implementing this, when you add the Memorize function into the start of your example, what code in your example are you replacing? Like where/how is Memorize called? I'm trying to join two tables in this expression from Survey123, main layer and the repeat.

 

function Memorize(fs) {
    var temp_dict = {
        fields: Schema(fs)['fields'],
        geometryType: '',
        features: []
    }
    for (var f in fs) {
        var attrs = {}
        for (var attr in f) {
            attrs[attr] = Iif(TypeOf(f[attr]) == 'Date', Number(f[attr]), f[attr])
        }
        Push(
            temp_dict['features'],
            {attributes: attrs}
        )
    }
    return FeatureSet(Text(temp_dict))
}

var portal = Portal("https://www.arcgis.com/");
var polyfs = FeatureSetByPortalItem(
    portal,
    "itemID#####",
    0,
    ["*"],
    false
);

polyfs = Memorize(polyfs);

 

EDIT: I cut out a chunk of my code, since it had nothing to do with my problem after all and was way too long for this post.

jcarlson
MVP Esteemed Contributor

@RobertAnderson3  After defining it, I just wrap my FeatureSetByPortalItem functions in it.

var fs = Memorize(
  FeatureSetByPortalItem(
    Portal('...'),
    '1234zxcv',
    0,
    ['*'],
    false
  )
)
RobertAnderson3
MVP Regular Contributor

@jcarlson First, the speediness of your reply is incredible and appreciated, and then ahh okay that makes sense. 

I think I was calling it right, but I'm not getting any data returned for some reason. I set up the function, call it as you show, and run the rest of my code; the table shows, but with no results.

If I put return fs; right after the call above, it's empty. Should it have entries at that point? 

jcarlson
MVP Esteemed Contributor

Are there domains in the source layers? I seem to recall Memorize not liking certain domains at one point.

If you call return fs without using Memorize, do you get results?

RobertAnderson3
MVP Regular Contributor

@jcarlson There are domains on the source layers yes, as Survey123 automatically creates them for the select_one question type. 

So I tried again. It does not want to fill anything in from the base (0) layer, which is the point layer, but when I tried Memorize with each of the two repeat section tables I have, it returned values. All have fields with domains on them. I'll have to see if I can find which field in particular Memorize does not like.

I do get results from the (0) layer without the Memorize function around it.

UPDATE: I got it working! The issue was indeed domains, it seems that the Memorize function has some issue when a field has a domain, but has values that are not in that domain. You can get around it by selecting the fields you want returned in the FeatureSetByPortalItem() function, or in my case I just added the missing value to the domain list because it was a weird case that caused this anyways.

Thank you so much @jcarlson for this post and the help!

DougBrowning
MVP Esteemed Contributor

Survey123 only creates the domains at the beginning and does not keep them in sync if you add more later. I think it now warns you, but still does not update? I forget.

RobertAnderson3
MVP Regular Contributor

@DougBrowning I'm pretty sure it does update/add them now, that's my understanding of the warning/prompt message that comes up, but it doesn't clean up old ones at all. You just need to click on the option.

[Image: RobertAnderson3_0-1702323140591.png]

My issue was that I started with the question as a select_one, so it had a domain, but then I changed it to a text field and hardcoded the value because it ended up always being the same, and I never added that value to the domain.

DougBrowning
MVP Esteemed Contributor

Oh yeah, maybe that was it. I knew they added something. I think the old ones stay so that any historical data in there still works.

RobertAnderson3
MVP Regular Contributor

Hi @jcarlson 

I've been noticing some issues with the code, for some reason it seems to be missing certain entries and I can't figure out why they're not being returned. Have you had any issues with this?

I posted it as a separate question here:
https://community.esri.com/t5/arcgis-dashboards-questions/arcade-join-tables/m-p/1381304

EDIT: As we ran into above, Memorize does not like domains, I feel like that is probably the issue here. I changed my code from retrieving ALL fields ["*"] to just the ones I actually needed and it works properly now.

About the Author
I'm a GIS Analyst for Kendall County, IL. When I'm not on the clock, you can usually find me contributing to OpenStreetMap, knitting, or nattering on to my family about any and all of the above.