Improving Expression Performance: A Custom Function

jcarlson · 05-12-2023

I've been thinking a lot about ways to optimize Arcade expressions lately. A lot of our users need things that are just beyond the capabilities of a layer as it is built, but which can be accomplished through Arcade in some way, usually with a Data Expression.

Consider the following expression, though:

var fs1 = FeatureSetByPortalItem(
    Portal('https://arcgis.com'),
    'some itemID',
    0,
    ['shared_field'],
    false
)

var fs2 = FeatureSetByPortalItem(
    Portal('https://arcgis.com'),
    'another itemID',
    0,
    ['shared_field'],
    false
)

for (var f in fs1) {
    var match = Filter(fs2, `shared_field = '${f['shared_field']}'`)

    // do something with the matched feature
}

The way it feels when I write it is that I want to make two queries to the server, get two FeatureSets, and then loop through the first and pull out matching features in the second.

The way it actually works when the expression executes is that for n features in the first FeatureSet, the browser sends n queries to the feature service for the second, one per pass through the loop.

Why does this matter? Well, I don't know about you, but I have more RAM than bandwidth, and I prefer not to hammer my server with thousands of requests. I'd gladly just pull both FeatureSets into memory and work with them directly. As of writing this, there's nothing in Arcade that does this off the shelf, but a custom function can handle it nicely.

Memorize

I want to get FeatureSets into memory, so I thought Memorize was a fitting name. Pseudo-code:

Take a FeatureSet
Create a placeholder dictionary
Loop through the FeatureSet, pushing each feature into the dictionary
Use the dictionary to create a new FeatureSet

Voila!

Real code:

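function Memorize(fs) {
    // Skeleton of a FeatureSet definition: reuse the source fields, no geometry
    var temp_dict = {
        fields: Schema(fs)['fields'],
        geometryType: '',
        features: []
    }

    // Iterating the source FeatureSet pulls its features from the service
    for (var f in fs) {
        var attrs = {}

        // Copy every attribute; dates become epoch milliseconds for the JSON definition
        for (var attr in f) {
            attrs[attr] = Iif(TypeOf(f[attr]) == 'Date', Number(f[attr]), f[attr])
        }

        Push(
            temp_dict['features'],
            {attributes: attrs}
        )
    }

    // Build a new, in-memory FeatureSet from the serialized dictionary
    return FeatureSet(Text(temp_dict))
}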

In Practice

So, how does it do? I can tell you it works, but does it work better than just using the FeatureSets like normal?

Here's a test expression:

var start = Now()
Console(`Start time: ${Text(start, 'hh:mm:ss')}`)

Console(`Get States: ${DateDiff(Now(), start)} ms`)
var states = FeatureSetByPortalItem(
  Portal('https://arcgis.com'),
  '8c2d6d7df8fa4142b0a1211c8dd66903',
  0,
  ['STATE_FIPS', 'POPULATION'],
  false
)

Console(`Get Counties: ${DateDiff(Now(), start)} ms`)
var counties = FeatureSetByPortalItem(
  Portal('https://arcgis.com'),
  '3c164274a80748dda926a046525da610',
  0,
  ['NAME', 'STATE_FIPS', 'POPULATION'],
  false
)

// output dictionary
var out_dict = {
  fields: [
    {name: 'county_name', type: 'esriFieldTypeString'},
    {name: 'state_pop', type: 'esriFieldTypeInteger'},
    {name: 'county_pop', type: 'esriFieldTypeInteger'},
    {name: 'county_pct_state', type: 'esriFieldTypeDouble'}
  ],
  geometryType: '',
  features: []
}

// loop through counties, get parent state and compare populations
Console(`Begin Loop: ${DateDiff(Now(), start)} ms`)

var i = 0

for (var c in counties) {

  var the_state = First(Filter(states, `STATE_FIPS = '${c['STATE_FIPS']}'`))

  Push(
    out_dict['features'],
    {
      attributes: {
        county_name: c['NAME'],
        state_pop: the_state['POPULATION'],
        county_pop: c['POPULATION'],
        county_pct_state: c['POPULATION'] / the_state['POPULATION']
      }
    }
  )

  if (i % 100 == 0) { Console(`${i} loops: ${DateDiff(Now(), start)} ms`)}

  i ++
}

Console(`End time: ${Text(Now(), 'hh:mm:ss')}`)

Console(`Duration: ${DateDiff(Now(), start)} ms`)

return FeatureSet(Text(out_dict))

To be clear, there are ways this could be written better. But that's not the point! Here are my console logs:

Start time: 03:56:34
Get States: 0 ms
Get Counties: 132 ms
Begin Loop: 282 ms
0 loops: 500 ms
100 loops: 1986 ms
200 loops: 3389 ms
...
3100 loops: 48782 ms
End time: 03:57:23
Duration: 49424 ms

Now to compare: I'm going to use the same script, but with my Memorize function defined at the top and "memorized" copies of both FeatureSets in place of the originals.
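The modified script isn't shown here, so the following is only a sketch of what that change presumably looks like: the Memorize function from above goes at the top, and each FeatureSetByPortalItem call is wrapped in it. Reusing the variable names states and counties for the copies is an assumption made so the rest of the script needs no edits.

```arcade
// Memorize, as defined earlier, placed at the top of the expression
function Memorize(fs) {
  var temp_dict = {
    fields: Schema(fs)['fields'],
    geometryType: '',
    features: []
  }

  for (var f in fs) {
    var attrs = {}
    for (var attr in f) {
      attrs[attr] = Iif(TypeOf(f[attr]) == 'Date', Number(f[attr]), f[attr])
    }
    Push(temp_dict['features'], {attributes: attrs})
  }

  return FeatureSet(Text(temp_dict))
}

// Assumption: the copies reuse the original variable names
var states = Memorize(FeatureSetByPortalItem(
  Portal('https://arcgis.com'),
  '8c2d6d7df8fa4142b0a1211c8dd66903',
  0,
  ['STATE_FIPS', 'POPULATION'],
  false
))

var counties = Memorize(FeatureSetByPortalItem(
  Portal('https://arcgis.com'),
  '3c164274a80748dda926a046525da610',
  0,
  ['NAME', 'STATE_FIPS', 'POPULATION'],
  false
))

// From here on, out_dict, the loop, and the Console calls continue
// exactly as in the test expression above.
```

Because each copy is built before the loop starts, the setup steps should take longer than before, in exchange for a loop that no longer queries the service.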

Here are the console logs:

Start time: 04:02:30
Get States: 1 ms
Get Counties: 560 ms
Begin Loop: 2047 ms
0 loops: 2375 ms
100 loops: 2556 ms
200 loops: 2798 ms
...
3100 loops: 7319 ms
End time: 04:02:37
Duration: 7385 ms

Results

In the "traditional" model, that was nearly 50 seconds waiting for my Dashboard to load, and over 3000 pings to the Esri servers. Sorry!

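In the "memorized" model, it's still over 7 seconds, but that is a huge improvement: about 6.7 times faster overall. The up-front cost is higher (the loop starts at about 2 seconds instead of about 0.3), but each pass through the loop drops from roughly 15 ms to under 2 ms. Oh, and there are only 2 pings to the servers, so that's a pretty good improvement there, too.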

To be clear, I don't think this is some panacea for bad Arcade expressions. But I think if you've got some inter-layer operations happening, you should check it out.
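As for the "ways this could be written better" mentioned earlier: for this particular parent-lookup case, one option is a plain dictionary keyed on STATE_FIPS instead of a Filter call per county. This is a different technique from Memorize, it was not timed here, and the names state_pops, key, and fips are made up for illustration, so treat it as a rough sketch rather than a tested replacement.

```arcade
var states = FeatureSetByPortalItem(
  Portal('https://arcgis.com'),
  '8c2d6d7df8fa4142b0a1211c8dd66903',
  0,
  ['STATE_FIPS', 'POPULATION'],
  false
)

var counties = FeatureSetByPortalItem(
  Portal('https://arcgis.com'),
  '3c164274a80748dda926a046525da610',
  0,
  ['NAME', 'STATE_FIPS', 'POPULATION'],
  false
)

// One pass over states: STATE_FIPS -> POPULATION
var state_pops = {}
for (var s in states) {
  var key = s['STATE_FIPS']
  if (!IsEmpty(key)) { state_pops[key] = s['POPULATION'] }
}

var out_dict = {
  fields: [
    {name: 'county_name', type: 'esriFieldTypeString'},
    {name: 'state_pop', type: 'esriFieldTypeInteger'},
    {name: 'county_pop', type: 'esriFieldTypeInteger'},
    {name: 'county_pct_state', type: 'esriFieldTypeDouble'}
  ],
  geometryType: '',
  features: []
}

for (var c in counties) {
  var fips = c['STATE_FIPS']

  // Skip counties with no matching state instead of failing on the lookup
  if (IsEmpty(fips) || !HasKey(state_pops, fips)) { continue }

  var state_pop = state_pops[fips]

  Push(
    out_dict['features'],
    {
      attributes: {
        county_name: c['NAME'],
        state_pop: state_pop,
        county_pop: c['POPULATION'],
        county_pct_state: c['POPULATION'] / state_pop
      }
    }
  )
}

return FeatureSet(Text(out_dict))
```

A dictionary only helps when each lookup is a simple key match. Memorize keeps a full FeatureSet, so Filter and the other FeatureSet functions remain available on the in-memory copy.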