Understanding core-hours in ArcGIS GeoAnalytics Engine

SBattersby, Esri Contributor

There are two licensing models for ArcGIS GeoAnalytics Engine: Connected and Disconnected. Core-hours are the unit GeoAnalytics Engine uses to track the compute resources consumed when processing and analyzing your spatial data. In this post we’ll explore the concept of core-hours associated with the Connected licensing model, learn about the functions for tracking core-hour usage, and examine how many core-hours are used for a few example tasks.


The Connected licensing model includes a given quantity of core-hours for processing your data. As you work with the functions and tools in the library, your usage is tracked and debited from the core-hours included with the license. The Disconnected licensing model does not track core-hour usage, as it does not require authentication and reporting of usage via ArcGIS. For more information about the licensing models, see the ArcGIS Developers documentation.

An important note: core-hour usage is separate from the computing resources you pay your cloud provider for. For instance, if you are working in Azure, AWS, or Databricks, you are also paying for the computing cluster that you use. You are NOT using GeoAnalytics Engine core-hours for the entire duration that your computing cluster is running; you only use core-hours while a GeoAnalytics Engine function, tool, or data source is actively being used. In other words, while your entire computing process might take 1 hour, your GeoAnalytics Engine processes likely account for only a portion of that, so core-hours are tracked for only a subset of the time that the computing cluster is running.

About usage collection and reporting


With the Connected license, any Spark job that includes GeoAnalytics Engine functionality will be measured in compute unit-milliseconds and reported back to Esri to deduct from the available core-hours remaining in the license.
As an example of how this works, if you have a Connected license and you run a Spark job using a GeoAnalytics Engine function, tool, or data source on a cluster with 60 cores for 1 minute (60,000 milliseconds), the total usage reported would be 60 cores × 60,000 milliseconds = 3,600,000 compute unit-milliseconds, or 1.00 core-hour. This value is reported to Esri and is deducted from the available core-hours in your prepaid plan.
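If it helps to see the arithmetic, here is a minimal sketch in Python (the core_hours helper is just an illustration, not part of the GeoAnalytics Engine API):

```python
# Hypothetical helper illustrating the conversion described above.
def core_hours(num_cores: int, duration_ms: int) -> float:
    """Convert cores x GeoAnalytics Engine runtime (milliseconds) into core-hours."""
    compute_unit_ms = num_cores * duration_ms  # compute unit-milliseconds
    return compute_unit_ms / 3_600_000         # 3,600,000 milliseconds in one hour

# 60 cores running GeoAnalytics Engine work for 1 minute (60,000 ms):
print(core_hours(60, 60_000))  # 1.0 core-hour
```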


There are two easy ways to track your usage. You can use the GeoAnalytics Engine dashboard associated with your account to view your overall usage and graphically explore your past usage. Note that the dashboard isn’t updated immediately; there may be a short lag between completing a process and the usage appearing in the dashboard.

The GeoAnalytics Engine Dashboard for tracking core-hour usage

Another way to track usage is directly in the notebook. After you have authenticated your license, you can use the geoanalytics.auth_info() or geoanalytics.usage() functions to explore the current usage status. Each provides a slightly different set of information.

Information returned from geoanalytics.auth_info()

Information returned from geoanalytics.usage()

auth_info()  lists the authorization information for the authorized user.  That includes user name, session time, total core-hours available, and the current session usage.  Session usage time is reported in core-milliseconds.

usage() returns just the usage information for the authorized user.  The information returned can be tailored to provide details about usage for a specific time span of interest using the span and period parameters. Usage time is reported in core-milliseconds.

Using these two functions you can explore your usage as you go through a notebook – for instance, to understand how many core-hours are used for a specific operation on your data, how many core-hours are used throughout an entire notebook of calculations, or to understand the overall usage on your account.
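As a rough sketch of how that looks in a notebook (the authentication call is omitted because it depends on how your license is set up, and the span/period argument values shown for usage() are placeholders, not documented values):

```python
import geoanalytics

# Authenticate your license first; the exact call depends on your deployment
# (username/password, license file, etc.), so it is not shown here.
# geoanalytics.auth(...)

# Authorization details for the current user: user name, session time,
# total core-hours available, and current session usage (in core-milliseconds).
geoanalytics.auth_info()

# Usage information only, optionally scoped with the span and period parameters.
# The argument values below are placeholders; check the API reference for the
# accepted forms.
geoanalytics.usage(span="month", period=1)
```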

Core-hour tracking in action

Let’s dig in and track some calculations to see how many core-hours are used – and we’ll see just how efficient GeoAnalytics Engine can be!   For the examples, I’ll use results from some analyses using GeoAnalytics Engine in a multi-node cluster in Databricks, and will add in results from a single-node cluster at the end for comparison.

Before I run any operations, I can check the number of core-milliseconds used (reporting via auth_info() or usage() is always in core-milliseconds, even though the overall unit for the Connected license is core-hours), and then as I complete operations, I can see what has been used. In this case, I started at 0, read in a CSV, and the usage is still 0, because reading CSV files is a Spark function, not a GeoAnalytics Engine-specific function.

Reading in a CSV of data does not use any core-hours, as it is a Spark function, not specific to GeoAnalytics Engine. The graphic shows 0 core-milliseconds used
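A sketch of that step (the file path is a placeholder):

```python
# Plain Spark read -- no GeoAnalytics Engine functionality involved,
# so no core-milliseconds are reported for it.
df = spark.read.csv("/path/to/points.csv", header=True, inferSchema=True)

geoanalytics.usage()  # session usage is still 0
```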

After using the ST_Point function from GeoAnalytics Engine to create a point geometry from the latitude and longitude fields in the CSV, the usage is now 3475 core-milliseconds. That translates to less than 0.001 core-hours. Not very much.

After using the ST_Point function to generate a point geometry, the core millisecond usage is up to 3475
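Roughly, that step looks like the sketch below. I’m assuming the geoanalytics.sql.functions module (aliased as ST) exposes the ST_Point pattern as point(), and the column names and 4326 spatial reference are assumptions for illustration:

```python
from geoanalytics.sql import functions as ST

# Create a point geometry column from the longitude/latitude fields
# (column names and spatial reference are assumptions for this sketch).
df = df.withColumn("geometry", ST.point("longitude", "latitude", 4326))
df.show(5)            # evaluating the DataFrame runs the ST_Point work

geoanalytics.usage()  # session usage now includes the ST_Point processing
```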

On the other hand, if I read in an Esri feature service, we would expect the session usage to increase, since reading feature services is GeoAnalytics Engine-specific functionality. Let’s see what happens:

Example of reading in a feature service. The core milliseconds used did not change.
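A sketch of the read, assuming the feature-service data source format (the service URL is a placeholder):

```python
# Feature services are a GeoAnalytics Engine data source.
tracts_df = (
    spark.read.format("feature-service")
         .load("https://services.arcgis.com/.../FeatureServer/0")
)

geoanalytics.usage()  # unchanged -- no action has forced the read yet
```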

After reading the feature service the session usage is still 3475!  This is an important thing to note – Spark uses lazy evaluation, so when we “read” that feature service, we didn’t actually “read” it.  What??

Lazy evaluation is a fundamental concept in Spark: it defers calculations until specific actions (e.g., count, collect, show) take place, rather than running them immediately. This allows Spark to build a more efficient execution plan, since all of the transformations that need to happen can be collected and optimized together before the action triggers them.
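As a generic Spark illustration (nothing GeoAnalytics Engine-specific here):

```python
# Transformations are lazy: this only describes work, nothing runs yet.
filtered = df.filter(df["longitude"].isNotNull())

# Actions trigger execution of the whole accumulated plan.
filtered.count()
```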

This is an important note if you want to explore your core-hour usage before and after running operations.  If you don’t have an action then no core-hours are used since nothing has actually run.  Let’s see what happens if we add in an action – in this case we’ll persist the data (save it in memory) and then get a count of the number of records.  This count action triggers the actual read of the feature service and then persists it in memory, which will make it much faster for subsequent actions.

An example showing the use of persist() to write the data to memory. This uses core-milliseconds since it is a feature service
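Continuing the sketch:

```python
# persist() marks the DataFrame for caching; the count() action forces Spark
# to actually read the feature service (and GeoAnalytics Engine to report usage),
# then keeps the result in memory for subsequent operations.
tracts_df.persist()
tracts_df.count()

geoanalytics.usage()  # session usage now reflects the feature service read
```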

After really reading in the feature service and persisting the data to memory, we’ve used some more core-hours. We’re up to 121,425 core-milliseconds, or ~0.033 core-hours.

Let’s add in a bigger spatial process to explore core-hour usage. For this, we’ll perform a reasonably sized spatial join between about 2.4 million point records and the ~85,000 tracts we just read in from a feature service. First, we’ll read in a file geodatabase, then persist that data frame. That takes our session usage to 151,408 core-milliseconds, or 0.042 core-hours.

Example of reading in a geodatabase and persisting the dataframe.Example of reading in a geodatabase and persisting the dataframe.

After running a spatial join using ST_Within, our total core-millisecond usage for the entire workflow is at 400,174, or 0.111 core-hours. 

Example of performing a spatial join with ST_Within
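A sketch of the join, using the ST_Within predicate via the Python functions module. It assumes points_df was already read from the file geodatabase and persisted, and the geometry column names shown are assumptions:

```python
# points_df: ~2.4 million points from the file geodatabase (persisted earlier)
# tracts_df: ~85,000 tracts from the feature service (persisted earlier)
joined = points_df.join(
    tracts_df,
    ST.within(points_df["geometry"], tracts_df["shape"])  # point within tract
)
joined.count()        # an action, so the join actually runs

geoanalytics.usage()  # cumulative usage for the whole workflow, in core-milliseconds
```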

Comparing core-hour usage in different computing environments

With computing in a cloud environment like Databricks, you have the option to spin up computing resources of varying sizes. How much difference does that make in terms of core-hour consumption? Not much! Core-hours in GeoAnalytics Engine come down to the number of cores multiplied by how long the processing takes on those cores. When you have a larger computing environment with more cores, the processing should be quicker, so when you do the math, the number of core-hours often ends up fairly similar: fewer cores for a longer time vs. more cores for a shorter time.
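Reusing the hypothetical core_hours helper from earlier, here is a made-up illustration of that tradeoff (not measured numbers):

```python
print(core_hours(4, 360_000))   # 4 cores  x 6 minutes of engine time    = 0.40 core-hours
print(core_hours(16, 95_000))   # 16 cores x ~1.6 minutes of engine time ≈ 0.42 core-hours
```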

Let’s look at two different Databricks environments, and, for fun, let’s compare overall usage with the datasets persisted in memory vs. not:

Persist datasets

| Environment | Total time to run notebook (sec) | Time using GeoAnalytics Engine (core-milliseconds) | Time using GeoAnalytics Engine (core-hours) |
|---|---|---|---|
| Databricks single-node | 146.266 | 388,448 | 0.108 |
| Databricks multi-node | 118.137 | 400,174 | 0.111 |

Do not persist datasets

| Environment | Total time to run notebook (sec) | Time using GeoAnalytics Engine (core-milliseconds) | Time using GeoAnalytics Engine (core-hours) |
|---|---|---|---|
| Databricks single-node | 162.703 | 486,476 | 0.135 |
| Databricks multi-node | 116.001 | 456,357 | 0.127 |

For reference, here are the specifications for the two computing clusters that I used:

Databricks single-node

  • 1 Driver: 14 GB memory, 4 cores
  • Runtime: 13.3.x-scala2.12, Spark 3.4.1
  • Photon enabled

Databricks multi-node

  • 2-8 Workers: 28-112 GB memory, 8-32 cores
  • 1 Driver: 14 GB memory, 4 cores
  • Runtime: 13.3.x-scala2.12, Spark 3.4.1
  • Photon enabled

Conclusion

In this blog post we looked at the concept of core-hours in GeoAnalytics Engine and examined how many core-hours were used for some basic processes like reading data and performing spatial joins. We also looked at the functionality in GeoAnalytics Engine for tracking how many core-hours are consumed by various processes so that you can better understand and analyze your usage.

Hopefully, this post has been helpful in understanding how core-hours are tracked in GeoAnalytics Engine and how you can use them to keep tabs on your product usage. We’d love to hear how this technique is useful in analyzing your workflows. If you have questions about this or any other GeoAnalytics Engine tools and functions, please feel free to provide feedback or ask questions in the comments section below.