Getting Started with Distributed Processing in ArcGIS Reality Studio

FelixRohrbach · 02-24-2025

Distributed processing within Reality Studio was designed with a focus on simplicity, robustness and productivity. It allows you to maximize the usage of your IT infrastructure and quickly react to changes in your project timelines. In this blog post, we will show you how easy it is to get started.

By distributing the workload across multiple machines, you gain several benefits:

Robustness: Our system remains stable, even if a single computer fails, ensuring uninterrupted processing.
Speed: Distributed processing significantly reduces time to completion by utilizing the combined power of several processing nodes.
Productivity: By queueing up multiple projects, processing nodes will continue to process around the clock.

This article will guide you through the essential steps to get started with distributed processing. Before continuing though, please ensure that you have correctly configured each of your processing nodes by completing the following steps:

Install and license ArcGIS Reality Studio on processing nodes that follow our hardware requirements
Configure your network environment so that identical paths resolve to the same locations on all machines
Define a local temporary processing location with sufficient disk space
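
The last two points can be sanity-checked with a short script before you connect a node. The following is a minimal sketch using only the Python standard library. It is not part of Reality Studio, and the function name `preflight`, the example paths, and the free-space threshold are placeholders chosen for illustration.

```python
import os
import shutil
from pathlib import Path


def preflight(shared_paths, temp_dir, min_free_gb):
    """Return a list of problems found; an empty list means the node looks ready."""
    problems = []

    # Every shared location must be reachable under the same path on this machine.
    for p in shared_paths:
        if not Path(p).is_dir():
            problems.append(f"shared path not reachable: {p}")

    # The local temporary processing location must exist, be writable,
    # and have enough free space.
    temp = Path(temp_dir)
    if not temp.is_dir():
        problems.append(f"temporary location missing: {temp}")
    else:
        if not os.access(temp, os.W_OK):
            problems.append(f"temporary location not writable: {temp}")
        free_gb = shutil.disk_usage(temp).free / 1024**3
        if free_gb < min_free_gb:
            problems.append(
                f"only {free_gb:.1f} GB free in {temp}, need {min_free_gb} GB"
            )
    return problems


# Example call (hypothetical paths and threshold, adjust to your environment):
#   for problem in preflight([r"\\fileserver\reality\projects"], r"D:\rs_temp", 500):
#       print(problem)
```

Running the same script with the same arguments on every node is a quick way to confirm that identical paths really do resolve everywhere.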

Here are some resources that help you with your basic setup:

Creating a Workspace

Creating a workspace in ArcGIS Reality Studio is straightforward. A workspace is the central hub where all processing tasks and data are stored. Technically, it is simply a folder at a centralized location that all contributing nodes can access. Because of this, it is trivial to create multiple workspaces and have separate clusters working in parallel.

To create a workspace:

Create a new folder at a location that is accessible to all processing nodes
Done! (see how simple this is?!)

There is no extra software needed, no complicated configuration... we just need a basic folder, and your workspace is ready for use.
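
If you prefer to script this step, the sketch below creates the folder and confirms that the current machine can write to it. It assumes nothing beyond the Python standard library. The function name `create_workspace`, the probe file name, and the example path are made up for illustration and are not Reality Studio conventions.

```python
from pathlib import Path


def create_workspace(location):
    """Create the workspace folder if needed and verify that it is writable."""
    workspace = Path(location)
    workspace.mkdir(parents=True, exist_ok=True)

    # Write and remove a small probe file so that permission problems
    # surface now rather than when the first job is submitted.
    probe = workspace / ".write_probe"
    probe.write_text("ok")
    probe.unlink()
    return workspace


# Example call (hypothetical UNC path):
#   create_workspace(r"\\fileserver\reality\workspace")
```

The write probe only proves access from the machine that runs the script, so it is worth repeating the check from each processing node.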

Workspace Strategies

Thanks to this simple process, there are no limits to your strategy for using workspaces. Some examples are:

A single centralized workspace for all your projects: straightforward and probably recommended for most organizations to maximize productivity.
Priority workspaces: set up separate workspaces (e.g., low-priority / high-priority) and divide your processing nodes between them. The more nodes you assign to a workspace, the higher the throughput of its projects.
Versioned workspaces: workspaces need to be compatible with the software version you are using. When upgrading, it can make sense to maintain a workspace to finish old projects, while others already change to a newer version.

Important Considerations

There are certain key factors to keep in mind:

The location of the workspace needs to handle the resulting I/O traffic generated by processing nodes connecting to it.
Workspaces also contain the results of your processing. It is crucial to ensure that there is sufficient storage space at the location provided.
Moving a reconstruction from one workspace to another will reset its progress. Be mindful of this when submitting reconstructions to a workspace.

Connecting Processing Nodes to a Workspace

With the workspace set up, it is now time to connect each of the processing nodes to it. As we have already installed the software, it only takes a few clicks to start Reality Studio and connect each of the machines to the workspace you created:

Start the ArcGIS Reality Studio application and from the start page select Distributed Processing.
Verify that the temporary processing location has been set correctly and that there is sufficient free disk space for processing.
Click the Contribute to a Workspace button at the top, select the folder you created previously, and click OK.

That's it! Reality Studio will now automatically connect to the workspace and look for available tasks to process. Repeat these steps for all your remaining processing nodes.

Don't worry if processing does not start immediately at this stage. First, we need to submit some jobs to this new workspace...

Submitting Jobs to a Workspace

Most of the time you will be working with Reality Studio on your personal workstation. When creating new reconstructions, you can now select the workspace you just created when defining your reconstruction settings. Make sure to finish with Create and Submit to add this reconstruction to the job queue of the workspace.

If you have created the reconstruction within your Reality Studio project without submitting it, you will find the option to submit it in the Reconstruction ribbon. This is also where you can cancel a reconstruction, which stops all processing and withdraws the reconstruction job from the processing queue. Canceling preserves the progress of your reconstruction, so you can resubmit it at any time to resume processing. A resubmitted job is placed at the end of the job queue.
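
To make the cancel and resubmit behavior concrete, here is a small toy model of a job queue. It is only a sketch of the behavior described above, not Reality Studio's actual implementation. The class `JobQueue` and the way progress is stored as a fraction are assumptions made for illustration.

```python
from collections import deque


class JobQueue:
    """Toy model: canceling keeps progress, resubmitting goes to the back."""

    def __init__(self):
        self._queue = deque()
        self.progress = {}  # job name -> completed fraction (hypothetical)

    def submit(self, job):
        if job in self._queue:
            raise ValueError(f"{job} is already queued")
        # Keep any progress recorded before a cancel; new jobs start at 0.
        self.progress.setdefault(job, 0.0)
        self._queue.append(job)

    def cancel(self, job):
        # Withdraw the job from the queue; its progress entry is left untouched.
        self._queue.remove(job)

    def order(self):
        return list(self._queue)
```

Submitting jobs A, B, and C, canceling A, and submitting A again yields the order B, C, A, with A's recorded progress intact.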

Distributed Processing Logic

As you submit reconstruction jobs to your workspace, your processing nodes will automatically pick up tasks for processing. When searching for a task, processing nodes will go through the list of jobs in your queue to check for available tasks. If a task is available in the first job, the processing node will self-assign and begin processing. If there are no tasks available, the node will move on to the next job in the queue until it finds a job with available tasks.
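
The search described above boils down to "take the first available task from the first job that has one". A minimal sketch of that rule follows. It assumes the queue can be modeled as an ordered list, and the names `Job` and `claim_next_task` are illustrative, not part of the product.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Job:
    name: str
    available: List[str] = field(default_factory=list)  # unclaimed task names


def claim_next_task(queue: List[Job]) -> Optional[Tuple[str, str]]:
    """Walk the queue in order and claim the first available task, if any."""
    for job in queue:
        if job.available:
            # Self-assign: remove the task so that no other node picks it.
            return job.name, job.available.pop(0)
    return None  # nothing to do right now, the node stays idle
```

Because every node always starts at the top of the queue, earlier jobs are favored whenever they have tasks available, and later jobs fill the gaps.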

It's important to understand that task availability within a job changes throughout the process. Reconstructions are completed in stages, each consisting of an Analysis task, one or more Processing tasks, and a Finalization task. Analysis and Finalization tasks are not distributed and there will therefore be no other tasks available within the same job while these are being executed. As a result, processing nodes will then grab processing tasks from other reconstruction jobs to avoid idling.
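
The way task availability changes within one stage can be pictured as a small state machine. The sketch below is a simplified model under two assumptions that the text does not spell out: Finalization only becomes available once every Processing task has finished, and the number of Processing tasks is known when Analysis completes. The names `Phase` and `Stage` are made up for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Phase(Enum):
    ANALYSIS = "analysis"
    PROCESSING = "processing"
    FINALIZATION = "finalization"
    DONE = "done"


@dataclass
class Stage:
    phase: Phase = Phase.ANALYSIS
    unclaimed: int = 1  # tasks waiting for a node in the current phase
    running: int = 0    # tasks currently claimed by a node

    def claim(self) -> bool:
        """A node tries to take one task; False means nothing is available."""
        if self.unclaimed == 0:
            return False
        self.unclaimed -= 1
        self.running += 1
        return True

    def finish(self, generated: int = 0) -> None:
        """A node reports a finished task; `generated` matters only for Analysis."""
        if self.running == 0:
            raise RuntimeError("no task is running")
        self.running -= 1
        if self.unclaimed or self.running:
            return  # the current phase still has outstanding work
        if self.phase is Phase.ANALYSIS:
            if generated < 1:
                raise ValueError("Analysis must generate at least one task")
            self.phase, self.unclaimed = Phase.PROCESSING, generated
        elif self.phase is Phase.PROCESSING:
            self.phase, self.unclaimed = Phase.FINALIZATION, 1
        else:
            self.phase = Phase.DONE
```

During the Analysis and Finalization phases a second `claim()` returns `False`, which is exactly the moment at which an idle node would look at the next job in the queue.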

Example:
You have connected 4 processing nodes to your workspace. From your workstation, you submit the first reconstruction job. The first task that becomes available is the Analysis task of the Image Preparation stage. The first processing node (Node 1) will self-assign to this task and start processing. The other 3 nodes will remain idle as there are no other tasks available.
You immediately submit a second reconstruction job to your workspace. Your processing nodes notice this second job and the available Analysis task of the Image Preparation stage for this reconstruction. A second node (Node 2) will now become active, self-assign to this task, and start processing. Nodes 3 and 4 remain idle as there are no other tasks available.
Node 1 finishes the Analysis task of the first reconstruction, generating 20 additional tasks for the Processing phase of this stage. With these tasks now available within the first reconstruction job, the available nodes (Nodes 1, 3, and 4) will automatically grab one of these tasks each and start processing.
Node 2 finishes the Analysis task of the second reconstruction, generating 25 additional tasks. Because nodes check the queue from the top, Node 2 first looks at the first reconstruction, finds tasks still available there, grabs one, and starts processing it. The 25 tasks generated for the second reconstruction remain available, and nodes will select them once no tasks are left in the first reconstruction.

While processing, a node keeps refreshing its claim on a task until processing is finished. If processing fails, the task is released and returned to the workspace in an unprocessed state. The processing node will then look for other tasks to process, ignoring the one it just failed. A node can also fail to renew its claim on an ongoing task, for example, if the node itself crashes or is restarted. In that case the claim times out, and the task automatically becomes available again for other nodes to grab.
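
This claim-and-renew pattern is commonly implemented as a lease: a claim is only valid as long as it keeps being refreshed. The sketch below illustrates the general idea. It does not describe how Reality Studio stores claims, and the 60-second lease length as well as the names `TaskClaim` and `LEASE_SECONDS` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 60.0  # hypothetical timeout, the real value is not documented here


@dataclass
class TaskClaim:
    owner: Optional[str] = None
    renewed_at: float = 0.0

    def is_free(self, now: float) -> bool:
        # Free if nobody holds the task or the holder stopped renewing in time.
        return self.owner is None or now - self.renewed_at > LEASE_SECONDS

    def claim(self, node: str, now: float) -> bool:
        if not self.is_free(now):
            return False
        self.owner, self.renewed_at = node, now
        return True

    def renew(self, node: str, now: float) -> bool:
        # Only the current owner can renew, and only before the lease expires.
        if self.owner != node or now - self.renewed_at > LEASE_SECONDS:
            return False
        self.renewed_at = now
        return True

    def release(self, node: str) -> None:
        # Called when processing fails: hand the task back unprocessed.
        if self.owner == node:
            self.owner = None
```

In a real system `now` would come from a shared or monotonic clock such as `time.monotonic()`. Passing it in explicitly keeps the example deterministic. A crashed node simply stops calling `renew`, so no cleanup step is needed for its task to return to the pool.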

With this logic, your processing nodes will keep processing as long as there are tasks available. Failed tasks are automatically retried, ensuring a robust processing environment. Keep submitting fresh jobs to the workspace to ensure there is always something for your nodes to process.

Monitoring your Workspace Activity

With an increasing number of processing nodes connected and jobs submitted to a workspace, it's important to monitor your workspace to ensure everything is working as expected.

Workspace Monitor

The workspace monitor in Reality Studio provides a holistic overview of your workspace. By selecting the respective tab, you can choose between visualizing connected Nodes or the Jobs currently in the queue.

The Nodes view lists each individual processing node with its current status, the processing progress, and the name of the job it is working on. This view allows you to check the following:

How many nodes are currently contributing to this workspace?
Are all my nodes actively processing, or are there nodes that are idling?
Are there nodes that are not responding? (This could indicate an abnormal exit caused by a hardware crash, machine reboot, or similar issues.)

From the Jobs tab, you get an overview of all submitted projects. The table provides details on the current processing status and progress of each job, as well as the number of processing nodes working on each reconstruction. This view helps you answer the following questions:

Are there sufficient reconstruction jobs in my queue to ensure continuous processing by my nodes?
How far along is the processing of my reconstruction jobs?
Are there jobs where task processing has failed? (Indicated by a warning icon.)

To investigate a single job, select the job from the list and open the Job Monitor.

Job Monitor

The job monitor presents all details related to a single reconstruction job. The main part is a table that lists all the tasks generated for this reconstruction. This allows you to investigate the following:

Are there pending processing tasks in this reconstruction?
Are there tasks that failed or keep failing? (Indicated by a warning icon.)
What causes this task to fail? (Revealed by hovering over the status of a task.)
Has this reconstruction been reprocessed already? (Indicated by the number of iterations.)

Based on this information, you can then take measures if needed to further investigate issues within your processing environment.

Summary

Distributed processing in ArcGIS Reality Studio revolutionizes the way you handle reality mapping projects. By leveraging multiple machines to share the workload, you can significantly enhance both speed and productivity. Our robust distributed processing ensures that even if one machine fails, the other nodes will continue working without interruption. Additionally, the ability to queue multiple projects keeps your resources fully utilized, maximizing productivity. Embrace this powerful approach to streamline your workflows and achieve faster, more reliable results in your reality mapping projects.