When processing airborne data in ArcGIS Reality Studio, users commonly face two challenges: handling very large volumes of data and coping with the long processing times needed to generate the desired output products.
One way to address this is to spread the processing across multiple machines, using the distributed processing capabilities Reality Studio offers. Most customers use centralized network storage to make data available to all processing nodes. That said, distributed processing is also possible for users who have integrated RAID storage into their workstations. This requires some consideration, and this blog post describes best practices for working with such hardware.
RAID storage technology combines multiple disks into a single storage volume. Depending on the RAID level used, this can improve performance and add protection against data loss. It also allows creating storage volumes that are larger than any single physical disk on the market.
Data transfer is handled by a RAID controller, and both the controller and the disks are commonly integrated into the workstation. The workstation then recognizes the combined disks as a single drive.
Combining such RAID storage with powerful processing components can provide a good single-node processing environment for aerial imagery data.
Moving from processing on a single workstation (or multiple independent workstations) to distributing processing across multiple machines requires attention to three points: consistent data access across machines, fast data access, and data transfer over the network.
When processing in a distributed environment, Reality Studio machines expect data to be accessible from the same path. Paths can be defined by drive letters, network shares mapped as local drive letters, UNC paths of network shares, or even cloud resources. The important thing is that data access is consistent across all machines: entering the same path on any machine must give access to the same file.
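The consistency requirement can be checked on paper before any processing starts. The following is a minimal sketch, not part of Reality Studio: it assumes two hypothetical nodes, "Node1" and "Node2", whose local RAID drives are both "D:" and shared under the assumed share name "RAID". The drive tables and helper names are illustrative only.

```python
from pathlib import PureWindowsPath

# Hypothetical drive tables: the storage each drive letter points to on each node.
# "D:" is the local RAID on both machines, so it names different storage on each.
DRIVE_TABLES = {
    "Node1": {"D:": r"\\Node1\RAID"},
    "Node2": {"D:": r"\\Node2\RAID"},
}

def resolve(path, drive_table):
    """Return the storage location a path refers to on one node, or None."""
    p = PureWindowsPath(path)
    if p.drive.startswith("\\\\"):
        # UNC paths name the same location on every machine.
        return str(p).lower()
    target = drive_table.get(p.drive.upper())
    if target is None:
        # Unknown drive letter or relative path: not resolvable on this node.
        return None
    return "\\".join([target, *p.parts[1:]]).lower()

def same_on_all_nodes(path, drive_tables):
    """True if the path resolves to one and the same location on every node."""
    targets = {resolve(path, table) for table in drive_tables.values()}
    return len(targets) == 1 and None not in targets

print(same_on_all_nodes(r"\\Node1\RAID\Projects\img_0001.tif", DRIVE_TABLES))
print(same_on_all_nodes(r"D:\Projects\img_0001.tif", DRIVE_TABLES))
```

The UNC path passes the check, while the "D:" path fails because it points to a different RAID on each node.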
The three solutions below all provide the required data access. They are listed in order of the configuration effort needed to set them up. Solution 3 is highly recommended because it greatly reduces the potential for user mistakes when working with Reality Studio.
Example scenario: Jack has two workstations at the office, each containing a local RAID drive for data storage. He wants to use the storage of both RAID drives for his distributed processing with Reality Studio.
Solution 1: Share the RAID drives of both workstations to make them available throughout the network. Once they are shared, you can use UNC paths when working with Reality Studio.
With UNC paths to the RAID drives, any machine can access the same data by entering the same path. The downside is that the user has to remember to use UNC paths throughout the workflow and might be tempted to select files and folders directly from the local "D:" drive.
Workstations with shared RAID drives (blue) that can be accessed using their UNC paths
Example:
Jack's machines "Node1" and "Node2" both have their RAIDs set up as "D:". To configure this solution, he shares the "D:" drive on each machine so that both RAIDs can be reached through their UNC paths.
When working within Reality Studio, he always uses these UNC paths to access the data instead of directly clicking on the drives when selecting files and folders.
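To make the habit concrete, here is a small sketch of how a local path translates into its UNC equivalent. It assumes that the whole "D:" drive is shared under the hypothetical share name "RAID"; the actual share name depends on how the drive was shared.

```python
from pathlib import PureWindowsPath

def to_unc(local_path, host, share, shared_root="D:\\"):
    """Rewrite a path under the shared local root as a UNC path.

    host and share are assumptions: the machine name and the share name
    chosen when the RAID drive was shared.
    """
    # relative_to raises ValueError if the path is not under shared_root.
    rel = PureWindowsPath(local_path).relative_to(PureWindowsPath(shared_root))
    return "\\".join(["\\\\" + host, share, *rel.parts])

print(to_unc(r"D:\Projects\Flight01\img_0001.tif", "Node1", "RAID"))
```

A path outside the shared drive, such as one on "C:", raises a `ValueError` instead of silently producing a path that other nodes cannot reach.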
Solution 2: Accessing shared drives can be simplified by mapping them to a drive letter. To ensure that file paths remain identical across machines, the mapped drives must use the same letters on every machine.
Share the RAID drives of both machines to make them available throughout the network. Then map each shared drive on all machines, using the same drive letter for the same share everywhere.
With each RAID drive mapped to a drive letter, the drives are always visible when browsing for files or specifying folder locations. This simplifies access to the data on the RAID drives of the different machines, but the risk remains that a user will accidentally access data directly from the local "D:" drive.
Setup with additional mapped drives (gray) that point to the RAID drives (blue) on workstations
Example:
Setting up this solution is very similar to solution 1. Jack shares the RAID drives of both machines and then maps each share on both machines, using the same drive letters ("F:" and "G:") everywhere.
This simplifies his work within Reality Studio. He no longer has to enter the UNC paths, but can directly use the mapped drives "F:" and "G:" to access the data. He still needs to make sure he always works with the mapped drives rather than the local drive.
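On Windows, a mapping like this can be created with the `net use` command. The sketch below only builds the command lines; it assumes that "F:" points to the share on "Node1" and "G:" to the share on "Node2", and that both shares are named "RAID". These assignments are illustrative and not taken from Jack's actual setup.

```python
# Hypothetical mapping: the same letters point to the same shares on every node.
MAPPINGS = {
    "F:": r"\\Node1\RAID",
    "G:": r"\\Node2\RAID",
}

def net_use_commands(mappings):
    """Build one 'net use' command per mapped drive, in drive letter order."""
    return [
        f"net use {letter} {unc} /persistent:yes"
        for letter, unc in sorted(mappings.items())
    ]

# The identical list of commands is run on Node1 and on Node2.
for command in net_use_commands(MAPPINGS):
    print(command)
```

Because the same commands run on every machine, the drive letters cannot drift apart between nodes.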
Solution 3: Change the drive letters of the RAIDs so that each letter is unique across machines. Share the drives, and on each machine map the remote drives so that every drive letter points to the same storage everywhere. This way a node always finds a file or folder at the same path. Sometimes that path leads to the local physical RAID drive, and other times it leads through the network to a remote RAID drive. There is no longer any risk of picking files or folders from the wrong drive.
Hybrid setup with physical RAID drives (blue) and mapped network drives (gray)
Example: Jack wants to avoid picking files from the "D:" drive, as he knows that this will lead to processing errors. With one simple additional step, changing the RAID drive letters so that each is unique across his machines, he can ensure correct access to all drives.
With this setup, Jack can browse these drives on both machines to access files and folders. During processing, Reality Studio uses the drive letters. Depending on which machine executes the processing task, a drive letter points either to the physical RAID drive or to the mapped drive that leads to the RAID drive on the other workstation.
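The hybrid layout can be written down as a per-node plan. The following sketch assumes hypothetical drive letters "E:" (RAID in "Node1") and "F:" (RAID in "Node2") and the hypothetical share name "RAID"; on the machine that owns a RAID the letter is assigned to the physical drive, and on the other machine it is mapped with `net use`.

```python
# Hypothetical layout: each RAID gets a drive letter that is unique across machines.
RAIDS = {
    "E:": ("Node1", r"\\Node1\RAID"),
    "F:": ("Node2", r"\\Node2\RAID"),
}

def setup_plan(node, raids):
    """List the action needed for each drive letter on one node."""
    plan = []
    for letter, (owner, unc) in sorted(raids.items()):
        if owner == node:
            # The RAID is physically in this machine: use the letter directly.
            plan.append(f"{letter} local RAID (assign this letter, share as {unc})")
        else:
            # The RAID is in another machine: map its share to the same letter.
            plan.append(f"net use {letter} {unc} /persistent:yes")
    return plan

for node in ("Node1", "Node2"):
    print(node, setup_plan(node, RAIDS))
```

Every letter then exists exactly once per machine, so there is no second path to the same data that a user could pick by mistake.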
Fast data access requires no special consideration when moving to a distributed processing setup. In both cases (local and distributed), data access can be sped up by using SSDs and by paying attention to the input/output speeds of the disks used.
An essential part of distributed processing is that data is transferred from/to processing machines across the network. Multiple machines will likely access data stored on the RAID drives when processing the same project. It is therefore recommended to select high-grade network cards with high transfer speeds.
For regular processing nodes, a 1 Gbit/s network connection will normally suffice. For a NAS or for workstations with RAID drives that serve data to multiple processing nodes, we recommend investing in 10 Gbit/s connections, as this greatly improves data transfer between machines.
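A rough back-of-the-envelope calculation shows why the faster link matters on the machines that serve data. The sketch below assumes a hypothetical 500 GB image set and computes the transfer time at the nominal line rate only; protocol overhead, disk speed, and concurrent access by several nodes are ignored, so real transfers will take longer.

```python
def transfer_seconds(size_gb, link_gbit_per_s):
    """Ideal transfer time for size_gb (decimal gigabytes) over a network link."""
    bits = size_gb * 8e9  # 1 GB = 8e9 bits
    return bits / (link_gbit_per_s * 1e9)

dataset_gb = 500  # hypothetical project size
for link in (1, 10):
    minutes = transfer_seconds(dataset_gb, link) / 60
    print(f"{link} Gbit/s: about {minutes:.0f} minutes")
```

Under these assumptions, the same data that occupies a 1 Gbit/s link for more than an hour moves in a few minutes over 10 Gbit/s.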
Distributed processing with RAID drives is natively supported by Reality Studio and can be simplified and optimized by considering the points above.
If you have questions or need help configuring your processing environment, please reach out to your Esri Distributor, Esri Support, or leave a comment below.