High-Performance Computing and Analytics on 'Real-time' Geospatial and Upstream Big Data

Blog Post created by hlzhang525 on Dec 16, 2014

Today, data are growing exponentially everywhere, becoming so large and complex that they are difficult to handle effectively with traditional computing and algorithms. This includes geospatial data (GIS, remote sensing) and (upstream) big data.

[Figure: Big Data in the E&P sector]


The challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.


For example, in an oilfield, thousands of oil production wells are equipped with sensors that capture massive amounts of information about well flowing conditions every day. Tens of terabytes of data are collected yearly and stored in data stores, and they must be processed and analyzed effectively.
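At that scale, even simple summaries are best computed incrementally rather than by loading the full history into memory. A minimal single-pass sketch of rolling raw readings up into daily per-well averages (the well IDs, dates, and flow-rate values below are invented for illustration):

```python
from collections import defaultdict

# Hypothetical (well_id, day, flow_rate) sensor readings; in production
# these would stream from the field data store, not a list literal.
readings = [
    ("W-001", "2014-12-01", 820.5),
    ("W-001", "2014-12-01", 798.2),
    ("W-002", "2014-12-01", 1043.7),
    ("W-001", "2014-12-02", 811.9),
]

def daily_averages(rows):
    """Aggregate raw readings into one average flow rate per well per day,
    in a single pass, keeping only running (sum, count) accumulators."""
    sums = defaultdict(lambda: [0.0, 0])
    for well, day, rate in rows:
        acc = sums[(well, day)]
        acc[0] += rate
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

averages = daily_averages(readings)
```

The same accumulate-then-finalize shape scales out naturally, since partial (sum, count) pairs from different nodes can be merged before the final division.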

Similarly, in the geospatial domain, when dealing with hundreds of gigabytes to terabytes of high-resolution 'continuous' image data, most CPU-based algorithms in research and development do not have sufficient computing power to perform traditional image-processing tasks in a timely ('on-demand' or real-time) fashion, even though recent developments in GIS and remote sensing can handle real-time 'discrete' event-based data, signals, and networks, for example via ArcGIS GeoEvent and the Esri GIS Tools for Hadoop. In operation, therefore, the processing power of typical CPU desktop workstations can become a severe bottleneck when viewing and enhancing high-resolution image data, for tasks such as color balancing, bundle adjustment, and real-time detection of changes and oil spills.
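As a toy illustration of one such task, here is a gray-world color-balancing sketch in pure Python: each RGB channel is scaled so that its mean matches the overall gray mean. A real pipeline would apply this to large rasters with parallel or GPU code; the pixel values here are invented, and this is only one of several balancing approaches.

```python
def gray_world_balance(pixels):
    """Scale each RGB channel so its mean equals the overall gray mean
    (the classic gray-world assumption), clipping at 255."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    gains = [gray / m if m else 1.0 for m in means]
    return [
        tuple(min(255.0, p[c] * gains[c]) for c in range(3))
        for p in pixels
    ]

# A reddish two-pixel 'image': the red channel mean (150) is double
# the green and blue means (75), so red is attenuated and G/B boosted.
balanced = gray_world_balance([(200, 100, 100), (100, 50, 50)])
```

Because every pixel gets the same per-channel gain, the operation is embarrassingly parallel, which is exactly why such tasks map well onto GPUs and clusters.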

All of these data applications require timely responses for swift decisions, which depend on real-time or near-real-time performance of analysis algorithms and imagery processing (such as color balancing) in advanced IT environments (e.g., GPU cluster computing, 'cloud' computing, and MapReduce-enabled applications).
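The MapReduce pattern mentioned above can be sketched in a few lines: partitions of well data are mapped to partial results in parallel, then reduced to a single answer. In this stdlib-only sketch, threads stand in for cluster workers, and the flow-rate partitions and the 500-unit low-flow threshold are invented for illustration; a real job would distribute the partitions across Hadoop nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical per-well daily flow-rate partitions.
partitions = [
    [980.0, 1020.5, 450.2],
    [300.1, 1500.9],
    [870.3, 920.8, 15.0],
]

def map_low_flow(partition, threshold=500.0):
    """Map step: count readings below the threshold in one partition."""
    return sum(1 for rate in partition if rate < threshold)

def reduce_counts(a, b):
    """Reduce step: combine partial counts from the workers."""
    return a + b

# Each partition is mapped independently (and so can run anywhere);
# the reduce step then folds the partial counts together.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_low_flow, partitions))
total_low = reduce(reduce_counts, partial_counts, 0)
```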

These on-demand systems and applications can benefit greatly from high-performance computing techniques and practices, which speed up data processing and visualization either after the data have been collected and transmitted to a ground station on Earth, or in real time during data collection onboard the sensor.


Parallel and distributed computing facilities and algorithms, as well as high-performance FPGA and DSP systems, have become indispensable tools for processing massive streams of GIS, remote sensing, and upstream big data (below). To make proper use of these facilities and algorithms, some solution vendors offer tools such as SAS® Grid Manager (SAS Grid Manager | SAS).

In recent years, GPUs have evolved into highly parallel many-core processors whose tremendous computing power and high memory bandwidth can offer two to three orders of magnitude speedup over CPUs on suitable workloads. A cost-effective GPU computer has become an affordable alternative to an expensive CPU cluster for many engineers and researchers in various engineering and scientific applications. See Comparison of Laptop Graphics Cards - NotebookCheck.net Tech.


In operation, many advanced high-performance big data computing algorithms in SAS and R have been successfully applied to oil-production data analytics. However, research and solution development in the remote sensing community still face challenges in meeting operational requirements for real-time or MapReduce-enabled applications, especially automation of color balancing and bundle adjustment, as well as real-time detection of changes and oil spills.


In practice, we should use effective tools to monitor, manage, and diagnose 'high availability' computing performance, especially for systems that present themselves as a single computing system, such as parallel processing and (load-balancing) cluster computing. As is well known, these high-performance architectures usually behave as a single computing system comprising N (CPU-GPU) nodes, shared memory, and/or virtual machines, and thus perform quite differently from grid computing and cloud computing.


For example, the simplest task is to monitor how well (or poorly) the geoprocessing tools in ArcGIS 10.3 and ArcGIS Pro 1.0 use multiple cores and processors.
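A crude stdlib-only way to check whether work keeps multiple cores busy is to compare CPU time against wall-clock time (the `cpu_vs_wall` helper below is our own sketch, not an ArcGIS API). One caveat: `time.process_time` only counts the current process and its threads, so for a geoprocessing tool that spawns separate worker processes you would instead watch per-core load in an OS monitor such as Task Manager or top.

```python
import os
import time

def cpu_vs_wall(task, *args):
    """Run task once and return (result, wall_seconds, cpu_seconds).
    A CPU/wall ratio well above 1 suggests the call kept several
    cores busy; a ratio near 1 suggests single-threaded execution."""
    wall0 = time.perf_counter()
    cpu0 = time.process_time()
    result = task(*args)
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return result, wall, cpu

cores = os.cpu_count() or 1
# Any CPU-bound call can be profiled this way; summing a range is a
# stand-in for a real geoprocessing operation.
total, wall, cpu = cpu_vs_wall(sum, range(1_000_000))
```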




Linux and CUDA-enabled GPU Computing

CUDA® is a parallel computing platform and programming model invented by NVIDIA. It enables dramatic increases in computing performance by harnessing the power of the graphics processing unit (GPU).

Refer to Getting Started Linux :: CUDA Toolkit Documentation


Windows and CUDA-enabled GPU Computing


Refer to Getting Started Windows :: CUDA Toolkit Documentation


WebGL and the implementation of MPEG-DASH (a streaming video standard that has been slowly picking up steam among industry players) in IE11


Microsoft (Finally) Confirms WebGL Support For Internet Explorer 11 | TechCrunch