JSON Data Structures - Working with Hierarchy and Multicardinality

RJSunderman · ‎07-24-2018

When speaking with customers who want to get started with ArcGIS GeoEvent Server, I'm often asked if GeoEvent Server has an input connector for a specific data vendor or type of device. My answer is almost always that we prefer to integrate via REST and the question you should be asking is: "Does the vendor or device offer a RESTful API whose endpoints a GeoEvent Server input can be configured to query?"

Ideally, you want to be able to answer two integration questions:

How is the data being sent to a GeoEvent Server input?
How is the data formatted; what does the data's structure look like?

For example, an input can be configured to accept data sent to a GeoEvent Server hosted REST endpoint. That answers the first question - integration will occur via REST with the vendor sending data as an HTTP/POST request to a GeoEvent Server endpoint. The second question, how is the data formatted, is the focus of this blog.

What does a typical JSON data record look like?

Typically, when a data vendor sends event data formatted as JSON, there will be multiple event records organized within a list such as this:

{
    "items": [{
                  "id": 3201,
                  "status": "",
                  "calibrated": 1521135120000,
                  "location": {
                         "latitude": -117.125,
                         "longitude": 31.125
                  }
           },
           {
                  "id": 5416,
                  "status": "offline",
                  "calibrated": 1521638100000,
                  "location": {
                         "latitude": -113.325,
                         "longitude": 33.325
                  }
           },
           {
                  "id": 9823,
                  "status": "error",
                  "calibrated": 1522291320000,
                  "location": {
                         "latitude": -111.625,
                         "longitude": 35.625
                  }
           }
    ]
}‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

There are three elements, or objects, in the block of JSON data illustrated above. It would be natural to think of each element as an event record with its own "id", "status", and "location". Each event record also has a date/time the item was last "calibrated" (expressed as an epoch long integer in milliseconds).

What do we mean when we refer to a "multi-cardinal" JSON structure?

The JSON data illustrated above is multi-cardinal because the data has been organized within an array. We say the data structure is multi-cardinal because its cardinality, in a mathematical sense of the number of elements in a group, is more than one. The array is enclosed within a pair of square brackets: "items": [ ... ]

If the array were a list of simple integers the data would look something like: "values": [ 1, 3, 5, 7, 9 ]

The data elements in the illustration above are not simple integers. Each item is bracketed within curl-braces which is how JSON identifies an object. For GeoEvent Server, it is important that both the array have a name and that each object within the array have a homogeneous structure, meaning that every event record should, generally speaking, use a common schema or collection of name/value pairs to communicate the item's data.

What do we mean when we refer to a "hierarchical" JSON structure?

The data elements in the array are themselves hierarchical. Values associated with "id", "status", and "calibrated" are simple numeric, string, or Boolean values. The "location" value, on the other hand, is an object which encapsulates two child values -- "latitude" and "longitude". Because "location" organizes its data within a sub-structure the overall structure of each data element in the array is considered hierarchical.

It should be noted that the coordinate values within the "location" sub-structure can be used to create a point geometry, but "location" itself is not a geometry. This is evident by examining how a GeoEvent Definition is used to represent the data contained in the illustrated block of JSON.

Different ways of looking viewing this data using a GeoEvent Definition

In GeoEvent Server, if you were to configure a new Receive JSON on a REST Endpoint input, leaving the JSON Object Name property unspecified, selecting to have an GeoEvent Definition created for you, and specifying that the inbound adapter not attempt to construct a geometry from received attribute values, the GeoEvent Definition created would match the one illustrated below:

GeoEvent Definition

Notice the cardinality of "items" is specified as Many (the infinity sign simply means "more than one"). Also, when the block of JSON data illustrated above is sent to the input via HTTP/POST, the input's event count only increments by one, indicating that only one event record was received.

Also notice that, in this configuration, "items" is a Group element type. This implies that in addition to the structure being multi-cardinal, it's also organized as a group of elements, which in JSON is typically an array.

Finally, notice that the "location" is also a Group element type. The cardinality of "location", however, is One not Many. This tells you that the value is a single element, not an array of elements or values.

Accessing data values

Working with the structure specified in the GeoEvent Definition illustrated above, if you wanted to access the coordinate values for "latitude" or "longitude" you would have to specify which latitude and longitude you wanted. Remember, the data was received as a single event record and "items" is a list or array of elements. Each element in the array has its own set of coordinate values. Consider the following expressions:

items[2].location.longitude

items[2].location.latitude

The expressions above specify that the third element in the "items" list is the one in which you are interested. You cannot refer to items.location.latitude because you have not specified an index to select one of the three elements in the "items" array. The array's index is zero-based, which means the first item is at index 0, the second is at index 1, and so on.

Ingesting this data as a single event record is probably not what you would want to do. It is unlikely that an arbitrary choice to use the third element's coordinates, rather than the first or second element in the list, would appropriately represent the items in the list. These three items have significantly different geographic locations, so we should find a way to ingest them as three separate event records.

Re-configuring the data ingest

When I first mentioned configuring a Receive JSON on a REST Endpoint input to allow the illustrated block of JSON to be ingested into GeoEvent Server for processing, I indicated that the JSON Object Name property should be left unspecified. This was done to support a discussion of the data's structure.

If the illustrated JSON data were representative of data you wanted to ingest, you should specify an explicit value for the JSON Object Name parameter when configuring the GeoEvent Server input. In this case, you would specify "items" as the root node of the data structure.

Specifying "items" as the JSON Object Name tells the input to handle the data as an array of values and to ingest each item from the array as its own event record. If you make this change to our input, and delete the GeoEvent Definition it created the last time the JSON data was received, you will get a slightly different GeoEvent Definition generated as illustrated below:

GeoEvent Definition

The first thing you should notice, when the illustrated block of JSON data is sent to the input, is the input's event count increments by three -- indicating that three event records were received by GeoEvent Server. Looking at the new GeoEvent Definition, notice there is no attribute named "items" -- the elements in the array have been split out so that the event records could be ingested separately. Also notice the cardinality of each of the event record attributes is now One. There are no lists or arrays of multiple elements in the structure specified by this GeoEvent Definition. The "location" is still a Group which is fine; each event record should have (one) location and the coordinate values can legitimately be organized as children within a sub-structure.

The updates to the structure specified in the GeoEvent Definition change how the coordinate values are accessed. Now that the event records have been separated, you can access each record's attributes without specifying one of several element indices to select an element from a list.

You should now be ready to re-configure the input to construct a geometry as well as make some minor updates to the data types of each attribute in the GeoEvent Definition in order to handle "id" as a Long and "calibrated" as a Date. You also need to add a new field of type Geometry to the GeoEvent Definition to hold the geometry being constructed.

GeoEvent Input

GeoEvent Definition

Hopefully this blog provided some additional insight on working with hierarchical and multi-cardinal JSON data structures in GeoEvent Server. If you have ideas for future blog posts, let me know, the team is always looking for ways to make you more successful with the Real-Time & Big Data GIS capabilities of ArcGIS.