JSON Data Structures - Working with Hierarchy and Multicardinality

07-24-2018 05:26 PM
RJSunderman
Esri Regular Contributor

When speaking with customers who want to get started with ArcGIS GeoEvent Server, I'm often asked if GeoEvent Server has an input connector for a specific data vendor or type of device. My answer is almost always that we prefer to integrate via REST and the question you should be asking is: "Does the vendor or device offer a RESTful API whose endpoints a GeoEvent Server input can be configured to query?"

Ideally, you want to be able to answer two integration questions:

  1. How is the data being sent to a GeoEvent Server input?
  2. How is the data formatted; what does the data's structure look like?

For example, an input can be configured to accept data sent to a GeoEvent Server hosted REST endpoint. That answers the first question - integration will occur via REST with the vendor sending data as an HTTP/POST request to a GeoEvent Server endpoint. The second question, how is the data formatted, is the focus of this blog.
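As a rough illustration of the first question, the sketch below builds the kind of HTTP POST request a vendor might send. The URL is a placeholder rather than a real endpoint, and the function name `build_post_request` is made up for this example; substitute the URL of your own input.

```python
import json
import urllib.request

# Placeholder URL (an assumption for illustration); use the URL of your own input.
ENDPOINT_URL = "https://geoevent.example.com/receiver/my-json-input"


def build_post_request(url, payload):
    """Build (but do not send) an HTTP POST request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_post_request(ENDPOINT_URL, {"items": [{"id": 3201, "status": ""}]})
# Sending is left out on purpose; urllib.request.urlopen(request) would do it.
print(request.get_method(), request.full_url)
```

The request is only constructed here, so the sketch can be run without a GeoEvent Server available.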

What does a typical JSON data record look like?

Typically, when a data vendor sends event data formatted as JSON, there will be multiple event records organized within a list such as this:

{
    "items": [{
                  "id": 3201,
                  "status": "",
                  "calibrated": 1521135120000,
                  "location": {
                         "latitude": -117.125,
                         "longitude": 31.125
                  }
           },
           {
                  "id": 5416,
                  "status": "offline",
                  "calibrated": 1521638100000,
                  "location": {
                         "latitude": -113.325,
                         "longitude": 33.325
                  }
           },
           {
                  "id": 9823,
                  "status": "error",
                  "calibrated": 1522291320000,
                  "location": {
                         "latitude": -111.625,
                         "longitude": 35.625
                  }
           }
    ]
}

There are three elements, or objects, in the block of JSON data illustrated above. It would be natural to think of each element as an event record with its own "id", "status", and "location". Each event record also has a date/time the item was last "calibrated" (expressed as an epoch long integer in milliseconds).
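To make the "epoch long integer in milliseconds" concrete, here is a small Python sketch that converts the three "calibrated" values into readable UTC date/times. The helper name `epoch_ms_to_utc` is hypothetical; this is plain Python, not anything GeoEvent Server runs.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(epoch_ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


for value in (1521135120000, 1521638100000, 1522291320000):
    print(value, "=", epoch_ms_to_utc(value).isoformat())
```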

What do we mean when we refer to a "multi-cardinal" JSON structure?

The JSON data illustrated above is multi-cardinal because the data has been organized within an array. We say the data structure is multi-cardinal because its cardinality, in a mathematical sense of the number of elements in a group, is more than one. The array is enclosed within a pair of square brackets:  "items": [ ... ]

If the array were a list of simple integers the data would look something like:  "values": [ 1, 3, 5, 7, 9 ]

The data elements in the illustration above are not simple integers. Each item is enclosed within curly braces, which is how JSON identifies an object. For GeoEvent Server, it is important both that the array have a name and that each object within the array have a homogeneous structure, meaning that every event record should, generally speaking, use a common schema or collection of name/value pairs to communicate the item's data.
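One way to check the homogeneity described above before sending data is a short script like the following. It is only a sketch: `shared_schema` is a hypothetical helper that compares key names, and it says nothing about how GeoEvent Server itself validates records.

```python
def shared_schema(records):
    """Return the sorted key names if every record uses the same keys, else None."""
    key_sets = {frozenset(record) for record in records}
    if len(key_sets) != 1:
        return None
    return sorted(next(iter(key_sets)))


items = [
    {"id": 3201, "status": "", "calibrated": 1521135120000},
    {"id": 5416, "status": "offline", "calibrated": 1521638100000},
]
print(shared_schema(items))                    # both records share one set of keys
print(shared_schema(items + [{"id": 9823}]))   # a record with missing keys breaks it
```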

What do we mean when we refer to a "hierarchical" JSON structure?

The data elements in the array are themselves hierarchical. Values associated with "id", "status", and "calibrated" are simple numeric, string, or Boolean values. The "location" value, on the other hand, is an object which encapsulates two child values -- "latitude" and "longitude". Because "location" organizes its data within a sub-structure the overall structure of each data element in the array is considered hierarchical.

It should be noted that the coordinate values within the "location" sub-structure can be used to create a point geometry, but "location" itself is not a geometry. This is evident by examining how a GeoEvent Definition is used to represent the data contained in the illustrated block of JSON.

Different ways of viewing this data using a GeoEvent Definition

In GeoEvent Server, if you were to configure a new Receive JSON on a REST Endpoint input, leaving the JSON Object Name property unspecified, selecting to have a GeoEvent Definition created for you, and specifying that the inbound adapter not attempt to construct a geometry from received attribute values, the GeoEvent Definition created would match the one illustrated below:

GeoEvent Definition

Notice the cardinality of "items" is specified as Many (the infinity sign simply means "more than one"). Also, when the block of JSON data illustrated above is sent to the input via HTTP/POST, the input's event count only increments by one, indicating that only one event record was received.

Also notice that, in this configuration, "items" is a Group element type. This implies that in addition to the structure being multi-cardinal, it's also organized as a group of elements, which in JSON is typically an array.

Finally, notice that the "location" is also a Group element type. The cardinality of "location", however, is One not Many. This tells you that the value is a single element, not an array of elements or values.
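The Group and cardinality observations above can be mimicked with a few lines of Python. This is a sketch of the idea only, not the logic GeoEvent Server uses to generate a GeoEvent Definition: the function `describe` is hypothetical, it inspects just the first element of each array, and it reports Python type names (int, str, float) rather than GeoEvent Server field types.

```python
def describe(name, value, depth=0):
    """Return outline lines of (name, type, cardinality) for a parsed JSON value."""
    cardinality = "Many" if isinstance(value, list) else "One"
    # For an array, look at its first element to decide what kind of thing it holds.
    sample = value[0] if isinstance(value, list) and value else value
    indent = "  " * depth
    if isinstance(sample, dict):
        lines = [f"{indent}{name}: Group, {cardinality}"]
        for child_name, child_value in sample.items():
            lines.extend(describe(child_name, child_value, depth + 1))
        return lines
    return [f"{indent}{name}: {type(sample).__name__}, {cardinality}"]


record = {
    "items": [
        {
            "id": 3201,
            "status": "",
            "calibrated": 1521135120000,
            "location": {"latitude": -117.125, "longitude": 31.125},
        }
    ]
}
for top_name, top_value in record.items():
    print("\n".join(describe(top_name, top_value)))
```

Run against the sample data, the outline reports "items" as a Group with cardinality Many and "location" as a Group with cardinality One, matching the two observations above.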

Accessing data values

Working with the structure specified in the GeoEvent Definition illustrated above, if you wanted to access the coordinate values for "latitude" or "longitude" you would have to specify which latitude and longitude you wanted. Remember, the data was received as a single event record and "items" is a list or array of elements. Each element in the array has its own set of coordinate values. Consider the following expressions:

  items[2].location.longitude

  items[2].location.latitude

The expressions above specify that the third element in the "items" list is the one in which you are interested. You cannot refer to items.location.latitude because you have not specified an index to select one of the three elements in the "items" array. The array's index is zero-based, which means the first item is at index 0, the second is at index 1, and so on.
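The same zero-based indexing can be tried out in Python, which is a convenient way to check an index before using it in a GeoEvent Server expression. The sketch below uses a trimmed copy of the sample data; the Python syntax is of course different from the expression syntax shown above.

```python
import json

payload = json.loads("""
{"items": [
  {"id": 3201, "location": {"latitude": -117.125, "longitude": 31.125}},
  {"id": 5416, "location": {"latitude": -113.325, "longitude": 33.325}},
  {"id": 9823, "location": {"latitude": -111.625, "longitude": 35.625}}
]}
""")

third = payload["items"][2]  # zero-based: index 2 selects the third element
print(third["id"], third["location"]["latitude"], third["location"]["longitude"])

try:
    payload["items"]["location"]  # no index given, so no element has been selected
except TypeError as error:
    print("TypeError:", error)
```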

Ingesting this data as a single event record is probably not what you would want to do. It is unlikely that an arbitrary choice to use the third element's coordinates, rather than the first or second element in the list, would appropriately represent the items in the list. These three items have significantly different geographic locations, so we should find a way to ingest them as three separate event records.

Re-configuring the data ingest

When I first mentioned configuring a Receive JSON on a REST Endpoint input to allow the illustrated block of JSON to be ingested into GeoEvent Server for processing, I indicated that the JSON Object Name property should be left unspecified. This was done to support a discussion of the data's structure.

If the illustrated JSON data were representative of data you wanted to ingest, you should specify an explicit value for the JSON Object Name parameter when configuring the GeoEvent Server input. In this case, you would specify "items" as the root node of the data structure.

Specifying "items" as the JSON Object Name tells the input to handle the data as an array of values and to ingest each item from the array as its own event record. If you make this change to our input, and delete the GeoEvent Definition it created the last time the JSON data was received, you will get a slightly different GeoEvent Definition generated as illustrated below:

 GeoEvent Definition

The first thing you should notice, when the illustrated block of JSON data is sent to the input, is the input's event count increments by three -- indicating that three event records were received by GeoEvent Server. Looking at the new GeoEvent Definition, notice there is no attribute named "items" -- the elements in the array have been split out so that the event records could be ingested separately. Also notice the cardinality of each of the event record attributes is now One. There are no lists or arrays of multiple elements in the structure specified by this GeoEvent Definition. The "location" is still a Group which is fine; each event record should have (one) location and the coordinate values can legitimately be organized as children within a sub-structure.

The updates to the structure specified in the GeoEvent Definition change how the coordinate values are accessed. Now that the event records have been separated, you can access each record's attributes without specifying one of several element indices to select an element from a list.
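Conceptually, the effect of the JSON Object Name setting resembles the following Python sketch. It is an analogy under stated assumptions, not the adapter's actual implementation, and `split_records` is a hypothetical name.

```python
def split_records(payload, object_name=None):
    """Yield one record per array element found under object_name.

    With no object_name, the whole payload is treated as a single record.
    """
    if object_name is None:
        yield payload
        return
    node = payload[object_name]
    if isinstance(node, list):
        yield from node
    else:
        yield node


payload = {"items": [{"id": 3201}, {"id": 5416}, {"id": 9823}]}
print(len(list(split_records(payload))))           # one record: the whole structure
print(len(list(split_records(payload, "items"))))  # three records: one per array element
print([record["id"] for record in split_records(payload, "items")])
```

Once the records are separated, each one can be read directly (for example `record["id"]`) with no list index involved.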

You should now be ready to re-configure the input to construct a geometry as well as make some minor updates to the data types of each attribute in the GeoEvent Definition in order to handle "id" as a Long and "calibrated" as a Date. You also need to add a new field of type Geometry to the GeoEvent Definition to hold the geometry being constructed.
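To picture what those changes accomplish for a single record, here is a hedged Python sketch that applies the same three adjustments: "id" as an integer, "calibrated" as a date, and a point built from the "location" values. The function `to_event`, the output layout, and the default spatial reference (WKID 4326) are assumptions made for illustration; in GeoEvent Server this is done through the input and GeoEvent Definition configuration, not code.

```python
from datetime import datetime, timezone


def to_event(record, wkid=4326):
    """Reshape one parsed record: integer id, UTC datetime, and a point geometry."""
    location = record["location"]
    return {
        "id": int(record["id"]),
        "status": record["status"],
        # "calibrated" arrives as epoch milliseconds
        "calibrated": datetime.fromtimestamp(record["calibrated"] / 1000, tz=timezone.utc),
        # x is taken from "longitude" and y from "latitude"
        "geometry": {
            "x": location["longitude"],
            "y": location["latitude"],
            "spatialReference": {"wkid": wkid},
        },
    }


event = to_event(
    {
        "id": 5416,
        "status": "offline",
        "calibrated": 1521638100000,
        "location": {"latitude": -113.325, "longitude": 33.325},
    }
)
print(event)
```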

GeoEvent Input

GeoEvent Definition

Hopefully this blog provided some additional insight on working with hierarchical and multi-cardinal JSON data structures in GeoEvent Server. If you have ideas for future blog posts, let me know; the team is always looking for ways to make you more successful with the Real-Time & Big Data GIS capabilities of ArcGIS.

14 Comments
RJSunderman
Esri Regular Contributor

Check for understanding

If the above blog content made sense, let's complicate the data we expect to receive by having the maintenance date/time values for each item reported as a list of values (rather than a single value).

  • How would this change the GeoEvent Definition?
  • Does the "calibrated" attribute's type change from Date to Group?
  • Is every record in the "items" list required to have the same number of reported maintenance visits?
  • Is it possible to extract the most recent calibration date from the list of date/time values for event notification?

Here's a look at the proposed changes to the block of JSON you might expect to receive:

{
    "items": [{
            "id": 3201,
            "status": "online",
            "calibrated": [1521135120000, 1521136416000, 1521137712000],
            "location": {
                "latitude": -117.125,
                "longitude": 31.125
            }
        },
        {
            "id": 5416,
            "status": "offline",
            "calibrated": [1521638100000],
            "location": {
                "latitude": -113.325,
                "longitude": 33.325
            }
        },
        {
            "id": 9823,
            "status": "error",
            "calibrated": [1522291320000, 1522292616000],
            "location": {
                "latitude": -111.625,
                "longitude": 35.625
            }
        }
    ]
}

Only a couple of changes need to be made to the GeoEvent Definition to accommodate the JSON data illustrated above. First, the cardinality of the "calibrated" event attribute needs to be changed from One to Many (since its value is now an array or list rather than a single date/time value). The data type, however, does not change. The values in the array can still be treated as Date; the element is not a Group element.

Second, you would need to remove the TIME_START tag from the element. GeoEvent Server can handle lists of values as an array without requiring that every value in the array be a name/value pair, but tags can only be applied to named values, and the individual values in the new "calibrated" arrays do not have names.

GeoEvent Definition

One advantage to handling each "calibrated" element as a variable length array is that individual records are not required to have the same number of maintenance date/time values recorded. GeoEvent Server does require that every value in the array be the same type (e.g. Long, String, Date, ..., or Group), but that's not an issue in this case as all the values are epoch long integer representations of date values in milliseconds. In the first illustration above, sensor 5416 has only been calibrated once, but the other two sensors have been calibrated multiple times.

The disadvantage to handling each "calibrated" element as a variable length array is that there is no easy way to get the 'most recent' calibration date. It appears, looking at the data, that the last value in each list is the most recent calibration date. But you cannot use a Field Mapper Processor in GeoEvent Server, for example, to extract the last value from each event record's "calibrated" list -- you don't know how many values will be in each array, so you cannot use an index to access a particular value. None of the processors available out-of-the-box with GeoEvent Server provide the ability to iterate over a list of values, or provide a count of the number of values in a list, so you cannot identify the last value in any particular list.

Assuming extraction of the most recent calibration date was required, your best option in this case would be to use the ArcGIS GeoEvent Server SDK to develop a custom processor which implemented a list iterator, or work with the data provider to see if the data schema could be modified to provide the most recent calibration date as a data value whose cardinality was One rather than Many.
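If the schema change has to happen outside GeoEvent Server, a small script run before the data is sent could add the single-valued field. The following Python sketch shows one way that might look; the function `add_last_calibrated` and the field name "lastCalibrated" are hypothetical, and the sample uses two of the items from the JSON above.

```python
import copy


def add_last_calibrated(payload):
    """Return a copy of payload where each item gains a single 'lastCalibrated' value."""
    result = copy.deepcopy(payload)
    for item in result["items"]:
        dates = item.get("calibrated") or []
        # max() picks the latest epoch value regardless of list order or length
        item["lastCalibrated"] = max(dates) if dates else None
    return result


payload = {
    "items": [
        {"id": 3201, "calibrated": [1521135120000, 1521136416000, 1521137712000]},
        {"id": 5416, "calibrated": [1521638100000]},
    ]
}
for item in add_last_calibrated(payload)["items"]:
    print(item["id"], item["lastCalibrated"])
```

Using `max()` rather than the last list position avoids relying on the provider always sending the dates in ascending order.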

AllenScully
Occasional Contributor III

Hi Leo - 

Curious how this is going for you - we're also working to ingest bikeshare data - from the looks of it the same API - 

and am running into the same issue:

{"last_updated":1551371458,"ttl":3,"data":{"stations":[{"station_id":"1",.....

The JSON Object Name would be "data", if I understand correctly, but how to reference the actual name/values is proving tricky for me since there's an additional level ("stations" for me, "bikes" for your example).

Is it possible to reference this data using something like "data.stations.station_id"  for example?

Thanks - 

Allen

LeoLadefian5
Occasional Contributor II

Hi Allen, I accidentally deleted my post, but by specifying the JSON Object Name "bikes" I was able to get the individual attributes (lat, lon, etc.).

AllenScully
Occasional Contributor III

Excellent - works perfectly.  Many thanks. 

AllenScully
Occasional Contributor III

Sorry to bug you - one more question though, about dealing with the UNIX formatted dates.  A little unsure how to handle the 'Expected Date Format' for these dates (sent in 'seconds since 1970' format).  Have you been able to crack that one? - thinking it's a field calculation in GEP but haven't found the right syntax.

AdamRepsher
Occasional Contributor III

Hi Allen,

I don't know that you need to define the Expected Date Format.  Try just connecting to a Text to JSON file output and see what it gives you.  Just make sure that the field is Date formatted in the input GE Definition.

--Adam

AllenScully
Occasional Contributor III

Thanks Adam -

So, it takes a combination of two things: setting the field's data type on the input connector to Date (the default for this type of data in the generated GeoEvent Definition is Double), and multiplying the date field in question by 1000 in a Field Calculator, because GeoEvent Server expects milliseconds, not seconds.  This worked for me.
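For anyone who wants to sanity-check the arithmetic outside GeoEvent Server, here is a small Python sketch of the seconds-to-milliseconds conversion, using the "last_updated" value from the feed quoted earlier in this thread. The helper name is hypothetical.

```python
from datetime import datetime, timezone


def seconds_to_milliseconds(epoch_seconds):
    """Scale a 'seconds since 1970' value to milliseconds."""
    return int(epoch_seconds * 1000)


last_updated = 1551371458  # seconds since 1970
millis = seconds_to_milliseconds(last_updated)
print(millis, datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat())
```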

For reference:

StackExchange - How can I convert a UNIX timestamp to a datetime format within the GeoEvent 

Note:

The above reference from Stack Exchange is a little dated (2016). Please also consider updated documentation for GeoEvent Server which shows how an expression toString( ) used in a Field Calculator can be used to obtain a String representation of a Date ... works if you need a String representation of a Geometry as well.

toString(fieldName)

Returns: string

Example: toString(Geometry)

Example: toString(Epoch)

Returns a string representation of the value in the specified event attribute field. These functions are intended to allow string representations of Date and Geometry attributes to be nested within other functions such as substring, so that real-time analytics can extract information from the higher-order data type.

Result: "{""x"":32.125,""y"":-117.125,""spatialReference"":{""wkid"":4326}}"

Result: "Fri Nov 03 17:07:56 PDT 2017"

See Also:  https://community.esri.com/community/gis/enterprise-gis/geoevent/blog/2019/03/14/what-time-is-it-wel...

EmmaPaz2
New Contributor

Hi there,

Is it normal for the Cardinality attribute of a group element within a geoevent definition to reject being saved as 'infinity'/many? I've tried saving as 'multi' multiple times and it appears to get overwritten back as '1' every time...

Thanks for your help,

EP

BrianLomas
Occasional Contributor III

What if your json hierarchy begins with a dynamic value such as a time stamp? 

RJSunderman
Esri Regular Contributor

Hello Emma ... When clicking the "node hierarchy" icon to the left of the pencil when the attribute is of type 'Group', I've had the GeoEvent Manager pop back out to the 'Group' level after editing one of the attributes nested beneath the group. Rather annoying, as I then have to click the "node hierarchy" icon again to dive back into the group and add/edit another attribute within the group. But no, I haven't seen edits that change an attribute's cardinality from 'One' to 'Many' fail to take when I click 'Save' (to save my changes to the field attribute) and then 'Save' again (to save my overall changes to the GeoEvent Definition).

Sorry for what may be a reply too late to be helpful. If this is still a problem for you, please submit an incident to Esri Technical Support and they can likely help you through a screen share.

- RJ

RJSunderman
Esri Regular Contributor

Hey Brian ... what you've illustrated above is a problem for GeoEvent Server. The JSON is valid from the point-of-view of the specification, but by making the data structure so concise the data provider has basically overloaded an object key/name by making the key dynamic to also supply a data value. I cannot think of a way, out-of-the-box, to extract the string value of an event attribute name and write the value as an attribute value. Field Mapper, for example, wouldn't support this. A custom processor could, but that's a lot of overhead to assume and I'd only take that approach if there was no other way.

Could you possibly get the data provider to format the data as an array, rather than as nested objects with dynamic key names?

What you have...                                            What I'd recommend...

If you've no influence over the data provider, you might want to consider writing a Python script (or something similar) to take the data being offered and restructure it into something that GeoEvent Server is able to ingest. Sometimes developing such a Python "bridge" between a data provider and GeoEvent Server to perform some simple data manipulation or clean-up is easier than using the GeoEvent Server Java SDK to develop a custom inbound adapter or processor.
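A minimal sketch of such a bridge, in Python, might look like the following. The input data here is invented for illustration (the structure from the question above is not reproduced), and the function name, the "timestamp" field name, and the "items" array name are all assumptions.

```python
def keyed_objects_to_array(keyed, key_field="timestamp", array_name="items"):
    """Move each dynamic key into a named field and collect the objects in an array."""
    records = []
    for key, value in keyed.items():
        record = {key_field: key}  # the former key becomes an ordinary attribute value
        record.update(value)
        records.append(record)
    return {array_name: records}


# Hypothetical feed in which each object is keyed by a time stamp.
feed = {
    "2019-03-14T10:00:00Z": {"id": 1, "speed": 12.5},
    "2019-03-14T10:05:00Z": {"id": 2, "speed": 14.0},
}
print(keyed_objects_to_array(feed))
```

The output is a named array of homogeneous objects, which is the shape discussed in the article above; the script could then forward it to a GeoEvent Server input.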

- RJ

EricWood4
New Contributor

Thank you so much for this post.  The description of how the "JSON Object Name" is used was a great help and was difficult to find anywhere else.

BrianLomas
Occasional Contributor III

I have been working with some xml data lately and have run into a problem with a list of items in the xml. It seems that the input only wants to pick up the last item in the list. Is there a workaround for this type of issue? Thanks. @RJSunderman 

 

<data type="list">
   <data type="item">
      <CFS_number></CFS_number>
      <External_Agency_number/>
      <Case_Number/>
      <Name></Name>
      <Codes></Codes>
      <Descriptions></Descriptions>
      <Description/>
      <Priority></Priority>
      <Street></Street>
      <Unit_number/>
      <Latitude></Latitude>
      <Longitude></Longitude>
      <CFS_Date_Time></CFS_Date_Time>
      <Reported_At></Reported_At>
      <Responder_Units></Responder_Units>
      <Responder_Personnel></Responder_Personnel>
      <Primary_Units></Primary_Units>
   </data>
   <data type="item">
      <CFS_number></CFS_number>
      <External_Agency_number/>
      <Case_Number/>
      <Name></Name>
      <Codes></Codes>
      <Descriptions></Descriptions>
      <Description/>
      <Priority></Priority>
      <Street></Street>
      <Unit_number/>
      <Latitude></Latitude>
      <Longitude></Longitude>
      <CFS_Date_Time></CFS_Date_Time>
      <Reported_At></Reported_At>
      <Responder_Units></Responder_Units>
      <Responder_Personnel></Responder_Personnel>
      <Primary_Units></Primary_Units>
   </data>
 
</data>

 

RJSunderman
Esri Regular Contributor

@BrianLomas -- Would you please open an incident with Esri Technical Support on this so an analyst can work with you to establish reproducibility?  Off the cuff, I'm thinking that the repeated key 'data' is going to be a problem. I don't know that you're going to be able to specify an XML Object Name for a Poll an External Website for XML input to use to jump forward to the correct substructure in the XML and begin reading data from that point.

We'll need to take a look at your GeoEvent Definition to make sure the cardinality of the different attribute keys is configured to properly interpret <data type="list"> as a single item (cardinality 1) and the nested <data type="item"> as a collection of items (cardinality many).

By chance did you review the post XML Data Structures - Characteristics and Limitations?  It contains some information which complements the discussion in this article, JSON Data Structures - Working with Hierarchy and Multicardinality.