
In a separate blog, JSON Data Structures - Working with Hierarchy and Multicardinality, I wrote about how data can be organized in a JSON structure, how to recognize data hierarchy and cardinality from a GeoEvent Definition, and how to access data values given a hierarchical, multi-cardinal, data structure.

In this blog, we'll explore XML, another self-describing data format which -- like JSON -- has a specific syntax that organizes data using key/value pairs. XML is similar to JSON, but the two data formats are not interchangeable.

What does XML support that JSON does not?

One difference is that XML supports both attribute and element values whereas JSON really only supports key/value pairs. With JSON you generally expect data values will be associated with named fields. Consider the two examples below (credit: w3schools.com):

<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

The XML in this first example above provides information on a person, "Anna". Her first and last name are provided as elements whereas her gender is provided as an attribute value.

<person>
  <sex>female</sex>
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>

The XML in this second example above provides the same information, except now all of the data is provided using element values.

Both XML structures are valid, but if you have any influence with your data provider, it is probably better to avoid attribute values and instead use elements exclusively when ingesting XML data into GeoEvent Server. This is only a recommendation, not a requirement. As you will see in the following examples, GeoEvent Server can successfully adapt XML which contains attribute values.

Here's a little secret:  GeoEvent Server does not actually handle XML data at all.

GeoEvent Server uses third-party libraries to translate XML it receives to JSON. The JSON adapter is then used to interpret the data and create event records from the translated data. Because JSON does not support attribute values, all data values in an XML structure must be translated as elements. Consider the following illustration which shows how a block of XML data might be translated to JSON by GeoEvent Server:

XML vs. JSON

Notice the JSON on the right in this example organizes each event record as separate elements in a JSON array. Also notice the first line of the XML on the left which declares the version and encoding being used. The libraries GeoEvent Server uses to translate the XML to JSON really like seeing this information as part of the XML data. Finally, sometimes XML will include non-visible characters such as a BOM (byte-order mark). If the XML you are trying to ingest is not being recognized by an input you've configured, try copying the XML into a text editor and saving a text-only version to strip out any hidden characters.
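To make the attribute-to-element idea concrete, here is a minimal sketch in Python. It is not the library GeoEvent Server uses internally (the source does not name it); the function name element_to_dict is my own, and the code only illustrates how an attribute such as sex="female" can end up as an ordinary key/value pair once XML is flattened into a JSON-like structure.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an XML element to a dict, treating attributes like child elements."""
    result = dict(element.attrib)  # attributes become ordinary key/value pairs
    for child in element:
        # Leaf elements contribute their text; anything richer is converted recursively.
        value = element_to_dict(child) if (len(child) or child.attrib) else child.text
        if child.tag in result:
            # A repeated tag is collected into a list (a JSON array).
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


PERSON_XML = """<person sex="female">
  <firstname>Anna</firstname>
  <lastname>Smith</lastname>
</person>"""

person = element_to_dict(ET.fromstring(PERSON_XML))
print(json.dumps(person))
# {"sex": "female", "firstname": "Anna", "lastname": "Smith"}
```

Note that in this sketch a tag only becomes a list when it repeats. A structure that happens to contain a single child would come out as a plain object, which is one plausible way a translator could end up guessing the wrong cardinality.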

 

Other limitations to consider when ingesting XML

There are several other limitations to consider when ingesting XML data into GeoEvent Server. Sometimes a block of JSON might pass an online JSON validator such as the one provided by JSON Lint but still not be ingested into GeoEvent Server. The JSON syntax rules, for example, do not require that every nested element have a name; yet without a name, it is impossible to construct a GeoEvent Definition since every event attribute must have a name to create a complete GeoEvent Definition.

Similarly, there are XML structures which are perfectly valid which GeoEvent Server may have trouble ingesting. Consider the following block of XML data as an example:

<?xml version="1.0" encoding="utf-8"?>
<data>
  <vehicles>
    <vehicle make="Ford" model="Explorer">
      <license_plate>4GHG892</license_plate>
    </vehicle>
    <vehicle make="Toyota" model="Prius">
      <license_plate>6KLM153</license_plate>
    </vehicle>
  </vehicles>
  <personnel>
    <person fname="James" lname="Albert">
      <employee_number>1234</employee_number>
    </person>
    <person fname="Mary" lname="Smith">
      <employee_number>7890</employee_number>
    </person>
  </personnel>
</data>

The XML data illustrated above contains a mix of both "vehicles" and "personnel". The self-describing nature of the XML makes it apparent to a reader which data elements are which, but an input in GeoEvent Server may still have trouble identifying the multiple occurrences of the different data items if the inbound adapter's XML Object Name property is not specified.

Here is the GeoEvent Definition the inbound adapter generates when its XML Object Name property is left unspecified and the XML data sample above is ingested into GeoEvent Server:

GeoEvent Definition

In testing, the very first time the XML with the combination of "vehicles" and "personnel" was received and written out as JSON to a system text file, I observed only one person and one vehicle were written to the output file. Worse yet, without changing the generated GeoEvent Definition or any of the input connector's properties, sending the exact same XML a second time produced an output file with "vehicles" and "personnel" elements that were empty.

We know from the JSON Data Structures - Working with Hierarchy and Multicardinality blog that, at the very least, the cardinality specified by the generated GeoEvent Definition is not correct. The GeoEvent Definition also implies a nesting of groups within groups, which is probably not correct.

Working around the issue

Let's explore how you might work around the issue identified above using the configurable properties available in GeoEvent Server. First, ensure the XML input connector specifies which node in the XML should be treated as the root node by setting the XML Object Name property accordingly as illustrated below:

GeoEvent Input

Second, verify the GeoEvent Definition has the correct cardinality for the data sub-structure beneath the specified root node as illustrated below:

GeoEvent Definition

By configuring the properties above accordingly, GeoEvent Server will only consider data within a sub-structure found beneath a "vehicles" root node and should make allowances that the sub-structure may contain more than one "vehicle".

XML Sample
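The effect of naming a root node can be sketched outside of GeoEvent Server. The Python below is only an illustration under my own assumptions: the function records_under and its object_name parameter are hypothetical stand-ins for the input's XML Object Name property, and the flattening of attributes into fields mimics, rather than reproduces, what the inbound adapter does.

```python
import xml.etree.ElementTree as ET

FLEET_XML = """<data>
  <vehicles>
    <vehicle make="Ford" model="Explorer">
      <license_plate>4GHG892</license_plate>
    </vehicle>
    <vehicle make="Toyota" model="Prius">
      <license_plate>6KLM153</license_plate>
    </vehicle>
  </vehicles>
  <personnel>
    <person fname="James" lname="Albert">
      <employee_number>1234</employee_number>
    </person>
    <person fname="Mary" lname="Smith">
      <employee_number>7890</employee_number>
    </person>
  </personnel>
</data>"""


def records_under(xml_text, object_name):
    """Return one flat dict per child found beneath the node named object_name."""
    root = ET.fromstring(xml_text)
    node = root if root.tag == object_name else root.find(f".//{object_name}")
    if node is None:
        return []
    records = []
    for child in node:
        record = dict(child.attrib)  # attributes promoted to fields
        record.update({leaf.tag: leaf.text for leaf in child})
        records.append(record)
    return records


vehicles = records_under(FLEET_XML, "vehicles")
print(len(vehicles), vehicles[0]["license_plate"])
# 2 4GHG892
```

Calling records_under(FLEET_XML, "personnel") returns the two "person" records instead, which mirrors the idea of a second input configured with a different root node.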

With this approach, there are two ramifications you might want to consider. First, the inbound adapter is literally throwing half of the received data away by excluding data from any sub-structure found beneath the "personnel" nodes. This can be addressed by making a copy of the existing Receive XML on a REST Endpoint input and configuring this copy to use "personnel" as its XML Object Name. The copied input should also use a different GeoEvent Definition -- one which specifies "person" as an event attribute with cardinality Many and the attributes of a "person" (rather than a "vehicle") as illustrated below.

Copied Input Configuration

Second, the event record being ingested has multiple vehicles (or people) as items in an array. You'll likely want to process each vehicle (or person) as individual event records. To address this, it's recommended you use a processor available on the ArcGIS GeoEvent Server Gallery, specifically the Multicardinal Field Splitter Processor. There are two different field splitter processors provided in the download, so make sure to use the processor that handles multicardinal data structures.

A Multicardinal Field Splitter Processor, added to a GeoEvent Service as illustrated below, will clone the event records it receives and split each one so that every output record has only one vehicle (or person). Notice that each event record output from the Multicardinal Field Splitter Processor includes the index at which the element was found in the original array.

GeoEvent Service
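The clone-and-split behavior described above can be approximated in a few lines of Python. This is a sketch of the idea only, not the processor's implementation; the function name split_multicardinal and the output field name "index" are assumptions I've made for illustration.

```python
import copy


def split_multicardinal(record, field, index_field="index"):
    """Return one clone of record per item in record[field], tagged with the item's index."""
    outputs = []
    for index, item in enumerate(record[field]):
        # Copy every other attribute as-is, then replace the list with a single item.
        clone = {key: copy.deepcopy(value) for key, value in record.items() if key != field}
        clone[field] = copy.deepcopy(item)
        clone[index_field] = index  # position of the item in the original array
        outputs.append(clone)
    return outputs


event = {
    "vehicle": [
        {"make": "Ford", "model": "Explorer", "license_plate": "4GHG892"},
        {"make": "Toyota", "model": "Prius", "license_plate": "6KLM153"},
    ]
}

for split_event in split_multicardinal(event, "vehicle"):
    print(split_event["index"], split_event["vehicle"]["make"])
# 0 Ford
# 1 Toyota
```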

Conclusion

The examples I've referenced in this blog are obviously academic. There's no good reason why a data provider would mash up people and vehicles this way in the same XML data structure. However, you might come across data structures which are not homogeneous and need to use one or more of the approaches highlighted in this blog to extract a portion of the data out of a data structure. Or you might need to debug your input connector's configuration to figure out why attribute or element values you know to exist in the XML being received are not coming through in the event records that are output. Or maybe you expect multiple event records to be ingested from the data you're receiving and end up observing only a few -- or maybe only one -- being ingested. Hopefully the information provided will help you address these challenges when you encounter them.

To summarize, below are the tips I highlighted in this article:

  • Use the GeoEvent Definition as a clue to the hierarchy and cardinality GeoEvent Server is using to define each event record's structure.
  • Specify the root node or element when ingesting XML or JSON; don't let the inbound adapter assume which node should be considered the root. If necessary, specify an interior node as the root node so only a subset of the data is actually considered.
  • Avoid XML data which uses attributes. If you must use XML data with attributes, know that an attempt will be made to promote these as elements when the XML is translated to JSON.
  • Encourage your data providers to design data structures whose records are homogeneous. This can run counter to database normalization instincts where data common to all records is included in a sub-section above each of the actual records. Sometimes simple is better, even when "simple" makes individual data records verbose.
  • Make sure the XML you ingest includes a header specifying its version and encoding -- the libraries GeoEvent Server is using really like seeing this metadata. Also, watch out for hidden characters which are sometimes present in the data.
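As an alternative to round-tripping the XML through a text editor, hidden leading characters can be checked for programmatically. The following is a small sketch assuming the data is UTF-8 encoded; the function name strip_bom is my own. Python's "utf-8-sig" codec discards a leading byte-order mark when one is present and otherwise behaves like plain UTF-8.

```python
import codecs


def strip_bom(raw: bytes) -> str:
    """Decode UTF-8 bytes, discarding a leading byte-order mark if one is present."""
    return raw.decode("utf-8-sig")


# Simulate a payload saved by an editor that prepends a BOM.
with_bom = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="utf-8"?><data/>'
clean = strip_bom(with_bom)

print(with_bom.startswith(codecs.BOM_UTF8))  # True
print(clean.startswith("<?xml"))             # True
```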

GeoEvent Server Automatic Configuration Backup Files

It is possible, and in fact preferred, to create XML snapshots of your ArcGIS GeoEvent Server configuration using GeoEvent Manager (Site > GeoEvent > Configuration Store > Export Configuration).

But what if something has gone sideways and you cannot access GeoEvent Manager? Before you delete GeoEvent Server’s ZooKeeper distributed configuration store, you will want to locate a recent XML configuration and see if recent changes to inputs, outputs, GeoEvent Definitions, and GeoEvent Services are in the configuration file.

Beginning with GeoEvent Server 10.5, a copy of the configuration is exported automatically for you, daily, at 00:00:00 hours (local time).

  • Automatic backup files, by default, are written to the following folder:
    C:\ProgramData\Esri\GeoEvent
  • You can change the folder used by editing the folder registered for 'Automatic Backups':
    Site > GeoEvent > Data Stores > Register Folder
  • You can change when and how often snapshots of your configuration are taken:
    Site > Settings > Configure Global Settings > Automatic Backup Settings
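If you need to find the most recent backup without GeoEvent Manager, a short script can pick the newest file from the backup folder. This is a sketch under assumptions: I am assuming the automatic exports carry an .xml extension and sit directly in the registered folder, and the function name newest_xml_file is hypothetical. Adjust the folder and pattern to match what you actually see on disk.

```python
from pathlib import Path
from typing import Optional


def newest_xml_file(folder: str) -> Optional[Path]:
    """Return the most recently modified *.xml file directly inside folder, or None."""
    candidates = [path for path in Path(folder).glob("*.xml") if path.is_file()]
    # max() with default=None avoids an exception when the folder holds no XML files.
    return max(candidates, key=lambda path: path.stat().st_mtime, default=None)


# Example call (default backup folder taken from the list above):
# print(newest_xml_file(r"C:\ProgramData\Esri\GeoEvent"))
```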

 

GeoEvent Server ZooKeeper Files

At the 10.5 / 10.5.1 release – GeoEvent Server uses the “synchronization service” platform service in ArcGIS Server, which runs Apache ZooKeeper behind the scenes. Since this is an ArcGIS Server service, the application files are found in the ArcGIS Server 'local' folder (e.g. C:\arcgisserver\local).

If a system administrator wanted to administratively clear a configuration of GeoEvent Server they could stop the ArcGIS Server platform service -- using the Administrative API -- or stop the ArcGIS Server Windows service and delete the files and folders found beneath C:\arcgisserver\local\zookeeper\.

  • You should leave the parent folder, C:\arcgisserver\local\zookeeper intact.
  • You should also confirm with Esri Technical Support that patches, service packs, or hot-fixes you may have installed have not changed how the “synchronization service” platform service is used by other ArcGIS Enterprise components before administratively deleting files from beneath the ArcGIS Server directories. (ArcGIS GeoAnalytics Server, for example, uses the platform service to elect a machine participating in a multiple-machine analytic as the "leader" for an operation.)

Beginning with the 10.6 release – GeoEvent Server is running its own Apache ZooKeeper instance within the ArcGIS GeoEvent Gateway Windows service. If a system administrator wanted to administratively clear a 10.6 configuration of GeoEvent Server they could stop the ArcGIS GeoEvent Gateway Windows service – which will also stop the dependent ArcGIS GeoEvent Server Windows service – and then delete the files and folders found beneath: C:\ProgramData\Esri\GeoEvent-Gateway\zookeeper-data.


GeoEvent Server Kafka File

NOTE: The following only applies to 10.6 and later releases of GeoEvent Server.

Beginning with the 10.6 release – GeoEvent Server is running an Apache Kafka instance as an event message broker within the ArcGIS GeoEvent Gateway Windows service. The message broker uses on-disk topic queues to manage event records. The event records which have been sent from the message broker to a GeoEvent Server instance for processing are recorded within the broker's associated configuration store (e.g. Apache ZooKeeper).

The Kafka message broker provides a transactional message guarantee that the RabbitMQ message broker (used in 10.5.1 and earlier releases) does not provide. If the GeoEvent Gateway on a machine were stopped and restarted, the configuration store will have recorded where event message processing was suspended and will use indexes into the topic queues to resume processing previously received event records.

The topic queue files are closed, new files created, and old files deleted according to a configurable data retention strategy. However, if the GeoEvent Gateway were stopped and its ZooKeeper configuration were deleted, the Kafka topic queues will likely be orphaned and potentially large message log files may not be deleted from disk according to the data retention strategy. In this case, a system administrator might need to locate and delete the topic queue files from beneath C:\ProgramData\Esri\GeoEvent-Gateway\kafka.

 

GeoEvent Server Runtime Files

When GeoEvent Server is initially launched, following a new product installation, a number of files are created as the system framework is built. These files, referred to as “cached bundles”, are written into a \data folder in the GeoEvent Server installation directory (e.g. C:\Program Files\ArcGIS\Server\GeoEvent\data). Again, if something has gone sideways, a system administrator might want to try deleting these files, forcing the system framework to be rebuilt, before deciding to uninstall and then reinstall GeoEvent Server.

This might be necessary if, for example, you continue to see the message "No Services Found" displayed in a browser window (after several minutes and a browser refresh) when attempting to launch GeoEvent Manager. In this case, deleting the runtime files from the \data folder to force the system framework to be rebuilt may remedy an issue which prevented GeoEvent Server from launching correctly the first time.

Another reason a system administrator may need to force the system framework to be rebuilt might be observing a message that the ArcGIS GeoEvent Server Windows service could not be stopped “in a timely fashion” (when selecting to stop the service using the Windows Task Manager). In this case, an administrator should ensure the process identified in the C:\Program Files\ArcGIS\Server\GeoEvent\instances\instance.properties file has been stopped. Administratively terminating this process to stop GeoEvent Server can leave the system framework in a bad state, requiring the \data files be deleted so the framework can be rebuilt.

 

Administratively Reset GeoEvent Server

Deleting the Apache ZooKeeper files (to administratively clear the GeoEvent Server configuration), the product’s runtime files (to force the system framework to be rebuilt), and removing previously received event messages (by deleting Kafka topic queues from disk) is how system administrators reset a GeoEvent Server instance to look like the product has just been installed. Below are the steps and system folders you need to access to administratively reset GeoEvent Server at the 10.5.x and 10.6.x releases.

 

If you have custom components in the C:\Program Files\ArcGIS\Server\GeoEvent\deploy folder, move these from the \deploy folder to a local temporary folder, while GeoEvent Server is running, to prevent the component from being restored (from the distributed configuration store) when GeoEvent Server is restarted. Also, make sure you have a copy of the most recent XML export of your GeoEvent Server configuration if you want to save the elements you have created.

10.5.x

  You should confirm with Esri Technical Support which system folders and files you plan to delete before executing the steps below. Files you delete following the steps below are irrecoverable.

  1. Stop the ArcGIS Server Windows service.
    (This will also stop the GeoEvent Server Windows service)
  2. Locate and delete the files and folders beneath C:\Program Files\ArcGIS\Server\GeoEvent\data
    (Leave the \data folder intact)
  3. Locate and delete the files and folders beneath C:\arcgisserver\local\zookeeper
    (Leave the \zookeeper folder intact)
  4. Locate and delete the files and folders beneath C:\ProgramData\Esri\GeoEvent
    (Leave the \GeoEvent folder intact)
  5. Start the ArcGIS Server Windows service.
    (Confirm you can log in to the ArcGIS Server Manager web application)
  6. Start the ArcGIS GeoEvent Server Windows service.
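
Several of the steps, both above and in the 10.6.x procedure, ask you to delete everything beneath a folder while leaving the folder itself intact. If you prefer to script that, the following is a cautious sketch in Python. The function name clear_folder_contents is my own, and it defaults to a dry run that only reports what would be removed. It assumes the relevant Windows services have already been stopped and that you have the permissions needed on the folder.

```python
import shutil
from pathlib import Path


def clear_folder_contents(folder, dry_run=True):
    """Delete everything beneath folder but keep folder itself; return what was targeted."""
    # Materialize the listing first so deletions do not disturb the iteration.
    entries = sorted(Path(folder).iterdir())
    for entry in entries:
        if dry_run:
            continue  # report only; nothing is removed
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return entries


# Example: list what would be removed, without deleting anything.
# for path in clear_folder_contents(r"C:\Program Files\ArcGIS\Server\GeoEvent\data"):
#     print(path)
```

Deleted files are irrecoverable, so review the dry-run listing before calling the function again with dry_run=False.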

10.6.x

  Note that the lifecycle of the ArcGIS GeoEvent Gateway service is intended to mirror that of the operating system.
  You can administratively reset GeoEvent Server (e.g. deleting its runtime files from its \data folder) without stopping the ArcGIS GeoEvent Gateway service -- unless you also want to administratively delete the ZooKeeper files from the configuration store (which in the 10.6.x releases are maintained as part of the ArcGIS GeoEvent Gateway service).

  1. Stop the ArcGIS GeoEvent Server Windows service.
  2. Locate and delete the files and folders beneath the following directories (leaving the parent folders intact):
    C:\Program Files\ArcGIS\Server\GeoEvent\data\
    C:\ProgramData\Esri\GeoEvent\
  3. Stop the ArcGIS GeoEvent Gateway Windows service.
    This will also stop the ArcGIS GeoEvent Server Windows service if it is running.
  4. Locate and delete the files and folders beneath the following directories.
    Leave the parent folders themselves intact:
    C:\ProgramData\Esri\GeoEvent-Gateway\zookeeper-data
    C:\Program Files\ArcGIS\Server\GeoEvent\gateway\log
  5. If you delete the zookeeper-data files, you should remove any orphaned topic queues
    by deleting the on-disk Kafka logs (delete the 'logs' sub-folder, leave the 'kafka' folder intact):
    C:\ProgramData\Esri\GeoEvent-Gateway\kafka\logs
  6. Locate and delete the GeoEvent Gateway configuration file (a new file will be rebuilt).
    C:\Program Files\ArcGIS\Server\GeoEvent\etc\com.esri.ges.gateway.cfg
  7. Start the ArcGIS GeoEvent Server Windows service.
    This will start the ArcGIS GeoEvent Gateway service if it has been stopped.
    Confirm you can log in to GeoEvent Manager.

At this point you can also review the contents of the rebuilt com.esri.ges.gateway.cfg file. The GeoEvent Gateway will record its message broker and configuration store port configurations in this file if it was able to launch successfully:

gateway.zookeeper.connect=MY-MACHINE.MY-DOMAIN:4181

gateway.kafka.brokers=MY-MACHINE.MY-DOMAIN:9192

gateway.kafka.topic.partitions=3

gateway.kafka.topic.replication.factor=3
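
If you want to check these values from a script rather than by eye, the key=value lines shown above are simple to parse. The sketch below assumes the file contains only lines of that form plus optional blank or '#' comment lines; that is an assumption on my part, and the function name parse_cfg is hypothetical.

```python
def parse_cfg(text):
    """Parse key=value lines into a dict, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:  # ignore lines that contain no '='
            settings[key.strip()] = value.strip()
    return settings


SAMPLE_CFG = """gateway.zookeeper.connect=MY-MACHINE.MY-DOMAIN:4181

gateway.kafka.brokers=MY-MACHINE.MY-DOMAIN:9192

gateway.kafka.topic.partitions=3

gateway.kafka.topic.replication.factor=3
"""

settings = parse_cfg(SAMPLE_CFG)
zk_host, _, zk_port = settings["gateway.zookeeper.connect"].rpartition(":")
print(zk_host, zk_port)
# MY-MACHINE.MY-DOMAIN 4181
```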

When speaking with customers who want to get started with ArcGIS GeoEvent Server, I'm often asked if GeoEvent Server has an input connector for a specific data vendor or type of device. My answer is almost always that we prefer to integrate via REST and the question you should be asking is: "Does the vendor or device offer a RESTful API whose endpoints a GeoEvent Server input can be configured to query?"

Ideally, you want to be able to answer two integration questions:

  1. How is the data being sent to a GeoEvent Server input?
  2. How is the data formatted; what does the data's structure look like?

For example, an input can be configured to accept data sent to a GeoEvent Server hosted REST endpoint. That answers the first question - integration will occur via REST with the vendor sending data as an HTTP/POST request to a GeoEvent Server endpoint. The second question, how is the data formatted, is the focus of this blog.

What does a typical JSON data record look like?

Typically, when a data vendor sends event data formatted as JSON, there will be multiple event records organized within a list such as this:

{
    "items": [
        {
            "id": 3201,
            "status": "",
            "calibrated": 1521135120000,
            "location": {
                "latitude": 31.125,
                "longitude": -117.125
            }
        },
        {
            "id": 5416,
            "status": "offline",
            "calibrated": 1521638100000,
            "location": {
                "latitude": 33.325,
                "longitude": -113.325
            }
        },
        {
            "id": 9823,
            "status": "error",
            "calibrated": 1522291320000,
            "location": {
                "latitude": 35.625,
                "longitude": -111.625
            }
        }
    ]
}

 

There are three elements, or objects, in the block of JSON data illustrated above. It would be natural to think of each element as an event record with its own "id", "status", and "location". Each event record also has a date/time the item was last "calibrated" (expressed as an epoch long integer in milliseconds).
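
For readers unfamiliar with epoch values, here is a quick Python illustration of what a "calibrated" value represents. The function name epoch_millis_to_utc is my own; the conversion simply divides by 1000 to get seconds and interprets the result as UTC.

```python
from datetime import datetime, timezone


def epoch_millis_to_utc(millis: int) -> datetime:
    """Interpret an epoch value in milliseconds as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


calibrated = epoch_millis_to_utc(1521135120000)
print(calibrated.isoformat())
# 2018-03-15T17:32:00+00:00
```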

 

What do we mean when we refer to a "multi-cardinal" JSON structure?

The JSON data illustrated above is multi-cardinal because the data has been organized within an array. We say the data structure is multi-cardinal because its cardinality, in a mathematical sense of the number of elements in a group, is more than one. The array is enclosed within a pair of square brackets:  "items": [ ... ]

If the array were a list of simple integers the data would look something like:  "values": [ 1, 3, 5, 7, 9 ]

The data elements in the illustration above are not simple integers. Each item is enclosed in curly braces, which is how JSON identifies an object. For GeoEvent Server, it is important both that the array have a name and that each object within the array have a homogeneous structure, meaning that every event record should, generally speaking, use a common schema or collection of name/value pairs to communicate the item's data.
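
One simple way to check a sample for homogeneity before pointing it at an input is to compare the key sets of the objects in the array. This is a minimal Python sketch with a hypothetical function name, is_homogeneous; it only compares top-level names and does not look inside nested objects.

```python
def is_homogeneous(items):
    """Return True when every object in the list exposes the same set of top-level keys."""
    key_sets = {frozenset(item) for item in items}
    return len(key_sets) <= 1


same_schema = [
    {"id": 3201, "status": "", "calibrated": 1521135120000},
    {"id": 5416, "status": "offline", "calibrated": 1521638100000},
]
mixed_schema = [
    {"id": 3201, "status": ""},
    {"id": 5416, "plate": "4GHG892"},
]

print(is_homogeneous(same_schema), is_homogeneous(mixed_schema))
# True False
```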

What do we mean when we refer to a "hierarchical" JSON structure?

The data elements in the array are themselves hierarchical. Values associated with "id", "status", and "calibrated" are simple numeric, string, or Boolean values. The "location" value, on the other hand, is an object which encapsulates two child values -- "latitude" and "longitude". Because "location" organizes its data within a sub-structure the overall structure of each data element in the array is considered hierarchical.

It should be noted that the coordinate values within the "location" sub-structure can be used to create a point geometry, but "location" itself is not a geometry. This is evident by examining how a GeoEvent Definition is used to represent the data contained in the illustrated block of JSON.

Different ways of viewing this data using a GeoEvent Definition

In GeoEvent Server, if you were to configure a new Receive JSON on a REST Endpoint input, leaving the JSON Object Name property unspecified, selecting to have a GeoEvent Definition created for you, and specifying that the inbound adapter not attempt to construct a geometry from received attribute values, the GeoEvent Definition created would match the one illustrated below:

GeoEvent Definition

Notice the cardinality of "items" is specified as Many (the infinity sign simply means "more than one"). Also, when the block of JSON data illustrated above is sent to the input via HTTP/POST, the input's event count only increments by one, indicating that only one event record was received.

Also notice that, in this configuration, "items" is a Group element type. This implies that in addition to the structure being multi-cardinal, it's also organized as a group of elements, which in JSON is typically an array.

Finally, notice that the "location" is also a Group element type. The cardinality of "location", however, is One not Many. This tells you that the value is a single element, not an array of elements or values.

Accessing data values

Working with the structure specified in the GeoEvent Definition illustrated above, if you wanted to access the coordinate values for "latitude" or "longitude" you would have to specify which latitude and longitude you wanted. Remember, the data was received as a single event record and "items" is a list or array of elements. Each element in the array has its own set of coordinate values. Consider the following expressions:

  items[2].location.longitude

  items[2].location.latitude

The expressions above specify that the third element in the "items" list is the one in which you are interested. You cannot refer to items.location.latitude because you have not specified an index to select one of the three elements in the "items" array. The array's index is zero-based, which means the first item is at index 0, the second is at index 1, and so on.
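
The same indexing rule applies in most programming languages, so a short Python analogy may help. The sample below is a trimmed version of the "items" structure with illustrative coordinate values; the Python syntax is of course not GeoEvent Server's expression syntax.

```python
import json

payload = json.loads("""
{"items": [
  {"id": 3201, "location": {"latitude": 31.125, "longitude": -117.125}},
  {"id": 5416, "location": {"latitude": 33.325, "longitude": -113.325}},
  {"id": 9823, "location": {"latitude": 35.625, "longitude": -111.625}}
]}
""")

# Zero-based indexing: index 2 selects the third element of the array.
third = payload["items"][2]
print(third["id"], third["location"]["longitude"])

# Without an index there is nothing to select from: "items" is a list, not an object.
try:
    payload["items"]["location"]
except TypeError as error:
    missing_index_error = str(error)
    print("Cannot resolve items.location:", missing_index_error)
```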

Ingesting this data as a single event record is probably not what you would want to do. It is unlikely that an arbitrary choice to use the third element's coordinates, rather than the first or second element in the list, would appropriately represent the items in the list. These three items have significantly different geographic locations, so we should find a way to ingest them as three separate event records.

Re-configuring the data ingest

When I first mentioned configuring a Receive JSON on a REST Endpoint input to allow the illustrated block of JSON to be ingested into GeoEvent Server for processing, I indicated that the JSON Object Name property should be left unspecified. This was done to support a discussion of the data's structure.

If the illustrated JSON data were representative of data you wanted to ingest, you should specify an explicit value for the JSON Object Name parameter when configuring the GeoEvent Server input. In this case, you would specify "items" as the root node of the data structure.

Specifying "items" as the JSON Object Name tells the input to handle the data as an array of values and to ingest each item from the array as its own event record. If you make this change to our input, and delete the GeoEvent Definition it created the last time the JSON data was received, you will get a slightly different GeoEvent Definition generated as illustrated below:

 GeoEvent Definition

The first thing you should notice, when the illustrated block of JSON data is sent to the input, is the input's event count increments by three -- indicating that three event records were received by GeoEvent Server. Looking at the new GeoEvent Definition, notice there is no attribute named "items" -- the elements in the array have been split out so that the event records could be ingested separately. Also notice the cardinality of each of the event record attributes is now One. There are no lists or arrays of multiple elements in the structure specified by this GeoEvent Definition. The "location" is still a Group which is fine; each event record should have (one) location and the coordinate values can legitimately be organized as children within a sub-structure.
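
The difference between the two configurations can be mimicked in Python. In this sketch, the function records_from and its object_name parameter are hypothetical stand-ins for the input and its JSON Object Name property; the code only illustrates why the event count changes from one to three.

```python
import json

PAYLOAD_TEXT = """
{"items": [
  {"id": 3201, "status": ""},
  {"id": 5416, "status": "offline"},
  {"id": 9823, "status": "error"}
]}
"""


def records_from(payload_text, object_name=None):
    """Return the event records a payload yields for a given root object name."""
    document = json.loads(payload_text)
    if object_name is None:
        return [document]  # whole document treated as a single record
    node = document[object_name]
    # A list yields one record per element; a single object yields one record.
    return node if isinstance(node, list) else [node]


print(len(records_from(PAYLOAD_TEXT)))            # 1
print(len(records_from(PAYLOAD_TEXT, "items")))   # 3
```

With "items" named as the root, each record starts at "id" and "status" directly, so there is no longer an "items" attribute to index into.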

The updates to the structure specified in the GeoEvent Definition change how the coordinate values are accessed. Now that the event records have been separated, you can access each record's attributes without specifying one of several element indices to select an element from a list.

You should now be ready to re-configure the input to construct a geometry as well as make some minor updates to the data types of each attribute in the GeoEvent Definition in order to handle "id" as a Long and "calibrated" as a Date. You also need to add a new field of type Geometry to the GeoEvent Definition to hold the geometry being constructed.

GeoEvent Input

GeoEvent Definition
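
As a rough picture of what the re-configured input and updated GeoEvent Definition accomplish, the Python sketch below converts one ingested record into typed values and a point geometry. The function name to_event, the output field name "geometry", and the choice of WKID 4326 (WGS 84) are assumptions for illustration; the point uses the x/y layout of an Esri JSON point, with longitude as x and latitude as y. The coordinate values are illustrative.

```python
from datetime import datetime, timezone


def to_event(record):
    """Cast a raw record to typed values and build a point from its location sub-structure."""
    location = record["location"]
    return {
        "id": int(record["id"]),  # handled as a Long
        "status": record["status"],
        # epoch milliseconds handled as a Date
        "calibrated": datetime.fromtimestamp(record["calibrated"] / 1000, tz=timezone.utc),
        # longitude is x, latitude is y; WKID 4326 is assumed here
        "geometry": {
            "x": location["longitude"],
            "y": location["latitude"],
            "spatialReference": {"wkid": 4326},
        },
    }


raw = {
    "id": 5416,
    "status": "offline",
    "calibrated": 1521638100000,
    "location": {"latitude": 33.325, "longitude": -113.325},
}
event = to_event(raw)
print(event["id"], event["calibrated"].isoformat(), event["geometry"]["x"], event["geometry"]["y"])
```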

Hopefully this blog provided some additional insight on working with hierarchical and multi-cardinal JSON data structures in GeoEvent Server. If you have ideas for future blog posts, let me know; the team is always looking for ways to make you more successful with the Real-Time & Big Data GIS capabilities of ArcGIS.