Spatiotemporal Big Datastore Query Performance

RyanClancy · ‎10-05-2018

I posted this in the ArcGIS API for Python space yesterday: https://community.esri.com/message/804065-spatiotemporal-bds-rest-query-performance and I'm raising a slightly different version of the issue here because I'm not sure if the performance I'm experiencing is due to the way I'm querying the data (Python API), the way I set up the data source in the big data store, or maybe both. Or maybe some other factor.

I created the output data source in the STBDS using GeoEvent Manager. In a nutshell, I've got unprojected point data (GPS-like data in sr 4326), two spatial indexes (the default geohash plus flat hexagons), and a datetime index. My spatiotemporal big datastore is a 3 node cluster where each node has 16 cores and 32GB RAM with ArcGIS Enterprise 10.6.1 running on Linux.

I loaded 450 million records via GeoEvent server.

The first query I need to do gives me a list of distinct values and their counts for each 24 hr period. This works and I can get the query results I wanted, it's just a lot slower than I expected. I'm using the ArcGIS Python API to query the data via an arcgis.features.FeatureLayer.query() and each query takes over an hour to return. By comparison, the same query via cx_Oracle directly to the same data in an Oracle db takes 38 seconds. The exact queries I'm doing against Oracle and via the Feature Layer are included at the link I posted above.

Am I doing something wrong, either in way I've configured the STBDS data source, or in the way I'm querying? My assumption is that the time_filter parameter in the FeatureLayer.query() method leverages the datetime index on my STBDS data source. Is that true? Is there a way to leverage the indexes that isn't exposed via the Python API? What type of query response time is reasonable to expect in this situation? I was shocked that the query against Oracle took 38 seconds while the one against data in the STBDS took over an hour.