<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Spatiotemporal Big Datastore Query Performance in ArcGIS GeoEvent Server Questions</title>
    <link>https://community.esri.com/t5/arcgis-geoevent-server-questions/spatiotemporal-big-datastore-query-performance/m-p/879056#M3426</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I posted this in the ArcGIS API for Python space yesterday: &lt;A href="https://community.esri.com/message/804065-spatiotemporal-bds-rest-query-performance"&gt;https://community.esri.com/message/804065-spatiotemporal-bds-rest-query-performance&lt;/A&gt; and I'm raising a slightly different version of the issue here because I'm not sure whether the performance I'm seeing is due to the way I'm querying the data (the Python API), the way I set up the data source in the big data store, or both. Or maybe some other factor.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I created the output data source in the STBDS using GeoEvent Manager. In a nutshell, I've got unprojected point data (GPS-like data in spatial reference 4326), two spatial indexes (the default geohash plus flat hexagons), and a datetime index. My spatiotemporal big data store is a 3-node cluster; each node has 16 cores and 32 GB RAM and runs ArcGIS Enterprise 10.6.1 on Linux.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I loaded 450 million records via GeoEvent Server.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The first query I need returns a list of distinct values and their counts for each 24-hour period. It works and I get the results I wanted; it's just much slower than I expected. I'm using the ArcGIS API for Python to query the data via arcgis.features.FeatureLayer.query(), and each query takes over an hour to return. By comparison, the same query via cx_Oracle directly against the same data in an Oracle database takes 38 seconds. The exact queries I'm running against Oracle and via the FeatureLayer are included at the link above.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Am I doing something wrong, either in the way I've configured the STBDS data source or in the way I'm querying? My assumption is that the time_filter parameter of the FeatureLayer.query() method leverages the datetime index on my STBDS data source. Is that true? Is there a way to leverage the indexes that isn't exposed via the Python API? What query response time is reasonable to expect in this situation? I was shocked that the query against Oracle took 38 seconds while the same query against the STBDS took over an hour.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 05 Oct 2018 15:46:37 GMT</pubDate>
    <dc:creator>RyanClancy</dc:creator>
    <dc:date>2018-10-05T15:46:37Z</dc:date>
    <item>
      <title>Spatiotemporal Big Datastore Query Performance</title>
      <link>https://community.esri.com/t5/arcgis-geoevent-server-questions/spatiotemporal-big-datastore-query-performance/m-p/879056#M3426</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I posted this in the ArcGIS API for Python space yesterday: &lt;A href="https://community.esri.com/message/804065-spatiotemporal-bds-rest-query-performance"&gt;https://community.esri.com/message/804065-spatiotemporal-bds-rest-query-performance&lt;/A&gt; and I'm raising a slightly different version of the issue here because I'm not sure whether the performance I'm seeing is due to the way I'm querying the data (the Python API), the way I set up the data source in the big data store, or both. Or maybe some other factor.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I created the output data source in the STBDS using GeoEvent Manager. In a nutshell, I've got unprojected point data (GPS-like data in spatial reference 4326), two spatial indexes (the default geohash plus flat hexagons), and a datetime index. My spatiotemporal big data store is a 3-node cluster; each node has 16 cores and 32 GB RAM and runs ArcGIS Enterprise 10.6.1 on Linux.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I loaded 450 million records via GeoEvent Server.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The first query I need returns a list of distinct values and their counts for each 24-hour period. It works and I get the results I wanted; it's just much slower than I expected. I'm using the ArcGIS API for Python to query the data via arcgis.features.FeatureLayer.query(), and each query takes over an hour to return. By comparison, the same query via cx_Oracle directly against the same data in an Oracle database takes 38 seconds. The exact queries I'm running against Oracle and via the FeatureLayer are included at the link above.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Am I doing something wrong, either in the way I've configured the STBDS data source or in the way I'm querying? My assumption is that the time_filter parameter of the FeatureLayer.query() method leverages the datetime index on my STBDS data source. Is that true? Is there a way to leverage the indexes that isn't exposed via the Python API? What query response time is reasonable to expect in this situation? I was shocked that the query against Oracle took 38 seconds while the same query against the STBDS took over an hour.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 05 Oct 2018 15:46:37 GMT</pubDate>
      <guid>https://community.esri.com/t5/arcgis-geoevent-server-questions/spatiotemporal-big-datastore-query-performance/m-p/879056#M3426</guid>
      <dc:creator>RyanClancy</dc:creator>
      <dc:date>2018-10-05T15:46:37Z</dc:date>
    </item>
  </channel>
</rss>