<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: FGDB Compress Performance Impacts with LIKE query in Geodatabase Questions</title>
    <link>https://community.esri.com/t5/geodatabase-questions/fgdb-compress-performance-impacts-with-like-query/m-p/771063#M947</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I don't have time to dive into a full response, but I will say that pagination support was added to the ArcGIS REST API back in 10.3.x to address the scraping issue.&amp;nbsp; Part of the problem is a client that executes dozens of simultaneous requests for offset Object IDs, which is a very greedy way of retrieving records from the service.&amp;nbsp; Pagination is less greedy and more efficient for the back-end datastore, and hence the service itself.&amp;nbsp; Most of Esri's platforms that call the REST API have been updated to utilize pagination when available, but that doesn't stop an outdated or simply lazy developer from ignoring pagination and hammering the datastore.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In terms of the performance you are seeing with LIKE, the first thing that comes to mind is your LIKE statement isn't fully utilizing an index.&amp;nbsp; The structure of a LIKE statement can greatly impact how an index is used.&amp;nbsp; If an index isn't being used, or used very much, a compressed file geodatabase would require the data to be uncompressed before comparing the data with the search criteria.&amp;nbsp; The process of uncompressing the data could add significant time to queries.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 07 May 2019 23:54:24 GMT</pubDate>
    <dc:creator>JoshuaBixby</dc:creator>
    <dc:date>2019-05-07T23:54:24Z</dc:date>
    <item>
      <title>FGDB Compress Performance Impacts with LIKE query</title>
      <link>https://community.esri.com/t5/geodatabase-questions/fgdb-compress-performance-impacts-with-like-query/m-p/771062#M946</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This thread is quite long with some gory details (below) so here is the bottom line...&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;&lt;SPAN style="color: #0000ff;"&gt;A File Geodatabase (FGDB) that has been compressed has showed to perform better on almost all&amp;nbsp;operations tested, &lt;STRONG&gt;except a query with the &lt;A href="https://www.w3schools.com/sql/sql_like.asp" style="color: #0000ff;"&gt;LIKE&lt;/A&gt; operator (using _)&lt;/STRONG&gt;.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;&lt;SPAN style="color: #0000ff;"&gt;The underlying field where the LIKE operator is applied to is indexed.&amp;nbsp;&lt;/SPAN&gt;&amp;nbsp;&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;&lt;SPAN style="color: #339966;"&gt;&lt;STRONG&gt;Does anyone have ideas how to increase query performance on a compressed geodatabase when using a LIKE operator?&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 22px;"&gt;&lt;STRONG&gt;--- GORY DETAILS ---&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The dataset we are working with is ~27GB when stored uncompressed in a FGDB.&amp;nbsp; The dataset is highly used (300-400K hits a day with 4-5K unique clients) so adequate performance is critical.&amp;nbsp; Furthermore, this is a central dataset to our organization operations.&amp;nbsp; This FGDB is hosted on a file system and published through arcgis server as a mapping/feature service.&amp;nbsp; There is a 1K limit on query operations.&amp;nbsp; Display performance is generally sub-second as we have scale dependencies applied for appropriate levels.&amp;nbsp; May of the clients need raw data access to the underlying features (think map identify, table display, query, etc) and generally not problematic.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We have experienced clients that seem to 'scrape' data off the service by making ArcGIS Server query API calls like:&amp;nbsp;&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;&lt;A class="link-titled" href="https://www.myexample.com/arcgis/rest/services/folder/service/MapServer/3/query?returnGeometry=true&amp;amp;where=OBJECTID%20%3E277000%20AND%20OBJECTID%20%3C=278000&amp;amp;outSr=4326&amp;amp;outFields=*&amp;amp;f=geojson" title="https://www.myexample.com/arcgis/rest/services/folder/service/MapServer/3/query?returnGeometry=true&amp;amp;where=OBJECTID%20%3E277000%20AND%20OBJECTID%20%3C=278000&amp;amp;outSr=4326&amp;amp;outFields=*&amp;amp;f=geojson"&gt;https://www.myexample.com/arcgis/rest/services/folder/service/MapServer/3/query?returnGeometry=true&amp;amp;where=OBJECTID%20%3E…&lt;/A&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The&amp;nbsp;example encoded 'where' clause is&amp;nbsp;&lt;STRONG&gt;OBJECTID &amp;gt;277000 AND OBJECTID &amp;lt;=278000&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;What we observe is that some clients are making requests like this 1K record chunks at a time.&amp;nbsp; Some of the clients are throwing upwards of 16 concurrent requests at the server and the server takes over &lt;SPAN style="color: #ff0000;"&gt;&lt;STRONG&gt;80 seconds&lt;/STRONG&gt;&lt;/SPAN&gt; to handle&amp;nbsp;EACH request with the uncompressed database. This set of requests cause high CPU on the back-end arcgis server environments and if left unchecked, cause system unresponsiveness on this and many other services co-located here (the environment gets 5-6M hits a day with 60K+ unique clients). The back-end hardware is generally enough to handle daily operations (4 servers, each with 8 CPU, 32GB RAM), but when this scraping action occurs we experience instability. We have had to selectivly deny public IP addresses that exhibit this behavior.&amp;nbsp; Often times they will come back, but scale back the concurrent requests (to like &lt;span class="lia-unicode-emoji" title=":smiling_face_with_sunglasses:"&gt;😎&lt;/span&gt; which the systems are able to handle, albeit a bit slow.&amp;nbsp; Some of these are public anonymous users.&amp;nbsp; Sadly, we do have the data available for download but cannot seem to prevent public users from scraping the data as a service.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;When we compress the &lt;SPAN style="color: #339966;"&gt;&lt;STRONG&gt;FGDB (reduces to 3.5GB), the same request only takes 2.78 Seconds&lt;/STRONG&gt;&lt;/SPAN&gt; and a &lt;SPAN style="color: #339966;"&gt;&lt;STRONG&gt;SQL Enterprise GDB takes even less at 1.7 seconds&lt;/STRONG&gt;&lt;/SPAN&gt;:&lt;/P&gt;&lt;P&gt;&lt;IMG __jive_id="446066" class="image-1 jive-image" src="https://community.esri.com/legacyfs/online/446066_pastedImage_17.png" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Naturally the SQL Server Enterprise Geodatabase (EGDB) hosting appears to be the best case to resolve this performance issue, however we have completed some extensive testing and the FGDB hosting beats out the EGDB hosting on many other operations:&amp;nbsp;&lt;/P&gt;&lt;P style="padding-left: 30px;"&gt;&lt;IMG __jive_id="446067" class="image-2 jive-image" src="https://community.esri.com/legacyfs/online/446067_pastedImage_18.png" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The tests&amp;nbsp;noted above were doing simple operations within a web-map (and using fiddler to reissue the requests sequentially a few times to obtain the response times).&amp;nbsp; As you can see, the SQL EGDB performs worse than the compressed FGDB (the FGDB is 13-43% faster depending on the operation).&amp;nbsp; We have completed similar tests on other datasets and generally publish from a FGDB due to the performance gains.&amp;nbsp; This back-end EGDB was fully compressed (state of 0) with no versioning.&amp;nbsp; And the dataset is heavily indexed on both FGDB and EGDB.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So far, everything points to using a compressed FGDB for performance gains, however &lt;SPAN style="color: #ff0000;"&gt;&lt;STRONG&gt;a query using the LIKE operation takes so long it times out&lt;/STRONG&gt;&lt;/SPAN&gt;:&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;IMG __jive_id="446075" class="image-3 jive-image" src="https://community.esri.com/legacyfs/online/446075_pastedImage_22.png" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So... somewhat at a loss on best choice here.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In full disclosure, we do not notice query LIKE operations on the rest endpoint, but rather we have a custom written Server Object Extension (SOE) that is using the LIKE operator extensively.&amp;nbsp; This SOE was developed a few years back and we really dont want to dive in to try and re-write some of the query operations.&amp;nbsp; Even if we did, this would not prevent a rogue user issuing a query LIKE operation through the REST API /query operation.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;I see&amp;nbsp;a few options available (in order of best options IMO):&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1) Post here and someone from the community can suggest something we have not throught about yet (please please please!)&amp;nbsp; &lt;IMG src="https://community.esri.com/legacyfs/online/emoticons/happy.png" /&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;2) Move this to a SQL Server EGDB back-end.&amp;nbsp; Overall performance may be slower, but could be the best of all operations combined&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;3) Host a compressed and uncompressed FGDB with 2 different services.&amp;nbsp; The uncompressed would have the SOE (that generally responds in sub-second time).&amp;nbsp; Essentially isolating the SOE operations to an uncompressed GDB, and using the compressed FGDB for standard mapping/feature access.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;4) refactor the SOE to remove any LIKE queries.&amp;nbsp; Establish alternative queries that could perform better.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thoughts from the community?&amp;nbsp; Thanks for any feedback you can provide.&amp;nbsp;&amp;nbsp;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 07 May 2019 23:06:20 GMT</pubDate>
      <guid>https://community.esri.com/t5/geodatabase-questions/fgdb-compress-performance-impacts-with-like-query/m-p/771062#M946</guid>
      <dc:creator>pfoppe</dc:creator>
      <dc:date>2019-05-07T23:06:20Z</dc:date>
    </item>
    <item>
      <title>Re: FGDB Compress Performance Impacts with LIKE query</title>
      <link>https://community.esri.com/t5/geodatabase-questions/fgdb-compress-performance-impacts-with-like-query/m-p/771063#M947</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I don't have time to dive into a full response, but I will say that pagination support was added to the ArcGIS REST API back in 10.3.x to address the scraping issue.&amp;nbsp; Part of the problem is a client that executes dozens of simultaneous requests for offset Object IDs, which is a very greedy way of retrieving records from the service.&amp;nbsp; Pagination is less greedy and more efficient for the back-end datastore, and hence the service itself.&amp;nbsp; Most of Esri's platforms that call the REST API have been updated to utilize pagination when available, but that doesn't stop an outdated or simply lazy developer from ignoring pagination and hammering the datastore.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In terms of the performance you are seeing with LIKE, the first thing that comes to mind is your LIKE statement isn't fully utilizing an index.&amp;nbsp; The structure of a LIKE statement can greatly impact how an index is used.&amp;nbsp; If an index isn't being used, or used very much, a compressed file geodatabase would require the data to be uncompressed before comparing the data with the search criteria.&amp;nbsp; The process of uncompressing the data could add significant time to queries.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 07 May 2019 23:54:24 GMT</pubDate>
      <guid>https://community.esri.com/t5/geodatabase-questions/fgdb-compress-performance-impacts-with-like-query/m-p/771063#M947</guid>
      <dc:creator>JoshuaBixby</dc:creator>
      <dc:date>2019-05-07T23:54:24Z</dc:date>
    </item>
  </channel>
</rss>

