Partitioning Large Geometries to Improve Query Performance

Blog Post created by vangelo-esristaff Employee on Sep 15, 2014

A while back I needed to create a very efficient point-in-polygon tool, one that had to process millions of points daily, with up to ten thousand lookups in any single minute.  I quickly determined that simple queries against a database table (at ~20 milliseconds per query) weren't going to meet the requirement, so I started looking at the SE_STABLE interface in the ArcSDE 'C' API, using a RAM cache to query the data. First I had to build some tools to convert a table with one or more attributes into a "shape table" and an attribute cache.  When I was done, I had a significantly faster tool -- with a mean query time of 0.5 milliseconds.


But I wondered what would happen if I could reduce the time of the worst-performing queries (on Canada, Russia, the United States,... -- all the "large" countries with lots of vertices).  So I added some code to split any feature larger than a given threshold into a number of tiles that, in total, covered the same area but with far fewer vertices per feature.  I had to change my processing code to handle the higher incidence of "on the border" results, but the performance gain was significant.
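The splitting step can be sketched in pure Python. This is an illustrative reimplementation, not the original ArcSDE 'C' code: each large polygon is clipped against the cells of a regular grid using Sutherland-Hodgman clipping, which is valid here because every clip window is a convex rectangle.

```python
import math

def clip_to_tile(poly, xmin, ymin, xmax, ymax):
    """Clip a polygon (list of (x, y) tuples) to one axis-aligned grid
    tile with Sutherland-Hodgman (valid because the tile is convex)."""
    def clip(pts, keep, cross):
        out = []
        for i, cur in enumerate(pts):
            prev = pts[i - 1]
            if keep(cur):
                if not keep(prev):
                    out.append(cross(prev, cur))  # edge enters the tile
                out.append(cur)
            elif keep(prev):
                out.append(cross(prev, cur))      # edge leaves the tile
        return out

    def at_x(x0):  # intersection of segment p-q with vertical line x = x0
        return lambda p, q: (x0, p[1] + (x0 - p[0]) * (q[1] - p[1]) / (q[0] - p[0]))

    def at_y(y0):  # intersection of segment p-q with horizontal line y = y0
        return lambda p, q: (p[0] + (y0 - p[1]) * (q[0] - p[0]) / (q[1] - p[1]), y0)

    pts = list(poly)
    for keep, cross in ((lambda p: p[0] >= xmin, at_x(xmin)),
                        (lambda p: p[0] <= xmax, at_x(xmax)),
                        (lambda p: p[1] >= ymin, at_y(ymin)),
                        (lambda p: p[1] <= ymax, at_y(ymax))):
        if not pts:
            break
        pts = clip(pts, keep, cross)
    return pts

def partition(poly, tile_deg):
    """Split one polygon into per-tile pieces on a regular grid; total
    area is preserved but each piece has far fewer vertices."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    pieces = []
    x = math.floor(min(xs) / tile_deg) * tile_deg
    while x < max(xs):
        y = math.floor(min(ys) / tile_deg) * tile_deg
        while y < max(ys):
            piece = clip_to_tile(poly, x, y, x + tile_deg, y + tile_deg)
            if len(piece) >= 3:  # discard degenerate slivers
                pieces.append(piece)
            y += tile_deg
        x += tile_deg
    return pieces

def area(ring):
    """Shoelace formula, for checking that partitioning preserves area."""
    return abs(sum(p[0] * q[1] - q[0] * p[1]
                   for p, q in zip(ring, ring[1:] + ring[:1]))) / 2
```

In a real tool each piece would also carry its parent feature ID, so that the extra "on the border" hits can be resolved back to the original feature.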


Then I mistakenly queried an RDBMS table loaded with one of these partitioned countries layers, and discovered a significant performance gain in the database as well, without needing shape tables at all.  Of course, the RAM cache was still faster, but database queries on the partitioned table approached the speed of the cache without partitioning.
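To see why partitioning helps either way, here is a toy version of the cache lookup (the names are mine, not the SE_STABLE API): a query point hashes to exactly one grid tile, and only the small pieces bucketed in that tile get ray-cast, so per-query cost tracks the per-piece vertex count rather than the full country outline.

```python
import math

def point_in_ring(pt, ring):
    """Even-odd ray casting; cost is linear in the ring's vertex count,
    which is exactly what partitioning shrinks."""
    x, y = pt
    inside = False
    for p, q in zip(ring, ring[1:] + ring[:1]):
        if (p[1] > y) != (q[1] > y):
            x_hit = p[0] + (y - p[1]) * (q[0] - p[0]) / (q[1] - p[1])
            if x < x_hit:
                inside = not inside
    return inside

class TileIndex:
    """Toy RAM cache: partitioned pieces are bucketed by grid tile, so a
    query ray-casts only the few small pieces sharing the point's tile."""
    def __init__(self, tile_deg):
        self.tile_deg = tile_deg
        self.buckets = {}  # (col, row) -> [(feature_id, ring), ...]

    def _key(self, x, y):
        return (math.floor(x / self.tile_deg), math.floor(y / self.tile_deg))

    def add(self, fid, ring):
        # Each partitioned piece lies within one tile; bucket by centroid
        # so vertices sitting exactly on a tile edge can't misfile it.
        cx = sum(p[0] for p in ring) / len(ring)
        cy = sum(p[1] for p in ring) / len(ring)
        self.buckets.setdefault(self._key(cx, cy), []).append((fid, ring))

    def query(self, pt):
        for fid, ring in self.buckets.get(self._key(*pt), []):
            if point_in_ring(pt, ring):
                return fid
        return None
```

The same logic explains the database gain: the spatial index narrows candidates by bounding box either way, but the exact containment test still walks every vertex of each candidate, and partitioned candidates have far fewer.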


It's been a long time since that discovery, but I recently had cause to revisit the code, and ran a benchmark on my Linux PC using two country polygon layers, one at 1:50m scale and one at 1:15m scale. It seems that modern hardware is faster (big surprise, right?), but the benefits of partitioning during point-in-polygon queries remain:


For the 1:50m Data & Maps country file (249 features, 0.46 MB storage), with 10,000 random queries:


| Tiling Scheme | Partitions | Storage (MB) | Init Time (ms) | Query Time (ms) | Total Time (ms) | Mean Query (ms/feat) |
|---------------|-----------:|-------------:|---------------:|----------------:|----------------:|---------------------:|
| 90 Degree | 286 | 0.48 | 50.64 | 747.91 | 798.55 | 0.08 |
| 60 Degree | 309 | 0.49 | 64.30 | 537.58 | 601.88 | 0.06 |
| 45 Degree | 335 | 0.50 | 72.47 | 519.99 | 592.46 | 0.06 |
| 40 Degree | 361 | 0.51 | 88.90 | 490.82 | 579.72 | 0.06 |
| 35 Degree | 397 | 0.52 | 143.21 | 414.53 | 557.74 | 0.06 |
| 30 Degree | 393 | 0.52 | 134.11 | 410.26 | 544.37 | 0.05 |
| 25 Degree | 463 | 0.54 | 160.34 | 390.83 | 551.17 | 0.06 |
| 20 Degree | 504 | 0.55 | 200.16 | 378.13 | 578.29 | 0.06 |
| 15 Degree | 623 | 0.60 | 288.28 | 362.54 | 650.82 | 0.07 |
| 12 Degree | 754 | 0.63 | 388.84 | 365.14 | 753.98 | 0.08 |
| 10 Degree | 899 | 0.69 | 568.12 | 377.09 | 945.21 | 0.10 |
| 5 Degree | 2026 | 1.01 | 1866.12 | 501.00 | 2367.12 | 0.24 |
| 4 Degree | 2800 | 1.24 | 2774.39 | 591.64 | 3366.03 | 0.34 |
| 3 Degree | 4268 | 1.67 | 4800.73 | 767.10 | 5567.83 | 0.56 |
| 2 Degree | 8196 | 2.82 | 10288.24 | 1243.70 | 11531.94 | 1.15 |
| 1 Degree | 27351 | 8.30 | 39975.24 | 3568.66 | 43543.90 | 4.35 |


For the 1:15m Data & Maps country file (249 features, 3.72 MB storage), with the same 10,000 random queries:


| Tiling Scheme | Partitions | Storage (MB) | Init Time (ms) | Query Time (ms) | Total Time (ms) | Mean Query (ms/feat) |
|---------------|-----------:|-------------:|---------------:|----------------:|----------------:|---------------------:|
| 90 Degree | 290 | 3.74 | 207.26 | 3031.72 | 3238.98 | 0.32 |
| 60 Degree | 314 | | | | | |
| 45 Degree | 340 | 3.75 | 346.04 | 1764.42 | 2110.46 | 0.21 |
| 40 Degree | 369 | 3.76 | 410.53 | 1541.54 | 1952.07 | 0.20 |
| 35 Degree | 403 | 3.78 | 510.59 | 1075.17 | 1585.76 | 0.16 |
| 30 Degree | 403 | 3.77 | 485.25 | 1073.19 | 1558.44 | 0.16 |
| 25 Degree | 474 | 3.80 | 716.04 | 883.49 | 1599.53 | 0.16 |
| 20 Degree | 519 | 3.82 | 885.24 | 784.80 | 1670.04 | 0.17 |
| 15 Degree | 640 | 3.86 | 1258.04 | 607.32 | 1865.36 | 0.19 |
| 12 Degree | 771 | 3.90 | 1715.10 | 559.44 | 2274.54 | 0.23 |
| 10 Degree | 925 | 3.95 | 2475.66 | 530.02 | 3005.68 | 0.30 |
| 5 Degree | 2069 | 4.31 | 8018.14 | 586.19 | 8604.33 | 0.86 |
| 4 Degree | 2852 | 4.55 | 11924.08 | 671.96 | 12596.04 | 1.26 |
| 3 Degree | 4341 | | | | | |
| 2 Degree | 8321 | 6.16 | 44082.18 | 1340.35 | 45422.53 | 4.54 |
| 1 Degree | 27684 | 11.75 | 170964.92 | 3889.86 | 174854.78 | 17.49 |


Back in 2006, when I first did this, the break-even point was at a tile size of 5 degrees, but now it seems that both the 1:50m and 1:15m layers are optimized within the cache at 30-degree tiling, and both tables now show a pronounced law of diminishing returns below a grid size of 10 degrees (where before it was closer to 1 degree).
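The break-even arithmetic is easy to reproduce from the 1:50m table above: finer tiles cost more up front (init) but less per query, so the sweet spot depends on how many queries the cache will serve before being rebuilt. A quick sketch using a few measured rows:

```python
# (init ms, query ms per 10,000 queries) for selected rows of the
# 1:50m benchmark table, keyed by tile size in degrees.
runs = {
    90: (50.64, 747.91),
    30: (134.11, 410.26),
    10: (568.12, 377.09),
    5: (1866.12, 501.00),
    1: (39975.24, 3568.66),
}

def best_tiling(n_queries):
    """Tile size minimizing init cost plus per-query cost for
    n_queries lookups, scaled from the 10,000-query measurements."""
    return min(runs, key=lambda d: runs[d][0] + runs[d][1] * n_queries / 10_000)

best_tiling(10_000)     # 30-degree tiling wins at the benchmark volume
best_tiling(1_000_000)  # 10-degree tiling pays off as volume grows
```

This also shows why any published "optimal" tile size should be taken with a grain of salt: it is a function of the expected query volume as much as of the data.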


Doctoring the code a bit to export the partitioned shapes, I was also able to re-run my database queries (on PostgreSQL 9.1.9, with the same 10,000 random queries):


Tiling Scheme1:50m Feature Count1:50m Mean Vertex Count1:50m Mean Query (ms/feat)1:15m Feature Count 1:15m Mean Vertex Count1:15m Mean Query (ms/feat)
90 Degree28684.772.71290818.344.51
60 Degree30979.412.63314756.983.55
45 Degree33573.662.63340699.313.31
30 Degree39364.172.59403591.552.89
20 Degree50451.792.58519461.382.70
15 Degree62343.352.56640375.622.60
10 Degree89932.232.54925262.232.53
5 Degree202617.742.542069120.892.49
4 Degree280014.472.54285289.422.49
3 Degree426811.442.55434160.732.49
2 Degree81968.582.53832134.412.48
1 Degree273516.222.552768414.062.51
0.5 Degree982975.402.60992137.632.56
0.25 Degree3694925.132.613722125.742.57


While running a database on modern hardware has reduced the ACID overhead to roughly 2.5 milliseconds, that's still not good enough to match a simple RAM cache, which starts sub-millisecond and optimizes to between a fifth and a twentieth of a millisecond per query (depending on the vertex density of the queried table).


I see a couple of lessons to be learned here:


  1. The more things change, the more they remain the same
    There are those who claim that optimization is unnecessary on newer systems, because the database or other software "has been designed to handle that."  Even if that's true, it doesn't hurt to check.  And even if optimizers are smarter, it rarely hurts to loft softballs at them. In this case, as before, queries on a large number of less complicated polygons have a consistent advantage over a smaller number of more complicated polygons (at least, until the sheer number of features overwhelms the benefits of simplicity). In short, having good fundamentals and following best practices is always good policy.
  2. It's often important to re-evaluate optimization decisions with each new generation of technology
    While the benefits of efficiently evaluated queries don't change, the "sweet spot" that defines the most efficient set of parameters could very well change, so it doesn't hurt to keep benchmark data and procedures in an accessible archive, where they can be re-run as-is when new hardware is available.  It's amazing what a 1.4 GHz six-core (AMD) CPU with 8 GB of RAM can accomplish over a 2.6 GHz two-CPU (Intel) host with 2 GB of RAM -- especially when coupled with a fast disk with a large on-board cache.  Modern boxes should be faster, but it doesn't hurt to see a tangible gain, and to be able to measure it accurately.  Even if the drift isn't significant every time, it doesn't hurt to revisit optimization thresholds, just so you won't miss a 5x performance gain when it is available.


- V


BTW: The IDENTITY class of the se_toolkit ASCII data conversion tool ascinfo was used to compile these metrics, using 9.x Data & Maps country files and the attached control files (these control files leverage changes to META parameters available in the recently released build of se_toolkit-10.2, so use Build 20 or later to reproduce this effort).