Tilting at Globes, by bixb0012, December 2018

When it comes to computer programming, Don Knuth is often quoted as saying, "premature optimization is the root of all evil." The quote is usually attributed to "Structured Programming with go to Statements," a journal article he published in the mid-1970s. Although the phrase makes for a great soundbite, I think his full explanation makes the point better:

There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

The trick to optimizing code is learning when you are chasing the 97% tail and when you are getting to the 3% heart of it. Experience helps greatly with answering that question, but so can simple tests and empirical data.

 

When scripting with Python geometry libraries (ArcPy, ArcGIS API for Python, Shapely, GeoDjango, etc.), it is quite common to encounter Python lists containing geometry coordinates, and turning those coordinates into geometry objects involves calling geometry constructors. For ArcPy geometry classes, like most Python classes, the default constructor is accessed by calling the class and passing arguments. From the ArcGIS Desktop documentation for the ArcPy Polygon class:

Syntax

Polygon (inputs, {spatial_reference}, {has_z}, {has_m})

| Parameter | Explanation | Data Type |
|---|---|---|
| inputs | The coordinates used to create the object. The data type can be either Point or Array objects. | Object |
| spatial_reference | The spatial reference of the new geometry. (The default value is None) | SpatialReference |
| has_z | The Z state: True for geometry if Z is enabled and False if it is not. (The default value is False) | Boolean |
| has_m | The M state: True for geometry if M is enabled and False if it is not. (The default value is False) | Boolean |

In addition to the ArcPy geometry class constructors, ArcPy provides several other functions for constructing geometries, including arcpy.AsShape(), arcpy.FromWKB(), and arcpy.FromWKT().

Given that there are multiple ways to construct ArcPy geometries, it is reasonable to wonder which constructor to use and which to avoid. The descriptions of arcpy.FromWKT(), arcpy.FromWKB(), and arcpy.AsShape() tell us those constructors work with specific geometry representations or encodings: well-known text, well-known binary, and GeoJSON or Esri JSON, respectively. When it comes to which constructor someone should use, I think Don Knuth would argue for the one that most closely matches your data's existing structure, i.e., don't overthink it.
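To make "match your data's existing structure" concrete, here is a small sketch of a hypothetical dispatcher. suggest_constructor is not part of ArcPy; it only names which constructor fits a given Python object, so it runs without ArcPy installed. It assumes Python 3 (where str and bytes are distinct) and that any plain nested list holds polygon rings.

```python
def suggest_constructor(data):
    """Hypothetical helper (not part of ArcPy): name the ArcPy constructor
    that most closely matches how the geometry data is already structured.

    Only returns a string; nothing here imports or calls ArcPy.
    """
    if isinstance(data, (bytes, bytearray)):
        # Raw well-known binary
        return "arcpy.FromWKB"
    if isinstance(data, str):
        # Well-known text, e.g. "MULTIPOLYGON(((15 0, 25 0, ...)))"
        return "arcpy.FromWKT"
    if isinstance(data, dict):
        if any(key in data for key in ("rings", "paths", "points", "x")):
            # Esri JSON geometry; AsShape needs esri_json=True
            return "arcpy.AsShape(..., True)"
        if "coordinates" in data:
            # GeoJSON geometry object; AsShape's default interpretation
            return "arcpy.AsShape"
    if isinstance(data, (list, tuple)):
        # Nested coordinate lists map onto Point/Array objects
        # (assumption: the lists hold polygon rings)
        return "arcpy.Polygon(arcpy.Array(...))"
    raise TypeError("unrecognized geometry structure: {}".format(type(data).__name__))


print(suggest_constructor({"type": "Polygon", "coordinates": [[[15, 0], [25, 0], [25, 10], [15, 0]]]}))
# arcpy.AsShape
```

The checks are ordered from most to least specific encoding; a plain coordinate list falls through to the default class constructor.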

 

I recently had reason to overthink ArcPy geometry constructors, or thought I had reason to, so I set about running some basic timing tests to gather more information before deciding whether to refactor some code involving arcpy.Polygon(). Using the simple, multipart polygon from A Case of Missing Prefixes: ArcGIS ...Geometries, I created four tests that construct the geometry from a Python list containing coordinates:

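import arcpy
import timeit

# Simple two-part polygon: each inner list is one ring of [x, y] pairs,
# with the first point repeated at the end to close the ring
poly_rings = [
    [[15,0], [25,0], [25,10], [15,10], [15,0]],
    [[18,13], [24,13], [24,18], [18,18], [18,13]]
]

def FromArcPyArray():
    # Default constructor: an Array of ring Arrays, each holding Point objects
    aarr = arcpy.Array(
        arcpy.Array(arcpy.Point(*xy) for xy in ring) for ring in poly_rings
    )
    return arcpy.Polygon(aarr)

def FromEsriJSON():
    # Esri JSON dictionary; the second argument (esri_json=True) tells
    # AsShape to read it as Esri JSON rather than GeoJSON
    esri_json = {"type":"Polygon", "rings":poly_rings}
    return arcpy.AsShape(esri_json, True)

def FromGeoJSON():
    # GeoJSON-style dictionary, AsShape's default interpretation
    geojson = {"type":"Polygon", "coordinates":poly_rings}
    return arcpy.AsShape(geojson)

def FromWKT():
    # Well-known text: each ring becomes its own polygon in a MULTIPOLYGON,
    # e.g. MULTIPOLYGON(((15 0, 25 0, ...)),((18 13, 24 13, ...)))
    wkt = "MULTIPOLYGON({})".format(
        ",".join("(({}))".format(
            ", ".join("{} {}".format(*xy) for xy in ring)
        ) for ring in poly_rings)
    )
    return arcpy.FromWKT(wkt)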
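As a sketch, the text and dictionary encodings that these test functions hand to ArcPy can be built and inspected without ArcPy at all. One detail worth noting: FromGeoJSON() passes both rings in a GeoJSON "Polygon", and under the GeoJSON specification (RFC 7946) every ring after the first in a Polygon is an interior ring (a hole). A spec-compliant encoding of two disjoint parts is a "MultiPolygon", shown below as to_geojson_multipolygon (a hypothetical helper). How arcpy.AsShape() treats the "Polygon" form is not examined here, and the MultiPolygon variant was not part of the timings.

```python
import json

# Same two-part polygon used in the timing tests
poly_rings = [
    [[15, 0], [25, 0], [25, 10], [15, 10], [15, 0]],
    [[18, 13], [24, 13], [24, 18], [18, 18], [18, 13]],
]

def to_wkt(rings):
    """Build the MULTIPOLYGON text used by FromWKT(): one polygon per ring."""
    return "MULTIPOLYGON({})".format(
        ",".join(
            "(({}))".format(", ".join("{} {}".format(*xy) for xy in ring))
            for ring in rings
        )
    )

def to_geojson_multipolygon(rings):
    """GeoJSON MultiPolygon: wrap each ring as its own single-ring polygon,
    since RFC 7946 treats rings after the first in a Polygon as holes."""
    return {"type": "MultiPolygon", "coordinates": [[ring] for ring in rings]}

print(to_wkt(poly_rings))
# MULTIPOLYGON(((15 0, 25 0, 25 10, 15 10, 15 0)),((18 13, 24 13, 24 18, 18 18, 18 13)))
print(json.dumps(to_geojson_multipolygon(poly_rings)))
```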

 

Using the timeit module from Python 2.7.14, bundled with ArcGIS Desktop 10.6.1:

>>> for ctor in [FromArcPyArray, FromEsriJSON, FromGeoJSON, FromWKT]:
...     pg = ctor()
...     print("\n".join(
...         str(i) for i in [ctor.__name__, timeit.timeit(ctor, number=10000), ""]
...     ))
...
FromArcPyArray
20.2141071389

FromEsriJSON
4.77303549343

FromGeoJSON
20.2831866771

FromWKT
4.03049759916

>>>

I must admit, the results aren't what I was expecting. I expected some timing differences between the various constructors, but I didn't expect some to be 5x faster than others. What I really didn't expect was for the ArcPy Polygon constructor to be nearly the slowest.
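The "5x" figure is simple arithmetic on the times above. A small sketch (speedups is a hypothetical helper, and the numbers are copied from the ArcGIS Desktop run) computes how many times faster each constructor is than arcpy.Polygon():

```python
# Seconds for 10,000 constructions, copied from the ArcGIS Desktop 10.6.1 run
desktop = {
    "FromArcPyArray": 20.2141071389,
    "FromEsriJSON": 4.77303549343,
    "FromGeoJSON": 20.2831866771,
    "FromWKT": 4.03049759916,
}

def speedups(timings, baseline="FromArcPyArray"):
    """How many times faster each constructor is than the baseline."""
    base = timings[baseline]
    return {name: base / secs for name, secs in timings.items()}

# Fastest first
for name, factor in sorted(speedups(desktop).items(), key=lambda kv: -kv[1]):
    print("{:<15} {:5.2f}x".format(name, factor))
```

A factor below 1 means slower than the baseline, which is where the GeoJSON constructor lands in this run.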

 

Since I have ArcGIS Desktop 10.6.1 and ArcGIS Pro 2.2.3 on the same machine, I just had to run the same tests using the timeit module from Python 3.6.5, bundled with ArcGIS Pro 2.2.3:

>>> for ctor in [FromArcPyArray, FromEsriJSON, FromGeoJSON, FromWKT]:
...     pg = ctor()
...     print("\n".join(
...         str(i) for i in [ctor.__name__, timeit.timeit(ctor, number=10000), ""]
...     ))
...
FromArcPyArray
10.2499491093713

FromEsriJSON
0.9167164168891304

FromGeoJSON
9.85043158674398

FromWKT
0.5525892638736423

>>>

 

What?! This goes beyond unexpected; it is outright surprising. It is good to see a nearly 50% decrease in run time for the ArcPy Polygon constructor, but it is amazing to see roughly 81% and 86% decreases for the Esri JSON and WKT constructors. The Esri JSON constructor went from 4x to 11x faster than the ArcPy Polygon constructor, and the WKT constructor is now 19x faster (up from 5x)!
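For anyone checking the percentages, a short sketch using the timings copied from the two runs computes each constructor's drop in run time from Desktop to Pro and its speedup over arcpy.Polygon() in Pro. compare_runs is a hypothetical helper written for this illustration.

```python
# Seconds for 10,000 constructions, copied from the two runs above
desktop = {
    "FromArcPyArray": 20.2141071389,
    "FromEsriJSON": 4.77303549343,
    "FromGeoJSON": 20.2831866771,
    "FromWKT": 4.03049759916,
}
pro = {
    "FromArcPyArray": 10.2499491093713,
    "FromEsriJSON": 0.9167164168891304,
    "FromGeoJSON": 9.85043158674398,
    "FromWKT": 0.5525892638736423,
}

def compare_runs(before, after, baseline="FromArcPyArray"):
    """Return {name: (fractional decrease in run time, speedup vs baseline in `after`)}."""
    return {
        name: (1 - after[name] / before[name], after[baseline] / after[name])
        for name in before
    }

for name, (decrease, speedup) in compare_runs(desktop, pro).items():
    print("{:<15} {:6.1%} less time, {:5.1f}x vs Polygon".format(name, decrease, speedup))
```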

 

When basic timing tests come back with results this surprising, one has to wonder whether the relative timing differences will hold up with larger, real-world data. To answer that question, I downloaded the USA States layer package included with Esri Data & Maps and available on ArcGIS.com. I wanted a multipart polygon, and Michigan was the first state that came to mind. It turns out that, because of all the small islands in the Great Lakes, Michigan is a very multipart polygon: 450 parts and 166,107 points.
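ArcPy geometry objects report these figures through their partCount and pointCount properties. For coordinates that are still plain Python lists, a hypothetical helper like the one below gives comparable counts for a list of rings. It counts closing points, which matches the 10-point figure used later for the example polygon. Whether it matches ArcPy's part count for polygons with interior rings is not verified here.

```python
# The two-ring example polygon from the timing tests
poly_rings = [
    [[15, 0], [25, 0], [25, 10], [15, 10], [15, 0]],
    [[18, 13], [24, 13], [24, 18], [18, 18], [18, 13]],
]

def count_parts_and_points(rings):
    """Hypothetical helper: count rings and [x, y] points in a list of rings.

    Closing points (first point repeated at the end) are included.
    """
    return len(rings), sum(len(ring) for ring in rings)

print(count_parts_and_points(poly_rings))  # (2, 10)
```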

 

>>> for ctor in [FromArcPyArray, FromEsriJSON, FromGeoJSON, FromWKT]:
...     pg = ctor()
...     print("\n".join(
...         str(i) for i in [ctor.__name__, timeit.timeit(ctor, number=100), ""]
...     ))
...
FromArcPyArray
1267.5038210736802

FromEsriJSON
141.83130611867819

FromGeoJSON
464.2651427417986

FromWKT
86.92622569438026

>>>

For the most part, the relative results stay consistent with a larger and more complex multipart polygon. The ArcPy Polygon constructor does scale slightly better than the Esri JSON and WKT constructors, going from 11x and 19x slower to 9x and 15x slower, respectively, but that improvement isn't nearly enough to make up for the constructor's overall slowness.

 

Overall, I don't really know what to think about the GeoJSON constructor. With small, simple polygons, it is about as slow as the ArcPy Polygon constructor. With larger polygons, it scales better than all of the other constructors in relative terms, but it is still quite slow overall: on the Michigan polygon, it is roughly 3x slower than the Esri JSON constructor and 5x slower than the WKT constructor.

 

Comparing the timings between the simple example polygon and the Michigan polygon, the constructors appear to scale roughly linearly with the number of points/vertices in the polygon, with the GeoJSON constructor scaling somewhat better, as noted above. For the Michigan polygon, the number of iterations was lowered by 2 orders of magnitude (10,000 to 100) while the total run times increased by roughly 2 orders of magnitude, so the time for a single construction is roughly 4 orders of magnitude longer. That increase is matched by a similar increase in points/vertices (10 to 166,107).
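The orders-of-magnitude argument can be checked directly by dividing each total by its iteration count and comparing the per-construction times. The sketch below assumes the Michigan runs used ArcGIS Pro, as the 11x/19x comparison implies; per_call_growth is a hypothetical helper, and all timings are copied from the runs above.

```python
import math

# (total seconds, iterations) copied from the runs above; assumes the
# Michigan runs also used ArcGIS Pro 2.2.3
small = {
    "FromArcPyArray": (10.2499491093713, 10000),
    "FromEsriJSON": (0.9167164168891304, 10000),
    "FromGeoJSON": (9.85043158674398, 10000),
    "FromWKT": (0.5525892638736423, 10000),
}
michigan = {
    "FromArcPyArray": (1267.5038210736802, 100),
    "FromEsriJSON": (141.83130611867819, 100),
    "FromGeoJSON": (464.2651427417986, 100),
    "FromWKT": (86.92622569438026, 100),
}
small_points, michigan_points = 10, 166107

def per_call_growth(small, large):
    """Factor by which the time for ONE construction grew from `small` to `large`."""
    return {
        name: (large[name][0] / large[name][1]) / (small[name][0] / small[name][1])
        for name in small
    }

point_growth = michigan_points / small_points
print("{:<15} {:>9,.0f}x ({:.1f} orders of magnitude)".format(
    "points", point_growth, math.log10(point_growth)))
for name, growth in per_call_growth(small, michigan).items():
    print("{:<15} {:>9,.0f}x ({:.1f} orders of magnitude)".format(
        name, growth, math.log10(growth)))
```

Growth close to the point growth suggests roughly linear scaling; noticeably less growth (as with GeoJSON) suggests better than linear scaling over this range.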

 

The results of these tests truly surprised me. I am not going to abandon the ArcPy geometry default constructors wholesale, but I do think code that relies on them is worth a solid look when optimizing.
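For anyone wanting to run a similar comparison on their own data, here is a minimal harness sketch. compare_constructors is hypothetical, and the stand-in builders only exist so the example runs without ArcPy. With ArcPy available, you would pass FromArcPyArray, FromEsriJSON, and so on instead. Unlike the single timeit.timeit() call per constructor in the sessions above, it uses timeit.repeat() and keeps the minimum, following the timeit documentation's advice that higher values are usually caused by other processes interfering rather than by the code being timed.

```python
import timeit

def compare_constructors(ctors, number=1000, repeat=5):
    """Time each zero-argument callable and keep the best of `repeat` runs.

    Returns {function name: seconds for `number` calls}.
    """
    results = {}
    for ctor in ctors:
        ctor()  # warm-up call, like `pg = ctor()` in the loops above
        times = timeit.repeat(ctor, number=number, repeat=repeat)
        results[ctor.__name__] = min(times)
    return results

# Stand-ins for the ArcPy test functions (assumption: illustration only)
def build_wkt():
    return "POINT ({} {})".format(15, 0)

def build_geojson():
    return {"type": "Point", "coordinates": [15, 0]}

for name, secs in sorted(compare_constructors([build_wkt, build_geojson]).items()):
    print("{:<15} {:.6f}".format(name, secs))
```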