
Pesky computer rounding in arcpy

12-22-2015 09:11 PM
PaulDavidson1
Regular Contributor

As an addendum to my NULL SHAPE question...

I've just hit a very strange issue.

For days I've been working with various FeatureClasses (in file geodatabases, in enterprise geodatabases, etc... trying a variety of workspaces).

And for days, using an UpdateCursor, I've been successfully writing points as (0.0, 0.0)

This evening, the following code suddenly started misbehaving:

import arcpy

# myFC is the path to the feature class being updated
fields = ['OID@', 'SHAPE@X', 'SHAPE@Y']

with arcpy.da.UpdateCursor(myFC, fields) as cursor:
    for row in cursor:
        row[1] = 0.0
        row[2] = 0.0
        cursor.updateRow(row)

It started writing non-zero values into the XY locations.

When I read the data back in:

Shape(X, Y):(7.45058059692e-09,7.45058059692e-09)

OK, sure, it's quite small and I could effectively work around it with a small-delta function, but that makes no sense.

0.0 should not have a rounding effect, should it?

Any ideas why this suddenly started happening?

Like I say, for days it's been written as 0.0 and read back that way.

And the datasets have always been in the same state plane coordinate system, so unless I missed something, I shouldn't be dealing with some small projection error... and going from the dataset back to that same dataset should involve no projection translation anyway...

It might be that my PC has flaked some bits and ....

And I'll reboot and try again but this seems very strange.
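
For reference, the read-back is just a plain search cursor over the same fields, roughly like this (a sketch; myFC is the same feature class variable as above):

import arcpy

fields = ['OID@', 'SHAPE@X', 'SHAPE@Y']
with arcpy.da.SearchCursor(myFC, fields) as cursor:
    for oid, x, y in cursor:
        print("OID {}: Shape(X, Y):({},{})".format(oid, x, y))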

8 Replies
DanPatterson_Retired
MVP Emeritus

That is within the realm of floating-point precision.  If you want to guarantee numbers are read or printed with a certain precision, you have to enforce it.

Second, if you don't have a defined coordinate system (i.e. one you actually defined, not one that is presumed), there are known errors in calculated geometry objects (documented elsewhere and well known).

Using Python as an example:

>>> x = 7.45e-09
>>> print("x is...{}".format(x))  # no formatting
x is...7.45e-09
>>> print("x is...{: <8.3f}".format(x))  # format as a left-justified 8-char float with 3 decimal places
x is...0.000
>>> print("x is...{: >8.3f}".format(x))  # ditto, right-justified
x is...   0.000

For more information, see Python's format specification mini-language (I also posted a blog on formatting).

PaulDavidson1
Regular Contributor

Is there a way to apply formatting to the read of a da.UpdateCursor? (or SearchCursor)

Other than to read the raw values and then format them?

Can the fields statement define formatting?
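
Right now I just read the raw values and format them afterwards, something like this (a rough sketch, not my exact code):

import arcpy

fields = ['SHAPE@X', 'SHAPE@Y']
with arcpy.da.SearchCursor(myFC, fields) as cursor:
    for x, y in cursor:
        # the cursor hands back raw floats; formatting only happens on output
        print("({: >12.4f}, {: >12.4f})".format(x, y))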

The latest Esri documentation on da.UpdateCursor says, for spatial_reference: "The default value is None."

While for a da.SearchCursor: "By default, the spatial reference of the geometry returned from a search cursor is the same as that of the feature class opened by the cursor."

Not sure I get that logic.  Isn't an update nothing more than a search followed by an upsert?

But I don't have to get it... just deal with it.  That would seem to me to mean that a da.UpdateCursor should have spatial_reference as a required parameter.  Defaulting to None seems to introduce all sorts of nasty head-scratching errors.
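
For now I'll probably just pass the spatial reference explicitly every time, something like this (a sketch):

import arcpy

sr = arcpy.Describe(myFC).spatialReference   # the feature class's own spatial reference
fields = ['OID@', 'SHAPE@X', 'SHAPE@Y']
with arcpy.da.UpdateCursor(myFC, fields, spatial_reference=sr) as cursor:
    for row in cursor:
        row[1] = 0.0
        row[2] = 0.0
        cursor.updateRow(row)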

While I've seen floating-point errors in calculations over the years, I'm not sure I can recall ever seeing an assignment of 0 introduce slop into zero.  There is an exact bit representation of that, after all.

It also seems odd that what had been reading and writing as zero suddenly, for no apparent reason, started showing up just outside the margin of the feature class's resolution.  Live and learn.
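
A quick interactive sanity check of that (and, for what it's worth, the stray value looks like it's exactly 2**-27):

>>> (0.0).hex()   # zero has an exact binary representation
'0x0.0p+0'
>>> 2**-27        # the stray value appears to be an exact power of two
7.450580596923828e-09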

thanks

DanPatterson_Retired
MVP Emeritus

I am not sure how the data originated or was created, but if it was created without a spatial reference being specified (not presumed), then it is possible.  If you are doing that query thing again, I wouldn't check for equality but for tolerance... again, by example:

def tol_test(val):
    tol = -1.0e-08
    if tol < val < abs(tol):   # within +/- 1e-08 of zero
        val = 0.0
    return val

for val in [-1.1e-9, 1.05e-9, 0.0, 0.1]:
    returned = tol_test(val)
    print("val==> {: >10} out==>{: >10}".format(val, returned))

yields

val==>   -1.1e-09 out==>       0.0
val==>   1.05e-09 out==>       0.0
val==>        0.0 out==>       0.0
val==>        0.1 out==>       0.1
PaulDavidson1
Regular Contributor

Exactly my thoughts; I had written an almost identical routine, just passing in tol, which complicates it a bit (is it + or -, etc.).

Not sure if I need the tol parameter or not.  Probably not.  Certainly the function would be fastest with hard-coded upper and lower bounds, but speed is not an issue.

Edit: btw - I was wrong; this issue didn't suddenly show up.  What I had been reading as 0.0 I was all of a sudden reading as ~7.3e-8.

In the heat of late-night coding, I had started working with a new data source that had a much finer resolution, on the order of the fp error.

At least values in a dataset didn't suddenly change on me out of the blue.

thanks again.

fyi... one could refactor out the gub:

def isZero(val, tol):
    llb = [-tol, tol][tol < 0]   # lower bound: -abs(tol), whichever sign tol comes in with
    gub = abs(tol)               # upper bound
    if llb < val < gub:
        val = 0.0
    return val
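
e.g., a quick check:

>>> isZero(7.45058059692e-09, 1e-08)
0.0
>>> isZero(0.1, 1e-08)
0.1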
DanPatterson_Retired
MVP Emeritus

If you are using cursors, then you can use my if statement, which does the plus-or-minus check for the lower and upper bound.  If the val (value) is within the +/- tolerance then it is obviously crap, so assign it zero.  I still don't know how they got that way; I have seen it occur when values are calculated in arcpy without a spatial reference, but never by assignment.

I can't attest to the speed, since I do all my geometry operations in numpy and a check like that is vectorized and millions of points can be tested in less time than a sneeze.
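
A minimal numpy sketch of that kind of vectorized check (the array values are made up for illustration):

import numpy as np

xy = np.array([-1.1e-9, 1.05e-9, 0.0, 0.1, 7.45058059692e-09])
tol = 1.0e-08
xy[np.abs(xy) < tol] = 0.0   # snap everything within +/- tol to exactly 0.0
print(xy)                    # prints roughly [0.  0.  0.  0.1 0. ]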

PaulDavidson1
Regular Contributor

Yes, I expect the values got that way just as you describe.  The spatial resolution is so small as to suggest to me there was no spatial_reference in place.  The assignment thing is puzzling.  I do find one value of 7.1e-09 in the original data; I suspect it's related to the very fine resolution.  I'll probably do a reprojection with a realistic resolution and see if that changes anything.

While I've tried to kick the habit of worrying over speed in today's world, and I usually do manage to quell the OCD, it's still there in the background.  I imagine you've dealt with the days when having an 8087 made a huge difference in one's calculations.

If speed were an issue, Python wouldn't be the first choice, would it?  But we'd have to be talking mega data.

It takes longer to load arcpy than it does to process 200K rows these days with the da cursors.

I appreciate the help.

Edit: JIC others ever struggle with this:

I have verified that the zeros showing up as ~7.2e-9, even when assigned 0.0 in a da.UpdateCursor (and reading back as 7.2e-9), are entirely due to the spatial resolution of the original file being extremely small.

I re-projected the original feature class to the same state plane projection, except I set the resolution to the one that is standard here, 0.00025.

That dataset then updates points at (None, None) to 0.0, and when read back in, the points return 0.0.
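
Roughly the idea, for anyone following along (a sketch, not my exact steps; inFC/outFC are placeholders, and I'm assuming the XY Resolution / XY Tolerance environments are honored by the copy):

import arcpy

# inFC / outFC are placeholder paths to the original and re-built feature classes
arcpy.env.XYResolution = "0.00025 Feet"   # the standard resolution mentioned above
arcpy.env.XYTolerance = "0.001 Feet"
arcpy.CopyFeatures_management(inFC, outFC)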

DanPatterson_Retired
MVP Emeritus

Excluding load time, Python is fast enough, numpy is blazingly fast, SciPy is even faster for some things, and Numba with numpy is essentially C speed.  Speed is often overrated... have a read for an example:

Before I forget ... # 16 ... NumPy vs SciPy... making a point

PaulDavidson1
Regular Contributor

Interesting read and comparison:

Totally agree; with today's boxes, cranking 50e6 point calcs in 0.96 vs. 0.66...

Well... that's blazing, and the difference is irrelevant for a real-world app.

What's most interesting to me (besides the flip in which lib is faster at 1e5 pts) are the internet wars over such meaningless stuff.
