
"If not Null and not empty string" paradox - help me figure it out?

3 weeks ago
AllenDailey1
Frequent Contributor

Hello,

I am adding something to a script I wrote a few years ago, and in the process, I happened to notice some logic in the script that I would think is wrong, yet it works as desired.  This is breaking my brain!  While it is apparently not an actual problem, I would love some help in understanding it.  (I can't recall why I wrote it this way a few years ago, so my memory is no help here.)

This part of the script uses an arcpy update cursor to populate certain fields in a feature class, depending on what the existing values are.  If the State field is Null OR an empty string, it is supposed to be populated with "CA".  In other words, all records should have "CA" in this State field.  The script is supposed to ONLY edit records that really need editing, which is why I'm using criteria to pick which records get edited, rather than just doing a field calculation on the entire feature class.

However, the code is saying "if it is Null AND it is NOT an empty string, then make it CA."  I don't get it.  I would think that only Nulls would be picked... or maybe nothing at all.   

But the script does correctly populate the field with "CA" if there was an empty string there.  I don't understand how this is possible.  Any thoughts???

Here is the code:

# If State is blank or null, make it "CA"
if row[field_index_dict.get("State")] is None and row[field_index_dict.get("State")] != '':
    row[field_index_dict.get("State")] = "CA"
    field_uc.updateRow(row)

Thank you for reading!


Accepted Solutions
HaydenWelch
MVP Regular Contributor

You just have a logic error in your condition:

- if row[field_index_dict.get("State")] is None and row[field_index_dict.get("State")] != '':
+ if not row[field_index_dict.get("State")]:
    row[field_index_dict.get("State")] = "CA"
    field_uc.updateRow(row)

Empty string and None both evaluate to False, so you can just say `if not` and it'll work. Your original condition is only True when the state is None: a None value is never equal to an empty string, so the `!= ''` half is always True for it and adds nothing.
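A quick way to see why the original condition only ever catches None is to evaluate both conditions against the three kinds of value the State field can hold. This is a minimal sketch that uses plain values in place of the cursor row; the helper names `original_check` and `fixed_check` are hypothetical.

```python
def original_check(value):
    # Original condition: for None, `value != ''` is always True,
    # so this reduces to `value is None`
    return value is None and value != ''

def fixed_check(value):
    # Suggested condition: None and '' are both falsy
    return not value

for value in (None, '', 'CA'):
    print(repr(value), original_check(value), fixed_check(value))
```

Only None passes the original check; the empty string fails it, while the `not` version accepts both and still rejects 'CA'.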


9 Replies

JoshuaBixby
MVP Esteemed Contributor

From both a practice (https://peps.python.org/pep-0008/) and a performance perspective, checking for None should be done using "is" or "is not".

HaydenWelch
MVP Regular Contributor

There's basically no performance difference between using a raw `not` (a truthiness test, which calls __bool__() or, for strings, __len__()) and `is not` (an identity comparison, conceptually id(a) != id(b)). I only dropped the `is` because testing the boolean value directly makes the condition simpler to read. We know that Null/None and '' both evaluate to False in a truthiness test.

I wouldn't recommend this pattern when dealing with arbitrary objects (as the __bool__ method can be overridden), but since we know the type of the input value is Optional[str], we can just use the raw `not`.
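To illustrate the caveat about arbitrary objects: a class can define `__bool__` however it likes, so `not obj` may be True for an object that is neither None nor a string, while `is None` only ever matches the None singleton. The class below is a made-up example, not anything arcpy returns.

```python
class EmptyRecord:
    """Hypothetical object that reports itself as falsy."""
    def __bool__(self):
        return False

record = EmptyRecord()

print(not record)      # True: the truthiness test treats it as "empty"
print(record is None)  # False: identity only matches the None singleton
```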

You could change the condition to `val is None or val == ''`, but why add a second condition when one will suffice? Since the condition will likely be short-circuited anyway, we could write it in a way that reflects that and combines both checks into a single expression.
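For a value typed as Optional[str], the explicit two-part check and the raw `not` agree on every input; they only diverge once other falsy types (numeric zero, empty lists, and so on) can show up. A sketch, with sample values chosen purely for illustration:

```python
def explicit_check(val):
    # PEP 8 style: identity test for None, equality test for the empty string
    return val is None or val == ''

def truthiness_check(val):
    # Single truthiness test; relies on val being None or a str
    return not val

# Optional[str] inputs: both checks agree
for val in (None, '', ' ', 'CA'):
    assert explicit_check(val) == truthiness_check(val)

# Outside Optional[str] they diverge, e.g. numeric zero
print(explicit_check(0), truthiness_check(0))  # False True
```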

 

Testing:

from random import choice

opts = ["Something", "", None]

def raw_not():
    val = choice(opts)
    return not val

def is_not():
    val = choice(opts)
    return val is not None or val != ''

 

Disassembled:

raw_not
  5           0 RESUME                   0

  6           2 LOAD_GLOBAL              1 (NULL + choice)
             12 LOAD_GLOBAL              2 (opts)
             22 CALL                     1
             30 STORE_FAST               0 (val)

  7          32 LOAD_FAST                0 (val)
             34 UNARY_NOT
             36 RETURN_VALUE

-----------------------------------------------------------
is_not
 9           0 RESUME                   0

 10           2 LOAD_GLOBAL              1 (NULL + choice)
             12 LOAD_GLOBAL              2 (opts)
             22 CALL                     1
             30 STORE_FAST               0 (val)

 11          32 LOAD_FAST                0 (val)
             34 LOAD_CONST               0 (None)
             36 IS_OP                    1
             38 COPY                     1
             40 POP_JUMP_IF_TRUE         5 (to 52)
             42 POP_TOP
             44 LOAD_FAST                0 (val)
             46 LOAD_CONST               1 ('')
             48 COMPARE_OP              55 (!=)
        >>   52 RETURN_VALUE

 

Performance:

%timeit raw_not()
298 ns ± 0.81 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

%timeit is_not()
303 ns ± 5.04 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

You can see that the Python bytecode for the raw_not function is a lot simpler: it executes a single UNARY_NOT instruction without any branching. Performance-wise it's a wash, but since the short-circuited version branches, the state of the data can affect the performance (checking `is None` first when most records are empty strings adds an extra step for each record).
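The point about data distribution can be checked with the standard-library `timeit` module. This is an illustrative micro-benchmark under assumed data, not a reproduction of the numbers above: it times both conditions over a batch where most values are empty strings and a batch where most are None. Timings vary by machine and Python version, so no results are shown.

```python
import timeit

def explicit_check(val):
    # Stops after the first test only when val is None
    return val is None or val == ''

def truthiness_check(val):
    # One truthiness test regardless of the value
    return not val

# Illustrative data: mostly empty strings vs. mostly None
mostly_empty = [''] * 900 + [None] * 50 + ['CA'] * 50
mostly_none = [None] * 900 + [''] * 50 + ['CA'] * 50

for label, data in (("mostly ''", mostly_empty), ("mostly None", mostly_none)):
    for check in (explicit_check, truthiness_check):
        seconds = timeit.timeit(lambda: [check(v) for v in data], number=200)
        print(f"{label:12} {check.__name__:17} {seconds:.4f} s")
```

Expect the gap, if any, to be small next to the cost of the cursor itself.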

 

This is all beyond the scope of this question though, lol. Everyone should definitely read PEP008 and do their best to abide by it, but I don't think it should necessarily be treated as gospel in all cases. Use your best judgement, and if you feel that breaking PEP008 is valid in a certain case, make a note of it and be ready to justify it!

JoshuaBixby
MVP Esteemed Contributor

Stripping the test down to the most basic steps and removing IPython, I see equality checking costing quite a bit more than identity checking.

C:\>python -m timeit -n 1000000 -v "value = None; value != None"
raw times: 34.2 msec, 34.1 msec, 33.6 msec, 33.7 msec, 33.4 msec

1000000 loops, best of 5: 33.4 nsec per loop

C:\>python -m timeit -n 1000000 -v "value = None; value is not None"
raw times: 21.9 msec, 21.9 msec, 21.9 msec, 21.9 msec, 22.3 msec

1000000 loops, best of 5: 21.9 nsec per loop

Granted, both equality checks and identity checks are extremely fast for Python built-in constants, so on a practical level the choice would not make a difference in the overall performance of a Python function or code block. I only mentioned performance to point out that there is no performance benefit to going against best practice.

I agree that PEP008 should not be followed dogmatically, but I cannot imagine a Python core developer advocating for an equality check over an identity check when it comes to a singleton.
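A concrete reason identity is preferred for singletons: `==` and `!=` dispatch to the operands' `__eq__`/`__ne__` methods, which a class can override, whereas `is` cannot be overridden. The class here is a contrived, hypothetical example.

```python
class AlwaysEqual:
    """Hypothetical class whose equality test claims to match anything."""
    def __eq__(self, other):
        return True

obj = AlwaysEqual()

print(obj == None)  # True: equality is whatever __eq__ says
print(obj is None)  # False: identity cannot be overridden
```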

This might just be an agree-to-disagree situation, but the discussion is good to have regardless.

HaydenWelch
MVP Regular Contributor

I think there was a miscommunication here. I'd never recommend that someone use equality over identity for checking against sentinel values. My example was just inverting the value, since every condition the original poster wanted to check for evaluates to False. That check is actually even faster than the identity check:

python -m timeit -n 1000000 -v "val = None; not val"
raw times: 10.9 msec, 10.9 msec, 10.8 msec, 10.9 msec, 10.8 msec

python -m timeit -n 1000000 -v "val = None; val is not None"
raw times: 12.6 msec, 11.5 msec, 11.5 msec, 11.4 msec, 11.5 msec

Disassembled, you can see the unary not saves a load (there is no LOAD_CONST for None):

>>> import dis
>>> val = None
>>> dis.dis("not val")
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (val)
              4 UNARY_NOT
              6 RETURN_VALUE

>>> dis.dis("val is not None")
  0           0 RESUME                   0

  1           2 LOAD_NAME                0 (val)
              4 LOAD_CONST               0 (None)
              6 IS_OP                    1
              8 RETURN_VALUE
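To compare the two expressions on your own interpreter, the `dis` module can list the opcodes programmatically. Opcode names differ between Python versions (for example, 3.11+ adds RESUME, and 3.13 inserts a TO_BOOL before UNARY_NOT), so treat the output as version-specific. A sketch; the `opnames` helper is hypothetical.

```python
import dis

def opnames(expression):
    # Compile the expression in "eval" mode and list the opcode names it uses
    code = compile(expression, "<expr>", "eval")
    return [instr.opname for instr in dis.get_instructions(code)]

for expression in ("not val", "val is not None"):
    print(f"{expression:16} {opnames(expression)}")
```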
AllenDailey1
Frequent Contributor

Thank you!

I changed my code to simply "if not row[...]:" and it works.  Thanks for the reminder about this possibility.

DanPatterson
MVP Esteemed Contributor

I prefer dump and proceed

a = ["", " ", None]
val = ["A", "", "B", " ", None]
for i in val:
    if i in a:
        print("ignore")
    else:
        print(i)
        
A
ignore
B
ignore
ignore
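The membership test above can be packaged as a function. A minimal sketch: the `BLANKS` set mirrors the list `a` (None, '' and ' ' are all hashable, so a set works), and `is_blank_stripped` is an alternative that is not in the original post, which also treats any run of whitespace as blank.

```python
BLANKS = frozenset({"", " ", None})

def is_blank_member(value):
    # Membership test: matches only the exact values listed in BLANKS
    return value in BLANKS

def is_blank_stripped(value):
    # Alternative: None, or a string that is empty after stripping whitespace
    return value is None or not value.strip()

for value in ["A", "", "B", " ", "   ", None]:
    print(repr(value), is_blank_member(value), is_blank_stripped(value))
```

The two differ only on values like "   ", which the membership test lets through.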

... sort of retired...
JoshuaBixby
MVP Esteemed Contributor

Although SQL NULL gets mapped to Python None, they are not logically identical, because Python by default does not implement three-valued logic. Whereas SQL logical operators in most systems support TRUE/FALSE/NULL, Python logical operators only support True/False. From 6. Expressions - Python 3.y.z documentation:

In the context of Boolean operations, and also when expressions are used by control flow statements, the following values are interpreted as false: False, None, numeric zero of all types, and empty strings and containers (including strings, tuples, lists, dictionaries, sets and frozensets).

The issue isn't that your logic or your code is incorrect (although it may be); it is that you are trying to interpret the results of Python logical expressions as if they were SQL logical expressions.
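The difference can be seen by running the same condition through Python and through a SQL engine. This sketch uses the standard-library `sqlite3` module as a stand-in, which is an assumption: a geodatabase has its own SQL dialect, but NULL handling in `AND` and `<>` follows the same three-valued rules in standard SQL.

```python
import sqlite3

def python_check(value):
    # The original condition, evaluated with Python's two-valued logic
    return value is None and value != ''

def sql_check(conn, value):
    # The same condition in SQL; sqlite3 binds Python None as NULL
    # and returns SQL NULL (unknown) as Python None
    (result,) = conn.execute("SELECT (? IS NULL) AND (? <> '')", (value, value)).fetchone()
    return result

conn = sqlite3.connect(":memory:")
for value in (None, '', 'CA'):
    print(repr(value), python_check(value), sql_check(conn, value))
conn.close()
```

For None, Python returns True, while SQLite returns NULL (`NULL <> ''` is unknown, and `1 AND NULL` stays unknown), which a WHERE clause treats as not matching.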

DavidSolari
MVP Regular Contributor

I tried replicating this with the default "Point Notes" layer template and the following code but what I get is the expected behaviour: nulls pass but empty strings fail

with arcpy.da.UpdateCursor('Point Notes', "Name") as cur:
    for row in cur:
        if row[0] is None and row[0] != "":
            print(f"[{row[0]}] passed")
        else:
            print(f"[{row[0]}] failed")

>>>[None] passed
>>>[] failed

As you'd expect, it doesn't matter what the right-hand expression is or what data is in the field: for any non-None value the None check fails and short-circuits the condition. All I can suggest is taking a closer look at your data or at how that field index dict is constructed; I can't figure out why this is happening from the context provided.
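Without arcpy at hand, the same experiment can be approximated with plain Python lists standing in for cursor rows. In this sketch `fields` plays the role of the cursor's field names and `rows` is made-up sample data; building the index dictionary with `enumerate` is an assumption about how the original `field_index_dict` was made.

```python
# Hypothetical stand-ins for the cursor's field names and rows
fields = ("OBJECTID", "Name", "State")
rows = [[1, "A", None], [2, "B", ""], [3, "C", "CA"]]

# Map each field name to its position in a row
field_index_dict = {name: i for i, name in enumerate(fields)}
state = field_index_dict.get("State")

for row in rows:
    original = row[state] is None and row[state] != ""
    fixed = not row[state]
    print(f"[{row[state]}] original={'passed' if original else 'failed'} fixed={'passed' if fixed else 'failed'}")
```

If `.get("State")` returned None because of a name mismatch, indexing a list row with it would raise a TypeError rather than silently pass, so a mis-built dictionary would more likely fail loudly than explain the behaviour.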