Concatenating Strings with Field Calculator and Python - dealing with NULLS

11-06-2015 11:08 AM
ChrisDonohue__GISP
MVP Alum

I have some addressing data that I would like to update, and one of the tasks is to populate a new full street address field [ADDR1] by concatenating several of the component fields. I know how to do this in VBScript, but figured this would be a good example to work out in Python, as I don't use Python often and have a lot to learn. My programming background is FORTRAN and BASIC, and in the years since those were popular I have been hack-and-slash adapting code for AML/Avenue/VBA/VB.Net, so assume I know little about Python.

What is desired (pseudo-logic):

ADDR1 = STRNUM + STRNUMSUF + STRPREDIR + STRNAME + STRTYPE + STRSUFDIR

I'm using ArcGIS 10.2.1 and the data shown here is test data in a File Geodatabase feature class.

After poking around various Help articles, it appears that in Python the .format method would work. However, when I run it in Field Calculator the result looks good except that "None" gets concatenated as text if the original field value was NULL (this issue doesn't come up with VBScript). I'd like any NULL values to just be blank instead, i.e. ignored entirely. For example, for the first record the desired outcome in ADDR1 is "224 D N CHURCH ST", not "224 D N CHURCH ST None".

[Screenshot: Concatenation Python NULLS become None.png]

One of the Python articles I found mentions the NULL issue:

[Screenshot: Dealing with Null Values.jpg]

Source:  Concatenating field values using Calculate Field | ArcGIS Blog

Is there a different way to do the concatenation in Python to achieve the desired outcome? I'm trying to wrap my head around how Python approaches this, particularly with regard to using it in the ArcMap Field Calculator.

  • I noticed some articles suggested casting all the fields to string to resolve the NULLS. However, I haven't had any luck wrapping the fields in str() inside the format statement shown; it just errors out. Would the casting need to be done on a separate line?
  • Would a better approach be to use the Codeblock to run some sort of process to convert NULLS to blank before running the format method?  For example, adapt a process similar to this?

[Screenshot: FixNull.jpg]

Source:  arcgis desktop - Calculate Field tool to calculate on null fields - Geographic Information Systems S...
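
A minimal sketch of the code-block idea raised in the second bullet, not the code from the linked answer. The function name join_address is hypothetical, and the assignment of the sample values to fields is assumed from the first record described above. It skips None (NULL) and blank values before joining.

```python
# Hypothetical helper for the Field Calculator "Codeblock" box (Python parser).
# The name join_address is made up for this illustration.
def join_address(*parts):
    """Join address components with single spaces, skipping NULL/blank values."""
    kept = []
    for part in parts:
        if part is None:           # a NULL field arrives in Python as None
            continue
        text = str(part).strip()   # cast numbers to text, trim stray spaces
        if text:                   # skip empty strings as well
            kept.append(text)
    return " ".join(kept)

# In the expression box the call would look something like:
# join_address(!STRNUM!, !STRNUMSUF!, !STRPREDIR!, !STRNAME!, !STRTYPE!, !STRSUFDIR!)

# Stand-alone check using the first record from the question (values assumed):
print(join_address(224, "D", "N", "CHURCH", "ST", None))  # 224 D N CHURCH ST
```

Testing for None explicitly, rather than relying on truthiness, means a numeric 0 would be kept, which may or may not be what the data calls for.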

Lastly, can you recommend a good guide for beginners that explains how to use Python for this sort of issue?

Thanks,

Chris Donohue, GISP

20 Replies
ChrisDonohue__GISP
MVP Alum

That helps.  But I'm curious about something.  The code as you wrote has more detail and is easier to see the pieces.  However, I see a List and Append that are not visible in the one-liner Dan posted.  I'm assuming Join is accomplishing the same thing, is that correct?

Also, the looping in Python is throwing me a bit.  How does Python know to end a loop?  This reminds me of If-Then in other languages, but there is no explicit Then here.

Chris Donohue, GISP

DarrenWiens2
MVP Honored Contributor
I'm assuming Join is accomplishing the same thing, is that correct?

Join doesn't do anything except cycle through a list, interleaving it with the thing (like a space). It's really the list comprehension that does the magic in creating the list.
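
To illustrate the point about join, here is a small stand-alone sketch (the sample values are assumed, not taken from the actual feature class). The separator only goes between items, and join expects every item to already be a string, which is why the one-liner wraps each value in str().

```python
# The separator is placed only BETWEEN items, never before or after.
parts = ["224", "D", "N", "CHURCH", "ST"]
joined = " ".join(parts)
print(repr(joined))          # '224 D N CHURCH ST'

# One item: nothing to interleave, so no separator appears at all.
single = " ".join(["CHURCH"])

# No items: the result is an empty string.
empty = " ".join([])

# join only accepts strings; a number in the list raises TypeError.
try:
    " ".join([224, "CHURCH"])
    join_failed = False
except TypeError:
    join_failed = True
print(join_failed)           # True
```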

How does Python know to end a loop?

I'm probably glossing over some subtleties, but you can think of Python "for" loops as "for each" loops. They cycle through all the things, then quit.

ChrisDonohue__GISP
MVP Alum

I have a question then on List Comprehensions.

Using Dan's one-line example, I came up with the variation that fit my data:

" ".join([str(i) for i in [ !STRNUM!, !STRNUMSUF!, !STRPREDIR!, !STRNAME!, !STRTYPE!, !STRSUFDIR! ] if i])

So, the question is, does the for i automatically create a List Comprehension as part of using it?  It seems like in this case it does not need to be declared, as it was in your example.  Or is there a different process going on?  I'm assuming the for i is a looping process - is a List Comprehension built into it?  Just curious.

Chris Donohue, GISP

DanPatterson_Retired
MVP Emeritus

The for i doesn't create the list comprehension (LC); it is used to generate the values kept in the LC. It is useful in a variety of contexts, e.g.

>>> a = [i for i in range(10) if i < 5]
>>> a
[0, 1, 2, 3, 4]

It is the same as using for loops, with the exception that you don't have to append to or extend a list. The principles are the same. The utility in your context is the ability to skip a code block and simplify it down to a simple field calculation.
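
As a rough side-by-side sketch (sample values assumed), this is the list comprehension from the one-liner next to an equivalent for loop with an explicit list and append. It also shows how a loop ends in Python: there is no Then or Next keyword, the loop body is simply the indented block, and the first line back at the original indentation is outside the loop.

```python
values = [224, "D", "N", "CHURCH", "ST", None]

# List comprehension: build the filtered list of strings in one expression.
kept_lc = [str(i) for i in values if i]

# Equivalent for loop: create the list first, then append inside the loop.
kept_loop = []
for i in values:
    if i:                          # skips None and empty strings
        kept_loop.append(str(i))
# This line is no longer indented, so it runs once, after the loop finishes.
result = " ".join(kept_loop)

print(kept_lc == kept_loop)        # True
print(result)                      # 224 D N CHURCH ST
```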

I use LCs a lot, and I don't try to go for that smokin' one-liner. I usually develop them stacked, as in one of my other posts, then de-stack them when they work. Visually it works for me. As was said, six of one, half a dozen of the other.

DarrenWiens2
MVP Honored Contributor

My understanding of list comprehensions (which is limited) is that it becomes a list comprehension when it meets the following pattern:

[return elements of some sort for something in some iterable if some condition] # all in an outgoing list

I'm not sure at which point Python decides that this is a list comprehension, or if that's even important. List comprehensions are a convenient way to build lists (although I find them confusing and really don't use them at all).

DanPatterson_Retired
MVP Emeritus

Well, they are useful once you get used to them, since they can simplify existence conditions (exists) and truth values (is)

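obj = []
# existence: is this the kind of object we expect?
obj_exists = isinstance(obj, list)
# truth: an identity test against True (bool(obj) is also False for an empty list)
obj_is = obj is True
print("Exists ... {}\nTruth value... {} ".format(obj_exists, obj_is))

Result

Exists ... True
Truth value... False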

Sometimes you want to ensure that you are working with the right object and that it meets the truth condition, since iterables can be many things. It gets more complicated when you are working with arrays, because there are more existence and truth conditions.

DanPatterson_Retired
MVP Emeritus

Or if you want to stick with a list comprehension, you don't have to put an LC on one line; you can stack the syntax as follows (I posted on this as well)

>>> a = 12345
>>> b = None
>>> c = "some text"
>>> d = ""
>>> e = "more"
>>> " ".join([ str(i)
...           for i in [a,b,c,d,e]
...           if i ])
'12345 some text more'
>>>

We obviously have too little to do on a Friday

ChrisDonohue__GISP
MVP Alum

Thanks Dan, that works.  I'm going to play with it a bit to try to get a better handle on it.

I was puzzled by the space " " in front of the join. I realized it is a separator that keeps the text from running together in the combined string, but I was worried it would add a space in front of the result. It does not.

The "If i" is interesting.  Sort of like For-Next in other languages.

I wonder why the .join method treats Nulls differently than the .format method does....
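
A small sketch on this last point (sample values assumed): the different result appears to come from the "if i" filter rather than from join itself. Without the filter, join produces the same "None" text that format does. One caveat worth knowing is that "if i" also drops a numeric 0, so a stricter test is shown as an alternative.

```python
values = [224, "D", "N", "CHURCH", "ST", None]

# format converts every argument to text, so None becomes "None".
# (The * hands the list items to format as separate arguments.)
with_format = "{} {} {} {} {} {}".format(*values)

# join without a filter gives exactly the same text.
unfiltered = " ".join([str(i) for i in values])

# The "if i" filter is what removes the None before join ever sees it.
filtered = " ".join([str(i) for i in values if i])

print(with_format)   # 224 D N CHURCH ST None
print(unfiltered)    # 224 D N CHURCH ST None
print(filtered)      # 224 D N CHURCH ST

# Caveat: "if i" treats 0 as false too, so a numeric 0 would vanish.
zero_case = [0, "MAIN", "ST", None]
truthy_only = " ".join([str(i) for i in zero_case if i])
strict = " ".join([str(i) for i in zero_case if i is not None and i != ""])
print(truthy_only)   # MAIN ST
print(strict)        # 0 MAIN ST
```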

Chris Donohue, GISP

DanPatterson_Retired
MVP Emeritus

I gave a simple example so as not to obfuscate the potential.

In reality you can create your appropriate string without the join, but you need a code block. So just pretend that the values assigned to a-e are field names. Then you can generate the required number of { } for the formatting mini-language in this manner.

>>> a = 12345
>>> b = None
>>> c = "some text"
>>> d = ""
>>> e = "more"
>>> ok_flds = [i for i in [a,b,c,d,e] if i ]
>>> ok_flds
[12345, 'some text', 'more']
>>> frmt = ("{} "*len(ok_flds)).format(*ok_flds)
>>> frmt
'12345 some text more '
>>>

Now notice that I generated the braces, each followed by a space, from the length of the list of fields that met the ok_flds condition, i.e. the if i component. Since the number of fields that pass the test is not known in advance, you have to use *ok_flds (note the star!) to unpack the values from the list into the matching number of braces. If this sounds all kind of twilighty... I posted a blog on formatting a while ago (actually several) but I am not at my link computer.
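
Note that the result above ends with a trailing space, because every brace pair carries its own space. A possible variation, sketched here with the same sample values, builds the template with join so the spaces only fall between the braces; stripping the result afterwards is another option.

```python
a, b, c, d, e = 12345, None, "some text", "", "more"
ok_flds = [i for i in [a, b, c, d, e] if i]    # [12345, 'some text', 'more']

# Build one "{}" per surviving value, separated (not followed) by spaces.
template = " ".join(["{}"] * len(ok_flds))     # '{} {} {}'

# The star unpacks the list into separate arguments for format.
frmt = template.format(*ok_flds)
print(repr(frmt))                              # '12345 some text more'

# Alternative: keep the original template and strip the trailing space.
trimmed = ("{} " * len(ok_flds)).format(*ok_flds).rstrip()
print(frmt == trimmed)                         # True
```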

Once you get the hang of it, life is a lot simpler now that the formatting language exists. The !s and !r formatting options are also particularly useful for some things, e.g.

>>> import numpy as np
>>> a = np.array([1,2,3])
>>> print("string version...{!s}  repr version... {!r}".format(a,a))
string version...[1 2 3]  repr version... array([1, 2, 3])
>>>

So if you have nothing to do, check my posts on nothing and formatting something

AndrewSmith21
New Contributor

Thanks Dan
