Hi everyone,
I am trying to concatenate address strings across a dataset that has 3.5 million records. I want to concatenate the values of eight fields into a single string, but many of the fields have null values and many have spaces instead of nulls or extra spaces, etc.
What is the most efficient way in Python or Arcade to join strings ignoring nulls or spaces and just having a single space between the joined values? I usually use a function like " ".join((i for i in (str1,str2,str3) if i)) to do this, but across millions of records I'm hoping to avoid the bottleneck of using an if statement. Also, I'm hoping to avoid having to do trim calculations on each field first. Is there a more efficient way to do this?
Thanks!
Hi @asmith_tssw.,
Depending on the method, either python or arcade, you can easily achieve this. You can do something like this:
Python:
Code Expression:
!Concatenation( FieldA , FieldB , FieldC, etc... )!
Code Block:
def Concatenation( FieldA , FieldB , FieldC, etc... ):
Values = [FieldA , FieldB , FieldC, etc...]
ConcatNonNullValues = ''.join([ x for x in Values if x is not None ])
return ConcatNonNullValues
Arcade:
var ValueList = [ $feature.FieldA , $feature.FieldB, $feature.FieldC, etc... ]
var ConcatValue = ''
for ( var index in ValueList ){
ListValue = ValueList[ index ]
if ( IsEmpty( ListValue ) == False ){ ConcatValue + ListValue }
}
return ConcatValue
You can use several arcade functions to make the code above easier, but it depends on how you want to go about it.
Feel free to play around with the code above. It should work for what you need or at least point you in the right direction.
I use this one quite often:
strval = ' '.join(filter(None, [val1, val2, val3,val4]))
The space in the single quotes is the delimiter (single space). If any of the val variables have a value, it will be concatenated, but, if Null, will just be ignored without error.
R_
A Python if statement will be far from the slowest part of this process.
v = ['a', 'c', 'b', 'd', None]
v*2 # --- just to show you can expand lists
['a', 'c', 'b', 'd', None, 'a', 'c', 'b', 'd', None]
v0 = v*1000000 # lets make one with 5 million values
# ---- As Joshua says, it isn't slow, and there is an `if` statement in the #
list comprehension
%timeit ' '.join([x for x in v0 if x is not None ])
217 ms ± 31.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# ---- What about filter using the first 4 values agains `None`
' '.join(filter(None, v[:4]))
'a c b d'
# ---- no slouch either, and largely insignificant
%timeit ' '.join(filter(None, v0))
170 ms ± 8.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
slow coding night 😉