Select by attribute based on whether one field is a substring of another field

06-13-2016 02:00 PM
New Contributor III

I have been looking for a way to use the SelectLayerByAttribute function to select records in Python based on whether or not one field is a substring of another field.  For example, in the table view below, I would want to select the first three records because each of their sub_text strings is contained within the full_text string, or use a NOT version to select the fourth record.  I had tried using the LIKE statement (Sub_Text LIKE Full_Text) but functionally it works the same as an = statement and thus selects nothing.  So far I've found out how to compare a field to a fixed string within the query, or to see if a string is located at various character positions within the string, but that doesn't really help. 

0 Kudos
6 Replies
MVP Esteemed Contributor

see Building a query expression—Help | ArcGIS for Desktop

you need to use a wildcard with your selection... see the searching strings section... and its example

STATE_NAME LIKE 'Miss%'    for anything after Miss...

STATE_NAME LIKE '%e%'      any state with an e in it

STATE_NAME LIKE '%a'          any state ending in an a

MVP Esteemed Contributor

Started this earlier, but then realized you wanted the SelectLayerByAttribute part of it...but I'll add this to Dan's comment.  Just for a simple way to loop thru the two fields.

Here is one way to check, if you get you info into an array.  I'll let the array guru's add there touch to this.  And if you have a lot of data, you most likely would want to use arcpy.da.UpdateCursor or similar, especially if you are wanting to update another field.  but this is a simple example, with an add records with a word "is" that IS in all of them...except my added record.

recs = [[1,    "thisisaverylongstring",    "very"], [2,    "thisistrulyanevenlongerstring",    "longer"], [3,    "thisstringissolongyoucouldmakeasweateroutofit",    "sweater"], [4,    "idontwantanysubtextinthisstring",    "really"], [5, "whatever", "is"]]

#import arcpy
fullArray = []
subArray = []
for rec in recs:
print("fullArray: {0}".format(fullArray))
print("subArray: {0}".format(subArray))
for subText in subArray:
    for fullText in fullArray:
        if subText in fullText:
            print("{0} is within {1}".format(subText, fullText))
            print("{0} is NOT within {1}".format(subText, fullText))

The output will be

fullArray: ['thisisaverylongstring', 'thisistrulyanevenlongerstring', 'thisstringissolongyoucouldmakeasweateroutofit', 'idontwantanysubtextinthisstring', 'whatever']

subArray: ['very', 'longer', 'sweater', 'really', 'is']

very is within thisisaverylongstring

very is NOT within thisistrulyanevenlongerstring

very is NOT within thisstringissolongyoucouldmakeasweateroutofit

very is NOT within idontwantanysubtextinthisstring

very is NOT within whatever

longer is NOT within thisisaverylongstring

longer is within thisistrulyanevenlongerstring

longer is NOT within thisstringissolongyoucouldmakeasweateroutofit

longer is NOT within idontwantanysubtextinthisstring

longer is NOT within whatever

sweater is NOT within thisisaverylongstring

sweater is NOT within thisistrulyanevenlongerstring

sweater is within thisstringissolongyoucouldmakeasweateroutofit

sweater is NOT within idontwantanysubtextinthisstring

sweater is NOT within whatever

really is NOT within thisisaverylongstring

really is NOT within thisistrulyanevenlongerstring

really is NOT within thisstringissolongyoucouldmakeasweateroutofit

really is NOT within idontwantanysubtextinthisstring

really is NOT within whatever

is is within thisisaverylongstring

is is within thisistrulyanevenlongerstring

is is within thisstringissolongyoucouldmakeasweateroutofit

is is within idontwantanysubtextinthisstring

is is NOT within whatever

MVP Esteemed Contributor

Since my last response [edit :was so long], I'm separating this.  I decided to play around with this, and first (build off  my script above) was grabbing ANY record that had ANY of the Sub_Text words within Full_Text, but then I realized that is probably not what you wanted.  So, here is a sample to run thru the records, and build an equation on the fly.  Since it is a table, I'm creating another table, but you could use it in the SelectByAttributes if it is a featureclass.  I'll attach my test table just so you can see what I did (you'll have to change the input path)

import arcpy
import os
inTable = r'C:\__temp\test.gdb\testFld1Fld2'
fields = ('OBJECTID', 'Full_Text', 'Sub_Text')
npArray = arcpy.da.TableToNumPyArray(inTable, fields)
ws = arcpy.env.workspace = os.path.dirname(inTable)
arcpy.env.overwriteOutput = True

qcnt = 0
myQuery = ""

for aRow in npArray:
    if aRow[2] in aRow[1]:  
        print("OBJECTID:{0}  {2} IS IN {1}".format(aRow[0], aRow[1], aRow[2]))
        quickTest = ("OBJECTID = {0}".format(aRow[0]))    # will eliminate duplicates
        if qcnt == 0:
            myQuery = ("OBJECTID = {0} OR ".format(aRow[0]))
            qcnt += 1
        elif quickTest not in myQuery:
            myQuery = ("{0} OBJECTID = {1} OR ".format(myQuery, aRow[0]))

myQuery = myQuery.rstrip(" OR ")
print myQuery

arcpy.TableToTable_conversion(inTable, ws, "outtest", myQuery, "#", "")

del npArray

I'm sure others could improve on this, but hopefully this can get you started.

Edit 2: forgot to attach the test gdb/

edit 3: fyi, the length of the query, in the way I have it above will hit the max length fairly fast.  Since that really isn't practical for large files, other ways to do it include having the first do a select as "NEW SELECTION" and then add to the seletion each round...but this will most likely be slow.  You could modify it to build a query of x number/length, then add to the selection, and repeat.  or you could write the selected record to a temp file, have that collect everything, then create your new feature layer at the end.  That may be you best bet.

0 Kudos
MVP Frequent Contributor

Just to be different, I used numpy and pandas to get the result.  Basically just converts the input FC into a pandas DataFrame, apply filter, create output table, join back to original FC and apply final SelectByAttributes. 

import arcpy
import pandas as pd
import numpy as np
arrOut = r'in_memory\_arrOut'
items = ['very','longer','sweater','really']
fc = r'<input feature class>'
fl = arcpy.MakeFeatureLayer_management(fc, "inputFC")
flds = [ for f in arcpy.ListFields(fc)]
tarr = arcpy.da.TableToNumPyArray(fc,flds)
df = pd.DataFrame(tarr,columns=['OID','Full_Text','flag'])

for item in items:
    df_result = df[df['Full_Text'].str.contains(item)]

listofdfs = pd.concat(dflist)    
dfar = listofdfs.to_records()
tarrout = np.array(dfar, np.dtype([('ID', np.int32),('Full_Text', '|S255'),('flag', '|S1')]))
arcpy.da.NumPyArrayToTable(tarrout, r"in_memory\numpytab2")

arcpy.AddJoin_management(fl, "OBJECTID", r"in_memory\numpytab2", "OID", "KEEP_COMMON")
exp = """ "flag" = 'x' """
arcpy.SelectLayerByAttribute_management(fl, "NEW_SELECTION", exp)
MVP Esteemed Contributor

image.png and your numpy badge for that

New Contributor II

Not sure if anyone is still needs help with this, but here is a more up-to-date solution:

Example - select features for which Field1='Forest' and Field2='Pine Forest'. 

The final expression will be:

"Field2" LIKE CONCAT(CONCAT('%',SUBSTRING("Field1",1,CHAR_LENGTH("Field1"))),'%')

0 Kudos