Select by attribute based on whether one field is a substring of another field

StephenRhone · ‎06-13-2016

I have been looking for a way to use the SelectLayerByAttribute function to select records in Python based on whether or not one field is a substring of another field. For example, in the table view below, I would want to select the first three records because each of their sub_text strings is contained within the full_text string, or use a NOT version to select the fourth record. I had tried using the LIKE statement (Sub_Text LIKE Full_Text) but functionally it works the same as an = statement and thus selects nothing. So far I've found out how to compare a field to a fixed string within the query, or to see if a string is located at various character positions within the string, but that doesn't really help.

OID	Full_Text	Sub_Text
1	thisisaverylongstring	very
2	thisistrulyanevenlongerstring	longer
3	thisstringissolongyoucouldmakeasweateroutofit	sweater
4	idontwantanysubtextinthisstring	really

DanPatterson_Retired · ‎06-13-2016

see Building a query expression—Help | ArcGIS for Desktop

you need to use a wildcard with your selection... see the searching strings section... and its example

STATE_NAME LIKE 'Miss%' for anything after Miss...

STATE_NAME LIKE '%e%' any state with an e in it

STATE_NAME LIKE '%a' any state ending in an a

RebeccaStrauch__GISP · ‎06-13-2016

Started this earlier, but then realized you wanted the SelectLayerByAttribute part of it...but I'll add this to Dan's comment. Just for a simple way to loop thru the two fields.

Here is one way to check, if you get you info into an array. I'll let the array guru's add there touch to this. And if you have a lot of data, you most likely would want to use arcpy.da.UpdateCursor or similar, especially if you are wanting to update another field. but this is a simple example, with an add records with a word "is" that IS in all of them...except my added record.

recs = [[1,    "thisisaverylongstring",    "very"], [2,    "thisistrulyanevenlongerstring",    "longer"], [3,    "thisstringissolongyoucouldmakeasweateroutofit",    "sweater"], [4,    "idontwantanysubtextinthisstring",    "really"], [5, "whatever", "is"]]

#import arcpy
fullArray = []
subArray = []
for rec in recs:
    fullArray.append(rec[1])
    subArray.append(rec[2])
    
print("fullArray: {0}".format(fullArray))
print("subArray: {0}".format(subArray))
    
for subText in subArray:
    for fullText in fullArray:
        if subText in fullText:
            print("{0} is within {1}".format(subText, fullText))
        else:
            print("{0} is NOT within {1}".format(subText, fullText))

The output will be

fullArray: ['thisisaverylongstring', 'thisistrulyanevenlongerstring', 'thisstringissolongyoucouldmakeasweateroutofit', 'idontwantanysubtextinthisstring', 'whatever']
subArray: ['very', 'longer', 'sweater', 'really', 'is']
very is within thisisaverylongstring
very is NOT within thisistrulyanevenlongerstring
very is NOT within thisstringissolongyoucouldmakeasweateroutofit
very is NOT within idontwantanysubtextinthisstring
very is NOT within whatever
longer is NOT within thisisaverylongstring
longer is within thisistrulyanevenlongerstring
longer is NOT within thisstringissolongyoucouldmakeasweateroutofit
longer is NOT within idontwantanysubtextinthisstring
longer is NOT within whatever
sweater is NOT within thisisaverylongstring
sweater is NOT within thisistrulyanevenlongerstring
sweater is within thisstringissolongyoucouldmakeasweateroutofit
sweater is NOT within idontwantanysubtextinthisstring
sweater is NOT within whatever
really is NOT within thisisaverylongstring
really is NOT within thisistrulyanevenlongerstring
really is NOT within thisstringissolongyoucouldmakeasweateroutofit
really is NOT within idontwantanysubtextinthisstring
really is NOT within whatever
is is within thisisaverylongstring
is is within thisistrulyanevenlongerstring
is is within thisstringissolongyoucouldmakeasweateroutofit
is is within idontwantanysubtextinthisstring
is is NOT within whatever

RebeccaStrauch__GISP · ‎06-13-2016

Since my last response [edit :was so long], I'm separating this. I decided to play around with this, and first (build off my script above) was grabbing ANY record that had ANY of the Sub_Text words within Full_Text, but then I realized that is probably not what you wanted. So, here is a sample to run thru the records, and build an equation on the fly. Since it is a table, I'm creating another table, but you could use it in the SelectByAttributes if it is a featureclass. I'll attach my test table just so you can see what I did (you'll have to change the input path)

import arcpy
import os
inTable = r'C:\__temp\test.gdb\testFld1Fld2'
fields = ('OBJECTID', 'Full_Text', 'Sub_Text')
npArray = arcpy.da.TableToNumPyArray(inTable, fields)
ws = arcpy.env.workspace = os.path.dirname(inTable)
arcpy.env.overwriteOutput = True

qcnt = 0
myQuery = ""

for aRow in npArray:
    print(aRow)
    if aRow[2] in aRow[1]:  
        print("OBJECTID:{0}  {2} IS IN {1}".format(aRow[0], aRow[1], aRow[2]))
        quickTest = ("OBJECTID = {0}".format(aRow[0]))    # will eliminate duplicates
        if qcnt == 0:
            myQuery = ("OBJECTID = {0} OR ".format(aRow[0]))
            qcnt += 1
        elif quickTest not in myQuery:
            myQuery = ("{0} OBJECTID = {1} OR ".format(myQuery, aRow[0]))

myQuery = myQuery.rstrip(" OR ")
print myQuery

arcpy.TableToTable_conversion(inTable, ws, "outtest", myQuery, "#", "")

del npArray

I'm sure others could improve on this, but hopefully this can get you started.

Edit 2: forgot to attach the test gdb/

edit 3: fyi, the length of the query, in the way I have it above will hit the max length fairly fast. Since that really isn't practical for large files, other ways to do it include having the first do a select as "NEW SELECTION" and then add to the seletion each round...but this will most likely be slow. You could modify it to build a query of x number/length, then add to the selection, and repeat. or you could write the selected record to a temp file, have that collect everything, then create your new feature layer at the end. That may be you best bet.

JamesCrandall · ‎06-14-2016

Just to be different, I used numpy and pandas to get the result. Basically just converts the input FC into a pandas DataFrame, apply filter, create output table, join back to original FC and apply final SelectByAttributes.

import arcpy
import pandas as pd
import numpy as np
    
dflist=[]
arrOut = r'in_memory\_arrOut'
items = ['very','longer','sweater','really']
fc = r'<input feature class>'
fl = arcpy.MakeFeatureLayer_management(fc, "inputFC")
flds = [f.name for f in arcpy.ListFields(fc)]
tarr = arcpy.da.TableToNumPyArray(fc,flds)
df = pd.DataFrame(tarr,columns=['OID','Full_Text','flag'])
df['flag']='x'

for item in items:
    df_result = df[df['Full_Text'].str.contains(item)]
    dflist.append(df_result)

listofdfs = pd.concat(dflist)    
dfar = listofdfs.to_records()
tarrout = np.array(dfar, np.dtype([('ID', np.int32),('Full_Text', '|S255'),('flag', '|S1')]))
arcpy.da.NumPyArrayToTable(tarrout, r"in_memory\numpytab2")

arcpy.AddJoin_management(fl, "OBJECTID", r"in_memory\numpytab2", "OID", "KEEP_COMMON")
exp = """ "flag" = 'x' """
arcpy.SelectLayerByAttribute_management(fl, "NEW_SELECTION", exp)

DanPatterson_Retired · ‎06-14-2016

and your numpy badge for that

AngelBarnett · ‎12-01-2021

Not sure if anyone is still needs help with this, but here is a more up-to-date solution:

Example - select features for which Field1='Forest' and Field2='Pine Forest'.

The final expression will be:

"Field2" LIKE CONCAT(CONCAT('%',SUBSTRING("Field1",1,CHAR_LENGTH("Field1"))),'%')

https://support.esri.com/en/technical-article/000007577