Compare two tables field values using wildcard

Per_Olav_Kaasa · ‎01-12-2018

Hello, I'm currently working with one feature class table containing 400 000 objects and one database table containing 700 objects. I'm currently working in ArcMap 10.4 and using a Geodatabase with Python 2.7. What I'm trying to do is match the database table containing 700 objects against the FC. What I want is to match them, treating all 700 objects as wildcars %700 ojects%, and match those towards the FC that contains text from the table. I have 2 tables with 1 column in each that im trying to match. I have not yet found a good way to treat all rows in a column as wildcard to search against another column.

This is my code so far. This only will print out what the database table values. Next i wanted to get the FC values and compare them. I've read about maybe using a dictionary to make it easier to extract values but I've yet to figure out how.

import arcpy, unicodedata, datetime, openpyxl 
arcpy.env.workspace =r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb"

def rows_as_dicts(cursor):
    colnames = cursor.fields
    for row in cursor:
        yield dict(zip(colnames, row)

with arcpy.da.SearchCursor(r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb\Ordbok", '*') as sc:
    for row in rows_as_dicts(sc):
        print row['field']‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

Any input will be appreciated.

XanderBakker · ‎01-19-2018

Hi Per Olav Kaasa ,

You can try the code below. It will take some time to run. Be sure to run this on the uncompressed version of the data (it will not be able to run on compressed data).Change the following:

lines 7 to 11, names and paths to the data to be used (line 9 contains the output field for the results which will be added to the annotation in case it does not exist). Don't have the annotation open with Desktop, since this will create a lock on the data and the process may fail.

# -*- coding: utf-8 -*-

def main():
    import arcpy

    # settings (first uncompress geodatabase data!)
    fc_anno = r'C:\GeoNet\SearchWildcard\gdb\TestBase.gdb\Navn_org'
    fld_anno = 'SNAVN'  # field to match with dictionary
    fld_out = 'OrdbokMatch'  # output field name
    tbl = r'C:\GeoNet\SearchWildcard\gdb\TestBase.gdb\Ordbok'
    fld_tbl = 'field'  # dictionary data

    # create dictionary list
    lst_dct = [stripText2list(r[0]) for r in arcpy.da.SearchCursor(tbl, (fld_tbl))]

    # Add output field to annotation
    AddField(fc_anno, fld_out, "TEXT", 100)

    # loop through annotation and update output field
    cnt = 0
    # dct_hits = {}
    with arcpy.da.UpdateCursor(fc_anno, (fld_anno, fld_out)) as curs:
        for row in curs:
            cnt += 1
            if cnt % 5000 == 0:
                print "Processing row:", cnt
            snavn = row[0]
            lst_items = stripText2list(snavn)
            result = FindMatches(lst_items, lst_dct)
            row[1] = result
            curs.updateRow(row)


def FindMatches(lst_items, lst_dct):
    result = None
    data = []
    for lst_validate in lst_dct:
        for validate in lst_validate:
            for item in lst_items:
                if validate.upper() in item.upper():
                    data.append(validate)
    if len(data) > 0:
        data = list(set(data))
        result = data[0]
        for data_item in data[1:]:
            result += ", " + data_item
    return result

def AddField(fc, fld_name, fld_type, fld_length=None):
    if len(arcpy.ListFields(fc, fld_name)) == 0:
        arcpy.AddField_management(fc, fld_name, fld_type, None, None, fld_length)

def stripText2list(text):
    text_ori = text
    lst_ok = []
    if text is None:
        return lst_ok
    else:
        s = text.find('(')
        if s != -1:
            e = text.find(')')
            if e == -1:
                e = len(text)-1
            text = text[0:s] + text[e:len(text)-1]
        text = text.replace('"','')
        text = text.replace(';','')
        text = text.replace('-','')
        text = text.replace(',',' ')
        text = text.replace('  ','')
        lst = text.split(' ')

        for a in lst:
            a = a.strip()
            if a != '':
                if a[0] != '(':
                    if a[-1:] != ')':
                        lst_ok.append(a)
        return lst_ok

if __name__ == '__main__':
    main()
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

View solution in original post

DanPatterson_Retired · ‎01-12-2018

Per, I had to edit your code formatting so it was readable, check that I got the indentation right.

For the future Code Formatting... the basics++

JoshuaBixby · ‎01-12-2018

I must admit, I don't quite understand the desired outcome. If you could provide an example with some screenshots that would be most helpful.

Per_Olav_Kaasa · ‎01-15-2018

Sorry if I explained it poorly. What i want is to match column1 from table1(700 objects) towards column2 (400 000 objects) from table2. Each row in column1 might have several matches against column2. Column1 only contain parts of the text in column2, like keywords, meaning for example if column1 has "Mountain", column2 might have "Mountainrange", or "Westmountainrange". So each side of the text in column2 can have random values, meaning wildcards I want to match against column1. I cannot perform a join or relate, since they only partly match against each other.

TedKowal · ‎01-12-2018

Reading you request, it sounds like you are trying to match one set1 (700) things against another set2 (400,00) based upon one field.... Have you tried using a SQL query?

Select a.matchingsetfield, a.fieldstodisplay, b.fieldstodisplay  
From set1 as a
Join set2 as b
On a.matchinsetfield = b.matchingsetfield;‍‍‍‍‍‍‍‍‍‍

TedKowal · ‎01-19-2018

So in further reading, what you are trying to do is compare to lists against each other. The python solution discussed is a good one and very flexible. However, an alternative, there are SQL solutions to this as well. Depending on the Database you are using .... I use the following SQL model for comparison..

DECLARE @v1 VARCHAR(MAX) = 'Lime, Tequila, Salt, Sugar',
        @v2 VARCHAR(MAX) = 'Party, Fun, TGIF';

SELECT CASE WHEN EXISTS 
(
  SELECT 1 
    FROM dbo.Split(@v1) AS a
    INNER JOIN dbo.Split(@v2) AS b
    ON a.Item = b.Item
)
THEN 1 ELSE 0 END;‍‍‍‍‍‍‍‍‍‍‍

If your database does not support this type of operation then create two temporary tables fill in the temp tables built from the field lists from both layers then do a simple join.

Again the Python solution is excellent, but I wanted to give an alternate path to the same problem.

JoeBorgione · ‎01-12-2018

Can you use a relate?

That should just about do it....

Per_Olav_Kaasa · ‎01-15-2018

Relate will only give me partial matches, like i explained in my answer above, column2 will have random keywords on both sides. A relate will only match those that have a 100% match towards each other.

XanderBakker · ‎01-15-2018

Once you find the matched, what do you intent to do with it? Will you write something in table1, table2, both or create some new output with it? You can create two dictionaries or lists from the two fields, but since there is only a partial match you will probably need to loop through it. Depending on what you want to do, this will require additional information and define if it is better to compare table1 with table2 or the other way around.

Per_Olav_Kaasa · ‎01-15-2018

I will create a new output of the matches in table1. Those that match will be used as a dictionary for the names in table2 in a map layout. How will i create a dictionary from the two fields? I have tried looking for that solution for a while.