Select to view content in your preferred language

Compare two tables field values using wildcard

9211
42
Jump to solution
01-12-2018 01:28 AM
Per_Olav_Kaasa
Emerging Contributor

Hello, I'm currently working with one feature class table containing 400 000 objects and one database table containing 700 objects.  I'm currently working in ArcMap 10.4 and using a Geodatabase with Python 2.7. What I'm trying to do is match the database table containing 700 objects against the FC. What I want is to match them, treating all 700 objects as wildcars %700 ojects%, and match those towards the FC that contains text from the table.  I have 2 tables with 1 column in each that im trying to match. I have not yet found a good way to treat all rows in a column as wildcard to search against another column. 

This is my code so far. This only will print out what the database table values. Next i wanted to get the FC values and compare them. I've read about maybe using a dictionary to make it easier to extract values but I've yet to figure out how.

import arcpy, unicodedata, datetime, openpyxl 
arcpy.env.workspace =r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb"

def rows_as_dicts(cursor):
    colnames = cursor.fields
    for row in cursor:
        yield dict(zip(colnames, row)

with arcpy.da.SearchCursor(r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb\Ordbok", '*') as sc:
    for row in rows_as_dicts(sc):
        print row['field']‍‍‍‍‍‍‍‍‍‍‍

Any input will be appreciated.

0 Kudos
42 Replies
XanderBakker
Esri Esteemed Contributor

To create a list from all the values in a field you can do something like this:

import arcpy

# settings
tbl1 = r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb\YourTableName1"
fld1= "your field name from table 1"
tbl2 = r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb\YourTableName2"
fld1= "your field name from table 2"

# create lists from tables
lst1 = [r[0] for r in arcpy.da.SearchCursor(tbl1, (fld1))]
lst2 = [r[0] for r in arcpy.da.SearchCursor(tbl2, (fld2))]
‍‍‍‍‍‍‍‍‍‍‍
# optionally you can use list(set(lst1)) and list(set(lst2)) to get lists with unique values

To compare the lists you can use something like this (it would be easier to have access to the data, this example creates random text values in both lists):

def main():
    import random

    # create list of table 1
    lst1 = [chr(i+65) for i in range(10)]
    print("\nContent Table 1:\n{}".format(lst1))

    lst2 = []
    for a in range(100):
        i = random.randrange(0, 10)
        chr1 = lst1[i]
        j = random.randrange(0, 26)
        chr2 = chr(j+65)
        txt = chr1 + chr2
        lst2.append(txt)
    print("\nContent Table 2:\n{}".format(lst2))

    # dict 1
    dct1 = {}
    for value1 in lst1:
        for value2 in lst2:
            if value1.upper() in value2.upper():
                if value1 in dct1:
                    data = dct1[value1]
                    data.append(value2)
                    data = list(set(data))
                    dct1[value1] = data
                else:
                    data = [value2]
                    dct1[value1] = data
    print("\nContent Dict 1:\n{}".format(dct1))

    # dict 2
    dct2 = {}
    for value2 in lst2:
        for value1 in lst1:
            if value1.upper() in value2.upper():
                if value2 in dct2:
                    data = dct2[value2]
                    data.append(value1)
                    data = list(set(data))
                    dct2[value2] = data
                else:
                    data = [value1]
                    dct2[value2] = data
    print("\nContent Dict 2:\n{}".format(dct2))

if __name__ == '__main__':
    main()

This will yield:

Content Table 1:
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

Content Table 2:
['CO', 'CA', 'GK', 'HT', 'HZ', 'IF', 'BJ', 'CD', 'GF', 'FS', 'FP', 'DO', 'BH', 'CW', 'CA', 'BV', 'IQ', 'DN', 'AC', 'IP', 'GY', 'EK', 'JH', 'EC', 'DN', 'CO', 'GM', 'FT', 'IV', 'AX', 'JO', 'HT', 'GK', 'JL', 'CX', 'HQ', 'EF', 'GM', 'CB', 'JB', 'DC', 'CE', 'AM', 'EA', 'CA', 'CA', 'EB', 'FI', 'HM', 'DO', 'FN', 'HI', 'AA', 'CL', 'HR', 'AS', 'HA', 'DI', 'BI', 'IJ', 'FN', 'IJ', 'GC', 'AQ', 'DF', 'HF', 'II', 'EP', 'AW', 'AP', 'FO', 'CY', 'FL', 'AS', 'BT', 'DD', 'BX', 'JH', 'JV', 'JN', 'JL', 'HR', 'IK', 'DU', 'BY', 'AP', 'FJ', 'EN', 'AT', 'IU', 'AL', 'EQ', 'DS', 'JP', 'HU', 'IN', 'IM', 'HV', 'CE', 'II']

Content Dict 1:
{'A': ['AA', 'AC', 'EA', 'CA', 'AM', 'AL', 'AQ', 'AP', 'AS', 'AT', 'AW', 'AX', 'HA'], 'C': ['AC', 'CO', 'CL', 'CB', 'CA', 'DC', 'EC', 'CD', 'CY', 'CX', 'GC', 'CE', 'CW'], 'B': ['CB', 'BH', 'BI', 'BJ', 'EB', 'BT', 'BV', 'BX', 'BY', 'JB'], 'E': ['EN', 'EK', 'EF', 'EA', 'CE', 'EB', 'EC', 'EQ', 'EP'], 'D': ['DN', 'DO', 'DI', 'DF', 'DD', 'DC', 'CD', 'DU', 'DS'], 'G': ['GC', 'GF', 'GM', 'GK', 'GY'], 'F': ['FP', 'FS', 'FT', 'DF', 'HF', 'EF', 'GF', 'FI', 'FJ', 'FO', 'FL', 'FN', 'IF'], 'I': ['DI', 'IQ', 'IP', 'IU', 'BI', 'IV', 'II', 'IK', 'IJ', 'HI', 'IN', 'IM', 'FI', 'IF'], 'H': ['HZ', 'HR', 'HV', 'HQ', 'BH', 'HT', 'HU', 'HI', 'JH', 'HM', 'HA', 'HF'], 'J': ['JO', 'JV', 'JP', 'BJ', 'JL', 'JN', 'IJ', 'JH', 'FJ', 'JB']}

Content Dict 2:
{'HZ': ['H'], 'FP': ['F'], 'FS': ['F'], 'FT': ['F'], 'DO': ['D'], 'HR': ['H'], 'EA': ['A', 'E'], 'AA': ['A'], 'HQ': ['H'], 'BH': ['H', 'B'], 'DC': ['C', 'D'], 'HT': ['H'], 'HU': ['H'], 'JL': ['J'], 'BV': ['B'], 'JO': ['J'], 'JH': ['H', 'J'], 'FL': ['F'], 'HM': ['H'], 'JV': ['J'], 'FI': ['I', 'F'], 'FJ': ['J', 'F'], 'HA': ['A', 'H'], 'HF': ['H', 'F'], 'BY': ['B'], 'JB': ['J', 'B'], 'FO': ['F'], 'HV': ['H'], 'DU': ['D'], 'GK': ['G'], 'DI': ['I', 'D'], 'IJ': ['I', 'J'], 'DS': ['D'], 'JP': ['J'], 'DF': ['D', 'F'], 'FN': ['F'], 'CB': ['C', 'B'], 'IK': ['I'], 'DD': ['D'], 'BX': ['B'], 'EC': ['C', 'E'], 'AC': ['A', 'C'], 'EN': ['E'], 'CO': ['C'], 'EK': ['E'], 'CL': ['C'], 'IQ': ['I'], 'IP': ['I'], 'CA': ['A', 'C'], 'EF': ['E', 'F'], 'AM': ['A'], 'AL': ['A'], 'GY': ['G'], 'CD': ['C', 'D'], 'AQ': ['A'], 'GF': ['G', 'F'], 'AS': ['A'], 'CX': ['C'], 'CE': ['C', 'E'], 'BJ': ['J', 'B'], 'AW': ['A'], 'IN': ['I'], 'AX': ['A'], 'GM': ['G'], 'IV': ['I'], 'CW': ['C'], 'EP': ['E'], 'EB': ['B', 'E'], 'IF': ['I', 'F'], 'II': ['I'], 'DN': ['D'], 'BT': ['B'], 'EQ': ['E'], 'AP': ['A'], 'CY': ['C'], 'HI': ['I', 'H'], 'JN': ['J'], 'IU': ['I'], 'IM': ['I'], 'GC': ['C', 'G'], 'BI': ['I', 'B'], 'AT': ['A']}
0 Kudos
Per_Olav_Kaasa
Emerging Contributor

Thank you for the example and help so far, I'm uploading 2 screenshots from the tables to give you a better understanding of what I'm trying to compare. Table1 is to the left and Table2 to the right. 

Table1Table2

0 Kudos
XanderBakker
Esri Esteemed Contributor

Can you share (a portion of) the actual data. At first sight I don't see many matches in the screenshot you provided.

Per_Olav_Kaasa
Emerging Contributor

Uploaded the files in reply below.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Since column1, Navn, can contain multiple terms/tags/tokens that you want to match; the SQL will get complicated enough that I don't know if every data store that ArcGIS supports will work.  I think working within Python will be the way to go.

Instead of a screenshot, can you export those to a text or CSV file and attach it?  As helpful as a screenshot can be, it is much easier to work with actual data than a picture of data.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Whether Python or SQL, you have a lot of conditions you are trying to work with.  For example, what does a comma and parentheses mean in column1?  Do commas separate multiple terms?  If so, do both terms need to be present to match column2?  What about parentheses?  What about the one row that is in quotes, is that one term or multiple terms?

JoshuaBixby
MVP Esteemed Contributor

Will the new output, let's call it table3, just look like?

column1column2
MountainMountainrange
MountainWestmountainrange
passDonnors Pass
passNorthwest Passage
passDyatlov Pass
Per_Olav_Kaasa
Emerging Contributor

The new output should show something like that yes, the text that is separated by comma in column1 is a diffraction of a word, that must both be checked against column2, as 1 object. So i need to have the code to treat comma as a delimiter to jump to next keyword.  The ones in parenthesis is just a short explanation in other languages, not important for the actual search. Column1 is a dictionary, while column2 is a placename base for a large area. I'm uploading the files here, but only a portion(50 000) of the 400 000 in table2 since its too big.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

I realize part of our conversation is being lost in translation, so please bear with my questions....

Looking at Ordbok.csv, I assume ORDLISTE_–_GLOSSARY is the field/column containing terms for matching.  Earlier, you stated that "both [must] be checked against column2, as 1 object."  It is that last part, "as 1 object," that is throwing me.  I think an example will be best for my question.  Looking at Ordbok.csv:

OBJECTIDORDLISTE_–_GLOSSARY
56bru, brua
57bru, brui
58brun, brunen

Looking at ObjectID 56, is the match "bru OR brua" or "bru AND brua".  I was assuming the first, but your "as 1 object" statement made me question my assumption.

XanderBakker
Esri Esteemed Contributor

If I look at the data I kinda feel that the objective might be comparing "SNAVN" from the file Navn.csv against the field "field" from the file Ordbok.csv. That is just a guess and I don't we should be guessing...

There will be multiple hits either way and one should define what to do in that case. Lot of question marks before we can continue to write some code.

0 Kudos