Hello, I'm currently working with one feature class table containing 400 000 objects and one geodatabase table containing 700 objects, in ArcMap 10.4 with a file geodatabase and Python 2.7. What I'm trying to do is match the table containing 700 objects against the FC: I want to treat every one of the 700 values as a wildcard (%value%) and find the FC rows whose text contains it. I have two tables with one column in each that I'm trying to match, and I have not yet found a good way to treat all rows in one column as wildcards to search against another column.
This is my code so far. It only prints the values from the database table; next I wanted to get the FC values and compare them. I've read that a dictionary might make it easier to extract values, but I've yet to figure out how.
import arcpy, unicodedata, datetime, openpyxl

arcpy.env.workspace = r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb"

def rows_as_dicts(cursor):
    colnames = cursor.fields
    for row in cursor:
        yield dict(zip(colnames, row))

with arcpy.da.SearchCursor(r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb\Ordbok", '*') as sc:
    for row in rows_as_dicts(sc):
        print row['field']
Any input will be appreciated.
To create a list from all the values in a field you can do something like this:
import arcpy
# settings
tbl1 = r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb\YourTableName1"
fld1 = "your field name from table 1"
tbl2 = r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb\YourTableName2"
fld2 = "your field name from table 2"
# create lists from tables
lst1 = [r[0] for r in arcpy.da.SearchCursor(tbl1, (fld1))]
lst2 = [r[0] for r in arcpy.da.SearchCursor(tbl2, (fld2))]
# optionally you can use list(set(lst1)) and list(set(lst2)) to get lists with unique values
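For example, a minimal variation of the same two lines that also skips NULL/empty values and returns unique, sorted lists (this assumes empty records should not take part in the comparison):
# unique, sorted lists without None or empty strings
lst1 = sorted({r[0] for r in arcpy.da.SearchCursor(tbl1, (fld1)) if r[0]})
lst2 = sorted({r[0] for r in arcpy.da.SearchCursor(tbl2, (fld2)) if r[0]})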
To compare the lists you can use something like this (it would be easier with access to the actual data, so this example creates random text values in both lists):
def main():
    import random
    # create list of table 1
    lst1 = [chr(i+65) for i in range(10)]
    print("\nContent Table 1:\n{}".format(lst1))
    lst2 = []
    for a in range(100):
        i = random.randrange(0, 10)
        chr1 = lst1[i]
        j = random.randrange(0, 26)
        chr2 = chr(j+65)
        txt = chr1 + chr2
        lst2.append(txt)
    print("\nContent Table 2:\n{}".format(lst2))
    # dict 1
    dct1 = {}
    for value1 in lst1:
        for value2 in lst2:
            if value1.upper() in value2.upper():
                if value1 in dct1:
                    data = dct1[value1]
                    data.append(value2)
                    data = list(set(data))
                    dct1[value1] = data
                else:
                    data = [value2]
                    dct1[value1] = data
    print("\nContent Dict 1:\n{}".format(dct1))
    # dict 2
    dct2 = {}
    for value2 in lst2:
        for value1 in lst1:
            if value1.upper() in value2.upper():
                if value2 in dct2:
                    data = dct2[value2]
                    data.append(value1)
                    data = list(set(data))
                    dct2[value2] = data
                else:
                    data = [value1]
                    dct2[value2] = data
    print("\nContent Dict 2:\n{}".format(dct2))

if __name__ == '__main__':
    main()
This will yield:
Content Table 1:
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
Content Table 2:
['CO', 'CA', 'GK', 'HT', 'HZ', 'IF', 'BJ', 'CD', 'GF', 'FS', 'FP', 'DO', 'BH', 'CW', 'CA', 'BV', 'IQ', 'DN', 'AC', 'IP', 'GY', 'EK', 'JH', 'EC', 'DN', 'CO', 'GM', 'FT', 'IV', 'AX', 'JO', 'HT', 'GK', 'JL', 'CX', 'HQ', 'EF', 'GM', 'CB', 'JB', 'DC', 'CE', 'AM', 'EA', 'CA', 'CA', 'EB', 'FI', 'HM', 'DO', 'FN', 'HI', 'AA', 'CL', 'HR', 'AS', 'HA', 'DI', 'BI', 'IJ', 'FN', 'IJ', 'GC', 'AQ', 'DF', 'HF', 'II', 'EP', 'AW', 'AP', 'FO', 'CY', 'FL', 'AS', 'BT', 'DD', 'BX', 'JH', 'JV', 'JN', 'JL', 'HR', 'IK', 'DU', 'BY', 'AP', 'FJ', 'EN', 'AT', 'IU', 'AL', 'EQ', 'DS', 'JP', 'HU', 'IN', 'IM', 'HV', 'CE', 'II']
Content Dict 1:
{'A': ['AA', 'AC', 'EA', 'CA', 'AM', 'AL', 'AQ', 'AP', 'AS', 'AT', 'AW', 'AX', 'HA'], 'C': ['AC', 'CO', 'CL', 'CB', 'CA', 'DC', 'EC', 'CD', 'CY', 'CX', 'GC', 'CE', 'CW'], 'B': ['CB', 'BH', 'BI', 'BJ', 'EB', 'BT', 'BV', 'BX', 'BY', 'JB'], 'E': ['EN', 'EK', 'EF', 'EA', 'CE', 'EB', 'EC', 'EQ', 'EP'], 'D': ['DN', 'DO', 'DI', 'DF', 'DD', 'DC', 'CD', 'DU', 'DS'], 'G': ['GC', 'GF', 'GM', 'GK', 'GY'], 'F': ['FP', 'FS', 'FT', 'DF', 'HF', 'EF', 'GF', 'FI', 'FJ', 'FO', 'FL', 'FN', 'IF'], 'I': ['DI', 'IQ', 'IP', 'IU', 'BI', 'IV', 'II', 'IK', 'IJ', 'HI', 'IN', 'IM', 'FI', 'IF'], 'H': ['HZ', 'HR', 'HV', 'HQ', 'BH', 'HT', 'HU', 'HI', 'JH', 'HM', 'HA', 'HF'], 'J': ['JO', 'JV', 'JP', 'BJ', 'JL', 'JN', 'IJ', 'JH', 'FJ', 'JB']}
Content Dict 2:
{'HZ': ['H'], 'FP': ['F'], 'FS': ['F'], 'FT': ['F'], 'DO': ['D'], 'HR': ['H'], 'EA': ['A', 'E'], 'AA': ['A'], 'HQ': ['H'], 'BH': ['H', 'B'], 'DC': ['C', 'D'], 'HT': ['H'], 'HU': ['H'], 'JL': ['J'], 'BV': ['B'], 'JO': ['J'], 'JH': ['H', 'J'], 'FL': ['F'], 'HM': ['H'], 'JV': ['J'], 'FI': ['I', 'F'], 'FJ': ['J', 'F'], 'HA': ['A', 'H'], 'HF': ['H', 'F'], 'BY': ['B'], 'JB': ['J', 'B'], 'FO': ['F'], 'HV': ['H'], 'DU': ['D'], 'GK': ['G'], 'DI': ['I', 'D'], 'IJ': ['I', 'J'], 'DS': ['D'], 'JP': ['J'], 'DF': ['D', 'F'], 'FN': ['F'], 'CB': ['C', 'B'], 'IK': ['I'], 'DD': ['D'], 'BX': ['B'], 'EC': ['C', 'E'], 'AC': ['A', 'C'], 'EN': ['E'], 'CO': ['C'], 'EK': ['E'], 'CL': ['C'], 'IQ': ['I'], 'IP': ['I'], 'CA': ['A', 'C'], 'EF': ['E', 'F'], 'AM': ['A'], 'AL': ['A'], 'GY': ['G'], 'CD': ['C', 'D'], 'AQ': ['A'], 'GF': ['G', 'F'], 'AS': ['A'], 'CX': ['C'], 'CE': ['C', 'E'], 'BJ': ['J', 'B'], 'AW': ['A'], 'IN': ['I'], 'AX': ['A'], 'GM': ['G'], 'IV': ['I'], 'CW': ['C'], 'EP': ['E'], 'EB': ['B', 'E'], 'IF': ['I', 'F'], 'II': ['I'], 'DN': ['D'], 'BT': ['B'], 'EQ': ['E'], 'AP': ['A'], 'CY': ['C'], 'HI': ['I', 'H'], 'JN': ['J'], 'IU': ['I'], 'IM': ['I'], 'GC': ['C', 'G'], 'BI': ['I', 'B'], 'AT': ['A']}
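To run the same comparison against your actual tables instead of random values, you could combine this with the cursor snippet above. A rough sketch, assuming tbl1/fld1 hold the 700 dictionary terms and tbl2/fld2 hold the text field of the feature class (paths and field names are the placeholders from the settings above):
import arcpy

tbl1 = r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb\YourTableName1"
fld1 = "your field name from table 1"
tbl2 = r"C:\Users\kaaper\Documents\DataDrivenPages\DataDrivenPages.gdb\YourTableName2"
fld2 = "your field name from table 2"

# read both columns into memory once
terms = [r[0] for r in arcpy.da.SearchCursor(tbl1, (fld1)) if r[0]]
names = [r[0] for r in arcpy.da.SearchCursor(tbl2, (fld2)) if r[0]]

# for every term, collect the names that contain it (the %term% wildcard idea)
matches = {}
for term in terms:
    hits = [name for name in names if term.upper() in name.upper()]
    if hits:
        matches[term] = hits

for term, hits in matches.items():
    print("{}: {}".format(term, hits))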
Thank you for the example and help so far. I'm uploading two screenshots of the tables to give you a better understanding of what I'm trying to compare: Table1 is on the left and Table2 on the right.
Can you share (a portion of) the actual data? At first sight I don't see many matches in the screenshot you provided.
Uploaded the files in reply below.
Since column1, Navn, can contain multiple terms/tags/tokens that you want to match, the SQL will get complicated enough that I don't know whether every data store that ArcGIS supports will handle it. I think working within Python will be the way to go.
Instead of a screenshot, can you export those to a text or CSV file and attach it? As helpful as a screenshot can be, it is much easier to work with actual data than a picture of data.
Whether Python or SQL, you have a lot of conditions you are trying to work with. For example, what do a comma and parentheses mean in column1? Do commas separate multiple terms? If so, do both terms need to be present to match column2? What about parentheses? And what about the one row that is in quotes: is that one term or multiple terms?
Will the new output, let's call it table3, just look like this?
column1 | column2 |
---|---|
Mountain | Mountainrange |
Mountain | Westmountainrange |
pass | Donnors Pass |
pass | Northwest Passage |
pass | Dyatlov Pass |
The new output should show something like that, yes. The text that is separated by commas in column1 consists of variations of a word, which must both be checked against column2 as one object, so I need the code to treat the comma as a delimiter and jump to the next keyword. The parts in parentheses are just short explanations in other languages and are not important for the actual search. Column1 is a dictionary, while column2 is a place-name base for a large area. I'm uploading the files here, but only a portion (50 000) of the 400 000 rows in table2, since it's too big.
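In other words, a rough sketch of how such a column1 value could be normalized before matching: drop anything in parentheses, split on commas, and strip whitespace (the helper name and the regular expression are just illustrative assumptions, not part of the actual script):
import re

def split_terms(value):
    # remove parenthetical explanations, e.g. "bru, brua (bridge)" -> "bru, brua"
    cleaned = re.sub(r"\([^)]*\)", "", value)
    # treat the comma as a delimiter between keywords and drop empty pieces
    return [part.strip() for part in cleaned.split(",") if part.strip()]

print(split_terms("bru, brua (bridge)"))   # ['bru', 'brua']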
I realize part of our conversation is being lost in translation, so please bear with my questions....
Looking at Ordbok.csv, I assume ORDLISTE_–_GLOSSARY is the field/column containing terms for matching. Earlier, you stated that "both [must] be checked against column2, as 1 object." It is that last part, "as 1 object," that is throwing me. I think an example will be best for my question. Looking at Ordbok.csv:
OBJECTID | ORDLISTE_–_GLOSSARY |
---|---|
56 | bru, brua |
57 | bru, brui |
58 | brun, brunen |
Looking at ObjectID 56, is the match "bru OR brua" or "bru AND brua"? I was assuming the first, but your "as 1 object" statement made me question my assumption.
If I look at the data, I kind of feel that the objective might be comparing "SNAVN" from the file Navn.csv against the field "field" from the file Ordbok.csv. That is just a guess, and I don't think we should be guessing...
There will be multiple hits either way, and one should define what to do in that case. There are a lot of question marks to resolve before we can continue to write some code.
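Just to illustrate the difference between the two readings, a small sketch (the sample name is made up; `name` stands for one value from column2):
terms = ["bru", "brua"]
name = "Gamle bru"
# "bru OR brua": at least one term occurs in the name
match_any = any(t.upper() in name.upper() for t in terms)
# "bru AND brua": every term must occur in the name
match_all = all(t.upper() in name.upper() for t in terms)
print("any: {}  all: {}".format(match_any, match_all))   # any: True  all: False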