Join

1635
19
Jump to solution
02-02-2018 01:49 PM
CCWeedcontrol
Occasional Contributor III

I am doing a basic join from a table to  feature class, it takes about 6 mins.

How can up make this faster?

import arcpy, fnmatch
from arcpy import env
from datetime import datetime as d
startTime = d.now()

env.overwriteOutput = 1
env.workspace = r"C:\Temp"
table = r"C:\Temp\PermitsIssued"  
  
shapefile = "TaxParcels"  

arcpy.MakeFeatureLayer_management(shapefile, "In_memory\parcelLyr")  

#Join table to feature layer  
arcpy.AddJoin_management("In_memory\parcelLyr", "PIN_ID", table, "PINNum", "KEEP_COMMON")

#Export joined layer to new feature clas  
arcpy.FeatureClassToFeatureClass_conversion("In_memory\parcelLyr", r"C:\Temp", "ParcelPermits.shp") 

print 'Done'

try:
    print '(Elapsed time: ' + str(d.now() - startTime)[:-3] + ')'

except Exception, e:
    # If an error occurred, print line number and error message
    import traceback, sys
    tb = sys.exc_info()[2]
    print "Line %i" % tb.tb_lineno
    print e.message
0 Kudos
1 Solution

Accepted Solutions
ChrisSnyder
Regular Contributor III

Use a dictionary and update cursor (which will be much faster than the classic join/calc or join/export)

View solution in original post

0 Kudos
19 Replies
DanPatterson_Retired
MVP Emeritus

The join is going to depend on the file size as a minimum.

What is your software? Map? PRO? 

How much memory do you have?

64 bit enabled if using Map?

Are you monitoring system resources when you do the run?

other applications sucking up memory? (ie browsers etc)

CCWeedcontrol
Occasional Contributor III

ArcMap 10.4.1

32 gb ram

I just have 32 bit - i use to have 64 bit but i was having conflicts.

my system resources are cup 9%, Memory is 14 gb. with basic applications (browser, Microsoft)

When running the join script my cpu and memory don't change much.

0 Kudos
DanPatterson_Retired
MVP Emeritus

So if you are using arcmap, you are limited to 4 Gb, so all your extra ram is wasted basically

If you are working with locally stored data on a modern machine with one of those fancy new drives then your bottleneck may just be the size of the data you are joining and your software.

0 Kudos
JamesMacKay3
Occasional Contributor

Have you built indexes on the fields used in the join?

ChrisSnyder
Regular Contributor III

Use a dictionary and update cursor (which will be much faster than the classic join/calc or join/export)

0 Kudos
CCWeedcontrol
Occasional Contributor III

I did read some post about using dictionary and update cursor but i couldn't figure out how to set it up with my data. The ones i found have multiple fields that were joined.

Do you have a basic example of how to us dictionary and update cursor to do a join.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Start by not using shape files, seriously.  The performance of shape files continues to degrade in newer releases of ArcMap, even ArcGIS Pro.  There are certain geoprocessing operations (most in the Overlay toolset) where involving shape files can add an order of magnitude to the run times.

My suggestion is move data into file geodatabases and make sure you have indexed the fields you are joining on.

ChrisSnyder
Regular Contributor III

Here's a general example of dictionary/cursor method. Could certainly be made more elegant depending on what you are trying to do.... 


import arcpy
mainTbl = r"C:\temp\blah.shp"
joinTbl = r"C:\temp\lookup.dbf"
fieldsToJoinList = ["FIELD1","FIELD2","FIELD3"]
joinField = "COMMON_ID"
arcpy.AddField_management(mainTbl, "FIELD1", "BLAH")
arcpy.AddField_management(mainTbl, "FIELD2", "BLAH")
arcpy.AddField_management(mainTbl, "FIELD3", "BLAH")
joinTblDict = {r[0]:r[1:] for r in arcpy.da.SearchCursor(joinTbl, [joinField]+fieldsToJoinList)}
updateRows = arcpy.da.UpdateCursor(mainTbl, [joinField]+fieldsToJoinList)
for updateRow in updateRows:
    if updateRow[0] in joinTblDict:
        updateRow[1] = joinTblDict[updateRow[0]][1]
        updateRow[2] = joinTblDict[updateRow[0]][2]
        updateRow[3] = joinTblDict[updateRow[0]][3]                                              
    updateRows.updateRow(updateRow)
del updateRow, updateRows

0 Kudos
CCWeedcontrol
Occasional Contributor III

Thank you for sample. I have tried modifying what you posted to my code but after i run it the fields don't get updated. I don't get no error. Any ideas on what i am doing wrong?

import arcpy

mainTbl = "C:/Temp/BP.gdb/TaxParcels"
joinTbl = "C:/Temp/BP.gdb/Permits"
fieldsToJoinList = ["AppSubmitted","PermitIssued","Classification"]

#Addfields
fields = [
    ("AppSubmitted","Date","","","","","NULLABLE","NON_REQUIRED",""),
    ("PermitIssued","Date","","","","","NULLABLE","NON_REQUIRED",""),
    ("PermitNum","Text","30","","","","NULLABLE","NON_REQUIRED",""),
    ("ParcelNum","Text","24","0","","","NULLABLE","NON_REQUIRED",""),
    ("Classification","Text","4000","","","","NULLABLE","NON_REQUIRED","")
]
for field in fields:
    arcpy.AddField_management(*(mainTbl,) + field)
        
joinTblDict = {r[0]:r[1:] for r in arcpy.da.SearchCursor(joinTbl, ["PIN_Num"]+fieldsToJoinList)}
updateRows = arcpy.da.UpdateCursor(mainTbl, ["PIN_ID"] +fieldsToJoinList)
for updateRow in updateRows:
    if updateRow[0] in joinTblDict:
        updateRow[1] = joinTblDict[updateRow[0]][1]
        updateRow[2] = joinTblDict[updateRow[0]][2]
        updateRow[3] = joinTblDict[updateRow[0]][3]                                              
    updateRows.updateRow(updateRow)
del updateRow, updateRows
0 Kudos