Join

CCWeedcontrol · ‎02-02-2018

I am doing a basic join from a table to feature class, it takes about 6 mins.

How can up make this faster?

import arcpy, fnmatch
from arcpy import env
from datetime import datetime as d
startTime = d.now()

env.overwriteOutput = 1
env.workspace = r"C:\Temp"
table = r"C:\Temp\PermitsIssued"  
  
shapefile = "TaxParcels"  

arcpy.MakeFeatureLayer_management(shapefile, "In_memory\parcelLyr")  

#Join table to feature layer  
arcpy.AddJoin_management("In_memory\parcelLyr", "PIN_ID", table, "PINNum", "KEEP_COMMON")

#Export joined layer to new feature clas  
arcpy.FeatureClassToFeatureClass_conversion("In_memory\parcelLyr", r"C:\Temp", "ParcelPermits.shp") 

print 'Done'

try:
    print '(Elapsed time: ' + str(d.now() - startTime)[:-3] + ')'

except Exception, e:
    # If an error occurred, print line number and error message
    import traceback, sys
    tb = sys.exc_info()[2]
    print "Line %i" % tb.tb_lineno
    print e.message‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

ChrisSnyder · ‎02-02-2018

Use a dictionary and update cursor (which will be much faster than the classic join/calc or join/export)

View solution in original post

DanPatterson_Retired · ‎02-02-2018

The join is going to depend on the file size as a minimum.

What is your software? Map? PRO?

How much memory do you have?

64 bit enabled if using Map?

Are you monitoring system resources when you do the run?

other applications sucking up memory? (ie browsers etc)

CCWeedcontrol · ‎02-02-2018

ArcMap 10.4.1

32 gb ram

I just have 32 bit - i use to have 64 bit but i was having conflicts.

my system resources are cup 9%, Memory is 14 gb. with basic applications (browser, Microsoft)

When running the join script my cpu and memory don't change much.

DanPatterson_Retired · ‎02-02-2018

So if you are using arcmap, you are limited to 4 Gb, so all your extra ram is wasted basically

If you are working with locally stored data on a modern machine with one of those fancy new drives then your bottleneck may just be the size of the data you are joining and your software.

JamesMacKay3 · ‎02-02-2018

Have you built indexes on the fields used in the join?

ChrisSnyder · ‎02-02-2018

Use a dictionary and update cursor (which will be much faster than the classic join/calc or join/export)

CCWeedcontrol · ‎02-05-2018

I did read some post about using dictionary and update cursor but i couldn't figure out how to set it up with my data. The ones i found have multiple fields that were joined.

Do you have a basic example of how to us dictionary and update cursor to do a join.

JoshuaBixby · ‎02-03-2018

Start by not using shape files, seriously. The performance of shape files continues to degrade in newer releases of ArcMap, even ArcGIS Pro. There are certain geoprocessing operations (most in the Overlay toolset) where involving shape files can add an order of magnitude to the run times.

My suggestion is move data into file geodatabases and make sure you have indexed the fields you are joining on.

ChrisSnyder · ‎02-05-2018

Here's a general example of dictionary/cursor method. Could certainly be made more elegant depending on what you are trying to do....

import arcpy
mainTbl = r"C:\temp\blah.shp"
joinTbl = r"C:\temp\lookup.dbf"
fieldsToJoinList = ["FIELD1","FIELD2","FIELD3"]
joinField = "COMMON_ID"
arcpy.AddField_management(mainTbl, "FIELD1", "BLAH")
arcpy.AddField_management(mainTbl, "FIELD2", "BLAH")
arcpy.AddField_management(mainTbl, "FIELD3", "BLAH")
joinTblDict = {r[0]:r[1:] for r in arcpy.da.SearchCursor(joinTbl, [joinField]+fieldsToJoinList)}
updateRows = arcpy.da.UpdateCursor(mainTbl, [joinField]+fieldsToJoinList)
for updateRow in updateRows:
    if updateRow[0] in joinTblDict:
        updateRow[1] = joinTblDict[updateRow[0]][1]
        updateRow[2] = joinTblDict[updateRow[0]][2]
        updateRow[3] = joinTblDict[updateRow[0]][3]
    updateRows.updateRow(updateRow)
del updateRow, updateRows

CCWeedcontrol · ‎02-05-2018

Thank you for sample. I have tried modifying what you posted to my code but after i run it the fields don't get updated. I don't get no error. Any ideas on what i am doing wrong?

import arcpy

mainTbl = "C:/Temp/BP.gdb/TaxParcels"
joinTbl = "C:/Temp/BP.gdb/Permits"
fieldsToJoinList = ["AppSubmitted","PermitIssued","Classification"]

#Addfields
fields = [
    ("AppSubmitted","Date","","","","","NULLABLE","NON_REQUIRED",""),
    ("PermitIssued","Date","","","","","NULLABLE","NON_REQUIRED",""),
    ("PermitNum","Text","30","","","","NULLABLE","NON_REQUIRED",""),
    ("ParcelNum","Text","24","0","","","NULLABLE","NON_REQUIRED",""),
    ("Classification","Text","4000","","","","NULLABLE","NON_REQUIRED","")
]
for field in fields:
    arcpy.AddField_management(*(mainTbl,) + field)
        
joinTblDict = {r[0]:r[1:] for r in arcpy.da.SearchCursor(joinTbl, ["PIN_Num"]+fieldsToJoinList)}
updateRows = arcpy.da.UpdateCursor(mainTbl, ["PIN_ID"] +fieldsToJoinList)
for updateRow in updateRows:
    if updateRow[0] in joinTblDict:
        updateRow[1] = joinTblDict[updateRow[0]][1]
        updateRow[2] = joinTblDict[updateRow[0]][2]
        updateRow[3] = joinTblDict[updateRow[0]][3]                                              
    updateRows.updateRow(updateRow)
del updateRow, updateRows
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍