I have a table containing some calculated results and a feature class with 4 text fields that I want to bring into the results table based on a shared ID field (Long). Both contain just fewer than 60,000 rows/features and both have indexes on the ID field used for the join.
If I do this using the Join Field tool in ArcMap 10.2.2 the process takes over 50 minutes. This is insane for a join on ~60k rows.
As a comparison I can do the process manually in ArcMap in under 4 minutes by creating new fields, joining data and calculate fields.
If I use the Add Join tool to join all fields the process takes 11 seconds. However, this brings in a whole heap of fields that I am not interested in.
Additional notes: The process is to be run from within a .Net application so the manual process is not an option. The PC is an 8 core machine, the CPU usage sits at around 12% and there is about 10GB of available memory.
Why is Join Field so incredibly slow?
Solved! Go to Solution.
remembering that Pandas is just Numpy all dressed up
Yes, arcpy.da.ExtendTable is already there and suitable for the join field use case.
My tool has an issue if your key field is unicode, byte order drives the sort used to join whereas the collation order of a language may be different. I didn't pursue fixing that - it looked hard. Maybe a temporary SQLite database could be used.
This tool is very impressive, joining the fields in just over 12 seconds.
For my original issue, I ended up changing the way that the data was being processed to avoid the join field geoprocessing tool altogether. However, I am marking Bruce's answer as correct as it is most likely to help others that are searching for a solution to this problem.
There is a huge difference between 3 tools. To me, I used "AddJoin_management" in python. It is working very well. But the secret is input data. Using file-base table as the main Layer is perfect, however, when we use a Geo-database table, the performance is dropping dramatically!
fishNetMinimumPointsDistance = os.path.join(arcpy.env.scratchFolder, 'fishNetMinimumPointsDistance.dbf')
fishNetPointMapToLayer = arcpy.mapping.Layer(fishNetPointMap)
arcpy.AddJoin_management(fishNetPointMapToLayer, "FID", fishNetMinimumPointsDistance, "IN_FID", "KEEP_ALL")