Join Field - Incredibly Slow

14995
13
Jump to solution
10-06-2015 10:43 PM
OwenEarley
Occasional Contributor III

I have a table containing some calculated results and a feature class with 4 text fields that I want to bring into the results table based on a shared ID field (Long). Both contain just fewer than 60,000 rows/features and both have indexes on the ID field used for the join.

If I do this using the Join Field tool in ArcMap 10.2.2 the process takes over 50 minutes. This is insane for a join on ~60k rows.

As a comparison I can do the process manually in ArcMap in under 4 minutes by creating new fields, joining data and calculate fields.

If I use the Add Join tool to join all fields the process takes 11 seconds. However, this brings in a whole heap of fields that I am not interested in.

join-field-slow.png

Additional notes: The process is to be run from within a .Net application so the manual process is not an option. The PC is an 8 core machine, the CPU usage sits at around 12% and there is about 10GB of available memory.

Why is Join Field so incredibly slow?

13 Replies
DanPatterson_Retired
MVP Emeritus

remembering that Pandas is just Numpy all dressed up

BruceHarold
Esri Regular Contributor

Yes, arcpy.da.ExtendTable is already there and suitable for the join field use case.

My tool has an issue if your key field is unicode, byte order drives the sort used to join whereas the collation order of a language may be different.  I didn't pursue fixing that - it looked hard.  Maybe a temporary SQLite database could be used.

0 Kudos
OwenEarley
Occasional Contributor III

This tool is very impressive, joining the fields in just over 12 seconds.

For my original issue, I ended up changing the way that the data was being processed to avoid the join field geoprocessing tool altogether. However, I am marking Bruce's answer as correct as it is most likely to help others that are searching for a solution to this problem.

MohammadrezaNikfal1
Occasional Contributor

There is a huge difference between 3 tools. To me, I used "AddJoin_management" in python. It is working very well. But the secret is input data. Using file-base table as the main Layer is perfect, however, when we use a Geo-database table, the performance is dropping dramatically! 

fishNetMinimumPointsDistance = os.path.join(arcpy.env.scratchFolder, 'fishNetMinimumPointsDistance.dbf')
fishNetPointMapToLayer = arcpy.mapping.Layer(fishNetPointMap)
arcpy.AddJoin_management(fishNetPointMapToLayer,
"FID", fishNetMinimumPointsDistance, "IN_FID", "KEEP_ALL")

0 Kudos