Join Field - Incredibly Slow

10-06-2015 10:43 PM
OwenEarley
Occasional Contributor III

I have a table containing some calculated results and a feature class with 4 text fields that I want to bring into the results table based on a shared ID field (Long). Both contain just under 60,000 rows/features, and both have indexes on the ID field used for the join.

If I do this using the Join Field tool in ArcMap 10.2.2 the process takes over 50 minutes. This is insane for a join on ~60k rows.

As a comparison, I can do the process manually in ArcMap in under 4 minutes by creating the new fields, joining the data, and calculating the fields.

If I use the Add Join tool to join all fields the process takes 11 seconds. However, this brings in a whole heap of fields that I am not interested in.

[Screenshot: join-field-slow.png]

Additional notes: The process is to be run from within a .Net application, so the manual process is not an option. The PC is an 8-core machine, CPU usage sits at around 12%, and there is about 10 GB of available memory.

Why is Join Field so incredibly slow?

Accepted Solution
BruceHarold
Esri Regular Contributor

Hi

Try this tool, which from memory works in both 10.x and Pro:

http://www.arcgis.com/home/item.html?id=da1540fb59d84b7cb02627856f65a98d

Regards

Replies
curtvprice
MVP Esteemed Contributor

Join Field is slow. I do not have an explanation.

Why not run the Add Field tool four times, then Add Join and Calculate Field, all from within .NET?

In Python I have a little function that uses a dictionary to copy the data and it is scary fast by comparison.
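The original function isn't posted in the thread, so the following is only a hypothetical sketch of the dictionary technique in plain Python. Lists of dicts stand in for table rows (an assumption for illustration); in arcpy the join table would typically be read with `arcpy.da.SearchCursor` and the target rows written with `arcpy.da.UpdateCursor`. The function name `join_fields_with_dict` and its parameters are made up.

```python
def join_fields_with_dict(target_rows, join_rows, key_field, join_key_field, fields):
    """Copy `fields` from join_rows into target_rows where the key values match.

    Hypothetical helper: rows are plain dicts standing in for cursor rows.
    """
    # One pass over the join table builds the lookup: key -> values of the wanted fields only.
    lookup = {row[join_key_field]: tuple(row[f] for f in fields) for row in join_rows}
    empty = (None,) * len(fields)  # used when a target row has no match
    # One pass over the target table; each match is a single dict lookup.
    for row in target_rows:
        for field, value in zip(fields, lookup.get(row[key_field], empty)):
            row[field] = value
    return target_rows


results = [{"ID": 1, "SCORE": 0.5}, {"ID": 2, "SCORE": 0.9}, {"ID": 3, "SCORE": 0.1}]
features = [
    {"ID": 2, "NAME": "B", "ZONE": "Z2", "UNUSED": 99},
    {"ID": 1, "NAME": "A", "ZONE": "Z1", "UNUSED": 42},
]
join_fields_with_dict(results, features, "ID", "ID", ["NAME", "ZONE"])
print(results[0])  # {'ID': 1, 'SCORE': 0.5, 'NAME': 'A', 'ZONE': 'Z1'}
```

Only the listed fields are copied, so unwanted join-table fields (like `UNUSED` here) never reach the results table, and the unmatched row with `ID` 3 ends up with `None` values.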

curtvprice
MVP Esteemed Contributor

I should have mentioned: the reason Add Join is nearly instantaneous (11 sec) is that it doesn't copy any data; it just sets up a dynamic link. It can still be slow when you try to copy the data (especially if the join fields aren't indexed), for example if you run Copy Features on the table with the join active.

But not as slow as Join Field!
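As a rough analogy only (this is not how ArcGIS implements Add Join internally), the difference between setting up a link and copying the data can be sketched in plain Python. The class `JoinedView` is hypothetical: constructing it is cheap, and the cost of matching rows is paid only when the rows are actually read.

```python
class JoinedView:
    """Hypothetical analogy for a dynamic join: creating it copies nothing."""

    def __init__(self, target_rows, join_rows, key_field, join_key_field):
        # Only references are stored here, which is why setting up the link is cheap.
        self._target_rows = target_rows
        self._join_rows = join_rows
        self._key_field = key_field
        self._join_key_field = join_key_field
        self.rows_resolved = 0  # how many joined rows have actually been produced

    def __iter__(self):
        # The real work happens only when the rows are read, e.g. by a copy.
        index = {row[self._join_key_field]: row for row in self._join_rows}
        for row in self._target_rows:
            self.rows_resolved += 1
            match = index.get(row[self._key_field], {})
            # Name clashes are resolved in favor of the target row (a simplification for this sketch).
            yield {**match, **row}


results = [{"ID": 1, "SCORE": 0.5}, {"ID": 2, "SCORE": 0.9}, {"ID": 3, "SCORE": 0.1}]
features = [{"ID": 2, "NAME": "B"}, {"ID": 1, "NAME": "A"}]

view = JoinedView(results, features, "ID", "ID")
print(view.rows_resolved)  # 0: the "join" exists but nothing has been copied
copied = list(view)        # analogous to running Copy Features on the joined layer
print(view.rows_resolved)  # 3
```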

DanPatterson_Retired
MVP Emeritus

Owen, I presume you saw the performance tips section of the "Essentials of joining tables" page in the ArcGIS for Desktop Help?

JustinMeyers
New Contributor III

I would set up a model: add fields, set up a join, then calculate. You can have multiple drop-downs for choosing which fields to calculate from, and have your join fields set up too. It's a little work up front, but it will save you time in the end. The Join Field tool has always been slow. If you really want to use it, hide all the other fields, or delete them going into the tool, so you only have the fields you want.

DuncanHornby
MVP Notable Contributor

I've used this tool without issues, but that has been within ArcMap. You make a vague reference to running this tool from within a .Net application; I would imagine the bottleneck is there. Without seeing the underlying code, I don't think anyone can help you here.

OwenEarley
Occasional Contributor III

The .Net code just passes parameters to the geoprocessor object to run the tool. The delay is in the running of the tool itself.

        // Requires: using ESRI.ArcGIS.Geoprocessor; using ESRI.ArcGIS.esriSystem;
        public Geoprocessor gp { get; set; }
        public ITrackCancel CancelTracker { get; set; }

        ...

        // Wraps the Add Join geoprocessing tool: builds the tool object, sets its
        // parameters, and runs it through the shared Geoprocessor instance.
        public void GpAddJoin(string layer, string keyField, string joinTable, string joinField, bool keepAll = false)
        {
            var tool = new ESRI.ArcGIS.DataManagementTools.AddJoin();
            tool.in_layer_or_view = layer;
            tool.in_field = keyField;
            tool.join_table = joinTable;
            tool.join_field = joinField;
            tool.join_type = (keepAll) ? "KEEP_ALL" : "KEEP_COMMON";
            gp.Execute(tool, CancelTracker);
        }
DuncanHornby
MVP Notable Contributor

Hmm... do the fields you are joining on have indexes? That can improve performance significantly.
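Why an index matters can be illustrated with a plain-Python comparison of the two general strategies. This says nothing about how Join Field itself is implemented; the two functions below are hypothetical and only count comparisons. Without an index, each target row scans the join table; with a hash index (here a dict) built once, each row needs a single lookup.

```python
def match_without_index(target_keys, join_keys):
    """For each target key, scan the join keys until a match is found."""
    comparisons = 0
    matches = []
    for key in target_keys:
        found = None
        for position, join_key in enumerate(join_keys):
            comparisons += 1
            if join_key == key:
                found = position
                break
        matches.append(found)
    return matches, comparisons


def match_with_index(target_keys, join_keys):
    """Build a key -> position index once, then do one lookup per target key."""
    index = {join_key: position for position, join_key in enumerate(join_keys)}
    return [index.get(key) for key in target_keys]


target_keys = list(range(1000))
join_keys = list(reversed(target_keys))  # same IDs, stored in the opposite order

scanned, comparisons = match_without_index(target_keys, join_keys)
indexed = match_with_index(target_keys, join_keys)
print(scanned == indexed)  # True: both strategies find the same matches
print(comparisons)         # 500500, i.e. n(n + 1) / 2 for n = 1000
```

With around 60,000 rows, the scan-based count in this worst-case ordering grows to roughly 1.8 billion comparisons, while the indexed version still does one dictionary lookup per row.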

curtvprice
MVP Esteemed Contributor

Bruce, that script tool has a really interesting approach: you add the fields and then populate the values using an update cursor with a generator to limit how much memory gets used. The tool uses arcpy.da, so it should work with 10.1 SP1 and later. You may want to add that tidbit to the tool's description.
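The script tool's own code isn't reproduced in the thread, so this is a hypothetical plain-Python sketch of the general idea only: rows are streamed through generators, and just a small key-to-values lookup is held in memory rather than a full copy of either table. The generator `read_rows` stands in for a cursor and `stream_join` for an update-cursor loop; both names are made up.

```python
def read_rows(rows):
    """Hypothetical stand-in for a cursor: yields one row at a time."""
    for row in rows:
        yield dict(row)


def stream_join(target_rows, join_rows, key_field, join_key_field, fields):
    """Yield each target row with `fields` filled in from the join rows.

    Only the lookup (key -> wanted values) is kept in memory; target rows
    are processed and handed back one at a time, like an update cursor.
    """
    # Built from a generator expression, so no intermediate list of join rows is created.
    lookup = dict(
        (row[join_key_field], tuple(row[f] for f in fields)) for row in join_rows
    )
    empty = (None,) * len(fields)
    for row in target_rows:
        row.update(zip(fields, lookup.get(row[key_field], empty)))
        yield row  # the point where an update cursor would write the row back


results = [{"ID": 1}, {"ID": 2}, {"ID": 3}]
features = [{"ID": 2, "NAME": "B"}, {"ID": 1, "NAME": "A"}]
for row in stream_join(read_rows(results), read_rows(features), "ID", "ID", ["NAME"]):
    print(row)
# {'ID': 1, 'NAME': 'A'}
# {'ID': 2, 'NAME': 'B'}
# {'ID': 3, 'NAME': None}
```

Nothing is matched until the caller pulls rows from `stream_join`, so the target table never has to exist as a fully joined copy in memory.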

Now that pandas will be shipped with 10.4, it will be interesting to see whether some of its lightning-fast table handling can be leveraged to create similar supercharged tools for common operations that take too long!
