Join takes too long?

542
2
01-13-2013 06:22 PM
BKS
by
Occasional Contributor
I've been using ESRI products since ARC/INFO 3. 

I recall being able to perform a join or setup relationships between tables using different techniques.  For example indicating the data in the related column is sorted and thus you could use the ORDERED option.  This naturally sped up the join or relate a great deal.

I'm using the following python code...

arcpy.JoinField_management(pointFC, "joinfld_name", pointattributes, "joinfld_name")

to join a table of 200,000 records to a point FC of exactly the same number of records (i.e. 1:1 match).

This join takes more than 25 minutes.  I suspect it takes so long because it's not performing any intelligent searching on the join column of the "pointAttributes" table.  Instead it's just doing a brute force search.

I've tried creating an attribute index on both columns but this didn't seem to help.

All of the data is in the same fGDB.

I'm running Windows 7 with ARcGIS 10.0 on a relatively new laptop. 

Thanks for any help.

Regards, Brian
0 Kudos
2 Replies
RichardFairhurst
MVP Honored Contributor
I've been using ESRI products since ARC/INFO 3. 

I recall being able to perform a join or setup relationships between tables using different techniques.  For example indicating the data in the related column is sorted and thus you could use the ORDERED option.  This naturally sped up the join or relate a great deal.

I'm using the following python code...

arcpy.JoinField_management(pointFC, "joinfld_name", pointattributes, "joinfld_name")

to join a table of 200,000 records to a point FC of exactly the same number of records (i.e. 1:1 match).

This join takes more than 25 minutes.  I suspect it takes so long because it's not performing any intelligent searching on the join column of the "pointAttributes" table.  Instead it's just doing a brute force search.

I've tried creating an attribute index on both columns but this didn't seem to help.

All of the data is in the same fGDB.

I'm running Windows 7 with ARcGIS 10.0 on a relatively new laptop. 

Thanks for any help.

Regards, Brian


JoinField_Management is permanantly and directly physically altering the input table and writing it to disk.  That will take a long time, indexed or not.  If you are only wanting a join like you would create through the context menus in the Desktop and expecting that kind of performance, then you need to use AddJoin_management, not JoinField_Management.  The AddJoin_Management tool creates an in memory join that is not written to disk unless you copy or export it.  In memory processes are many orders of magnitude faster than any process that writes to disk.  I hope this helps.
0 Kudos
BKS
by
Occasional Contributor
Hello Richard - thanks for response.

I'm aware of the in-memory option.  Based on your input I suspect I'll have to go that route.  Thought perhaps I missed something.

I must say I am disappointed with the lack of performance enhancing options on the JoinField tool that used to exist 2 decades previous in what is now known as Workstation Arc/Info.  This join should take 2-3 mins if I the tool used an intelligent search algorithm (even as simple as a binary search if the related column is sorted).

Yes the disk I/O accounts to re-rewrite the table accounts for some of the additional processing time but my guess would be that the majority of time is spent searching for the matching record if it's not using any sort of intelligent search.

The same un-informed technique will happen in memory I presume (just faster). 

Thanks again.

Cheers, b
0 Kudos