Show distinct tuples regardless of column order

1789
10
Jump to solution
10-31-2018 12:10 PM
JeffThomasILM
Occasional Contributor II

I just added a corresponding request to ArcGIS ideas:

Option to eliminate pairs from Generate Near Table 

But in the meantime, I'm wondering if anyone can help me figure out how to find duplicate pairs of attribute values from separate columns, regardless of column order, by way of SQL and/or ArcPy.

The problem (described in the ArcGIS Idea above) is that the Generate Near Table tool creates pairs of distance values that are identical, as uniquely identified by the pairing of IN_FID and NEAR_FID attributes. That is, the unique distance between FID "a" and FID "b" occurs twice: "a-b" and "b-a" in the IN_FID and NEAR_FID columns, respectively. I only want one value from that pair, otherwise it affects statistics like the count and average of distances. I've found that it is possible to find these pairs with SQL:

mysql - Show distinct tuples regardless of column order - Stack Overflow (I stole this question title for my post, obviously )

But of course, you can't easily break out of the WHERE clause in ArcGIS. Maybe this is possible in ArcPy, but I'm not sure how to adapt that SQL to work in ArcPy. If anyone could help me with that, it would be a boon to doing not just this specific SQL query but how to do any advanced SQL query in ArcPy (or other methods?) that's not otherwise possible within ArcGIS.

Note: the near tables I've created are in a file GDB, but I do have the option of putting them in SDE if that helps. I do not, however, have the option of accessing SDE outside of ArcGIS, e.g., through SQL Server Management Studio. I'd like to learn how to do advanced SQL queries (that is, SQL involving more than just WHERE on a single layer) in ArcGIS regardless of data source. If it's possible at all.

Thank you!

0 Kudos
1 Solution

Accepted Solutions
JoshuaBixby
MVP Esteemed Contributor

If you want to use ArcPy to process the output table from Generate Near Table, the following code should work:

import arcpy

tbl = # path to table from Generate Near Table
uniq_tbl = # path to unique table of Generate Near Table results

tv, = arcpy.MakeTableView_management(tbl, "near_results")
with arcpy.da.SearchCursor(tv, ["OID@", "IN_FID", "NEAR_FID"]) as cur:
    uniq_oids = {frozenset((in_fid, near_fid)):oid for oid, in_fid, near_fid in cur}
tv.setSelectionSet("NEW", uniq_oids.values())
arcpy.CopyRows_management(tv, uniq_tbl)

View solution in original post

10 Replies
JeffThomasILM
Occasional Contributor II

To answer my own question, here's the ArcPy and SQL to eliminate the "a-b" "b-a" pairs:

sde = r'sdeconnectionfile.sde'

sde_conn = arcpy.ArcSDESQLExecute(sde)

sql = """SELECT t1.IN_FID, t1.NEAR_FID, t1.NEAR_DIST
FROM sdeowner.NEARTABLE t1
EXCEPT
SELECT t1.IN_FID, t1.NEAR_FID, t1.NEAR_DIST
FROM sdeowner.NEARTABLE t1
INNER JOIN sdeowner.NEARTABLE t2
ON t1.IN_FID = t2.NEAR_FID
AND t1.NEAR_FID = t2.IN_FID
AND t1.IN_FID > t1.NEAR_FID"""

sde_conn.execute(sql)

Note that NEARTABLE is the same table every time; I'm inner joining it to itself. Also note that this is on SQL Server; other DBMSs may require different SQL.

Now, this results in an ugly list of IN_FIDs, NEAR_FIDs, and NEAR_DISTs in the Python window. All I care about is getting the average of NEAR_DIST, hence why I've gone to all this trouble to eliminate the a-b, b-a pairs which would affect the result. That can be done with this SQL:

SELECT AVG(t3.NEAR_DIST)
FROM 
(SELECT t1.IN_FID, t1.NEAR_FID, t1.NEAR_DIST
FROM sdeowner.NEARTABLE t1
EXCEPT
SELECT t1.IN_FID, t1.NEAR_FID, t1.NEAR_DIST
FROM sdeowner.NEARTABLE t1
INNER JOIN sdeowner.NEARTABLE t2
ON t1.IN_FID = t2.NEAR_FID
AND t1.NEAR_FID = t2.IN_FID
AND t1.IN_FID > t1.NEAR_FID) t3
JoshuaBixby
MVP Esteemed Contributor

If you want to use ArcPy to process the output table from Generate Near Table, the following code should work:

import arcpy

tbl = # path to table from Generate Near Table
uniq_tbl = # path to unique table of Generate Near Table results

tv, = arcpy.MakeTableView_management(tbl, "near_results")
with arcpy.da.SearchCursor(tv, ["OID@", "IN_FID", "NEAR_FID"]) as cur:
    uniq_oids = {frozenset((in_fid, near_fid)):oid for oid, in_fid, near_fid in cur}
tv.setSelectionSet("NEW", uniq_oids.values())
arcpy.CopyRows_management(tv, uniq_tbl)
JeffThomasILM
Occasional Contributor II

Joshua,

Thank you again for helping me. This should be a slick alternative to running a SQL query. I noticed an accidental (?) comma after tv on line 6, which I fixed. But then I got a different error:

Traceback (most recent call last):
  File "<string>", line 4, in <module>  (this is line 9 in your post above)
AttributeError: 'Result' object has no attribute 'setSelectionSet'

My only thought on this is setSelectionSet can only be run on a layer, not on a table view? I don't know why that would be. I'm sorry my Python skills aren't (yet) strong enough to troubleshoot this.

Thanks,

Jeff

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

The comma is intentional and needs to be there.  Another way to write that line is:

tv = arcpy.MakeTableView_management(tbl, "near_results")[0]

or

tv = arcpy.MakeTableView_management(tbl, "near_results").getOutput(0)
JeffThomasILM
Occasional Contributor II

Goes to show how little I know! Thanks for clarifying and even offering equivalent options. That helps me understand what's happening. But... no matter which of the 3 options I use for that line, I get the same error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: 'in_table' is not a table or a featureclass

If I remove the comma, that error goes away but then I get the "setSelectionSet" error mentioned above. 

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

The error you are getting when you remove the comma is because Result—ArcPy classes | ArcGIS Desktop objects don't have a setSelectionSet method.  Most geoprocessing tools return a Result object, and you need to retrieve the ArcPy object from the Result object.  If you remove the comma, all you have is the Result object, not the feature layer or table view itself.

In terms of your runtime error, what exactly does the "in_table" string look like?

0 Kudos
JeffThomasILM
Occasional Contributor II

OK, I just ran it in PyScripter and it worked fine. (D'oh, should've tried that before.) I also just tried it from the Python window in ArcMap 10.6 and it worked fine. My folly was trying to run it from the Python window in Pro. Must be a syntax change in Pro? Python 2.7 vs. 3.0?

In Pro, I've tried several variations for the in_table variable:

tbl = "C:\\Users\\thomasj\\Desktop\\Current\\NearTable1.dbf"

tbl = "C:\\Users\\thomasj\\Desktop\\Current\\Current.gdb\\NearTable1"

tbl = r"C:\Users\thomasj\Desktop\Current\Current.gdb\NearTable1"

tbl = "NearTable1" (dragged and dropped from Contents in Pro)

Always resulting in the same "in_table" error.

You are awesome; thank you for all your help!

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

the setSelectionSet syntax along with a few other ArcPy methods changed their syntax between ArcMap and ArcGIS Pro.  I should have stated my code was for ArcMap.

Please include the traceback form the error message in Pro.

0 Kudos
JeffThomasILM
Occasional Contributor II

I should know better than to assume anything will work in Pro without qualification.  I'm sorry, I'm not sure what to tell you about the traceback from Pro other than:

Traceback (most recent call last):
  File "<string>", line 7, in <module>
RuntimeError: 'in_table' is not a table or a featureclass‍‍‍

Line 7 in the error message is the same line 7 from your original script above.

0 Kudos