Select to view content in your preferred language

Find duplicate records in SDO_GEOMETRY table usig SQL

1686
3
12-03-2010 10:04 AM
OvidioRivero
Emerging Contributor
Hello All,

I am trying to find duplicate records from on  SDO_geometry table.  I am finding it is easy when I am looking for unique combinations of fields but I am having problems selecting identical geometries.  The closest I have have been able to do it is with a script like the one below:

Select PROVNAME,DBANAME,FRN,TRANSTECH,SPECTRUM,MAXADUP,MAXADDOWN,TYPICUP,TYPICDOWN,STATE_CODE,  SDO_GEOM.SDO_AREA(shape,.005),count(*) COUNT
from BB_SERVICE_WIRELESS_V2A
where FCC_SUBMISSION_CYCLE <>'2010-SPRING'
GROUP BY PROVNAME, DBANAME, FRN, TRANSTECH, SPECTRUM, MAXADUP, MAXADDOWN, TYPICUP, TYPICDOWN, STATE_CODE, SDO_GEOM.SDO_AREA(shape,.005)  having count(1) >1
order by STATE_CODE;

Basically select and group by all columns including the Area from the shape column.   The problem with this approach is that it is very slow to run and I am checking only the area.  It is unlikely, but different geometries can have the exact same area. Can anyone suggest a more efficient way to get all exact duplicates in this table?

Thanks,
0 Kudos
3 Replies
VinceAngelo
Esri Esteemed Contributor
I look for duplicates all the time, but I use a perfect hashing function (aka "digest") to
calculate a checksum (e.g., md5sum) across a binary stream formed by concatenating
the data from a list of columns. The trick is, Oracle doesn't have a perfect hashing
function natively, so you'd need to use Java to achieve the same effect.

Duplicate areas are much more frequent than duplicate geometries, especially
in the realm of sliver polygons.

Using all the vertices can cause false negatives if you don't have a strict rule about
rotating the rings before passing them through the digest algorithm (e.g., the leftmost
vertex with the greatest Y value is the starting point). [Hmm, that sounds like a fun
piece of code to write...]

- V
0 Kudos
OvidioRivero
Emerging Contributor
Hi Vince,

Can you point to any code sample that does what you describe?

Thanks,

Ovidio
0 Kudos
VinceAngelo
Esri Esteemed Contributor
Nothing in Java, or in Oracle.  'se_toolkit' uses a number of different digest providers
from open-source libraries (se_tools/digest.c), and calculates them across a stream as
part of the Digest DAT class (dat/dat_compute.c).  The ring orienter isn't written yet.

- V
0 Kudos