Find duplicate records in SDO_GEOMETRY table using SQL

OvidioRivero — Fri, 03 Dec 2010 18:04:11 GMT

Hello All,

I am trying to find duplicate records in an SDO_GEOMETRY table. It is easy when I am looking for unique combinations of fields, but I am having problems selecting identical geometries. The closest I have been able to get is a script like the one below:

SELECT PROVNAME, DBANAME, FRN, TRANSTECH, SPECTRUM, MAXADUP, MAXADDOWN,
       TYPICUP, TYPICDOWN, STATE_CODE,
       SDO_GEOM.SDO_AREA(shape, .005), COUNT(*) COUNT
FROM BB_SERVICE_WIRELESS_V2A
WHERE FCC_SUBMISSION_CYCLE <> '2010-SPRING'
GROUP BY PROVNAME, DBANAME, FRN, TRANSTECH, SPECTRUM, MAXADUP, MAXADDOWN,
         TYPICUP, TYPICDOWN, STATE_CODE, SDO_GEOM.SDO_AREA(shape, .005)
HAVING COUNT(*) > 1
ORDER BY STATE_CODE;

Basically, I select and group by all columns, including the area computed from the shape column. The problem with this approach is that it is very slow to run and it checks only the area. It is unlikely, but different geometries can have exactly the same area. Can anyone suggest a more efficient way to get all exact duplicates in this table?

Thanks,

Re: Find duplicate records in SDO_GEOMETRY table using SQL

VinceAngelo — Fri, 03 Dec 2010 19:29:58 GMT

I look for duplicates all the time, but I use a perfect hashing function (aka "digest") to
calculate a checksum (e.g., md5sum) across a binary stream formed by concatenating
the data from a list of columns. The trick is, Oracle doesn't have a perfect hashing
function natively, so you'd need to use Java to achieve the same effect.
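To make the digest idea concrete, here is a minimal sketch in Python (not the Java or se_toolkit code mentioned in this thread). The function names `row_digest` and `find_duplicates` are hypothetical, and the sketch assumes the rows have already been fetched as tuples of column values. Each field is length-prefixed before hashing so that adjacent columns cannot blur into one another. MD5 collisions are possible in principle, so any match is best confirmed by comparing the actual values.

```python
import hashlib
import struct
from collections import defaultdict


def row_digest(values):
    """Return an MD5 hex digest computed over a sequence of column values."""
    h = hashlib.md5()
    for v in values:
        data = b"" if v is None else str(v).encode("utf-8")
        # A flag byte keeps NULL (None) distinct from the empty string.
        h.update(b"\x00" if v is None else b"\x01")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") differ.
        h.update(struct.pack(">I", len(data)))
        h.update(data)
    return h.hexdigest()


def find_duplicates(rows):
    """Group row indices by digest; return only groups with more than one row."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row_digest(row)].append(i)
    return [ids for ids in groups.values() if len(ids) > 1]
```

Calling `find_duplicates(rows)` returns lists of row positions that share a digest. Geometry columns would need to be serialized to a stable byte form first, which this sketch does not attempt.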

Duplicate areas are much more frequent than duplicate geometries, especially
in the realm of sliver polygons.

Using all the vertices can cause false negatives if you don't have a strict rule about
rotating the rings before passing them through the digest algorithm (e.g., the leftmost
vertex with the greatest Y value is the starting point). [Hmm, that sounds like a fun
piece of code to write...]
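The ring-rotation rule can be sketched roughly as follows, in Python. This is only an illustration of the rule described above, not existing se_toolkit code; `canonical_ring` and `ring_digest` are hypothetical names, and a ring is assumed to be a list of (x, y) tuples whose last vertex may repeat the first. The sketch does not normalize winding direction, so it assumes both rings are stored with the same orientation.

```python
import hashlib
import struct


def canonical_ring(ring):
    """Rotate a ring so it starts at the leftmost vertex with the greatest Y."""
    # Drop the repeated closing vertex, if present.
    closed = len(ring) > 1 and ring[0] == ring[-1]
    pts = list(ring[:-1]) if closed else list(ring)
    if not pts:
        return []
    # Smallest X wins; ties are broken by the largest Y.
    start = min(range(len(pts)), key=lambda i: (pts[i][0], -pts[i][1]))
    rotated = pts[start:] + pts[:start]
    # Re-close the ring so the output always ends where it starts.
    return rotated + [rotated[0]]


def ring_digest(ring):
    """MD5 over the canonical vertex order, packed as big-endian doubles."""
    h = hashlib.md5()
    for x, y in canonical_ring(ring):
        h.update(struct.pack(">dd", x, y))
    return h.hexdigest()
```

With this, two rings that list the same vertices in the same direction but start at different points produce the same digest. Coordinates that differ only by floating-point noise would still hash differently, so some rounding step may be needed in practice.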

- V

Re: Find duplicate records in SDO_GEOMETRY table using SQL

OvidioRivero — Fri, 03 Dec 2010 19:41:01 GMT

Hi Vince,

Can you point to any code sample that does what you describe?

Thanks,

Ovidio

Re: Find duplicate records in SDO_GEOMETRY table using SQL

VinceAngelo — Fri, 03 Dec 2010 20:00:40 GMT

Nothing in Java, or in Oracle. 'se_toolkit' uses a number of different digest providers
from open-source libraries (se_tools/digest.c), and calculates them across a stream as
part of the Digest DAT class (dat/dat_compute.c). The ring orienter isn't written yet.

- V