<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Find duplicate records in SDO_GEOMETRY table usig SQL in Data Management Questions</title>
    <link>https://community.esri.com/t5/data-management-questions/find-duplicate-records-in-sdo-geometry-table-usig/m-p/596349#M33629</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;I look for duplicates all the time, but I use a perfect hashing function (aka "digest") to &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;calculate a checksum (e.g., md5sum) across a binary stream formed by concatenating &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;the data from a list of columns. The trick is, Oracle doesn't have a perfect hashing &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;function natively, so you'd need to use Java to achieve the same effect.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Duplicate areas are much more frequent than duplicate geometries, especially&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;in the realm of sliver polygons. &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Using all the vertices can cause false negatives if you don't have a strict rule about&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;rotating the rings before passing them through the digest algorithm (e.g., the leftmost&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;vertex with the greatest Y value is the starting point). [Hmm, that sounds like a fun&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;piece of code to write...]&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;- V&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Fri, 03 Dec 2010 19:29:58 GMT</pubDate>
    <dc:creator>VinceAngelo</dc:creator>
    <dc:date>2010-12-03T19:29:58Z</dc:date>
    <item>
      <title>Find duplicate records in SDO_GEOMETRY table usig SQL</title>
      <link>https://community.esri.com/t5/data-management-questions/find-duplicate-records-in-sdo-geometry-table-usig/m-p/596348#M33628</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hello All, &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;I am trying to find duplicate records from on&amp;nbsp; SDO_geometry table.&amp;nbsp; I am finding it is easy when I am looking for unique combinations of fields but I am having problems selecting identical geometries.&amp;nbsp; The closest I have have been able to do it is with a script like the one below: &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Select PROVNAME,DBANAME,FRN,TRANSTECH,SPECTRUM,MAXADUP,MAXADDOWN,TYPICUP,TYPICDOWN,STATE_CODE,&amp;nbsp; SDO_GEOM.SDO_AREA(shape,.005),count(*) COUNT&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;from BB_SERVICE_WIRELESS_V2A&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;where FCC_SUBMISSION_CYCLE &amp;lt;&amp;gt;'2010-SPRING'&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;GROUP BY PROVNAME, DBANAME, FRN, TRANSTECH, SPECTRUM, MAXADUP, MAXADDOWN, TYPICUP, TYPICDOWN, STATE_CODE, SDO_GEOM.SDO_AREA(shape,.005)&amp;nbsp; having count(1) &amp;gt;1&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;order by STATE_CODE;&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Basically select and group by all columns including the Area from the shape column.&amp;nbsp;&amp;nbsp; The problem with this approach is that it is very slow to run and I am checking only the area.&amp;nbsp; It is unlikely, but different geometries can have the exact same area. Can anyone suggest a more efficient way to get all exact duplicates in this table? &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks,&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 03 Dec 2010 18:04:11 GMT</pubDate>
      <guid>https://community.esri.com/t5/data-management-questions/find-duplicate-records-in-sdo-geometry-table-usig/m-p/596348#M33628</guid>
      <dc:creator>OvidioRivero</dc:creator>
      <dc:date>2010-12-03T18:04:11Z</dc:date>
    </item>
    <item>
      <title>Re: Find duplicate records in SDO_GEOMETRY table usig SQL</title>
      <link>https://community.esri.com/t5/data-management-questions/find-duplicate-records-in-sdo-geometry-table-usig/m-p/596349#M33629</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;I look for duplicates all the time, but I use a perfect hashing function (aka "digest") to &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;calculate a checksum (e.g., md5sum) across a binary stream formed by concatenating &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;the data from a list of columns. The trick is, Oracle doesn't have a perfect hashing &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;function natively, so you'd need to use Java to achieve the same effect.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Duplicate areas are much more frequent than duplicate geometries, especially&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;in the realm of sliver polygons. &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;Using all the vertices can cause false negatives if you don't have a strict rule about&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;rotating the rings before passing them through the digest algorithm (e.g., the leftmost&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;vertex with the greatest Y value is the starting point). [Hmm, that sounds like a fun&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;piece of code to write...]&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;- V&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 03 Dec 2010 19:29:58 GMT</pubDate>
      <guid>https://community.esri.com/t5/data-management-questions/find-duplicate-records-in-sdo-geometry-table-usig/m-p/596349#M33629</guid>
      <dc:creator>VinceAngelo</dc:creator>
      <dc:date>2010-12-03T19:29:58Z</dc:date>
    </item>
    <item>
      <title>Re: Find duplicate records in SDO_GEOMETRY table usig SQL</title>
      <link>https://community.esri.com/t5/data-management-questions/find-duplicate-records-in-sdo-geometry-table-usig/m-p/596350#M33630</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Hi Vince, &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Can you point to any code sample that does what you describe?&lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Thanks, &lt;/SPAN&gt;&lt;BR /&gt;&lt;BR /&gt;&lt;SPAN&gt;Ovidio&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 03 Dec 2010 19:41:01 GMT</pubDate>
      <guid>https://community.esri.com/t5/data-management-questions/find-duplicate-records-in-sdo-geometry-table-usig/m-p/596350#M33630</guid>
      <dc:creator>OvidioRivero</dc:creator>
      <dc:date>2010-12-03T19:41:01Z</dc:date>
    </item>
    <item>
      <title>Re: Find duplicate records in SDO_GEOMETRY table usig SQL</title>
      <link>https://community.esri.com/t5/data-management-questions/find-duplicate-records-in-sdo-geometry-table-usig/m-p/596351#M33631</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;SPAN&gt;Nothing in Java, or in Oracle.&amp;nbsp; '&lt;/SPAN&gt;&lt;A href="ftp://ftp.esri.com/pub/staff/vangelo/se_toolkit/index.html"&gt;se_toolkit&lt;/A&gt;&lt;SPAN&gt;' uses a number of different digest providers&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;from open-source libraries (se_tools/digest.c), and calculates them across a stream as &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;part of the Digest DAT class (dat/dat_compute.c).&amp;nbsp; The ring orienter isn't written yet.&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; &lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt;- V&lt;/SPAN&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 03 Dec 2010 20:00:40 GMT</pubDate>
      <guid>https://community.esri.com/t5/data-management-questions/find-duplicate-records-in-sdo-geometry-table-usig/m-p/596351#M33631</guid>
      <dc:creator>VinceAngelo</dc:creator>
      <dc:date>2010-12-03T20:00:40Z</dc:date>
    </item>
  </channel>
</rss>

