Checking for Duplicate strings

10-23-2013 12:55 PM
MikeHenson
Occasional Contributor
I am working on a sewer project and assigning new names to about 1500 manholes. Is there a way to check my data for duplicates among the names I have assigned? If so, can you point me in the right direction?
Thanks
10 Replies
RichardFairhurst
MVP Honored Contributor
Run Summary Statistics with the name field as the case field and look for any FREQUENCY value greater than 1.  Then relate or join the output table back to the source to select the duplicated records.
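For anyone who wants to see the idea outside the tool, here is a minimal plain-Python sketch of the same logic: count each name, keep the ones with a frequency greater than 1, and build a query to select them back in the source. It is only a stand-in for Summary Statistics, not the tool itself. The sample names, the field name NAME, and both helper functions are made up for illustration, and the query format may need field delimiters that suit your data source.

```python
from collections import Counter


def find_duplicates(names):
    """Return {name: count} for every name that occurs more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def build_where_clause(field, duplicate_names):
    """Build an IN (...) query for the duplicated names, or None if there are none."""
    if not duplicate_names:
        return None
    # Double any single quote so the value stays a valid SQL string literal.
    quoted = ", ".join(
        "'{}'".format(str(name).replace("'", "''"))
        for name in sorted(duplicate_names)
    )
    return "{} IN ({})".format(field, quoted)


# Hypothetical manhole names; in practice these would be read from the table.
names = ["MH-001", "MH-002", "MH-001", "MH-003", "MH-002", "MH-001"]

duplicates = find_duplicates(names)
where_clause = build_where_clause("NAME", duplicates)

print(duplicates)
print(where_clause)
```

The resulting query string could then be used in Select By Attributes to highlight the manholes that share a name.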
0 Kudos
TedCronin
MVP Honored Contributor
0 Kudos
TimWitt
Frequent Contributor
While we are at it here is another way to identify your duplicates within your attribute table.

http://support.esri.com/en/knowledgebase/techarticles/detail/38700
0 Kudos
TedCronin
MVP Honored Contributor
Another cool sample, which returns unique values rather than a summary, comes from the following help doc.  It is nice because you don't have to output a table; you can see the unique list on screen.  In the end, Frequency is better and more related to your question, but it does produce an output file.  Sometimes a unique list handles what you need, plus this is just a cool sample.

http://resources.arcgis.com/en/help/main/10.2/index.html#//018w00000011000000



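import arcpy

fc = "c:/data/base.gdb/well"
field = "Diameter"

# Use SearchCursor with a list comprehension to collect every value in the field
values = [row[0] for row in arcpy.da.SearchCursor(fc, field)]

# Reduce the list to its unique values, sorted for display
uniqueValues = sorted(set(values))

print(uniqueValues)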
0 Kudos
RichardFairhurst
MVP Honored Contributor
I did not recommend Frequency because it requires an Advanced license, while Summary Statistics is available with a Basic license, and in my opinion Summary Statistics can do everything Frequency can do and more.  If the Summary Statistics or Frequency tool is used in ModelBuilder or a Python script, and you can script the response to duplicate values without user interaction, you do not have to output a file: you can write the result to an in_memory table instead.
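A rough sketch of that in_memory approach follows, under some assumptions: it needs an ArcGIS Desktop install for arcpy, the output name in_memory/name_frequency and both function names are hypothetical, and COUNT is just one convenient statistic to satisfy the tool's requirement for a summary field. The filtering step is kept in a separate plain-Python function so it can be tried without ArcGIS.

```python
def duplicates_from_rows(rows):
    """Keep only the (value, frequency) pairs whose frequency exceeds 1."""
    return {value: freq for value, freq in rows if freq > 1}


def find_duplicates_in_memory(table, field):
    """Summarize `field` into an in_memory table and return its duplicated values.

    arcpy is imported here, not at the top, so the rest of the file
    can be loaded on a machine without ArcGIS.
    """
    import arcpy

    out_table = "in_memory/name_frequency"  # hypothetical scratch table name

    # Using the field as the case field yields one row per unique value,
    # with its occurrence count in the FREQUENCY field.
    arcpy.Statistics_analysis(table, out_table, [[field, "COUNT"]], field)
    try:
        with arcpy.da.SearchCursor(out_table, [field, "FREQUENCY"]) as cursor:
            return duplicates_from_rows(cursor)
    finally:
        # Remove the scratch table so nothing is left behind in memory.
        arcpy.Delete_management(out_table)


# The filtering step on its own, using made-up summary rows:
sample_rows = [("MH-001", 3), ("MH-002", 1), ("MH-003", 2)]
print(duplicates_from_rows(sample_rows))
```

Calling find_duplicates_in_memory with a feature class path and the name field would return a dictionary that a script can act on directly, with no output file left on disk.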
0 Kudos
TedCronin
MVP Honored Contributor
Really?  I was unaware that there was a license restriction, Rich.

I see Summarize as more oriented towards ArcView users, while Frequency is more oriented towards ArcInfo users.  I see them as solving different scenarios.

I see these responses as providing alternative ways to do something, just as I got to learn something from Tim's Support example.

Good point about in_memory, though; I don't use in_memory much.  I just love the list comprehension example, a very elegant approach.
0 Kudos
RichardFairhurst
MVP Honored Contributor
You should reexamine the abilities of the Summary Statistics tool before you dismiss it as only oriented to the needs of mere ArcView/Basic users.

1.  Frequency requires you to select at least one case field, and although it allows you to select multiple case fields, you are limited to selecting them in the order in which they occur within the data source.  This means that when you select multiple fields, the sort order of the combined unique case field values is restricted to the field arrangement of the original source.

Summary Statistics allows you to select no case field, one case field, or multiple case fields.  When you use multiple case fields, they can be arranged in any order that you like.  Therefore the sort order of the combined unique case field values can be in any arrangement that you like and is free from any restrictions of the field order in the source data.

2.  Frequency does not require you to include a summary value, but if you do you can only output the Sum of numeric fields.  The summed fields must be arranged in the order that they appear in the source data.

With Summary Statistics you are required to create at least one summary field, but it can produce any form of summary value you want (Sum, Min, Max, Mean, Range, Standard Deviation, Count, First and Last).  It can summarize numeric fields and text fields (Min, Max, Count, First and Last being the most useful text summaries).  The summary fields can be output in any order that you like, and you can get multiple different types of summary values for any given source field.

Although date fields are filtered out of the Summary Fields drop-down, you can still type them into the Summary Fields text box and press Enter to add them to the summary field list.  After they are added to the list, you can choose any of the summary options that are available for numbers.  The output of date summaries will be the numeric representation of the date (Min, Max, First, Last) or calculations based on the date (all other summaries).  If you use the Field Calculator to transfer a Min, Max, First or Last date number into an actual date field, the date number will be transformed back to its native date appearance.

Given all of the abilities of the Summary Statistics tool, I see no reason to ever use the Frequency tool again.
0 Kudos
TedCronin
MVP Honored Contributor
Thanks Rich for your extreme intelligence on the matter, like I said they serve different purposes.  It seems that Riverside County Trans employees have lots of free time on their hands.  Thanks for your service.
0 Kudos
RichardFairhurst
MVP Honored Contributor
Well, this tool happens to be the workhorse of all of my scripts, so getting to know it well has proven very beneficial for creating scripts that give me the free time to help others.  😉
0 Kudos