Can Python be used to find duplicate datasets?

05-21-2020 12:57 PM
GailMorrison
New Contributor II

Can Python be used to compare datasets? We have two directories stuffed with datasets. I need to sort through them to determine which datasets are stand-alone and which are duplicated in the second directory. Any ideas on how to go about this?


Accepted Solutions
DanPatterson
MVP Esteemed Contributor

As Joe suggests, it could get really difficult if you have multiple data types and aren't familiar with coding. You would need arcpy.da.Walk to list the datasets, plus several of the tools in this toolset to compare them:

An overview of the Data Comparison toolset—Data Management toolbox | Documentation 
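
To make the arcpy.da.Walk step a bit more concrete, here is a minimal sketch that inventories feature classes under each directory and splits the names into "only in the first", "only in the second", and "in both". It assumes "duplicate" means "same dataset name", which the original question does not confirm. The function names list_datasets and split_by_name are made up for illustration, and the paths in the usage comment are placeholders.

```python
import os


def list_datasets(workspace):
    """Return {dataset name: [full paths]} for feature classes under workspace.

    Assumption: only feature classes (shapefiles, geodatabase feature classes)
    matter. Change or drop the datatype filter for rasters, tables, etc.
    """
    import arcpy  # only available inside an ArcGIS Python environment

    found = {}
    for dirpath, dirnames, filenames in arcpy.da.Walk(workspace, datatype="FeatureClass"):
        for name in filenames:
            # The same name can occur in several geodatabases, so keep every path.
            found.setdefault(name, []).append(os.path.join(dirpath, name))
    return found


def split_by_name(names_a, names_b):
    """Return (only in a, only in b, in both) as sorted lists of names."""
    a, b = set(names_a), set(names_b)
    return sorted(a - b), sorted(b - a), sorted(a & b)


# Hypothetical usage (placeholder paths):
# first = list_datasets(r"C:\data\dir1")
# second = list_datasets(r"C:\data\dir2")
# only_first, only_second, in_both = split_by_name(first, second)
```

Matching names only tells you which datasets are candidates. A shapefile shows up as "roads.shp" while a geodatabase feature class shows up as "roads", so you may need to normalize names first. To confirm that a name-matched pair really has the same content, you could then run a tool from the Data Comparison toolset, such as Feature Compare, on each pair.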


... sort of retired...


3 Replies
DanPatterson
MVP Esteemed Contributor

It depends on what you are comparing, but the filecmp module is part of the standard library in all versions of Python:

filecmp — File and Directory Comparisons — Python 3.8.3 documentation 
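
For plain files, a rough sketch of how filecmp could answer the original question is below. It compares only the top level of the two directories, first by file name and then byte by byte. The function name compare_dirs and the dictionary keys are invented for this example.

```python
import filecmp


def compare_dirs(dir_a, dir_b):
    """Classify top-level entries of two directories by name, then by content."""
    cmp = filecmp.dircmp(dir_a, dir_b)
    # shallow=False forces a byte-by-byte comparison instead of trusting
    # file size and modification time.
    match, mismatch, errors = filecmp.cmpfiles(
        dir_a, dir_b, cmp.common_files, shallow=False
    )
    return {
        "only_in_a": sorted(cmp.left_only),
        "only_in_b": sorted(cmp.right_only),
        "same_name_same_content": sorted(match),
        "same_name_different_content": sorted(mismatch),
        "could_not_compare": sorted(errors),
    }


# Hypothetical usage (placeholder paths):
# report = compare_dirs(r"C:\data\dir1", r"C:\data\dir2")
# for label, names in report.items():
#     print(label, names)
```

Keep in mind that this works on files, not GIS datasets. A shapefile is several files (.shp, .shx, .dbf and so on), each reported separately, and a file geodatabase is a folder, so it lands in dircmp's common_dirs or the "only in" lists rather than being compared dataset by dataset.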

moved to Python


... sort of retired...
JoeBorgione
MVP Emeritus

We have 2 directories stuffed with datasets.

What is your definition of dataset in this context? Are these shapefiles? Are they feature classes within geodatabases? When you say duplicates, do you mean datasets with the same name?

That should just about do it....