
GIS project data share - OK to enable Data Deduplication on the disk?

a month ago
EricJohnson-HDR
Occasional Contributor

We have an existing share that is reaching its limit (16TB), and the server folks asked if they can enable data dedupe on that volume. I seem to remember from years back that it was a bad idea/not recommended to do so with GIS data. My Google quest has come up empty as to the answer to that. Anyone have a link or thoughts either way?

I have asked some of the data owners to do some house cleaning, but we all know how folks love to not delete anything.

Thanks.

1 Solution

Accepted Solutions
MarceloMarques
Esri Regular Contributor

It is still a bad idea, especially if you use file geodatabases. See the thread below.

Data Deduplication on the File System - Esri Community

The root of a file geodatabase is a directory, and hardlinks aren't allowed on directories. Hardlinking individual files between separate file geodatabases will definitely cause corruption of one or more of the geodatabases; it is a matter of when, not if.

If you have lots of duplicative geospatial data, it is better to change your workflows and practices to reduce the duplication than to rely on filesystem-level functionality that is completely unaware of how geospatial data is structured on disk.
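As a starting point for that kind of house cleaning, here is a rough sketch (not an Esri tool, just plain Python) that reports file geodatabases on a share that are byte-for-byte copies of each other. It treats each `.gdb` directory as one unit rather than looking at individual files. The function names `gdb_fingerprint` and `find_duplicate_gdbs` are made up for this example, and skipping files ending in `.lock` assumes the usual file geodatabase lock-file naming.

```python
import hashlib
import os
from collections import defaultdict


def gdb_fingerprint(gdb_path):
    """Hash the relative name, size, and bytes of every file in one .gdb directory."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(gdb_path):
        dirs.sort()  # keep traversal order deterministic
        for name in sorted(files):
            if name.endswith(".lock"):
                continue  # assumption: lock files are transient, so ignore them
            full = os.path.join(root, name)
            rel = os.path.relpath(full, gdb_path)
            digest.update(rel.encode("utf-8") + b"\0")
            digest.update(str(os.path.getsize(full)).encode("ascii") + b"\0")
            with open(full, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_gdbs(share_root):
    """Return lists of .gdb paths under share_root that have identical contents."""
    groups = defaultdict(list)
    for root, dirs, _files in os.walk(share_root):
        for d in list(dirs):
            if d.lower().endswith(".gdb"):
                path = os.path.join(root, d)
                groups[gdb_fingerprint(path)].append(path)
                dirs.remove(d)  # treat the geodatabase as a unit, do not descend
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

This only reads and reports; deciding which copy to keep is still up to the data owners. It will also miss geodatabases that hold the same data but differ on disk (for example, one copy was compacted and the other was not), so treat the output as a list of candidates, not a complete inventory.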

Storage isn't free, and it can definitely add up in cost if mindlessly wasted, but in general the price of storage pales in comparison to the cost of collecting or deriving data.

| Marcelo Marques | Esri Principal Product Engineer | Cloud & Database Administrator | OCP - Oracle Database Certified Professional | "In 1992, I embarked on my journey with Esri Technology, and since 1997, I have been working with ArcSDE Geodatabases, right from its initial release. Over the past 32 years, my passion for GIS has only grown stronger." | "I do not fear computers. I fear the lack of them." Isaac Asimov |

