Best practices for spatial data file and folder management

MattWilkie3 · ‎06-28-2017

The GIS Data Administration topic on the GIS.com wiki is an excellent high-level overview of the various server-level strategies available and the situations in which to apply them. I’m looking for something similar with a focus on personal and small-workgroup best practices for local file and folder management, so one level below server-level GIS data administration.

Our server-level spatial data management is pretty good (albeit with ample room for improvement!). However, we still have gigabytes to terabytes of GIS data and projects scattered throughout various offices in the organization that are effectively useless to a wider audience without the individual who created them present to interpret the file and folder arrangements.

What guidelines can we give people starting new projects that will minimize unnecessary local variation while still being adaptive to local circumstance, and make it easier:

- for new staff to pick up and run with an existing project (because the structure is predictable)
- to distinguish intermediate working and in-progress stuff from the polished and ready-to-go stuff (like milestones and deliverables)
- for data administrators to float the useful data to the top (to the corporately managed geodatabases)
- to share externally

Listed more or less in order of priority. Our organization operates within a 90% ArcGIS Desktop ecosystem, though some other platforms are used here and there as well.

My question is broader than just how to store the geometry files and attribute tables. A typical project also uses and produces MXDs (aprx for Pro), input data, external reference data, interim data, output data, Python scripts, toolboxes, models, output files (pdf, jpeg, …), raster imagery, layer files, Word docs, text docs, … in addition to file-gdb and shapefiles.

In a well managed project, where does all this stuff go and how is it named?

JoshuaSchwartz · ‎06-30-2017

While not as all-encompassing as you're hoping for, here's my default "New Project" folder structure; perhaps it'll be helpful...

The Scratch, Output and SourceData folders each contain an empty .gdb. The Documentation folder contains a blank Workflow.docx that I fill out as I move forward with the analysis. I copy the project initiation email, the milestones, and the acceptance email into the Communications folder.
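
For anyone who wants to stamp out this layout consistently, here is a minimal sketch of a helper that creates the five folders named above. The function name `create_project_skeleton` and the constant `PROJECT_FOLDERS` are hypothetical, and the sketch only makes folders; it does not create the empty geodatabases or the blank Workflow.docx.

```python
from pathlib import Path

# Folder names are taken from the post above; everything else is an assumption.
PROJECT_FOLDERS = ["Scratch", "Output", "SourceData", "Documentation", "Communications"]


def create_project_skeleton(root, project_name):
    """Create the default sub-folders for a new project and return the project path."""
    project_dir = Path(root) / project_name
    for folder in PROJECT_FOLDERS:
        # exist_ok=True makes the helper safe to re-run on an existing project
        (project_dir / folder).mkdir(parents=True, exist_ok=True)
    return project_dir
```

On a machine with ArcGIS installed, the empty geodatabases could presumably be added in the same loop with `arcpy.management.CreateFileGDB`; that step is left out here so the sketch runs with plain Python.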

Hope this helps. And if you have any input I'd love to hear it!

MattWilkie3 · ‎07-05-2017

That's nearly identical to our current best layout! 😉 It is reassuring to see your project folder structure match ours so closely. It helps validate the thinking that got us this far.

The pain points we have with it:

Doesn't address shared data among related projects (e.g. "ProjectBBB\maps\composition.mxd" using a feature class from "ProjectDDD\data\Results.gdb\trails_2015")
New versions/Archiving: It's 2017 and we need to run a new series of maps first created in 2015.
- "2015\ProjectAAA" and "2017\ProjectAAA" - mega data duplication. Definitely easiest to see what is most current though, and is highly portable.
- "ProjectAAA\maps\2015\" and "ProjectAAA\maps\2017\" - Pretty clear for maps, but gets internally complicated for data layers (are you sure you remembered to change source from "base_2015.gdb\trails" to "base_2017.gdb\trails"?), and very difficult to share.

Having MXDs in a subfolder became a real pain after the introduction of the 'Home Folder' concept, wherein you can't just click your way to the parent folder to get at Scratch and Source, but putting them all at the top quickly gets out of hand too.
I could never figure out the best place for SomeCustom.tbx. (I tried "Tools" and "Tools\scripts" subfolders for a while, but that is a navigation pain and leads to much extra typing.)
There are 500+ and growing top-level "ProjectXXX" folders in my work unit alone. It's hard to track what's in what. (We're using a spreadsheet as an index, which was great for a few years, but it's collapsing under its own weight and won't survive merging with other work units.)
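
One possible way to take some load off the hand-maintained spreadsheet is to generate the index from the folders themselves. The sketch below is only an illustration under assumptions: the function names, the CSV columns, and the list of tracked suffixes are all made up for this example, and it records nothing about what a project is for, which the spreadsheet presumably does.

```python
import csv
from datetime import datetime
from pathlib import Path

# Suffixes worth counting; this list is an assumption, adjust to taste.
TRACKED_SUFFIXES = (".mxd", ".aprx", ".gdb", ".tbx")


def summarize_project(project_dir):
    """Count tracked items in one project folder and find its newest modification date."""
    counts = {suffix: 0 for suffix in TRACKED_SUFFIXES}
    newest = project_dir.stat().st_mtime
    for path in project_dir.rglob("*"):
        relative = path.relative_to(project_dir)
        # Skip the internals of file geodatabases; only the .gdb folder itself is counted
        if any(parent.suffix.lower() == ".gdb" for parent in relative.parents):
            continue
        suffix = path.suffix.lower()
        if suffix in counts:
            counts[suffix] += 1
        newest = max(newest, path.stat().st_mtime)
    row = {
        "project": project_dir.name,
        "last_modified": datetime.fromtimestamp(newest).strftime("%Y-%m-%d"),
    }
    row.update(counts)
    return row


def write_index(root, csv_path):
    """Write one CSV row per top-level project folder under root and return the rows."""
    rows = [summarize_project(p) for p in sorted(Path(root).iterdir()) if p.is_dir()]
    fieldnames = ["project", "last_modified", *TRACKED_SUFFIXES]
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Something like `write_index(r"P:\Projects", "project_index.csv")` (a made-up path) could be run on a schedule, so the index reflects what is on disk rather than what someone remembered to type in.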

I like the concept of Map and Layer Packages combined with a check-out and check-in workflow against central storage, but it doesn't work well when there are numerous data layers that shouldn't be wrapped up and that aren't stored in an SDE. (It sure would be nice if one could say "Exclude all layers in folder or drive X".) I've considered spawning one or more pre-corporately-managed GDB instances to address this, but the internally flat nature of a GDB (all feature classes and tables stored at the same level) means we'd need a dozen or more, which makes me leery.

Thank you for your contribution to this exploration.

JoshuaSchwartz · ‎07-05-2017

And thank you for the kind feedback.

DakeHenderson · ‎07-02-2018

Thanks, very helpful. Where would ArcGIS Pro projects go in this file structure?

Charlie_Kaufman · ‎02-01-2023

Here is a similar data structure that I use in addition to the folders that ArcGIS Pro creates with any new project. Glad to see we are all on similar pages. I would love to hear any input as well. If anyone has a file structure that works, I'd like to see more examples.

MattWilkie3 · ‎07-05-2017

This presentation from Software Carpentry on Data Management is excellent even if it is for a different field:

http://v4.software-carpentry.org/data/mgmt.html

It exposes a way of thinking about the problem that I'm finding useful, if not quite yet fruitful (for us).