Function to List Non-GIS Files

06-06-2023 10:56 AM
Status: Closed
AlfredBaldenweck
MVP Regular Contributor

Related: Solved: Re: Print files in folder as ArcCatalog sees them - Esri Community

I would like a function that lists non-GIS files in a directory.

Ideally, it would be roughly the inverse of arcpy.da.Walk, which lists only GIS files "as Catalog sees them" (this isn't quite true: Catalog sees things that Walk can't, and vice versa).

Obviously, the best thing would be a function that lists GIS and non-GIS files at once, but I would be more than happy to settle for one that lists only non-GIS files, since I could combine it with the otherwise very useful arcpy.da.Walk().

As it stands, there isn't a good way to list the contents of a folder that contains both GIS and non-GIS formats. Your options are either to use arcpy.da.Walk() and miss the non-GIS data, or to use os.walk() and suffer through this:

[Image: Results of os.walk()]

 

When we could have something closer to this:

  • CDT_ALL_Trail.mdb
  • example1.lyrx
  • example1.shp
  • Georeferencing guide.docx
  • Refresh.lyrx
  • gdb\ex1
  • gdb\ex2
  • gdb\example4
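For the simple cases, a listing like the one above can be approximated in plain Python. The sketch below is only an illustration under stated assumptions, not an Esri API: `logical_items` and `SHAPEFILE_SIDECARS` are hypothetical names, and the sidecar set is deliberately small and incomplete. It hides a file only when its extension is a shapefile component and a `.shp` with the same stem sits in the same folder; because `os.listdir` does not descend, a `.gdb` folder shows up as a single entry. Listing the items inside a geodatabase, as in the `gdb\...` lines above, still needs arcpy.

```python
import os

# Hypothetical, incomplete set of shapefile component extensions.
SHAPEFILE_SIDECARS = {".shx", ".dbf", ".prj", ".cpg", ".sbn", ".sbx"}


def logical_items(folder):
    """Hypothetical helper: list a folder with shapefile parts collapsed."""
    entries = sorted(os.listdir(folder))
    # Stems of every shapefile here, e.g. "example1" for example1.shp
    shp_stems = {os.path.splitext(e)[0].lower()
                 for e in entries if e.lower().endswith(".shp")}
    items = []
    for entry in entries:
        stem, ext = os.path.splitext(entry)
        is_sidecar = ext.lower() in SHAPEFILE_SIDECARS and stem.lower() in shp_stems
        if is_sidecar and os.path.isfile(os.path.join(folder, entry)):
            continue  # part of a shapefile that is already listed via its .shp
        # os.listdir does not recurse, so a .gdb folder appears once.
        items.append(entry)
    return items
```

Unlike a global extension blacklist, this keeps a standalone `.dbf` that has no matching `.shp`.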

I have figured out a way to filter out GIS files using os.walk(), but it took a long time to work out and required me to:

  1. Run arcpy.ListFiles.

  2. Filter that list against my list of extensions to ignore. [Image: list of ignored extensions]
    Note: There are 96 extensions here and I know for a fact I'm missing several more that I need.

  3. Take the filtered list and combine it with the results of:
    1. arcpy.ListFeatureClasses
    2. arcpy.ListDatasets:
      1. For each dataset, I have to set the environment to that dataset, run arcpy.ListFeatureClasses, append the results to the main list of files, and then reset the environment.
    3. arcpy.ListTables
    4. arcpy.ListRasters
  4. Differentiate between GDBs and non-GDB folders.
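Steps 1 to 3 can be sketched roughly as follows. This is a minimal sketch, assuming it runs in ArcGIS Pro's Python environment; `list_workspace_items` and its `ignore_exts` parameter are hypothetical names, and the caller still has to supply the extension list from step 2. One simplification: arcpy.ListFeatureClasses accepts a `feature_dataset` argument, so the loop over datasets does not need to switch and reset `arcpy.env.workspace`. Step 4 is not handled; the function works on one workspace at a time.

```python
import os


def list_workspace_items(workspace, ignore_exts):
    """Hypothetical helper: combine arcpy listings for one workspace."""
    import arcpy  # imported lazily; needs ArcGIS Pro's Python environment

    arcpy.env.workspace = workspace
    ignore = {ext.lower() for ext in ignore_exts}

    # Steps 1-2: every file, minus extensions known to be GIS components.
    # ("or []" guards against a None return from the List functions.)
    items = [name for name in (arcpy.ListFiles() or [])
             if os.path.splitext(name)[1].lower() not in ignore]

    # Step 3: add the GIS items as arcpy names them.
    items += arcpy.ListFeatureClasses() or []
    for dataset in arcpy.ListDatasets(feature_type="Feature") or []:
        # feature_dataset avoids changing arcpy.env.workspace per dataset
        fcs = arcpy.ListFeatureClasses(feature_dataset=dataset) or []
        items += [os.path.join(dataset, fc) for fc in fcs]
    items += arcpy.ListTables() or []
    items += arcpy.ListRasters() or []

    # A .shp can come back from both ListFiles and ListFeatureClasses.
    return sorted(set(items))
```

On Windows, `os.path.join(dataset, fc)` produces names in the same `dataset\featureclass` form as the desired listing above.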

It is also not as efficient as I'm sure it could be.

Please give us a way to list the contents of a folder containing GIS and non-GIS data that does not require us to wade through all the component files that make up GIS data.

As I said in the second link up above, I really, really never need to see an IXS file.

 

5 Comments
HannesZiegler
Status changed to: Closed

Thank you for submitting your idea to our ideas exchange forum. We appreciate your suggestion and the time you took to provide detailed feedback. Please note that our existing Python tooling is designed to handle most common requirements efficiently. For other use cases, Python already offers os.walk for listing non-GIS files. After careful consideration, we regret to inform you that we will not be implementing this idea at this time.

Thank you again for your contribution and understanding. 

EsriQruqs

Even better (IMO) and far easier would be for Esri to provide a list of known GIS extensions, much like:

>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'

except that you would get a list of strings instead of a single string.

They could tuck this away in a simple module and make it available for download. Or why not just a code snippet somewhere in the docs?

So, in your code you could do:

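import os
gistypes = {'.shp', '.shx', '.dbf', '.prj'}  # hypothetical stand-in for an official list
other_files = []
for name in os.listdir('.'):
    if os.path.splitext(name)[1].lower() not in gistypes:
        other_files.append(name)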
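The same idea extends to a whole directory tree with os.walk. A minimal sketch, assuming a hypothetical `GIS_EXTENSIONS` set (illustrative and incomplete, since the official list proposed here does not exist) and a hypothetical `walk_non_gis` helper: pruning `dirnames` in place stops os.walk from descending into `.gdb` folders, so their internal files, such as `.gdbtable` and `.gdbindexes`, never reach the listing.

```python
import os

# Hypothetical and incomplete; an official list would replace this.
GIS_EXTENSIONS = frozenset({".shp", ".shx", ".dbf", ".prj", ".cpg",
                            ".sbn", ".sbx", ".lyrx", ".ixs", ".mxs"})


def walk_non_gis(top, gis_exts=GIS_EXTENSIONS):
    """Hypothetical helper: yield paths under top that are not GIS files."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Editing dirnames in place keeps os.walk out of .gdb folders.
        dirnames[:] = [d for d in dirnames if not d.lower().endswith(".gdb")]
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in gis_exts:
                yield os.path.join(dirpath, name)
```

The in-place pruning only affects traversal because os.walk runs with its default `topdown=True`.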

 

AlfredBaldenweck

Good idea. I think that's a pretty viable solution, and it's essentially what I was already doing, but with the added bonus of the publisher making that information available instead of users independently trying to compile a list of file extensions based on their own data.

Overall pretty clean and would solve the problem, even if it's not the solution I wanted.

I'll let this sit over the weekend and maybe add a new idea for it.

MErikReedAugusta

I just stumbled upon this today, and I want to push back against @HannesZiegler's response. I think it misreads the intent of the function that @AlfredBaldenweck is describing here.

For the purposes of this discussion, I see four broad groups of filetypes as being relevant:

  1. Primary GIS Files
    • These are the files that appear in Catalog (more or less)
  2. Background/Hidden GIS Files
    • These are the files that are part of what shows in Catalog, but that you only see in File Explorer
    • e.g., example1.dbf, a00000001.gdbindexes
  3. Potentially non-GIS files that appear in Catalog
    • e.g., Excel files
  4. Everything Else

The fundamental problem here is Item 2. There's currently no easy way to exclude these files from the ordinary Python walk functions, and the arcpy Walks don't recognize them, so you can't use those to exclude them either.
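To make the four groups concrete, a classifier might look like the sketch below. The extension sets and the `classify` helper are hypothetical placeholders, not an Esri-maintained list. The only structural rules assumed are that any file inside a `.gdb` folder is a hidden component (group 2), and that a shapefile part counts as hidden only when a `.shp` with the same stem sits next to it.

```python
import os

# Hypothetical placeholder sets; an official list would replace these.
PRIMARY_GIS = {".shp", ".lyrx", ".mdb", ".aprx"}
SHAPEFILE_PARTS = {".shx", ".dbf", ".prj", ".cpg", ".sbn", ".sbx"}
CATALOG_VISIBLE = {".xlsx", ".xls", ".csv", ".txt"}


def classify(path):
    """Hypothetical helper: return group 1-4 (as listed above) for a file path."""
    folders = os.path.normpath(path).split(os.sep)[:-1]
    if any(f.lower().endswith(".gdb") for f in folders):
        return 2  # internal file of a file geodatabase
    stem, ext = os.path.splitext(path)
    ext = ext.lower()
    if ext in PRIMARY_GIS:
        return 1
    if ext in SHAPEFILE_PARTS and os.path.exists(stem + ".shp"):
        return 2  # component of a neighbouring shapefile
    if ext in CATALOG_VISIBLE:
        return 3
    return 4
```

A standalone `.dbf` lands in group 4 here, although Catalog does show it as a table; a fuller version would need that extra rule, which is exactly the kind of knowledge an official list could encode.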

 

I work for a municipal government in the AEC sector, and a given project folder will regularly be filled with a mix of shapefiles, GDBs, and/or non-GIS supporting documents.

Right now, all those "hidden" GIS files clutter the results of a generic Python walk function. Catalog and the arcpy Walks (arcpy.Walk & arcpy.da.Walk) are smart enough to understand that a dataset is a single "file" spread across multiple supporting files. Generic walks (os.walk and pathlib.Path.walk) can't be expected to know that, which forces us end users to manually compile a list of every extension to watch for and filter out of those generic walks. There's just too much potential for human error here.

I was actively combing through a directory that has a mix of georeferenced and non-georeferenced files when I stumbled upon this post, and this gap in the two sets of Walk functions directly impacted my work and added to the general confusion of scanning the directories.

 

 

I think @EsriQruqs's suggestion of an official attribute in arcpy that simply holds a master list of all the extensions in Item 2 above is a reasonable compromise. Either way, I feel something is warranted to address this gap.

HannesZiegler

Thank you for the feedback @MErikReedAugusta, there is a new idea that @AlfredBaldenweck created based on the comments after this idea was closed:

Provide list of known GIS extensions - Esri Community