Related: Solved: Re: Print files in folder as ArcCatalog sees them - Esri Community
I would like a function that lists non-GIS files in a directory.
Ideally, it'd be kind of the inverse of arcpy.da.Walk, which only lists GIS files "as Catalog sees them". (This is false: Catalog sees things that Walk can't, and vice versa.)
Obviously, the best thing would be a function that lists GIS and non-GIS files at once, but I would be more than happy to settle for one that just lists non-GIS files, since I could combine it with the otherwise very useful arcpy.da.Walk()
As it stands now, there isn't a good way to list the contents of a folder that contains both GIS and non-GIS formats. Your options are to either use arcpy.da.Walk() and miss out on the non-GIS data, or to use os.walk() and suffer through this:
[Image: results of os.walk()]
When we could have something closer to this:
Using os.walk(), I myself have figured out a way to filter out GIS files, but it took a long time to figure out and required me to:
It is also not as efficient as I'm sure it could be.
Please give us a way to tell the contents of a folder containing GIS and non-GIS data that does not require us to wade through all the component files that make up GIS data.
As I said in the second link up above, I really, really never need to see an IXS file.
Thank you for submitting your idea to our ideas exchange forum. We appreciate your suggestion and the time you took to provide detailed feedback. Please note that our existing Python tooling is designed to handle most common requirements efficiently. For other use cases, Python already offers os.walk for listing non-GIS files. After careful consideration, we regret to inform you that we will not be implementing this idea at this time.
Thank you again for your contribution and understanding.
Even better (IMO) and far easier would be for Esri to provide a list of known GIS extensions, much like:
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
But instead one would receive a list of strings.
This they could tuck away in some simple module and make it available for downloading or something. Or why not just a code snippet somewhere in the docs?
So, in your code you could do:
import os
other_files = []
for FILE in os.listdir('.'):
    # gistypes: the proposed Esri-supplied list of GIS extensions
    if os.path.splitext(FILE)[1] not in gistypes:
        other_files.append(FILE)
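For anyone who wants to try this before an official list exists, here is a rough sketch of the same idea made recursive with os.walk(). The GIS_EXTENSIONS set and the list_non_gis_files name are made up for illustration; the set is a hand-compiled, incomplete stand-in for the gistypes list proposed above, not anything Esri publishes. It also skips folders ending in .gdb, on the assumption that those are file geodatabases.

```python
import os

# Hypothetical, hand-compiled and incomplete stand-in for an official list.
GIS_EXTENSIONS = {".shp", ".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg", ".ixs", ".mxs"}


def list_non_gis_files(root, gis_extensions=GIS_EXTENSIONS):
    """Return paths under root whose extension is not in gis_extensions."""
    other_files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune folders that look like file geodatabases so os.walk
        # never descends into them (assumes *.gdb folders are geodatabases).
        dirnames[:] = [d for d in dirnames if not d.lower().endswith(".gdb")]
        for name in filenames:
            # Compare extensions case-insensitively (.SHP vs .shp).
            ext = os.path.splitext(name)[1].lower()
            if ext not in gis_extensions:
                other_files.append(os.path.join(dirpath, name))
    return other_files
```

Note that a pure extension filter like this also drops standalone .dbf tables, and it lets files such as roads.shp.xml through because their last extension is .xml.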
Good idea. I think that's a pretty viable solution and is actually basically exactly what I was already doing, but with the added bonus of the publisher making that information available instead of the user(s) independently trying to compile a list of file extensions based on their own data.
Overall pretty clean and would solve the problem, even if it's not the solution I wanted.
I'll let this sit over the weekend and maybe add a new idea for it.
I just stumbled upon this today, and I want to push back against @HannesZiegler 's response. I think this misreads the intent of the function that @AlfredBaldenweck is describing here.
For the purposes of this discussion, I see four broad groups of filetypes as being relevant:
The fundamental problem here is Item 2. There's currently no easy way to exclude those files from the ordinary Python walk functions, and they aren't recognized by the arcpy Walks, so you can't even use those to exclude them.
I work for a municipal government in the AEC sector, and a given project file will regularly be filled with a mix of Shapefiles, GDBs, and/or non-GIS supporting documents.
Right now, all those "hidden" GIS files clutter the results of a generic Python walk function. Catalog and the arcpy Walks (arcpy.Walk & arcpy.da.Walk) are smart enough to understand that it's a single "file" spread across multiple supporting files. Generic walks (os.walk and pathlib.Path.walk) can't be expected to know that, which forces us end users to manually compile a list of all the extensions to be aware of and filter them out of those generic walks. There's just too much potential for human error here.
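To make the "single file spread across multiple supporting files" point concrete, here is a minimal sketch for the shapefile case only. It hides a supporting file only when a .shp with the same base name sits in the same folder, so a standalone .dbf table stays visible. The SHAPEFILE_SIDECARS set and the collapse_shapefiles name are my own assumptions for illustration, and the extension list is certainly not complete.

```python
from pathlib import Path

# Assumed, incomplete set of shapefile supporting-file extensions.
SHAPEFILE_SIDECARS = {".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg", ".ixs", ".mxs"}


def collapse_shapefiles(folder):
    """List file names in folder, hiding sidecars of any shapefile present."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    # Base names (lowercased) of every .shp in this folder.
    shp_stems = {p.stem.lower() for p in files if p.suffix.lower() == ".shp"}
    visible = []
    for p in files:
        name = p.name.lower()
        # A sidecar shares its base name with a .shp in the same folder.
        is_sidecar = (
            p.suffix.lower() in SHAPEFILE_SIDECARS and p.stem.lower() in shp_stems
        )
        # Metadata files are named like roads.shp.xml.
        is_shp_xml = (
            name.endswith(".shp.xml") and name[: -len(".shp.xml")] in shp_stems
        )
        if not (is_sidecar or is_shp_xml):
            visible.append(p.name)
    return sorted(visible)
```

This only covers one format in one folder, which is sort of the point: every user ends up maintaining rules like these by hand for each format they run into.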
I was actively combing through a directory that has a mix of georeferenced and non-georeferenced files when I stumbled upon this post, and this gap in the two sets of Walk functions directly impacted my work and added to the general confusion of scanning the directories.
I think @EsriQruqs 's suggestion of some kind of official attribute in arcpy that just holds a master list of all the extensions represented by Item 2 above is a reasonable compromise. But I feel like something is warranted here to address this gap.
Thank you for the feedback @MErikReedAugusta, there is a new idea that @AlfredBaldenweck created based on the comments after this idea was closed:
Provide list of known GIS extensions - Esri Community