Print files in folder as ArcCatalog sees them

AlfredBaldenweck · 01-12-2023

Forgive me if I've asked this before.

Does anyone have a script that returns all the files in a folder as Catalog sees them?

I can use arcpy.ListFiles(), but I don't want to see all the components that make up a shapefile, I just want to see the shapefile.

Similarly, I could use arcpy.ListFeatureClasses() or ...Tables, etc., but that would ignore non-GIS files.

For example, how could I cleanly return the contents of this folder, more or less as I see it here?

Compare to file explorer:

Thanks!

DavidSolari · 01-12-2023

You can try something like this:

import arcpy
from os import path

def getFiles():
    # Assumes arcpy.env.workspace is already set to the target folder.
    def fname(x): return path.splitext(path.basename(x))[0]
    gisFiles = arcpy.ListFeatureClasses() + arcpy.ListTables() + arcpy.ListRasters()  # And so on for other dataset types.
    gisBaseNames = {fname(g) for g in gisFiles}
    # Drop any loose file whose base name matches a GIS dataset.
    otherFiles = [f for f in arcpy.ListFiles() if fname(f) not in gisBaseNames]
    return sorted(gisFiles + otherFiles)

The catch is that if you have, say, "mydata.shp" and "mydata.xlsx" in the same folder, you'll lose the xlsx file. The fix for that is to build up a list of known special GIS extensions and split those files out, but that's a bit harder to write up, so this should be a starting point at least.
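
One way the extension-based fix described above might look, using only the standard library. This is a rough sketch, not a tested replacement: SHAPEFILE_PARTS and catalog_view are hypothetical names, the extension list is illustrative rather than complete, and only shapefiles are handled (other multi-file formats would need their own rules).

```python
import os

# Assumed, non-exhaustive set of shapefile sidecar extensions.
SHAPEFILE_PARTS = {".shx", ".dbf", ".prj", ".sbn", ".sbx", ".cpg", ".shp.xml"}


def catalog_view(filenames):
    """Hide shapefile sidecar files; keep every other file name."""
    # Lowercased base names that have a .shp file present.
    shp_bases = {
        os.path.splitext(f)[0].lower()
        for f in filenames
        if f.lower().endswith(".shp")
    }
    kept = []
    for f in filenames:
        lower = f.lower()
        # Match on the end of the name so compound suffixes such as
        # ".shp.xml" are caught, and only hide a part when the matching
        # .shp exists.
        is_sidecar = any(
            lower.endswith(ext) and lower[: -len(ext)] in shp_bases
            for ext in SHAPEFILE_PARTS
        )
        if not is_sidecar:
            kept.append(f)
    return sorted(kept)
```

Passing it os.listdir() of a folder should keep both "mydata.shp" and "mydata.xlsx", and a .dbf with no matching .shp stays visible as a standalone table.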

Anonymous User · 01-13-2023

Just add the types you want to see to the files list. This will get files that are named the same but with different extensions as well.

import os

files = ['.shp', '.txt', '.xlsx', '.csv', ...]
filteredFiles = [f for f in os.listdir(r'your path') if os.path.splitext(f)[1] in files]

If you want gdbs as well, you can loop with for root, dirs, files in os.walk(path) and keep the directories that contain .gdb in the name.
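
A minimal sketch of that os.walk() idea, assuming file geodatabases are simply folders whose names end in .gdb. The function name find_geodatabases is made up for illustration.

```python
import os


def find_geodatabases(root):
    """Return paths of folders under root whose names end in .gdb."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        gdbs = [d for d in dirnames if d.lower().endswith(".gdb")]
        found.extend(os.path.join(dirpath, d) for d in gdbs)
        # Prune in place so os.walk does not descend into the geodatabases.
        dirnames[:] = [d for d in dirnames if d not in gdbs]
    return sorted(found)
```

Pruning dirnames keeps the internal files of each geodatabase out of the walk, which is closer to how Catalog shows a .gdb as a single item.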

AlfredBaldenweck · 01-13-2023

I'm going for a variation on this method, but I'm running into a weird issue. It's ignoring some of the shapefile parts when I'm telling it to remove them.

See here:

for r in rootL:
    arcpy.env.workspace = r
    ignoreExt = ['.shp', '.shx', '.dbf', '.prj', '.xml', '.sbn', '.sbx', '.cpg', '.aux']

    TList= ['test.shp', 'test.shx', 'test.dbf', 'test.prj', 'test.xml', 'test.sbn', 'test.sbx', 'test.cpg', 'test.aux' ]
    
    for f in TList:
        #print(os.path.splitext(f)[1])
        if os.path.splitext(f)[1] in ignoreExt:
            #print(f)
            TList.remove(f)
            
    print(TList)
    # ['test.shx', 'test.prj', 'test.sbn', 'test.cpg']

Why are these files still in the list?

Edit: Apparently splitext() is ignoring those files. What's weirder is that it doesn't ignore it if you feed them to it directly; os.path.splitext('test.shx')[1] will correctly yield ".shx"

JeffreyHolycross · 01-13-2023

I think the issue may be that you're modifying TList while iterating over it. If you print(f) in every iteration, you'll see that not all of the original items are printed. You should create a new list and append to it instead of modifying the existing list.
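
As a rough illustration of the new-list approach, here is a sketch that reuses the TList and ignoreExt names from the question. The 'notes.txt' entry and the kept variable are added here only so that something survives the filter.

```python
import os

ignoreExt = ['.shp', '.shx', '.dbf', '.prj', '.xml', '.sbn', '.sbx', '.cpg', '.aux']
# 'notes.txt' is an assumed extra entry, not part of the original list.
TList = ['test.shp', 'test.shx', 'test.dbf', 'test.prj', 'test.xml',
         'test.sbn', 'test.sbx', 'test.cpg', 'test.aux', 'notes.txt']

# Build a new list instead of removing from the one being iterated.
kept = [f for f in TList if os.path.splitext(f)[1] not in ignoreExt]
print(kept)  # ['notes.txt']
```

Because TList itself is never changed, every item is visited exactly once.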

Anonymous User · 01-13-2023

You shouldn't remove items from a list that you are iterating over, unless you are going in reverse. It is skipping the .shx, .prj, .sbn, and .cpg files because the statement for f in TList just visits each index in order: TList[0], then TList[1], then TList[2], and so on until it runs out of items.

Removing TList[0] shifts all the other items in the list one slot to the left; the original TList[1] is now TList[0], so when the for loop moves on to index 1 it skips over it.
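
To see the skipping for yourself, this small sketch records which items the loop actually visits. It uses the list from the question; the visited list is added here purely for illustration.

```python
import os

ignoreExt = ['.shp', '.shx', '.dbf', '.prj', '.xml', '.sbn', '.sbx', '.cpg', '.aux']
TList = ['test.shp', 'test.shx', 'test.dbf', 'test.prj', 'test.xml',
         'test.sbn', 'test.sbx', 'test.cpg', 'test.aux']

visited = []
for f in TList:
    visited.append(f)  # record every item the loop actually sees
    if os.path.splitext(f)[1] in ignoreExt:
        TList.remove(f)  # shifts the remaining items one slot left

print(visited)  # ['test.shp', 'test.dbf', 'test.xml', 'test.sbx', 'test.aux']
print(TList)    # ['test.shx', 'test.prj', 'test.sbn', 'test.cpg']
```

Every second item is never visited, so it is never tested against ignoreExt and stays in the list.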

You can use reverse:

for f in reversed(TList):
    # print(os.path.splitext(f)[1])
    if os.path.splitext(f)[1] in ignoreExt:
        # print(f)
        TList.remove(f)