arcpy.da.Walk to not read excel worksheets

06-15-2016 02:13 PM
AmyKlug
Occasional Contributor III

My walk code will not skip the worksheets inside Excel files. Staff have Excel files with a huge number of worksheets, which slows my walk down so much that it is basically unusable. I think it is still reading the Excel worksheets in the else statement.

import arcpy, os, traceback, sys
arcpy.env.overwriteOutput = True
workspace = r"C:\Users\Documents\GisData"
arcpy.env.workspace = workspace
try:
    walk = arcpy.da.Walk(workspace)
    txt = open(r"C:\Users\Documents\StaffGISLibrary.txt", 'w')
    for dirpath, dirnames, filenames in walk:
        if arcpy.Exists(dirpath):
            #describe = arcpy.Describe(dirpath)
            if dirpath.endswith(('.xls', '.xlsx', '.txt')):
                print "skipping excel file"
                pass
            else:
                for filename in filenames:
                    fullpath = os.path.join(dirpath, filename)
                    describe = arcpy.Describe(fullpath)
                    print "writing " + fullpath
                    txt.write(fullpath + "," + filename + "," + describe.dataType + "\n")
        else:
            print "DOES NOT EXIST"
            pass
    del filename, dirpath, dirnames, filenames
    txt.close() 
except Exception, e:
    pass
    # If an error occurred, print line number and error message
    import traceback, sys
    tb = sys.exc_info()[2]
    print "Line %i" % tb.tb_lineno
    print e.message
finally:
    raw_input("Finished!")
JoshuaBixby
MVP Esteemed Contributor

The simplest fix is changing the pass on line 13 of your script (the one right after print "skipping excel file") to continue.  You are using the wrong control flow statement.  See "More Control Flow Tools" in the Python tutorial.
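
To make the difference concrete, here is a minimal sketch in plain Python (no arcpy; the function names and the sample list are made up for illustration). pass is a no-op, so execution falls through to whatever follows the if block, while continue jumps to the next iteration. Note that when the remaining work sits in an else branch, as in the posted script, both statements end up skipping it.

```python
SKIP = ('.xls', '.xlsx', '.txt')

def with_pass(names):
    kept = []
    for name in names:
        if name.endswith(SKIP):
            pass          # no-op: execution falls through to the append
        kept.append(name)
    return kept

def with_continue(names):
    kept = []
    for name in names:
        if name.endswith(SKIP):
            continue      # skip the rest of this iteration
        kept.append(name)
    return kept

names = ['roads.shp', 'budget.xlsx', 'notes.txt']  # hypothetical names
print(with_pass(names))      # all three names are kept
print(with_continue(names))  # only 'roads.shp' is kept
```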

AmyKlug
Occasional Contributor III

I tried it with continue too, same result

JoshuaBixby
MVP Esteemed Contributor

I was hoping the "simplest" fix would work.  After seeing your reply and thinking about it, I realized why it doesn't work, or at least why your code is still slowing down when coming across Excel files.

As the ArcPy Data Access Walk documentation states, the standard method of forgoing a subworkspace is to modify the directory names list in place before the function starts stepping down into them.

When topdown is True, the dirnames list can be modified in-place, and Walk() will only recurse into the subworkspaces whose names remain in dirnames. This can be used to limit the search, impose a specific order of visiting, or even to inform Walk() about directories the caller creates or renames before it resumes Walk() again. Modifying dirnames when topdown is False is ineffective, because in bottom-up mode the workspaces in dirnames are generated before dirpath itself is generated.

Something along the lines of:

>>> walk = arcpy.da.Walk(workspace)
>>> for dirpath, dirnames, filenames in walk:
...    for dir in dirnames[:]:
...        if dir.endswith(('.xls', '.xlsx', '.txt')):
...            dirnames.remove(dir)
...    for filename in filenames:
....

Make sure to iterate over a copy of dirnames (as done by dirnames[:]); removing items from a list while looping over that same list causes the loop to skip elements.
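
The pruning idea can be tried without ArcGIS. The sketch below uses os.walk as a stand-in, on the assumption that arcpy.da.Walk honors in-place edits to dirnames the same way, as the quoted documentation describes. It uses slice assignment rather than remove() over a copy; both mutate the list in place. The helper name walk_pruned and the throwaway tree are made up for illustration, and the folder named budget.xlsx only imitates a workbook that Walk would list as a subworkspace.

```python
import os
import tempfile

SKIP = ('.xls', '.xlsx', '.txt')

def walk_pruned(root, skip=SKIP):
    """Yield (dirpath, filenames), never descending into skipped names."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Slice assignment mutates the list the walker holds a reference to;
        # rebinding with `dirnames = [...]` would have no effect.
        dirnames[:] = [d for d in dirnames if not d.endswith(skip)]
        yield dirpath, filenames

# Build a throwaway tree: one ordinary folder and one named like a workbook.
root = tempfile.mkdtemp()
for sub in ('shapefiles', 'budget.xlsx'):
    os.makedirs(os.path.join(root, sub))
    open(os.path.join(root, sub, 'item'), 'w').close()

visited = sorted(os.path.relpath(dirpath, root)
                 for dirpath, _ in walk_pruned(root))
print(visited)  # the budget.xlsx folder is never entered
```

In the arcpy version, only the os.walk(root, topdown=True) call would change to arcpy.da.Walk(workspace); whether that removes the slowdown depends on Walk listing the workbooks in dirnames, which this sketch does not verify.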

AmyKlug
Occasional Contributor III

Why would this print statement not work? (EOF error when using "\")

    for dirpath, dirnames, filenames in walk:
        for d in dirnames:
            for f in filenames:
                print dirpath + d + "\" + f
    del d, f, dirpath, dirnames, filenames
JoshuaBixby
MVP Esteemed Contributor

Backslashes are escape characters in Python.  Since you aren't escaping the escape character, the backslash in "\" escapes the closing quotation mark, so the string literal never terminates and Python reports the error.  The safer approach when building file system paths is to use Python's os.path functionality.
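
As a small sketch of both options, the snippet below builds the same path once with an escaped backslash and once with a join function. It uses ntpath, the Windows flavor of os.path, so the result is identical on any platform; in a script running on Windows, os.path.join behaves the same way. The file name roads.shp is hypothetical.

```python
import ntpath  # Windows flavor of os.path, importable on any platform

dirpath = r"C:\Users\Documents\GisData"  # raw string: backslashes stay literal
filename = "roads.shp"                   # hypothetical file name

escaped = dirpath + "\\" + filename      # "\\" is one backslash character
joined = ntpath.join(dirpath, filename)  # the library inserts the separator

print(escaped)
print(joined)
```

Both lines print the same path; the join version avoids hand-written separators entirely.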
