How to keep arcpy.da.Walk from reading Excel worksheets

06-15-2016 02:13 PM
AmyKlug
Occasional Contributor III

My Walk code will not skip reading the worksheets inside Excel files. Staff have Excel files with a huge number of worksheets, which slows the walk down so much that it is basically unusable. I think it is still reading the Excel worksheets in the else branch.

```python
import os
import sys

import arcpy

arcpy.env.overwriteOutput = True
workspace = r"C:\Users\Documents\GisData"
arcpy.env.workspace = workspace
SKIP_EXTENSIONS = ('.xls', '.xlsx', '.txt')

try:
    # "with" closes the report file even if an error interrupts the walk
    with open(r"C:\Users\Documents\StaffGISLibrary.txt", 'w') as txt:
        for dirpath, dirnames, filenames in arcpy.da.Walk(workspace):
            if not arcpy.Exists(dirpath):
                print("DOES NOT EXIST: " + dirpath)
                continue
            # Excel workbooks are walked as workspaces, so they show up here as dirpath
            if dirpath.lower().endswith(SKIP_EXTENSIONS):
                print("skipping excel file: " + dirpath)
                continue
            for filename in filenames:
                fullpath = os.path.join(dirpath, filename)
                describe = arcpy.Describe(fullpath)
                print("writing " + fullpath)
                txt.write(fullpath + "," + filename + "," + describe.dataType + "\n")
except Exception as e:
    # If an error occurred, print the line number and error message
    tb = sys.exc_info()[2]
    print("Line %i" % tb.tb_lineno)
    print(str(e))
finally:
    raw_input("Finished!")  # Python 2.7 (ArcMap); use input() under Python 3
```

DanPatterson_Retired
MVP Emeritus

Just a hunch: shouldn't you be checking the filenames for the .xls extensions, not the dirpath?

Walk—Help | ArcGIS for Desktop

AmyKlug
Occasional Contributor III

filenames goes into the worksheets themselves; I was trying to avoid that.
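A minimal sketch of avoiding the worksheets entirely by pruning, assuming standard top-down behavior: Python's `os.walk` only recurses into the entries left in `dirnames` when that list is edited in place, and the arcpy.da.Walk help describes the same behavior for subworkspaces when `topdown` is True. Removing workbooks from `dirnames` before the walk reaches them means their worksheets are never enumerated at all, rather than enumerated and then ignored. `prune_walk` and `SKIP_EXTENSIONS` are hypothetical names; the example runs on `os.walk` so it needs no arcpy.

```python
import os

# Hypothetical: extensions of child workspaces the walk should never enter
SKIP_EXTENSIONS = (".xls", ".xlsx")


def prune_walk(walker, skip_extensions=SKIP_EXTENSIONS):
    """Wrap a top-down (dirpath, dirnames, filenames) walker and drop
    unwanted children from dirnames before the walker recurses into them."""
    for dirpath, dirnames, filenames in walker:
        # Slice assignment edits the same list object the walker holds,
        # so pruned entries are never visited (requires topdown=True)
        dirnames[:] = [d for d in dirnames
                       if not d.lower().endswith(skip_extensions)]
        yield dirpath, dirnames, filenames
```

Under ArcMap the same wrapper would be used as `prune_walk(arcpy.da.Walk(workspace))`, assuming workbooks appear in `dirnames` there, which the original script's "skipping excel file" output suggests.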

DanPatterson_Retired
MVP Emeritus

I would throw in some print statements then, or use os.path to exclude Excel files, although both approaches should work essentially the same.
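A small sketch of the os.path route, filtering by extension case-insensitively so that `Budget.XLSX` is caught as well. The name `keep_file` and the excluded set are illustrative assumptions. With `os.walk` a workbook is an ordinary entry in `filenames`, so this check skips it without ever opening it.

```python
import os

# Hypothetical: file types to leave out of the inventory
EXCLUDED_EXTENSIONS = {".xls", ".xlsx", ".txt"}


def keep_file(filename):
    """Return False for names whose extension (case-insensitive) is excluded."""
    ext = os.path.splitext(filename)[1].lower()
    return ext not in EXCLUDED_EXTENSIONS


filenames = ["roads.shp", "Budget.XLSX", "notes.txt", "parcels.dbf"]
print([f for f in filenames if keep_file(f)])  # ['roads.shp', 'parcels.dbf']
```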

AmyKlug
Occasional Contributor III

I got to where I am by using print statements. The code works perfectly until I add one of the large Excel files (many worksheets and records). It isn't printing the Excel files, but it is still reading them in the else statement, because my cursor stalls. Hope that makes sense. It's as if dirpath or filenames is being pulled from the first part of the code, before I excluded the Excel files in the else statement.

DanPatterson_Retired
MVP Emeritus

dirpath: isn't that the directory path?

I would have put the Excel check in the filename section, excluded Excel as a data type in the dataType section, or provided an inclusion list of the files you wish to examine.
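An inclusion list is the reverse test: keep only the extensions you care about and drop everything else. The extensions below are placeholders, not a recommendation, and `included` is a hypothetical helper. Shapefile sidecar files such as `.dbf` fall out automatically because only `.shp` is listed.

```python
import os

# Hypothetical inclusion list; adjust to the formats you actually inventory
INCLUDE_EXTENSIONS = {".shp", ".tif", ".lyr", ".mxd"}


def included(filenames, include=INCLUDE_EXTENSIONS):
    """Yield only the names whose extension (case-insensitive) is on the list."""
    for name in filenames:
        if os.path.splitext(name)[1].lower() in include:
            yield name


print(list(included(["roads.shp", "roads.dbf", "dem.TIF", "budget.xlsx"])))
# ['roads.shp', 'dem.TIF']
```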

AmyKlug
Occasional Contributor III

I originally had it in the filename section, but then it had to go in and read all the worksheets. When I changed that, it worked much faster. I like the inclusion idea, but I'm not sure how to do that with file geodatabases and the feature classes within them...

I suppose I would need to run a "List" this or that...
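For file geodatabases, one option is to treat each `.gdb` folder as a single item: record its path and prune it from `dirnames` so the walk does not wander through its internal storage files. `find_geodatabases` is a hypothetical helper built on `os.walk`, sketched under the assumption that every file geodatabase is a folder whose name ends in `.gdb`.

```python
import os


def find_geodatabases(root):
    """Return paths of *.gdb folders under root without descending into them."""
    found = []
    for dirpath, dirnames, _ in os.walk(root):
        gdbs = [d for d in dirnames if d.lower().endswith(".gdb")]
        found.extend(os.path.join(dirpath, d) for d in gdbs)
        # Prune the geodatabases so os.walk skips their internal files
        dirnames[:] = [d for d in dirnames if d not in gdbs]
    return found
```

Each returned path could then be handed to arcpy, for example by setting `arcpy.env.workspace` to it and calling `arcpy.ListFeatureClasses()`, which keeps the slower arcpy work limited to the geodatabases themselves.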

AmyKlug
Occasional Contributor III

Sorry for wasting everyone's time. It was just slow and I assumed it was stalling. I guess I need to be more patient.

AlexanderWinz
New Contributor II

Hey Amy,

have you tried os.path.exists() instead of arcpy.Exists()? It should work way faster 🙂
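One caveat with swapping in `os.path.exists()`: it only sees the file system, so a feature class path inside a file geodatabase (for example a hypothetical `C:\data\parcels.gdb\roads`) reports False even though arcpy can open it. A hedged compromise is to try the cheap check first and fall back to a slower one only when it fails. `path_exists` and its `fallback` parameter are hypothetical; `arcpy.Exists` would be the intended fallback.

```python
import os


def path_exists(path, fallback=None):
    """Cheap filesystem check first; call the slower fallback
    (for example arcpy.Exists) only when the path is not on disk."""
    if os.path.exists(path):
        return True
    # Paths inside a geodatabase are not files on disk,
    # so os.path.exists reports False for them
    return bool(fallback(path)) if fallback is not None else False
```

In the walk loop this would read `path_exists(dirpath, fallback=arcpy.Exists)`, so arcpy is only consulted for paths the file system cannot confirm.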

AmyKlug
Occasional Contributor III

I think I am going to move over to os.walk too; arcpy.da.Walk reports "does not exist" for too many files, over 50 percent.

Thanks for the tip on os.path.exists()!
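A minimal os.walk version of the inventory, assuming a plain file listing is enough. `os.walk` cannot report arcpy's `describe.dataType`, so this sketch writes the file extension in that column instead, and geodatabase contents appear as their internal storage files unless `.gdb` folders are pruned from `dirnames`. It uses the `csv` module with Python 3's `newline=""`; under ArcMap's Python 2.7, open the file with `'wb'` instead. `write_inventory` and `SKIP_EXTENSIONS` are hypothetical names.

```python
import csv
import os

# Hypothetical: file types left out of the report
SKIP_EXTENSIONS = (".xls", ".xlsx", ".txt")


def write_inventory(workspace, report_path):
    """Write one CSV row (full path, file name, extension) per file under
    workspace, skipping excluded types. Returns the number of rows written."""
    rows = 0
    # newline="" is what the csv module expects under Python 3
    with open(report_path, "w", newline="") as report:
        writer = csv.writer(report)
        for dirpath, _, filenames in os.walk(workspace):
            for filename in filenames:
                if filename.lower().endswith(SKIP_EXTENSIONS):
                    continue  # workbooks are only listed by name, never opened
                fullpath = os.path.join(dirpath, filename)
                ext = os.path.splitext(filename)[1].lower()
                writer.writerow([fullpath, filename, ext])
                rows += 1
    return rows
```

Writing the report outside the walked folder keeps it from showing up in its own listing.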