loop down through a directory tree.

8257
19
07-02-2014 09:01 AM
PaulHuffman
Occasional Contributor III
I want to loop through a folder and clip all feature classes by a AOI poly and write the results to an output folder.  How to I also have the loop search all subfolders for vectors and raster feature classes as well?
Tags (2)
0 Kudos
19 Replies
ChrisSnyder
Regular Contributor III
0 Kudos
markdenil
Occasional Contributor III
Or, if you are using an older (<= 10.1) Arc,
you can use the python os.walk

for root, dirs, files in os.walk(basePathVariable):
        print "%s, " % (root)
        print str(sum(getsize(join(root, name)) for name in files))
        print " bytes, %s files\n" % (len(files))

prints the number and size of non-directory files in each directory and subdirectory.

Not exactly what you want to do, but you can get where you want to go from there...
0 Kudos
ChrisBrannin__GISP
New Contributor
I want to loop through a folder and clip all feature classes by a AOI poly and write the results to an output folder.  How to I also have the loop search all subfolders for vectors and raster feature classes as well?


This will walk all directories you want, including GDB's. You can also add the if to remove directories you're not interested in crawling.

for dirpath, dirnames, filenames in arcpy.da.Walk(in_workspace, datatype="FeatureClass",type="All"):
    if "Remove_Folder" in dirnames:
        dirnames.remove('Remove_Folder')
    for filename in filenames:
        if "{Field Name}" in [f.name for f in arcpy.ListFields(os.path.join(dirpath, filename))]:
            arcpy.Clip_analysis(os.path.join(dirpath, filename), clip_features, out_feature_class, xy_tolerance)
0 Kudos
PaulHuffman
Occasional Contributor III
It looks like I need to do it in 10.0 unless I travel there with a > 10.0 laptop installation. And I might have to do that because some of that data is in geodatabases, so os.walk will not find them. 

Something else has come up.  Many of the input feature classes are way outside the AOI clip.  Too much time is being taken up by handling these out of range FCs and the output files are just empty.  Is there a way of avoiding the out of range data in the clipping loop? It looks like I could first test for intersection with the clip polygon with Select layer by Location but I'm having a hard time telling if that tool is available at 10.0. I don't have 10.0 installed anywhere at my current location, and I was unable to find 10.0 help online.
0 Kudos
ChrisSnyder
Regular Contributor III
Some comments:

1. os.walk does get a you a handle on the directory name that is being parsed, so you could look for if dirName[-4:] == ".gdb" and if found, set the workspace to it, and issue arcpy.listfeatureclasses(). However, if I were you I wouldn't waste time developing code in anything < than v10.1 (since there is so many cool new toys in v10.1 SP1 like .da cursors, and .da.walk).

2. How about just intersecting the extent rectangle of your clip layer vs. the extent rectangles of the FCs that are returned by *.walk? If there is no intersection, don't clip. This extent comparison is easier/faster to do in v10.1 BTW.
0 Kudos
PaulHuffman
Occasional Contributor III

I was trying to get this script done yesterday,  but puzzled why half of it doesn't work. Because ESRI is doing site maintenance, I couldn'tget back in to ask this question. Client still waiting.

I was trying to get a script to walk down a tree of LIDAR data, find any rasters and feature classes, and clip them to an area of interest.  My test data tree just has some tifs in one folder,  some shapefiles in another to make things simpler.  And I need to make this run in ArcGIS 10.0,  so da.walk and da.Cursor are not available.

When my script hit the first tif file,  the AOI polygon didn't overlap it at all and I got an error message that it didn't overlap.  (Failed to create raster dataset. The reason could be: The clip feature is outside the raster extent.

Failed to execute (Clip).) I was going to put off solving the problem of testing for overlap until I could get the file finding and clipping figured out. So, following an example at http://gis.stackexchange.com/questions/13571/arcpy-compare-extents-of-rasters, I added the commented out section, lines 26 to 49,  to check for overlap of extent for the tifs.  However, the script hits line 48, and says name 'Subset' is not defined.  Also none of the print statements in the section are executed,  so it looks like none of the extent tests worked.  What went wrong? If I just run the script down to the first print statement,  all the tifs are found and printed.

In the next section, an if statement looks for just shapefiles, and successfully clips them and writes the output to a new output tree. No check for overlap is done for the shapefiles but I probably should to eliminate  empty output shapefile and to save execution time.

Hey, how do you insert code in this new forum?

#testclip.py

#test trying to walk a directory tree, find rasters and shapefiles, and clip to a area of Interest (AOI)

# and make it run in 10.0 so without da.walk.

# Paul Huffman, Yakama Fisheries, 7/7/2014

# Import arcpy module

import arcpy

import os

arcpy.CheckOutExtension("spatial")

# Set Geoprocessing environments

arcpy.env.scratchWorkspace = "c:\\avdata\\PythonTest\\ClipLidar\\Scratch"

arcpy.env.workspace = "c:\\avdata\\PythonTest\\ClipLidar\TestData"

arcpy.env.overwiteOutput = True

clipshape = "c:\\avdata\\PythonTest\\ClipLidar\\AOIPoly.shp"

InputFolder = "c:\\avdata\\PythonTest\\ClipLidar\\TestData\\Input"

OutputFolder = "c:\\avdata\\PythonTest\\ClipLidar\\TestData\\Output"

#Find AOI extent

AOI = arcpy.mapping.Layer(clipshape)

AOIextent=AOI.getExtent()

#Find the input feature classes

for root, dirs, files in os.walk(InputFolder):

    for name in files:

        if os.path.splitext(name)[1] == ".tif":

            print os.path.join(root, name), '\n'

            FileObj = arcpy.mapping.Layer(os.path.join(root, name))

            FileExtent = FileObj.getExtent()

            print "Processing: " + name

            print FileExtent

##            Extent_Overlap = str(FileExtent.overlaps(AOIextent))

##            Extent_Within = str(FileExtent.within(AOIextent))

##            Extent_Touches = str(FileExtent.touches(AOIextent))

##            Extent_Crosses = str(FileExtent.crosses(AOIextent))

##            if Extent_Overlap == 'True':

##                Subset = 1

##                print "Extent_Overlaps"

##            elif Extent_Within == 'True':

##                Subset = 1

##                print "Extent_Within"

##            elif Extent_Touches == 'True':

##                Subset = 1

##                print "Extent_Touches"

##            elif Extent_Crosses == 'True':

##                Subset = 1

##                print "Extent_Crosses"

##            #print(os.path.join(root, name))

##            #arcpy.Clip_management(name,clipshape,(os.path.join(OutputFolder,name)))

##            if Subset == 1:

##                arcpy.Clip_management((os.path.join(root, name)),"#",(os.path.join(OutputFolder,name)),clipshape,"#","ClippingGeometry")

##        if os.path.splitext(name)[1] == ".shp":

##            print(os.path.join(root, name))

##            #arcpy.Clip_management((os.path.join(root, name)),"#",(os.path.join(OutputFolder,name)),clipshape,"#","ClippingGeometry")

##            arcpy.Clip_analysis((os.path.join(root, name)),clipshape,(os.path.join(OutputFolder,name)))

End of code.

I also found the python glob method.  It also can find all the tifs by searching through the tree like:

#Find the input feature classes

print (glob.glob("c:\\avdata\\PythonTest\\ClipLidar\\TestData\\Input\\*\\*.tif"),'\n')

##for root, dirs, files in os.walk(InputFolder):

##    for name in files:

##        if os.path.splitext(name)[1] == ".tif":

##            print os.path.join(root, name), '\n'

where the glob statement returns the same list as the next four commented out statements.  However, I eventually want to parse out the folders in which the files were found so I can reconstruct the input tree structure in the output tree.  Don't know I can do that with the glob returns.  Actually I don't know how to to that yet with the walk returns either.

0 Kudos
PaulHuffman
Occasional Contributor III

I got this to work for at least tifs and shapefiles. I'm not sure why I had to use mapping.layer, line 20 and 29, before I could get an Extent of the data.  Seems like the overlap tests, lines 32 -  47, could be down more efficiently.  Is there just one test that would find if there is no touching or overlap, then skip that file.  Where's the docs on that .overlaps stuff?  Can't find it today.

Currently all my output lands in one folder. My next step needs to be writing the output to separate folders in the output directory,  making the output tree structure look like the input tree.  I'm guessing that will involve manipulating the strings of the full paths of the inputs.

#testclip.py

#test trying to walk a directory tree, find rasters and shapefiles, and clip to a area of Interest (AOI)

# and make it run in 10.0 so without da.walk.

# Paul Huffman, Yakama Fisheries, 7/7/2014

# Import arcpy module

import arcpy

import os

#arcpy.CheckOutExtension("spatial")

# Set Geoprocessing environments

arcpy.env.scratchWorkspace = "c:\\avdata\\PythonTest\\ClipLidar\\Scratch"

arcpy.env.workspace = "c:\\avdata\\PythonTest\\ClipLidar\TestData"

arcpy.env.overwriteOutput = True

clipshape = "c:\\avdata\\PythonTest\\ClipLidar\\AOIPoly_SP.shp"

InputFolder = "c:\\avdata\\PythonTest\\ClipLidar\\TestData\\Input"

OutputFolder = "c:\\avdata\\PythonTest\\ClipLidar\\TestData\\Output"

#Find AOI extent

AOI = arcpy.mapping.Layer(clipshape)

AOIextent=AOI.getExtent()

#Find the input feature classes

for root, dirs, files in os.walk(InputFolder):

    for name in files:

        Subset = 0

        if os.path.splitext(name)[1] == ".tif":

            #print os.path.join(root, name), '\n'

            FileObj = arcpy.mapping.Layer(os.path.join(root, name))

            FileExtent = FileObj.getExtent()

            print "Processing: " + name

            Extent_Overlap = str(FileExtent.overlaps(AOIextent))

            Extent_Within = str(FileExtent.within(AOIextent))

            Extent_Touches = str(FileExtent.touches(AOIextent))

            Extent_Crosses = str(FileExtent.crosses(AOIextent))

            if Extent_Overlap == 'True':

                Subset = 1

                print "Extent_Overlaps"

            elif Extent_Within == 'True':

                Subset = 1

                print "Extent_Within"

            elif Extent_Touches == 'True':

                Subset = 1

                print "Extent_Touches"

            elif Extent_Crosses == 'True':

                Subset = 1

                print "Extent_Crosses"

            if Subset == 1:

                arcpy.Clip_management((os.path.join(root, name)),"#",(os.path.join(OutputFolder,name)),clipshape,"#","ClippingGeometry")

        if os.path.splitext(name)[1] == ".shp":

            print(os.path.join(root, name))

            arcpy.Clip_analysis((os.path.join(root, name)),clipshape,(os.path.join(OutputFolder,name)))

0 Kudos
PaulHuffman
Occasional Contributor III

This is what I have now.  Works with tifs and shapefiles, but I need to figure out how to detect and process grids as well.  The script looks for files down through the tree below "c:\\avdata\\PythonTest\\ClipLidar\\TestData\\Input" then writes the data into a copy of the input tree structure below "c:\\avdata\\PythonTest\\ClipLidar\\TestData\\Output"  To migrate this script to another datas et,  the end user has to edit in the new clipshape AOI, the new input path, then also change the name of the top of the input tree "Input" in line 48.  This is too tricky. I'd rather have one section at the top where all the editable parameters are found. Maybe I can generalize this by somehow getting the name of that bottom rung folder out of the InputFolder string and stick it in a variable.

Lines 27 - 39 are redundantly repeated again redundantly at line 43.  Bad style.  These lines could define a function, except for the differences in lines 37 and 39.

#testclip.py

#test trying to walk a directory tree, find rasters and shapefiles, and clip to a area of Interest (AOI)

# and make it run in 10.0 so without da.walk.

# Paul Huffman, Yakama Fisheries, 7/7/2014

# Import arcpy module

import arcpy

import os

#arcpy.CheckOutExtension("spatial")

# Set Geoprocessing environments

arcpy.env.scratchWorkspace = "c:\\avdata\\PythonTest\\ClipLidar\\Scratch"

arcpy.env.workspace = "c:\\avdata\\PythonTest\\ClipLidar\TestData"

arcpy.env.overwriteOutput = True

clipshape = "c:\\avdata\\PythonTest\\ClipLidar\\AOIPoly_SP.shp"

InputFolder = "c:\\avdata\\PythonTest\\ClipLidar\\TestData\\Input"

#OutputFolder = "c:\\avdata\\PythonTest\\ClipLidar\\TestData\\Output"

#Find AOI extent

AOI = arcpy.mapping.Layer(clipshape)

AOIextent=AOI.getExtent()

#Find the input feature classes

for root, dirs, files in os.walk(InputFolder):

    for name in files:

        if os.path.splitext(name)[1] == ".tif":

            FileObj = arcpy.mapping.Layer(os.path.join(root, name))

            FileExtent = FileObj.getExtent()

            print "Processing: " + name

           #if image overlap or touches AOI polygon,  then clip image, write to output

            Extent_Disjoint = str(FileExtent.disjoint(AOIextent))

            if Extent_Disjoint == 'False':

                print "Clipping: " + name

                OutputPath = root.replace("Input","Output")

                if not os.path.exists(OutputPath):

                    os.makedirs(OutputPath)

                Clipname = os.path.splitext(name)[0] + "cp.tif"

                print "Writing " + (os.path.join(OutputPath,Clipname))

                arcpy.Clip_management((os.path.join(root, name)),"#",(os.path.join(OutputPath,Clipname)),clipshape,"#","ClippingGeometry")

        #for shapefiles, use a different clip, write to output.

        if os.path.splitext(name)[1] == ".shp":

            print(os.path.join(root, name))

            FileObj = arcpy.mapping.Layer(os.path.join(root, name))

            FileExtent = FileObj.getExtent()

            Extent_Disjoint = str(FileExtent.disjoint(AOIextent))

            if Extent_Disjoint == 'False':

                print "Clipping: " + name

                OutputPath = root.replace("Input","Output")

                if not os.path.exists(OutputPath):

                    os.makedirs(OutputPath)

                Clipname = os.path.splitext(name)[0] + "cp"

                print "Writing " + (os.path.join(OutputPath,Clipname))

                arcpy.Clip_analysis((os.path.join(root, name)),clipshape,(os.path.join(OutputPath,Clipname)))

   

0 Kudos
PaulHuffman
Occasional Contributor III

Now I also need to find any grids in the tree, clip them and write them out a output folder that corresponds to the grids' position in the input tree.  ListRasters makes a list of grids in the current workspace, not the walked folder,  so I had to redefine env.workspace inside a directory loop.  So how is my script going to know what folder the grids in my list were found in for formulating the out directory.  Another list of the input directories that the script matches up with the grid list based on order number?

rasterList=[]

#Find the input feature classes

for root, dirs, files in os.walk(InputFolder):

    for targetdir in dirs:

        print root + targetdir

        targetworkspace = root + targetdir

        arcpy.env.workspace = targetworkspace

        rasterList.append (arcpy.ListRasters("*", "GRID"))

for raster in rasterList:

        print raster

0 Kudos