Need to copy JPGs in the subfolders to one single folder

09-15-2011 06:21 PM
New Contributor
Hello,

I am new to Python and have a task to accomplish.  I have a bunch of JPGs that sit two subfolder levels down (see below):

C:\Project\Date\FolderA1\FolderA2\XXX.jpg
C:\Project\Date\FolderA1\FolderA2\YYY.jpg
C:\Project\Date\FolderB1\FolderB2\ZZZ.jpg
C:\Project\Date\FolderB1\FolderB2\AAA.jpg

What I really want is to point my code at C:\Project and have it go through the subfolders, retrieve all the JPGs, and put them in one single folder somewhere.  I have been researching online but cannot figure out how this is done.  If someone can help, I would appreciate it a lot!
MVP Regular Contributor
You could use a recursive 'glob':


class rglob:
    '''A recursive/regex enhanced glob
       adapted from os-path-walk-example-3.py - http://effbot.org/librarybook/os-path.htm
    '''
    def __init__(self, directory, pattern="*", regex=False, regex_flags=0, recurse=True):
        ''' @type    directory: C{str}
            @param   directory: Path to search
            @type    pattern: C{str}
            @param   pattern: Regular expression/wildcard pattern to match files against
            @type    regex: C{boolean}
            @param   regex: Use regular expression matching (if False, use fnmatch)
                            See U{http://docs.python.org/library/re.html}
            @type    regex_flags: C{int}
            @param   regex_flags: Flags to pass to the regular expression compiler.
                                  See U{http://docs.python.org/library/re.html}
            @type    recurse: C{boolean}
            @param   recurse: Recurse into the directory?
        '''
        self.stack = [directory]
        self.pattern = pattern
        self.regex = regex
        self.recurse = recurse
        self.regex_flags = regex_flags
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        import os
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack

                self.directory = self.stack.pop()
                try:
                    self.files = os.listdir(self.directory)
                    self.index = 0
                except OSError:
                    # skip directories that cannot be listed (e.g. access denied)
                    pass
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname) and self.recurse:
                    self.stack.append(fullname)
                if self.regex:
                    import re
                    if re.search(self.pattern,file,self.regex_flags):
                        return fullname
                else:
                    import fnmatch
                    if fnmatch.fnmatch(file, self.pattern):
                        return fullname

import shutil
search_dir=r'C:\Project'
out_dir=r'C:\Workspace'
for jpg in rglob(search_dir,'*.jpg'):
    print('Copying: ' + jpg)
    shutil.copy(jpg,out_dir)
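
To show what the two matching modes of rglob do with a pattern, here is a small standalone sketch. The file names are made up for illustration. One thing to be aware of: fnmatch.fnmatch normalizes case according to the operating system, so '*.jpg' also matches 'YYY.JPG' on Windows but not on Linux, while fnmatch.fnmatchcase is case-sensitive everywhere.

```python
import fnmatch
import re

# Hypothetical file names, used only for illustration.
names = ['XXX.jpg', 'YYY.JPG', 'notes.txt', 'jpg_list.csv']

# Wildcard mode: fnmatchcase is always case-sensitive, on every OS.
wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, '*.jpg')]

# Regex mode: a pattern plus flags, the same arguments rglob hands to re.search.
regex_hits = [n for n in names if re.search(r'\.jpg$', n, re.IGNORECASE)]

print(wildcard_hits)  # ['XXX.jpg']
print(regex_hits)     # ['XXX.jpg', 'YYY.JPG']
```

With the class above, the regex variant would presumably be called as rglob(search_dir, r'\.jpg$', regex=True, regex_flags=re.IGNORECASE).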
New Contributor III
That.... is a thing of beauty!!! Thank you so much for sharing that, Luke!

I just want to say that in the interest of giving the poster a smaller bit of code to dissect, one could make a smaller version using a filter and the exact example you were inspired by on the OS example page.

import os

class DirectoryWalker:
    '''Callously stolen (with attribution!) from os-path-walk-example-3.py
    Copyright (c) 1995-2010 by Fredrik Lundh
    http://effbot.org/librarybook/os-path.htm'''
     
    def __init__(self, directory):
        self.stack = [directory]
        self.files = []
        self.index = 0

    def __getitem__(self, index):
        while 1:
            try:
                file = self.files[self.index]
                self.index = self.index + 1
            except IndexError:
                # pop next directory from stack
                self.directory = self.stack.pop()
                self.files = os.listdir(self.directory)
                self.index = 0
            else:
                # got a filename
                fullname = os.path.join(self.directory, file)
                if os.path.isdir(fullname) and not os.path.islink(fullname):
                    self.stack.append(fullname)
                return fullname

import shutil
search_dir = r'C:\Project'
out_dir = r'C:\Workspace'

# filter() takes all the results from DirectoryWalker and keeps only those
# that match the condition x.lower().endswith('.jpg'), so .JPG matches too
for jpg in filter(lambda x: x.lower().endswith('.jpg'), DirectoryWalker(search_dir)):
    print('Copying: ' + jpg)
    shutil.copy(jpg, out_dir)
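
In case it is not obvious why a class with only __getitem__ works in a for loop: when there is no __iter__, Python falls back to calling __getitem__ with 0, 1, 2, ... until an IndexError is raised. DirectoryWalker ignores the index and stops when self.stack.pop() raises IndexError on an empty stack. A toy sketch of the same mechanism (Countdown is a made-up class, not part of either example):

```python
class Countdown:
    '''Toy class (hypothetical) that is iterable only through __getitem__.'''

    def __init__(self, start):
        self.stack = list(range(1, start + 1))

    def __getitem__(self, index):
        # The index is ignored, just as in DirectoryWalker; popping from an
        # empty list raises IndexError, which ends the for loop.
        return self.stack.pop()

values = [n for n in Countdown(3)]
odd_values = list(filter(lambda n: n % 2, Countdown(3)))

print(values)      # [3, 2, 1]
print(odd_values)  # [3, 1]
```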


Diana: If you need any help understanding either of these examples, feel free to ask.

Cheers,
Marc
New Contributor
I really appreciate your help! Thank you thank you thank you!
New Contributor
Thank you Marc, I will digest the code a little bit myself first.  This is some amazing stuff!
New Contributor III
Interestingly, they also take about the same time! (I used a folder with a lot of subfolders and a lot of PDFs.)

def main1():
    found = []
    for pdf in filter(lambda x: x.endswith('.pdf'), DirectoryWalker(search_dir)):
        found.append(pdf)

def main2():
    found = []
    for pdf in rglob(search_dir, '*.pdf'):
        found.append(pdf)

from timeit import Timer
t1 = Timer("main1()", "from __main__ import main1")
t2 = Timer("main2()", "from __main__ import main2")

# timeit(100) returns the total time in seconds for 100 calls
print("DirectoryWalker: " + str(t1.timeit(100)))
print("rglob: " + str(t2.timeit(100)))


DirectoryWalker: 155.908681642
rglob: 162.25917093
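
The figures above are totals for 100 calls, so roughly 1.56 s and 1.62 s per directory walk. For a quicker comparison, a sketch along these lines reports the time per call and takes the best of several repetitions. walk_a and walk_b are hypothetical stand-ins so that the snippet runs on its own; substitute main1 and main2.

```python
import timeit

def walk_a():
    # Hypothetical stand-in for main1(); replace with the real function.
    return [n for n in range(1000) if n % 2 == 0]

def walk_b():
    # Hypothetical stand-in for main2(); replace with the real function.
    return list(filter(lambda n: n % 2 == 0, range(1000)))

runs = 10
results = {}
for name, func in (('walk_a', walk_a), ('walk_b', walk_b)):
    # repeat() returns one total per repetition; the minimum is the least noisy.
    best_total = min(timeit.repeat(func, number=runs, repeat=3))
    results[name] = best_total / runs  # seconds per call
    print('%s: %.6f s per call' % (name, results[name]))
```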
New Contributor
OK, now I am looking for more help to organize these JPGs.

Now that all the JPGs are in one folder (C:\temp), I want to copy them into different folders based on the numeric field "Wall_Num" in a feature class "Result", and rename them based on a field called "JPG" in the same feature class, e.g. "Tree.jpg".  The "Result" feature class holds the original name of each photo in a field called "PhotoPath2", e.g. IMG_0070430.jpg.  I have also created a table called tbl_Wall_Numbers that holds the unique "Wall_Num" values.

My plan was to:
1.  Create folders based on the "Wall_Num" values in tbl_Wall_Numbers using a search cursor (done)
2.  Create a query table named "Wallx" with the records where "Wall_Num" = x (x being the "Wall_Num" value).  This seems to have worked, with no errors, but I could not tell whether it executed successfully.  I have also tried it with create feature layer and selectlayerbyattribute.
3.  Use a search cursor to go through the query table row by row and, if the current filename matches "PhotoPath2", rename it using "JPG".  Maybe this is where the problem is: I am trying to use a cursor nested inside another cursor.  Are there other ways to do this?
            os.rename(file, file.replace("\"[PhotoPath2]\","\"[JPG]\"))
4.  Copy the JPG into the correct folder, based on its "Wall_Num" value, until all JPGs in temp are copied
5.  Repeat steps 2 - 4 for every "Wall_Num" in tbl_Wall_Numbers.

Another question, somewhat related to this: how do I get rid of the query table?  When I try to run the same process again, it says that query table "Wallx" already exists, even though I have overwriting of geoprocessing results turned on.

Thank you for pointing me in the right direction!
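
One possible way to avoid the nested cursors in steps 3 and 4, sketched under assumptions: read ("Wall_Num", "PhotoPath2", "JPG") from "Result" once into a list of tuples (the cursor code for that is not shown here), then do the copy and rename in plain Python. The function name organize_photos and the idea of naming each folder str(Wall_Num) are hypothetical choices, not something taken from the posts above.

```python
import os
import shutil

def organize_photos(records, src_dir, dest_root):
    '''Copy each photo into dest_root/<Wall_Num>/ under its new name.

    records is an iterable of (wall_num, original_name, new_name) tuples,
    assumed to have been read from the "Result" feature class beforehand.
    Returns the original names that were not found in src_dir.
    '''
    missing = []
    for wall_num, original_name, new_name in records:
        src = os.path.join(src_dir, original_name)
        if not os.path.isfile(src):
            missing.append(original_name)
            continue
        # Assumption: one folder per Wall_Num, named after the value itself.
        wall_dir = os.path.join(dest_root, str(wall_num))
        if not os.path.isdir(wall_dir):
            os.makedirs(wall_dir)
        # Copying straight to the new name renames and files it in one step.
        shutil.copy(src, os.path.join(wall_dir, new_name))
    return missing

# Example call with made-up values (the destination folder is hypothetical):
# organize_photos([(1, 'IMG_0070430.jpg', 'Tree.jpg')], r'C:\temp', r'C:\Walls')
```

Because the rows are read once and each photo is looked up directly by name, no query table per "Wall_Num" is needed in this variant.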
New Contributor II
I almost had a seizure after I read the replies to this post. Really impressive stuff. A quick and dirty way to do this, although less modular, might look like this:


import os, shutil

search_dir = r'C:\Project'
out_dir = r'C:\Workspace'

for root, dirs, files in os.walk(search_dir):
    print('____________________________________')
    print('searching for files in ' + root)
    print('')
    for f in files:
        # lower() so that .JPG matches as well as .jpg
        if f.lower().endswith('.jpg'):
            infile = os.path.join(root, f)
            outfile = os.path.join(out_dir, f)
            print('copying ' + infile)
            shutil.copy(infile, outfile)
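
One caveat when flattening folders like this: two subfolders may contain files with the same name, and shutil.copy will silently overwrite the earlier copy. A sketch of one way around that, which appends _1, _2, ... to clashing names. The helper names unique_destination and copy_jpgs_flat are made up for this example, and it assumes out_dir already exists and is not located inside search_dir.

```python
import os
import shutil

def unique_destination(out_dir, filename):
    '''Return a path in out_dir that does not exist yet, adding _1, _2, ...'''
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(out_dir, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(out_dir, '%s_%d%s' % (base, counter, ext))
        counter += 1
    return candidate

def copy_jpgs_flat(search_dir, out_dir):
    '''Copy every .jpg under search_dir into out_dir without overwriting.'''
    copied = []
    for root, dirs, files in os.walk(search_dir):
        for f in files:
            if f.lower().endswith('.jpg'):
                dest = unique_destination(out_dir, f)
                shutil.copy(os.path.join(root, f), dest)
                copied.append(dest)
    return copied

# Example call with the paths used in this thread:
# copy_jpgs_flat(r'C:\Project', r'C:\Workspace')
```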