Last Update: 27.01.2008. By kerim in python
Since I started working on a proposal for a new customer, keeping up with his documents has been one of the most annoying tasks. Every week brings around 200 MB to 1 GB of new files, and although the customer may think his directory hierarchy is fine, I think differently. By now I have several hundred files of “potential” interest hidden in over 8 GB of nested directories. All I know is that they end with .txt, .doc or .pdf.
The same problem comes up when you download archives that contain files with specific endings. Normally you extract the whole archive, copy the files of interest manually into some other directory, and then delete the extracted archive again. That's easy enough when you deal with up to ten archives at a time and no sub-sub-sub-sub-directories. Here is a small example:
```text
c:\temp
c:\temp\testdelivery\
c:\temp\testdelivery\test1.doc
c:\temp\testdelivery\test2.doc
c:\temp\testdelivery\read.me
c:\temp\productiondelivery\
c:\temp\productiondelivery\prod1.doc
c:\temp\productiondelivery\results.doc
c:\temp\productiondelivery\interview\
c:\temp\productiondelivery\interview\interview.doc
```
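If the deliveries are .zip files, Python's standard `zipfile` module can skip the extract, copy and delete round trip entirely. The following is a minimal Python 3 sketch, not part of the author's script: it assumes .zip archives only (other formats such as .rar would need other tools), and `extract_matching` is a hypothetical helper name. It writes every member whose file name matches the pattern flat into one directory, overwriting files that share a name.

```python
import fnmatch
import os
import shutil
import zipfile


def extract_matching(archive_path, out_dir, pattern="*.doc"):
    """Extract only the members of a .zip archive whose file name matches
    pattern, writing them flat into out_dir. Returns the written paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            # Keep only the last path component, so folders nested inside
            # the archive are flattened into out_dir.
            name = os.path.basename(info.filename)
            if not fnmatch.fnmatch(name, pattern):
                continue
            target = os.path.join(out_dir, name)
            # Stream the member to disk; an existing file with the same
            # name is overwritten.
            with zf.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

For example, `extract_matching("delivery.zip", "C:\\temp\\outdir", "*.doc")` would pull only the .doc files out of a (hypothetical) `delivery.zip`.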
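After a (very) short look at copy and xcopy, I realized that the usual tools don't do the job: copy does not descend into subdirectories, and xcopy recreates the directory tree instead of collecting the files in one place. But what do we have Python for?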
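The two standard-library modules that make this easy are `os.walk`, which visits every directory of a tree from the top down, and `fnmatch`, which matches shell-style wildcards such as `*.doc`. Here is a minimal Python 3 dry run that only lists what would be collected; `find_matching` is a hypothetical helper name, not part of the script.

```python
import fnmatch
import os


def find_matching(start_dir, pattern):
    """Yield the full path of every file below start_dir whose name
    matches the shell-style pattern (for example "*.doc")."""
    for dirpath, _subdirs, files in os.walk(start_dir):
        # fnmatch.filter returns the names in files that match pattern.
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(dirpath, name)
```

For the example tree, `find_matching("C:\\temp", "*.doc")` should yield the five .doc paths and skip read.me.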
So I wrote a small Python script that does the work I normally do by hand. All you need to do is extract all archives into one “top” directory; the following code does the rest. It can either copy or move the files. Have a look at the copied-and-pasted BSD license … it's longer than the code itself. That's what I like about Python :-))
```python
#!/usr/bin/python
# -*- coding: iso-8859-1 -*-
import fnmatch
import os
import shutil
"""
License:
Copyright (c) 2008 Kerim Mansour
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice,
this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of the author nor the names of other contributors
may be used to endorse or promote products derived from this software without
specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""


def copy_or_move(src, dst, copy=True):
    """Copy src into directory dst, or move it there if copy is False."""
    if copy:
        shutil.copy(src, dst)
    else:
        shutil.move(src, dst)


def consolidate_files(startDir, outDir='outdir', filenamepattern=None, mode='copy', cleanup=False):
    """
    Creates a new subdir outDir in startDir, walks through all subdirectories
    in startDir except for the created one, checks for files according to a
    pattern and either copies or moves them to outDir.
    (The cleanup parameter is accepted but currently unused.)
    """
    outDir_path = os.path.join(startDir, outDir)
    if not os.path.exists(outDir_path):
        os.makedirs(outDir_path)
    for dirpath, subdirs, files in os.walk(startDir):
        # print(dirpath)  # uncomment to trace progress
        # Do not descend into the output directory; otherwise files that
        # were already collected there would be processed again.
        subdirs[:] = [d for d in subdirs
                      if os.path.join(dirpath, d) != outDir_path]
        for filename in files:
            if filenamepattern is None or fnmatch.fnmatch(filename, filenamepattern):
                # A file whose name already exists in outDir_path is
                # overwritten when copying; shutil.move raises shutil.Error.
                copy_or_move(os.path.join(dirpath, filename), outDir_path,
                             mode == 'copy')


if __name__ == "__main__":
    startDir = "C:\\temp\\"
    consolidate_files(startDir, filenamepattern="*.txt", mode='copy')
```
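The script handles one pattern per run. Because every file lands in the same flat directory, two files with the same name (say, two test1.doc from different deliveries) collide: `shutil.copy` silently overwrites the first one, and `shutil.move` refuses with `shutil.Error`. Below is a possible Python 3 variant, sketched under those assumptions; the names `consolidate` and `unique_target` are hypothetical and not part of the original script. It accepts several patterns at once and appends `_1`, `_2`, ... to clashing names. It also uses `shutil.copy2` instead of `shutil.copy`, so that modification times are kept.

```python
import fnmatch
import os
import shutil


def unique_target(out_dir, name):
    """Return a path in out_dir for name that does not exist yet,
    appending _1, _2, ... before the extension if necessary."""
    base, ext = os.path.splitext(name)
    candidate = os.path.join(out_dir, name)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(out_dir, "%s_%d%s" % (base, counter, ext))
        counter += 1
    return candidate


def consolidate(start_dir, out_dir="outdir",
                patterns=("*.txt", "*.doc", "*.pdf"), move=False):
    """Flatten all files below start_dir that match any of patterns into
    start_dir/out_dir without overwriting files that share a name.
    Returns the list of destination paths."""
    out_path = os.path.join(start_dir, out_dir)
    os.makedirs(out_path, exist_ok=True)
    collected = []
    for dirpath, subdirs, files in os.walk(start_dir):
        # Prune the output directory so collected files are not revisited.
        subdirs[:] = [d for d in subdirs
                      if os.path.join(dirpath, d) != out_path]
        for name in files:
            if any(fnmatch.fnmatch(name, p) for p in patterns):
                src = os.path.join(dirpath, name)
                dst = unique_target(out_path, name)
                if move:
                    shutil.move(src, dst)
                else:
                    shutil.copy2(src, dst)  # copy2 also keeps timestamps
                collected.append(dst)
    return collected
```

A call such as `consolidate("C:\\temp", patterns=("*.doc",), move=True)` would then move every .doc file into `C:\temp\outdir`, renaming duplicates instead of losing them.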