I have a folder that has hundreds of files in it. I have a list of files that I know need to be deleted, so I am trying to write code to figure out: which files in this list are in this folder, and which are not.

I am using the os module, and I know how to walk through all of the files in my folder using os.walk, but what I don't know is how to specify if the file is in my files_list.

So I want to check if the file name in my files_list is in "folder" and if it is, then append it to bad_list, and if it isn't then append it to good_list. This is what I have so far:

for root, dirs, files in os.walk(my_path):
    for file in files:
        if file in folder:
            bad_list.append(file)
        else:
            good_list.append(file)

My question is, how do I put in the "is in files_list" part of this? I assume it should go after the if file in folder part, to say something like "is in files_list" but I can't figure out exactly how to write that in the code.

I am new to Python, so apologies if this is very simple.

hapigolucki
  • Would it be possible to give a snippet of your file structure? Also, I'm not sure what `folder` means in your code. You didn't post where you declared it... – Reedinationer Feb 15 '19 at 21:10
  • Hi, sorry, I can't give a snip of that at the moment, but I actually think I just figured out my own problem: where it says folder, I think that should be files_list, if that makes sense to you. Sorry for wasting your time with this! – hapigolucki Feb 15 '19 at 21:12
  • What does `files_list` contain? Do you have full, absolute paths, relative paths (relative to what directory if so), or only the base filenames? – Martijn Pieters Feb 15 '19 at 21:14
  • `files_list` contains the base filename, so 'xyz.csv' and so forth. Then the folder I am pointing to has the same thing. So essentially I just want to match if a filename in the `files_list` is also in that folder. – hapigolucki Feb 15 '19 at 21:24
  • Then all you need to do is check if `file in files_list` is true. You may want to make `files_list` a *set* to make that test a lot faster. – Martijn Pieters Feb 15 '19 at 21:28
  • @MartijnPieters that makes sense. That is what I was missing. Would you recommend doing `if file not in files_list`? – hapigolucki Feb 15 '19 at 21:38
  • @hapigolucki: you are sorting files into good and bad lists, so it doesn't matter whether you use `if file in files_list` or `if file not in files_list`; you are only swapping the two blocks for the two cases. – Martijn Pieters Feb 15 '19 at 21:40
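
Putting the suggestion from these comments together, a minimal sketch might look like the following. The function name `sort_files` and the variable `to_delete` are hypothetical names chosen for illustration; the sketch assumes `files_list` holds base file names only (such as 'xyz.csv'), as described above.

```python
import os


def sort_files(my_path, files_list):
    """Split the files found under my_path into (bad_list, good_list)."""
    # Assumption: files_list holds base names only, e.g. 'xyz.csv'.
    to_delete = set(files_list)  # a set makes each `in` test fast
    bad_list, good_list = [], []
    for root, dirs, files in os.walk(my_path):
        for file in files:
            if file in to_delete:
                bad_list.append(file)   # listed for deletion
            else:
                good_list.append(file)  # not listed, so keep it
    return bad_list, good_list
```

Because `os.walk` also descends into subfolders, a listed name found in any subfolder ends up in `bad_list` as well.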

2 Answers

Why not just try to delete each file on your list from the folder and ignore any error raised when a file is not there?

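import os

files_to_remove = ['a.txt', 'b.txt']
folder_name = '/the_files_folder'
for file_to_remove in files_to_remove:
    try:
        # The list holds base names, so build the full path first.
        os.remove(os.path.join(folder_name, file_to_remove))
    except OSError:
        # The file is missing (or could not be removed); skip it.
        pass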
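
If you also want to know which listed files were actually in the folder, the same try/except idea can record the outcome. This is only a sketch: the function name `remove_listed` is hypothetical, and catching `FileNotFoundError` instead of the broader `OSError` is a variation on the code above, so other failures (for example, permission errors) would be raised rather than ignored.

```python
import os


def remove_listed(folder_name, files_to_remove):
    """Delete the listed files and return (removed, missing) name lists."""
    removed, missing = [], []
    for name in files_to_remove:
        try:
            os.remove(os.path.join(folder_name, name))
        except FileNotFoundError:
            # The listed file is not in the folder.
            missing.append(name)
        else:
            # os.remove succeeded, so the file was there.
            removed.append(name)
    return removed, missing
```

Note that this deletes files right away, so it may be worth trying it on a copy of the folder first.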
balderman

Use a set for membership testing.

Assuming folder is a list of file names with extensions (e.g. 'foo.txt'), convert it to a set, then use set methods to separate the files. Use os.path.join if you want to store complete paths in the good and bad lists.

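import os

badlist, goodlist = [], []
folder = set(folder)  # the names to delete, as a set for fast lookups
for root, dirs, files in os.walk(my_path):
    files = set(files)
    # Names in both sets are listed for deletion; store their full paths.
    # (For base names only: badlist.extend(files.intersection(folder)))
    for fname in files.intersection(folder):
        badlist.append(os.path.join(root, fname))
    # Names that are only on disk are kept.
    # (For base names only: goodlist.extend(files.difference(folder)))
    for fname in files.difference(folder):
        goodlist.append(os.path.join(root, fname))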
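
The question also asks which names on the list are not in the folder at all. Set difference can answer that from the other direction. This is a rough sketch, with `split_listed`, `wanted`, and `on_disk` as hypothetical names; it assumes the list holds base file names and that a match anywhere under `my_path` counts as found.

```python
import os


def split_listed(my_path, files_list):
    """Return (found, not_found): listed names present or absent on disk."""
    wanted = set(files_list)
    on_disk = set()
    for root, dirs, files in os.walk(my_path):
        on_disk.update(files)  # collect every base name under my_path
    found = sorted(wanted & on_disk)      # listed and present
    not_found = sorted(wanted - on_disk)  # listed but absent
    return found, not_found
```

The results are sorted only to make the output order predictable, since sets are unordered.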
wwii