Zip files are great for compressing data and are used in many different circumstances. If you’re building a Python script and need to create or decompress ZIP files, you’re in luck.
In this tutorial, you’re going to learn how to zip and unzip (decompress) zip files, all with Python!
Let’s get started!
If you’d like to follow along with this tutorial, be sure you have the following:
Creating a Zip File with Python
Let’s get started and first focus on zipping files with Python. This tutorial will begin with the simplest possible method and will build upon various techniques.
To zip a single file:
1. Open your favorite text editor.
2. If you’d like to follow along exactly with the tutorial, create a directory at ~/pythonzipdemo and download these BMP files into it. Once you do, you should have four bitmap files: all_black.bmp, all_blue.bmp, all_green.bmp, and all_red.bmp. You’ll also have a demo.py Python script that contains all of the code in this tutorial.
The tutorial files are bitmap files, which greatly benefit from compression. Not all types of files will compress as well.
3. Create a new Python script called demo.py. This tutorial will be saving the demo script in the home directory or ~/demo.py.
4. In your Python script, start coding. First, import the zipfile module. The zipfile module is a built-in Python module that contains all of the functions you’ll need to zip and unzip files with Python.
import zipfile # imports the zipfile module so you can make use of it
5. Next, instantiate a ZipFile object by calling zipfile.ZipFile(), opening the zip file in write mode (w). Then add a file to it and close the instance (close()).
## Instantiate a new zipfile object creating the single_file.zip archive.
zf = zipfile.ZipFile('single_file.zip', mode='w')
## Add a file to the archive
zf.write('all_black.bmp')
## Close the archive releasing it from memory
zf.close()
This tutorial relies on relative paths. The relative path looks for the files in the current directory, where Python is currently running. You can specify absolute paths instead when specifying file names.
Once you run the script, you should now see a file called single_file.zip in the folder. Congrats! You’ve zipped up your first file! But wait. Notice that it’s not any smaller. By default, the zip file is not compressed. To compress the files inside, you must perform one more step.
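Because ZipFile is a context manager, you can also use the with statement, which closes the archive for you when the block ends:
with zipfile.ZipFile('single_file.zip', 'w') as zf:
    zf.write('all_black.bmp')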
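The note above on relative and absolute paths can be illustrated with a small sketch. It assumes a temporary working directory and a stand-in file (both hypothetical, created only for this example) and uses the arcname parameter of write() so that the absolute path is not stored inside the archive.

```python
import tempfile
import zipfile
from pathlib import Path

# Hypothetical working directory for illustration; the tutorial uses ~/pythonzipdemo.
work_dir = Path(tempfile.mkdtemp())
source = work_dir / "all_black.bmp"
source.write_bytes(b"\x00" * 1024)  # stand-in data, not a real bitmap

archive = work_dir / "single_file.zip"
with zipfile.ZipFile(archive, mode="w") as zf:
    # source is an absolute path. arcname sets the name stored in the archive;
    # without it, the member name is derived from the path that was passed in.
    zf.write(source, arcname=source.name)

with zipfile.ZipFile(archive) as zf:
    stored_names = zf.namelist()

print(stored_names)
```

Passing arcname keeps the archive portable, since whoever extracts it does not inherit your directory layout.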
6. Finally, add a compression parameter when instantiating the ZipFile object. The zipfile module provides compression constants such as ZIP_STORED, ZIP_DEFLATED, ZIP_BZIP2, and ZIP_LZMA that dictate how Python compresses the files inside. By default, the compression is set to ZIP_STORED, meaning no compression.
To compress the files inside the archive, specify a compressing method such as ZIP_DEFLATED, as shown below.
import zipfile
## Instantiate a new zipfile object creating the single_file.zip archive and compressing it
zf = zipfile.ZipFile('single_file.zip', mode='w', compression=zipfile.ZIP_DEFLATED)
## Add a file to the archive
zf.write('all_black.bmp')
## Close the archive releasing it from memory
zf.close()
When you run this script, you should now see that the single_file.zip is much smaller than the actual bitmap file.
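To see the difference between ZIP_STORED and ZIP_DEFLATED for yourself, a rough comparison like the one below may help. It is only a sketch: the sample file is 100,000 zero bytes written to a temporary directory (an assumption standing in for the tutorial's bitmaps), and zipped_size is a hypothetical helper name.

```python
import os
import tempfile
import zipfile

# Hypothetical sample data: highly repetitive bytes, similar in spirit
# to a single-color bitmap, written to a temporary directory.
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "sample.bmp")
with open(source, "wb") as f:
    f.write(b"\x00" * 100_000)

def zipped_size(compression):
    """Zip the sample file with the given method and return the archive size."""
    archive = os.path.join(work_dir, f"test_{compression}.zip")
    with zipfile.ZipFile(archive, mode="w", compression=compression) as zf:
        zf.write(source, arcname="sample.bmp")
    return os.path.getsize(archive)

stored_size = zipped_size(zipfile.ZIP_STORED)
deflated_size = zipped_size(zipfile.ZIP_DEFLATED)

print("ZIP_STORED:  ", stored_size, "bytes")
print("ZIP_DEFLATED:", deflated_size, "bytes")
```

The stored archive is slightly larger than the source file because of the zip headers, while the deflated one should be a small fraction of it for data this repetitive.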
Applying Filters When Zipping Files
So far, every example has named the files to zip manually. When you have to deal with a large number of files, specifying each file manually is far from ideal.
Instead, you can select the files you want to operate on with a filter. A filter can be based on the file size or the file name, for example.
There are no built-in ways to filter files being added to a zip file. You’ll need to build your own solution. One way to build your own solution is to create a Python function. A Python function is a set of code you can execute as one. Building a function is a great way to “encapsulate” code into one single unit.
The function to build for this task will consist of roughly four separate phases:
- Open the zip file in write mode.
- Read all files in the source folder that you’d like to add to the zip file.
- Iterate over each file and check to see if it matches a particular string.
- If the file matches the string, add it to the zip file.
You can see this function below. This function is more complex than what you may be used to but it shows how flexible Python is, allowing you to create whatever you need.
The lambda keyword in Python is used to create small anonymous functions. Strings are not callable, so the example uses lambda to turn the check on name into a function that can be passed in as the filter.
# imports the zipfile module so you can make use of it
import zipfile
# for walking through directories
import os
# for stripping the directory portion from a file path
from os.path import basename

# zip the files from the given directory that match the filter
def filter_and_zip_files(destination_file, source_folder, filter):
    with zipfile.ZipFile(destination_file, mode="w", compression=zipfile.ZIP_DEFLATED) as new_zip:
        # Iterate over all the files in the folder
        for folder, subfolders, filenames in os.walk(source_folder):
            for filename in filenames:
                if filter(filename):
                    # create complete filepath of file in directory
                    file_path = os.path.join(folder, filename)
                    # Add file to zip
                    new_zip.write(file_path, basename(file_path))

# zip only the colors whose file names start with all_b into only_b_colors.zip
filter_and_zip_files("only_b_colors.zip", ".", lambda name: name.startswith("all_b"))

# list the contents of the new archive to confirm the result
with zipfile.ZipFile("only_b_colors.zip") as zf:
    print(zf.namelist())
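A filter can also be based on file size, as mentioned earlier. The sketch below is one possible variation, not the tutorial's own function: zip_files_matching is a hypothetical name, the filter receives the full file path rather than just the name so it can call os.path.getsize(), and the sample files are created in a temporary directory purely for illustration.

```python
import os
import tempfile
import zipfile

def zip_files_matching(destination_file, source_folder, path_filter):
    """Zip every file under source_folder for which path_filter(path) is true."""
    destination_abs = os.path.abspath(destination_file)
    with zipfile.ZipFile(destination_file, mode="w",
                         compression=zipfile.ZIP_DEFLATED) as new_zip:
        for folder, _subfolders, filenames in os.walk(source_folder):
            for filename in filenames:
                file_path = os.path.join(folder, filename)
                if os.path.abspath(file_path) == destination_abs:
                    continue  # never add the archive to itself
                if path_filter(file_path):
                    new_zip.write(file_path, os.path.basename(file_path))

# Hypothetical sample files of different sizes.
demo_dir = tempfile.mkdtemp()
for name, size in [("small.bmp", 10), ("large.bmp", 5000)]:
    with open(os.path.join(demo_dir, name), "wb") as f:
        f.write(b"\x00" * size)

# Keep only files larger than 1 KiB.
archive = os.path.join(demo_dir, "only_large.zip")
zip_files_matching(archive, demo_dir, lambda path: os.path.getsize(path) > 1024)

with zipfile.ZipFile(archive) as zf:
    large_names = zf.namelist()

print(large_names)
```

Passing the full path to the filter is a design choice: it lets the same function support filters on size, modification time, or name without changing its signature.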
Listing the Contents of a Zip File
Once you have created a zip file, you can also use Python to read the files inside. To do that, use the namelist() method. The namelist() method is a handy way to query all files in the zip file, returned as a list of file names.
As shown below, building off the previous example, invoke the namelist() method on the ZipFile object.
import zipfile
## Open the zip file for reading
zip_read = zipfile.ZipFile('single_file.zip', mode='r')
## Inspect the contents of single_file.zip
print(zip_read.namelist())
## Close the archive releasing it from memory
zip_read.close()
You’ll see that namelist() returns a list of each file in the zip file. At this time, you’ll only see one file (all_black.bmp) in the zip file.
If you’d like a more human-readable output of file names, use a Python for loop to read each file name and place a tab before each one, as shown below.
import zipfile
## Open the zip file for reading
zip_read = zipfile.ZipFile('single_file.zip', mode='r')
## Inspect the contents of single_file.zip
files = zip_read.namelist()
for file_name in files:
    # prints the name of files within, with a tab for better readability
    print("\t", file_name)
## Close the archive releasing it from memory
zip_read.close()
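If file names alone are not enough, the infolist() method returns a ZipInfo object per member, which also carries the original and compressed sizes. The sketch below builds a small in-memory archive with stand-in data (an assumption made so the example runs without the tutorial files) and prints those details.

```python
import io
import zipfile

# Build a small in-memory archive with stand-in data for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("all_black.bmp", b"\x00" * 10_000)

# infolist() returns one ZipInfo object per member of the archive.
with zipfile.ZipFile(buffer) as zf:
    details = [
        (info.filename, info.file_size, info.compress_size)
        for info in zf.infolist()
    ]

for name, original, compressed in details:
    print(f"{name}\t{original} bytes\t{compressed} bytes compressed")
```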
Adding Files to an Existing Zip File
So you’ve already got a zip file and would like to add some files to it. No problem! You’ll simply need to change the mode that you open the zip file in.
To add a file to an existing zip file, open the zip file in append mode (mode='a'), then invoke the write() method passing the file to add to the zip file, as shown below.
import zipfile
## Open the zip file for appending
zip_file = zipfile.ZipFile('single_file.zip', mode='a')
## Optionally see the files in the zip before the addition
print(zip_file.namelist())
## Add the file to the zip
zip_file.write('all_blue.bmp')
## Optionally see the files in the zip after the addition
print(zip_file.namelist())
## Close the archive releasing it from memory
zip_file.close()
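One thing to watch for in append mode is adding the same name twice, which leaves duplicate entries in the archive. A minimal guard might look like the sketch below; add_if_missing is a hypothetical helper, and the archive and its contents are stand-ins created in a temporary directory.

```python
import os
import tempfile
import zipfile

def add_if_missing(archive, arcname, data):
    """Append data under arcname unless the archive already has that member."""
    with zipfile.ZipFile(archive, mode="a") as zf:
        if arcname in zf.namelist():
            return False
        zf.writestr(arcname, data)
        return True

# Hypothetical starting archive with a single member.
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, "single_file.zip")
with zipfile.ZipFile(archive, mode="w") as zf:
    zf.writestr("all_black.bmp", b"\x00" * 100)

first_attempt = add_if_missing(archive, "all_blue.bmp", b"\x01" * 100)
second_attempt = add_if_missing(archive, "all_blue.bmp", b"\x01" * 100)

with zipfile.ZipFile(archive) as zf:
    names_after = zf.namelist()

print(first_attempt, second_attempt, names_after)
```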
Extracting the Contents of Zip Files with the zipfile Module
Let’s move on to extracting files from existing zip files. Using either the extract() or extractall() method, you can make it happen.
The example below opens the zip for reading (mode='r'), then extracts a single file (all_blue.bmp) from the zip to your home folder. It also demonstrates using the extractall() method to extract all files in the zip to your home folder.
import os
import zipfile
## Resolve the home directory; zipfile does not expand '~' on its own
home = os.path.expanduser('~')
## Open the zip file for reading
zip_file = zipfile.ZipFile('single_file.zip', mode='r')
## Extract a single file from the zip file to your home directory
zip_file.extract('all_blue.bmp', path=home)
## Extract all files from the zip file to your home directory
zip_file.extractall(path=home)
## Close the archive releasing it from memory
zip_file.close()
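Between extracting one file and extracting everything, extractall() also accepts a members argument that limits extraction to a chosen subset. The sketch below assumes a small stand-in archive and a temporary destination folder, both created only for this example.

```python
import os
import tempfile
import zipfile

# Hypothetical archive with three stand-in members.
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, "colors.zip")
with zipfile.ZipFile(archive, mode="w") as zf:
    for name in ("all_black.bmp", "all_blue.bmp", "all_red.bmp"):
        zf.writestr(name, b"\x00" * 64)

# Extract only two of the three members; the target folder is created if needed.
target = os.path.join(work_dir, "extracted")
with zipfile.ZipFile(archive) as zf:
    zf.extractall(path=target, members=["all_black.bmp", "all_blue.bmp"])

extracted = sorted(os.listdir(target))
print(extracted)
```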
Extracting the Contents of Zip Files with the shutil Module
As an alternative to using the zipfile module, you can also use another built-in Python module called shutil. shutil is a module that’s not specifically focused on zip files but more general file management. It just so happens to have some handy zip file methods to use.
To extract the contents of zip files with the shutil module, provide the zip file and path to extract all files to. Unlike the extract() method in the zipfile module, the unpack_archive() method of shutil does not allow you to pick files from the archive to extract.
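# imports the shutil module
import shutil
# for resolving the home directory; shutil does not expand '~' on its own
import os
## Extract all files in the single_file.zip file to your home folder.
shutil.unpack_archive('single_file.zip', os.path.expanduser('~'))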
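shutil can also go the other way: its make_archive() function zips an entire folder in one call. The round trip below is a sketch under assumed names (the bitmaps folder, the bitmaps_backup archive, and the restored folder are all hypothetical and live in a temporary directory).

```python
import os
import shutil
import tempfile

# Hypothetical source folder with one stand-in file.
work_dir = tempfile.mkdtemp()
source_dir = os.path.join(work_dir, "bitmaps")
os.makedirs(source_dir)
with open(os.path.join(source_dir, "all_black.bmp"), "wb") as f:
    f.write(b"\x00" * 256)

# make_archive appends the .zip extension itself and returns the archive path.
archive_path = shutil.make_archive(
    os.path.join(work_dir, "bitmaps_backup"), "zip", root_dir=source_dir
)

# Unpack the archive into a fresh folder to confirm the round trip.
restore_dir = os.path.join(work_dir, "restored")
shutil.unpack_archive(archive_path, restore_dir)
restored = os.listdir(restore_dir)

print(archive_path)
print(restored)
```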
Applying Filters When Unzipping Files
As covered in the zip filtering section, let’s now apply the same methodology (creating a function) to unzip files matching a specific filter. You can see the function example below. This function opens a zip file, reads the names of all files inside it, iterates over each name, and if that name matches the defined filter, passes it to the extract() method.
import zipfile

# unzip only files within the zip file that match the filter
def unzip_filtered_files(source_file, destination_folder, filter):
    with zipfile.ZipFile(source_file, "r") as source:
        # namelist() returns a list of the file names in the archive
        list_of_file_names = source.namelist()
        for file_name in list_of_file_names:
            # checks if the file matches the filter
            if filter(file_name):
                # Extract the files to destination_folder
                source.extract(file_name, path=destination_folder)

unzip_filtered_files("multiple_files.zip", "not_b_colors", lambda name: not name.startswith("all_b"))
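Filters do not have to be hand-written lambdas. As one possible variation, the sketch below builds filters from shell-style wildcard patterns with the standard fnmatch module. The pattern_filter helper is a hypothetical name, and the archive here is a stand-in created in a temporary directory rather than the tutorial's multiple_files.zip.

```python
import fnmatch
import os
import tempfile
import zipfile

# Hypothetical archive with four stand-in members.
work_dir = tempfile.mkdtemp()
archive = os.path.join(work_dir, "multiple_files.zip")
with zipfile.ZipFile(archive, mode="w") as zf:
    for name in ("all_black.bmp", "all_blue.bmp", "all_green.bmp", "all_red.bmp"):
        zf.writestr(name, b"\x00" * 64)

def pattern_filter(pattern):
    """Build a filter function from a shell-style wildcard pattern."""
    return lambda name: fnmatch.fnmatchcase(name, pattern)

# Extract only the members whose names match all_b*.bmp.
is_b_color = pattern_filter("all_b*.bmp")
destination = os.path.join(work_dir, "b_colors")
with zipfile.ZipFile(archive) as source:
    for file_name in source.namelist():
        if is_b_color(file_name):
            source.extract(file_name, path=destination)

b_colors = sorted(os.listdir(destination))
print(b_colors)
```

fnmatchcase() is used instead of fnmatch() so that matching is case-sensitive on every operating system.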
You should now have a good understanding of working with zip files in Python. Using both the zipfile and shutil modules, along with even building some Python functions, you should now know how to manage whatever zip file task is thrown at you.
How will you integrate this newfound knowledge into your next Python project?