Description
A common usage pattern for `zipfile` is to recursively compress an entire directory into a single .zip file. A common implementation of this pattern looks like this:

```python
#!/usr/bin/env python3
import zipfile, pathlib

rootpath = pathlib.Path('targetdir')
with zipfile.ZipFile('outputfile', 'w') as archive:
    for file_path in sorted(rootpath.rglob('*')):
        arcname = file_path.relative_to(rootpath)
        archive.write(file_path, arcname.as_posix())
```
However, if `outputfile` is a path that is a child of `targetdir`, the operation hangs: the `rglob` pass eventually picks up the archive itself, and `archive.write` attempts to write `outputfile` into `outputfile`. The write then continues indefinitely, until the filesystem runs out of space or the archive hits its maximum file size, as can be observed in this example:
```python
#!/usr/bin/env python3
import zipfile, pathlib

rootpath = pathlib.Path('./')
with zipfile.ZipFile('./foo.zip', 'w') as archive:
    for file_path in sorted(rootpath.rglob('*')):
        arcname = file_path.relative_to(rootpath)
        archive.write(file_path, arcname.as_posix())
```
Needless to say, this is hardly an intuitive error path, and it can make debugging difficult.
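Until such a check exists, user code can guard against this itself by comparing resolved paths and skipping the archive's own file. A minimal self-contained sketch (the directory and file names here are illustrative, not from the report):

```python
#!/usr/bin/env python3
import zipfile, pathlib

# Sample directory created purely for demonstration.
rootpath = pathlib.Path('targetdir')
rootpath.mkdir(exist_ok=True)
(rootpath / 'example.txt').write_text('sample contents\n')

# Deliberately place the archive inside the directory being zipped,
# the layout that would otherwise trigger the runaway write.
outpath = (rootpath / 'outputfile.zip').resolve()

with zipfile.ZipFile(outpath, 'w') as archive:
    for file_path in sorted(rootpath.rglob('*')):
        if file_path.resolve() == outpath:
            continue  # do not feed the growing archive back into itself
        arcname = file_path.relative_to(rootpath)
        archive.write(file_path, arcname.as_posix())
```

The `resolve()` comparison also catches the archive being reached through a symlink or a relative path spelled differently from `outpath`.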
Note that it is not only third-party libraries that allow this error to happen: neither `zipapp` nor `shutil.make_archive` includes a check that the output file is not a child of the target directory.
There are two ways I think this could be fixed:
- make `zipfile.ZipFile.write` check that the file backing the archive and `filename` do not refer to the same file, raising a `ValueError` if they do
- patch all users of `zipfile` to silently skip the output file when compressing.
I think the first, at the least, should be implemented, so that this situation produces an actual error message instead of hanging for however long it takes someone to notice a multi-GB zip file growing by the second. I'll be submitting a PR doing so shortly.