Description
Bug report
This bug happens in Objects/stringlib/fastsearch.h:589 while matching the last character. In some cases it causes a crash, but it is a bit hard to reproduce: for the crash to happen, the last byte of the searched buffer has to be the last byte of its memory page, and the next page has to be either not readable or not mapped contiguously after it.
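To make the failure mode concrete, here is a simplified Python model of the skip loop. It is only a sketch, not the code in fastsearch.h: `simulate_find` and its `read` helper are hypothetical names, an exact set stands in for the bloom mask, and the loop structure is inferred from the `ss[i+1]` lookahead on the reported line. It records the highest haystack index the loop touches.

```python
def simulate_find(haystack: bytes, needle: bytes):
    """Simplified model of the skip loop.

    Returns (match index or -1, highest haystack index read).
    """
    n, m = len(haystack), len(needle)
    w, mlast = n - m, m - 1
    needle_bytes = set(needle)  # exact set standing in for the bloom mask
    highest = -1

    def read(idx):
        nonlocal highest
        highest = max(highest, idx)
        # Out-of-range reads return 0 here; in C they touch memory
        # past the end of the buffer.
        return haystack[idx] if idx < n else 0

    i = 0
    while i <= w:
        if read(mlast + i) == needle[mlast]:  # ss[i] == last
            if haystack[i:i + m] == needle:
                return i, highest
            # Miss: look ahead at ss[i+1] (the real code has an extra
            # "gap" skip here, omitted in this model).
            if read(mlast + i + 1) not in needle_bytes:
                i += m
        else:
            # Skip: look ahead at ss[i+1], even when i == w.
            if read(mlast + i + 1) not in needle_bytes:
                i += m
        i += 1
    return -1, highest


print(simulate_find(bytes(8), b"fo"))  # (-1, 8): index 8 is one past the end
print(simulate_find(bytes(7), b"fo"))  # (-1, 5): stays in bounds
print(simulate_find(b"xxfo", b"fo"))   # (2, 3)
```

In this model the overrun only happens when the loop lands exactly on `i == n - m`. For an all-zero haystack and a 2-byte needle the loop advances in steps of 3, so that requires `n - 2` to be a multiple of 3, which holds for 8388608.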
The simplest script that reproduces the bug for me is:
import mmap

def bug():
    with open("file.tmp", "wb") as f:
        # this is the smallest size that triggers bug for me
        f.write(bytes(8388608))

    with open("file.tmp", "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as fm:
            with open("/proc/self/maps", "rt") as f:
                print(f.read())

            # this triggers bug
            res = fm.find(b"fo")

if __name__ == "__main__":
    bug()
But since the result of this script depends on the file system, the kernel, and perhaps even the phase of the moon 😄, here's a much more reliable way to reproduce it:
import mmap

def read_maps():
    with open("/proc/self/maps", "rt") as f:
        return f.read()

def bug():
    prev_map = frozenset(read_maps().split('\n'))
    new_map = None
    for i in range(0, 2049):
        # guard mmap
        with mmap.mmap(0, 4096 * (i + 1), flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS, prot=0) as guard:
            with mmap.mmap(0, 8388608 + 4096 * i, flags=mmap.MAP_ANONYMOUS | mmap.MAP_PRIVATE, prot=mmap.PROT_READ) as fm:
                new_map = frozenset(read_maps().split('\n'))
                for diff in new_map.difference(prev_map):
                    print(diff)
                prev_map = new_map
                # this crashes
                fm.find(b"fo")
                print("---")

if __name__ == "__main__":
    bug()
This triggers the bug in every Linux environment I've tried. It allocates an inaccessible (prot=0) guard mapping before the searched one to increase the chance that the out-of-bounds read lands on unreadable memory, and it uses no files, which makes it faster.
Here's some extra info from GDB:
Program received signal SIGSEGV, Segmentation fault.
0x000055555570ba81 in stringlib_default_find (s=0x7ffff6a00000 "", n=8388608, p=0x7ffff745a3e0 "fo", m=2, maxcount=-1, mode=1)
at Objects/stringlib/fastsearch.h:589
589 if (!STRINGLIB_BLOOM(mask, ss[i+1])) {
(gdb) pipe info proc mappings | grep -A 1 -B 1 file.tmp
0x555555cb4000 0x555555d66000 0xb2000 0x0 rw-p [heap]
0x7ffff6a00000 0x7ffff7200000 0x800000 0x0 r--s /home/slava/src/cpython/python_bug/file.tmp
0x7ffff7400000 0x7ffff7600000 0x200000 0x0 rw-p
(gdb) p &ss[i]
$1 = 0x7ffff71fffff ""
(gdb) p &ss[i + 1]
$2 = 0x7ffff7200000 <error: Cannot access memory at address 0x7ffff7200000>
(gdb) p i
$3 = 8388606
(gdb) p ss
$4 = 0x7ffff6a00001 ""
(gdb) p s
$5 = 0x7ffff6a00000 ""
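The addresses in the GDB session can be checked with a little arithmetic. The snippet below only uses values copied from the output above; the relation `ss = s + (m - 1)` is an assumption that happens to match the printed `ss` and `s`, and `mapping_end` is a name introduced here for the end address of the file.tmp mapping.

```python
# Values copied from the GDB session above.
s = 0x7ffff6a00000            # start of the haystack (and of the mapping)
n = 8388608                   # haystack length
m = 2                         # needle length, len(b"fo")
i = 8388606                   # loop index at the time of the crash
mapping_end = 0x7ffff7200000  # end address of the file.tmp mapping

ss = s + (m - 1)              # assumed relation; matches `p ss`
addr_ss_i = ss + i            # address of ss[i]
addr_ss_i1 = ss + i + 1       # address of ss[i + 1]

print(hex(ss))                    # 0x7ffff6a00001
print(hex(addr_ss_i))             # 0x7ffff71fffff, last byte of the mapping
print(hex(addr_ss_i1))            # 0x7ffff7200000, first byte past it
print(i == n - m)                 # True: this is the final loop iteration
print(addr_ss_i1 == s + n)        # True: the read is one byte past the buffer
print(addr_ss_i1 == mapping_end)  # True
```

So the faulting read is exactly one byte past the end of the buffer, on the last iteration of the loop.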
Your environment
- CPython 3.11.3
- OS: Linux 6.1 (but it should be OS independent)
I've also tried a slightly modified version of the script on OS X, and it crashes there as well.
cc @sweeneyde (since you are the author of d01dceb and 6ddb09f).