LLVM mis-optimization due to returns_twice function #17288

@maleadt

One of my packages (CUDAdrv) recently started failing on Julia master with a segfault in typemap.c. I've bisected the failure to e2bd129 (all backtraces and line numbers below refer to that commit's tree). I'm not sure where to start debugging, so I'm reporting it here in the meantime.

This shows up as a segfault during method lookup; outside a debugger the backtrace ends in sig_match_fast rather than jl_typemap_level_assoc_exact:

signal (11): Segmentation fault
while loading CUDAdrv/test/core.jl, in expression starting on line 172
sig_match_fast at src/gf.c:1707
jl_apply_generic at src/gf.c:1886
Type at CUDAdrv/src/module.jl:67
unknown function (ip: 0x7f6198ad561e)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
do_call at src/interpreter.c:65
eval at src/interpreter.c:188
eval_body at src/interpreter.c:469
eval_body at src/interpreter.c:515
jl_interpret_call at src/interpreter.c:573
jl_interpret_toplevel_thunk at src/interpreter.c:580
jl_toplevel_eval_flex at src/toplevel.c:543
jl_parse_eval_all at src/ast.c:700
jl_load at src/toplevel.c:566
jl_load_ at src/toplevel.c:575
include_from_node1 at ./loading.jl:426
unknown function (ip: 0x7f639ef3716c)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
do_call at src/interpreter.c:65
eval at src/interpreter.c:188
jl_interpret_toplevel_expr at src/interpreter.c:31
jl_toplevel_eval_flex at src/toplevel.c:529
jl_parse_eval_all at src/ast.c:700
jl_load at src/toplevel.c:566
jl_load_ at src/toplevel.c:575
include_from_node1 at ./loading.jl:426
unknown function (ip: 0x7f639ef3716c)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
process_options at ./client.jl:266
_start at ./client.jl:322
unknown function (ip: 0x7f639ef75124)
jl_call_method_internal at src/julia_internal.h:92
jl_apply_generic at src/gf.c:1931
jl_apply at ui/../src/julia.h:1396
true_main at ui/repl.c:546
main at ui/repl.c:674
unknown function (ip: 0x7f63a5013740)
unknown function (ip: 0x401818)
Allocations: 1748036 (Pool: 1746766; Other: 1270); GC: 4
Allocations: 1748036 (Pool: 1746766; Other: 1270); GC: 4

Running under GDB moves the segfault to jl_typemap_level_assoc_exact, but I assume the cause is the same (jl_typeof applied to a NULL pointer):

Thread 1 "julia" received signal SIGSEGV, Segmentation fault.
0x00007ffff76c9407 in jl_typemap_level_assoc_exact (cache=0x7ffdf1802950, args=0x7fffffffae90, n=3, offs=1 '\001') at src/typemap.c:788
788         jl_value_t *ty = (jl_value_t*)jl_typeof(a1);


(gdb) l
783 
784 jl_typemap_entry_t *jl_typemap_level_assoc_exact(jl_typemap_level_t *cache, jl_value_t **args, size_t n, int8_t offs)
785 {
786     if (n > offs) {
787         jl_value_t *a1 = args[offs];
788         jl_value_t *ty = (jl_value_t*)jl_typeof(a1);
789         assert(jl_is_datatype(ty));
790         if (ty == (jl_value_t*)jl_datatype_type && cache->targ != (void*)jl_nothing) {
791             union jl_typemap_t ml_or_cache = mtcache_hash_lookup(cache->targ, a1, 1, offs);
792             jl_typemap_entry_t *ml = jl_typemap_assoc_exact(ml_or_cache, args, n, offs+1);

(gdb) call jl_(args[0])
Base.#==()
(gdb) p args[1]
$3 = (jl_value_t *) 0x0
(gdb) call jl_(args[2])
CUDAdrv.CuError(code=209, info=Base.Nullable{String}(isnull=true, value=#<null>))

So the call is to == with a NULL first operand (args[1]) and the CuError with code 209 (CUDAdrv.ERROR_NO_BINARY_FOR_GPU) as the second. This comparison originates from:

try
    @apicall(:cuModuleLoadDataEx,
            (Ptr{CuModule_t}, Ptr{Cchar}, Cuint, Ref{CUjit_option}, Ref{Ptr{Void}}),
            handle_ref, data, length(optionKeys), optionKeys, optionValues)
catch err
    (err == ERROR_NO_BINARY_FOR_GPU || err == ERROR_INVALID_IMAGE) || rethrow(err)
    options = decode(optionKeys, optionValues)
    rethrow(CuError(err.code, options[ERROR_LOG_BUFFER]))
end
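The issue title points at a returns_twice function, and as far as I understand Julia's try/catch is lowered onto a setjmp-style call, so the catch block is entered through a second return from that call. The C sketch below is only an illustration of that hazard and is not Julia runtime code: fake_error_t, throwing_call, and run_guarded are hypothetical names, and the code 209 is borrowed from the GDB output above. It shows why a local that is written inside the "try" body and read in the "catch" body needs special care from the compiler.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for an exception object; not a Julia runtime type. */
typedef struct { int code; } fake_error_t;

static jmp_buf handler;                    /* where the "throw" jumps back to */
static fake_error_t the_error = { 209 };   /* code taken from the GDB output */

/* Plays the role of a throwing call inside the try body. */
static void throwing_call(void)
{
    longjmp(handler, 1);
}

/* Returns the error code seen in the "catch" branch,
 * or -1 if the pointer is NULL there. */
int run_guarded(void)
{
    /* volatile: err is written after setjmp() and read after longjmp().
     * Without the qualifier, C leaves its value indeterminate at that
     * point, and an optimizer may keep a stale copy in a register. */
    fake_error_t *volatile err = NULL;

    if (setjmp(handler) == 0) {   /* first return: the "try" body */
        err = &the_error;
        throwing_call();
        return 0;                 /* not reached */
    }

    /* second return: the "catch" body */
    return err != NULL ? err->code : -1;
}
```

Calling run_guarded() from a main function exercises both returns of setjmp. A NULL err in the catch branch of a sketch like this would correspond to the NULL args[1] seen above, although whether that is what actually happens here is an assumption on my part.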

I haven't been able to reduce the test case: reduced versions no longer trigger the segfault reliably on all my systems, while the full CUDAdrv test suite does. I've tested on two 64-bit Linux systems (one Debian 8, one Arch), with fresh builds without any Makefile flags.

@yuyichao any ideas what might be causing this, or where to look for clues?

Labels

bug, compiler:codegen, compiler:llvm, correctness bug ⚠, upstream
