Several tests FAIL on Solaris/sparcv9, where long double is 128 bits:

    Builtins-sparcv9-sunos :: addtf3_test.c
    Builtins-sparcv9-sunos :: divtf3_test.c
    Builtins-sparcv9-sunos :: extenddftf2_test.c
    Builtins-sparcv9-sunos :: extendsftf2_test.c
    Builtins-sparcv9-sunos :: floatditf_test.c
    Builtins-sparcv9-sunos :: floatsitf_test.c
    Builtins-sparcv9-sunos :: floattitf_test.c
    Builtins-sparcv9-sunos :: floatunditf_test.c
    Builtins-sparcv9-sunos :: floatunsitf_test.c
    Builtins-sparcv9-sunos :: floatuntitf_test.c
    Builtins-sparcv9-sunos :: multf3_test.c
    Builtins-sparcv9-sunos :: subtf3_test.c

E.g. addtf3_test.c FAILs with

    error in test__addtf3(36.40888825164657541977, 0.96444431369742592240) = 37.37333256534401470898, expected 37.37333256534400134216

The error doesn't happen in a 1-stage build with gcc or in a Debug build.

Via side-by-side debugging with addtf3.c.o compiled with clang -O vs. gcc -O (everything else from a regular 2-stage clang build), it turned out that both compilers produce the same result until the very end of __addtf3. The only difference is in the final fromRep call, which can be seen with this testcase:

    $ cat fr.c
    typedef long double fp_t;
    typedef __uint128_t rep_t;

    fp_t fromRep(rep_t x) {
      const union { fp_t f; rep_t i; } rep = {.i = x};
      return rep.f;
    }

gcc -m64 -O produces

    fromRep:
            add     %sp, -144, %sp
            stx     %o0, [%sp+2175]
            stx     %o1, [%sp+2183]
            ldd     [%sp+2175], %f0
            ldd     [%sp+2183], %f2
            jmp     %o7+8
             add    %sp, 144, %sp

while clang yields

    fromRep:                                ! @fromRep
    ! %bb.0:                                ! %entry
            save %sp, -144, %sp
            add %fp, 2031, %i2
            or %i2, 8, %i2
            stx %i0, [%fp+2031]
            ldd [%fp+2031], %f0
            ldd [%i2], %f2
            stx %i1, [%i2]
            ret
            restore

The long double return value is supposed to be in %f0 and %f2. gcc handles this just fine, and clang gets it right for %f0, too. For %f2, however, clang loads from a stack slot that has not been written yet (ldd [%i2], %f2) and only afterwards stores the second half of the argument (%i1) to that slot, so %f2 ends up with uninitialized stack contents.
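To check the fromRep result at run time rather than by reading the assembly, something like the following could be used. This is only a sketch: from_rep_matches is a hypothetical helper (not part of compiler-rt), and the noinline attribute is my addition to keep the optimizer from folding the call away. It assumes a 64-bit target with __uint128_t and sizeof(long double) <= 16.

```c
#include <string.h>

typedef long double fp_t;
typedef __uint128_t rep_t;

/* Same body as fr.c; noinline (an addition for this sketch) keeps the
 * call from being folded into the caller. */
__attribute__((noinline)) fp_t fromRep(rep_t x) {
  const union { fp_t f; rep_t i; } rep = {.i = x};
  return rep.f;
}

/* Hypothetical helper: copy the object representation of v into a rep_t,
 * pass it through fromRep, and compare the result with v.
 * Returns 1 on a match, 0 otherwise. Not meaningful for NaN inputs,
 * since NaN never compares equal to itself. */
int from_rep_matches(fp_t v) {
  rep_t bits = 0;
  size_t n = sizeof v < sizeof bits ? sizeof v : sizeof bits;
  memcpy(&bits, &v, n);
  return fromRep(bits) == v;
}
```

Called from a small main() with the operands of the failing test, e.g. from_rep_matches(36.40888825164657541977L), I'd expect this to return 0 with the clang -O code shown above whenever the stale stack slot differs from the low half of the argument, and 1 with the gcc code.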
I don't have the slightest idea how to fix this codegen bug, but I have a workaround patch (to be posted for reference shortly) that wraps the affected functions in #pragma clang optimize off/on (nothing more than a hack to show that this fixes all the failures above).
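For illustration, the shape of such a workaround on the minimal testcase might look like this. It is a sketch of the idea only, not the patch itself; the __clang__ guard and the round_trips helper are assumptions added here, and the helper is hypothetical. It again assumes a 64-bit target with __uint128_t.

```c
#include <string.h>

typedef long double fp_t;
typedef __uint128_t rep_t;

/* Disable optimization for the affected function only. The pragma is
 * clang-specific, so it is guarded to keep other compilers quiet. */
#if defined(__clang__)
#pragma clang optimize off
#endif

fp_t fromRep(rep_t x) {
  const union { fp_t f; rep_t i; } rep = {.i = x};
  return rep.f;
}

#if defined(__clang__)
#pragma clang optimize on
#endif

/* Hypothetical check: feed the bits of v through fromRep and compare.
 * Returns 1 if the value survives, 0 otherwise (not for NaN inputs). */
int round_trips(fp_t v) {
  rep_t bits = 0;
  size_t n = sizeof v < sizeof bits ? sizeof v : sizeof bits;
  memcpy(&bits, &v, n);
  return fromRep(bits) == v;
}
```

With the pragma in place, fromRep is compiled as if at -O0 even in an optimized build, which would sidestep the scheduling problem at the cost of slower code for that function.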
FWIW this is not clang being miscompiled: I've tried all of

* stage 1-clang from a Release build with gcc
* stage 2-clang from the same build
* stage 2-clang from a Debug build

and they generate the same wrong code at -O and above.
FWIW a bisect identified

    BISECT: running pass (64) Machine Instruction Scheduler on function (fromRep)

as the culprit (on a minimal testcase, not yet confirmed on the real code).