Hello,
I'm from the Go Platform team at Uber, and we've been running into what appears to be a linker bug in macOS/M1 while trying to upgrade to Go 1.20.
What version of Go are you using (go version)?
This reproduces on all 1.20 point releases through 1.20.2.
$ go version 1.20.2
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
The issue only manifests on M1 Macs.
We are in a Bazel sandbox environment, using rules_go.
go env output:
$ go env
GO111MODULE="off"
GOARCH="arm64"
GOBIN="/Users/sungyoon/go/bin/"
GOCACHE="/Users/sungyoon/Library/Caches/go-build"
GOENV="/Users/sungyoon/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/sungyoon/go-code/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/sungyoon/go-code"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/private/var/tmp/_bazel_sungyoon/ac08da491286b6bc27a8f65f3e5696d3/external/go_sdk"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/private/var/tmp/_bazel_sungyoon/ac08da491286b6bc27a8f65f3e5696d3/external/go_sdk/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.20.2"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/zl/5_l54cc5231fssrrq3thj6z40000gn/T/go-build2387064609=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
We received reports that some tests in our Go monorepo fail only on M1 after upgrading to Go 1.20.
The panic trace depends on the failing targets, but all of them panic in some form during init.
Invalid return address:
runtime: g 15: unexpected return pc for runtime.callers called from 0x0
stack: frame={sp:0x14001dd3e80, fp:0x14001dd3ef0} stack=[0x14001dd2000,0x14001dd4000)
0x0000014001dd3d80: 0x00000140004216f0 0x0000000104d6f903 <testing.tRunner+0x0000000000000033>
0x0000014001dd3d90: 0x00000001162e2560 0x00000001121c655b
0x0000014001dd3da0: 0x000000000000000f 0x000000011485543f
0x0000014001dd3db0: 0x000000000000001d 0x000000000000059a
0x0000014001dd3dc0: 0x0000000000000599 0x0000000104d6f8d0 <testing.tRunner+0x0000000000000000>
0x0000014001dd3dd0: 0x00000001162e2560 0x00000001183881a0
0x0000014001dd3de0: 0x0000000104d6f903 <testing.tRunner+0x0000000000000033> 0x00000001162e2560
0x0000014001dd3df0: 0x00000001121c655b 0x000000000000000f
0x0000014001dd3e00: 0x0000000000000000 0x000000010d318ef5
0x0000014001dd3e10: 0x000000000000002a 0x000000010d2d9395
0x0000014001dd3e20: 0x0000000000000026 0x000000010d44460a
0x0000014001dd3e30: 0x0000000000000036 0x000000010d22f3ca
0x0000014001dd3e40: 0x000000000000001b 0x000000010d3d07d4
0x0000014001dd3e50: 0x0000000000000032 0x000000010d2d936f
0x0000014001dd3e60: 0x0000000000000026 0x000000010d2906eb
0x0000014001dd3e70: 0x0000000000000021 0x0000000000000000
runtime.callers(0x0?, {0x0?, 0x0?, 0x0?})
GOROOT/src/runtime/traceback.go:843 +0x78 fp=0x14001dd3ef0 sp=0x14001dd3e80 pc=0x104c09988
created by testing.(*T).Run
GOROOT/src/testing/testing.go:1629 +0x36c
Panics from callee:
case 1:
goroutine 1 [running]:
reflect.Value.lenNonSlice({0x10c2a5b30?, 0x140091352c0?, 0x140091c3080?})
GOROOT/src/reflect/value.go:1714 +0x160
github.com/go-playground/validator/v10.init.0()
external/com_github_go_playground_validator_v10/postcode_regexes.go:170 +0x38
case 2:
panic: reflect: call of reflect.Value.Uint on kind31 Value
goroutine 1 [running]:
reflect.Value.Uint({0x0?, 0x0?, 0x111bd4680?})
GOROOT/src/reflect/value.go:2666 +0xfc
github.com/shopspring/decimal.newFromFloat(0x3de5d8fd1fd19ccd, 0x3de5d8fd1fd19ccd, 0x1160395a0)
external/com_github_shopspring_decimal/decimal.go:302 +0x154
github.com/shopspring/decimal.NewFromFloat(0x3de5d8fd1fd19ccd)
external/com_github_shopspring_decimal/decimal.go:262 +0x7c
github.com/shopspring/decimal.init()
external/com_github_shopspring_decimal/decimal.go:1735 +0x23c
Looking through the disassembly of the problematic targets, we see calls to runtime.duffzero being linked to seemingly arbitrary functions instead. If the function that actually gets called panics on its invalid arguments, we get the callee panics shown above. In other cases, the function that gets called expects a different stack frame size from the one the caller set up, and the runtime panics on an invalid return pc.
Below is part of the disassembled init func of one of our monorepo dependencies (github.com/shopspring/decimal):
0x5ba0e a93eeffd STP (R29, R27), -24(RSP)
0x5ba12 d10063fd SUB $24, RSP, R29
0x5ba16 94000000 CALL 0(PC) [0:4]R_CALLARM64:runtime.duffzero<1>+52
0x5ba1a d10023fd SUB $8, RSP, R29
0x5ba1e f901dfff MOVD ZR, 952(RSP)
0x5ba22 f94023e1 MOVD 64(RSP), R1
0x5ba26 910223e0 ADD $136, RSP, R0
0x5ba2a 94000000 CALL 0(PC) [0:4]R_CALLARM64:github.com/shopspring/decimal.(*decimal).Assign
0x5ba2e f94247e2 MOVD 1160(RSP), R2
This is what we see in the intermediate archive file generated by the compile step, before it is linked.
In the final binary, however, the linker has somehow resolved the call to runtime.duffzero in the same init function to reflect.Value.Float.island:
0x107c89d38 a93eeffd STP (R29, R27), -24(RSP)
0x107c89d3c d10063fd SUB $24, RSP, R29
0x107c89d40 97fded21 CALL reflect.Value.Float.island(SB) // ???
0x107c89d44 d10023fd SUB $8, RSP, R29
0x107c89d48 f901dfff MOVD ZR, 952(RSP)
0x107c89d4c f94023e1 MOVD 64(RSP), R1
0x107c89d50 910223e0 ADD $136, RSP, R0
0x107c89d54 9400022f CALL github.com/shopspring/decimal.(*decimal).Assign(SB)
0x107c89d58 f94247e2 MOVD 1160(RSP), R2
A similar issue occurs with runtime.duffcopy in another target:
Pre-linking:
0xe6364 90000014 ADRP 0(PC), R20 [0:8]R_ADDRARM64:gopkg.in/yaml%2ev3..stmp_41<1>
0xe6368 91000294 ADD $0, R20, R20
0xe636c 1000009b ADR 16(PC), R27
0xe6370 a93eeffd STP (R29, R27), -24(RSP)
0xe6374 d10063fd SUB $24, RSP, R29
0xe6378 94000000 CALL 0(PC) [0:4]R_CALLARM64:runtime.duffcopy<1>+288
0xe637c d10023fd SUB $8, RSP, R29
0xe6380 910c63e0 ADD $792, RSP, R0
0xe6384 f900bfe0 MOVD R0, 376(RSP)
0xe6388 3980001b MOVB (R0), R27
0xe638c 90000000 ADRP 0(PC), R0 [0:8]R_ADDRARM64:type:bool
Post-linking:
0x107cb2884 d004f174 ADRP 165863424(PC), R20
0x107cb2888 91304294 ADD $3088, R20, R20
0x107cb288c 1000009b ADR 16(PC), R27
0x107cb2890 a93eeffd STP (R29, R27), -24(RSP)
0x107cb2894 d10063fd SUB $24, RSP, R29
0x107cb2898 97fd49a8 CALL reflect.New.island(SB) // ???
0x107cb289c d10023fd SUB $8, RSP, R29
0x107cb28a0 910c63e0 ADD $792, RSP, R0
0x107cb28a4 f900bfe0 MOVD R0, 376(RSP)
0x107cb28a8 3980001b MOVB (R0), R27
0x107cb28ac 90043da0 ADRP 142295040(PC), R0
This issue does not occur in every binary that uses these dependencies, only in some of them. It is also worth noting that when we change the binary layout by disabling inlining (-l) or all optimizations (-N) via gcflags, the issue goes away for the affected targets but starts appearing in other targets that passed with optimizations enabled.
This issue does not occur in any of our other environments (Linux amd64 or darwin amd64).
What did you expect to see?
The linker resolves each call to its intended target and produces a working binary.
What did you see instead?
Panics as described above.