# [DRAFT] parallel rustc shipping strategy

The serialness/parallelness of the compiler is dictated by code in a small number of places, such as `compiler/rustc_data_structures/src/sync.rs`, which defines types like `Lock` and operations like `par_iter`.

Currently there are two code paths for these places. The code path to use is selected at rustc build time via `cfg(parallel_compiler)`. E.g. `Lock` is a wrapper around `RefCell` if `!cfg(parallel_compiler)`, or a wrapper around `parking_lot::Mutex` if `cfg(parallel_compiler)`.

The end goal is to reduce these places to a single parallel code path, and for the compiler to be multithreaded. (The number of threads to use is an open question, but any default greater than 1 that gives better performance than the serial compiler would constitute success.)

## Evolution

The simplest approach would be to remove the serial code paths and ship the parallel compiler with multithreading on by default in a single step. But that isn't reasonable due to the complexity and risk of the changes.

We will need to evolve the serial/parallel/multithreading structure over a number of releases: serial (current), then parallel but single-threaded by default (`-Zthreads=1`), and finally parallel and multithreaded by default. This lets us pause or even go backwards if major problems occur.

There are two main paths possible.

### Shorter path

Here is one possible path. In the following, "\||c" is a compile-time choice between code paths, "\||r" is a runtime choice between code paths, and square brackets indicate defaults.

1. [**serial**] \||c **parallel[`-Zthreads=1`]**

   This is the current state. We have serial code and parallel code. The choice is made via `parallel-compiler` in `config.toml`. When using the parallel code, the default number of threads is 1.

1. **serial** \||c [**parallel[`-Zthreads=1`]**]

   If the single-threaded parallel paths are not too slow, we can switch to them by default once they are sufficiently reliable.

   Importantly, at this point we are shipping a parallel compiler! This means users can try out a multithreaded parallel compiler with `-Zthreads`, which will give us useful data about performance and reliability.

1. **serial** \||c [**parallel[`-Zthreads=2+`]**]

   Once performance and reliability are good enough, multithreading can be made the default.

1. **parallel[`-Zthreads=2+`]**

   Finally, the **serial** code paths can be removed.

Steps 3 and 4 could be switched, or even combined.

### Longer path

If the single-threaded parallel paths are too slow, a longer path will be required, taking more work. This is the path that @SparrowLii is currently pursuing.

1. [**serial**] \||c **parallel[`-Zthreads=1`]**

   This is the current state, as above.

1. [**serial**] \||c (**parallel-single-threaded** \||r **parallel-multithreaded**)

   This step introduces a new, temporary set of code paths on the parallel side of the build-time choice. The synchronization forms, etc., used when `cfg(parallel_compiler)` is true are now chosen at runtime, depending on the value of `-Zthreads`. (XXX: requires #109776, DynSend).

   For example, the parallel-single-threaded paths would still use `RefCell` for `Lock`, which would make their speed closer to that of the serial code.

   The runtime selectability does have a non-zero performance cost, but hopefully it will be small.

1. **serial** \||c ([**parallel-single-threaded**] \||r **parallel-multithreaded**)

   Once the parallel-single-threaded code paths are reliable enough, we can switch the default to them.

   Importantly, at this point we are shipping a parallel compiler!

1. **serial** \||c (**parallel-single-threaded** \||r [**parallel-multithreaded**])

   Once performance is good enough, the default number of threads can increase to 2 or more, switching to the parallel-multithreaded code paths with full-strength synchronization.

1. **parallel[`-Zthreads=2+`]**

   Finally, the serial and parallel-single-threaded code paths can be removed, leaving only parallel code paths and a default number of threads greater than 1. (Users can still choose to use `-Zthreads=1`, but they will get full-strength synchronization.) Removing the runtime selectability will slightly speed up the multithreaded case.

Similar to the shorter path, some later steps can be reordered or combined.
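
To make the "\||r" idea in the longer path more concrete, here is a minimal sketch of a lock whose synchronization strategy is picked at runtime from the thread count. It is not the code in `sync.rs`: the names `Mode`, `Mode::from_threads`, `with_lock` and `is_multithreaded` are hypothetical, `std::sync::Mutex` stands in for `parking_lot::Mutex` so that the example needs no external crates, and the `Send`/`Sync` question (the `DynSend` work mentioned above) is sidestepped entirely.

```rust
use std::cell::RefCell;
use std::sync::Mutex;

/// Hypothetical runtime mode, derived from a `-Zthreads`-like value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    SingleThreaded,
    MultiThreaded,
}

impl Mode {
    /// Assumption for this sketch: more than one thread selects the
    /// multithreaded code path, anything else the single-threaded one.
    pub fn from_threads(threads: usize) -> Mode {
        if threads > 1 {
            Mode::MultiThreaded
        } else {
            Mode::SingleThreaded
        }
    }
}

/// A lock that uses cheap `RefCell` borrow checking in single-threaded
/// mode and a real mutex in multithreaded mode.
pub enum Lock<T> {
    Single(RefCell<T>),
    Multi(Mutex<T>),
}

impl<T> Lock<T> {
    /// The variant is fixed once, when the lock is created.
    pub fn new(mode: Mode, value: T) -> Self {
        match mode {
            Mode::SingleThreaded => Lock::Single(RefCell::new(value)),
            Mode::MultiThreaded => Lock::Multi(Mutex::new(value)),
        }
    }

    /// Runs `f` with exclusive access to the protected value.
    /// The `match` is the runtime selection cost paid on every access.
    pub fn with_lock<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        match self {
            Lock::Single(cell) => {
                let mut guard = cell.borrow_mut();
                f(&mut *guard)
            }
            Lock::Multi(mutex) => {
                let mut guard = mutex.lock().unwrap();
                f(&mut *guard)
            }
        }
    }

    /// Reports which code path this lock is on.
    pub fn is_multithreaded(&self) -> bool {
        matches!(self, Lock::Multi(_))
    }
}
```

With this shape, `Lock::new(Mode::from_threads(1), value)` stays on the `RefCell` path, while any thread count of 2 or more pays for the mutex. The branch inside `with_lock` is the kind of small per-access overhead that the final step would remove by deleting the single-threaded variant.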