Description
Proposal
cc @rust-lang/project-portable-simd
Summary
- Set an expectation that `simd_*` intrinsics from `extern "platform-intrinsic"` will form the basis of backend-agnostic intrinsics within the standard library.
- Fill in any gaps in the existing `simd_*` intrinsics so that `core::simd` can avoid any direct LLVM dependency.
- Provide non-vectorized reference implementations for all `simd_*` intrinsics that are exposed through `compiler_builtins`.
Implementation of this MCP will be coordinated between the Portable SIMD project and Compiler team for Cranelift and LLVM implementations.
Motivations
The user-facing goal of the `core::simd` API being developed by the @rust-lang/project-portable-simd group is to let users take advantage of SIMD intrinsics in a way that's automatically portable across hardware. Realizing that goal requires special backend support so that `core::simd` can utilize existing intrinsics rather than having to target ISAs specifically. Since `rustc` has multiple compiler backends (LLVM and Cranelift), the `core::simd` implementation also needs to be portable across compiler backends in order to be portable across hardware in the way we want.
We currently have a best-effort approach to backend-agnostic SIMD intrinsics through the `extern "platform-intrinsic"` mechanism. There are a number of existing platform intrinsics, like `simd_add` and `simd_mul`, that are used throughout `core::arch`, but there are also many ISA idiosyncrasies that aren't reasonable to create platform intrinsics for, so `core::arch` also uses a mix of LLVM intrinsics and inline assembly. The story for `core::simd` is a bit different. Its goal is specifically to expose standard lowest-common-denominator functionality that's consistent across all supported ISAs. That makes `core::simd` a great fit for platform intrinsics, and `core::simd` would like to use them exclusively, rather than LLVM intrinsics.
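For context, a platform intrinsic is declared as a generic signature in an `extern "platform-intrinsic"` block. This is nightly-only syntax, shown here only as an illustrative sketch of the mechanism, not as a stable API:

```rust
// Nightly-only sketch: requires `#![feature(platform_intrinsics)]` and a
// `#[repr(simd)]` vector type at the call site.
extern "platform-intrinsic" {
    // Generic over any `#[repr(simd)]` type `T`; the backend decides how to
    // lower the call for the concrete vector shape.
    fn simd_add<T>(x: T, y: T) -> T;
}
```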
We can think of `core::simd` almost like a thin, user-friendly wrapper over platform intrinsics. Since it's going to rely on these intrinsics and make promises about their behavior across platforms, we should write reference implementations of them all in plain Rust. These reference implementations will both document the conformant behavior of these intrinsics and give compiler backends a way to support `core::simd` without necessarily having to generate vectorized code for all platforms.
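As a standalone sketch, such a reference implementation can be ordinary, non-vectorized Rust. The function name follows the `simd_fallback_add_i64` pattern used later in this proposal, and the wrapping-on-overflow behavior is an assumption for illustration, not decided semantics:

```rust
/// Illustrative plain-Rust reference implementation of an addition intrinsic
/// for `i64` lanes. Overflow behavior is part of the documented contract;
/// here it is assumed to wrap.
pub fn simd_fallback_add_i64(a: &[i64], b: &[i64], r: &mut [i64]) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), r.len());
    for i in 0..r.len() {
        // Lane-wise addition, one lane per loop iteration.
        r[i] = a[i].wrapping_add(b[i]);
    }
}
```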
Implementation
The overall picture of how Rust's portable SIMD story will fit together covers a few areas of the compiler and libraries:
- Libraries like `core::simd` call platform intrinsics through `extern "platform-intrinsic"`. These are generic functions that are backend-agnostic. They work with types that are `#[repr(simd)]`.
- Compiler backends like LLVM or Cranelift receive these generic platform intrinsics and can either emit appropriate ISA-specific vector instructions, or emit a function call to a generic fallback implementation.
- The fallback implementations that backends can call are monomorphic intrinsics from `compiler_builtins`. They're named so that backends can easily find them given the platform intrinsic and the shape of the inputs. Instead of working with some generic `#[repr(simd)] struct T`, they work with `[T; N]`s.
- While the fallback implementations are monomorphic so they can live in `compiler_builtins`, they can be backed by a single generic implementation to make them easier to write.
To get a better idea of how this is expected to hang together, let's consider a working example of a conceptual `core::simd` and Rust compiler. This example implements addition for `i64x4`, a vector type with 4 `i64` lanes:
```rust
#![feature(core_intrinsics, min_specialization, min_const_generics)]
#![allow(non_camel_case_types)]

#[test]
fn i64x4_add() {
    use crate::simd::i64x4;

    let a = i64x4([1, 2, 3, 4]);
    let b = i64x4([4, 3, 2, 1]);

    assert_eq!(i64x4([5, 5, 5, 5]), a + b);
}

pub mod simd {
    /*!
    An external crate, `core::simd`.

    It wants to call platform intrinsics instead of direct backend intrinsics.
    */

    use std::ops::Add;

    use crate::compiler::{ReprSimd, platform_intrinsics};

    /**
    A vector type with 4 `i64` lanes.
    */
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct i64x4(pub [i64; 4]);

    /**
    Our vector type is `#[repr(simd)]`.
    */
    impl ReprSimd for i64x4 {
        type Type = i64;
        const SIZE: usize = 4;

        fn into_fields(self) -> Vec<Self::Type> {
            vec![self.0[0], self.0[1], self.0[2], self.0[3]]
        }

        fn from_fields(fields: Vec<Self::Type>) -> Self {
            i64x4([fields[0], fields[1], fields[2], fields[3]])
        }
    }

    /**
    We can add two of our vectors together using the `simd_add` platform intrinsic.

    Hopefully it'll generate vectorized code! But it's ok if it doesn't.
    */
    impl Add for i64x4 {
        type Output = Self;

        fn add(self, rhs: Self) -> Self::Output {
            platform_intrinsics::simd_add(self, rhs)
        }
    }
}

pub mod compiler {
    /*!
    Our Rust compiler.

    It contains a few private child modules to demonstrate where each piece lives
    and how they interact to expose the single `platform-intrinsics` API.
    */

    mod builtins {
        /*!
        The compiler builtins.

        We expose manually monomorphized SIMD fallback intrinsics here that
        codegen backends can call into if they don't support generating
        specialized code for a SIMD platform intrinsic yet.
        */

        use std::{
            intrinsics,
            ops::Add,
        };

        pub fn simd_fallback_add_i64(a: &[i64], b: &[i64], r: &mut [i64]) {
            i64::simd_add(a, b, r)
        }

        pub fn simd_fallback_add_f64(a: &[f64], b: &[f64], r: &mut [f64]) {
            f64::simd_add(a, b, r)
        }

        /**
        Let's model our `simd_add` operation as a trait so we can specialize.

        It doesn't really matter how we do this, so long as it's readable.
        */
        trait SimdAdd {
            type Output;

            fn simd_add(a: &[Self], b: &[Self], r: &mut [Self::Output])
            where
                Self: Sized;
        }

        impl<T> SimdAdd for T
        where
            T: Add + Copy,
        {
            type Output = T;

            default fn simd_add(a: &[Self], b: &[Self], r: &mut [Self::Output]) {
                for i in 0..r.len() {
                    // behavior on overflow is one of the edge cases the fallback should specify
                    r[i] = intrinsics::wrapping_add(a[i], b[i]);
                }
            }
        }

        impl SimdAdd for f64 {
            fn simd_add(a: &[Self], b: &[Self], r: &mut [Self]) {
                for i in 0..r.len() {
                    // we might want to treat floats specially to guarantee some behavior for NaN or Infinity
                    r[i] = unsafe { intrinsics::fadd_fast(a[i], b[i]) };
                }
            }
        }
    }

    mod codegen_support {
        /*!
        Some backend-agnostic helpers.
        */

        use std::{
            mem,
            any::type_name,
        };

        use super::builtins;

        /**
        Find a builtin fallback function by operation and type.

        This is what backends would do if they can't generate specific code
        for a given platform intrinsic.
        */
        pub fn find_simd_fallback<T>(op: &str) -> fn(&[T], &[T], &mut [T]) {
            match (op, type_name::<T>()) {
                ("simd_add", "i64") => unsafe {
                    mem::transmute::<
                        fn(&[i64], &[i64], &mut [i64]),
                        fn(&[T], &[T], &mut [T]),
                    >(builtins::simd_fallback_add_i64)
                },
                ("simd_add", "f64") => unsafe {
                    mem::transmute::<
                        fn(&[f64], &[f64], &mut [f64]),
                        fn(&[T], &[T], &mut [T]),
                    >(builtins::simd_fallback_add_f64)
                },
                (op, ty) => unimplemented!("{} is unsupported for {}", op, ty),
            }
        }
    }

    mod codegen_cranelift {
        /*!
        An example of the code a backend might use to implement platform intrinsics.
        */

        use super::ReprSimd;
        use super::codegen_support;

        pub fn simd_add<T>(a: T, b: T) -> T
        where
            T: ReprSimd,
        {
            // We don't generate specific code for `simd_add`, so use a fallback
            // to find the appropriate function to call
            simd_fallback("simd_add", a, b)
        }

        fn simd_fallback<T>(op: &str, a: T, b: T) -> T
        where
            T: ReprSimd,
        {
            let a = a.into_fields();
            let b = b.into_fields();
            let mut r = vec![T::Type::default(); T::SIZE];

            assert_eq!(T::SIZE, a.len());
            assert_eq!(T::SIZE, b.len());

            (codegen_support::find_simd_fallback::<T::Type>(op))(&*a, &*b, &mut *r);

            T::from_fields(r)
        }
    }

    pub mod platform_intrinsics {
        /*!
        Let's expose our backend's `simd_add` as a platform intrinsic.

        This is the 'public' API that our `simd` consumer can call.
        */

        pub use super::codegen_cranelift::simd_add;
    }

    /**
    A codeified representation of the requirements we expect `#[repr(simd)]`
    types to satisfy.

    We're using `Vec` here instead of `[T; N]` just to make things compile
    for this example.
    */
    pub trait ReprSimd {
        type Type: Default + Copy;
        const SIZE: usize;

        fn into_fields(self) -> Vec<Self::Type>;
        fn from_fields(fields: Vec<Self::Type>) -> Self;
    }
}
```
At the top level, in `core::simd`, we call into generic platform intrinsics. In our compiler backend, we have one of two options: emit ISA-specific vectorized code for `simd_add` (such as a call to x86's `_mm256_add_epi64`), or emit a call to the fallback implementation from `compiler_builtins`. The `#[repr(simd)]` attribute ensures we can convert to and from arrays for the fallback implementation. Given a `#[repr(simd)]` struct with `N` fields of type `T`, we can read them into a `[T; N]` and then write them from a `[T; N]` back into the `#[repr(simd)]` struct.
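That conversion can be sketched in plain Rust. Here `#[repr(C)]` stands in for `#[repr(simd)]` (which requires nightly), and the transmutes rest on the layout assumption this proposal relies on: a vector struct with `N` fields of type `T` has the same layout as `[T; N]`:

```rust
use std::mem;

// Stand-in for a `#[repr(simd)]` vector type; `#[repr(C)]` gives the same
// "4 consecutive i64 fields" layout for the purposes of this sketch.
#[repr(C)]
#[derive(Clone, Copy, PartialEq, Debug)]
pub struct I64x4(pub i64, pub i64, pub i64, pub i64);

pub fn into_array(v: I64x4) -> [i64; 4] {
    // Assumes the struct's fields are laid out exactly like `[i64; 4]`.
    unsafe { mem::transmute(v) }
}

pub fn from_array(a: [i64; 4]) -> I64x4 {
    unsafe { mem::transmute(a) }
}
```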
We pass these fallback vectors to the intrinsic as slices instead of arrays, so that arbitrarily sized inputs can be supported.
New SIMD operations can be defined with a reference implementation in `compiler_builtins`. They can then be picked up by compiler backends automatically until those backends choose to generate specific code for them.
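For example, a new `simd_mul` operation could start life as nothing more than a reference implementation. The names are hypothetical, following the `simd_fallback_add_i64` pattern above, and this also shows how monomorphic entry points can be backed by a single generic implementation:

```rust
// A single generic implementation can back several monomorphic entry points.
fn simd_mul_generic<T: Copy, F: Fn(T, T) -> T>(a: &[T], b: &[T], r: &mut [T], mul: F) {
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len(), r.len());
    for i in 0..r.len() {
        // Lane-wise multiply using the supplied semantics.
        r[i] = mul(a[i], b[i]);
    }
}

/// Hypothetical fallback for a new `simd_mul` intrinsic on `i64` lanes;
/// wrapping semantics are an assumption for this sketch.
pub fn simd_fallback_mul_i64(a: &[i64], b: &[i64], r: &mut [i64]) {
    simd_mul_generic(a, b, r, i64::wrapping_mul)
}

/// The `f64` counterpart, using plain IEEE 754 multiplication.
pub fn simd_fallback_mul_f64(a: &[f64], b: &[f64], r: &mut [f64]) {
    simd_mul_generic(a, b, r, |x, y| x * y)
}
```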
There's an implicit bound on the `T` in `simd_add`, which is that it somehow maps to an intrinsic like `simd_fallback_add_i64`. This isn't currently communicated in its contract, and could be masked by compiler backends that don't forward to intrinsics. We suggest compiler backends always validate that an appropriate fallback exists for a given platform intrinsic and vector type, even if they don't plan to use it.
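That validation could be sketched as a lookup against a table of known fallbacks. Everything here is hypothetical, none of it is existing compiler API; the point is only that the check can run even on paths that emit specialized code and never call the fallback:

```rust
// Hypothetical table of (operation, element type) pairs for which a
// monomorphic fallback exists in `compiler_builtins`.
const KNOWN_FALLBACKS: &[(&str, &str)] = &[("simd_add", "i64"), ("simd_add", "f64")];

/// A backend could run this for every platform intrinsic call it lowers,
/// regardless of whether it intends to use the fallback.
pub fn validate_fallback_exists(op: &str, elem_ty: &str) -> Result<(), String> {
    if KNOWN_FALLBACKS.iter().any(|&(o, t)| o == op && t == elem_ty) {
        Ok(())
    } else {
        Err(format!("no fallback for `{}` on `{}` lanes", op, elem_ty))
    }
}
```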
Who defines intrinsics?
The Portable SIMD group will be responsible for determining the behavior of the `simd_*` intrinsics it needs. These will take the form of the reference/fallback implementations in `compiler_builtins`.
What do we need from `T-compiler`?
We'll need a lot of help figuring out where to put reference implementations and how to make them available to the various backends.
Mentors or Reviewers
- @bjorn3, who's been working on `rustc`'s Cranelift backend and `extern "platform-intrinsic"`.
Process
The main points of the Major Change Process are as follows:
- File an issue describing the proposal.
- A compiler team member or contributor who is knowledgeable in the area can second by writing `@rustbot second`.
  - Finding a "second" suffices for internal changes. If however you are proposing a new public-facing feature, such as a `-C flag`, then full team check-off is required.
  - Compiler team members can initiate a check-off via `@rfcbot fcp merge` on either the MCP or the PR.
- Once an MCP is seconded, the Final Comment Period begins. If no objections are raised after 10 days, the MCP is considered approved.
You can read more about Major Change Proposals on forge.
Comments
This issue is not meant to be used for technical discussion. There is a Zulip stream for that. Use this issue to leave procedural comments, such as volunteering to review, indicating that you second the proposal (or third, etc), or raising a concern that you would like to be addressed.