So, there are two questions at hand here:
How do we memcmp two "bags" of bytes ([u8]) to test for equality?
Easy, just use the == operator on these bytes: <[u8] as PartialEq>::eq() does use memcmp.
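As a small illustration (the helper name `same_bytes` and the buffer contents are made up for this example), comparing two byte slices needs nothing more than `==`:

```rust
/// Returns `true` when both "bags" of bytes have the same length and contents.
/// (`same_bytes` is a hypothetical helper, only here for illustration.)
fn same_bytes(a: &[u8], b: &[u8]) -> bool {
    // `==` on `[u8]` dispatches to `<[u8] as PartialEq>::eq()`.
    a == b
}

fn main() {
    let left = [1_u8, 2, 3, 4];
    let right = vec![1_u8, 2, 3, 4];
    assert!(same_bytes(&left, &right));
    // Slices of different lengths are never equal.
    assert!(!same_bytes(&left, &right[..3]));
}
```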
Can I see a struct as just a "bag of bytes"? If so, how?
Now, this is where things are subtle. The general answer is: "No, you cannot!"
Indeed, you can't go and feed a struct you know nothing about to the following function:
unsafe // Can be UB!
fn as_bag_of_bytes<T: ?Sized> (
    ptr: &'_ T,
) -> &'_ [u8]
{
    ::core::slice::from_raw_parts(
        // ptr
        ptr as *const T as *const u8,
        // len
        ::core::mem::size_of_val(ptr),
    )
}
The root cause of that being UB is that T may have padding bytes, as @Hyeonu said (I didn't know about the zero-initialized "exception" to the rule; anyway, since you won't be calling that function on a statically known zero-initialized struct, that exception doesn't even count in practice). So, if your struct has padding, you cannot call as_bag_of_bytes() on it.
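To make the padding visible, here is a minimal sketch with two hypothetical `#[repr(C)]` structs. It assumes a typical target where `u16` is 2-byte aligned; the sizes would differ on a target where that is not the case.

```rust
use core::mem::size_of;

#[allow(dead_code)]
#[repr(C)]
struct WithPadding {
    a: u8,
    // one padding byte is inserted here so that `b` is 2-byte aligned
    b: u16,
}

#[allow(dead_code)]
#[repr(C)]
struct WithoutPadding {
    a: u16,
    b: u8,
    c: u8,
}

fn main() {
    // Fields add up to 1 + 2 = 3 bytes, but the struct occupies 4:
    // the extra byte is padding, and reading it as a `u8` is the problem.
    let padded_fields = size_of::<u8>() + size_of::<u16>();
    println!("WithPadding: {} bytes, fields: {} bytes", size_of::<WithPadding>(), padded_fields);

    // Fields add up to 2 + 1 + 1 = 4 bytes, which is the whole struct.
    let packed_fields = size_of::<u16>() + 2 * size_of::<u8>();
    println!("WithoutPadding: {} bytes, fields: {} bytes", size_of::<WithoutPadding>(), packed_fields);
}
```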
That being said, it would be nice to have the above function for padding-less types, such as primitives or structs that have been carefully crafted to ensure they do not contain any padding.
And this is indeed possible:
- Define a new unsafe trait AsBytes (or NoPadding), that we will implement for types that have no padding. This way the above generic function can have a T : AsBytes bound and no longer be marked unsafe!

  /// Unsafe marker trait for types that are valid to cast as a slice of bytes.
  ///
  /// This is true of primitive types, and recursively for `#[repr(C)]`
  /// compositions of such types, **as long as there is no padding** (such as
  /// arrays).
  ///
  /// The derive macro takes care of deriving this trait with the necessary
  /// compile-time guards.
  unsafe trait AsBytes {
      fn as_bytes (self: &'_ Self) -> &'_ [u8]
      {
          unsafe {
              // # Safety
              //
              // - contract of the trait
              ::core::slice::from_raw_parts(
                  self as *const Self as *const u8,
                  ::core::mem::size_of_val(self),
              )
          }
      }
  }
- unsafe impl AsBytes for:
  - primitive types:

    (
        unsafe impl $Trait:path, for primitive_types!() $(;)?
    ) => (
        impl_macro!(@impl_for_all
            unsafe impl $Trait, for [
                u8, i8,
                u16, i16,
                u32, i32,
                usize, isize,
                u64, i64,
                u128, i128,
                f32, f64,
                {T : ?Sized} *const T,
                {T : ?Sized} *mut T,
                (),
                {T : ?Sized} ::core::marker::PhantomData<T>,

                // the following are only safe to **view** as bytes,
                // do not create them from bytes!
                bool,
                ::core::num::NonZeroU8, ::core::num::NonZeroI8,
                ::core::num::NonZeroU16, ::core::num::NonZeroI16,
                ::core::num::NonZeroU32, ::core::num::NonZeroI32,
                ::core::num::NonZeroUsize, ::core::num::NonZeroIsize,
                ::core::num::NonZeroU64, ::core::num::NonZeroI64,
                ::core::num::NonZeroU128, ::core::num::NonZeroI128,
                {'a, T : 'a + ?Sized} &'a T,
                {'a, T : 'a + ?Sized} &'a mut T,
                {T : ?Sized} ::core::ptr::NonNull<T>,
                str,
            ]
        );
  - composite types; tuples are #[repr(Rust)] structs, so I do not include them, since their layout is allowed to change; we end up with composite types being arrays and slices:

    (
        unsafe impl $Trait:path, for array_types!() $(;)?
    ) => (
        impl_macro!(@impl_for_all
            unsafe impl $Trait, for [
                {T : $Trait} [T],
                {T } [T; 0],
                {T : $Trait} [T; 1], {T : $Trait} [T; 2], {T : $Trait} [T; 3], {T : $Trait} [T; 4],
                {T : $Trait} [T; 5], {T : $Trait} [T; 6], {T : $Trait} [T; 7], {T : $Trait} [T; 8],
                {T : $Trait} [T; 9], {T : $Trait} [T; 10], {T : $Trait} [T; 11], {T : $Trait} [T; 12],
                {T : $Trait} [T; 13], {T : $Trait} [T; 14], {T : $Trait} [T; 15], {T : $Trait} [T; 16],
                {T : $Trait} [T; 17], {T : $Trait} [T; 18], {T : $Trait} [T; 19], {T : $Trait} [T; 20],
                {T : $Trait} [T; 21], {T : $Trait} [T; 22], {T : $Trait} [T; 23], {T : $Trait} [T; 24],
                {T : $Trait} [T; 25], {T : $Trait} [T; 26], {T : $Trait} [T; 27], {T : $Trait} [T; 28],
                {T : $Trait} [T; 29], {T : $Trait} [T; 30], {T : $Trait} [T; 31], {T : $Trait} [T; 32],
                {T : $Trait} [T; 64], {T : $Trait} [T; 128], {T : $Trait} [T; 256], {T : $Trait} [T; 512],
                {T : $Trait} [T; 1024], {T : $Trait} [T; 2048], {T : $Trait} [T; 4096],
            ]
        );
    );

    impl_macro! { unsafe impl AsBytes, for primitive_types!() }
    impl_macro! { unsafe impl AsBytes, for array_types!() }
- Generate a #[derive(AsBytes)] procedural macro (a macro_rules! macro for the playground) that checks:
  - that the struct is #[repr(C)] or #[repr(transparent)], since it is mandatory when wanting to rely on the layout of a struct (in the case of the macro_rules! macro for the playground, I have skipped the #[repr(transparent)] case);
  - that each field of the struct is AsBytes on its own:

    $(
        const_assert!(
            $field_ty : $crate::AsBytes,
        );
    )*

  - that there is no padding, by checking that the total size of the struct is equal to the sum of the sizes of its constituents:

    const_assert!(
        ::core::mem::size_of::<$StructName>() ==
        (0 $(+ ::core::mem::size_of::<$field_ty>())*)
    );

  - so that it can soundly unsafe impl AsBytes for that struct:

    unsafe impl $crate::AsBytes for $StructName {}
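The three steps above can be sketched end to end as follows. This is a simplified stand-in, not the playground code: it implements the trait for three primitives only, it assumes const generics (Rust 1.51+) to cover arrays of any length with a single impl instead of one impl per length, it uses a plain `assert!` inside a `const _` item in place of the `const_assert!` helper (whose definition is not shown here), and its `derive_AsBytes!` accepts only the bare `#[repr(C)] struct Name { field: Type, ... }` shape. The `Sample` struct is hypothetical.

```rust
/// Step 1: the unsafe marker trait. Implementors promise to have no padding.
unsafe trait AsBytes {
    fn as_bytes(&self) -> &[u8] {
        unsafe {
            // Safety: guaranteed by the contract of the trait (no padding bytes).
            ::core::slice::from_raw_parts(
                self as *const Self as *const u8,
                ::core::mem::size_of_val(self),
            )
        }
    }
}

// Step 2: a few primitives, plus arrays of `AsBytes` elements.
unsafe impl AsBytes for u8 {}
unsafe impl AsBytes for u16 {}
unsafe impl AsBytes for u32 {}
// Assumption: const generics replace the per-length array impls.
unsafe impl<T: AsBytes, const N: usize> AsBytes for [T; N] {}

// Step 3: a simplified derive, with the checks done at compile time.
macro_rules! derive_AsBytes {(
    #[repr(C)]
    struct $StructName:ident {
        $( $field:ident : $field_ty:ty ),* $(,)?
    }
) => (
    #[repr(C)]
    struct $StructName {
        $( $field : $field_ty, )*
    }

    const _: () = {
        // Check: every field type is itself `AsBytes`.
        const fn assert_as_bytes<T: ?Sized + AsBytes>() {}
        $( assert_as_bytes::<$field_ty>(); )*
        // Check: no padding, i.e. struct size == sum of the field sizes.
        assert!(
            ::core::mem::size_of::<$StructName>()
                == 0 $(+ ::core::mem::size_of::<$field_ty>())*
        );
    };

    // Both checks passed, so the impl is sound.
    unsafe impl AsBytes for $StructName {}
)}

derive_AsBytes! {
    #[repr(C)]
    struct Sample {
        a: u16,
        b: u8,
        c: [u8; 5],
    }
}

fn main() {
    let x = Sample { a: 7, b: 1, c: [2; 5] };
    let y = Sample { a: 7, b: 1, c: [2; 5] };
    // Byte-wise equality of the two structs.
    assert!(x.as_bytes() == y.as_bytes());
    assert_eq!(x.as_bytes().len(), 8);
}
```

Adding a field that introduces padding (for instance a leading `u8` before `a`) should make the `assert!` fail at compile time rather than at run time, which is the point of doing the check inside a `const` item.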
And now you can just call .as_bytes() on valid types and you'll get a zero-cost &[u8], that you can then == compare to get an efficient memcmp!
derive_AsBytes! {
    #[repr(C)]
    struct Ok {
        a: u16,
        b: u8,
        c: u8,
    }
}

#[cfg(FALSE)] // Comment out this line to get compilation errors
mod fails {
    derive_AsBytes! {
        #[repr(C)]
        struct InnerPadding {
            a: u8,
            // inner padding byte
            b: u16,
        }
    }

    derive_AsBytes! {
        #[repr(C)]
        struct TrailingPadding {
            a: u16,
            b: u8,
            // trailing padding byte
        }
    }
}

fn main ()
{
    dbg!(Ok { a: 10752, b: 27, c: 0 }.as_bytes());
}
yields

[src/main.rs:260] Ok{a: 10752, b: 27, c: 0,}.as_bytes() = [
    0,
    42,
    27,
    0,
]
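The `0, 42` at the start of that output is the field `a` seen on a little-endian machine: 10752 is 42 * 256, i.e. 0x2A00, and the least significant byte comes first. The sketch below rebuilds the same four bytes by hand; `little_endian_layout` is a hypothetical helper, and on a big-endian target `.as_bytes()` would start with `42, 0` instead.

```rust
/// Hypothetical helper: the bytes of `Ok { a, b, c }` as laid out
/// on a little-endian target (no padding: 2 + 1 + 1 bytes).
fn little_endian_layout(a: u16, b: u8, c: u8) -> [u8; 4] {
    let [a_lo, a_hi] = a.to_le_bytes();
    [a_lo, a_hi, b, c]
}

fn main() {
    assert_eq!(10752_u16, 42 * 256);
    // Little-endian puts the low byte (0x00) before the high byte (0x2A).
    assert_eq!(10752_u16.to_le_bytes(), [0, 42]);
    assert_eq!(10752_u16.to_be_bytes(), [42, 0]);
    println!("{:?}", little_endian_layout(10752, 27, 0));
}
```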
If that sounds like a tedious macro to write, and you think a crate should be exporting such functionality, then you are right! There already is such a crate, from which I've taken this idea:
There is another way to avoid padding, and that's by adding a #[repr(packed)] attribute on a struct. However, this opens a whole can of worms / bugs of its own, since now all the reads and writes on the fields of the struct need to be unaligned reads/writes using raw pointers, which is quite error-prone and thus unsafe. The only case where this solution is easy to use is when all the fields have an alignment of 1, such as when using ::zerocopy::byteorder integer types. But in that case #[repr(packed)] is not doing anything, and we are back to a #[repr(C)] struct carefully crafted without padding bytes.
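For reference, here is a minimal sketch of what going through raw pointers looks like with a packed struct. The `Packed` struct and the `read_b` helper are hypothetical; the sketch assumes a compiler recent enough to provide `::core::ptr::addr_of!` (Rust 1.51+).

```rust
#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    a: u8,
    // no padding before `b`, so it may sit at an odd address
    b: u16,
}

/// Reads the possibly misaligned field `b` (hypothetical helper).
fn read_b(p: &Packed) -> u16 {
    // Taking `&p.b` is rejected by recent compilers, since the reference
    // could be misaligned; a raw pointer avoids creating that reference.
    let ptr = ::core::ptr::addr_of!(p.b);
    // Safety: `ptr` points to a live, initialized `u16` inside `*p`;
    // `read_unaligned` does not require the pointer to be aligned.
    unsafe { ptr.read_unaligned() }
}

fn main() {
    let p = Packed { a: 1, b: 0x2A00 };
    // 1 + 2 bytes, no padding at all.
    assert_eq!(::core::mem::size_of::<Packed>(), 3);
    assert_eq!(read_b(&p), 0x2A00);
}
```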