Check for zero C struct memory

So, there are two questions at hand here:

How do we memcmp two "bags" of bytes ([u8]) to test for equality?

Easy: just use the == operator on these byte slices; <[u8] as PartialEq>::eq() does use memcmp.
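Here is a minimal sketch of both uses. The helper names `same_bytes` and `is_zeroed` are hypothetical. The zero check (the question in the title) compares fixed-size chunks against a static zeroed buffer, so it also goes through `==` on `[u8]`.

```rust
/// Bytewise equality of two byte slices (`<[u8] as PartialEq>::eq`).
fn same_bytes(a: &[u8], b: &[u8]) -> bool {
    a == b
}

/// "Is this memory all zeros?": compare chunk by chunk against a zeroed buffer.
fn is_zeroed(bytes: &[u8]) -> bool {
    const ZEROS: [u8; 64] = [0; 64];
    bytes
        .chunks(ZEROS.len())
        .all(|chunk| chunk == &ZEROS[..chunk.len()])
}

fn main() {
    let a = [0u8, 42, 27, 0];
    let b = [0u8, 42, 27, 0];
    assert!(same_bytes(&a, &b));
    assert!(!same_bytes(&a, &[0, 42, 27, 1]));

    assert!(is_zeroed(&[0u8; 100])); // spans two chunks
    assert!(!is_zeroed(&a));
    assert!(is_zeroed(&[])); // vacuously true
}
```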

Can I see a struct as just a "bag of bytes"? If so, how?

Now, this is where things are subtle. The general answer is: "No, you cannot!"

Indeed, you can't go and feed a struct you know nothing about to the following function:

unsafe // Can be UB!
fn as_bag_of_bytes<T: ?Sized> (
    ptr: &'_ T,
) -> &'_ [u8]
{
    ::core::slice::from_raw_parts(
        // ptr
        ptr as *const T as *const u8,

        // len
        ::core::mem::size_of_val(ptr),
    )
}

The root cause of that being UB is that T may have padding bytes, as @Hyeonu said. Padding bytes may be uninitialized, and reading uninitialized memory as a u8 is UB. (I didn't know about the zero-initialized "exception" to the rule; anyway, since you won't be calling that function on a statically known zero-initialized struct, that exception doesn't count in practice.) So, if your struct has padding, you cannot call as_bag_of_bytes() on it.
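This small sketch shows where padding comes from. The struct names are hypothetical. In a `#[repr(C)]` struct, a `u8` followed by a `u16` needs one padding byte so that the `u16` is 2-byte aligned, and that is exactly the kind of byte `as_bag_of_bytes()` must never read. The sizes assume `u16` has an alignment of 2, as on mainstream targets.

```rust
use std::mem::{align_of, size_of};

/// One padding byte sits between `a` and `b`, so that `b` is 2-byte aligned.
#[allow(dead_code)]
#[repr(C)]
struct InnerPadding {
    a: u8,
    b: u16,
}

/// Fields ordered largest-first: no padding is needed.
#[allow(dead_code)]
#[repr(C)]
struct Reordered {
    b: u16,
    a: u8,
    c: u8,
}

fn main() {
    // Assumption: `u16` is 2-byte aligned, as on mainstream targets.
    assert_eq!(align_of::<u16>(), 2);

    // 1 + 2 = 3 bytes of fields, but 4 bytes in total: 1 byte is padding.
    assert_eq!(size_of::<u8>() + size_of::<u16>(), 3);
    assert_eq!(size_of::<InnerPadding>(), 4);

    // 2 + 1 + 1 = 4 bytes of fields and 4 bytes in total: no padding.
    assert_eq!(size_of::<Reordered>(), 4);
}
```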

That being said, it would be nice to have the above function for padding-less types, such as primitives or structs that have been carefully crafted to ensure they do not contain any padding.

And this is indeed possible:

  1. Define a new unsafe trait, AsBytes (or NoPadding), which we will implement for types that have no padding. This way the above generic function can have a T: AsBytes bound and no longer needs to be marked unsafe!

    /// Unsafe marker trait for types that are valid to cast as a slice of bytes.
    ///
    /// This is true of primitive types, and recursively of `#[repr(C)]`
    /// compositions of such types, **as long as there is no padding**
    /// (arrays of such types, for instance, never have padding).
    ///
    /// The derive macro takes care of deriving this trait with the necessary
    /// compile-time guards.
    unsafe trait AsBytes {
        fn as_bytes (self: &'_ Self)
            -> &'_ [u8]
        {
            unsafe {
                // # Safety
                //
                //   - contract of the trait
                ::core::slice::from_raw_parts(
                    self
                        as *const Self
                        as *const u8
                    ,
                    ::core::mem::size_of_val(self),
                )
            }
        }
    }
    
  2. unsafe impl AsBytes for:

    • primitive types:

      (
          unsafe
          impl $Trait:path, for primitive_types!() $(;)?
      ) => (
          impl_macro!(@impl_for_all
              unsafe
              impl $Trait, for [
                  u8,     i8,
                  u16,    i16,
                  u32,    i32,
                  usize,  isize,
                  u64,    i64,
                  u128,   i128,
                  f32,
                  f64,
                  {T : ?Sized} *const T,
                  {T : ?Sized} *mut T,
                  (),
                  {T : ?Sized} ::core::marker::PhantomData<T>,

                  // the following are only safe to **view** as bytes,
                  // do not create them from bytes!
                  bool,
                  ::core::num::NonZeroU8,     ::core::num::NonZeroI8,
                  ::core::num::NonZeroU16,    ::core::num::NonZeroI16,
                  ::core::num::NonZeroU32,    ::core::num::NonZeroI32,
                  ::core::num::NonZeroUsize,  ::core::num::NonZeroIsize,
                  ::core::num::NonZeroU64,    ::core::num::NonZeroI64,
                  ::core::num::NonZeroU128,   ::core::num::NonZeroI128,
                  {'a, T : 'a + ?Sized} &'a T,
                  {'a, T : 'a + ?Sized} &'a mut T,
                  {T : ?Sized} ::core::ptr::NonNull<T>,
                  str,
              ]
          );
      );
      
    • composite types: tuples are #[repr(Rust)] structs, whose layout is unspecified and allowed to change, so I do not include them. That leaves arrays and slices as the only composite types:

      (
          unsafe
          impl $Trait:path, for array_types!() $(;)?
      ) => (
          impl_macro!(@impl_for_all
              unsafe
              impl $Trait, for [
                  {T : $Trait} [T],
                  {T         } [T;    0],
                  {T : $Trait} [T;    1],
                  {T : $Trait} [T;    2],
                  {T : $Trait} [T;    3],
                  {T : $Trait} [T;    4],
                  {T : $Trait} [T;    5],
                  {T : $Trait} [T;    6],
                  {T : $Trait} [T;    7],
                  {T : $Trait} [T;    8],
                  {T : $Trait} [T;    9],
                  {T : $Trait} [T;   10],
                  {T : $Trait} [T;   11],
                  {T : $Trait} [T;   12],
                  {T : $Trait} [T;   13],
                  {T : $Trait} [T;   14],
                  {T : $Trait} [T;   15],
                  {T : $Trait} [T;   16],
                  {T : $Trait} [T;   17],
                  {T : $Trait} [T;   18],
                  {T : $Trait} [T;   19],
                  {T : $Trait} [T;   20],
                  {T : $Trait} [T;   21],
                  {T : $Trait} [T;   22],
                  {T : $Trait} [T;   23],
                  {T : $Trait} [T;   24],
                  {T : $Trait} [T;   25],
                  {T : $Trait} [T;   26],
                  {T : $Trait} [T;   27],
                  {T : $Trait} [T;   28],
                  {T : $Trait} [T;   29],
                  {T : $Trait} [T;   30],
                  {T : $Trait} [T;   31],
                  {T : $Trait} [T;   32],
                  {T : $Trait} [T;   64],
                  {T : $Trait} [T;  128],
                  {T : $Trait} [T;  256],
                  {T : $Trait} [T;  512],
                  {T : $Trait} [T; 1024],
                  {T : $Trait} [T; 2048],
                  {T : $Trait} [T; 4096],
              ]
          );
      );
      
    impl_macro! {
        unsafe
        impl AsBytes, for primitive_types!()
    }
    impl_macro! {
        unsafe
        impl AsBytes, for array_types!()
    }
    
  3. Write a #[derive(AsBytes)] procedural macro (or, for the playground, a macro_rules! macro) that checks:

    • that the struct is #[repr(C)] or #[repr(transparent)], since one of these is required to rely on the layout of a struct (in the macro_rules! version for the playground, I have skipped the #[repr(transparent)] case);

    • that each field of the struct is AsBytes on its own:

      $(
          const_assert!(
              $field_ty : $crate::AsBytes,
          );
      )*
      
    • that there is no padding, by checking that the total size of the struct is equal to the sum of the sizes of its constituents:

      const_assert!(
          ::core::mem::size_of::<$StructName>() ==
          (0 $(+ ::core::mem::size_of::<$field_ty>())*)
      );
      
    • so that it can soundly unsafe impl AsBytes for that struct:

      unsafe impl $crate::AsBytes for $StructName {}
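The three steps above can be condensed into one self-contained sketch. It makes a few assumptions that differ from the playground code:

- It relies on const generics (stable since Rust 1.51) instead of enumerating array sizes.
- The padding check uses `assert!` in a `const` item, which needs Rust 1.57 or later.
- It hardcodes `#[repr(C)]` rather than checking for it.
- Only the plain numeric primitives are covered.

The macro names `impl_as_bytes!` and `as_bytes_struct!` and the struct `Header` are hypothetical.

```rust
/// Step 1: unsafe marker trait for types without padding bytes.
///
/// # Safety
/// Implementors must not contain padding (or any uninitialized bytes).
unsafe trait AsBytes {
    fn as_bytes(&self) -> &[u8] {
        // SAFETY: by the trait contract, all `size_of_val(self)` bytes
        // starting at `self` are initialized.
        unsafe {
            std::slice::from_raw_parts(self as *const Self as *const u8, std::mem::size_of_val(self))
        }
    }
}

// Step 2: primitives (a hypothetical, reduced stand-in for `impl_macro!`)...
macro_rules! impl_as_bytes {
    ($($ty:ty),* $(,)?) => { $(unsafe impl AsBytes for $ty {})* };
}
impl_as_bytes!(u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, usize, isize, f32, f64, ());

// ...and arrays/slices, via const generics instead of a list of sizes.
unsafe impl<T: AsBytes, const N: usize> AsBytes for [T; N] {}
unsafe impl<T: AsBytes> AsBytes for [T] {}

// Step 3: hypothetical derive-like macro with compile-time checks.
macro_rules! as_bytes_struct {
    (
        $(#[$attr:meta])*
        struct $Name:ident { $($field:ident : $field_ty:ty),* $(,)? }
    ) => {
        $(#[$attr])*
        #[repr(C)] // always emitted, so the layout can be relied upon
        struct $Name { $($field: $field_ty),* }

        // Every field must itself be `AsBytes`.
        const _: fn() = || {
            fn assert_as_bytes<T: AsBytes + ?Sized>() {}
            $(assert_as_bytes::<$field_ty>();)*
        };

        // No padding: the struct size equals the sum of the field sizes.
        const _: () = assert!(
            ::std::mem::size_of::<$Name>() == (0 $(+ ::std::mem::size_of::<$field_ty>())*),
            "struct contains padding"
        );

        // SAFETY: both checks above passed at compile time.
        unsafe impl AsBytes for $Name {}
    };
}

as_bytes_struct! {
    #[allow(dead_code)]
    struct Header {
        kind: u16,
        flags: u8,
        version: u8,
    }
}

fn main() {
    let x: u32 = 0x0102_0304;
    assert_eq!(x.as_bytes(), &x.to_ne_bytes()); // native-endian view
    assert_eq!([1u16, 2, 3].as_bytes().len(), 6);

    let a = Header { kind: 7, flags: 1, version: 2 };
    let b = Header { kind: 7, flags: 1, version: 2 };
    let zero = Header { kind: 0, flags: 0, version: 0 };

    assert!(a.as_bytes() == b.as_bytes()); // bytewise comparison
    assert!(zero.as_bytes().iter().all(|&byte| byte == 0)); // all-zero check
    assert!(!a.as_bytes().iter().all(|&byte| byte == 0));
}
```

There is one behavioral difference from the list in step 2. The const-generic impl requires `T: AsBytes` even for `[T; 0]`, whereas the enumerated list accepts `[T; 0]` for any `T`. If `Header`'s fields were declared as `u8, u16, u8` instead, `size_of::<Header>()` would be 6 rather than 4 on typical targets, and the `const` assertion would then fail to compile.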
      

And now you can just call .as_bytes() on valid types and get a zero-cost &[u8], which you can then compare with == to get an efficient memcmp!

derive_AsBytes! {
    #[repr(C)]
    struct Ok {
        a: u16,
        b: u8,
        c: u8,
    }
}

#[cfg(FALSE)] // Remove (or comment out) this line to get compilation errors
mod fails {
    derive_AsBytes! {
        #[repr(C)]
        struct InnerPadding {
            a: u8,
            // inner padding byte
            b: u16,
        }
    }
    
    derive_AsBytes! {
        #[repr(C)]
        struct TrailingPadding {
            a: u16,
            b: u8,
            // trailing padding byte
        }
    }
}

fn main ()
{
    dbg!(Ok { a: 10752, b: 27, c: 0 }.as_bytes());
}
This yields:

    [src/main.rs:260] Ok{a: 10752, b: 27, c: 0,}.as_bytes() = [
        0,
        42,
        27,
        0,
    ]
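The first two bytes of that output come from `a: 10752`. Since 10752 = 42 × 256 = 0x2A00, its low byte is 0 and its high byte is 42. `as_bytes()` exposes the native byte order, which on a little-endian target such as x86_64 puts the low byte first. The standard `to_le_bytes`/`to_be_bytes`/`to_ne_bytes` methods confirm this:

```rust
fn main() {
    let a: u16 = 10752;
    assert_eq!(a, 42 * 256); // 0x2A00: high byte 42, low byte 0
    assert_eq!(a.to_le_bytes(), [0, 42]); // little-endian: low byte first
    assert_eq!(a.to_be_bytes(), [42, 0]); // big-endian: high byte first

    // `as_bytes()` shows the native-endian layout, i.e. `to_ne_bytes()`.
    let expected = if cfg!(target_endian = "little") { [0, 42] } else { [42, 0] };
    assert_eq!(a.to_ne_bytes(), expected);
}
```

On a big-endian target, the same `dbg!` would therefore print `[42, 0, 27, 0]`.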
    


If that sounds like a tedious macro to write, and you think a crate should be exporting such functionality, then you are right! There already is such a crate, from which I've taken this idea:


There is another way to avoid padding: adding a #[repr(packed)] attribute to a struct. However, this opens a whole can of worms (and bugs) of its own, because the fields may then be misaligned. You can no longer take references to them, even implicitly through method calls or format!, so anything beyond plain by-value reads and writes has to go through raw pointers with unaligned reads/writes, which is quite error-prone and requires unsafe. This approach is only easy when all the fields have an alignment of 1, such as when using ::zerocopy::byteorder integer types. But in that case #[repr(packed)] is not doing anything, and we are back to a #[repr(C)] struct carefully crafted without padding bytes.
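This small sketch shows the trade-off. The struct `Packed` is hypothetical. Packing removes the padding byte, and copying a field out by value still works. An access that needs a pointer, however, has to use `std::ptr::addr_of!`/`addr_of_mut!` (stable since Rust 1.51) together with `read_unaligned`/`write_unaligned`, because `&p.b` would be rejected by the compiler.

```rust
use std::mem::size_of;
use std::ptr::{addr_of, addr_of_mut};

#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u16, // at offset 1, so possibly misaligned
}

fn main() {
    let mut p = Packed { a: 1, b: 0x0203 };

    // No padding: 1 + 2 bytes.
    assert_eq!(size_of::<Packed>(), 3);

    // Copying fields out by value is fine...
    let (a, b) = (p.a, p.b);
    assert_eq!((a, b), (1, 0x0203));

    // ...but `let r = &p.b;` (or `println!("{}", p.b)`, which borrows)
    // does not compile, because the reference could be misaligned.
    let b_ptr = addr_of_mut!(p.b);
    // SAFETY: `b_ptr` points to a live field of `p`; the unaligned
    // write does not require `b_ptr` to be 2-byte aligned.
    unsafe { b_ptr.write_unaligned(0x0405) };
    // SAFETY: same reasoning, for an unaligned read.
    let b = unsafe { addr_of!(p.b).read_unaligned() };
    assert_eq!(b, 0x0405);
}
```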
