Native pointer vectors

# Native pointer vectors

This bug is a _request for comment_.  I am but a novice Rust developer, but I suspect that this would be a strong addition to the language.

The exact mechanics were fleshed out further in this bug; read down for more on what actually was implemented.
## Motivation

While working on bindings for FFTW (the Fastest Fourier Transform in the West), I ran into the hairy issue that it doesn't seem possible to have Rust hold vector pointers to the outside world.  What do I mean by this -- and why do we care?  Well, I'll provide (a distilled version of) the FFTW APIs in question, by way of example:

``` c
double *fftw_malloc(size_t sz);
fftw_plan_t *fftw_plan_fft(size_t n, double *in, double *out, int flags); /* morally */
void fftw_execute(fftw_plan_t *plan);
```

The general course of action when using FFTW is to use `fftw_malloc` to get a chunk of space to store your inputs and outputs, then create a "plan" with `fftw_plan_fft` (a potentially expensive operation, but ultimately time-saving if you intend to run many FFTs), and then finally execute the plan with `fftw_execute` when you've populated your buffers.  You can `fftw_execute` the same plan many times (and, in fact, if you can, you should!) after refilling the buffers and taking data out.

In FFTW-land, `fftw_malloc` is not strictly necessary as the only way to get memory, but it's a Damn Good Idea -- and as I'll discuss in a moment, not using it doesn't save us.  `fftw_malloc` goes out of its way to obtain memory that has "nice" properties; for instance, it tries hard to align things not just to `double` boundaries, but also to cache lines, depending on how much memory you ask for.  If it knows anything else interesting about your system, it takes that into account.  So, it's not fatal to not use it, if that's the only thing you can do ... but it sure does hurt.

Currently, in Rust, there is no way to import a pointer from the outside world and use it mutably, nor is there a way to tell Rust that this pointer may be written to by an outside API.  The closest thing that Rust has is `vec::unsafe::from_buf`, which in turn calls `rustrt::vec_from_buf_shared`, but the first thing that `vec_from_buf_shared` does is _allocate a new space and copy the memory away_!  This makes it unsuitable for both the `in` and the `out` pointers: writes that Rust makes to an imported `in` buffer land in the copy and never reach the outside world, and results that a transform writes to `out` after the import are never seen by Rust, which is still looking at the copy.
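
The copy problem can be sketched in present-day Rust, which is a later dialect than the one this proposal targets.  This is only an analogue: `import_by_copy` is a hypothetical helper standing in for `vec::unsafe::from_buf`, and a local array stands in for memory owned by the foreign library.

```rust
use std::slice;

/// Hypothetical stand-in for `vec::unsafe::from_buf`: copies `n` elements
/// out of foreign memory into a freshly allocated Rust vector.
///
/// Safety: `ptr` must be non-null, aligned, and point to at least `n`
/// initialized `f64` values.
unsafe fn import_by_copy(ptr: *const f64, n: usize) -> Vec<f64> {
    unsafe { slice::from_raw_parts(ptr, n) }.to_vec()
}

/// Returns (value seen through the copy, value in the foreign buffer)
/// after the "foreign side" overwrites element 0.
fn copy_goes_stale() -> (f64, f64) {
    // Stand-in for a buffer owned by the outside world.
    let mut foreign = [1.0_f64, 2.0, 3.0];
    let ptr = foreign.as_mut_ptr();

    // Import the buffer; this silently copies it.
    let imported = unsafe { import_by_copy(ptr, 3) };

    // The foreign library writes its output through its own pointer.
    unsafe { *ptr = 42.0 };

    // The copy still holds the old value; only the foreign buffer changed.
    (imported[0], foreign[0])
}
```

The imported vector and the foreign buffer disagree as soon as either side writes, which is exactly what makes the copy useless for `in` and `out`.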

In this case, we have another option, though.  We could create a vector inside Rust, and use `vec::unsafe::to_ptr` to create a pointer to it.  This will work, but it is dangerously broken (violates safety) in three ways.  In the first, Rust is not in control of the lifecycle of the external reference; Rust cannot know when the external reference no longer exists, and may prematurely garbage collect the vector.  This can happen, for instance, when a reference to the `out` vector is still live, and a reference to the plan is still live, but the reference to the `in` vector is dead; calling `fftw_execute` on this plan will access the dead `in` vector, which may already have been garbage collected.

In the second, Rust may reallocate memory out from under the external application.  The vector can be appended to, which may cause a reallocation to occur.  The external application's pointer will not be updated, and when it goes to access that pointer, that memory will be dead.  In the case of a "correct" usage, this will not occur (i.e., the programmer can be instructed not to do that), but this violates safety; at the very least, `+=` now becomes an `unsafe` operation.
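
The reallocation hazard can also be sketched in present-day Rust (again a later dialect, used here only as an analogue).  `append_after_export` is a hypothetical name; the stale pointer is compared but never dereferenced, and whether the buffer actually moves is up to the allocator.

```rust
/// Appends to a vector after handing out a raw pointer to it, and reports
/// (capacity before, capacity after, whether the buffer address changed).
fn append_after_export() -> (usize, usize, bool) {
    let mut v: Vec<f64> = Vec::with_capacity(4);
    v.extend_from_slice(&[0.0; 4]);

    // The pointer a foreign library would have been given.
    let exported = v.as_ptr();
    let cap_before = v.capacity();

    // Appending far past the capacity forces the vector to grow ...
    v.extend_from_slice(&[0.0; 1024]);

    // ... and growth is allowed to move the buffer, which would leave
    // `exported` dangling. It is only compared here, never dereferenced.
    (cap_before, v.capacity(), exported != v.as_ptr())
}
```

Nothing in the vector's type stops the append, which is the sense in which `+=` would have to become an `unsafe` operation.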

In the third, perhaps most compellingly, this locks Rust into a world in which it is not permissible to have a copying garbage collector.  Right now, Rust does ref-counting and garbage collection with free()... but this need not always be the case!  Another implementation of Rust could very conceivably experiment with garbage collection schemes for better performance.  If the behavior of pointers were already specified, it would be one story, but currently that behavior is not -- and I argue, for the better.

In short, the language's existing capabilities for this are not sufficient to operate with at least one API.
## Use cases

The FFTW API is not the only system in which the current capabilities for referring to native memory are insufficient.  Consider also:
- A framebuffer interface, in which an external resource has `mmap()`ed some memory.  Accesses must go to exactly those addresses; none else will do.
- A command queue interface, in which a set of commands are provided for an external device (say, a graphics card) to DMA to and from chunks of memory.  (This is how modern graphics cards operate.)  Accesses must go to some prescribed address, usually managed through a remote resource manager (DRI, et al); no other address will do.

These are both examples of requiring a specific address... recall, however, that there are surely many applications that require some mutable chunk of memory, any mutable chunk at all!  Consider, for instance, the `ioctl()` interface on Linux, which has a similar "combination input/output buffer".

The FFTW API is one of many cases that require mutable memory accessible to the native system.
## Proposed solution

I propose the addition of a vector qualifier, `[native T]`.  The type `[native T]` does not unify with `[T]`; it is distinct from the normal vector, except that indexing into it and iterating over it both work.

The type `[native T]` has one introduction form:

``` rust
unsafe fn std::vec::unsafe::native<T>(ptr: *T, elts: uint) -> [native T];
```

The following elimination forms of vectors function for native vectors:
- `v[a]` as an expression (with size checking)
- `v[a]` as an lvalue (with size checking)
- `for t: T in v` as a loop construct

Notably, the following form does not function:
- `v += vp` as an append

When a native vector goes out of scope, the native memory pointed to is not modified or otherwise operated upon.

These are the basic rules for a native vector.
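
The rules above can be approximated as a library type in present-day Rust.  This is a minimal sketch under stated assumptions, not the proposed language feature: `NativeVec` is a hypothetical name, the pointer is `*mut T` rather than the `*T` of the proposed signature, and iteration goes through an `iter` method instead of built-in loop syntax.

```rust
use std::ops::{Index, IndexMut};
use std::slice;

/// Hypothetical analogue of `[native T]`: a length plus a foreign pointer.
/// There is no append operation and no `Drop` impl, so the memory is not
/// freed or otherwise touched when the value goes out of scope.
pub struct NativeVec<T> {
    len: usize,
    data: *mut T,
}

impl<T> NativeVec<T> {
    /// The single introduction form, mirroring `vec::unsafe::native`.
    ///
    /// Safety: `ptr` must be non-null, aligned, and point to `elts`
    /// initialized values of `T` that stay valid, and are not accessed
    /// through other Rust references, while this value is in use.
    pub unsafe fn native(ptr: *mut T, elts: usize) -> Self {
        NativeVec { len: elts, data: ptr }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Iteration over the foreign elements, in place.
    pub fn iter(&self) -> slice::Iter<'_, T> {
        unsafe { slice::from_raw_parts(self.data, self.len) }.iter()
    }
}

impl<T> Index<usize> for NativeVec<T> {
    type Output = T;

    /// `v[a]` as an expression, with size checking.
    fn index(&self, i: usize) -> &T {
        assert!(i < self.len, "index out of bounds");
        unsafe { &*self.data.add(i) }
    }
}

impl<T> IndexMut<usize> for NativeVec<T> {
    /// `v[a]` as an lvalue, with size checking.
    fn index_mut(&mut self, i: usize) -> &mut T {
        assert!(i < self.len, "index out of bounds");
        unsafe { &mut *self.data.add(i) }
    }
}
```

Writes through `v[a]` go straight to the foreign memory, and dropping the `NativeVec` leaves that memory exactly as it was.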
## Implementation

A native vector has the following internal representation:

`type native_repr<T> = { len: uint, data: *T };`

It is distinct in the type system because it does not share a representation with a Rust vector.  This choice was made to avoid the performance cost of having to check at run time whether any given vector is a Rust vector or a native vector before accessing it.

Translation is presumably very similar to that of Rust vectors.

It could be the case that no `rustrt` support is needed for this, since the `native_repr` type can be constructed purely in Rust, and then can be `reinterpret_cast`ed into a native vector.
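
To make the run-time check concrete, here is a rough sketch in present-day Rust (a later dialect, used only as an analogue) of the alternative being avoided, next to the distinct representation.  `EitherVec` and `NativeRepr` are hypothetical names, and the element type is fixed to `f64` to keep the example short.

```rust
/// Hypothetical unified vector: one type that may hold either kind of
/// storage. This is the design the distinct type avoids.
enum EitherVec {
    Rust(Vec<f64>),
    Native { len: usize, data: *const f64 },
}

impl EitherVec {
    /// Every access first has to branch on which kind of vector this is.
    fn get(&self, i: usize) -> Option<f64> {
        match self {
            EitherVec::Rust(v) => v.get(i).copied(),
            EitherVec::Native { len, data } => {
                if i < *len {
                    // Safety (assumed): `data` points to `len` valid `f64`s.
                    Some(unsafe { *(*data).add(i) })
                } else {
                    None
                }
            }
        }
    }
}

/// The distinct representation from the text: a length and a pointer,
/// with no tag, so an access is just a bounds check and a load.
struct NativeRepr {
    len: usize,
    data: *const f64,
}

impl NativeRepr {
    fn get(&self, i: usize) -> Option<f64> {
        if i < self.len {
            // Safety (assumed): `data` points to `len` valid `f64`s.
            Some(unsafe { *self.data.add(i) })
        } else {
            None
        }
    }
}
```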
## Extensions

The above describes a basic semantics for a native vector.  It provides a semantics sufficient to behave safely, but missing are two potentially useful extensions.  These are optional, and certainly not required for a first pass implementation, but would make the native vector substantially more usable.
### `[native? T]` unification

There is currently a substantial vector library built up to operate on Rust vectors.  Just because some operations do not apply, it does not make sense to have to duplicate the ones that do.  For that, I propose the `[native? T]` qualifier, similar to `[mutable? T]`.  Presumably a separate code path would have to be emitted at translate time.  I do not know enough about the inner workings of `mutable?` to comment on how similar this might be, and how possible this might be given the existing Rust codebase.
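
For comparison, present-day Rust (a later dialect than the one this proposal targets) reaches a similar goal with borrowed slices rather than a `native?` qualifier: a routine written against `&[T]` accepts both Rust-owned vectors and views of foreign memory.  The sketch below illustrates that idea only; `mean` and `mean_of_both` are hypothetical names, and this is not the mechanism proposed here.

```rust
use std::slice;

/// Library routine written once, against a borrowed view of the elements.
fn mean(xs: &[f64]) -> f64 {
    if xs.is_empty() {
        return 0.0;
    }
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Applies the same routine to a Rust-owned vector and to foreign memory.
///
/// Safety: `foreign` must be non-null, aligned, and point to `n`
/// initialized `f64` values that are not written to during the call.
unsafe fn mean_of_both(owned: &Vec<f64>, foreign: *const f64, n: usize) -> (f64, f64) {
    // View the foreign memory in place; nothing is copied.
    let view = unsafe { slice::from_raw_parts(foreign, n) };
    (mean(owned), mean(view))
}
```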
### Built in destructors

It can be useful to have Rust take over lifecycle management of memory, if the FFI binding builder is careful.  Classically, one might do this as follows:

``` rust
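// Pairs the native vector with a resource whose destructor frees the memory.
type mem = {
  vec : [native f64],
  dtor : @mem_res
};

fn malloc(n : uint) -> mem {
  // 8 bytes per f64 element.
  let mem = fftw_native::fftw_malloc(n * 8u);
  ret { vec: unsafe { vec::unsafe::native(mem as *f64, n) }, dtor: @mem_res(mem) };
}

// Runs when the last reference to the resource goes away.
resource mem_res(mem: fftw_native::mem) {
  fftw_native::fftw_free(mem);
}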
```

This has the unfortunate downside that a user of the API can extract the `vec` from the `type mem`, and let the `type mem` itself go out of scope (or otherwise become dead).  When the `type mem` itself becomes dead, the resource can immediately be freed, even though the native vector may still be live; this can violate safety.

A native vector, then, may wish to have an introduction form that includes a resource pointer associated with it, for built-in cleanup.
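
One way to picture such an introduction form, sketched in present-day Rust (a later dialect, used only as an analogue): the vector and its cleanup live in a single value, so the vector cannot be extracted and outlive the resource.  `OwnedNativeVec` is a hypothetical name, and the global allocator stands in for `fftw_malloc` and `fftw_free` so that the sketch runs without FFTW.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ops::{Index, IndexMut};

/// Hypothetical native vector that carries its own cleanup.
pub struct OwnedNativeVec {
    len: usize,
    data: *mut f64,
    layout: Layout,
}

impl OwnedNativeVec {
    /// Stand-in for `fftw_malloc(n * 8)`; returns zero-filled memory.
    pub fn malloc(n: usize) -> Self {
        assert!(n > 0, "zero-length buffers are not handled in this sketch");
        let layout = Layout::array::<f64>(n).expect("size overflow");
        let data = unsafe { alloc_zeroed(layout) } as *mut f64;
        if data.is_null() {
            handle_alloc_error(layout);
        }
        OwnedNativeVec { len: n, data, layout }
    }

    pub fn len(&self) -> usize {
        self.len
    }
}

impl Index<usize> for OwnedNativeVec {
    type Output = f64;

    fn index(&self, i: usize) -> &f64 {
        assert!(i < self.len, "index out of bounds");
        unsafe { &*self.data.add(i) }
    }
}

impl IndexMut<usize> for OwnedNativeVec {
    fn index_mut(&mut self, i: usize) -> &mut f64 {
        assert!(i < self.len, "index out of bounds");
        unsafe { &mut *self.data.add(i) }
    }
}

impl Drop for OwnedNativeVec {
    /// Stand-in for `fftw_free`; runs once, when the vector itself dies.
    fn drop(&mut self) {
        unsafe { dealloc(self.data as *mut u8, self.layout) };
    }
}
```

Because the cleanup is part of the vector rather than a sibling field, there is no separate handle whose death could free the memory while the vector is still live.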
## Conclusion

In this document, I describe a new form of vector called the native vector.  The native vector allows Rust to safely interact with external memory.  The introduction form is unsafe, so although it would permit inter-task shared memory communication, it does not do so in a particularly novel fashion.  This proposed solution addresses all of the mentioned use cases in what seems (to the untrained eye!) like an elegant, Rust-like fashion.

Thoughts?


Native pointer vectors #1167
