The docs at https://llvm.org/docs/Atomics.html#notatomic mention that when there is a data race, loads return undef. What if the load is volatile? Does it also return undef? Does this mean that the write that participates in the race does not invoke undefined behavior? What happens with write-write races? What if the non-atomic writes involved are volatile?
The effects of "volatile" are target-specific... LangRef mentions that. I guess that means we won't return undef, but it's not really worth spending any time to codify it, I think. The compiler won't try to optimize volatile races, and users shouldn't be using volatile for cross-thread synchronization. Under the current model, races are not UB on the writing side. I guess that's more generous than the C++ standard model; it theoretically blocks certain obscure optimizations which involve loading the stored value. Maybe we could look into changing it. IIRC someone pointed out at one point that the model also specifies behavior for overlapping atomic operations which might not match hardware. The specification wasn't really written very carefully originally.
> Under the current model, races are not UB on the writing side.

Does this include write-write races? *Something* has to happen there, it can't just be "one of the two wins", that's not compatible with plenty of optimizations.
(In reply to Ralf Jung from comment #2)
> Does this include write-write races? *Something* has to happen there, it
> can't just be "one of the two wins", that's not compatible with plenty of
> optimizations.

Neither wins; under the LangRef model, it just produces undef if you try to read the result.
@Eli Volatile forces the reads to be emitted, but if two consecutive unsynchronized reads return different values, then there must have been a data race, because some other thread of execution must have modified the memory. Is that right? Is the following "optimization" sound? From

```
volatile int* p;
int a = *p;
int b = *p;
```

into

```
volatile int* p;
int a = *p;
int b = *p;
if (a != b) { a = undef; b = undef; }
```

If a read-write race returns undef, then volatile appears to be useful only to force reads/writes with side effects; but without a way to "launder" the undef (e.g., llvm.freeze), inspecting the value of a read feels useless.
(In reply to Gonzalo BG from comment #4)
> Is the following "optimization" sound?

There isn't really a yes-or-no answer to this. We don't, and can't, promise that arbitrary volatile accesses are usable for cross-thread synchronization. Even if we implemented no optimizations at all, the underlying instruction sets don't allow us to guarantee that you'll get a useful result for all combinations of type/alignment.

There are two interesting constraints on the optimizations LLVM can perform which apply to well-behaved code. One, volatile accesses don't necessarily access regular memory. Two, we want "volatile sig_atomic_t" to work the way the C standard specifies, where it's feasible.

Also, even if we could theoretically optimize out a volatile access, we probably wouldn't. We want to be friendly to anyone using volatile for other reasons, like making assembly more readable, or trying to work around a compiler bug.
> Neither wins; under the LangRef model, it just produces undef if you try to read the result.

Oh, that was not clear to me. The docs at https://llvm.org/docs/Atomics.html say that "it could indirectly introduce undefined behavior if another thread can access x at the same time"; we interpreted this as meaning that the write-write race is UB (or reading from it is), but I guess what it means is that the read becomes undef and *that* can be UB? I propose a patch like this to make it clearer: https://github.com/RalfJung/llvm-project/commit/e03cea06010c7b02df2c44772c8df052344d73d7

This is an interesting model. Do you know of any formal study of such a memory model?
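To double-check my reading, here is a minimal sketch of the scenario (plain pthreads; note that at the C source level this program is UB by the C standard's own rules, so it only illustrates the LLVM-level model being described):

```c
/* Two unsynchronized non-atomic stores race with each other.
 * Under the LangRef model as described above, neither store is
 * UB by itself; instead the racing location ends up holding
 * undef, and *using* the loaded value is where UB can enter. */
#include <pthread.h>

static int x;  /* non-atomic shared location */

static void *write1(void *arg) { x = 1; return NULL; }
static void *write2(void *arg) { x = 2; return NULL; }

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, write1, NULL);
    pthread_create(&t2, NULL, write2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* The joins synchronize, so this load does not itself race;
     * but after the write-write race, it reads undef rather than
     * "1 or 2", and returning/branching on it can become UB. */
    return x;
}
```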
> The effects of "volatile" are target-specific... LangRef mentions that. I guess that means
> we won't return undef, but it's not really worth spending any time to codify it, I think.
> The compiler won't try to optimize volatile races, and users
> shouldn't be using volatile for cross-thread synchronization.

I think it would be worth a lot to codify that! There are very few ways to do anything with undef in LLVM without causing UB down the line, so having a guarantee like "volatile reads never return undef" would be *enormously helpful* for code that has to deal with uninitialized memory or padding bytes. The interaction with concurrency here is just a minor detail, but would also be welcome.

For example, code that debug-prints an arbitrary array of bytes is currently UB if it tries to print a padding byte. If volatile reads could be used to avoid that, that'd be useful. Similar for something like `memcmp`.

As another example, algorithms like https://research.swtch.com/sparse are incorrect in LLVM because of the "unstable" nature of undef (there is UB in `is-member`). However, if volatile reads never return undef, `is-member` could be fixed to use a single volatile read of `sparse[i]`. (Of course it would be better if something less optimization-inhibiting could be used, but having *some* correct way of doing this is better than having none, even if the only correct way is not as fast as it could be.)
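To make the debug-print example concrete, a sketch of the kind of code the guarantee would make well-defined (assuming, hypothetically, that volatile reads never return undef, which is exactly what LLVM does not currently promise):

```c
/* Dump every byte of an object, including padding, via volatile
 * reads. This is only well-defined under the hypothetical
 * "volatile reads never return undef" guarantee discussed above. */
#include <stdio.h>
#include <stddef.h>

struct S { int i; char c; };  /* typically followed by 3 padding bytes */

static void debug_dump(const void *p, size_t n) {
    const volatile unsigned char *b = p;
    for (size_t k = 0; k < n; k++)
        printf("%02x ", b[k]);  /* one volatile read per byte */
    printf("\n");
}

int main(void) {
    struct S s;
    s.i = 1;
    s.c = 'x';  /* the padding bytes remain uninitialized */
    debug_dump(&s, sizeof s);
    return 0;
}
```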
> but I guess what it means is that the read becomes undef and *that* can be UB?

Yes.

> This is an interesting model. Do you know of any formal study of such a memory model?

Atomics were implemented before anyone spent any significant effort formalizing undef/poison. The work that has been done (https://www.cs.utah.edu/~regehr/papers/undef-pldi17.pdf etc.) hasn't addressed multithreaded programs specifically, as far as I know.
In terms of adding semantics to volatile, I think that's not a good approach. There's way too much history of under-defined rules and weird semantics associated with volatile operations. And I don't want to unintentionally constrain better solutions for handling "undefined" memory like padding, given we're still not completely sure what they look like.
I'd love it if there were a better way to handle "undef"/"poison" memory than volatile, but is there anything on the horizon? I know there were proposals to add a "freeze" operation/intrinsic, but that seems to have stalled, unfortunately. I also don't see how saying that volatile reads return "frozen" memory (basically giving them READ_ONCE semantics) constrains other solutions. But I appreciate that the interactions can be hard to predict.
This issue seems interesting, and I agree that it would be great to have a construct that guarantees a load always reads a non-undef value. But I'm not sure volatile should be the one: we can define volatile loads to return non-undef values and let programmers/compilers use that for correctness, but I suspect they wouldn't be satisfied in the long run. Volatile loads cannot be vectorized / hoisted / removed, and people who want their code to be optimized will request another solution which allows their code to be faster and smaller.

Regarding the padding and undef thing, I suggest another solution: what about defining an intrinsic, say `llvm.freeze_mem(ptr, size)`, which fills all undef bits / poison values in [ptr, ptr+size) with random bits? For example:

```
p = alloca {i32, i8}     // say the last 3 bytes are undef (padding bytes)
llvm.freeze_mem(p, 8)    // now the last 3 bytes are random integers, not undef
print(((char *)p)[7])    // no UB: p[7] is frozen, it is just a random integer
```

`llvm.freeze_mem` is a no-op in assembly, so it wouldn't cause any runtime overhead. Also, value analysis / alias analysis can work well on `llvm.freeze_mem`. Any thoughts?
> Any thoughts?

I think it is worth remarking that since `llvm.freeze_mem` does not touch the memory and does not actually write random bits to it, every time you read the memory you might get a different result. This happens, for example, if you `madvise` a memory page with `MADV_FREE` and then read from it without writing to it first.
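For instance, a Linux-specific sketch (whether the values actually differ depends on the kernel reclaiming the page between the two reads, so this shows a possible outcome, not a guaranteed one):

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 4096;
    unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return 1;
    memset(mem, 0xAA, len);        /* dirty the page */
    madvise(mem, len, MADV_FREE);  /* kernel may now reclaim it lazily */

    volatile unsigned char *p = mem;  /* volatile so the reads aren't merged */
    unsigned char a = p[0];           /* may still read 0xAA ...             */
    unsigned char b = p[0];           /* ... or 0x00 if reclaimed in between */
    printf("%02x %02x\n", a, b);      /* "aa 00" is a legitimate outcome     */
    return 0;
}
```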
> Volatile loads cannot be vectorized / hoisted / removed, and people who want their code to be optimized will request another solution which allows their code to be faster and smaller.

As I said above, this is an "interim solution" -- it is something that a correct interpretation of volatile semantics pretty much has to guarantee anyway, and the alternative seems to be to wait at least several more years until the freeze/undef/poison issue is finally resolved.

> what about defining an intrinsic, say `llvm.freeze_mem(ptr, size)`, which fills all undef bits / poison values in [ptr, ptr+size) with random bits?

There has been a long discussion about this on the Rust side in https://github.com/rust-lang/rust/pull/58363 -- but without LLVM-side support, Rust would have to resort to hacks to implement this (e.g., using an empty inline assembly block with the right annotations).

> llvm.freeze_mem is a no-op in assembly

As mentioned by Gonzalo, this would not be a correct implementation. Real memory can change until written for the first time [1,2]. The implementation would have to make sure to write to every page in the given memory range at least once.

[1]: https://internals.rust-lang.org/t/what-the-hardware-does-is-not-what-your-program-does-uninitialized-memory/10561/27?u=ralfjung
[2]: https://www.youtube.com/watch?v=kPR8h4-qZdk&feature=youtu.be&t=1150
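For concreteness, a library-level sketch of what "write to every page at least once" amounts to (the function name and the fixed page size are illustration-only assumptions; a real lowering inside LLVM could be much smarter about picking the touched addresses):

```c
#include <stddef.h>

#define PAGE_SIZE 4096u  /* assumed here; real code would query the target/OS */

/* Hypothetical fallback for llvm.freeze_mem(ptr, size): a volatile
 * byte-sized read/write round-trip per page, so every page overlapping
 * [ptr, ptr+size) is written at least once and its contents can no
 * longer change "by themselves" (e.g. via MADV_FREE reclaim). */
static void freeze_mem_fallback(void *ptr, size_t size) {
    volatile unsigned char *p = ptr;
    for (size_t off = 0; off < size; off += PAGE_SIZE) {
        unsigned char byte = p[off];  /* read ...           */
        p[off] = byte;                /* ... and write back */
    }
    if (size != 0) {                  /* ptr may not be page-aligned, */
        unsigned char byte = p[size - 1];
        p[size - 1] = byte;           /* so also pin the final page   */
    }
}
```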
(In reply to Gonzalo BG from comment #12)
> This happens, for example, if you `madvise` a memory page with `MADV_FREE`
> and then read from it without writing to it first.

(In reply to Ralf Jung from comment #13)
> > llvm.freeze_mem is a no-op in assembly
>
> As mentioned by Gonzalo, this would not be a correct implementation. Real
> memory can change until written for the first time [1,2].

This looks pretty interesting. So the OS/hardware can also contribute to the nondeterminism. This means that, to make llvm.freeze_mem correct, at least one byte write per page should happen.

If llvm.freeze_mem can freely choose which address to write, wouldn't it still be lowered to a no-op in many cases? Given llvm.freeze_mem(ptr, size), what llvm.freeze_mem does is equivalent to

```
movb (ptr2), %al
movb %al, (ptr2)
```

where ptr2 is any pointer chosen from the range [ptr, ptr+size). Then, whenever there's any other store following llvm.freeze_mem, we can lower llvm.freeze_mem to a no-op, because we can choose ptr2 so that it is equivalent to the store's address.

```
p = alloca {i32, i8}
llvm.freeze_mem(p, 8)    // chooses any p2 s.t. p <= p2 < p+8, and does the round-trip
store 255, (gep p, 1)    // we can say that p2 was (gep p, 1), so lower llvm.freeze_mem into a nop
```

Things may get complicated if the size given to llvm.freeze_mem is larger than the page size, but we can first support the case where the size is smaller than a page and expand it later.

> There has been a long discussion about this on the Rust side in
> https://github.com/rust-lang/rust/pull/58363 -- but without LLVM-side
> support, Rust would have to resort to hacks to implement this (e.g., using an
> empty inline assembly block with the right annotations).

Yeah, I think existing optimizers / analyses should be aware of llvm.freeze_mem, otherwise this will block many transformations. The discussion was quite long and I couldn't read all of it, so if the idea above has already appeared in the PR, let me know. If you don't mind, I can write down an initial version of llvm.freeze_mem (with minimal support from existing optimizations/analyses) this weekend and share it.
I certainly don't mind. I had expected a value-based `freeze` to happen first, as that seemed easier, but maybe that was misguided.
> This means that, to make llvm.freeze_mem correct, at least one byte write per page should happen.

It depends on which semantics you want freeze to have. If you want freeze to always return the same value when the memory isn't modified, then you need to do this, at least on targets where that can happen (e.g., Linux with MADV_FREE is such a target). Alternatively, freeze could just aim to guarantee that undef is not read, and that's it. We could have some other intrinsic that also guarantees that the same value is read every time, but I don't know how useful that is.
Since LLVM will duplicate reads, memory that can change "by itself" *is* `undef`.