The existing distribution logic in the vector dialect ([Vector] Vector distribution (large vector to small vector) - #23 by ThomasRaoux) works well together with the xegpu op distribution patterns ([MLIR][XeGPU] Xegpu distribution patterns for load_nd, store_nd, and create_nd_tdesc. by kurapov-peter · Pull Request #112945 · llvm/llvm-project · GitHub); a crude example of stitching the xegpu and vector distribution patterns together is in [MLIR][XeGPU] XeGPU ops distribution demo by kurapov-peter · Pull Request #111989 · llvm/llvm-project · GitHub.
warp_execute_on_lane_0 currently lives in the vector dialect and hence expects only vector types to be distributed. XeGPU introduces another type (a tensor descriptor) that should also be distributed. Although the only change the op needs in order to accommodate the new type is in distributed-type validation, nothing in the distribution process itself is specific to the vector dialect. The warp_execute_on_lane_0 op could in principle distribute any shaped type (or a narrower set of DistributionTypeInterface implementations, as suggested in the PR), so it doesn't really belong in the vector dialect.
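To make the type question concrete, here is a minimal sketch of a tensor descriptor being distributed through the op, assuming the relaxed verifier; the shapes, the subgroup size of 16, and the per-lane result types are illustrative, not necessarily the exact distribution the patterns produce:

```mlir
// Sketch: the region works with subgroup-level types; the op's results
// carry the per-lane (distributed) types. With a relaxed verifier, the
// !xegpu.tensor_desc result is distributed just like the vector result.
func.func @sketch(%src: memref<16x16xf32>, %laneid: index)
    -> (!xegpu.tensor_desc<16x1xf32>, vector<16x1xf32>) {
  %r:2 = vector.warp_execute_on_lane_0(%laneid)[16]
      -> (!xegpu.tensor_desc<16x1xf32>, vector<16x1xf32>) {
    // Subgroup-level descriptor for a 16x16 tile.
    %tdesc = xegpu.create_nd_tdesc %src[0, 0]
        : memref<16x16xf32> -> !xegpu.tensor_desc<16x16xf32>
    // Subgroup-level load of the full tile.
    %v = xegpu.load_nd %tdesc
        : !xegpu.tensor_desc<16x16xf32> -> vector<16x16xf32>
    // Yield subgroup-level values; each lane ends up with a 16x1 slice.
    vector.yield %tdesc, %v
        : !xegpu.tensor_desc<16x16xf32>, vector<16x16xf32>
  }
  return %r#0, %r#1 : !xegpu.tensor_desc<16x1xf32>, vector<16x1xf32>
}
```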
A natural solution would be to move the op out of the vector dialect; however, there seems to be no good home for it at the moment. Hence this proposal: a distribution dialect that would hold operations such as warp_execute_on_lane_0 along with all the distribution-related logic.
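For illustration only, the example above might then be spelled with a dialect-neutral op; the dialect name, op name, and terminator below are placeholders, not an agreed design:

```mlir
// Hypothetical spelling under the proposed dialect. Only the idea
// (distribution logic independent of the vector dialect) is from the
// proposal; every name here is a placeholder.
%r = distribution.warp_execute_on_lane_0(%laneid)[16]
    -> (!xegpu.tensor_desc<16x1xf32>) {
  %tdesc = xegpu.create_nd_tdesc %src[0, 0]
      : memref<16x16xf32> -> !xegpu.tensor_desc<16x16xf32>
  distribution.yield %tdesc : !xegpu.tensor_desc<16x16xf32>
}
```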
Today there is a simple solution to the immediate problem (a small change to the op validation: [MLIR][Vector] Allow any shaped type to be distributed for vector.wa… by kurapov-peter · Pull Request #114215 · llvm/llvm-project · GitHub), so this proposal is only an idea for a cleaner design.
I’d like to collect feedback and concerns first, if any. Thoughts?