Linalg.generic and reading elements two-by-two

Hi all,

I am still quite new to all things MLIR-related, and I was wondering whether linalg.generic offers a way to read two elements at a time from a tensor/memref (perhaps with a stride).

I am thinking of a way to write something like:

linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel"]} 
   ins(%in : tensor<32xf32, strides:[2]>) outs(%out : tensor<16xf32, strides:[1]>) {
        ^bb0(%x: f32, %y: f32, %z: f32):
          %res = arith.addf %x, %y : f32
          linalg.yield %res : f32
   }

where x and y would both come from the input (%in), and be two consecutive elements.

Thanks in advance!

This looks like a reduction on %in, and you can represent it as one in linalg: use tensor.expand_shape to view the input as a 16x2 tensor, then reduce over the new inner dimension. It would look like this:

%cst = arith.constant 0.0 : f32
%0 = tensor.expand_shape %in [[0, 1]] : tensor<32xf32> into tensor<16x2xf32>
// The accumulator must be initialized to the identity value of the reduction (0.0 for addition); this usually gets optimized away after vectorization.
%1 = linalg.init_tensor [16] : tensor<16xf32>
%out = linalg.fill ins(%cst : f32) outs(%1 : tensor<16xf32>) -> tensor<16xf32>
%sum = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>, affine_map<(d0, d1) -> (d0)>], iterator_types = ["parallel", "reduction"]} 
   ins(%0 : tensor<16x2xf32>) outs(%out : tensor<16xf32>) {
        ^bb0(%x: f32, %z: f32):
          // %x is one element of a pair, %z is the running sum for that pair.
          %res = arith.addf %x, %z : f32
          linalg.yield %res : f32
   } -> tensor<16xf32>

Thank you for your answer!

I didn’t make it clear that my question was more general than this specific example; my apologies. I am actually wondering whether the computation block in a linalg.generic can act on, say, pairs of consecutive elements of a tensor, and not only for a reduction like this one.

From what I gathered, there’s no way to use a block that takes anything other than a scalar value of the element type of the input (and output) tensor, right?

For a more involved example, say I have two tensor<MxNx2xT> as inputs and a tensor<MxNx2xT> as output. I want to use these to represent tensors of complex numbers (the last dimension separates the real and imaginary parts) and implement complex multiplication. I would thus need to read 2 consecutive elements at a time (the real and imaginary parts) from each input and write 2 consecutive elements to my output. Is there any way to do this in linalg (other than using the complex<T> element type)?

The following does not work because the computation block only accepts basic element types, but is there a general way to simulate it?

linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} 
   ins(%in1, %in2 : tensor<1x8x2xf32>, tensor<1x8x2xf32>) outs(%out : tensor<1x8x2xf32>) {
        ^bb0(%x: tensor<2xf32>, %y: tensor<2xf32>, %z: tensor<2xf32>):
          %x_im = tensor.extract %x[0] : tensor<2xf32>
          %x_re = tensor.extract %x[1] : tensor<2xf32>
          %y_im = tensor.extract %y[0] : tensor<2xf32>
          %y_re = tensor.extract %y[1] : tensor<2xf32>
          // (...) complex multiplication operations omitted
          %z_im = tensor.insert %res_im into %z[0] : tensor<2xf32>
          %z_re = tensor.insert %res_re into %z_im[1] : tensor<2xf32>
          linalg.yield %z_re : tensor<2xf32>
   }

The element type you see in the generic is indeed the element type of the tensor and there is no support for tensor<...xtensor<2xf32>>.

However, you could use tensor<1x8xvector<2xf32>>, and the block arguments would then be:

^bb0(%x: vector<2xf32>, %y: vector<2xf32>, %z: vector<2xf32>)
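
For context, a minimal sketch of how the whole op might look with a vector element type. The function name @pairwise and the %init operand are made up for illustration, and the body is just a lane-wise product to show that %x and %y each carry two values; it is not a complex multiplication.

```mlir
#map = affine_map<(d0, d1) -> (d0, d1)>

// Hypothetical function; only the element type (vector<2xf32>) comes from the thread.
func.func @pairwise(%a: tensor<1x8xvector<2xf32>>,
                    %b: tensor<1x8xvector<2xf32>>,
                    %init: tensor<1x8xvector<2xf32>>) -> tensor<1x8xvector<2xf32>> {
  %r = linalg.generic {indexing_maps = [#map, #map, #map],
                       iterator_types = ["parallel", "parallel"]}
      ins(%a, %b : tensor<1x8xvector<2xf32>>, tensor<1x8xvector<2xf32>>)
      outs(%init : tensor<1x8xvector<2xf32>>) {
    ^bb0(%x: vector<2xf32>, %y: vector<2xf32>, %z: vector<2xf32>):
      // Each block argument holds both values of one pair.
      // arith ops apply lane by lane on vectors.
      %p = arith.mulf %x, %y : vector<2xf32>
      linalg.yield %p : vector<2xf32>
  } -> tensor<1x8xvector<2xf32>>
  return %r : tensor<1x8xvector<2xf32>>
}
```

Getting from tensor<1x8x2xf32> to this layout is a separate question that this sketch does not address.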

I think we also have a complex type available somewhere?

There is a builtin complex type, and the complex dialect provides operations on it.
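
A minimal sketch of that route, assuming the data can be stored as tensor<1x8xcomplex<f32>> instead of tensor<1x8x2xf32>. The function name @cmul and the %init operand are illustrative and not from the posts above.

```mlir
#map = affine_map<(d0, d1) -> (d0, d1)>

// Hypothetical function: element-wise complex multiplication of two tensors.
func.func @cmul(%a: tensor<1x8xcomplex<f32>>,
                %b: tensor<1x8xcomplex<f32>>,
                %init: tensor<1x8xcomplex<f32>>) -> tensor<1x8xcomplex<f32>> {
  %r = linalg.generic {indexing_maps = [#map, #map, #map],
                       iterator_types = ["parallel", "parallel"]}
      ins(%a, %b : tensor<1x8xcomplex<f32>>, tensor<1x8xcomplex<f32>>)
      outs(%init : tensor<1x8xcomplex<f32>>) {
    ^bb0(%x: complex<f32>, %y: complex<f32>, %z: complex<f32>):
      // The block sees one complex value per operand, so the real and
      // imaginary parts travel together without an extra dimension.
      %p = complex.mul %x, %y : complex<f32>
      linalg.yield %p : complex<f32>
  } -> tensor<1x8xcomplex<f32>>
  return %r : tensor<1x8xcomplex<f32>>
}
```

Here %z (the current output element) is unused because every iterator is parallel; the output operand only provides the result shape.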