Quantized TOSA IR lowering to LLVM IR

So far I have successfully converted float TOSA IR to linalg-on-tensors and then to LLVM IR, and JIT-run the result.
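For context, that float path looks roughly like this (a sketch only; pass names are from recent upstream MLIR and the exact pass list depends on your version):

mlir-opt model.mlir \
  -pass-pipeline="builtin.module(func.func(tosa-to-linalg-named, tosa-to-linalg))" \
  -o linalg.mlir
# ...followed by the usual bufferization and LLVM-conversion passes
# (e.g. --one-shot-bufferize, --convert-linalg-to-loops, --convert-scf-to-cf,
# --finalize-memref-to-llvm, --convert-func-to-llvm, --reconcile-unrealized-casts)
# before JIT-running the result with mlir-cpu-runner.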

But for quantized TOSA IR, which includes the !quant.uniform<> type on some ops such as tosa.rescale and tosa.conv2d, the lowering fails.

Looking at tosa-to-linalg.mlir and TosaToLinalg.cpp in llvm/llvm-project on GitHub, it seems only float conv2d is supported in the TOSA to linalg-on-tensors conversion.

So is there a way to convert quantized TOSA IR to LLVM IR? If not, is there a plan to support it?

Thanks


Representing the zero-point shift of a couple of TOSA ops is still being worked on. IIRC, it is blocked by some cleanup in linalg that is planned but has not yet been gotten to. @gysit @rsuderman

I believe there is a separate set of patterns for lowering rescale that you need to opt in to (the most efficient way to represent it is target specific).


I am working on linalg extensions to improve support for scalar parameters. Together with other developments, this extension should enable support for more complex operations, so I expect some progress over the next few weeks.


@stellaraccident @gysit Thanks a lot, that really helps.

Hi,

I ended up here while looking for ideas on how to lower a quantized TFLite model. When I run mlir-opt -pass-pipeline="builtin.module(func.func(tosa-to-tensor, tosa-to-linalg-named, tosa-to-linalg, tosa-to-arith))" on an MLIR file generated from a quantized TFLite model, I get the error: custom op 'tosa.conv2d' has no custom assembly form.

Is this because I am not supposed to go through linalg for quantized TOSA ops? What is the alternative?

It seems you have an MLIR version mismatch, i.e., the textual format of the MLIR you are producing and the one you are consuming don't match, and the version skew is resulting in the error.

Trying with the mlir-opt from the same version I used to generate the higher-level representation gives me a different error, error: failed to legalize operation 'tosa.conv2d', and little more insight into what is wrong (it just prints the operation where it fails).

Could you run with --debug? That should hopefully say more. If you could extract a small repro, that would be great.

Sorry for the late response. I cannot share the model I am working on, unfortunately. Which command should I run with --debug?

mlir-opt, but you need to build with debugging enabled, otherwise you won't see the flag.
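For example (a sketch; --debug is only available in builds with assertions enabled, e.g. a Debug build or one configured with -DLLVM_ENABLE_ASSERTIONS=ON):

cmake -G Ninja ../llvm -DLLVM_ENABLE_PROJECTS=mlir \
      -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_ASSERTIONS=ON
ninja mlir-opt
# then re-run the failing pipeline with the extra flag:
mlir-opt --debug -pass-pipeline="..." input.mlir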

This is the debug output; it looks like something went wrong with the data types (I removed the constants to make the output more readable).

Legalizing operation : 'tosa.conv2d'(0x55dd4f2d74b0) {
  %21 = "tosa.conv2d"(%20, %17, %16) <{dilation = array<i64: 1, 1>, pad = array<i64: 0, 0, 0, 0>, quantization_info = #tosa.conv_quant<input_zp = -128, weight_zp = 0>, stride = array<i64: 1, 1>}> : (tensor<?x1024x1024x3x!quant.uniform<i8:f32, 0.0039215679280459881:-128>>, tensor<32x3x3x3x!quant.uniform<i8<-127:127>:f32:0, {...}>>, tensor<32x!quant.uniform<i32:f32:0, {...}>>) -> tensor<?x1022x1022x32xi32>

  * Fold {
  } -> FAILURE : unable to fold

  * Pattern : 'tosa.conv2d -> ()' {
Trying to match "(anonymous namespace)::ConvConverter<mlir::tosa::Conv2DOp, mlir::linalg::Conv2DNhwcFhwcOp, mlir::linalg::Conv2DNhwcFhwcQOp>"
ImplicitTypeIDRegistry::lookupOrInsert(mlir::arith::detail::ConstantOpGenericAdaptorBase::Properties)
ImplicitTypeIDRegistry::lookupOrInsert(mlir::InferIntRangeInterface::Trait<Empty>)
ImplicitTypeIDRegistry::lookupOrInsert(mlir::InferTypeOpInterface::Trait<Empty>)
    ** Insert  : 'arith.constant'(0x55dd4f34d140)
    ** Insert  : 'tensor.dim'(0x55dd4f34e680)
tf-opt: external/llvm-project/mlir/lib/IR/Types.cpp:124: unsigned int mlir::Type::getIntOrFloatBitWidth() const: Assertion `isIntOrFloat() && "only integers and floats have a bitwidth"' failed.

For reference, I quantized the trained model as follows:

# Post-training full-integer (int8) quantization with a representative dataset.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_generator
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

# Keep the model's input/output tensors as uint8.
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

quantized_tflite_model = converter.convert()

Could you get the stack trace for the assert failure? (You will probably want to build tf-opt with line numbers enabled. It's also a pity that MLIR reproducers aren't enabled here, else this would be easier and could avoid Python.)

I figured out what was wrong: I was missing -tosa-strip-quant-types in the earlier tf-opt command. Sorry for the bother.
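For anyone else landing here, the fix amounts to adding that pass to the tf-opt invocation, along these lines (a sketch; --tfl-to-tosa-pipeline stands in for whatever the earlier command used, and only --tosa-strip-quant-types is the missing piece):

tf-opt model.mlir --tfl-to-tosa-pipeline --tosa-strip-quant-types -o tosa.mlir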