Description
The `imatrix` tool, which computes an "importance matrix" that can be used to improve quantization accuracy, currently only works when run on the CPU, which is quite slow. In addition, when `llama.cpp` is built with CUDA support enabled, the call to the data collection function is bypassed, and one gets an empty result, which is inconvenient and leads to confusion.
Also, given the discussions around PRs #4897, #4861, #4856, #4773, where importance matrix capabilities were added to `llama.cpp`, there appears to be a lot of interest in experimenting with different training datasets to create the importance matrix. But experimentation is difficult given the much lower performance of the CPU compared to the GPU.
So, overall, it would be very useful to support importance matrix calculations on faster back-ends (CUDA, Metal, etc.).