
Add importance matrix calculation to non-CPU back-ends #4931

Closed
@ikawrakow

Description


The imatrix tool, which computes an "importance matrix" that can be used to improve quantization accuracy, currently only works when run on the CPU, which is quite slow. In addition, when llama.cpp is built with CUDA support enabled, the call to the data collection function is bypassed entirely, so one gets an empty result, which is inconvenient and confusing.

Also, given the discussions around PRs #4897, #4861, #4856, #4773, where importance matrix capabilities were added to llama.cpp, there appears to be a lot of interest in experimenting with different training datasets to create the importance matrix. But such experimentation is difficult given the much lower performance of the CPU compared to the GPU.

So, overall, it would be very useful to support importance matrix calculations on faster back-ends (CUDA, Metal, etc.).
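For reference, the importance matrix is, roughly speaking, a per-column running sum of squared activations for the input of each matrix multiplication, accumulated over the evaluated tokens. Below is a minimal sketch of that host-side accumulation, assuming the activations have already been copied from the back-end (CUDA, Metal, etc.) into a host buffer; the names `collect_imatrix` and `ImatrixEntry` are illustrative only and are not the actual llama.cpp code.

```cpp
// Sketch (not the actual llama.cpp implementation) of importance-matrix
// collection. For every matrix multiplication W * x, accumulate the squared
// activations of each input column over all evaluated tokens. With a GPU
// back-end, `activations` would be filled by copying the tensor's data from
// device memory back to the host before calling this function.
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct ImatrixEntry {
    std::vector<double> sum_sq;      // per-column sum of squared activations
    int                 n_calls = 0; // number of tokens accumulated so far
};

// Hypothetical collector: `tensor_name` identifies the weight tensor the
// activations feed into; `activations` holds n_tokens rows of n_cols floats
// (row-major), already resident in host memory.
void collect_imatrix(std::unordered_map<std::string, ImatrixEntry> & stats,
                     const std::string & tensor_name,
                     const float * activations, int n_cols, int n_tokens) {
    ImatrixEntry & entry = stats[tensor_name];
    entry.sum_sq.resize(n_cols, 0.0);
    for (int t = 0; t < n_tokens; ++t) {
        const float * row = activations + (size_t) t * n_cols;
        for (int c = 0; c < n_cols; ++c) {
            entry.sum_sq[c] += (double) row[c] * row[c];
        }
    }
    entry.n_calls += n_tokens;
}
```

The accumulation itself is cheap; the expensive part is the model evaluation, which is why running the forward pass on a fast back-end and only copying the activations back for collection would already be a large improvement.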

