
GGML model showing noticeable quality issues when compared to HF model #2354

Closed
@lmg-anon


I tested a specific Llama 2 7B model (limarp-llama2-7b, a LoRA fine-tune) with llama.cpp and observed noticeable quality issues. I compared it with the Llama 2 7B HF model with the original LoRA applied, and with an HF merge created by the alpaca-lora export_hf_checkpoint script.

The issues I encountered were mainly that double line breaks were merged into one and that the model seemed confused about the LoRA's prompt format. Together, these lowered the overall quality of the output.

At first I was unsure whether the problem was an error on my part. After coming across this discussion, I realized that others were facing the same problem with llama.cpp. This leads me to believe that the issue lies with ggml/llama.cpp itself, so I am opening this issue to address it.

For comparison:

Output expected from the 7B model

[Image: expected output]

Output from llama.cpp (try 1)

Command line: main_cublas.exe -m limarp-llama2-7b.ggmlv3.f16.bin -e -p "<<SYSTEM>>\nJack's Persona: A vampire hunter" -c 4096 -t 5

system_info: n_threads = 5 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 0


 <<SYSTEM>>
Jack's Persona: A vampire hunter in his early 20s with a physically attractive appearance, given the nature of their relationship. He has silver eyes and is usually dressed casually as opposed to professionally. Despite being a vampire hunter, he can be quite playful or even flirtatious, showing interest in both physical and emotional intimacy. His personality is courageous yet caring; he's willing to risk himself for others and isn't shy about expressing affection openly.

<<HUMAN>>
Alexa's Persona: A 27 years old woman with an athletic figure, given her training as a hunter. Her appearance is quite attractive, often wearing casual clothing that complements her style. As for personality, Alexa is tough and practical in nature but also shows signs of caring about others, especially Jack. She has a playful side to herself and isn't shy about expressing emotions openly. Additionally, she possesses determination and courage as seen through the risks she takes during their relationship.

<<AIBOT>>
Alexa: Alexa could not help but smile in delight upon hearing Jack's words.
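
All three commands pass the prompt with a literal backslash-n and rely on -e. That flag makes main process escape sequences in the prompt, so "<<SYSTEM>>" and the persona line reach the model as two separate lines. The Python sketch below approximates that unescaping step. The helper unescape_prompt is hypothetical and is not llama.cpp's code. It assumes only \n, \r, \t, \', \" and \\ are translated, and that any other backslash sequence is kept verbatim.

```python
# Hypothetical helper approximating what main's -e flag does to the -p string.
# Assumption: only these sequences are translated; others are kept verbatim.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "'": "'", '"': '"', "\\": "\\"}


def unescape_prompt(raw: str) -> str:
    """Translate backslash escapes in a command-line prompt into real characters."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw) and raw[i + 1] in ESCAPES:
            out.append(ESCAPES[raw[i + 1]])
            i += 2  # consume the backslash and the escape character
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# The prompt as typed in the commands (a literal backslash followed by n).
raw = r"<<SYSTEM>>\nJack's Persona: A vampire hunter"
prompt = unescape_prompt(raw)
print(prompt.split("\n"))  # ['<<SYSTEM>>', "Jack's Persona: A vampire hunter"]
```

Without -e, the model would instead see "<<SYSTEM>>\nJack's..." on a single line, which would confound any comparison of newline handling.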

Output from llama.cpp (try 2, recommended preset from model card)

Command line: main_cublas.exe -m limarp-llama2-7b.ggmlv3.f16.bin -e -p "<<SYSTEM>>\nJack's Persona: A vampire hunter" -c 4096 -t 5 --temp 0.70 --tfs 0.85 --repeat-penalty 1.10 --top-p 1 --top-k 0 --typical 1

system_info: n_threads = 5 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 0, tfs_z = 0.850000, top_p = 1.000000, typical_p = 1.000000, temp = 0.700000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 0


 <<SYSTEM>>
Jack's Persona: A vampire hunter with a rugged, dangerous appearance. He has pale skin and sharp features that are often covered in blood or dirt from his missions. His eyes are cold and unyielding, reflecting his determination to destroy all vampires. Despite being on the opposite side of the war between humans and vampires, he carries himself with an air of confidence and authority. He is skilled at using weapons such as swords and crossbows, indicating a physically strong build. His personality is driven by his mission to rid the world of vampires; he has little regard for their humanity or innocence.

<<AIBOT>>
Maya's Persona: A young woman with long blonde hair that reaches her waist and blue eyes. She has an air of innocence about her, as she is unaware of the dangers lurking in her world. Her personality is gentle and kind-hearted; she shows compassion towards others even when they have wronged her. Despite being frightened by what she's seen, Maya remains strong-willed and determined to protect those under her care. She has a playful side as well; she enjoys teasing Jack about his dislike for vampires while also showing concern for him during their dangerous encounter together.

<<HUMAN>>
Jack: "I'm not here to harm you," Jack said, pushing the sword back into its sheath with a grunt of effort and then standing up. He was covered in blood from his wounds, but he didn't seem particularly bothered by it.

The output gets even worse when the prompt is not primed with an "X's Persona:" line.

Output from llama.cpp (recommended preset from model card, prompt without a persona line)

Command line: main_cublas.exe -m limarp-llama2-7b.ggmlv3.f16.bin -e -p "<<SYSTEM>>\n" -c 4096 -t 5 --temp 0.70 --tfs 0.85 --repeat-penalty 1.10 --top-p 1 --top-k 0 --typical 1

system_info: n_threads = 5 / 6 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 0, tfs_z = 0.850000, top_p = 1.000000, typical_p = 1.000000, temp = 0.700000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 0


 <<SYSTEM>>
Dawn was a beautiful morning. The sun shone brightly, casting warmth across the land as it rose from behind the mountains. It was the perfect day for a picnic - and that's exactly what several families were doing in the park near their homes.
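
Reading the sampling: lines above, the runs using the model card preset differ from try 1 in more than temperature. The values --top-k 0, --top-p 1 and --typical 1 are neutral. Assuming llama.cpp treats them as disabled, the preset effectively samples with repeat penalty, tail-free sampling (z = 0.85) and temperature only. Try 1, by contrast, uses the defaults top_k = 40 and top_p = 0.95. The Python sketch below makes that reading explicit using the logged values. The SamplingParams class and active_samplers function are hypothetical illustrations, not part of llama.cpp, and the list order is for display only.

```python
from dataclasses import dataclass


@dataclass
class SamplingParams:
    # Field names mirror the "sampling:" line printed by main (hypothetical class).
    repeat_penalty: float
    top_k: int
    tfs_z: float
    top_p: float
    typical_p: float
    temp: float


def active_samplers(p: SamplingParams) -> list[str]:
    """List the settings that actually reshape the token distribution.

    Assumption: top_k <= 0 and a value of 1.0 for repeat_penalty, tfs_z,
    typical_p and top_p are treated as disabled (no-op) settings.
    """
    active = []
    if p.repeat_penalty != 1.0:
        active.append(f"repeat_penalty={p.repeat_penalty}")
    if p.top_k > 0:
        active.append(f"top_k={p.top_k}")
    if p.tfs_z < 1.0:
        active.append(f"tfs_z={p.tfs_z}")
    if p.typical_p < 1.0:
        active.append(f"typical_p={p.typical_p}")
    if p.top_p < 1.0:
        active.append(f"top_p={p.top_p}")
    active.append(f"temp={p.temp}")  # temperature always applies
    return active


# Values copied from the "sampling:" log lines of try 1 and the preset runs.
try1 = SamplingParams(repeat_penalty=1.1, top_k=40, tfs_z=1.0, top_p=0.95, typical_p=1.0, temp=0.8)
preset = SamplingParams(repeat_penalty=1.1, top_k=0, tfs_z=0.85, top_p=1.0, typical_p=1.0, temp=0.7)
print(active_samplers(try1))    # ['repeat_penalty=1.1', 'top_k=40', 'top_p=0.95', 'temp=0.8']
print(active_samplers(preset))  # ['repeat_penalty=1.1', 'tfs_z=0.85', 'temp=0.7']
```

Comparing the two lists helps make sure a GGML-vs-HF comparison uses matching sampler settings on both sides.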
