
[Enhancement]: Global probability maximization with backtracking for better results. (aka Beam search) #1392

Closed
@elephantpanda

Description


This is an idea I had which could theoretically produce better results. I thought about it years ago in relation to how humans think.

The way it works is this.

First, generate tokens in the usual manner. (Let's assume the temperature is set quite low, so we select the token with the highest probability each time.) Let our prompt be "The biggest animal in the world".

Then the next tokens are:

"is" 99%
"the" 99%
"giraffe" 50%

So we keep going until it stops. We can give this sentence a confidence score; perhaps this is just the product of the probabilities of all the selected tokens (equivalently, the sum of their log-probabilities), or something more elaborate.
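As a concrete illustration, here is a minimal sketch in Python of such a confidence score, assuming we use the sum of log-probabilities (equivalent to the product of the probabilities, but numerically stable for long sequences):

```python
import math

def sequence_confidence(token_probs):
    # Sum of log-probabilities: equivalent to the product of the
    # probabilities, but avoids floating-point underflow on long sequences.
    return sum(math.log(p) for p in token_probs)

# Probabilities from the example above: "is" 99%, "the" 99%, "giraffe" 50%
print(sequence_confidence([0.99, 0.99, 0.50]))  # ~ -0.713
```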

This is our first option. But we don't just say the first thing that comes into our heads! So the next thing we do is go back to the token with the smallest probability, in this case "giraffe", look at the next-highest token, "blue whale", and continue on from there, creating a second sentence which we can also give a confidence score. Note: the further back in the sentence we go, the more time it takes to generate a new sentence, so we have to estimate the payoff each time.

Our aim is to produce a sentence with the highest overall confidence score we can within a given time limit. The algorithm that does this can be worked out from probability theory.

Note that even though we choose a less likely token at some point, the overall confidence score of the whole sentence could be higher.
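For illustration, here is a rough Python sketch of that backtrack-and-retry loop. This is only a sketch under stated assumptions: `model.top_tokens(tokens)` is a hypothetical interface returning `(token, probability)` pairs sorted by descending probability, `eos` is the end-of-sequence token, and the score is the sum of log-probabilities as above:

```python
import math
import time

def backtracking_decode(model, prompt_tokens, time_limit=5.0, eos=2):
    """Greedily decode once, then repeatedly back up to the least-confident
    position, substitute the runner-up token there, regenerate greedily,
    and keep whichever full sequence scores best within the time limit."""

    def greedy_continue(tokens):
        # Extend `tokens` greedily until the end-of-sequence token
        # (with a crude length cap so the sketch always terminates).
        probs = []
        while tokens[-1] != eos and len(tokens) < 512:
            tok, p = model.top_tokens(tokens)[0]
            tokens = tokens + [tok]
            probs.append(p)
        return tokens, probs

    deadline = time.time() + time_limit
    best_tokens, best_probs = greedy_continue(list(prompt_tokens))
    best_score = sum(math.log(p) for p in best_probs)

    tried = set()  # generated positions we have already backtracked to
    while time.time() < deadline:
        candidates = [(p, i) for i, p in enumerate(best_probs) if i not in tried]
        if not candidates:
            break
        _, i = min(candidates)          # least-confident generated position
        tried.add(i)
        pos = len(prompt_tokens) + i    # absolute index of that token
        alts = model.top_tokens(best_tokens[:pos])
        if len(alts) < 2:
            continue
        alt_tok, alt_p = alts[1]        # runner-up token at that position
        tokens, probs = greedy_continue(best_tokens[:pos] + [alt_tok])
        probs = best_probs[:i] + [alt_p] + probs
        score = sum(math.log(p) for p in probs)
        if score > best_score:          # a less likely token early on can
            best_tokens, best_probs = tokens, probs   # still win overall
            best_score = score
            tried = set()               # positions now refer to the new sequence
    return best_tokens, best_score
```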

The assumption is that a sentence that "makes sense" and is "self-consistent" will have a higher overall confidence score than one that is more random, since more parts of the sentence will follow naturally from what precedes them. On the other hand this may just tend to produce sentences which over-fit. I don't know, I haven't tried it!!!

I'm sure this has been thought of before and no doubt someone has written a paper about it!


A more elaborate scheme would be for the LLM to split into two when it gets to a particularly low-probability token, and then have some overall controller which dictates which branch to run one step at a time, perhaps depending on its current confidence score. This may have the disadvantage that none of the branches makes it to the end of the sentence before the time limit. (Equivalent to a human who can't think of what to say.)
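This variant is essentially a best-first search over partial sequences. Here is a minimal sketch, assuming the same hypothetical `model.top_tokens` interface as above, with a priority queue acting as the controller that always advances the most promising branch by one step:

```python
import heapq
import math
import time

def best_first_decode(model, prompt_tokens, branch=2, time_limit=5.0, eos=2):
    """Keep a priority queue of partial sequences keyed by confidence
    (sum of log-probs); each step, pop the most promising one and split
    it into its top `branch` continuations."""
    heap = [(0.0, list(prompt_tokens))]  # (negated score, tokens)
    best = None
    deadline = time.time() + time_limit
    while heap and time.time() < deadline:
        neg_score, tokens = heapq.heappop(heap)
        if tokens[-1] == eos:  # a finished sentence
            if best is None or -neg_score > best[0]:
                best = (-neg_score, tokens)
            continue
        # Split: push the top `branch` continuations back onto the queue.
        for tok, p in model.top_tokens(tokens)[:branch]:
            heapq.heappush(heap, (neg_score - math.log(p), tokens + [tok]))
    return best  # None if no branch finished before the time limit
```

If the time limit is hit before any branch reaches the end-of-sequence token, the function returns None, which corresponds to the failure mode described above.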

The drawback of this scheme is that although a human can re-think a sentence, they remember the old sentence, whereas this method loses memory of the old sentence past the token it is changing.

For the confidence scoring one might also use the entire output distribution of the LLM, not just the probability of the selected token.
