
Class: DraftSequenceTokenPredictor

Defined in: evaluator/LlamaContext/tokenPredictors/DraftSequenceTokenPredictor.ts:20

Predicts the next tokens by evaluating the current state of the target sequence on a draft sequence from a smaller and faster draft model.

See

Using Token Predictors: Draft Model Token Predictor

Extends

TokenPredictor

Constructors

new DraftSequenceTokenPredictor()

ts
new DraftSequenceTokenPredictor(draftSequence: LlamaContextSequence, options: {
  minTokens: number;
  maxTokens: number;
  evaluateOptions: Pick<SequenceEvaluateOptions, 
     | "contextShift"
     | "evaluationPriority"
     | "temperature"
     | "minP"
     | "topK"
     | "topP"
     | "seed"
     | "repeatPenalty"
     | "tokenBias">;
  minConfidence: number;
 }): DraftSequenceTokenPredictor

Defined in: evaluator/LlamaContext/tokenPredictors/DraftSequenceTokenPredictor.ts:41

Parameters

draftSequence: LlamaContextSequence

options: object (see the signature above)

options.minTokens? (number): The minimum number of tokens to draft. Defaults to 0.

options.maxTokens? (number): The maximum number of tokens to draft. Defaults to 16.

options.evaluateOptions? (Pick<SequenceEvaluateOptions, "contextShift" | "evaluationPriority" | "temperature" | "minP" | "topK" | "topP" | "seed" | "repeatPenalty" | "tokenBias">): Evaluation options, defaulting to the values of the target sequence. Any of these options can be overridden here for the prediction.

options.minConfidence? (number): The minimum token confidence (the probability the model assigns to a generated token) required to treat a drafted token as a prediction. When a generated token's confidence falls below this value, the prediction process stops until the existing predictions are exhausted (either a token that was not predicted is pushed, or all generated predictions are consumed). A number between 0 and 1; set to 0 to disable. Defaults to 0.6.

Returns

DraftSequenceTokenPredictor

Overrides

TokenPredictor.constructor
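
For context, here is a minimal sketch of wiring this predictor into a target sequence, based on node-llama-cpp's documented API. The model paths are placeholders, and the option values shown are the defaults.

ts
import {getLlama, DraftSequenceTokenPredictor, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();

// Load the large target model and a smaller, faster draft model
const model = await llama.loadModel({modelPath: "path/to/target-model.gguf"});
const draftModel = await llama.loadModel({modelPath: "path/to/draft-model.gguf"});

// The draft model evaluates on its own context sequence
const draftContext = await draftModel.createContext();
const draftSequence = draftContext.getSequence();

// Attach the predictor to the target sequence
const context = await model.createContext();
const sequence = context.getSequence({
    tokenPredictor: new DraftSequenceTokenPredictor(draftSequence, {
        minTokens: 0, // minimum number of tokens to draft
        maxTokens: 16, // maximum number of tokens to draft
        minConfidence: 0.6 // skip drafted tokens the draft model is unsure about
    })
});

const session = new LlamaChatSession({contextSequence: sequence});
console.log(await session.prompt("Hello"));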

Accessors

draftSequence

Get Signature

ts
get draftSequence(): LlamaContextSequence

Defined in: evaluator/LlamaContext/tokenPredictors/DraftSequenceTokenPredictor.ts:88

Returns

LlamaContextSequence


minTokens

Get Signature

ts
get minTokens(): number

Defined in: evaluator/LlamaContext/tokenPredictors/DraftSequenceTokenPredictor.ts:92

Returns

number


maxTokens

Get Signature

ts
get maxTokens(): number

Defined in: evaluator/LlamaContext/tokenPredictors/DraftSequenceTokenPredictor.ts:96

Returns

number


minConfidence

Get Signature

ts
get minConfidence(): undefined | number

Defined in: evaluator/LlamaContext/tokenPredictors/DraftSequenceTokenPredictor.ts:100

Returns

undefined | number

Methods

updateInputTokens()

ts
updateInputTokens(tokens: Token[]): void

Defined in: evaluator/LlamaContext/TokenPredictor.ts:57

Called with the input tokens before the generation starts when using LlamaChatSession, LlamaChat, and LlamaCompletion.

Parameters

tokens: Token[]

Returns

void

Inherited from

TokenPredictor.updateInputTokens


reset()

ts
reset(__namedParameters: {
  targetSequence: LlamaContextSequence;
  stateTokens: Token[];
  evaluateOptions: Readonly<SequenceEvaluateOptions>;
 }): Promise<void>

Defined in: evaluator/LlamaContext/tokenPredictors/DraftSequenceTokenPredictor.ts:104

Resets the state of the predictor.

Called before the generation starts.

Parameters

__namedParameters: object (see the signature above)
__namedParameters.targetSequence: LlamaContextSequence
__namedParameters.stateTokens: Token[]
__namedParameters.evaluateOptions: Readonly<SequenceEvaluateOptions>

Returns

Promise<void>

Overrides

TokenPredictor.reset


pushTokens()

ts
pushTokens(tokens: Token[]): void

Defined in: evaluator/LlamaContext/tokenPredictors/DraftSequenceTokenPredictor.ts:156

Parameters

tokens: Token[]

Returns

void

Overrides

TokenPredictor.pushTokens


predictTokens()

ts
predictTokens(): 
  | Token[]
  | Promise<Token[]>

Defined in: evaluator/LlamaContext/tokenPredictors/DraftSequenceTokenPredictor.ts:192

Predicts the next tokens based on the current state.

If the generation should wait until the minimum number of predictions is ready, this method should return a promise that resolves when that minimum is ready.

A background prediction process can be started when this function is called, so that the next predictions will be ready when this function is called again.

Returns

| Token[] | Promise<Token[]>

Overrides

TokenPredictor.predictTokens
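
Continuing the sketch from the constructor section above: these methods are normally driven internally by LlamaChatSession, LlamaChat, and LlamaCompletion, but a manual driver based on the signatures documented on this page (including stop() and dispose() below) could look roughly like this. The prompt text is illustrative only.

ts
// Hypothetical manual driver; in practice LlamaChatSession calls these for you.
const predictor = new DraftSequenceTokenPredictor(draftSequence, {maxTokens: 16});

// Reset the predictor to the target sequence's current state before generation
await predictor.reset({
    targetSequence: sequence,
    stateTokens: model.tokenize("Once upon a time"),
    evaluateOptions: {}
});

// Report tokens accepted by the target model so the draft sequence stays in sync
predictor.pushTokens(model.tokenize(" there"));

// Collect the currently drafted tokens; may resolve asynchronously while a
// background prediction process keeps drafting the next batch
const predictions = await predictor.predictTokens();

// Pause the background prediction process, then release its resources
predictor.stop();
predictor.dispose();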


stop()

ts
stop(untilPredictionsExhausted: boolean): void

Defined in: evaluator/LlamaContext/tokenPredictors/DraftSequenceTokenPredictor.ts:221

Stops the prediction process when it runs in the background.

Parameters

untilPredictionsExhausted (boolean, default: false): If true, the prediction process should not resume until the current predictions are exhausted.

Returns

void

Overrides

TokenPredictor.stop


dispose()

ts
dispose(): void

Defined in: evaluator/LlamaContext/tokenPredictors/DraftSequenceTokenPredictor.ts:235

Returns

void

Overrides

TokenPredictor.dispose