Class: LlamaContextSequence

Defined in: evaluator/LlamaContext/LlamaContext.ts:933

Properties

onDispose

ts
readonly onDispose: EventRelay<void>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:956

Accessors

disposed

Get Signature

ts
get disposed(): boolean;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1008

Returns

boolean


context

Get Signature

ts
get context(): LlamaContext;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1012

Returns

LlamaContext


model

Get Signature

ts
get model(): LlamaModel;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1016

Returns

LlamaModel


contextSize

Get Signature

ts
get contextSize(): number;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1021

The maximum number of tokens that the sequence state can hold

Returns

number


nextTokenIndex

Get Signature

ts
get nextTokenIndex(): number;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1026

The index where the next evaluated token will be placed in the context

Returns

number


contextTokens

Get Signature

ts
get contextTokens(): Token[];

Defined in: evaluator/LlamaContext/LlamaContext.ts:1031

The current context state tokens

Returns

Token[]


tokenMeter

Get Signature

ts
get tokenMeter(): TokenMeter;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1038

Returns

TokenMeter


tokenPredictor

Get Signature

ts
get tokenPredictor(): undefined | TokenPredictor;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1045

The token predictor used when creating this sequence.

Returns

undefined | TokenPredictor


stateCellsStartIndex

Get Signature

ts
get stateCellsStartIndex(): number;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1068

Get the index of the first token in the KV cache.

If you remove any tokens from the state that come before this index, no cached prefix evaluation state will be used for the next evaluation.

For example, if stateCellsStartIndex is 10 and you remove the range {start: 11, end: 16}, then the cached state for the range 0-10 will be used in the next evaluation. However, if you remove the range {start: 10, end: 16} (or {start: 9, end: 16}), then the cached state will not be used at all and the tokens will be re-evaluated in the next evaluation.

This index can be greater than 0 only when SWA (Sliding Window Attention) is used (only on supported models).

When SWA is used, this index will usually be Math.max(-1, .nextTokenIndex - .model.fileInsights.swaSize) or larger.

When the KV cache is empty, this index will be -1.

You can disable SWA by setting the swaFullCache option to true when creating a context.

Returns

number
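
For example, a minimal sketch (assuming sequence is an existing LlamaContextSequence) of erasing recent tokens while keeping the erased range entirely after stateCellsStartIndex, so the cached prefix state can still be reused:

ts
// erase up to the last 6 tokens, but start the range after stateCellsStartIndex
// so the cached prefix evaluation state isn't invalidated
const eraseStart = Math.max(sequence.stateCellsStartIndex + 1, sequence.nextTokenIndex - 6);
if (eraseStart < sequence.nextTokenIndex)
    await sequence.eraseContextTokenRanges([{start: eraseStart, end: sequence.nextTokenIndex}]);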


tokenPredictions

Get Signature

ts
get tokenPredictions(): {
  used: number;
  unused: number;
  validated: number;
  refuted: number;
};

Defined in: evaluator/LlamaContext/LlamaContext.ts:1083

Statistics of token predictions using the sequence's tokenPredictor.

The statistics change only when token prediction is used in this sequence.

validated + refuted = total number of evaluated predictions.

Prefer using validated and refuted to evaluate the effectiveness of token prediction.

Returns
used
ts
used: number;

Number of token predictions that were actually used (tokens that were validated and then consumed)

unused
ts
unused: number;

Number of token predictions that were not used (tokens that were validated and were not consumed)

validated
ts
validated: number;

Number of token predictions that were validated successfully

refuted
ts
refuted: number;

Number of token predictions that were refuted
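
For example, a small sketch (assuming sequence is an existing LlamaContextSequence created with a tokenPredictor) of using these counters to gauge how effective token prediction is for the current workload:

ts
const {validated, refuted, used, unused} = sequence.tokenPredictions;
const totalEvaluated = validated + refuted; // total number of evaluated predictions
const accuracy = totalEvaluated === 0 ? 0 : validated / totalEvaluated;
console.log(`prediction accuracy: ${(accuracy * 100).toFixed(1)}%, used: ${used}, unused: ${unused}`);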


isLoadedToMemory

Get Signature

ts
get isLoadedToMemory(): boolean;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1104

Returns

boolean

Methods

dispose()

ts
dispose(): void;

Defined in: evaluator/LlamaContext/LlamaContext.ts:992

Returns

void


compareContextTokens()

ts
compareContextTokens(tokens: Token[]): {
  firstDifferentIndex: number;
};

Defined in: evaluator/LlamaContext/LlamaContext.ts:1108

Parameters

Parameter | Type
tokens | Token[]

Returns

ts
{
  firstDifferentIndex: number;
}
firstDifferentIndex
ts
firstDifferentIndex: number;
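
For instance, a minimal sketch (assuming sequence is an existing LlamaContextSequence and newTokens is a Token[] for a new prompt) of checking how much of the current context state already matches the new tokens:

ts
const {firstDifferentIndex} = sequence.compareContextTokens(newTokens);
console.log(`${firstDifferentIndex} leading tokens already match the current context state`);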

adaptStateToTokens()

ts
adaptStateToTokens(tokens: Token[], allowShift: boolean): Promise<void>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1135

Erase parts of the context state to align it with the given tokens.

If the given tokens do not align with the current context state, the context state will be erased to align with the given tokens.

To find the first different token index between the context state and the given tokens, access the nextTokenIndex property after calling this method.

If allowShift is true (the default), token shifting may be used to align the context state with the given tokens, which incurs evaluation of the shifted tokens.

Parameters

Parameter | Type | Default value
tokens | Token[] | undefined
allowShift | boolean | true

Returns

Promise<void>
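
A minimal usage sketch (assuming sequence is an existing LlamaContextSequence and newPromptTokens is a Token[]):

ts
// erase only the parts of the state that don't match the new prompt,
// allowing token shifting (the default) to preserve as much state as possible
await sequence.adaptStateToTokens(newPromptTokens, true);
// after adapting, nextTokenIndex is the index of the first token that still needs to be evaluated
const remainingTokens = newPromptTokens.slice(sequence.nextTokenIndex);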


clearHistory()

ts
clearHistory(): Promise<void>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1186

Clear the history of the sequence.

Returns

Promise<void>


eraseContextTokenRanges()

ts
eraseContextTokenRanges(ranges: ContextTokensDeleteRange[]): Promise<void>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1197

Erase context tokens in the provided ranges to free up space for new tokens to be generated. The start of each range is inclusive, and the end of each range is exclusive. For example, the range {start: 0, end: 1} will remove the token at the 0 index only.

Parameters

Parameter | Type
ranges | ContextTokensDeleteRange[]

Returns

Promise<void>
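
For example, a minimal sketch (assuming sequence is an existing LlamaContextSequence) of erasing the first token and the last 4 tokens of the current state:

ts
await sequence.eraseContextTokenRanges([
    {start: 0, end: 1}, // removes only the token at index 0
    {start: sequence.nextTokenIndex - 4, end: sequence.nextTokenIndex} // removes the last 4 tokens
]);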


evaluate()

ts
evaluate(tokens: Token[], options: SequenceEvaluateOptions): AsyncGenerator<Token, void, 
  | void
  | Token
  | Token[]>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1334

Evaluate the provided tokens into the context sequence, and continue generating new tokens on iterator iterations.

This method uses the token predictor (when provided) to generate new tokens faster.

Parameters

Parameter | Type
tokens | Token[]
options | SequenceEvaluateOptions

Returns

AsyncGenerator<Token, void, | void | Token | Token[]>
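
A minimal usage sketch, assuming model and sequence were already created (for example, via getLlama, llama.loadModel, model.createContext, and context.getSequence); the sampling option shown here (temperature) is illustrative, so check SequenceEvaluateOptions for the options that are actually available:

ts
const promptTokens = model.tokenize("The quick brown fox");
const generatedTokens: Token[] = [];
for await (const token of sequence.evaluate(promptTokens, {temperature: 0.8})) {
    generatedTokens.push(token);
    if (generatedTokens.length >= 32)
        break; // stop the generation after 32 tokens
}
console.log(model.detokenize(generatedTokens));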


evaluateWithMetadata()

ts
evaluateWithMetadata<Metadata>(
   tokens: Token[], 
   metadata: Metadata, 
   options: SequenceEvaluateOptions): AsyncGenerator<SequenceEvaluateOutput<Metadata>, void, 
  | void
  | Token
  | Token[]>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1356

Like `.evaluate(...)`, but with additional metadata for each generated token.

Configure the additional metadata options to choose which metadata to include.

Type Parameters

Type Parameter
Metadata extends SequenceEvaluateMetadataOptions

Parameters

Parameter | Type
tokens | Token[]
metadata | Metadata
options | SequenceEvaluateOptions

Returns

AsyncGenerator<SequenceEvaluateOutput<Metadata>, void, | void | Token | Token[]>
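
A rough sketch, assuming model and sequence already exist and assuming SequenceEvaluateMetadataOptions supports a confidence flag (check that type for the metadata options that are actually available):

ts
const promptTokens = model.tokenize("The quick brown fox");
for await (const output of sequence.evaluateWithMetadata(promptTokens, {confidence: true}, {})) {
    console.log("token:", output.token, "confidence:", output.confidence);
    break; // inspect only the first generated token in this sketch
}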


evaluateWithoutGeneratingNewTokens()

ts
evaluateWithoutGeneratingNewTokens(tokens: Token[], options: {
  evaluationPriority?: EvaluationPriority;
  contextShift?: ContextShiftOptions;
}): Promise<void>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1422

Evaluate the provided tokens into the context sequence without generating new tokens.

Parameters

Parameter | Type | Description
tokens | Token[] | -
options | { evaluationPriority?: EvaluationPriority; contextShift?: ContextShiftOptions; } | -
options.evaluationPriority? | EvaluationPriority | When more tokens are queued for the next batch than the configured batchSize, the tokens of each sequence are evaluated according to the strategy chosen for the context. By default, the "maximumParallelism" strategy is used, which tries to evaluate as many sequences in parallel as possible; when it has to choose which sequences to evaluate more tokens of, it prioritizes the sequences with the highest evaluation priority. A custom strategy can be used to prioritize sequences differently, but generally, the higher the evaluation priority, the more likely it is that more tokens will be evaluated for that sequence in the next queued batch.
options.contextShift? | ContextShiftOptions | Override the sequence context shift options for this evaluation

Returns

Promise<void>
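
For example, a sketch (assuming model and sequence already exist) of preloading a prompt into the sequence state ahead of time without generating anything:

ts
const systemPromptTokens = model.tokenize("You are a helpful assistant.\n");
await sequence.evaluateWithoutGeneratingNewTokens(systemPromptTokens, {
    evaluationPriority: 5 // illustrative value; see EvaluationPriority for the accepted range
});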


controlledEvaluate()

ts
controlledEvaluate(input: ControlledEvaluateInputItem[], options?: {
  evaluationPriority?: EvaluationPriority;
  contextShift?: ContextShiftOptions;
  onTokenResult?: void;
}): Promise<(
  | undefined
  | ControlledEvaluateIndexOutput)[]>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1504

Evaluate the provided tokens into the context sequence with custom options for each token.

This method allows for more precise control of the generation process.

A next token will be generated for a given token only if any of the generateNext options for it are used.

To generate more tokens after this method finishes, use it again with token(s) you selected to add to the context from the previous evaluation.

This method doesn't use the token predictor (when provided) since it cannot predict which tokens are actually needed. Use the evaluate method when you need to use token prediction.

Parameters

Parameter | Type | Description
input | ControlledEvaluateInputItem[] | -
options? | { evaluationPriority?: EvaluationPriority; contextShift?: ContextShiftOptions; onTokenResult?: void; } | -
options.evaluationPriority? | EvaluationPriority | When more tokens are queued for the next batch than the configured batchSize, the tokens of each sequence are evaluated according to the strategy chosen for the context. By default, the "maximumParallelism" strategy is used, which tries to evaluate as many sequences in parallel as possible; when it has to choose which sequences to evaluate more tokens of, it prioritizes the sequences with the highest evaluation priority. A custom strategy can be used to prioritize sequences differently, but generally, the higher the evaluation priority, the more likely it is that more tokens will be evaluated for that sequence in the next queued batch.
options.contextShift? | ContextShiftOptions | Override the sequence context shift options for this evaluation
options.onTokenResult? | void | -

Returns

Promise<( | undefined | ControlledEvaluateIndexOutput)[]>

An array where for each token in the input array, there can be an output item at the same index in the output array. For indexes that have no output, there won't be any value at the corresponding index in the output array.

It's recommended to iterate from 0 up to the length of the input array to check the results in the output array.
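
A rough sketch of the intended flow (assuming model and sequence already exist); the exact shape of ControlledEvaluateInputItem is defined by that type, and the tuple form with a generateNext option used here is an assumption for illustration:

ts
const tokens = model.tokenize("The quick brown fox");
const input: ControlledEvaluateInputItem[] = [
    ...tokens.slice(0, -1),
    // request next-token generation only for the last input token (assumed tuple form)
    [tokens[tokens.length - 1], {generateNext: {token: true}}]
];
const results = await sequence.controlledEvaluate(input);
for (let i = 0; i < input.length; i++) {
    if (results[i] != null)
        console.log(`output for input index ${i}:`, results[i]);
}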


saveStateToFile()

ts
saveStateToFile(filePath: string): Promise<{
  fileSize: number;
}>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1635

Save the current context sequence evaluation state to a file.

Parameters

Parameter | Type
filePath | string

Returns

Promise<{ fileSize: number; }>

See

Saving and restoring a context sequence evaluation state


loadStateFromFile()

ts
loadStateFromFile(filePath: string, acceptRisk: {
  acceptRisk: true;
}): Promise<void>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:1668

Load a context sequence evaluation state from a file.

Trying to load a state file with a longer context size than the current sequence's context size will fail and throw an error.

You must ensure that the file was created from the exact same model; otherwise, using this function may crash the process.

Parameters

Parameter | Type | Description
filePath | string | -
acceptRisk | { acceptRisk: true; } | -
acceptRisk.acceptRisk | true | Loading a state file created using a different model may crash the process. You must accept this risk to use this feature.

Returns

Promise<void>

See

Saving and restoring a context sequence evaluation state
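
Putting the two together, a minimal sketch of saving a sequence's state and later restoring it into a sequence created from the exact same model (the file path is illustrative):

ts
const {fileSize} = await sequence.saveStateToFile("./state.llamastate");
console.log(`saved state file size: ${fileSize} bytes`);
// later, on a context sequence created from the exact same model:
await sequence.loadStateFromFile("./state.llamastate", {acceptRisk: true});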