Class: LlamaContext
Defined in: evaluator/LlamaContext/LlamaContext.ts:65
Properties
onDispose
```ts
readonly onDispose: EventRelay<void>;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:98
Accessors
disposed
Get Signature
```ts
get disposed(): boolean;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:215
Returns
boolean
model
Get Signature
```ts
get model(): LlamaModel;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:219
Returns
LlamaModel
contextSize
Get Signature
```ts
get contextSize(): number;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:223
Returns
number
batchSize
Get Signature
```ts
get batchSize(): number;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:227
Returns
number
flashAttention
Get Signature
```ts
get flashAttention(): boolean;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:231
Returns
boolean
kvCacheKeyType
Get Signature
```ts
get kvCacheKeyType(): GgmlType;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:235
Returns
GgmlType
kvCacheValueType
Get Signature
```ts
get kvCacheValueType(): GgmlType;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:239
Returns
GgmlType
stateSize
Get Signature
```ts
get stateSize(): number;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:247
The actual size of the state in memory, in bytes. This value is provided by llama.cpp and doesn't include all the memory overhead of the context.
Returns
number
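Since `stateSize` is a plain byte count, it can be formatted for display with ordinary arithmetic. A minimal sketch (the `formatBytes` helper is hypothetical, not part of the library):

```typescript
// Hypothetical helper: format a byte count (such as context.stateSize)
// into a human-readable binary unit. Plain arithmetic, no library calls.
function formatBytes(bytes: number): string {
    if (bytes >= 1024 * 1024 * 1024)
        return (bytes / (1024 * 1024 * 1024)).toFixed(2) + " GiB";
    if (bytes >= 1024 * 1024)
        return (bytes / (1024 * 1024)).toFixed(2) + " MiB";
    if (bytes >= 1024)
        return (bytes / 1024).toFixed(2) + " KiB";
    return bytes + " B";
}

// With a live context: console.log(formatBytes(context.stateSize));
console.log(formatBytes(268435456)); // "256.00 MiB"
```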
currentThreads
Get Signature
```ts
get currentThreads(): number;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:254
The number of threads currently used to evaluate tokens
Returns
number
idealThreads
Get Signature
```ts
get idealThreads(): number;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:265
The number of threads that are preferred to be used to evaluate tokens.
The actual number of threads used may be lower when other evaluations are running in parallel.
Returns
number
totalSequences
Get Signature
```ts
get totalSequences(): number;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:278
Returns
number
sequencesLeft
Get Signature
```ts
get sequencesLeft(): number;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:282
Returns
number
Methods
dispose()
```ts
dispose(): Promise<void>;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:201
Returns
Promise<void>
getAllocatedContextSize()
```ts
getAllocatedContextSize(): number;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:269
Returns
number
getSequence()
```ts
getSequence(options?: {
    contextShift?: ContextShiftOptions;
    tokenPredictor?: TokenPredictor;
    checkpoints?: {
        max?: number;
        interval?: number | false;
        maxMemory?: number | null;
    };
}): LlamaContextSequence;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:290
Before calling this method, check sequencesLeft to make sure there are sequences left; when there are no sequences left, this method throws an error.
Parameters
| Parameter | Type | Description |
|---|---|---|
| options | { contextShift?: ContextShiftOptions; tokenPredictor?: TokenPredictor; checkpoints?: { max?: number; interval?: number \| false; maxMemory?: number \| null; }; } | - |
| options.contextShift? | ContextShiftOptions | - |
| options.tokenPredictor? | TokenPredictor | Token predictor to use for the sequence. Don't share the same token predictor between multiple sequences. Using a token predictor doesn't affect the generation output itself - it only allows for greater parallelization of the token evaluation to speed up the generation. Note that if a token predictor is too resource-intensive, it can slow down the generation process due to the overhead of running the predictor. Testing the effectiveness of a token predictor on the target machine is recommended before using it in production. Automatically disposed when disposing the sequence. See Using Token Predictors. |
| options.checkpoints? | { max?: number; interval?: number \| false; maxMemory?: number \| null; } | Options for context state checkpoints for the sequence. When reusing a prefix evaluation state is not possible for the context sequence (like in contexts from recurrent and hybrid models, or with models that use SWA (Sliding Window Attention) when the swaFullCache option is not enabled on the context), storing checkpoints allows reusing the context state at certain points in the sequence to speed up the evaluation when erasing parts of the context state that come after those points. Those checkpoints are automatically used when trying to erase parts of the context state that come after a checkpointed state, and are freed from memory when no longer relevant. Checkpoints are relatively lightweight compared to saving the entire state, but taking too many checkpoints can increase memory usage. Checkpoints are stored in RAM (not VRAM). See LlamaContextSequence.takeCheckpoint for more details on how checkpoints are taken and used. |
| options.checkpoints.max? | number | The maximum number of checkpoints to keep for the sequence when needed. Defaults to 32. |
| options.checkpoints.interval? | number \| false | Take a checkpoint every interval tokens when the sequence needs taking checkpoints. Defaults to 8192. |
| options.checkpoints.maxMemory? | number \| null | The maximum memory in bytes to use for checkpoints for the sequence when needed. When taking a checkpoint causes the checkpoints pool memory to exceed this value, older checkpoints are pruned until the total checkpoints memory usage is under this limit, while ensuring that at least one checkpoint is kept. Defaults to null (no memory limit). |
Returns
LlamaContextSequence
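The checkpoints options above can be sketched as a plain object, using the documented defaults. The getSequence call itself is shown in a comment since it needs a live LlamaContext:

```typescript
// Checkpoints options for getSequence(), filled with the documented defaults
const sequenceOptions = {
    checkpoints: {
        max: 32,                           // keep at most 32 checkpoints (the default)
        interval: 8192,                    // take a checkpoint every 8192 tokens (the default)
        maxMemory: null as number | null   // no memory cap on the checkpoints pool (the default)
    }
};

// With a live context:
// if (context.sequencesLeft === 0)
//     throw new Error("No sequences left in this context");
// const sequence = context.getSequence(sequenceOptions);

console.log(sequenceOptions.checkpoints.max); // 32
```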
dispatchPendingBatch()
```ts
dispatchPendingBatch(): void;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:387
Returns
void
printTimings()
```ts
printTimings(): Promise<void>;
```
Defined in: evaluator/LlamaContext/LlamaContext.ts:704
Print the timings of token evaluation since the last print for this context.
Requires the performanceTracking option to be enabled.
Note: it prints on the LlamaLogLevel.info level, so if you set the log level of your Llama instance higher than that, it won't print anything.
Returns
Promise<void>