
Class: LlamaContext

Defined in: evaluator/LlamaContext/LlamaContext.ts:40

Properties

onDispose

ts
readonly onDispose: EventRelay<void>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:71
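
For orientation, a minimal sketch of how a LlamaContext is typically obtained and how onDispose can be observed. The model path is a placeholder, and the createListener call is an assumption about the EventRelay API rather than something documented on this page.

ts
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({
    modelPath: "path/to/model.gguf" // placeholder path
});
const context = await model.createContext();

// onDispose is an EventRelay<void>; createListener is assumed here to
// register a callback that fires once the context gets disposed
context.onDispose.createListener(() => {
    console.log("context was disposed");
});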

Accessors

disposed

Get Signature

ts
get disposed(): boolean;

Defined in: evaluator/LlamaContext/LlamaContext.ts:183

Returns

boolean


model

Get Signature

ts
get model(): LlamaModel;

Defined in: evaluator/LlamaContext/LlamaContext.ts:187

Returns

LlamaModel


contextSize

Get Signature

ts
get contextSize(): number;

Defined in: evaluator/LlamaContext/LlamaContext.ts:191

Returns

number


batchSize

Get Signature

ts
get batchSize(): number;

Defined in: evaluator/LlamaContext/LlamaContext.ts:195

Returns

number


flashAttention

Get Signature

ts
get flashAttention(): boolean;

Defined in: evaluator/LlamaContext/LlamaContext.ts:199

Returns

boolean


stateSize

Get Signature

ts
get stateSize(): number;

Defined in: evaluator/LlamaContext/LlamaContext.ts:207

The actual size of the state in memory, in bytes. This value is provided by llama.cpp and doesn't include all of the memory overhead of the context.

Returns

number


currentThreads

Get Signature

ts
get currentThreads(): number;

Defined in: evaluator/LlamaContext/LlamaContext.ts:214

The number of threads currently used to evaluate tokens.

Returns

number


idealThreads

Get Signature

ts
get idealThreads(): number;

Defined in: evaluator/LlamaContext/LlamaContext.ts:225

The preferred number of threads to use for evaluating tokens.

The actual number of threads used may be lower when other evaluations are running in parallel.

Returns

number
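
A small illustration of the two thread-related accessors, assuming a context created as in the sketch above; the exact values depend on the machine and on whatever else the Llama instance is evaluating.

ts
// currentThreads can be lower than idealThreads while other
// evaluations run in parallel on the same Llama instance
console.log(`ideal threads:   ${context.idealThreads}`);
console.log(`current threads: ${context.currentThreads}`);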


totalSequences

Get Signature

ts
get totalSequences(): number;

Defined in: evaluator/LlamaContext/LlamaContext.ts:238

Returns

number


sequencesLeft

Get Signature

ts
get sequencesLeft(): number;

Defined in: evaluator/LlamaContext/LlamaContext.ts:242

Returns

number
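
A short sketch that logs the remaining accessors (again assuming the context from the first sketch), which can be handy for checking how a context was configured and how many sequences are still available.

ts
console.log(`context size:    ${context.contextSize} tokens`);
console.log(`batch size:      ${context.batchSize} tokens`);
console.log(`flash attention: ${context.flashAttention}`);
console.log(`state size:      ${context.stateSize} bytes`); // as reported by llama.cpp
console.log(`sequences left:  ${context.sequencesLeft} of ${context.totalSequences}`);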

Methods

dispose()

ts
dispose(): Promise<void>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:169

Returns

Promise<void>
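
A sketch of tearing a context down when it is no longer needed, assuming the context from the first sketch; only the documented behavior (dispose() returns a Promise and disposed reflects the state) is relied on here.

ts
// release the native resources held by the context
await context.dispose();

console.log(context.disposed); // expected to be true after disposal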


getAllocatedContextSize()

ts
getAllocatedContextSize(): number;

Defined in: evaluator/LlamaContext/LlamaContext.ts:229

Returns

number


getSequence()

ts
getSequence(options: {
  contextShift?: ContextShiftOptions;
  tokenPredictor?: TokenPredictor;
}): LlamaContextSequence;

Defined in: evaluator/LlamaContext/LlamaContext.ts:250

Before calling this method, check sequencesLeft to make sure there are sequences available. When there are no sequences left, this method throws an error.

Parameters

options ({ contextShift?: ContextShiftOptions; tokenPredictor?: TokenPredictor; })

options.contextShift? (ContextShiftOptions)

options.tokenPredictor? (TokenPredictor)
Token predictor to use for the sequence. Don't share the same token predictor between multiple sequences.

Using a token predictor doesn't affect the generation output itself; it only allows for greater parallelization of the token evaluation to speed up the generation.

Note: if a token predictor is too resource intensive, it can slow down the generation process due to the overhead of running the predictor. Testing the effectiveness of a token predictor on the target machine is recommended before using it in production.

The token predictor is automatically disposed when the sequence is disposed. See Using Token Predictors.

Returns

LlamaContextSequence
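
A sketch of acquiring a sequence, checking sequencesLeft first as recommended above. Passing an empty options object matches the signature shown here; the LlamaChatSession usage afterwards is an assumption about the wider node-llama-cpp API and is only meant to show where the sequence would typically go.

ts
import {LlamaChatSession} from "node-llama-cpp";

if (context.sequencesLeft === 0)
    throw new Error("No sequences left in this context");

const sequence = context.getSequence({});

// assumed usage: hand the sequence to a chat session for generation
const session = new LlamaChatSession({contextSequence: sequence});
const answer = await session.prompt("Hi there");
console.log(answer);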


dispatchPendingBatch()

ts
dispatchPendingBatch(): void;

Defined in: evaluator/LlamaContext/LlamaContext.ts:301

Returns

void


printTimings()

ts
printTimings(): Promise<void>;

Defined in: evaluator/LlamaContext/LlamaContext.ts:609

Print the timings of token evaluation since the last print for this context.

Requires the performanceTracking option to be enabled.

Note: it prints at the LlamaLogLevel.info level, so if you set the log level of your Llama instance higher than that, it won't print anything.

Returns

Promise<void>
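
A hedged sketch of using printTimings(); performanceTracking is assumed to be an option of model.createContext() (this page names the option but not where it is set), and the output goes through the Llama instance's logger at the info level.

ts
const trackedContext = await model.createContext({
    performanceTracking: true // assumed createContext option; required for printTimings()
});

const sequence = trackedContext.getSequence({});
// ... evaluate some tokens with the sequence ...

// prints the timings collected since the last print, at LlamaLogLevel.info
await trackedContext.printTimings();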