Class: LlamaContext
Defined in: evaluator/LlamaContext/LlamaContext.ts:35
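A LlamaContext is normally obtained by creating it from a loaded model rather than constructing it directly. A minimal sketch, assuming the usual node-llama-cpp flow of getLlama() → loadModel() → createContext() (the model path is a placeholder):

```ts
import path from "path";
import {fileURLToPath} from "url";
import {getLlama} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

const llama = await getLlama();
const model = await llama.loadModel({
    // placeholder path - point this at a real GGUF model file
    modelPath: path.join(__dirname, "models", "my-model.gguf")
});

// Create a context for the loaded model; contextSize here is just an example value
const context = await model.createContext({
    contextSize: 4096
});
```

The sketches in the rest of this page continue from this context instance.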
Properties
onDispose
readonly onDispose: EventRelay<void>;
Defined in: evaluator/LlamaContext/LlamaContext.ts:66
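onDispose fires when the context is disposed. A minimal sketch of subscribing to it, continuing from the context above and assuming EventRelay exposes a createListener method that returns a disposable handle - treat both the method and the handle shape as assumptions:

```ts
// Assumption: createListener registers the callback and returns a handle
// whose dispose() unregisters it.
const onDisposeListener = context.onDispose.createListener(() => {
    console.log("context was disposed");
});

// Unregister the listener if the notification is no longer needed
onDisposeListener.dispose();
```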
Accessors
disposed
Get Signature
get disposed(): boolean
Defined in: evaluator/LlamaContext/LlamaContext.ts:178
Returns
boolean
model
Get Signature
get model(): LlamaModel
Defined in: evaluator/LlamaContext/LlamaContext.ts:182
Returns
LlamaModel
contextSize
Get Signature
get contextSize(): number
Defined in: evaluator/LlamaContext/LlamaContext.ts:186
Returns
number
batchSize
Get Signature
get batchSize(): number
Defined in: evaluator/LlamaContext/LlamaContext.ts:190
Returns
number
flashAttention
Get Signature
get flashAttention(): boolean
Defined in: evaluator/LlamaContext/LlamaContext.ts:194
Returns
boolean
stateSize
Get Signature
get stateSize(): number
Defined in: evaluator/LlamaContext/LlamaContext.ts:202
The actual size of the state in memory, in bytes. This value is provided by llama.cpp
and doesn't include all the memory overhead of the context.
Returns
number
currentThreads
Get Signature
get currentThreads(): number
Defined in: evaluator/LlamaContext/LlamaContext.ts:209
The number of threads currently used to evaluate tokens.
Returns
number
idealThreads
Get Signature
get idealThreads(): number
Defined in: evaluator/LlamaContext/LlamaContext.ts:220
The number of threads that are preferred to be used to evaluate tokens.
The actual number of threads used may be lower when other evaluations are running in parallel.
Returns
number
totalSequences
Get Signature
get totalSequences(): number
Defined in: evaluator/LlamaContext/LlamaContext.ts:233
Returns
number
sequencesLeft
Get Signature
get sequencesLeft(): number
Defined in: evaluator/LlamaContext/LlamaContext.ts:237
Returns
number
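All of the accessors above are synchronous getters, so inspecting a context's configuration and runtime state is just a matter of reading properties. A short sketch, continuing from the context created earlier:

```ts
// Plain property reads - no awaiting needed
console.log("context size:", context.contextSize);
console.log("batch size:", context.batchSize);
console.log("flash attention:", context.flashAttention);
console.log("state size (bytes):", context.stateSize);
console.log("threads (current / ideal):", context.currentThreads, "/", context.idealThreads);
console.log("sequences (left / total):", context.sequencesLeft, "/", context.totalSequences);
```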
Methods
dispose()
dispose(): Promise<void>
Defined in: evaluator/LlamaContext/LlamaContext.ts:164
Returns
Promise<void>
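A sketch of tearing the context down when it's no longer needed; after dispose() resolves, the disposed accessor is expected to report true:

```ts
await context.dispose();
console.log(context.disposed); // expected: true
```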
getAllocatedContextSize()
getAllocatedContextSize(): number
Defined in: evaluator/LlamaContext/LlamaContext.ts:224
Returns
number
getSequence()
getSequence(options: {
    contextShift: ContextShiftOptions;
    tokenPredictor: TokenPredictor;
}): LlamaContextSequence
Defined in: evaluator/LlamaContext/LlamaContext.ts:245
Before calling this method, check sequencesLeft
to make sure there are sequences available. When no sequences are left, this method will throw an error.
Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| options | { contextShift: ContextShiftOptions; tokenPredictor: TokenPredictor; } | - |
| options.contextShift? | ContextShiftOptions | - |
| options.tokenPredictor? | TokenPredictor | Token predictor to use for the sequence. Don't share the same token predictor between multiple sequences. Using a token predictor doesn't affect the generation output itself - it only allows for greater parallelization of the token evaluation to speed up the generation. Note: if a token predictor is too resource intensive, it can slow down the generation process due to the overhead of running the predictor. Testing the effectiveness of a token predictor on the target machine is recommended before using it in production. The token predictor is automatically disposed when the sequence is disposed. See Using Token Predictors. |
Returns
LlamaContextSequence
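A sketch of acquiring a sequence, with the sequencesLeft check described above (both the contextShift and tokenPredictor options are optional and omitted here):

```ts
if (context.sequencesLeft === 0)
    throw new Error("No sequences left in this context");

// Acquire an evaluation sequence from the context
const sequence = context.getSequence();
```

The returned LlamaContextSequence can then be handed to a higher-level consumer (for example, a chat session) that performs the actual token evaluation.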
dispatchPendingBatch()
dispatchPendingBatch(): void
Defined in: evaluator/LlamaContext/LlamaContext.ts:296
Returns
void
printTimings()
printTimings(): Promise<void>
Defined in: evaluator/LlamaContext/LlamaContext.ts:598
Print the timings of token evaluation since the last print for this context.
Requires the performanceTracking option to be enabled.
Note: it prints on the LlamaLogLevel.info level, so if you set the level of your Llama instance higher than that, it won't print anything.
Returns
Promise<void>
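A sketch of using printTimings, continuing from the model loaded in the first example and assuming performanceTracking (referenced above) is an option passed when creating the context:

```ts
// Assumption: performanceTracking is enabled at context creation time
const trackedContext = await model.createContext({
    performanceTracking: true
});

const trackedSequence = trackedContext.getSequence();
// ... evaluate some tokens with trackedSequence ...

// Prints on the LlamaLogLevel.info level
await trackedContext.printTimings();
```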