
Class: LlamaModel

Defined in: evaluator/LlamaModel/LlamaModel.ts:209

Properties

tokenizer

ts
readonly tokenizer: Tokenizer;

Defined in: evaluator/LlamaModel/LlamaModel.ts:238


onDispose

ts
readonly onDispose: EventRelay<void>;

Defined in: evaluator/LlamaModel/LlamaModel.ts:239

Accessors

disposed

Get Signature

ts
get disposed(): boolean;

Defined in: evaluator/LlamaModel/LlamaModel.ts:361

Returns

boolean


llama

Get Signature

ts
get llama(): Llama;

Defined in: evaluator/LlamaModel/LlamaModel.ts:365

Returns

Llama


tokens

Get Signature

ts
get tokens(): LlamaModelTokens;

Defined in: evaluator/LlamaModel/LlamaModel.ts:369

Returns

LlamaModelTokens


filename

Get Signature

ts
get filename(): string | undefined;

Defined in: evaluator/LlamaModel/LlamaModel.ts:373

Returns

string | undefined


fileInfo

Get Signature

ts
get fileInfo(): GgufFileInfo;

Defined in: evaluator/LlamaModel/LlamaModel.ts:377

Returns

GgufFileInfo


fileInsights

Get Signature

ts
get fileInsights(): GgufInsights;

Defined in: evaluator/LlamaModel/LlamaModel.ts:381

Returns

GgufInsights


gpuLayers

Get Signature

ts
get gpuLayers(): number;

Defined in: evaluator/LlamaModel/LlamaModel.ts:389

Number of layers offloaded to the GPU. If GPU support is disabled, this will always be 0.

Returns

number


useMmap

Get Signature

ts
get useMmap(): boolean;

Defined in: evaluator/LlamaModel/LlamaModel.ts:399

Whether the model is loaded using mmap (memory-mapped file) or not.

When Direct I/O is enabled (by setting the useDirectIo option to true), it overrides mmap, so this value may not reflect whether mmap was actually used to load this model instance.

Returns

boolean


size

Get Signature

ts
get size(): number;

Defined in: evaluator/LlamaModel/LlamaModel.ts:408

Total model size in memory in bytes.

When using mmap, actual memory usage may be higher than this value due to llama.cpp's performance optimizations.

Returns

number
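
As a rough illustration of how size and gpuLayers might be reported together, the sketch below formats them into a one-line summary. The names ModelMemoryInfo, formatBytes and describeModelMemory are hypothetical and not part of the library; the interface only mirrors the two getters documented above, so any object with those two members can be passed in.

```typescript
// Hypothetical structural type: mirrors only the two documented getters used here.
interface ModelMemoryInfo {
    readonly size: number;      // total model size in memory, in bytes
    readonly gpuLayers: number; // layers offloaded to the GPU (0 when GPU support is disabled)
}

// Format a byte count using binary units (1 KiB = 1024 bytes).
function formatBytes(bytes: number): string {
    const units = ["B", "KiB", "MiB", "GiB", "TiB"];
    let value = bytes;
    let unitIndex = 0;
    while (value >= 1024 && unitIndex < units.length - 1) {
        value /= 1024;
        unitIndex++;
    }
    // Whole bytes need no decimals; larger units get two.
    return `${value.toFixed(unitIndex === 0 ? 0 : 2)} ${units[unitIndex]}`;
}

// Build a short human-readable summary of the model's memory placement.
function describeModelMemory(model: ModelMemoryInfo): string {
    const placement = model.gpuLayers > 0
        ? `${model.gpuLayers} layers on GPU`
        : "CPU only";
    return `${formatBytes(model.size)} (${placement})`;
}
```

For an object with a size of 4294967296 and 33 GPU layers, this yields "4.00 GiB (33 layers on GPU)". Since actual memory usage may be higher than size when mmap is used, the figure is better read as an estimate than as a hard limit.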


flashAttentionSupported

Get Signature

ts
get flashAttentionSupported(): boolean;

Defined in: evaluator/LlamaModel/LlamaModel.ts:414

Returns

boolean


defaultContextFlashAttention

Get Signature

ts
get defaultContextFlashAttention(): boolean | "auto";

Defined in: evaluator/LlamaModel/LlamaModel.ts:418

Returns

boolean | "auto"


defaultContextSwaFullCache

Get Signature

ts
get defaultContextSwaFullCache(): boolean;

Defined in: evaluator/LlamaModel/LlamaModel.ts:422

Returns

boolean


defaultContextKvCacheKeyType

Get Signature

ts
get defaultContextKvCacheKeyType(): GgmlType;

Defined in: evaluator/LlamaModel/LlamaModel.ts:426

Returns

GgmlType


defaultContextKvCacheValueType

Get Signature

ts
get defaultContextKvCacheValueType(): GgmlType;

Defined in: evaluator/LlamaModel/LlamaModel.ts:430

Returns

GgmlType


trainContextSize

Get Signature

ts
get trainContextSize(): number;

Defined in: evaluator/LlamaModel/LlamaModel.ts:719

The context size the model was trained on.

Returns

number


embeddingVectorSize

Get Signature

ts
get embeddingVectorSize(): number;

Defined in: evaluator/LlamaModel/LlamaModel.ts:729

The size of an embedding vector the model can produce.

Returns

number


vocabularyType

Get Signature

ts
get vocabularyType(): LlamaVocabularyType;

Defined in: evaluator/LlamaModel/LlamaModel.ts:738

Returns

LlamaVocabularyType

Methods

dispose()

ts
dispose(): Promise<void>;

Defined in: evaluator/LlamaModel/LlamaModel.ts:347

Returns

Promise<void>


tokenize()

Call Signature

ts
tokenize(
   text: string, 
   specialTokens?: boolean, 
   options?: "trimLeadingSpace"): Token[];

Defined in: evaluator/LlamaModel/LlamaModel.ts:444

Transform text into tokens that can be fed to the model

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| text | string | The text to tokenize. |
| specialTokens? | boolean | If set to true, text that corresponds to special tokens is tokenized to those tokens. For example, <s> is tokenized to the BOS token when specialTokens is true; otherwise it is tokenized to the tokens that correspond to the plain-text string <s>. |
| options? | "trimLeadingSpace" | Additional tokenization options. If set to "trimLeadingSpace", a leading space is trimmed from the tokenized output when the output begins with an extra space. |

Returns

Token[]

Call Signature

ts
tokenize(text: BuiltinSpecialTokenValue, specialTokens: "builtin"): Token[];

Defined in: evaluator/LlamaModel/LlamaModel.ts:445

Transform text into tokens that can be fed to the model

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| text | BuiltinSpecialTokenValue | The text to tokenize. |
| specialTokens | "builtin" | If set to true, text that corresponds to special tokens is tokenized to those tokens. For example, <s> is tokenized to the BOS token when specialTokens is true; otherwise it is tokenized to the tokens that correspond to the plain-text string <s>. |

Returns

Token[]
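
To make the first call signature more concrete, here is a minimal sketch that counts the tokens in a piece of text and compares the count with trainContextSize. TokenizingModel and measurePrompt are hypothetical names, and the Token alias is a stand-in for the library's own Token type, assumed here to be number-like. The interface only mirrors the documented members, so it should also accept a loaded model instance.

```typescript
// Stand-in for the library's Token type (assumption: tokens behave like numbers).
type Token = number;

// Hypothetical structural type mirroring the documented members used below.
interface TokenizingModel {
    readonly trainContextSize: number;
    tokenize(text: string, specialTokens?: boolean, options?: "trimLeadingSpace"): Token[];
}

// Report how many tokens the text uses and whether it fits in the trained context size.
function measurePrompt(model: TokenizingModel, text: string) {
    // specialTokens = false: text such as "<s>" is treated as plain text,
    // which is usually what you want for untrusted user input.
    const tokens = model.tokenize(text, false);
    return {
        tokenCount: tokens.length,
        fitsTrainContext: tokens.length <= model.trainContextSize
    };
}
```

Passing false for specialTokens is a deliberate choice in this sketch: it prevents user-supplied text from being turned into control tokens.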


detokenize()

ts
detokenize(
   tokens: readonly Token[], 
   specialTokens?: boolean, 
   lastTokens?: readonly Token[]): string;

Defined in: evaluator/LlamaModel/LlamaModel.ts:559

Transform tokens into text

Parameters

| Parameter | Type | Default value | Description |
| --- | --- | --- | --- |
| tokens | readonly Token[] | undefined | The tokens to detokenize. |
| specialTokens? | boolean | false | If set to true, special tokens are detokenized to their corresponding token text representation. Recommended for debugging purposes only. Note: there may be additional spaces around special tokens that were not present in the original text; this is not a bug, it is how the tokenizer is supposed to work. |
| lastTokens? | readonly Token[] | undefined | The last few tokens that preceded the tokens to detokenize. If provided, they are used to determine whether a space has to be added before the current tokens, and to apply other detokenizer-specific heuristics so that the text correctly continues the existing tokens. It may have no effect with some models, but providing it is still recommended. |

Returns

string
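
The lastTokens parameter matters most when tokens arrive in chunks, for example while streaming. The sketch below shows one way to detokenize chunk by chunk while passing the tail of the already-processed tokens as lastTokens. DetokenizingModel, detokenizeChunks and the tail size of 4 are illustrative assumptions, and Token is a number-like stand-in for the library's type.

```typescript
// Stand-in for the library's Token type (assumption: tokens behave like numbers).
type Token = number;

// Hypothetical structural type mirroring the documented detokenize() signature.
interface DetokenizingModel {
    detokenize(tokens: readonly Token[], specialTokens?: boolean, lastTokens?: readonly Token[]): string;
}

// Detokenize a sequence of token chunks, giving each call the tokens that preceded it.
function detokenizeChunks(
    model: DetokenizingModel,
    chunks: readonly (readonly Token[])[],
    tailSize: number = 4 // arbitrary illustrative value for "the last few tokens"
): string {
    const seen: Token[] = [];
    let text = "";
    for (const chunk of chunks) {
        // Guard against slice(-0), which would return the whole array.
        const tail = tailSize > 0 ? seen.slice(-tailSize) : [];
        // Omit lastTokens entirely for the first chunk, when nothing precedes it.
        text += model.detokenize(chunk, false, tail.length > 0 ? tail : undefined);
        seen.push(...chunk);
    }
    return text;
}
```

Whether the tail changes the output depends on the model, as the parameter description notes, so the result may be identical to detokenizing each chunk in isolation.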


getTokenAttributes()

ts
getTokenAttributes(token: Token): TokenAttributes;

Defined in: evaluator/LlamaModel/LlamaModel.ts:580

Parameters

| Parameter | Type |
| --- | --- |
| token | Token |

Returns

TokenAttributes


isSpecialToken()

ts
isSpecialToken(token: Token | undefined): boolean;

Defined in: evaluator/LlamaModel/LlamaModel.ts:591

Check whether the given token is a special token (a control-type token or a token with no normal text representation)

Parameters

| Parameter | Type |
| --- | --- |
| token | Token \| undefined |

Returns

boolean


iterateAllTokens()

ts
iterateAllTokens(): Generator<Token, void, unknown>;

Defined in: evaluator/LlamaModel/LlamaModel.ts:606

Returns

Generator<Token, void, unknown>
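
iterateAllTokens() combines naturally with isSpecialToken(), for instance to list every special token in the vocabulary. The following is a small sketch of that idea. VocabularyModel and collectSpecialTokens are hypothetical names, and Token is a number-like stand-in for the library's type.

```typescript
// Stand-in for the library's Token type (assumption: tokens behave like numbers).
type Token = number;

// Hypothetical structural type mirroring the two documented methods used below.
interface VocabularyModel {
    iterateAllTokens(): Generator<Token, void, unknown>;
    isSpecialToken(token: Token | undefined): boolean;
}

// Walk the whole vocabulary once and keep only the special tokens.
function collectSpecialTokens(model: VocabularyModel): Token[] {
    const special: Token[] = [];
    for (const token of model.iterateAllTokens()) {
        if (model.isSpecialToken(token))
            special.push(token);
    }
    return special;
}
```

Because the generator yields tokens lazily, the loop does not need to materialize the full vocabulary before filtering.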


isEogToken()

ts
isEogToken(token: Token | undefined): boolean;

Defined in: evaluator/LlamaModel/LlamaModel.ts:619

Check whether the given token is an EOG (End Of Generation) token, like EOS or EOT.

Parameters

| Parameter | Type |
| --- | --- |
| token | Token \| undefined |

Returns

boolean
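
One plausible use of isEogToken() is to cut a generated token sequence at the first end-of-generation token. The sketch below illustrates this. EogAwareModel and takeUntilEog are hypothetical names, and Token is a number-like stand-in for the library's type.

```typescript
// Stand-in for the library's Token type (assumption: tokens behave like numbers).
type Token = number;

// Hypothetical structural type mirroring the documented isEogToken() method.
interface EogAwareModel {
    isEogToken(token: Token | undefined): boolean;
}

// Return the tokens that come before the first EOG token (the EOG token itself is excluded).
function takeUntilEog(model: EogAwareModel, tokens: readonly Token[]): Token[] {
    const endIndex = tokens.findIndex((token) => model.isEogToken(token));
    // No EOG token found: return a copy of the whole sequence.
    return endIndex < 0 ? [...tokens] : tokens.slice(0, endIndex);
}
```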


createContext()

ts
createContext(options?: LlamaContextOptions): Promise<LlamaContext>;

Defined in: evaluator/LlamaModel/LlamaModel.ts:626

Parameters

| Parameter | Type |
| --- | --- |
| options | LlamaContextOptions |

Returns

Promise<LlamaContext>
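
Since createContext() is asynchronous and contexts hold resources, it can help to wrap creation and cleanup in a single helper. The sketch below assumes the returned context exposes an async dispose() method, which this page documents for the model but not for the context, so treat that as an assumption. DisposableContext, ContextFactory and withContext are hypothetical names.

```typescript
// Assumption: the created context can be released with an async dispose().
interface DisposableContext {
    dispose(): Promise<void>;
}

// Hypothetical structural type mirroring the documented createContext() signature.
interface ContextFactory<Options, Context extends DisposableContext> {
    createContext(options?: Options): Promise<Context>;
}

// Create a context, run some work with it, and always dispose it afterwards.
async function withContext<Options, Context extends DisposableContext, Result>(
    model: ContextFactory<Options, Context>,
    options: Options | undefined,
    work: (context: Context) => Promise<Result>
): Promise<Result> {
    const context = await model.createContext(options);
    try {
        return await work(context);
    } finally {
        // Runs whether work() resolved or threw.
        await context.dispose();
    }
}
```

The try/finally block is the main design choice here: the context is released even when the work callback throws.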


createEmbeddingContext()

ts
createEmbeddingContext(options?: LlamaEmbeddingContextOptions): Promise<LlamaEmbeddingContext>;

Defined in: evaluator/LlamaModel/LlamaModel.ts:643

Parameters

| Parameter | Type |
| --- | --- |
| options | LlamaEmbeddingContextOptions |

Returns

Promise<LlamaEmbeddingContext>

See

Using Embedding tutorial


createRankingContext()

ts
createRankingContext(options?: LlamaRankingContextOptions): Promise<LlamaRankingContext>;

Defined in: evaluator/LlamaModel/LlamaModel.ts:653

Parameters

| Parameter | Type |
| --- | --- |
| options | LlamaRankingContextOptions |

Returns

Promise<LlamaRankingContext>

See

Reranking Documents tutorial


getWarnings()

ts
getWarnings(): string[];

Defined in: evaluator/LlamaModel/LlamaModel.ts:665

Get warnings about the model file that would affect its usage.

These warnings include all the warnings generated by GgufInsights, but are more comprehensive.

Returns

string[]