Class: GgufInsights
Defined in: gguf/insights/GgufInsights.ts:16
Accessors
ggufFileInfo
Get Signature
get ggufFileInfo(): GgufFileInfo;
Defined in: gguf/insights/GgufInsights.ts:59
Returns
GgufFileInfo
configurationResolver
Get Signature
get configurationResolver(): GgufInsightsConfigurationResolver;
Defined in: gguf/insights/GgufInsights.ts:63
Returns
GgufInsightsConfigurationResolver
tokens
Get Signature
get tokens(): GgufInsightsTokens;
Defined in: gguf/insights/GgufInsights.ts:67
Returns
GgufInsightsTokens
trainContextSize
Get Signature
get trainContextSize(): undefined | number;
Defined in: gguf/insights/GgufInsights.ts:72
The context size (in tokens) that the model was trained on.
Returns
undefined | number
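The sketch below shows one way a caller might use `trainContextSize`. It caps a requested context size and handles the `undefined` case. `pickContextSize` and its 4096-token default are hypothetical; only the accessor's type comes from this page.

```typescript
// Structural type mirroring the documented accessor; a GgufInsights instance
// should satisfy it, since a getter can implement a readonly property.
interface TrainContextSource {
    readonly trainContextSize: undefined | number;
}

// Hypothetical helper: cap the requested context size at the size the model was
// trained on. If nothing is requested, use the trained size; if the model
// doesn't report one either, use the (assumed) fallback of 4096 tokens.
function pickContextSize(
    insights: TrainContextSource,
    requested?: number,
    fallback: number = 4096
): number {
    const trained = insights.trainContextSize;
    const wanted = requested ?? trained ?? fallback;
    return trained === undefined ? wanted : Math.min(wanted, trained);
}

// Usage with plain objects standing in for GgufInsights instances:
console.log(pickContextSize({trainContextSize: 8192}, 16384)); // 8192
console.log(pickContextSize({trainContextSize: undefined})); // 4096
```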
embeddingVectorSize
Get Signature
get embeddingVectorSize(): undefined | number;
Defined in: gguf/insights/GgufInsights.ts:77
The size (number of dimensions) of the embedding vectors the model can produce.
Returns
undefined | number
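Because `embeddingVectorSize` may be `undefined`, code that consumes embeddings has to decide what to do when the size is unknown. Below is a minimal sketch of a length check that skips validation in that case. `assertEmbeddingSize` is a hypothetical helper and relies only on the documented accessor type.

```typescript
// Structural type mirroring the documented accessor.
interface EmbeddingSizeSource {
    readonly embeddingVectorSize: undefined | number;
}

// Hypothetical check: throws a RangeError if the vector's length differs from
// the size the model reports. When the size is unknown, the check is skipped.
function assertEmbeddingSize(insights: EmbeddingSizeSource, vector: readonly number[]): void {
    const expected = insights.embeddingVectorSize;
    if (expected !== undefined && vector.length !== expected)
        throw new RangeError(`Expected an embedding of size ${expected}, got ${vector.length}`);
}
```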
totalLayers
Get Signature
get totalLayers(): number;
Defined in: gguf/insights/GgufInsights.ts:81
Returns
number
modelSize
Get Signature
get modelSize(): number;
Defined in: gguf/insights/GgufInsights.ts:86
Returns
number
flashAttentionSupported
Get Signature
get flashAttentionSupported(): boolean;
Defined in: gguf/insights/GgufInsights.ts:90
Returns
boolean
hasEncoder
Get Signature
get hasEncoder(): boolean;
Defined in: gguf/insights/GgufInsights.ts:110
Returns
boolean
hasDecoder
Get Signature
get hasDecoder(): boolean;
Defined in: gguf/insights/GgufInsights.ts:120
Returns
boolean
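The `hasEncoder` and `hasDecoder` flags can be combined into a single label, for example to branch on a model's layout before deciding how to run it. This is a minimal sketch. `describeArchitecture` and its labels are hypothetical, and it relies only on the two documented booleans. How specific model families map onto these flags is not stated on this page.

```typescript
// Structural type mirroring the two documented accessors.
interface ArchitectureFlags {
    readonly hasEncoder: boolean;
    readonly hasDecoder: boolean;
}

// Hypothetical labels for the four possible flag combinations.
type ArchitectureKind = "encoder-decoder" | "encoder-only" | "decoder-only" | "unknown";

function describeArchitecture(insights: ArchitectureFlags): ArchitectureKind {
    if (insights.hasEncoder && insights.hasDecoder) return "encoder-decoder";
    if (insights.hasEncoder) return "encoder-only";
    if (insights.hasDecoder) return "decoder-only";
    return "unknown"; // neither flag set
}
```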
isRecurrent
Get Signature
get isRecurrent(): boolean;
Defined in: gguf/insights/GgufInsights.ts:129
Returns
boolean
supportsRanking
Get Signature
get supportsRanking(): boolean;
Defined in: gguf/insights/GgufInsights.ts:143
Returns
boolean
swaSize
Get Signature
get swaSize(): undefined | number;
Defined in: gguf/insights/GgufInsights.ts:171
The size of the sliding window used by SWA (Sliding Window Attention).
When undefined, the model does not use sliding window attention.
Returns
undefined | number
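In a layer that uses sliding window attention, a token attends to at most a window of preceding tokens rather than the whole context. The span it can see is therefore bounded by the smaller of the context size and the window. This minimal sketch assumes that reading of `swaSize`. `effectiveAttentionSpan` is hypothetical, and models that mix sliding-window and full-attention layers would need a per-layer view.

```typescript
// Structural type mirroring the documented accessor.
interface SwaSource {
    readonly swaSize: undefined | number;
}

// Hypothetical helper: how many tokens one attention step can look at in a
// sliding-window layer. Without SWA (swaSize is undefined), the whole context
// is visible.
function effectiveAttentionSpan(insights: SwaSource, contextSize: number): number {
    const windowSize = insights.swaSize;
    return windowSize === undefined ? contextSize : Math.min(contextSize, windowSize);
}
```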
Methods
getWarnings()
getWarnings(modelFilePath?: string): string[];
Defined in: gguf/insights/GgufInsights.ts:39
Gets warnings about issues in the model file that would affect its usage.
Most of these warnings are also generated by llama.cpp.
Parameters
| Parameter | Type |
|---|---|
| modelFilePath? | string |
Returns
string[]
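`getWarnings()` returns plain strings, so the caller decides how to surface them. The sketch below labels each warning with the file it came from, which keeps messages readable when several models are inspected together. `formatWarnings` is hypothetical and uses only the documented signature.

```typescript
// Structural type mirroring the documented method signature.
interface WarningSource {
    getWarnings(modelFilePath?: string): string[];
}

// Hypothetical helper: prefix each warning with its source file, or with
// "model" when no path is given.
function formatWarnings(insights: WarningSource, modelFilePath?: string): string[] {
    const label = modelFilePath ?? "model";
    return insights.getWarnings(modelFilePath).map((warning) => `[${label}] ${warning}`);
}
```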
estimateModelResourceRequirements()
estimateModelResourceRequirements(__namedParameters: {
gpuLayers: number;
useMmap?: boolean;
gpuSupportsMmap?: boolean;
}): GgufInsightsResourceRequirements;
Defined in: gguf/insights/GgufInsights.ts:183
Parameters
| Parameter | Type |
|---|---|
| __namedParameters | { gpuLayers: number; useMmap?: boolean; gpuSupportsMmap?: boolean; } |
| __namedParameters.gpuLayers | number |
| __namedParameters.useMmap? | boolean |
| __namedParameters.gpuSupportsMmap? | boolean |
Returns
GgufInsightsResourceRequirements
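One way to use `estimateModelResourceRequirements()` is to search for the largest number of GPU layers that fits a VRAM budget. This is a minimal sketch under two stated assumptions. First, the returned object exposes byte counts named `cpuRam` and `gpuVram`; these field names are an assumption, so check the `GgufInsightsResourceRequirements` type before relying on them. Second, valid `gpuLayers` values range from 0 to `totalLayers`. `maxGpuLayersWithinBudget` is hypothetical.

```typescript
// ASSUMED shape of GgufInsightsResourceRequirements (byte counts).
interface ResourceEstimate {
    cpuRam: number;
    gpuVram: number;
}

// Structural type mirroring the documented accessor and method.
interface ModelEstimator {
    readonly totalLayers: number;
    estimateModelResourceRequirements(options: {
        gpuLayers: number;
        useMmap?: boolean;
        gpuSupportsMmap?: boolean;
    }): ResourceEstimate;
}

// Hypothetical helper: scan from all layers down and return the first
// gpuLayers value whose estimated VRAM fits the budget. Returns 0 when no
// positive layer count fits.
function maxGpuLayersWithinBudget(insights: ModelEstimator, vramBudgetBytes: number): number {
    for (let layers = insights.totalLayers; layers > 0; layers--) {
        const {gpuVram} = insights.estimateModelResourceRequirements({gpuLayers: layers});
        if (gpuVram <= vramBudgetBytes)
            return layers;
    }
    return 0;
}
```

A linear scan is used instead of a binary search so the result stays correct even if the estimate is not strictly monotonic in `gpuLayers`. The cost is at most `totalLayers` calls.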
estimateContextResourceRequirements()
estimateContextResourceRequirements(__namedParameters: {
contextSize: number;
modelGpuLayers: number;
batchSize?: number;
sequences?: number;
isEmbeddingContext?: boolean;
flashAttention?: boolean;
includeGraphOverhead?: boolean;
swaFullCache?: boolean;
}): GgufInsightsResourceRequirements;
Defined in: gguf/insights/GgufInsights.ts:201
Estimates the memory required to create a context with the given parameters, based on the implementation details of llama.cpp. The graph overhead memory is not calculated precisely; a rough estimate is used instead. This estimate is expected to become more precise in the future, but it is adequate for now.
Parameters
| Parameter | Type |
|---|---|
| __namedParameters | { contextSize: number; modelGpuLayers: number; batchSize?: number; sequences?: number; isEmbeddingContext?: boolean; flashAttention?: boolean; includeGraphOverhead?: boolean; swaFullCache?: boolean; } |
| __namedParameters.contextSize | number |
| __namedParameters.modelGpuLayers | number |
| __namedParameters.batchSize? | number |
| __namedParameters.sequences? | number |
| __namedParameters.isEmbeddingContext? | boolean |
| __namedParameters.flashAttention? | boolean |
| __namedParameters.includeGraphOverhead? | boolean |
| __namedParameters.swaFullCache? | boolean |
Returns
GgufInsightsResourceRequirements
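`estimateContextResourceRequirements()` can also drive a search for the largest context size that fits a memory budget. This minimal sketch makes three assumptions. The result exposes a `gpuVram` byte count; this field name is an assumption. The estimate never decreases as `contextSize` grows. The caller supplies the search bounds. `maxContextSizeWithinBudget` is hypothetical.

```typescript
// ASSUMED shape of GgufInsightsResourceRequirements (byte counts).
interface ResourceEstimate {
    cpuRam: number;
    gpuVram: number;
}

// Structural type mirroring the documented method signature.
interface ContextEstimator {
    estimateContextResourceRequirements(options: {
        contextSize: number;
        modelGpuLayers: number;
        batchSize?: number;
        sequences?: number;
        isEmbeddingContext?: boolean;
        flashAttention?: boolean;
        includeGraphOverhead?: boolean;
        swaFullCache?: boolean;
    }): ResourceEstimate;
}

// Hypothetical helper: binary-search the largest contextSize in
// [minSize, maxSize] whose estimated context VRAM fits the budget.
// Returns undefined if even minSize doesn't fit.
function maxContextSizeWithinBudget(
    insights: ContextEstimator,
    modelGpuLayers: number,
    vramBudgetBytes: number,
    minSize: number,
    maxSize: number
): number | undefined {
    const fits = (contextSize: number): boolean =>
        insights.estimateContextResourceRequirements({contextSize, modelGpuLayers}).gpuVram <= vramBudgetBytes;

    if (!fits(minSize)) return undefined;

    let low = minSize; // invariant: low always fits
    let high = maxSize; // no answer lies above high
    while (low < high) {
        const mid = Math.ceil((low + high) / 2); // round up so the loop always progresses
        if (fits(mid)) low = mid;
        else high = mid - 1;
    }
    return low;
}
```

The model weights and the context consume VRAM separately. A caller would typically subtract the model's own `gpuVram` estimate from the available VRAM before passing the remainder as `vramBudgetBytes`, and use `trainContextSize` as `maxSize` when it is defined.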
from()
static from(ggufFileInfo: GgufFileInfo, llama?: Llama): Promise<GgufInsights>;
Defined in: gguf/insights/GgufInsights.ts:584
Parameters
| Parameter | Type | Description |
|---|---|---|
| ggufFileInfo | GgufFileInfo | |
| llama? | Llama | If you already have a Llama instance, pass it here to reuse it for the GgufInsights instance. If you don't pass one, a basic Llama instance is used as a fallback. It is a slim instance that doesn't instantiate a llama.cpp backend, so it won't utilize the GPU at all, and it is shared with other GgufInsights instances that also need a fallback Llama instance. |
Returns
Promise<GgufInsights>
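The fallback described for the `llama` parameter follows a common lazily initialized, shared-promise pattern. One slim instance is created on first need and reused by every caller that doesn't supply its own. The following is a minimal sketch of that general pattern with hypothetical names (`SlimRuntime`, `getFallbackRuntime`, `resolveRuntime`). It is not the library's implementation.

```typescript
// Hypothetical stand-in for a slim, backend-less runtime; counts constructions.
class SlimRuntime {
    static created = 0;

    constructor() {
        SlimRuntime.created++;
    }
}

let fallbackRuntime: Promise<SlimRuntime> | undefined;

// Create the fallback on first use and hand every later caller the same
// promise, so concurrent callers share one instance instead of each creating one.
function getFallbackRuntime(): Promise<SlimRuntime> {
    if (fallbackRuntime === undefined)
        fallbackRuntime = Promise.resolve().then(() => new SlimRuntime());

    return fallbackRuntime;
}

// Use the caller's instance when given, otherwise the shared fallback.
async function resolveRuntime(runtime?: SlimRuntime): Promise<SlimRuntime> {
    return runtime ?? await getFallbackRuntime();
}
```

Caching the promise, rather than the resolved instance, matters here. If two callers ask for the fallback before creation finishes, a cache of resolved instances would still be empty for both, and two instances would be created.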