Class: GgufInsights
Defined in: gguf/insights/GgufInsights.ts:17
Accessors
ggufFileInfo
Get Signature
get ggufFileInfo(): GgufFileInfo;
Defined in: gguf/insights/GgufInsights.ts:61
Returns
GgufFileInfo
configurationResolver
Get Signature
get configurationResolver(): GgufInsightsConfigurationResolver;
Defined in: gguf/insights/GgufInsights.ts:65
Returns
GgufInsightsConfigurationResolver
tokens
Get Signature
get tokens(): GgufInsightsTokens;
Defined in: gguf/insights/GgufInsights.ts:69
Returns
GgufInsightsTokens
trainContextSize
Get Signature
get trainContextSize(): number | undefined;
Defined in: gguf/insights/GgufInsights.ts:74
The context size the model was trained on
Returns
number | undefined
embeddingVectorSize
Get Signature
get embeddingVectorSize(): number | undefined;
Defined in: gguf/insights/GgufInsights.ts:79
The size of an embedding vector the model can produce
Returns
number | undefined
totalLayers
Get Signature
get totalLayers(): number;
Defined in: gguf/insights/GgufInsights.ts:83
Returns
number
modelSize
Get Signature
get modelSize(): number;
Defined in: gguf/insights/GgufInsights.ts:88
Returns
number
flashAttentionSupported
Get Signature
get flashAttentionSupported(): boolean;
Defined in: gguf/insights/GgufInsights.ts:92
Returns
boolean
hasEncoder
Get Signature
get hasEncoder(): boolean;
Defined in: gguf/insights/GgufInsights.ts:112
Returns
boolean
hasDecoder
Get Signature
get hasDecoder(): boolean;
Defined in: gguf/insights/GgufInsights.ts:122
Returns
boolean
isRecurrent
Get Signature
get isRecurrent(): boolean;
Defined in: gguf/insights/GgufInsights.ts:131
Returns
boolean
isHybrid
Get Signature
get isHybrid(): boolean;
Defined in: gguf/insights/GgufInsights.ts:146
Returns
boolean
dominantTensorType
Get Signature
get dominantTensorType(): GgmlType | undefined;
Defined in: gguf/insights/GgufInsights.ts:170
Get the dominant tensor type used in the model file
Returns
GgmlType | undefined
supportsRanking
Get Signature
get supportsRanking(): boolean;
Defined in: gguf/insights/GgufInsights.ts:177
Returns
boolean
swaSize
Get Signature
get swaSize(): number | undefined;
Defined in: gguf/insights/GgufInsights.ts:205
The size of the SWA (Sliding Window Attention).
When undefined, the model does not use sliding window attention.
Returns
number | undefined
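To illustrate why this value matters for memory planning: with sliding window attention, an SWA layer only attends to a bounded window of recent tokens, so its KV cache can be sized by the window rather than the full context. The sketch below is a hypothetical illustration of that relationship, not the library's implementation.

```typescript
// Hypothetical sketch: an SWA layer needs KV entries for at most `swaSize`
// recent tokens, while a regular layer needs entries for the full context.
function effectiveKvLength(contextSize: number, swaSize?: number): number {
    if (swaSize == null)
        return contextSize; // no sliding window: cache the full context

    return Math.min(contextSize, swaSize);
}

console.log(effectiveKvLength(32768, 4096)); // 4096
console.log(effectiveKvLength(32768));       // 32768
```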
Methods
getWarnings()
getWarnings(modelFilePath?: string): string[];
Defined in: gguf/insights/GgufInsights.ts:41
Get warnings about the model file that would affect its usage.
Most of these warnings are also generated by llama.cpp.
Parameters
| Parameter | Type |
|---|---|
| modelFilePath? | string |
Returns
string[]
estimateModelResourceRequirements()
estimateModelResourceRequirements(__namedParameters: {
gpuLayers: number;
useMmap?: boolean;
gpuSupportsMmap?: boolean;
}): GgufInsightsResourceRequirements;
Defined in: gguf/insights/GgufInsights.ts:217
Parameters
| Parameter | Type |
|---|---|
| __namedParameters | { gpuLayers: number; useMmap?: boolean; gpuSupportsMmap?: boolean; } |
| __namedParameters.gpuLayers | number |
| __namedParameters.useMmap? | boolean |
| __namedParameters.gpuSupportsMmap? | boolean |
Returns
GgufInsightsResourceRequirements
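As a rough illustration of the idea behind this kind of estimate (not the actual implementation): model weight memory can be split between GPU and CPU roughly in proportion to how many layers are offloaded. The `splitModelMemory` helper and its numbers below are hypothetical.

```typescript
// Hypothetical sketch of splitting model weight memory by offloaded layers.
// The real estimate also accounts for per-tensor placement, mmap usage, etc.
function splitModelMemory(modelSize: number, totalLayers: number, gpuLayers: number) {
    const gpuFraction = Math.min(gpuLayers, totalLayers) / totalLayers;
    const gpuVram = Math.round(modelSize * gpuFraction);
    return {
        gpuVram,
        cpuRam: modelSize - gpuVram
    };
}

// Example: a 4 GiB model with 32 layers, 24 of them offloaded to the GPU
const requirements = splitModelMemory(4 * 1024 ** 3, 32, 24);
console.log(requirements.gpuVram); // 3221225472 (3 GiB)
console.log(requirements.cpuRam);  // 1073741824 (1 GiB)
```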
estimateContextResourceRequirements()
estimateContextResourceRequirements(__namedParameters: {
contextSize: number;
modelGpuLayers: number;
batchSize?: number;
sequences?: number;
isEmbeddingContext?: boolean;
flashAttention?: boolean;
includeGraphOverhead?: boolean;
swaFullCache?: boolean;
kvCacheKeyType?: GgmlType;
kvCacheValueType?: GgmlType;
}): GgufInsightsResourceRequirements;
Defined in: gguf/insights/GgufInsights.ts:235
Estimates the memory required to create a context with the given parameters, based on the implementation details of llama.cpp. The calculation doesn't precisely estimate the graph overhead memory, so a rough estimate is used for it instead; this estimate will be made more precise in the future, but it's good enough for now.
Parameters
| Parameter | Type |
|---|---|
| __namedParameters | { contextSize: number; modelGpuLayers: number; batchSize?: number; sequences?: number; isEmbeddingContext?: boolean; flashAttention?: boolean; includeGraphOverhead?: boolean; swaFullCache?: boolean; kvCacheKeyType?: GgmlType; kvCacheValueType?: GgmlType; } |
| __namedParameters.contextSize | number |
| __namedParameters.modelGpuLayers | number |
| __namedParameters.batchSize? | number |
| __namedParameters.sequences? | number |
| __namedParameters.isEmbeddingContext? | boolean |
| __namedParameters.flashAttention? | boolean |
| __namedParameters.includeGraphOverhead? | boolean |
| __namedParameters.swaFullCache? | boolean |
| __namedParameters.kvCacheKeyType? | GgmlType |
| __namedParameters.kvCacheValueType? | GgmlType |
Returns
GgufInsightsResourceRequirements
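To give a sense of the dominant term in such an estimate, the KV cache grows linearly with context size, layer count, and vector width. The sketch below is a simplified illustration under assumed f16 keys and values; the real estimate also accounts for batch size, sequences, GQA head counts, and graph overhead.

```typescript
// Rough sketch of the dominant term in a context memory estimate: the KV cache.
// This is an illustrative approximation, not the library's actual formula.
function estimateKvCacheBytes(
    contextSize: number,
    layers: number,
    embeddingSize: number,
    bytesPerElement = 2 // f16 keys and values
): number {
    // One key vector and one value vector per token, per layer
    return 2 * contextSize * layers * embeddingSize * bytesPerElement;
}

// Example: 4096-token context, 32 layers, 4096-wide vectors, f16 cache
console.log(estimateKvCacheBytes(4096, 32, 4096)); // 2147483648 (2 GiB)
```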
from()
static from(ggufFileInfo: GgufFileInfo, llama?: Llama): Promise<GgufInsights>;
Defined in: gguf/insights/GgufInsights.ts:736
Parameters
| Parameter | Type | Description |
|---|---|---|
| ggufFileInfo | GgufFileInfo | - |
| llama? | Llama | If you already have a Llama instance, pass it to reuse it for the GgufInsights instance. If you don't pass a Llama instance, a basic Llama instance is created as a fallback - it's a slim instance that doesn't instantiate a llama.cpp backend, so it won't utilize the GPU at all, and it will be shared with other GgufInsights instances that need a fallback Llama instance. |
Returns
Promise<GgufInsights>