Class: GgufInsights
Defined in: gguf/insights/GgufInsights.ts:15
Accessors
ggufFileInfo
Get Signature
get ggufFileInfo(): GgufFileInfo
Defined in: gguf/insights/GgufInsights.ts:55
Returns
GgufFileInfo
configurationResolver
Get Signature
get configurationResolver(): GgufInsightsConfigurationResolver
Defined in: gguf/insights/GgufInsights.ts:59
Returns
GgufInsightsConfigurationResolver
trainContextSize
Get Signature
get trainContextSize(): undefined | number
Defined in: gguf/insights/GgufInsights.ts:64
The context size the model was trained on
Returns
undefined | number
embeddingVectorSize
Get Signature
get embeddingVectorSize(): undefined | number
Defined in: gguf/insights/GgufInsights.ts:69
The size of an embedding vector the model can produce
Returns
undefined | number
totalLayers
Get Signature
get totalLayers(): number
Defined in: gguf/insights/GgufInsights.ts:73
Returns
number
modelSize
Get Signature
get modelSize(): number
Defined in: gguf/insights/GgufInsights.ts:83
Returns
number
flashAttentionSupported
Get Signature
get flashAttentionSupported(): boolean
Defined in: gguf/insights/GgufInsights.ts:87
Returns
boolean
hasEncoder
Get Signature
get hasEncoder(): boolean
Defined in: gguf/insights/GgufInsights.ts:107
Returns
boolean
hasDecoder
Get Signature
get hasDecoder(): boolean
Defined in: gguf/insights/GgufInsights.ts:117
Returns
boolean
isRecurrent
Get Signature
get isRecurrent(): boolean
Defined in: gguf/insights/GgufInsights.ts:126
Returns
boolean
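A quick sketch reading several of the accessors above (assuming `insights` is a `GgufInsights` instance, created via the static `from()` method documented below; the commented values are hypothetical):

```ts
console.log(insights.trainContextSize);        // e.g. 4096, or undefined if unknown
console.log(insights.embeddingVectorSize);     // e.g. 4096, or undefined if unknown
console.log(insights.totalLayers);             // e.g. 33
console.log(insights.modelSize);               // model size, presumably in bytes
console.log(insights.flashAttentionSupported); // whether flash attention can be used
console.log(insights.isRecurrent);             // e.g. true for Mamba-based models
```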
Methods
getWarnings()
getWarnings(modelFilePath?: string): string[]
Defined in: gguf/insights/GgufInsights.ts:35
Get warnings about the model file that would affect its usage.
Most of these warnings are also generated by llama.cpp.
Parameters
| Parameter | Type |
| --- | --- |
| `modelFilePath`? | `string` |
Returns
string[]
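For example (assuming `insights` is an existing `GgufInsights` instance; the file path argument is optional, and the one used here is a placeholder):

```ts
// Log every warning llama.cpp would likely raise for this model file
const warnings = insights.getWarnings("path/to/model.gguf");
for (const warning of warnings)
    console.warn(warning);
```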
estimateModelResourceRequirements()
estimateModelResourceRequirements(__namedParameters: {
  gpuLayers: number;
  useMmap?: boolean;
  gpuSupportsMmap?: boolean;
}): GgufInsightsResourceRequirements
Defined in: gguf/insights/GgufInsights.ts:136
Parameters
| Parameter | Type |
| --- | --- |
| `__namedParameters` | `{ gpuLayers: number; useMmap?: boolean; gpuSupportsMmap?: boolean }` |
| `__namedParameters.gpuLayers` | `number` |
| `__namedParameters.useMmap`? | `boolean` |
| `__namedParameters.gpuSupportsMmap`? | `boolean` |
Returns
GgufInsightsResourceRequirements
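A sketch of a call with all layers offloaded to the GPU (assuming `insights` exists as above, and assuming `GgufInsightsResourceRequirements` exposes `cpuRam` and `gpuVram` values):

```ts
// Estimate the memory needed to load the model fully on the GPU
const modelRequirements = insights.estimateModelResourceRequirements({
    gpuLayers: insights.totalLayers,
    useMmap: true
});
console.log(modelRequirements.cpuRam, modelRequirements.gpuVram);
```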
estimateContextResourceRequirements()
estimateContextResourceRequirements(__namedParameters: {
  contextSize: number;
  modelGpuLayers: number;
  batchSize?: number;
  sequences?: number;
  isEmbeddingContext?: boolean;
  flashAttention?: boolean;
  includeGraphOverhead?: boolean;
}): GgufInsightsResourceRequirements
Defined in: gguf/insights/GgufInsights.ts:154
Estimates the memory required to create a context with the given parameters, based on the implementation details of llama.cpp. The calculation doesn't include a precise estimation of the graph overhead memory, so a rough estimate is used for it instead; this estimate will be made more precise in the future, but it's good enough for now.
Parameters
| Parameter | Type |
| --- | --- |
| `__namedParameters` | `{ contextSize: number; modelGpuLayers: number; batchSize?: number; sequences?: number; isEmbeddingContext?: boolean; flashAttention?: boolean; includeGraphOverhead?: boolean }` |
| `__namedParameters.contextSize` | `number` |
| `__namedParameters.modelGpuLayers` | `number` |
| `__namedParameters.batchSize`? | `number` |
| `__namedParameters.sequences`? | `number` |
| `__namedParameters.isEmbeddingContext`? | `boolean` |
| `__namedParameters.flashAttention`? | `boolean` |
| `__namedParameters.includeGraphOverhead`? | `boolean` |
Returns
GgufInsightsResourceRequirements
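A sketch of estimating the memory for a single-sequence context (same assumptions as above; the parameter values are illustrative):

```ts
// Estimate the memory needed for a 4096-token context on top of the model
const contextRequirements = insights.estimateContextResourceRequirements({
    contextSize: 4096,
    modelGpuLayers: insights.totalLayers,
    batchSize: 512,
    sequences: 1
});
console.log(contextRequirements.cpuRam, contextRequirements.gpuVram);
```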
from()
static from(ggufFileInfo: GgufFileInfo, llama?: Llama): Promise<GgufInsights>
Defined in: gguf/insights/GgufInsights.ts:497
Parameters
| Parameter | Type | Description |
| --- | --- | --- |
| `ggufFileInfo` | `GgufFileInfo` | |
| `llama`? | `Llama` | If you already have a `Llama` instance, pass it to reuse it for the `GgufInsights` instance. If you don't pass a `Llama` instance, a basic `Llama` instance is created as a fallback. It's a slim instance that doesn't instantiate a llama.cpp backend, so it won't utilize the GPU at all, and it will be shared with other `GgufInsights` instances that need a fallback `Llama` instance. |
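A minimal construction sketch (assuming `getLlama`, `readGgufFileInfo`, and `GgufInsights` are all importable from `node-llama-cpp`, and that the model path is a placeholder):

```ts
import {getLlama, readGgufFileInfo, GgufInsights} from "node-llama-cpp";

// Reuse an existing Llama instance to avoid creating a fallback one
const llama = await getLlama();

const ggufFileInfo = await readGgufFileInfo("path/to/model.gguf");
const insights = await GgufInsights.from(ggufFileInfo, llama);
```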