# Class: GgufInsights

## Accessors

### ggufFileInfo

```ts
get ggufFileInfo(): GgufFileInfo
```

#### Returns

`GgufFileInfo`

#### Defined in

gguf/insights/GgufInsights.ts:55
### configurationResolver

```ts
get configurationResolver(): GgufInsightsConfigurationResolver
```

#### Returns

`GgufInsightsConfigurationResolver`

#### Defined in

gguf/insights/GgufInsights.ts:59
### trainContextSize

```ts
get trainContextSize(): undefined | number
```

The context size the model was trained on.

#### Returns

`undefined` | `number`

#### Defined in

gguf/insights/GgufInsights.ts:64
### embeddingVectorSize

```ts
get embeddingVectorSize(): undefined | number
```

The size of an embedding vector the model can produce.

#### Returns

`undefined` | `number`

#### Defined in

gguf/insights/GgufInsights.ts:69
### totalLayers

```ts
get totalLayers(): number
```

#### Returns

`number`

#### Defined in

gguf/insights/GgufInsights.ts:73
### modelSize

```ts
get modelSize(): number
```

#### Returns

`number`

#### Defined in

gguf/insights/GgufInsights.ts:83
### flashAttentionSupported

```ts
get flashAttentionSupported(): boolean
```

#### Returns

`boolean`

#### Defined in

gguf/insights/GgufInsights.ts:87
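For illustration, here's a minimal sketch of reading these accessors. It assumes `readGgufFileInfo` and `GgufInsights` are both exported from the package entry point, and `"model.gguf"` is a placeholder path (see the `from()` method below for how instances are constructed):

```ts
import {readGgufFileInfo, GgufInsights} from "node-llama-cpp";

// Parse the GGUF file's metadata and build an insights object from it
const ggufFileInfo = await readGgufFileInfo("model.gguf");
const insights = await GgufInsights.from(ggufFileInfo);

console.log("Trained context size:", insights.trainContextSize);
console.log("Embedding vector size:", insights.embeddingVectorSize);
console.log("Total layers:", insights.totalLayers);
console.log("Model size (bytes):", insights.modelSize);
console.log("Flash attention supported:", insights.flashAttentionSupported);
```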
## Methods

### getWarnings()

```ts
getWarnings(modelFilePath?: string): string[]
```

Get warnings about the model file that would affect its usage.

Most of these warnings are also generated by `llama.cpp`.

#### Parameters

| Parameter | Type |
| --- | --- |
| `modelFilePath`? | `string` |

#### Returns

`string[]`

#### Defined in

gguf/insights/GgufInsights.ts:35
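A short usage sketch, under the same assumptions as the accessors example above:

```ts
import {readGgufFileInfo, GgufInsights} from "node-llama-cpp";

const modelPath = "model.gguf"; // placeholder path
const insights = await GgufInsights.from(await readGgufFileInfo(modelPath));

// Log each warning; the file path argument is optional
for (const warning of insights.getWarnings(modelPath))
    console.warn(warning);
```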
### estimateModelResourceRequirements()

```ts
estimateModelResourceRequirements(__namedParameters: {
    gpuLayers: number;
}): GgufInsightsResourceRequirements
```

#### Parameters

| Parameter | Type |
| --- | --- |
| `__namedParameters` | `object` |
| `__namedParameters.gpuLayers` | `number` |

#### Returns

`GgufInsightsResourceRequirements`

#### Defined in

gguf/insights/GgufInsights.ts:107
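A hedged example of estimating the memory needed to load a model, assuming `GgufInsightsResourceRequirements` exposes `cpuRam` and `gpuVram` byte counts (same setup assumptions as the earlier examples):

```ts
import {readGgufFileInfo, GgufInsights} from "node-llama-cpp";

const insights = await GgufInsights.from(await readGgufFileInfo("model.gguf"));

// Estimate the memory needed to load the model with all layers on the GPU
const resources = insights.estimateModelResourceRequirements({
    gpuLayers: insights.totalLayers
});
console.log("Estimated VRAM (bytes):", resources.gpuVram);
console.log("Estimated RAM (bytes):", resources.cpuRam);
```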
### estimateContextResourceRequirements()

```ts
estimateContextResourceRequirements(__namedParameters: {
    contextSize: number;
    modelGpuLayers: number;
    batchSize?: number;
    sequences?: number;
    isEmbeddingContext?: boolean;
    flashAttention?: boolean;
    includeGraphOverhead?: boolean;
}): GgufInsightsResourceRequirements
```

Estimates the memory required to create a context with the given parameters, based on the implementation details of `llama.cpp`.

The calculation doesn't include a precise estimation of the graph overhead memory, so a rough estimate is used for it. The graph overhead estimation will be made more precise in the future, but it's good enough for now.

#### Parameters

| Parameter | Type |
| --- | --- |
| `__namedParameters` | `object` |
| `__namedParameters.contextSize` | `number` |
| `__namedParameters.modelGpuLayers` | `number` |
| `__namedParameters.batchSize`? | `number` |
| `__namedParameters.sequences`? | `number` |
| `__namedParameters.isEmbeddingContext`? | `boolean` |
| `__namedParameters.flashAttention`? | `boolean` |
| `__namedParameters.includeGraphOverhead`? | `boolean` |

#### Returns

`GgufInsightsResourceRequirements`

#### Defined in

gguf/insights/GgufInsights.ts:121
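A sketch of estimating context memory, under the same assumptions as the previous example; the context size and layer count here are arbitrary illustration values:

```ts
import {readGgufFileInfo, GgufInsights} from "node-llama-cpp";

const insights = await GgufInsights.from(await readGgufFileInfo("model.gguf"));

// Estimate the memory needed for a single-sequence 4096-token context
// when all of the model's layers are offloaded to the GPU
const contextResources = insights.estimateContextResourceRequirements({
    contextSize: 4096,
    modelGpuLayers: insights.totalLayers,
    sequences: 1
});
console.log("Estimated VRAM (bytes):", contextResources.gpuVram);
console.log("Estimated RAM (bytes):", contextResources.cpuRam);
```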
### from()

```ts
static from(ggufFileInfo: GgufFileInfo, llama?: Llama): Promise<GgufInsights>
```

#### Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `ggufFileInfo` | `GgufFileInfo` | |
| `llama`? | `Llama` | If you already have a `Llama` instance, pass it to reuse it for the `GgufInsights` instance. If you don't pass a `Llama` instance, a basic `Llama` instance is created as a fallback - it's a slim instance that doesn't instantiate a `llama.cpp` backend, so it won't utilize the GPU at all, and it will be shared with other `GgufInsights` instances that need a fallback `Llama` instance. |

#### Returns

`Promise<GgufInsights>`
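To avoid the slim fallback instance, you can pass an existing `Llama` instance. A minimal sketch, assuming `getLlama`, `readGgufFileInfo`, and `GgufInsights` are all available from the package entry point:

```ts
import {getLlama, readGgufFileInfo, GgufInsights} from "node-llama-cpp";

// Reuse an existing Llama instance instead of the fallback instance,
// so insights are computed against the actual llama.cpp backend
const llama = await getLlama();
const ggufFileInfo = await readGgufFileInfo("model.gguf");
const insights = await GgufInsights.from(ggufFileInfo, llama);

console.log(insights.getWarnings());
```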