Class: GgufInsights
Defined in: gguf/insights/GgufInsights.ts:15
Accessors
ggufFileInfo
Get Signature
get ggufFileInfo(): GgufFileInfo;
Defined in: gguf/insights/GgufInsights.ts:55
Returns
GgufFileInfo
configurationResolver
Get Signature
get configurationResolver(): GgufInsightsConfigurationResolver;
Defined in: gguf/insights/GgufInsights.ts:59
Returns
GgufInsightsConfigurationResolver
trainContextSize
Get Signature
get trainContextSize(): undefined | number;
Defined in: gguf/insights/GgufInsights.ts:64
The context size the model was trained on
Returns
undefined | number
embeddingVectorSize
Get Signature
get embeddingVectorSize(): undefined | number;
Defined in: gguf/insights/GgufInsights.ts:69
The size of an embedding vector the model can produce
Returns
undefined | number
totalLayers
Get Signature
get totalLayers(): number;
Defined in: gguf/insights/GgufInsights.ts:73
Returns
number
modelSize
Get Signature
get modelSize(): number;
Defined in: gguf/insights/GgufInsights.ts:83
Returns
number
flashAttentionSupported
Get Signature
get flashAttentionSupported(): boolean;
Defined in: gguf/insights/GgufInsights.ts:87
Returns
boolean
hasEncoder
Get Signature
get hasEncoder(): boolean;
Defined in: gguf/insights/GgufInsights.ts:107
Returns
boolean
hasDecoder
Get Signature
get hasDecoder(): boolean;
Defined in: gguf/insights/GgufInsights.ts:117
Returns
boolean
isRecurrent
Get Signature
get isRecurrent(): boolean;
Defined in: gguf/insights/GgufInsights.ts:126
Returns
boolean
Methods
getWarnings()
getWarnings(modelFilePath?: string): string[];
Defined in: gguf/insights/GgufInsights.ts:35
Get warnings about the model file that would affect its usage.
Most of these warnings are also generated by llama.cpp.
Parameters
Parameter | Type |
---|---|
modelFilePath? | string |
Returns
string[]
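For instance, a short usage sketch (not standalone: it assumes `insights` is an existing GgufInsights instance, and the model path is hypothetical):

```ts
// `insights` is assumed to be an existing GgufInsights instance;
// the model file path below is hypothetical
const warnings = insights.getWarnings("./models/my-model.gguf");
for (const warning of warnings)
    console.warn(warning);
```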
estimateModelResourceRequirements()
estimateModelResourceRequirements(__namedParameters: {
gpuLayers: number;
useMmap?: boolean;
gpuSupportsMmap?: boolean;
}): GgufInsightsResourceRequirements;
Defined in: gguf/insights/GgufInsights.ts:136
Parameters
Parameter | Type |
---|---|
__namedParameters | { gpuLayers : number ; useMmap? : boolean ; gpuSupportsMmap? : boolean ; } |
__namedParameters.gpuLayers | number |
__namedParameters.useMmap? | boolean |
__namedParameters.gpuSupportsMmap? | boolean |
Returns
GgufInsightsResourceRequirements
estimateContextResourceRequirements()
estimateContextResourceRequirements(__namedParameters: {
contextSize: number;
modelGpuLayers: number;
batchSize?: number;
sequences?: number;
isEmbeddingContext?: boolean;
flashAttention?: boolean;
includeGraphOverhead?: boolean;
}): GgufInsightsResourceRequirements;
Defined in: gguf/insights/GgufInsights.ts:154
Estimates the memory required to create a context with the given parameters, based on the implementation details of llama.cpp. The calculation doesn't include a precise estimation of the graph overhead memory, so it uses a rough estimate for that. The graph overhead estimation will be improved in the future to be more precise, but it's good enough for now.
Parameters
Parameter | Type |
---|---|
__namedParameters | { contextSize : number ; modelGpuLayers : number ; batchSize? : number ; sequences? : number ; isEmbeddingContext? : boolean ; flashAttention? : boolean ; includeGraphOverhead? : boolean ; } |
__namedParameters.contextSize | number |
__namedParameters.modelGpuLayers | number |
__namedParameters.batchSize? | number |
__namedParameters.sequences? | number |
__namedParameters.isEmbeddingContext? | boolean |
__namedParameters.flashAttention? | boolean |
__namedParameters.includeGraphOverhead? | boolean |
Returns
GgufInsightsResourceRequirements
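For illustration, here is a minimal sketch of combining the model and context estimates to check whether they fit in available memory. It assumes GgufInsightsResourceRequirements is a plain object with `cpuRam` and `gpuVram` byte counts (an assumption; verify against the actual type definition), so a local mirror of that shape is declared:

```ts
// Assumed shape of GgufInsightsResourceRequirements (cpuRam/gpuVram in bytes);
// verify against the actual type exported by the package
type ResourceRequirements = {
    cpuRam: number; // bytes of system RAM required
    gpuVram: number; // bytes of GPU VRAM required
};

// Sum several estimates (e.g. model + context) and compare them
// against the memory available on the machine
function fitsInMemory(
    estimates: ResourceRequirements[],
    availableRam: number,
    availableVram: number
): boolean {
    const totalRam = estimates.reduce((sum, e) => sum + e.cpuRam, 0);
    const totalVram = estimates.reduce((sum, e) => sum + e.gpuVram, 0);
    return totalRam <= availableRam && totalVram <= availableVram;
}
```

In practice, the two arguments would come from `estimateModelResourceRequirements()` and `estimateContextResourceRequirements()` called on the same GgufInsights instance with matching `gpuLayers`/`modelGpuLayers` values.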
from()
static from(ggufFileInfo: GgufFileInfo, llama?: Llama): Promise<GgufInsights>;
Defined in: gguf/insights/GgufInsights.ts:514
Parameters
Parameter | Type | Description |
---|---|---|
ggufFileInfo | GgufFileInfo | |
llama? | Llama | If you already have a Llama instance, pass it to reuse it for the GgufInsights instance. If you don't pass a Llama instance, a basic Llama instance is created as a fallback. It's a slim instance that doesn't instantiate a llama.cpp backend, so it won't utilize the GPU at all, and it will be shared with other GgufInsights instances that need a fallback Llama instance. |
Returns
Promise<GgufInsights>
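An end-to-end sketch, under the assumption that readGgufFileInfo and GgufInsights are both importable from the package (the model path is hypothetical):

```ts
import {readGgufFileInfo, GgufInsights} from "node-llama-cpp";

// Hypothetical local model path
const ggufFileInfo = await readGgufFileInfo("./models/my-model.gguf");

// No Llama instance is passed, so a slim fallback instance is used;
// pass an existing Llama instance to reuse it instead
const insights = await GgufInsights.from(ggufFileInfo);

console.log("Train context size:", insights.trainContextSize);
console.log("Total layers:", insights.totalLayers);
```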