
# Class: GgufInsights

Defined in: gguf/insights/GgufInsights.ts:15

## Accessors

### ggufFileInfo

#### Get Signature

```ts
get ggufFileInfo(): GgufFileInfo
```

Defined in: gguf/insights/GgufInsights.ts:55

##### Returns

`GgufFileInfo`


### configurationResolver

#### Get Signature

```ts
get configurationResolver(): GgufInsightsConfigurationResolver
```

Defined in: gguf/insights/GgufInsights.ts:59

##### Returns

`GgufInsightsConfigurationResolver`


### trainContextSize

#### Get Signature

```ts
get trainContextSize(): undefined | number
```

Defined in: gguf/insights/GgufInsights.ts:64

The context size the model was trained on.

##### Returns

`undefined | number`


### embeddingVectorSize

#### Get Signature

```ts
get embeddingVectorSize(): undefined | number
```

Defined in: gguf/insights/GgufInsights.ts:69

The size of an embedding vector the model can produce.

##### Returns

`undefined | number`


### totalLayers

#### Get Signature

```ts
get totalLayers(): number
```

Defined in: gguf/insights/GgufInsights.ts:73

##### Returns

`number`


### modelSize

#### Get Signature

```ts
get modelSize(): number
```

Defined in: gguf/insights/GgufInsights.ts:83

##### Returns

`number`


### flashAttentionSupported

#### Get Signature

```ts
get flashAttentionSupported(): boolean
```

Defined in: gguf/insights/GgufInsights.ts:87

##### Returns

`boolean`


### hasEncoder

#### Get Signature

```ts
get hasEncoder(): boolean
```

Defined in: gguf/insights/GgufInsights.ts:107

##### Returns

`boolean`


### hasDecoder

#### Get Signature

```ts
get hasDecoder(): boolean
```

Defined in: gguf/insights/GgufInsights.ts:117

##### Returns

`boolean`


### isRecurrent

#### Get Signature

```ts
get isRecurrent(): boolean
```

Defined in: gguf/insights/GgufInsights.ts:126

##### Returns

`boolean`
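
As a sketch of how these accessors might be combined in practice, the snippet below summarizes a model from a plain object that mirrors the accessor shapes above. The `InsightsSummary` type and all values are illustrative assumptions, not real model data or part of the library's API:

```typescript
// Hypothetical plain object mirroring the accessor shapes above;
// the values are illustrative, not taken from a real model file.
type InsightsSummary = {
    trainContextSize?: number,
    embeddingVectorSize?: number,
    totalLayers: number,
    modelSize: number,
    flashAttentionSupported: boolean
};

function describeModel(insights: InsightsSummary): string {
    // modelSize is a byte count; convert to GiB for display
    const sizeGib = (insights.modelSize / 1024 ** 3).toFixed(2);

    // trainContextSize can be undefined when the metadata is missing
    const trainContext = insights.trainContextSize ?? "unknown";

    return `${sizeGib} GiB, ${insights.totalLayers} layers, ` +
        `trained context: ${trainContext}`;
}

console.log(describeModel({
    trainContextSize: 4096,
    embeddingVectorSize: 4096,
    totalLayers: 33,
    modelSize: 4 * 1024 ** 3,
    flashAttentionSupported: true
})); // → "4.00 GiB, 33 layers, trained context: 4096"
```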

## Methods

### getWarnings()

```ts
getWarnings(modelFilePath?: string): string[]
```

Defined in: gguf/insights/GgufInsights.ts:35

Get warnings about the model file that would affect its usage.

Most of these warnings are also generated by llama.cpp.

#### Parameters

| Parameter | Type |
| ------ | ------ |
| `modelFilePath?` | `string` |

#### Returns

`string[]`


### estimateModelResourceRequirements()

```ts
estimateModelResourceRequirements(__namedParameters: {
  gpuLayers: number;
  useMmap?: boolean;
  gpuSupportsMmap?: boolean;
}): GgufInsightsResourceRequirements
```

Defined in: gguf/insights/GgufInsights.ts:136

#### Parameters

| Parameter | Type |
| ------ | ------ |
| `__namedParameters` | `{ gpuLayers: number; useMmap?: boolean; gpuSupportsMmap?: boolean; }` |
| `__namedParameters.gpuLayers` | `number` |
| `__namedParameters.useMmap?` | `boolean` |
| `__namedParameters.gpuSupportsMmap?` | `boolean` |

#### Returns

`GgufInsightsResourceRequirements`
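
A common follow-up is to check whether the estimate fits the machine's resources. The sketch below assumes `GgufInsightsResourceRequirements` is an object of byte counts of the shape `{cpuRam, gpuVram}`; verify that shape against your version of the library:

```typescript
// Assumed shape of GgufInsightsResourceRequirements: byte counts for
// CPU RAM and GPU VRAM. Verify against your version of the library.
type ResourceRequirements = {cpuRam: number, gpuVram: number};

// True when the estimated requirements fit within the given budgets
function fitsInMemory(
    required: ResourceRequirements,
    available: ResourceRequirements
): boolean {
    return required.cpuRam <= available.cpuRam &&
        required.gpuVram <= available.gpuVram;
}

// In practice `estimate` would come from a call like
// insights.estimateModelResourceRequirements({gpuLayers: 33});
// the values here are illustrative
const estimate: ResourceRequirements = {cpuRam: 2 * 1024 ** 3, gpuVram: 5 * 1024 ** 3};

console.log(fitsInMemory(estimate, {cpuRam: 16 * 1024 ** 3, gpuVram: 8 * 1024 ** 3})); // true
console.log(fitsInMemory(estimate, {cpuRam: 16 * 1024 ** 3, gpuVram: 4 * 1024 ** 3})); // false
```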


### estimateContextResourceRequirements()

```ts
estimateContextResourceRequirements(__namedParameters: {
  contextSize: number;
  modelGpuLayers: number;
  batchSize?: number;
  sequences?: number;
  isEmbeddingContext?: boolean;
  flashAttention?: boolean;
  includeGraphOverhead?: boolean;
}): GgufInsightsResourceRequirements
```

Defined in: gguf/insights/GgufInsights.ts:154

Estimates the memory required to create a context with the given parameters, based on the implementation details of llama.cpp. The graph overhead memory is only roughly estimated; this estimate will be made more precise in the future, but it's good enough for now.

#### Parameters

| Parameter | Type |
| ------ | ------ |
| `__namedParameters` | `{ contextSize: number; modelGpuLayers: number; batchSize?: number; sequences?: number; isEmbeddingContext?: boolean; flashAttention?: boolean; includeGraphOverhead?: boolean; }` |
| `__namedParameters.contextSize` | `number` |
| `__namedParameters.modelGpuLayers` | `number` |
| `__namedParameters.batchSize?` | `number` |
| `__namedParameters.sequences?` | `number` |
| `__namedParameters.isEmbeddingContext?` | `boolean` |
| `__namedParameters.flashAttention?` | `boolean` |
| `__namedParameters.includeGraphOverhead?` | `boolean` |

#### Returns

`GgufInsightsResourceRequirements`
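
Since the estimate grows with the context size, one way to use it is a binary search for the largest context that fits a VRAM budget. The sketch below takes the estimator as a callback; the linear mock passed at the bottom is illustrative only, and in practice the callback would wrap a call to `insights.estimateContextResourceRequirements(...)`:

```typescript
// Binary-search the largest context size whose estimated VRAM usage
// fits the budget. Assumes the estimator is monotonic in contextSize.
function largestFittingContextSize(
    estimateVram: (contextSize: number) => number,
    vramBudget: number,
    maxContextSize: number
): number {
    let low = 0;
    let high = maxContextSize;

    while (low < high) {
        const mid = Math.floor((low + high + 1) / 2);

        if (estimateVram(mid) <= vramBudget)
            low = mid; // mid fits, try larger
        else
            high = mid - 1; // mid doesn't fit, try smaller
    }

    return low;
}

// Mock estimator (1 KiB of VRAM per context token) for illustration;
// a real callback would call estimateContextResourceRequirements with
// {contextSize, modelGpuLayers} and return the VRAM part of the result
const mockEstimate = (contextSize: number) => contextSize * 1024;

console.log(largestFittingContextSize(mockEstimate, 4 * 1024 * 1024, 8192)); // → 4096
```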


### from()

```ts
static from(ggufFileInfo: GgufFileInfo, llama?: Llama): Promise<GgufInsights>
```

Defined in: gguf/insights/GgufInsights.ts:497

#### Parameters

| Parameter | Type | Description |
| ------ | ------ | ------ |
| `ggufFileInfo` | `GgufFileInfo` | |
| `llama?` | `Llama` | If you already have a `Llama` instance, pass it to reuse it for the `GgufInsights` instance. If you don't pass a `Llama` instance, a basic `Llama` instance is created as a fallback. It's a slim instance that doesn't instantiate a llama.cpp backend, so it won't utilize the GPU at all, and it will be shared with other `GgufInsights` instances that need a fallback `Llama` instance. |

#### Returns

`Promise<GgufInsights>`