
Class: GgufInsights

Defined in: gguf/insights/GgufInsights.ts:15

Accessors

ggufFileInfo

Get Signature

```ts
get ggufFileInfo(): GgufFileInfo;
```

Defined in: gguf/insights/GgufInsights.ts:55

Returns

GgufFileInfo


configurationResolver

Get Signature

```ts
get configurationResolver(): GgufInsightsConfigurationResolver;
```

Defined in: gguf/insights/GgufInsights.ts:59

Returns

GgufInsightsConfigurationResolver


trainContextSize

Get Signature

```ts
get trainContextSize(): undefined | number;
```

Defined in: gguf/insights/GgufInsights.ts:64

The context size the model was trained on

Returns

undefined | number


embeddingVectorSize

Get Signature

```ts
get embeddingVectorSize(): undefined | number;
```

Defined in: gguf/insights/GgufInsights.ts:69

The size of an embedding vector the model can produce

Returns

undefined | number


totalLayers

Get Signature

```ts
get totalLayers(): number;
```

Defined in: gguf/insights/GgufInsights.ts:73

Returns

number


modelSize

Get Signature

```ts
get modelSize(): number;
```

Defined in: gguf/insights/GgufInsights.ts:83

Returns

number


flashAttentionSupported

Get Signature

```ts
get flashAttentionSupported(): boolean;
```

Defined in: gguf/insights/GgufInsights.ts:87

Returns

boolean


hasEncoder

Get Signature

```ts
get hasEncoder(): boolean;
```

Defined in: gguf/insights/GgufInsights.ts:107

Returns

boolean


hasDecoder

Get Signature

```ts
get hasDecoder(): boolean;
```

Defined in: gguf/insights/GgufInsights.ts:117

Returns

boolean


isRecurrent

Get Signature

```ts
get isRecurrent(): boolean;
```

Defined in: gguf/insights/GgufInsights.ts:126

Returns

boolean

Methods

getWarnings()

```ts
getWarnings(modelFilePath?: string): string[];
```

Defined in: gguf/insights/GgufInsights.ts:35

Get warnings about the model file that would affect its usage.

Most of these warnings are also generated by llama.cpp.

Parameters

| Parameter | Type |
| ------ | ------ |
| `modelFilePath?` | `string` |

Returns

string[]


estimateModelResourceRequirements()

```ts
estimateModelResourceRequirements(__namedParameters: {
  gpuLayers: number;
  useMmap?: boolean;
  gpuSupportsMmap?: boolean;
}): GgufInsightsResourceRequirements;
```

Defined in: gguf/insights/GgufInsights.ts:136

Parameters

| Parameter | Type |
| ------ | ------ |
| `__namedParameters` | `{ gpuLayers: number; useMmap?: boolean; gpuSupportsMmap?: boolean; }` |
| `__namedParameters.gpuLayers` | `number` |
| `__namedParameters.useMmap?` | `boolean` |
| `__namedParameters.gpuSupportsMmap?` | `boolean` |

Returns

GgufInsightsResourceRequirements


estimateContextResourceRequirements()

```ts
estimateContextResourceRequirements(__namedParameters: {
  contextSize: number;
  modelGpuLayers: number;
  batchSize?: number;
  sequences?: number;
  isEmbeddingContext?: boolean;
  flashAttention?: boolean;
  includeGraphOverhead?: boolean;
}): GgufInsightsResourceRequirements;
```

Defined in: gguf/insights/GgufInsights.ts:154

Estimates the memory required to create a context with the given parameters, based on the implementation details of llama.cpp. The calculation doesn't include a precise estimation of the graph overhead memory; a rough estimate is used for it instead, which will be made more precise in the future.

Parameters

| Parameter | Type |
| ------ | ------ |
| `__namedParameters` | `{ contextSize: number; modelGpuLayers: number; batchSize?: number; sequences?: number; isEmbeddingContext?: boolean; flashAttention?: boolean; includeGraphOverhead?: boolean; }` |
| `__namedParameters.contextSize` | `number` |
| `__namedParameters.modelGpuLayers` | `number` |
| `__namedParameters.batchSize?` | `number` |
| `__namedParameters.sequences?` | `number` |
| `__namedParameters.isEmbeddingContext?` | `boolean` |
| `__namedParameters.flashAttention?` | `boolean` |
| `__namedParameters.includeGraphOverhead?` | `boolean` |

Returns

GgufInsightsResourceRequirements


from()

```ts
static from(ggufFileInfo: GgufFileInfo, llama?: Llama): Promise<GgufInsights>;
```

Defined in: gguf/insights/GgufInsights.ts:514

Parameters

| Parameter | Type | Description |
| ------ | ------ | ------ |
| `ggufFileInfo` | `GgufFileInfo` | |
| `llama?` | `Llama` | If you already have a `Llama` instance, pass it to reuse it for the `GgufInsights` instance. If you don't pass a `Llama` instance, a basic `Llama` instance is created as a fallback: a slim instance that doesn't instantiate a llama.cpp backend, so it won't utilize the GPU at all, and it's shared with other `GgufInsights` instances that need a fallback `Llama` instance. |

Returns

Promise<GgufInsights>