# Class: GgufInsights

## Accessors

### ggufFileInfo

```ts
get ggufFileInfo(): GgufFileInfo
```

Returns: `GgufFileInfo`

Defined in: `gguf/insights/GgufInsights.ts:55`


### configurationResolver

```ts
get configurationResolver(): GgufInsightsConfigurationResolver
```

Returns: `GgufInsightsConfigurationResolver`

Defined in: `gguf/insights/GgufInsights.ts:59`


### trainContextSize

```ts
get trainContextSize(): undefined | number
```

The context size the model was trained on.

Returns: `undefined | number`

Defined in: `gguf/insights/GgufInsights.ts:64`


### embeddingVectorSize

```ts
get embeddingVectorSize(): undefined | number
```

The size of an embedding vector the model can produce.

Returns: `undefined | number`

Defined in: `gguf/insights/GgufInsights.ts:69`


### totalLayers

```ts
get totalLayers(): number
```

Returns: `number`

Defined in: `gguf/insights/GgufInsights.ts:73`


### modelSize

```ts
get modelSize(): number
```

Returns: `number`

Defined in: `gguf/insights/GgufInsights.ts:83`


### flashAttentionSupported

```ts
get flashAttentionSupported(): boolean
```

Returns: `boolean`

Defined in: `gguf/insights/GgufInsights.ts:87`
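As a usage sketch, the accessors above can be read off a `GgufInsights` instance. The `readGgufFileInfo` import and the model path below are assumptions for illustration; check your version's API. The `formatModelSize` helper for presenting the `modelSize` byte count is hypothetical, not part of this class:

```typescript
// Hypothetical setup (requires a real .gguf file on disk):
//
// import {readGgufFileInfo, GgufInsights} from "node-llama-cpp";
//
// const ggufFileInfo = await readGgufFileInfo("./models/model.gguf");
// const insights = await GgufInsights.from(ggufFileInfo);
// console.log(insights.trainContextSize);   // e.g. 4096, or undefined
// console.log(insights.totalLayers);
// console.log(insights.flashAttentionSupported);

// Hypothetical helper: present `modelSize` (a byte count) in a readable unit
function formatModelSize(bytes: number): string {
    const gib = bytes / (1024 ** 3);
    return `${gib.toFixed(2)} GiB`;
}

console.log(formatModelSize(4 * 1024 ** 3)); // "4.00 GiB"
```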

## Methods

### getWarnings()

```ts
getWarnings(modelFilePath?: string): string[]
```

Get warnings about the model file that would affect its usage.

Most of these warnings are also generated by llama.cpp.

Parameters:

| Parameter | Type |
| --- | --- |
| `modelFilePath?` | `string` |

Returns: `string[]`

Defined in: `gguf/insights/GgufInsights.ts:35`
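A sketch of surfacing these warnings before loading a model. The `insights` variable and model path are assumptions; the `summarizeWarnings` helper is hypothetical, shown only to illustrate consuming the returned `string[]`:

```typescript
// Hypothetical usage, given an existing GgufInsights instance:
//
// const warnings = insights.getWarnings("./models/model.gguf");
// for (const warning of warnings)
//     console.warn(warning);

// Hypothetical helper: collapse the warnings into a single log line
function summarizeWarnings(warnings: string[]): string {
    if (warnings.length === 0)
        return "No warnings";

    return `${warnings.length} warning(s): ${warnings.join("; ")}`;
}

console.log(summarizeWarnings([])); // "No warnings"
console.log(summarizeWarnings(["low context", "unusual quantization"]));
```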


### estimateModelResourceRequirements()

```ts
estimateModelResourceRequirements(__namedParameters: {
  gpuLayers: number
}): GgufInsightsResourceRequirements
```

Parameters:

| Parameter | Type |
| --- | --- |
| `__namedParameters` | `object` |
| `__namedParameters.gpuLayers` | `number` |

Returns: `GgufInsightsResourceRequirements`

Defined in: `gguf/insights/GgufInsights.ts:107`
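A sketch of checking whether a given `gpuLayers` split fits the available memory. It assumes the returned `GgufInsightsResourceRequirements` exposes `cpuRam`/`gpuVram` byte counts — verify the shape in your version; the `fitsInMemory` helper and the free-memory figures are hypothetical:

```typescript
// Hypothetical usage, given an existing GgufInsights instance:
//
// const requirements = insights.estimateModelResourceRequirements({gpuLayers: 33});

// Assumed shape of GgufInsightsResourceRequirements (byte counts)
type ResourceRequirements = {cpuRam: number, gpuVram: number};

// Hypothetical helper: do the estimated requirements fit in the free memory?
function fitsInMemory(
    requirements: ResourceRequirements,
    freeRam: number,
    freeVram: number
): boolean {
    return requirements.cpuRam <= freeRam && requirements.gpuVram <= freeVram;
}

console.log(fitsInMemory({cpuRam: 2e9, gpuVram: 6e9}, 8e9, 8e9)); // true
console.log(fitsInMemory({cpuRam: 2e9, gpuVram: 6e9}, 8e9, 4e9)); // false
```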


### estimateContextResourceRequirements()

```ts
estimateContextResourceRequirements(__namedParameters: {
  contextSize: number,
  modelGpuLayers: number,
  batchSize?: number,
  sequences?: number,
  isEmbeddingContext?: boolean,
  flashAttention?: boolean,
  includeGraphOverhead?: boolean
}): GgufInsightsResourceRequirements
```

Estimates the memory required to create a context with the given parameters, based on the implementation details of llama.cpp. The calculation doesn't include a precise estimation of the graph overhead memory, so it uses a rough estimate for that instead. The graph overhead estimation will be made more precise in the future, but it's good enough for now.

Parameters:

| Parameter | Type |
| --- | --- |
| `__namedParameters` | `object` |
| `__namedParameters.contextSize` | `number` |
| `__namedParameters.modelGpuLayers` | `number` |
| `__namedParameters.batchSize?` | `number` |
| `__namedParameters.sequences?` | `number` |
| `__namedParameters.isEmbeddingContext?` | `boolean` |
| `__namedParameters.flashAttention?` | `boolean` |
| `__namedParameters.includeGraphOverhead?` | `boolean` |

Returns: `GgufInsightsResourceRequirements`

Defined in: `gguf/insights/GgufInsights.ts:121`
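A sketch of combining the model estimate with a context estimate to budget total memory. As before, it assumes `GgufInsightsResourceRequirements` carries `cpuRam`/`gpuVram` byte counts; the `addRequirements` helper and the sample figures are hypothetical:

```typescript
// Hypothetical usage, given an existing GgufInsights instance:
//
// const gpuLayers = 33;
// const modelReq = insights.estimateModelResourceRequirements({gpuLayers});
// const contextReq = insights.estimateContextResourceRequirements({
//     contextSize: 4096,
//     modelGpuLayers: gpuLayers
// });
// const total = addRequirements(modelReq, contextReq);

// Assumed shape of GgufInsightsResourceRequirements (byte counts)
type ResourceRequirements = {cpuRam: number, gpuVram: number};

// Hypothetical helper: total budget = model weights + context memory
function addRequirements(
    a: ResourceRequirements,
    b: ResourceRequirements
): ResourceRequirements {
    return {
        cpuRam: a.cpuRam + b.cpuRam,
        gpuVram: a.gpuVram + b.gpuVram
    };
}

const total = addRequirements(
    {cpuRam: 1e9, gpuVram: 4e9},  // stand-in for a model estimate
    {cpuRam: 5e8, gpuVram: 1e9}   // stand-in for a context estimate
);
console.log(total.gpuVram); // 5000000000
```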


### from()

```ts
static from(ggufFileInfo: GgufFileInfo, llama?: Llama): Promise<GgufInsights>
```

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `ggufFileInfo` | `GgufFileInfo` | |
| `llama?` | `Llama` | If you already have a `Llama` instance, pass it to reuse it for the `GgufInsights` instance. If you don't pass a `Llama` instance, a basic `Llama` instance is created as a fallback — it's a slim instance that doesn't instantiate a llama.cpp backend, so it won't utilize the GPU at all, and it will be shared with other `GgufInsights` instances that need a fallback `Llama` instance. |

Returns: `Promise<GgufInsights>`

Defined in: `gguf/insights/GgufInsights.ts:436`