# `inspect measure` command
Measure the VRAM consumption of a GGUF model file across all possible combinations of GPU layers and context sizes.
## Usage
```shell
npx --no node-llama-cpp inspect measure [modelPath]
```
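For example, to measure a model file that is already downloaded locally (the file path below is only a placeholder):

```shell
npx --no node-llama-cpp inspect measure ./models/my-model.Q4_K_M.gguf
```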
## Options
| Option | Description |
|---|---|
| `-m [string]`, `--modelPath [string]`, `--model [string]`, `--path [string]`, `--url [string]`, `--uri [string]` | Model file to use for the measurements. Can be a path to a local file or a URI of a model file to download. Leave empty to choose from a list of recommended models (string) |
| `-H [string]`, `--header [string]` | Headers to use when downloading a model from a URL, in the format `key: value`. You can pass this option multiple times to add multiple headers (string[]) |
| `--gpu [string]` | Compute layer implementation type to use for llama.cpp (default: uses the latest local build, and falls back to `"auto"`) (string) |
| `--minLayers <number>`, `--mnl <number>` | Minimum number of layers to offload to the GPU (default: 1) (number) |
| `--maxLayers <number>`, `--mxl <number>` | Maximum number of layers to offload to the GPU (default: all layers) (number) |
| `--minContextSize <number>`, `--mncs <number>` | Minimum context size (default: 512) (number) |
| `--maxContextSize <number>`, `--mxcs <number>` | Maximum context size (default: the model's train context size) (number) |
| `--flashAttention`, `--fa` | Enable flash attention for the context (default: false) (boolean) |
| `--swaFullCache`, `--noSwa` | Disable SWA (Sliding Window Attention) on supported models (default: false) (boolean) |
| `-b <number>`, `--batchSize <number>` | Batch size to use for the model context (number) |
| `-n <number>`, `--measures <number>` | Number of context size measures to take for each GPU layer count (default: 10) (number) |
| `--memory [string]` | Type of memory to measure (default: vram) (string) |
| `--noMmap` | Disable mmap (memory-mapped file) usage (default: false) (boolean) |
| `--printHeaderBeforeEachLayer`, `--ph` | Print a header before each layer's measures (default: true) (boolean) |
| `--evaluateText [string]`, `--evaluate [string]`, `--et [string]` | Text to evaluate with the model (string) |
| `--repeatEvaluateText <number>`, `--repeatEvaluate <number>`, `--ret <number>` | Number of times to repeat the evaluation text before sending it for evaluation, in order to make it longer (default: 1) (number) |
| `-h`, `--help` | Show help |
| `-v`, `--version` | Show version number |
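As a sketch of how these options compose, the invocation below limits the measured context-size range, reduces the number of measures taken per GPU layer count, and enables flash attention. The model path and the specific values are only illustrative; all flags used are listed in the table above:

```shell
npx --no node-llama-cpp inspect measure ./models/my-model.Q4_K_M.gguf \
  --minContextSize 2048 \
  --maxContextSize 8192 \
  --measures 5 \
  --flashAttention
```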