# `inspect measure` command
Measure the VRAM consumption of a GGUF model file across all possible combinations of GPU layers and context sizes.
## Usage
```shell
npx --no node-llama-cpp inspect measure [modelPath]
```
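For example, to measure a model file that is already downloaded locally (the file path below is only a placeholder):

```shell
npx --no node-llama-cpp inspect measure ./models/my-model.Q4_K_M.gguf
```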
## Options
| Option | Description |
|---|---|
| `-m [string]`, `--modelPath [string]`, `--model [string]`, `--path [string]`, `--url [string]`, `--uri [string]` | Model file to use for the measurements. Can be a path to a local file or a URI of a model file to download. Leave empty to choose from a list of recommended models (string) |
| `-H [string]`, `--header [string]` | Headers to use when downloading a model from a URL, in the format `key: value`. You can pass this option multiple times to add multiple headers (string[]) |
| `--gpu [string]` | Compute layer implementation type to use for llama.cpp (default: uses the latest local build, and falls back to `"auto"`) (string) |
| `--minLayers <number>`, `--mnl <number>` | Minimum number of layers to offload to the GPU (default: 1) (number) |
| `--maxLayers <number>`, `--mxl <number>` | Maximum number of layers to offload to the GPU (default: all layers) (number) |
| `--minContextSize <number>`, `--mncs <number>` | Minimum context size (default: 512) (number) |
| `--maxContextSize <number>`, `--mxcs <number>` | Maximum context size (default: the model's train context size) (number) |
| `--flashAttention`, `--fa` | Enable flash attention for the context (default: false) (boolean) |
| `--swaFullCache`, `--noSwa` | Disable SWA (Sliding Window Attention) on supported models (default: false) (boolean) |
| `-b <number>`, `--batchSize <number>` | Batch size to use for the model context (number) |
| `-n <number>`, `--measures <number>` | Number of context size measures to take for each GPU layer count (default: 10) (number) |
| `--memory [string]` | Type of memory to measure (default: vram) (string) |
| `--noMmap` | Disable mmap (memory-mapped file) usage (default: false) (boolean) |
| `--printHeaderBeforeEachLayer`, `--ph` | Print a header before each layer's measures (default: true) (boolean) |
| `--evaluateText [string]`, `--evaluate [string]`, `--et [string]` | Text to evaluate with the model (string) |
| `--repeatEvaluateText <number>`, `--repeatEvaluate <number>`, `--ret <number>` | Number of times to repeat the evaluation text before sending it for evaluation, in order to make it longer (default: 1) (number) |
| `-h`, `--help` | Show help |
| `-v`, `--version` | Show version number |
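As a sketch of how these options compose, the invocation below limits the measured context-size range, reduces the number of measures taken per GPU layer count, and enables flash attention. The model path and the specific values are only illustrative; all flags used are listed in the table above:

```shell
npx --no node-llama-cpp inspect measure ./models/my-model.Q4_K_M.gguf \
  --minContextSize 2048 \
  --maxContextSize 8192 \
  --measures 5 \
  --flashAttention
```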