
# `inspect estimate` command

Estimate the compatibility of a model with the current hardware

## Usage

```shell
npx --no node-llama-cpp inspect estimate [modelPath]
```
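For example, a minimal invocation against a local GGUF file (the model path below is a placeholder):

```shell
# Estimate how well a local GGUF model fits the current hardware
# (./models/model.gguf is a hypothetical path)
npx --no node-llama-cpp inspect estimate ./models/model.gguf
```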

## Options

### Required

| Option | Description |
| ------ | ----------- |
| `-m [string]`, `--modelPath [string]`, `--model [string]`, `--path [string]`, `--url [string]`, `--uri [string]` | The path or URI of the GGUF file to use. If a URI is provided, the metadata will be read from the remote file without downloading the entire file. (string) (required) |
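Since the model can be given as a URI, here is a sketch of estimating a remote model; only the GGUF metadata is read from the remote file, so nothing is downloaded in full (the URL below is hypothetical):

```shell
# Read metadata from a remote GGUF file without downloading the entire file
# (the URL is a placeholder)
npx --no node-llama-cpp inspect estimate --url https://example.com/models/model.gguf
```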

### Optional

| Option | Description |
| ------ | ----------- |
| `-H [string]`, `--header [string]` | Headers to use when reading a model file from a URL, in the format `key: value`. You can pass this option multiple times to add multiple headers. (string[]) |
| `--gpu [string]` | Compute layer implementation type to use for llama.cpp (default: Uses the latest local build, and falls back to `"auto"`) (string) (choices: `"auto"`, `"metal"`, `"cuda"`, `"vulkan"`, `false`) |
| `--gpuLayers <number>`, `--gl <number>` | Number of layers to store in VRAM. Set to `max` to use all the layers the model has (default: Automatically determined based on the available VRAM) (number) |
| `-c <number>`, `--contextSize <number>` | Context size to use for the model context. Set to `max` or `train` to use the training context size. Note that the training context size is not necessarily what you should use for inference, and a large context size uses a lot of memory (default: Automatically determined based on the available VRAM) (number) |
| `-e`, `--embedding` | Whether to estimate for creating an embedding context (default: `false`) (boolean) |
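Putting several of these options together, a hedged example that pins the GPU backend, offloads all layers, uses the training context size, and estimates for an embedding context (the values are illustrative, not recommendations):

```shell
# Estimate with the Metal backend, all model layers in VRAM,
# the training context size, and an embedding context
npx --no node-llama-cpp inspect estimate ./models/model.gguf \
  --gpu metal --gpuLayers max --contextSize train --embedding
```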

## Other

| Option | Description |
| ------ | ----------- |
| `-h`, `--help` | Show help |
| `-v`, `--version` | Show version number |