chat command

Chat with a Llama model

Usage

```shell
npx --no node-llama-cpp chat
```

Options

Required

| Option | Description |
| --- | --- |
| `-m [string]`, `--model [string]` | Llama model file to use for the chat (string) (required) |
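For example, to start a chat session with a local model file (the path below is a placeholder; point it at whatever GGUF model file you have downloaded):

```shell
# Hypothetical model path; substitute your own GGUF file
npx --no node-llama-cpp chat --model ./models/llama-2-7b-chat.Q4_K_M.gguf
```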

Optional

| Option | Description |
| --- | --- |
| `-i`, `--systemInfo` | Print llama.cpp system info (default: false) (boolean) |
| `--printTimings` | Print llama.cpp timings (default: false) (boolean) |
| `-s [string]`, `--systemPrompt [string]` | System prompt to use against the model (default: You are a helpful, respectful and honest assistant. Always answer as helpfully as possible. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.) (string) |
| `--prompt [string]` | First prompt to automatically send to the model when starting the chat (string) |
| `-w [string]`, `--wrapper [string]` | Chat wrapper to use. Use `auto` to automatically select a wrapper based on the model's BOS token (choices: `auto`, `general`, `llamaChat`, `chatML`, `falconChat`) (default: general) (string) |
| `-c <number>`, `--contextSize <number>` | Context size to use for the model (default: 4096) (number) |
| `-g [string]`, `--grammar [string]` | Restrict the model response to a specific grammar, such as JSON (choices: `text`, `json`, `list`, `arithmetic`, `japanese`, `chess`) (default: text) (string) |
| `--jsonSchemaGrammarFile [string]`, `--jsgf [string]` | File path of a JSON schema file, to restrict the model response to only generate output that conforms to the JSON schema (string) |
| `--threads <number>` | Number of threads to use for the evaluation of tokens (default: 6) (number) |
| `-t <number>`, `--temperature <number>` | Temperature is a hyperparameter that controls the randomness of the generated text. It affects the probability distribution of the model's output tokens. A higher temperature (e.g., 1.5) makes the output more random and creative, while a lower temperature (e.g., 0.5) makes the output more focused, deterministic, and conservative. The suggested temperature is 0.8, which provides a balance between randomness and determinism. At the extreme, a temperature of 0 will always pick the most likely next token, leading to identical outputs in each run. Set to `0` to disable. (default: 0) (number) |
| `-k <number>`, `--topK <number>` | Limits the model to consider only the K most likely next tokens for sampling at each step of sequence generation. An integer between `1` and the size of the vocabulary. Set to `0` to disable (which uses the full vocabulary). Only relevant when `temperature` is set to a value greater than `0`. (default: 40) (number) |
| `-p <number>`, `--topP <number>` | Dynamically selects the smallest set of tokens whose cumulative probability exceeds the threshold P, and samples the next token only from this set. A float between `0` and `1`. Set to `1` to disable. Only relevant when `temperature` is set to a value greater than `0`. (default: 0.95) (number) |
| `--gpuLayers <number>`, `--gl <number>` | Number of layers to store in VRAM (number) |
| `--repeatPenalty <number>`, `--rp <number>` | Prevent the model from repeating the same token too much. Set to `1` to disable. (default: 1.1) (number) |
| `--lastTokensRepeatPenalty <number>`, `--rpn <number>` | Number of recent tokens generated by the model to which the repeat penalty is applied (default: 64) (number) |
| `--penalizeRepeatingNewLine`, `--rpnl` | Penalize new line tokens. Set `--no-penalizeRepeatingNewLine` or `--no-rpnl` to disable (default: true) (boolean) |
| `--repeatFrequencyPenalty <number>`, `--rfp <number>` | For each of the `n` times a token appears in the `punishTokens` array, lower its probability by `n * repeatFrequencyPenalty`. Set to a value between `0` and `1` to enable. (number) |
| `--repeatPresencePenalty <number>`, `--rpp <number>` | Lower the probability of all the tokens in the `punishTokens` array by `repeatPresencePenalty`. Set to a value between `0` and `1` to enable. (number) |
| `--maxTokens <number>`, `--mt <number>` | Maximum number of tokens to generate in responses. Set to `0` to disable. Set to `-1` to match the context size. (default: 0) (number) |
| `--noHistory`, `--nh` | Don't load or save chat history (default: false) (boolean) |
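A fuller invocation might combine several of these flags. The following sketch (again with a placeholder model path) uses a 4096-token context, the suggested temperature of 0.8, and restricts the response to valid JSON:

```shell
# Hypothetical model path; adjust to your own GGUF file.
# Temperature must be > 0 for topK/topP to take effect.
npx --no node-llama-cpp chat \
    --model ./models/llama-2-7b-chat.Q4_K_M.gguf \
    --contextSize 4096 \
    --temperature 0.8 \
    --topK 40 \
    --topP 0.95 \
    --grammar json
```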

Other

| Option | Description |
| --- | --- |
| `-h`, `--help` | Show help |
| `-v`, `--version` | Show version number |