chat command

Chat with a Llama model

Usage

```shell
npx --no node-llama-cpp chat
```

Options

Required

| Option | Description |
| --- | --- |
| `-m [string]`, `--model [string]` | Llama model file to use for the chat (string) (required) |
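For example, to start a chat session with a local model file (the path below is a placeholder; point it at whatever GGUF model file you have downloaded):

```shell
# Hypothetical model path; substitute your own GGUF file
npx --no node-llama-cpp chat --model ./models/llama-2-7b-chat.Q4_K_M.gguf
```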

Optional

| Option | Description |
| --- | --- |
| `-i`, `--systemInfo` | Print llama.cpp system info (default: false) (boolean) |
| `--printTimings` | Print llama.cpp timings (default: false) (boolean) |
| `-s [string]`, `--systemPrompt [string]` | System prompt to use against the model (default: You are a helpful, respectful and honest assistant. Always answer as helpfully as possible. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.) (string) |
| `--prompt [string]` | First prompt to automatically send to the model when starting the chat (string) |
| `-w [string]`, `--wrapper [string]` | Chat wrapper to use. Use `auto` to automatically select a wrapper based on the model's BOS token (choices: `auto`, `general`, `llamaChat`, `chatML`, `falconChat`) (default: general) (string) |
| `-c <number>`, `--contextSize <number>` | Context size to use for the model (default: 4096) (number) |
| `-g [string]`, `--grammar [string]` | Restrict the model response to a specific grammar, such as JSON (choices: `text`, `json`, `list`, `arithmetic`, `japanese`, `chess`) (default: text) (string) |
| `--jsonSchemaGrammarFile [string]`, `--jsgf [string]` | File path of a JSON schema file, to restrict the model response to only generate output that conforms to the JSON schema (string) |
| `--threads <number>` | Number of threads to use for the evaluation of tokens (default: 6) (number) |
| `-t <number>`, `--temperature <number>` | Temperature is a hyperparameter that controls the randomness of the generated text. It affects the probability distribution of the model's output tokens. A higher temperature (e.g., 1.5) makes the output more random and creative, while a lower temperature (e.g., 0.5) makes the output more focused, deterministic, and conservative. The suggested temperature is 0.8, which provides a balance between randomness and determinism. At the extreme, a temperature of 0 will always pick the most likely next token, leading to identical outputs in each run. Set to `0` to disable. (default: 0) (number) |
| `-k <number>`, `--topK <number>` | Limits the model to consider only the K most likely next tokens for sampling at each step of sequence generation. An integer between `1` and the size of the vocabulary. Set to `0` to disable (which uses the full vocabulary). Only relevant when `temperature` is set to a value greater than `0`. (default: 40) (number) |
| `-p <number>`, `--topP <number>` | Dynamically selects the smallest set of tokens whose cumulative probability exceeds the threshold P, and samples the next token only from this set. A float between `0` and `1`. Set to `1` to disable. Only relevant when `temperature` is set to a value greater than `0`. (default: 0.95) (number) |
| `--gpuLayers <number>`, `--gl <number>` | Number of layers to store in VRAM (number) |
| `--repeatPenalty <number>`, `--rp <number>` | Prevent the model from repeating the same token too much. Set to `1` to disable. (default: 1.1) (number) |
| `--lastTokensRepeatPenalty <number>`, `--rpn <number>` | Number of recent tokens generated by the model to which the repeat penalty is applied (default: 64) (number) |
| `--penalizeRepeatingNewLine`, `--rpnl` | Penalize new line tokens. Set `--no-penalizeRepeatingNewLine` or `--no-rpnl` to disable (default: true) (boolean) |
| `--repeatFrequencyPenalty <number>`, `--rfp <number>` | For each of the `n` times a token appears in the `punishTokens` array, lower its probability by `n * repeatFrequencyPenalty`. Set to a value between `0` and `1` to enable. (number) |
| `--repeatPresencePenalty <number>`, `--rpp <number>` | Lower the probability of all the tokens in the `punishTokens` array by `repeatPresencePenalty`. Set to a value between `0` and `1` to enable. (number) |
| `--maxTokens <number>`, `--mt <number>` | Maximum number of tokens to generate in responses. Set to `0` to disable. Set to `-1` to match the context size. (default: 0) (number) |
| `--noHistory`, `--nh` | Don't load or save chat history (default: false) (boolean) |
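A fuller invocation might combine several of these flags. The following sketch (again with a placeholder model path) uses a 4096-token context, the suggested temperature of 0.8, and restricts the response to valid JSON:

```shell
# Hypothetical model path; adjust to your own GGUF file.
# Temperature must be > 0 for topK/topP to take effect.
npx --no node-llama-cpp chat \
    --model ./models/llama-2-7b-chat.Q4_K_M.gguf \
    --contextSize 4096 \
    --temperature 0.8 \
    --topK 40 \
    --topP 0.95 \
    --grammar json
```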

Other

| Option | Description |
| --- | --- |
| `-h`, `--help` | Show help |
| `-v`, `--version` | Show version number |