# `chat` command

Chat with a Llama model.
## Usage

```shell
npx --no node-llama-cpp chat
```
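For example, to start a chat session with a local model file (the path below is a placeholder for your own model file):

```shell
npx --no node-llama-cpp chat --model ./models/llama-model.gguf
```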
## Options

### Required

| Option | Description |
| --- | --- |
| `-m [string]`, `--model [string]` | Llama model file to use for the chat (`string`) (required) |
### Optional

| Option | Description |
| --- | --- |
| `-i`, `--systemInfo` | Print llama.cpp system info (default: `false`) (`boolean`) |
| `--printTimings` | Print llama.cpp timings (default: `false`) (`boolean`) |
| `-s [string]`, `--systemPrompt [string]` | System prompt to use against the model (default: `You are a helpful, respectful and honest assistant. Always answer as helpfully as possible. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.`) (`string`) |
| `--prompt [string]` | First prompt to automatically send to the model when starting the chat (`string`) |
| `-w [string]`, `--wrapper [string]` | Chat wrapper to use. Use `auto` to automatically select a wrapper based on the model's BOS token (default: `general`) (`string`) |
| `-c <number>`, `--contextSize <number>` | Context size to use for the model (default: `4096`) (`number`) |
| `-g [string]`, `--grammar [string]` | Restrict the model response to a specific grammar, like JSON for example (default: `text`) (`string`) |
| `--jsonSchemaGrammarFile [string]`, `--jsgf [string]` | File path to a JSON schema file, to restrict the model response to only generate output that conforms to the JSON schema (`string`) |
| `--threads <number>` | Number of threads to use for the evaluation of tokens (default: `6`) (`number`) |
| `-t <number>`, `--temperature <number>` | Temperature is a hyperparameter that controls the randomness of the generated text. It affects the probability distribution of the model's output tokens. A higher temperature (e.g., `1.5`) makes the output more random and creative, while a lower temperature (e.g., `0.5`) makes the output more focused, deterministic, and conservative. The suggested temperature is `0.8`, which provides a balance between randomness and determinism. At the extreme, a temperature of `0` will always pick the most likely next token, leading to identical outputs in each run. Set to `0` to disable. (default: `0`) (`number`) |
| `-k <number>`, `--topK <number>` | Limits the model to consider only the K most likely next tokens for sampling at each step of sequence generation. An integer number between `1` and the size of the vocabulary. Set to `0` to disable (which uses the full vocabulary). Only relevant when `temperature` is set to a value greater than `0`. (default: `40`) (`number`) |
| `-p <number>`, `--topP <number>` | Dynamically selects the smallest set of tokens whose cumulative probability exceeds the threshold P, and samples the next token only from this set. A float number between `0` and `1`. Set to `1` to disable. Only relevant when `temperature` is set to a value greater than `0`. (default: `0.95`) (`number`) |
| `--gpuLayers <number>`, `--gl <number>` | Number of layers to store in VRAM (`number`) |
| `--repeatPenalty <number>`, `--rp <number>` | Prevent the model from repeating the same token too much. Set to `1` to disable. (default: `1.1`) (`number`) |
| `--lastTokensRepeatPenalty <number>`, `--rpn <number>` | Number of recent tokens generated by the model to apply repetition penalties to (default: `64`) (`number`) |
| `--penalizeRepeatingNewLine`, `--rpnl` | Penalize new line tokens. Set `--no-penalizeRepeatingNewLine` or `--no-rpnl` to disable (default: `true`) (`boolean`) |
| `--repeatFrequencyPenalty <number>`, `--rfp <number>` | For each token in the `punishTokens` array, lower its probability by `n * repeatFrequencyPenalty`, where `n` is the number of times the token appears in the array. Set to a value between `0` and `1` to enable. (`number`) |
| `--repeatPresencePenalty <number>`, `--rpp <number>` | Lower the probability of all the tokens in the `punishTokens` array by `repeatPresencePenalty`. Set to a value between `0` and `1` to enable. (`number`) |
| `--maxTokens <number>`, `--mt <number>` | Maximum number of tokens to generate in responses. Set to `0` to disable. Set to `-1` to set it to the context size. (default: `0`) (`number`) |
| `--noHistory`, `--nh` | Don't load or save chat history (default: `false`) (`boolean`) |
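As a sketch of how the sampling options compose, the following invocation enables temperature-based sampling together with the documented top-K and top-P defaults, and caps each response length (the model path is a placeholder):

```shell
npx --no node-llama-cpp chat \
    --model ./models/llama-model.gguf \
    --temperature 0.8 \
    --topK 40 \
    --topP 0.95 \
    --maxTokens 500
```

Note that `--topK` and `--topP` only take effect here because `--temperature` is set above `0`; with the default temperature of `0`, the most likely token is always picked and these options are ignored.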
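Similarly, to constrain the model's responses to a grammar instead of free text, pass a grammar name to `--grammar` (this sketch assumes `json` is among the bundled grammar names; the default is `text`):

```shell
npx --no node-llama-cpp chat --model ./models/llama-model.gguf --grammar json
```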
### Other

| Option | Description |
| --- | --- |
| `-h`, `--help` | Show help |
| `-v`, `--version` | Show version number |