node-llama-cpp v3.0

September 23, 2024

node-llama-cpp 3.0 is finally here.

With node-llama-cpp, you can run large language models locally on your machine using the power of llama.cpp with a simple and easy-to-use API.

It includes everything you need, from downloading models, to running them in the most optimized way for your hardware, and integrating them in your projects.
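For example, here's a minimal sketch of what chatting with a local model looks like (the model path is a placeholder; any GGUF model file on disk works):

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

// picks the best available compute backend for this machine automatically
const llama = await getLlama();

// placeholder path - point it at any GGUF model file you have locally
const model = await llama.loadModel({
    modelPath: "models/my-model.gguf"
});

const context = await model.createContext();
const session = new LlamaChatSession({
    contextSequence: context.getSequence()
});

const response = await session.prompt("Hi there, how are you?");
console.log(response);
```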


Why node-llama-cpp?

You might be wondering: why choose node-llama-cpp over using the OpenAI-compatible API of a service running on your machine?

The answer is simple: simplicity, performance, and flexibility.

Let's break it down:

Simplicity

To use node-llama-cpp, you install it like any other npm package, and you're good to go.

To run your project, all you have to do is npm install and npm start. That's it.
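Adding it to an existing project is just as simple:

```shell
npm install node-llama-cpp
```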

No installing additional software on your machine, no setting up API keys or environment variables, no setup process at all. Everything is self-contained in your project, giving you complete control over it.

With node-llama-cpp, you can run large language models on your machine using Node.js and TypeScript, without any Python at all. Say goodbye to setup headaches, "it works on my machine" issues, and all other Python-related problems.

While llama.cpp is an amazing project, it's also highly technical and can be challenging for beginners. node-llama-cpp bridges that gap, making llama.cpp accessible to everyone, regardless of their experience level.

Performance

node-llama-cpp is built on top of llama.cpp, a highly optimized C++ library for running large language models.

llama.cpp supports many compute backends, including Metal, CUDA, and Vulkan. It also uses Accelerate on Mac.

node-llama-cpp automatically adapts to your hardware and adjusts the default settings to give you the best performance, so you don't have to configure anything to use it.
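If you do want to take control, you can. Here's a sketch of forcing a specific backend when loading the library (assuming your machine supports it):

```typescript
import {getLlama} from "node-llama-cpp";

// with no options, getLlama() detects the best backend automatically;
// pass `gpu` explicitly to force one ("metal", "cuda", "vulkan", or false for CPU-only)
const llama = await getLlama({
    gpu: "vulkan"
});

console.log("Using GPU type:", llama.gpu);
```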

By using node-llama-cpp you are essentially running models inside your project. With no overhead from network calls or data serialization, you can take better advantage of the stateful nature of inference operations.

For example, you can prompt a model on top of the inference state of an existing conversation, without re-evaluating the entire history just to process the new prompt.
This reduces the time it takes to start generating a response and makes more efficient use of your resources.

If you were using an API, you would have to re-evaluate the entire history every time you prompt the model, or have the API store the state for you, which can use huge amounts of disk space.
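With a chat session, this reuse happens automatically. Here's a sketch (the model path is a placeholder):

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "models/my-model.gguf"}); // placeholder path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// evaluates the conversation up to this point
const first = await session.prompt("Summarize the plot of Hamlet.");

// reuses the already-evaluated conversation state,
// so only the new prompt and the new response need processing
const followUp = await session.prompt("Now shorten that to one sentence.");
console.log(followUp);
```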

Flexibility

Since node-llama-cpp runs inside your project, you can also deploy it together with your project.
You can run models in your Electron app without requiring any additional setup on the user's machine.

You can build libraries that use large language models and distribute them as npm packages,
or deploy self-contained Docker images and run them on any hardware you want.

You can use any model you want, or even create your own and use it with node-llama-cpp.
Download models as part of npm install or on-demand from your code.

Tweak inference settings to get better results for your particular use case.
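For example, continuing the chat-session sketch from above, sampling options can be set per prompt (the values here are illustrative):

```typescript
// values are illustrative - tune them for your use case
const response = await session.prompt("Write a haiku about autumn.", {
    temperature: 0.8, // higher = more varied output
    topK: 40,
    topP: 0.9,
    maxTokens: 100 // cap the length of the generated response
});
```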

node-llama-cpp is regularly updated with the latest llama.cpp release, but you can also download and build the latest release at any time with a single command.
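As a sketch, that command looks something like this (flags may vary between versions; check the CLI help for the exact options):

```shell
npx -y node-llama-cpp source download --release latest
```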

The possibilities are endless. You have full control over the models you use, how you use them, and where you use them. You can tailor node-llama-cpp to your needs in ways that aren't possible with an OpenAI API (at least not efficiently or easily).

Powerful Features

node-llama-cpp includes a complete suite of everything you need to use large language models in your projects, with convenient wrappers for popular tasks, such as chatting with a model, enforcing a JSON schema on its output, providing it with functions it can call during generation, and generating embeddings.
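For example, here's a sketch of enforcing a JSON schema on a model's output (the model path is a placeholder):

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

const llama = await getLlama();
const model = await llama.loadModel({modelPath: "models/my-model.gguf"}); // placeholder path
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// generation is constrained so the output always matches this schema
const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        title: {type: "string"},
        rating: {type: "number"}
    }
} as const);

const answer = await session.prompt("Review the movie Inception.", {grammar});
const review = grammar.parse(answer); // parsed and typed according to the schema
console.log(review.title, review.rating);
```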

Why Node.js?

JavaScript is the most popular programming language in the world, and Node.js is the most popular runtime for JavaScript server-side applications. Developers choose Node.js for its versatility, reliability, ease of use, forward compatibility, and the vast ecosystem of npm packages.

While Python is currently the go-to language for data science and machine learning, the needs of data scientists differ from those of developers building services and applications.

node-llama-cpp bridges this gap, making it easier to integrate large language models into Node.js and Electron projects, while focusing on the needs of developers building services and applications.

Try It Out

node-llama-cpp comes with comprehensive documentation, covering everything from installation to advanced usage. It's beginner-friendly, explaining every step along the way for those who are new to the world of large language models, while still being flexible enough to allow advanced usage for those who are more experienced and knowledgeable.

Experience the ease of running models on your machine with this single command:

```shell
npx -y node-llama-cpp chat
```

Check out the getting started guide to learn how to use node-llama-cpp.

Thank You

node-llama-cpp is only possible thanks to the amazing work done on llama.cpp by Georgi Gerganov, Slaren and all the contributors from the community.

What's next?

Version 3.0 is a major milestone, but there's plenty more planned for the future.

Check out the roadmap to see what's coming next,
and give node-llama-cpp a star on GitHub to support the project.