Using LLaMA.cpp models in Ruby

Captain's log, stardate d398.y40/AB


In this blog post, we will show you how to use LLaMA models (Meta's AI) from Ruby in your applications and projects.

LLaMA is Meta's AI model that was "accidentally" shared via torrent and is now available to everyone. llama.cpp is a project that provides a plain C/C++ implementation with optional 4-bit quantization for faster, lower-memory inference, and it is optimized for desktop CPUs. Many cool projects have been built on llama.cpp, GPT4All among them.

A C++ library can be ported to other languages, and luckily, a port to Ruby has been made: yoshoku/llama_cpp.rb! This means that we can now use LLaMA-based models such as Alpaca or Vicuna directly in Ruby (and in Node.js too, using hlhr202/llama-node).

The instructions on how to run it were a little cryptic, and I couldn't find a straightforward step-by-step tutorial. But the process is actually quite simple once you know the steps.

First of all, you need to have llama_cpp.rb in your project:

$ bundle add llama_cpp

Then you need to download the model; the easiest way is to get it from Hugging Face. All Hugging Face models are stored in git repositories that we can clone, but keep in mind that the models are very large (several GB), so using Git Large File Storage (Git LFS) is recommended.

To download ggml-vicuna-7b-4bit, for example, you can run:

$ git lfs install
$ git clone git@hf.co:chharlesonfire/ggml-vicuna-7b-4bit ./models/ggml-vicuna-7b-4bit
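
If Git LFS is not set up when you clone, the clone still succeeds, but the large files arrive as small text pointers instead of the real weights. The sketch below is one way to detect that before loading the model. `lfs_pointer?` is a hypothetical helper written for this post, not part of llama_cpp.rb, and the check relies on Git LFS pointer files starting with a fixed version line.

```ruby
# First line of every Git LFS pointer file.
LFS_POINTER_PREFIX = "version https://git-lfs.github.com/spec/v1"

# Hypothetical helper: returns true when the file at `path` looks like a
# Git LFS pointer (a small text stub) rather than the actual model weights.
def lfs_pointer?(path)
  head = File.open(path, "rb") { |f| f.read(LFS_POINTER_PREFIX.bytesize) }
  head == LFS_POINTER_PREFIX
end

# Path assumed from the clone command in this post.
model_path = "./models/ggml-vicuna-7b-4bit/ggml-vicuna-7b-q4_0.bin"

if File.exist?(model_path) && lfs_pointer?(model_path)
  warn "#{model_path} is a Git LFS pointer; run `git lfs pull` inside the repository"
end
```

If the warning shows up, running `git lfs install` followed by `git lfs pull` inside the cloned repository should fetch the real file.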

Then it is ready to be used in your code. I’ll share with you an experiment that I’m doing in Jambots to implement it:

#!/usr/bin/env ruby

# Load the Bundler environment first so the gem resolves from the Gemfile.
require "bundler/setup"
require "llama_cpp"

model_path = "./models/ggml-vicuna-7b-4bit/ggml-vicuna-7b-q4_0.bin"

# Example transcript that primes the model to answer as the assistant "Bob".
prompt = <<~HEREDOC
  Transcript of a dialog, where the User interacts with an Assistant named Bob. Bob is helpful, kind, honest, good at writing, and never fails to answer the User’s requests immediately and with precision.
  User: Hello, Bob.
  Bob: Hello. How may I help you today?
  User: Please tell me the largest city in Europe.
  Bob: Sure. The largest city in Europe is Moscow, the capital of Russia.
HEREDOC

# The message comes from the first command-line argument; stop if it is missing.
abort "Message required" if ARGV.empty?

message = ARGV[0]

# n_threads sets the CPU threads used for inference; seed fixes the sampling seed.
client = LLaMACpp::Client.new(model_path: model_path, n_threads: 4, seed: 12)
output = client.completions("#{prompt} #{message}")

puts output
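
The script above simply joins the transcript and the message with a space. One possible refinement, shown here only as a sketch, is to add the message as a proper "User:" turn and leave a "Bob:" turn open so the model continues as the assistant. `build_prompt` and its keyword arguments are hypothetical names made up for this example, and whether this improves the answers depends on the model.

```ruby
# Hypothetical helper: appends the user's message as a new dialog turn and
# ends with an open assistant turn for the model to complete.
def build_prompt(transcript, message, user: "User", assistant: "Bob")
  "#{transcript.chomp}\n#{user}: #{message.strip}\n#{assistant}:"
end

transcript = <<~TRANSCRIPT
  User: Hello, Bob.
  Bob: Hello. How may I help you today?
TRANSCRIPT

puts build_prompt(transcript, "Please tell me the largest city in Japan.")
```

The returned string can be passed to `client.completions` in place of the interpolated string used in the script.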

It runs a bit slowly on my computer, and the arguments of llama.cpp are a bit complicated, but at least now we can use this ecosystem in Ruby and Node.
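
To put a number on "a bit slowly", for example while trying different `n_threads` values, the call can be wrapped in a small timer. This is only a sketch: `timed` is a hypothetical helper, and the lambda below stands in for `client.completions` so the example runs without a model.

```ruby
# Hypothetical helper: runs the block and returns its result together with
# the elapsed wall-clock time in seconds, measured on a monotonic clock.
def timed
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

# Stand-in for client.completions, used here so no model is required.
fake_completions = ->(prompt) { "echo: #{prompt}" }

output, seconds = timed { fake_completions.call("Hello, Bob.") }
puts format("%s (took %.3f s)", output, seconds)
```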

Enjoy tinkering with it!

