llama.swift

License: MIT

A fork of @ggerganov's llama.cpp to use Facebook's LLaMA in Swift.

Description

See the main repository for info about the C++ implementation.

Setup

Here are the steps for the LLaMA-7B model:

# build this repo
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# obtain the original LLaMA model weights and place them in ./models
ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model

# install Python dependencies
python3 -m pip install torch numpy sentencepiece

# convert the 7B model to ggml FP16 format
python3 convert-pth-to-ggml.py models/7B/ 1

# quantize the model to 4-bits
./quantize.sh 7B

# run the inference
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128

When running the larger models, make sure you have enough disk space to store all the intermediate files.
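
The larger models follow the same pattern; as an illustrative sketch for LLaMA-13B (assuming its weights are placed in ./models/13B like the 7B weights above):

# convert the 13B model to ggml FP16 format
python3 convert-pth-to-ggml.py models/13B/ 1

# quantize the model to 4-bits
./quantize.sh 13B

# run the inference
./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128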

Building

For now, compile from source; other distribution channels will be added shortly.

NB: Be sure to build llama.framework for Release for snappiness; Debug builds are super slow.
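
As a minimal command-line sketch (the scheme name here is an assumption; list the schemes in llama.xcodeproj to confirm it):

# list the available schemes to confirm the framework scheme's name
xcodebuild -project llama.xcodeproj -list

# build llama.framework in the Release configuration (scheme name assumed to be "llama")
xcodebuild -project llama.xcodeproj -scheme llama -configuration Release build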

Usage

In Swift:

let url = ... // URL to the model file, as per llama.cpp
let llama = LlamaRunner(modelURL: url)

llama.run(
  with: "Building a website can be done in 10 simple steps:",
  config: LlamaRunner.Config(numThreads: 8, numTokens: 512), // Can also specify `reversePrompt`
  tokenHandler: { token in
    // If printing tokens directly use `terminator: ""` as the tokens include whitespace and newlines.
    print(token, terminator: "")
  },
  stateChangeHandler: { state in
    switch state {
    case .notStarted:
      // ...
      break
    case .initializing:
      // ...
      break
    case .generatingOutput:
      // ...
      break 
    case .completed:
      // ...
      break
    case .failed(error: let error):
      // ...
      break
    }
  })

Using the llamaTest app:

  • Set MODEL_PATH in LlamaTest.xcconfig to point to your path/to/ggml-model-q4_0.bin (see the example below), then build & run for interactive prompt generation.
  • Make sure to build for Release if you want this to be snappy.
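
For example, the MODEL_PATH entry in LlamaTest.xcconfig is a plain xcconfig assignment (the path below is illustrative):

MODEL_PATH = /path/to/ggml-model-q4_0.bin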

Misc