llamafu / cognisoc


llamafu vs raw llama.cpp mobile bindings

If llamafu wraps llama.cpp, what does the wrapper buy you over linking llama.cpp directly into your mobile app?


TL;DR

This page is for teams asking the obvious question: “if llama.cpp is open source, why not just link it ourselves?”

What raw llama.cpp gives you

llama.cpp is the inference engine. It loads GGUF models, runs the transformer, and exposes a C API for completions, sampling, embeddings, and a few other primitives. It’s MIT-licensed, builds on Android (NDK) and iOS (Xcode), and supports a wide range of quantizations.

If you build it yourself for a mobile app you get:

You also take on:

What llamafu adds on top

llamafu’s job is to turn llama.cpp into a Flutter package. Concretely:

1. A Flutter plugin you can pub add

A single command, flutter pub add llamafu, installs the Dart code, the native plugin, and the build wiring for both platforms. The README lists these minimum versions: Flutter 3.10+, Dart 3.1+, Android API 21+ with NDK 21+, and iOS 12.0+ with Xcode 14+. You don’t write any of that integration code.

2. A typed Dart API

The llamafu Dart API exposes the engine through Dart types rather than raw pointers and status integers. Llamafu.init returns a managed handle. complete(), chat(), embed(), tokenize(), and getModelInfo() are real methods with typed parameters. Errors come back as LlamafuException with structured codes (modelLoadFailed, outOfMemory, etc.) instead of integer return codes you’d parse out of the C API yourself.
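To make the contrast between integer return codes and structured error codes concrete, here is a minimal, self-contained sketch of the general pattern. Everything in it is a stand-in: DemoErrorCode, DemoException, checkStatus, and the numeric status values are hypothetical and are not llamafu’s actual types or llama.cpp’s actual return codes. Only the two code names modelLoadFailed and outOfMemory are borrowed from the text above.

```dart
// Hypothetical stand-ins for illustration only. These are NOT llamafu's
// real types, and the numeric status values below are invented.
enum DemoErrorCode { modelLoadFailed, outOfMemory, unknown }

class DemoException implements Exception {
  DemoException(this.code, this.message);

  final DemoErrorCode code;
  final String message;

  @override
  String toString() => 'DemoException(${code.name}): $message';
}

/// Translates a C-style status integer into a typed exception.
/// A status of 0 is treated as success; anything else throws.
void checkStatus(int status) {
  if (status == 0) return;
  final code = switch (status) {
    -1 => DemoErrorCode.modelLoadFailed, // assumed value
    -2 => DemoErrorCode.outOfMemory, // assumed value
    _ => DemoErrorCode.unknown,
  };
  throw DemoException(code, 'native call failed with status $status');
}

/// Caller-side handling: the switch over the enum is exhaustive, so the
/// compiler flags any error code that is added later but not handled.
String describeFailure(int status) {
  try {
    checkStatus(status);
    return 'ok';
  } on DemoException catch (e) {
    return switch (e.code) {
      DemoErrorCode.modelLoadFailed => 'check the model path and file format',
      DemoErrorCode.outOfMemory => 'try a smaller model or context size',
      DemoErrorCode.unknown => 'unexpected failure: ${e.message}',
    };
  }
}
```

The point of the pattern is that callers branch on a named, exhaustively checked code rather than on magic numbers scattered through the app.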

3. An FFI bridge that handles the memory hard parts

dart:ffi is powerful and unforgiving. You have to think about pointer lifetimes, finalizers, string encoding, struct layouts, and the threading rules at the Dart/native boundary. llamafu’s llamafu_bindings.dart handles that. The native C++ layer (llamafu.cpp) uses RAII and validation so that mistakes on the Dart side don’t end up as segfaults in production. That’s the layer most teams underestimate when they consider linking llama.cpp directly.
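For a sense of what that boundary work looks like by hand, the sketch below passes a single Dart string to a C function using only dart:ffi. It is not taken from llamafu_bindings.dart. It calls the C standard library’s malloc, free, and strlen through DynamicLibrary.process(), which is assumed to resolve those symbols (true on typical Linux and macOS hosts, not on Windows). The helper name nativeByteLength is hypothetical.

```dart
import 'dart:convert';
import 'dart:ffi';

// Native and Dart signatures for three libc functions. size_t is mapped to
// IntPtr here, which matches pointer width on common platforms.
typedef _MallocC = Pointer<Void> Function(IntPtr size);
typedef _MallocDart = Pointer<Void> Function(int size);
typedef _FreeC = Void Function(Pointer<Void> ptr);
typedef _FreeDart = void Function(Pointer<Void> ptr);
typedef _StrlenC = IntPtr Function(Pointer<Uint8> s);
typedef _StrlenDart = int Function(Pointer<Uint8> s);

// Assumption: the current process already exports the libc symbols.
final DynamicLibrary _libc = DynamicLibrary.process();
final _malloc = _libc.lookupFunction<_MallocC, _MallocDart>('malloc');
final _free = _libc.lookupFunction<_FreeC, _FreeDart>('free');
final _strlen = _libc.lookupFunction<_StrlenC, _StrlenDart>('strlen');

/// Copies [text] into native memory as NUL-terminated UTF-8, asks C for its
/// length, and frees the buffer on every path, including when a call throws.
int nativeByteLength(String text) {
  final bytes = utf8.encode(text); // string encoding is the caller's job
  final Pointer<Uint8> buf = _malloc(bytes.length + 1).cast<Uint8>();
  if (buf.address == 0) {
    throw StateError('malloc returned a null pointer');
  }
  try {
    final view = buf.asTypedList(bytes.length + 1);
    view.setAll(0, bytes);
    view[bytes.length] = 0; // C expects the terminating NUL byte
    return _strlen(buf);
  } finally {
    _free(buf.cast<Void>()); // pointer lifetime is the caller's job too
  }
}
```

Calling nativeByteLength('héllo') should return 6, since é occupies two bytes in UTF-8. Every native call that takes a string, struct, or buffer needs this kind of care, and a real model handle adds finalizers and threading rules on top.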

4. Product-shaped features

llama.cpp gives you a sampler. llamafu gives you APIs:

These are not new ideas, but they are the kind of thing you’d otherwise build on top of llama.cpp yourself. llamafu has built them, so you don’t have to.

5. A sane build flow

make build-android produces an AAR. make build-ios produces an iOS framework. make build-local builds with GPU support for local development. You don’t have to write a CMake toolchain from scratch to get a working build on both platforms.

When raw llama.cpp is the right call

There are still reasons to integrate llama.cpp directly:

For a pure Flutter app that wants to ship an on-device LLM feature, llamafu is the package-shaped path; rolling your own is the project-shaped path, and the project gets large fast.

A worked example

The README’s “Quick Start” is the entire integration on the llamafu side:

import 'package:llamafu/llamafu.dart';

void main() async {
  final llamafu = await Llamafu.init(
    modelPath: '/path/to/model.gguf',
    threads: 4,
    contextSize: 2048,
  );

  final result = await llamafu.complete(
    prompt: 'Explain quantum computing in simple terms:',
    maxTokens: 256,
    temperature: 0.7,
  );

  print(result);
  llamafu.close();
}

The equivalent on raw llama.cpp involves a CMake project, NDK and Xcode toolchains, FFI bindings, error handling, memory management, and the structured-output features built on top. Every team that has done that integration once has an opinion about how long it takes; no team that has done it twice would do it a third time without a reason.

Bottom line

llama.cpp is the engine. llamafu is the package. Pick the engine when you need engine-level control. Pick the package when you want to ship an on-device LLM feature on Flutter without rebuilding the wrapper layer yourself.

