llamafu vs raw llama.cpp mobile bindings
If llamafu wraps llama.cpp, what does the wrapper buy you over linking llama.cpp directly into your mobile app?
TL;DR
- llamafu is a Flutter FFI plugin that wraps llama.cpp.
- Raw llama.cpp mobile is the engine itself — a C/C++ library you can build and link into an Android or iOS app directly.
- llamafu doesn’t replace llama.cpp. It packages it for Flutter and adds a Dart API, type safety, an FFI bridge, and product-shaped features like structured JSON, tool calling, and LoRA hot-swap.
This page is for teams asking the obvious question: “if llama.cpp is open source, why not just link it ourselves?”
What raw llama.cpp gives you
llama.cpp is the inference engine. It loads GGUF models, runs the transformer, and exposes a C API for completions, sampling, embeddings, and a few other primitives. It’s MIT-licensed, builds on Android (NDK) and iOS (Xcode), and supports a wide range of quantization formats.
If you build it yourself for a mobile app you get:
- The same inference engine llamafu uses.
- Direct control over build flags, kernels, and backends.
- No intermediary — you call into the C API and decide how to bridge it to your application.
You also take on:
- The build system. Cross-compiling llama.cpp for Android arm64 and iOS arm64 is a non-trivial project on day one.
- The bridge to your application language. On Flutter, that means hand-writing FFI bindings, managing memory across the Dart/native boundary, and handling threading.
- The product layer. llama.cpp has a sampler. It doesn’t have “give me JSON that matches this schema.”
What llamafu adds on top
llamafu’s job is to turn llama.cpp into a Flutter package. Concretely:
1. A Flutter plugin you can pub add
A single command, flutter pub add llamafu, installs the Dart code,
the native plugin, and the build wiring for both platforms. The
README lists the minimum versions: Flutter 3.10+, Dart 3.1+, Android
API 21+ with NDK 21+, and iOS 12.0+ with Xcode 14+. You don’t write
any of that integration code.
2. A typed Dart API
The llamafu Dart API exposes the engine through types. Llamafu.init
returns a managed handle. complete(), chat(), embed(),
tokenize(), getModelInfo() are real methods with typed parameters.
Errors come back as LlamafuException with structured codes
(modelLoadFailed, outOfMemory, etc.) instead of integer return
codes you’d parse out of the C API yourself.
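To make the difference concrete, here is a minimal sketch of the translation a wrapper performs between integer status codes and typed exceptions. Everything in it is hypothetical: the SketchErrorCode enum, the SketchException class, the throwIfFailed helper, and the integer values are assumptions for illustration, not llamafu's actual definitions. Only the names modelLoadFailed and outOfMemory come from the text above.

```dart
// Hypothetical sketch: none of these types are llamafu's real definitions.
enum SketchErrorCode { modelLoadFailed, outOfMemory, unknown }

class SketchException implements Exception {
  SketchException(this.code, this.message);

  final SketchErrorCode code;
  final String message;

  @override
  String toString() => 'SketchException(${code.name}): $message';
}

// Made-up status values; a real native layer defines its own.
const Map<int, SketchErrorCode> _codesByStatus = {
  -1: SketchErrorCode.modelLoadFailed,
  -2: SketchErrorCode.outOfMemory,
};

/// Turns a native status integer into a typed exception.
/// A status of 0 is assumed to mean success.
void throwIfFailed(int status) {
  if (status == 0) return;
  final code = _codesByStatus[status] ?? SketchErrorCode.unknown;
  throw SketchException(code, 'native call returned $status');
}

void main() {
  try {
    throwIfFailed(-2);
  } on SketchException catch (e) {
    // Callers branch on an enum value instead of comparing raw integers.
    if (e.code == SketchErrorCode.outOfMemory) {
      print('out of memory: try a smaller model or context size');
    }
  }
}
```

With raw bindings, every call site would repeat the integer check; with a typed exception, the check lives in one place and callers use an ordinary on ... catch clause.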
3. An FFI bridge that handles the memory hard parts
dart:ffi is powerful and unforgiving. You have to think about pointer
lifetimes, finalizers, string encoding, struct layouts, and the
threading rules at the Dart/native boundary. llamafu’s
llamafu_bindings.dart handles that. The native C++ layer
(llamafu.cpp) uses RAII and validation so that mistakes on the Dart
side don’t end up as segfaults in production. That’s the layer most
teams underestimate when they consider linking llama.cpp directly.
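For a sense of what that bookkeeping looks like, here is a minimal dart:ffi sketch that passes a single string across the boundary. It is not taken from llamafu_bindings.dart. It calls libc's malloc, free, and strlen as stand-ins for a native library so it can run without a model, and it assumes a Linux, macOS, Android, or iOS process where those symbols are visible through DynamicLibrary.process(). The helper name nativeUtf8Length is made up for the example.

```dart
import 'dart:convert';
import 'dart:ffi';

// Native and Dart signatures for three libc functions. libc stands in for
// a real native library here; this is not llamafu's binding code.
typedef MallocNative = Pointer<Void> Function(Size size);
typedef MallocDart = Pointer<Void> Function(int size);
typedef FreeNative = Void Function(Pointer<Void> ptr);
typedef FreeDart = void Function(Pointer<Void> ptr);
typedef StrlenNative = Size Function(Pointer<Uint8> s);
typedef StrlenDart = int Function(Pointer<Uint8> s);

final DynamicLibrary _libc = DynamicLibrary.process();
final MallocDart _malloc =
    _libc.lookupFunction<MallocNative, MallocDart>('malloc');
final FreeDart _free = _libc.lookupFunction<FreeNative, FreeDart>('free');
final StrlenDart _strlen =
    _libc.lookupFunction<StrlenNative, StrlenDart>('strlen');

/// Copies [text] into native memory as NUL-terminated UTF-8, asks the
/// native side for its length, and frees the buffer on every path.
int nativeUtf8Length(String text) {
  final bytes = utf8.encode(text); // string encoding is the caller's job
  final Pointer<Uint8> buffer = _malloc(bytes.length + 1).cast<Uint8>();
  if (buffer == nullptr) {
    throw StateError('malloc failed');
  }
  try {
    final view = buffer.asTypedList(bytes.length + 1);
    view.setAll(0, bytes);
    view[bytes.length] = 0; // C strings need an explicit terminator
    return _strlen(buffer);
  } finally {
    _free(buffer.cast<Void>()); // pointer lifetime is the caller's job too
  }
}

void main() {
  // 'héllo' is 5 characters but 6 UTF-8 bytes, so this prints 6.
  print(nativeUtf8Length('héllo'));
}
```

That is one string argument to one function. A real binding repeats the same care for structs, callbacks, and long-running calls that must not block the UI isolate.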
4. Product-shaped features
llama.cpp gives you a sampler. llamafu gives you APIs:
- generateJson(prompt, schema): structured output, schema-validated.
- completeWithGrammar(prompt, grammarStr, grammarRoot): raw GBNF grammars when JSON isn’t the shape you want.
- generateToolCall(prompt, tools): function-call planner.
- loadLoraAdapter, applyLoraAdapter, removeLoraAdapter: LoRA hot-swap at runtime.
- multimodalComplete(prompt, mediaInputs): vision-model invocation with LLaVA / Qwen2-VL.
These are not new ideas, but they are the kind of thing you’d otherwise build on top of llama.cpp yourself. llamafu has built them, so you don’t have to.
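As an illustration of the kind of layer you would otherwise write yourself, the sketch below checks model output against a very small schema after generation. It is a hypothetical helper, not llamafu's implementation and not a JSON Schema validator: the checkJsonOutput name and the three supported kinds ('string', 'number', 'boolean') are assumptions made for the example.

```dart
import 'dart:convert';

/// Checks that [raw] decodes to a JSON object containing every key in
/// [requiredTypes] with a value of the expected kind.
/// Returns a list of problems; an empty list means the output passed.
List<String> checkJsonOutput(String raw, Map<String, String> requiredTypes) {
  Object? decoded;
  try {
    decoded = jsonDecode(raw);
  } on FormatException catch (e) {
    return ['not valid JSON: ${e.message}'];
  }
  if (decoded is! Map<String, dynamic>) {
    return ['top-level value is not a JSON object'];
  }
  final Map<String, dynamic> object = decoded;

  final problems = <String>[];
  for (final entry in requiredTypes.entries) {
    if (!object.containsKey(entry.key)) {
      problems.add('missing key "${entry.key}"');
      continue;
    }
    final value = object[entry.key];
    final ok = switch (entry.value) {
      'string' => value is String,
      'number' => value is num,
      'boolean' => value is bool,
      _ => false, // unknown kinds never match in this sketch
    };
    if (!ok) {
      problems.add('key "${entry.key}" is not a ${entry.value}');
    }
  }
  return problems;
}

void main() {
  const schema = {'city': 'string', 'tempC': 'number'};
  print(checkJsonOutput('{"city": "Oslo", "tempC": 4.5}', schema)); // []
  print(checkJsonOutput('{"city": "Oslo"}', schema)); // [missing key "tempC"]
}
```

A check like this only detects bad output after the fact. You would still need retry logic, or a grammar that constrains sampling, to get valid output reliably, which is part of why this layer grows quickly when built by hand.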
5. A sane build flow
make build-android produces an AAR. make build-ios produces an
iOS framework. make build-local builds with GPU support for local
development. You don’t have to write a CMake toolchain from scratch
to get a working build on both platforms.
When raw llama.cpp is the right call
There are still reasons to integrate llama.cpp directly:
- You’re not on Flutter. The whole pitch of llamafu is the Flutter package. If you’re a native Swift or Kotlin app, you’re closer to llama.cpp’s natural language anyway.
- You need a specific build flag, custom kernel, or experimental backend that the upstream plugin hasn’t exposed.
- You’re shipping a hybrid app where most of the inference logic is already native and Flutter is just the UI shell.
For a pure Flutter app that wants to ship an on-device LLM feature, llamafu is the package-shaped path; rolling your own is the project-shaped path, and the project gets large fast.
A worked example
The README’s “Quick Start” is the entire integration on the llamafu side:
import 'package:llamafu/llamafu.dart';

void main() async {
  final llamafu = await Llamafu.init(
    modelPath: '/path/to/model.gguf',
    threads: 4,
    contextSize: 2048,
  );

  final result = await llamafu.complete(
    prompt: 'Explain quantum computing in simple terms:',
    maxTokens: 256,
    temperature: 0.7,
  );

  print(result);
  llamafu.close();
}
The equivalent on raw llama.cpp involves a CMake project, NDK and Xcode toolchains, FFI bindings, error handling, memory management, and the structured-output features built on top. Every team that has done that integration once has an opinion about how long it takes; no team that has done it twice would do it a third time without a reason.
Bottom line
llama.cpp is the engine. llamafu is the package. Pick the engine when you need engine-level control. Pick the package when you want to ship an on-device LLM feature on Flutter without rebuilding the wrapper layer yourself.