About
A mobile-first wrapper for llama.cpp.
llamafu is a Flutter FFI plugin that lets mobile app developers run large language models entirely on the device. It is published on pub.dev as llamafu, and under the hood it links to llama.cpp via a thin C++ layer with FFI bindings into Dart.
What it is, concretely
- A Flutter plugin (Dart 3.1+, Flutter 3.10+) targeting Android API 21+ and iOS 12.0+.
- A native C++ layer (llamafu.cpp) that wraps llama.cpp with RAII and validation.
- A Dart API (Llamafu.init, complete, generateJson, etc.) accessed through dart:ffi.
- An inference engine that loads models in the GGUF format used by llama.cpp.
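To make "wraps llama.cpp with RAII and validation" more concrete, here is a minimal C++ sketch of the general pattern. It is not taken from llamafu.cpp: fake_model, fake_model_load, fake_model_free, ModelDeleter, and Model are hypothetical stand-ins invented for illustration, and none of them are llama.cpp or llamafu APIs. The idea is that arguments are checked before the native loader is called, and the handle is released automatically when the wrapper goes out of scope.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a C-style engine handle; NOT the llama.cpp API.
struct fake_model { std::string path; };

// Counts handles that have been loaded but not yet freed (illustration only).
static int g_live_models = 0;

fake_model* fake_model_load(const char* path) {
    ++g_live_models;
    return new fake_model{path};
}

void fake_model_free(fake_model* m) {
    --g_live_models;
    delete m;
}

// Custom deleter so std::unique_ptr calls the matching free function.
struct ModelDeleter {
    void operator()(fake_model* m) const { fake_model_free(m); }
};

// Hypothetical RAII wrapper: validates inputs, owns the handle, frees it on scope exit.
class Model {
public:
    Model(const std::string& path, int n_ctx) {
        // Validate before touching the native loader, so bad input never allocates.
        if (path.empty()) throw std::invalid_argument("model path is empty");
        if (n_ctx <= 0) throw std::invalid_argument("n_ctx must be positive");
        handle_.reset(fake_model_load(path.c_str()));
        if (!handle_) throw std::runtime_error("model load failed");
        n_ctx_ = n_ctx;
    }

    int context_size() const { return n_ctx_; }
    const std::string& path() const { return handle_->path; }

private:
    std::unique_ptr<fake_model, ModelDeleter> handle_;
    int n_ctx_ = 0;
};
```

Because the handle lives in a std::unique_ptr, the wrapper is move-only, and every exit path (normal return or exception) releases the native resource exactly once.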
What it is not
- It is not a hosted inference API. There is no server component.
- It is not a model. You bring your own GGUF file (LLaMA 3, Mistral, Phi-3, Qwen2, Gemma, LLaVA, etc.).
- It is not a desktop or web SDK. The platform support matrix is Android and iOS.
- It is not ONNX, CoreML, or TFLite based. The runtime backend is llama.cpp.
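Since you supply the model file yourself, a cheap sanity check before handing a path to any loader can catch obvious mistakes such as a wrong or partially downloaded file. The C++ sketch below only tests that a file begins with the 4-byte GGUF magic ("GGUF"). The function name looks_like_gguf is hypothetical and is not part of llamafu or llama.cpp, and passing this check does not guarantee the rest of the file is valid.

```cpp
#include <array>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the file at `path` starts with the
// ASCII bytes "GGUF". Only the first four bytes are inspected.
bool looks_like_gguf(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;  // missing or unreadable file

    std::array<char, 4> magic{};
    in.read(magic.data(), static_cast<std::streamsize>(magic.size()));
    if (in.gcount() != 4) return false;  // file shorter than the magic

    return std::memcmp(magic.data(), "GGUF", 4) == 0;
}
```

A check like this is best treated as an early, user-friendly error path; the real loader remains the authority on whether a model is usable.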
Audience
llamafu is built for mobile app developers who want to add an on-device LLM feature without standing up a cloud inference stack: privacy-first apps, regulated industries, offline-capable products, on-edge tools, and anyone who would rather not pay per token forever.
Architecture, at a glance
Your Flutter App
|
Llamafu Dart API (high-level, typed, async)
|
FFI bindings (dart:ffi <-> C bridge)
|
Native C++ layer (RAII, memory safety, validation)
|
llama.cpp engine (GGUF loading, inference)
Capabilities
The plugin exposes the surface area you would expect from a llama.cpp-based runtime: streaming text generation, chat completions with history, embeddings for semantic search, vision/multimodal (LLaVA, Qwen2-VL), tool calling, structured JSON output with schema, GBNF grammar-constrained generation, LoRA adapter loading and hot-swapping, and fine-grained sampling controls (temperature, top-k, top-p, penalties). GPU acceleration is used where available.
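To illustrate how the sampling controls mentioned above interact, here is a generic C++ sketch that applies temperature, then top-k, then top-p (nucleus) filtering to a vector of raw logits. It is a textbook-style illustration and not llamafu's or llama.cpp's implementation; Candidate and filter_logits are hypothetical names, and treating top_k == 0 as "no limit" is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Candidate {
    int token;    // index of the token in the logits vector
    double prob;  // probability after filtering and renormalization
};

// Returns the surviving candidates, sorted from most to least likely,
// with probabilities that sum to 1. Returns an empty vector on invalid input.
std::vector<Candidate> filter_logits(const std::vector<double>& logits,
                                     double temperature,
                                     std::size_t top_k,
                                     double top_p) {
    std::vector<Candidate> c;
    if (logits.empty() || temperature <= 0.0) return c;

    // Temperature-scaled softmax; subtracting the max keeps exp() stable.
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        c.push_back({static_cast<int>(i),
                     std::exp((logits[i] - max_logit) / temperature)});
    }

    // Rescale the current candidates so their probabilities sum to 1.
    auto normalize = [&c]() {
        double sum = 0.0;
        for (const auto& x : c) sum += x.prob;
        for (auto& x : c) x.prob /= sum;
    };
    normalize();

    std::stable_sort(c.begin(), c.end(),
                     [](const Candidate& a, const Candidate& b) {
                         return a.prob > b.prob;
                     });

    // Top-k: keep only the k most likely tokens (0 means no limit here).
    if (top_k > 0 && top_k < c.size()) {
        c.erase(c.begin() + static_cast<std::ptrdiff_t>(top_k), c.end());
        normalize();
    }

    // Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    double cumulative = 0.0;
    std::size_t keep = c.size();
    for (std::size_t i = 0; i < c.size(); ++i) {
        cumulative += c[i].prob;
        if (cumulative >= top_p) { keep = i + 1; break; }
    }
    c.erase(c.begin() + static_cast<std::ptrdiff_t>(keep), c.end());
    normalize();
    return c;
}
```

Lower temperatures sharpen the distribution before filtering, so the same top_p value keeps fewer tokens; a final random draw from the returned candidates would complete the sampling step.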
License
MIT-licensed. The source lives at https://github.com/cognisoc/llamafu, and reference documentation is hosted at https://docs.cognisoc.com/llamafu/.