Flutter FFI plugin · Android & iOS
On-device LLMs for mobile apps.
Run AI models directly on mobile devices. No cloud. No network round-trips. User data stays on the device. llamafu is a Flutter FFI plugin built on llama.cpp that runs GGUF models locally on Android (API 21+) and iOS (12.0+).
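// Load a GGUF model from local storage.
final llamafu = await Llamafu.init(
  modelPath: '/path/to/model.gguf',
  threads: 4,
  contextSize: 2048,
);

// Generate up to 256 tokens for the prompt.
final result = await llamafu.complete(
  prompt: 'Explain quantum computing:',
  maxTokens: 256,
);

// Release the model when you are done with it.
llamafu.close();

Why on-device?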
Four reasons mobile teams ship llamafu instead of calling a cloud API.
100% On-Device
No API keys, no network calls, works offline.
Privacy First
User data never leaves the device.
Low Latency
No round-trip to a cloud server.
Cost Effective
No per-token charges as you scale.
Core
- Text generation with streaming
- Chat completions with history
- Embeddings for semantic search
- Tokenize / detokenize APIs
Advanced
- Vision / multimodal (LLaVA, Qwen2-VL)
- Tool calling & function calls
- Structured JSON with schema
- GBNF grammar-constrained output
Customization
- LoRA adapter load & hot-swap
- Temperature, top-k, top-p, penalties
- Configurable context size + threads
- GPU acceleration where available
Bring your own GGUF
llamafu loads any model in the GGUF format used by llama.cpp. Pick a quantization that fits your device class.
LLaMA 3, Mistral, Phi-3, Qwen2, Gemma 2
Code LLaMA, DeepSeek Coder, StarCoder2
LLaVA, Qwen2-VL, Moondream
Phi-3 Mini, TinyLlama, Gemma 2B
Recommended quantizations for mobile: Q4_K_M for balanced quality, Q4_0 for speed, Q8_0 when you have memory headroom.
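To judge whether a quantization fits a device class, a rough estimate of the weight size is parameters × bits per weight ÷ 8. The sketch below is plain Dart and does not use the llamafu API. The function name, the parameter count, and the bits-per-weight table are illustrative assumptions: Q4_0 and Q8_0 follow from the llama.cpp block layouts, while the Q4_K_M figure is an approximate average that varies by model. It covers weights only, so the KV cache and runtime buffers come on top.

```dart
// Rough GGUF weight-size estimate: parameters * bits-per-weight / 8.
// These figures are approximations for illustration, not values from llamafu.
const Map<String, double> approxBitsPerWeight = {
  'Q4_0': 4.5, // 32-weight blocks: 16 bytes of 4-bit values + 2-byte scale
  'Q4_K_M': 4.85, // approximate average; varies by model architecture
  'Q8_0': 8.5, // 32-weight blocks: 32 bytes of 8-bit values + 2-byte scale
};

/// Returns the estimated weight size in GiB for [paramCount] parameters.
/// Hypothetical helper: weights only, excluding KV cache and buffers.
double estimateWeightsGiB(int paramCount, String quant) {
  final bits = approxBitsPerWeight[quant];
  if (bits == null) {
    throw ArgumentError.value(quant, 'quant', 'unknown quantization');
  }
  return paramCount * bits / 8 / (1024 * 1024 * 1024);
}

void main() {
  const params = 3800000000; // hypothetical 3.8B-parameter model
  for (final quant in approxBitsPerWeight.keys) {
    final gib = estimateWeightsGiB(params, quant);
    print('$quant: ~${gib.toStringAsFixed(2)} GiB');
  }
}
```

Compare the result against the memory your target devices can realistically give a single app, not against their total RAM.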
Ship an offline AI feature this sprint.
Add llamafu to your Flutter app, load a GGUF model, and call complete().
No keys, no servers, no per-token bill.