llamafu / cognisoc

About

A mobile-first wrapper for llama.cpp.

llamafu is a Flutter FFI plugin that lets mobile app developers run large language models entirely on the device. It is published on pub.dev as llamafu, and under the hood it links to llama.cpp via a thin C++ layer with FFI bindings into Dart.

What it is, concretely

What it is not

Audience

llamafu is built for mobile app developers who want to add an on-device LLM feature without standing up a cloud inference stack: privacy-first apps, regulated industries, offline-capable products, edge tools, and anyone who would rather not pay per token forever.

Architecture, at a glance

Your Flutter App
   |
Llamafu Dart API   (high-level, typed, async)
   |
FFI bindings        (dart:ffi <-> C bridge)
   |
Native C++ layer    (RAII, memory safety, validation)
   |
llama.cpp engine    (GGUF loading, inference)
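
To make the native layer in the diagram more concrete, here is a minimal sketch of how a C++ layer can own its resources with RAII while exposing a plain C surface that dart:ffi can bind to. Every name in it (Engine, DemoHandle, demo_create, demo_model_path, demo_destroy) is hypothetical and chosen for illustration; it is not llamafu's actual API, and the Engine struct is only a stand-in for real llama.cpp state.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for engine state; a real layer would hold
// llama.cpp model and context objects here.
struct Engine {
    std::string model_path;
};

// Opaque handle passed across the C boundary (illustrative name).
struct DemoHandle {
    std::unique_ptr<Engine> engine;  // RAII: released when the handle is deleted
};

extern "C" {

// Validates input and never lets an exception cross the C boundary.
// Returns nullptr on any failure.
DemoHandle* demo_create(const char* model_path) {
    if (model_path == nullptr || model_path[0] == '\0') return nullptr;
    try {
        auto handle = std::make_unique<DemoHandle>();
        handle->engine = std::make_unique<Engine>(Engine{model_path});
        return handle.release();  // ownership moves to the caller
    } catch (...) {
        return nullptr;
    }
}

// Copies the model path into a caller-owned buffer.
// Returns the string length, or -1 if an argument is invalid or the buffer is too small.
int demo_model_path(const DemoHandle* h, char* out, std::size_t cap) {
    if (h == nullptr || !h->engine || out == nullptr || cap == 0) return -1;
    const std::string& path = h->engine->model_path;
    if (path.size() + 1 > cap) return -1;
    std::memcpy(out, path.c_str(), path.size() + 1);
    return static_cast<int>(path.size());
}

// Safe to call with nullptr; the unique_ptr member frees the engine.
void demo_destroy(DemoHandle* h) { delete h; }

}  // extern "C"
```

On the Dart side, functions like these would typically be looked up through dart:ffi and wrapped in a typed, async class, so that application code never handles the raw pointer directly.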

Capabilities

The plugin exposes the surface area you would expect from a llama.cpp-based runtime: streaming text generation, chat completions with history, embeddings for semantic search, vision/multimodal (LLaVA, Qwen2-VL), tool calling, structured JSON output with schema, GBNF grammar-constrained generation, LoRA adapter loading and hot-swapping, and fine-grained sampling controls (temperature, top-k, top-p, penalties). GPU acceleration is used where available.
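
As a rough illustration of what the sampling controls mean, the sketch below applies temperature, top-k, and top-p to a vector of logits and returns the renormalised candidates that survive. It is a generic textbook version written for this page, not llama.cpp's or llamafu's implementation; the function name filter_candidates and its parameters are assumptions, and penalties are left out.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns (token id, probability) pairs left after
// temperature scaling, top-k, and top-p filtering, renormalised to sum to 1.
// An empty result signals invalid input (no logits or non-positive temperature).
std::vector<std::pair<int, double>> filter_candidates(
        const std::vector<double>& logits, double temperature,
        std::size_t top_k, double top_p) {
    std::vector<std::pair<int, double>> cands;
    if (logits.empty() || temperature <= 0.0) return cands;

    // Temperature-scaled, numerically stable softmax.
    const double max_logit = *std::max_element(logits.begin(), logits.end());
    double sum = 0.0;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        const double w = std::exp((logits[i] - max_logit) / temperature);
        cands.emplace_back(static_cast<int>(i), w);
        sum += w;
    }
    for (auto& c : cands) c.second /= sum;

    // Most probable first; stable_sort keeps lower token ids first on ties.
    std::stable_sort(cands.begin(), cands.end(),
        [](const auto& a, const auto& b) { return a.second > b.second; });

    // Top-k: keep at most k candidates (0 disables the filter).
    if (top_k > 0 && top_k < cands.size()) cands.resize(top_k);

    // Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    double cumulative = 0.0;
    std::size_t keep = 0;
    while (keep < cands.size()) {
        cumulative += cands[keep].second;
        ++keep;
        if (cumulative >= top_p) break;
    }
    cands.resize(keep);

    // Renormalise the survivors.
    double kept = 0.0;
    for (const auto& c : cands) kept += c.second;
    for (auto& c : cands) c.second /= kept;
    return cands;
}
```

A token would then be drawn from the returned probabilities, for example with std::discrete_distribution. Lower temperature, smaller top-k, or smaller top-p all narrow the candidate set and make output more deterministic.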

License

MIT-licensed. The source lives at https://github.com/cognisoc/llamafu, and reference documentation is hosted at https://docs.cognisoc.com/llamafu/.