llamafu vs MLC LLM
Two open-source approaches to running LLMs on mobile. Where they overlap, where they differ, and which one fits a Flutter codebase.
TL;DR
- llamafu is a Flutter FFI plugin that wraps llama.cpp and loads models in the GGUF format. Target platforms: Android (API 21+) and iOS (12.0+).
- MLC LLM is a separate open-source project that uses Apache TVM to compile LLMs to native code for a wide range of hardware targets, including mobile.
- They are not the same runtime. They use different engines, different model formats, and different integration paths.
This page is a grounded comparison, focused on the parts of the decision that matter when you’re picking an SDK for a Flutter app. For deep benchmarks, run them on your own device matrix.
Runtime engine
- llamafu: built on llama.cpp, the open-source C/C++ inference engine started by Georgi Gerganov (ggerganov) and developed by a large contributor community. llama.cpp implements transformer inference with hand-tuned CPU kernels and optional GPU backends. llamafu links it through a small C++ layer and exposes it to Dart via dart:ffi.
- MLC LLM: built on Apache TVM, a deep-learning compiler stack. Models are compiled ahead of time into device-specific code for the target hardware.
These are genuinely different philosophies. llama.cpp is “one runtime, many models.” MLC LLM is “one model, many compiled runtimes.” Both work; they have different trade-offs around model swap, device coverage, and update flow.
Model format
- llamafu: GGUF, the format used across the llama.cpp ecosystem. Quantizations like Q4_K_M, Q4_0, Q5_K_M, Q8_0. Hugging Face has thousands of GGUF files ready to download.
- MLC LLM: its own pipeline. Models are typically converted and compiled through MLC’s tooling before they can be loaded by the runtime.
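To make quantization names like Q4_0 concrete, here is a minimal sketch that follows the reference Q4_0 scheme in ggml (the tensor library under llama.cpp) as we understand it, simplified for readability. Each block of 32 weights shares one scale d, and each weight becomes a 4-bit code q that decodes as (q - 8) * d. Assumptions and simplifications: the scale stays a Python float instead of fp16, codes are not packed two per byte, and the function names are hypothetical. This is not llamafu code.

```python
# Simplified sketch of ggml-style Q4_0 block quantization (assumption: modeled
# on the reference scheme; the real format stores d as fp16 and packs two
# 4-bit codes per byte).
QK4_0 = 32  # weights per block

# 32 codes x 4 bits + one 16-bit scale, spread over 32 weights.
BITS_PER_WEIGHT = (QK4_0 * 4 + 16) / QK4_0  # 4.5


def quantize_block_q4_0(xs):
    """Map 32 floats to (scale d, list of 4-bit codes in 0..15)."""
    if len(xs) != QK4_0:
        raise ValueError(f"expected {QK4_0} values, got {len(xs)}")
    # Value with the largest magnitude, sign preserved (first one on ties).
    max_v = max(xs, key=abs)
    # Choose d so that max_v maps exactly to code 0: (0 - 8) * d == max_v.
    d = max_v / -8.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    # x * inv_d lies in [-8, 8]; adding 8.5 and truncating rounds to the
    # nearest code, and the opposite extreme (16) is clipped to 15.
    codes = [min(15, int(x * inv_d + 8.5)) for x in xs]
    return d, codes


def dequantize_block_q4_0(d, codes):
    """Reconstruct approximate weights: w = (q - 8) * d."""
    return [(q - 8) * d for q in codes]
```

The same arithmetic gives a quick size estimate: at 4.5 bits per weight, a hypothetical 7-billion-parameter model needs about 7e9 * 4.5 / 8 ≈ 3.9 GB for its weights. Treat that as rough: K-quant types such as Q4_K_M use a different super-block layout, and real GGUF files mix tensor types.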
If your team wants to point at a GGUF file on Hugging Face and have it load, llamafu is the shorter path. If you’re willing to invest in a compile-and-ship pipeline for a specific model, MLC LLM gives you a different set of optimization knobs.
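Because a GGUF file identifies itself in its first bytes, you can run a cheap pre-flight check on a downloaded file before handing it to any runtime. A minimal sketch, assuming a little-endian GGUF v2 or v3 file, whose header starts with the 4-byte magic GGUF, a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. parse_gguf_header and read_gguf_header are hypothetical helpers, not part of llamafu's API.

```python
import struct

GGUF_MAGIC = b"GGUF"
# magic (4 bytes), version (uint32), tensor_count (uint64), metadata_kv_count (uint64)
_HEADER = struct.Struct("<4sIQQ")


def parse_gguf_header(buf: bytes) -> dict:
    """Parse the fixed-size prefix of a little-endian GGUF v2/v3 file."""
    if len(buf) < _HEADER.size:
        raise ValueError("buffer too short for a GGUF header")
    magic, version, n_tensors, n_kv = _HEADER.unpack_from(buf, 0)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    if version < 2:
        # GGUF v1 used 32-bit counts, so this layout does not apply.
        raise ValueError(f"unsupported GGUF version {version}")
    return {
        "version": version,
        "tensor_count": n_tensors,
        "metadata_kv_count": n_kv,
    }


def read_gguf_header(path: str) -> dict:
    """Read only the first 24 bytes of the file and parse them."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(_HEADER.size))
```

The check only proves the file is plausibly GGUF; whether a given runtime version supports the architecture and quantization types inside is a separate question.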
Platform integration
- llamafu: Flutter-first. You run flutter pub add llamafu, call Llamafu.init(...), and get a Dart object back. Android NDK 21+, iOS Xcode 14+, with native code linked through the plugin.
- MLC LLM: ships native iOS and Android example apps and provides bindings in several languages. Flutter integration is a wrapper you build (or find), not a first-class supported surface.
For a team that lives in Dart and ships through Flutter, this is the biggest practical difference. llamafu fits inside pubspec.yaml.
Feature surface
llamafu's README documents the following features, all exposed directly through the plugin:
- Text generation with streaming.
- Chat completions with conversation history.
- Embeddings.
- Vision / multimodal (LLaVA, Qwen2-VL).
- Tool calling / function calling.
- Structured JSON output with schema validation.
- GBNF grammar-constrained generation.
- LoRA adapter loading and hot-swapping.
- Fine-grained sampling: temperature, top-k, top-p, repeat penalties.
- Tokenize / detokenize APIs and model info.
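The sampling knobs in the list above compose into a pipeline that runs once per generated token. A minimal, generic sketch of one common ordering: repetition penalty, temperature, top-k, softmax, top-p, then a weighted draw. Assumptions: sample_next_token is a hypothetical function, the default values are illustrative, and llama.cpp's real sampler chain is configurable and may order or define these steps differently. The penalty rule (divide positive logits, multiply negative ones) matches the classic repetition penalty used in the llama.cpp ecosystem.

```python
import math
import random


def sample_next_token(logits, recent_tokens, temperature=0.8, top_k=40,
                      top_p=0.95, repeat_penalty=1.1, rng=None):
    """Pick the next token id from raw logits (illustrative pipeline)."""
    rng = rng or random.Random()
    logits = list(logits)
    # 1. Repetition penalty: push recently seen tokens toward lower scores.
    for t in set(recent_tokens):
        if logits[t] > 0:
            logits[t] /= repeat_penalty
        else:
            logits[t] *= repeat_penalty
    # 2. Temperature 0 means greedy decoding.
    if temperature <= 0:
        return max(range(len(logits)), key=logits.__getitem__)
    # 3. Temperature: <1 sharpens, >1 flattens the distribution.
    scaled = [x / temperature for x in logits]
    # 4. Top-k: keep only the k highest-scoring ids (k <= 0 disables).
    order = sorted(range(len(scaled)), key=scaled.__getitem__, reverse=True)
    if top_k > 0:
        order = order[:top_k]
    # 5. Softmax over the survivors (subtract the max for stability).
    m = scaled[order[0]]
    weights = [math.exp(scaled[i] - m) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]
    # 6. Top-p: keep the smallest prefix whose cumulative mass >= top_p.
    cutoff, cum = len(order), 0.0
    for n, p in enumerate(probs, start=1):
        cum += p
        if cum >= top_p:
            cutoff = n
            break
    order, probs = order[:cutoff], probs[:cutoff]
    # 7. Weighted random draw among the remaining candidates.
    return rng.choices(order, weights=probs, k=1)[0]
```

Passing a seeded random.Random as rng makes runs reproducible, which helps when comparing sampling settings across devices.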
MLC LLM supports a feature set that is project- and version-specific, and the two projects’ feature surfaces are not in lock-step. For any specific feature you depend on, check both projects’ current docs.
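Grammar-constrained generation, one of the llamafu features listed above, works by masking: at every step the grammar decides which tokens are legal, and the sampler may only choose among those. A deliberately simplified sketch: here the "grammar" is just a finite set of allowed outputs, whereas GBNF describes context-free grammars and llama.cpp tracks a real parse state. The names EOS, legal_token_ids, and constrained_greedy_decode are hypothetical.

```python
EOS = "<eos>"  # hypothetical end-of-sequence token string


def legal_token_ids(prefix, vocab, allowed_outputs):
    """Token ids that keep `prefix` on a path to some allowed output."""
    legal = []
    for tok_id, piece in enumerate(vocab):
        if piece == EOS:
            # Stopping is only legal once the text is a complete output.
            if prefix in allowed_outputs:
                legal.append(tok_id)
        elif any(s.startswith(prefix + piece) for s in allowed_outputs):
            legal.append(tok_id)
    return legal


def constrained_greedy_decode(logits_fn, vocab, allowed_outputs, max_steps=32):
    """Greedy decoding where illegal tokens are masked out before argmax."""
    text = ""
    for _ in range(max_steps):
        logits = logits_fn(text)
        legal = legal_token_ids(text, vocab, allowed_outputs)
        if not legal:
            raise RuntimeError(f"no legal continuation after {text!r}")
        # Masking: only legal ids compete, whatever the model prefers.
        best = max(legal, key=lambda t: logits[t])
        if vocab[best] == EOS:
            return text
        text += vocab[best]
    raise RuntimeError("max_steps reached before a complete output")
```

With a toy vocabulary ["tr", "ue", "fal", "se", "maybe", EOS], allowed outputs {"true", "false"}, and a fake model that always scores "maybe" highest, this decoder still returns "true", because "maybe" is never a legal token.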
Hardware acceleration
- llamafu: GPU acceleration “where available,” per the README. Concretely, this means it inherits the backends llama.cpp supports on the device — typically CPU with selective GPU offload where the platform allows it.
- MLC LLM: TVM’s compilation step targets the device GPU/NPU directly, which is one of its design goals. On supported devices, this can be a meaningful throughput advantage.
If maximum throughput on a specific device is your top constraint, benchmark both. If portability across a wide device matrix is your top constraint, the llama.cpp ecosystem has been run on an unusually wide range of hardware, which is a strong point in its favor.
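The measurement loop itself is the same whichever runtime you test. A minimal sketch in Python, assuming a hypothetical generate(prompt) callable that wraps the SDK call under test and returns the number of tokens produced; port the same loop to Dart or to the native side of each app. It discards warm-up runs and reports the median, since single runs on phones are noisy.

```python
import statistics
import time
from typing import Callable


def benchmark_decode(generate: Callable[[str], int], prompt: str,
                     runs: int = 5, warmup: int = 1) -> dict:
    """Time a hypothetical generate(prompt) -> tokens_produced callable."""
    for _ in range(warmup):
        generate(prompt)  # discard: caches and lazy initialization
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate(prompt)
        elapsed = time.perf_counter() - start
        rates.append(n_tokens / elapsed)  # tokens per second, end to end
    return {
        "runs": runs,
        "median_tok_per_s": statistics.median(rates),
        "min_tok_per_s": min(rates),
        "max_tok_per_s": max(rates),
    }
```

This measures end-to-end throughput, prompt processing included. For a finer picture, time prefill and decode separately, and run long enough to see whether the device throttles under sustained load.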
When to pick llamafu
- You’re building a Flutter app and want a pub.dev package, not a native integration project.
- You want to load GGUF files directly, swap them at runtime, and pick quantization independently of code changes.
- You need structured output, tool calling, LoRA adapters, and multimodal in one consistent Dart API.
- You want to stay aligned with the broad llama.cpp ecosystem.
When to pick MLC LLM
- You’re not on Flutter and you’re comfortable wiring native bindings.
- You’re optimizing for a specific device GPU/NPU and want compile-time scheduling.
- You’ve already invested in TVM and want to reuse that tooling.
Bottom line
Both projects are legitimate paths to running LLMs on phones. They sit in different parts of the design space and they don’t try to do the same thing. If you’re shipping a Flutter app, llamafu is the plugin-shaped answer; MLC LLM is the engine-shaped answer, and plugging it into Flutter is on you.