Blog
Notes from on-device land.
Field notes on shipping GGUF models on Android and iOS with llamafu.
- structured-output · tool-calling
Structured output on-device with grammars and schemas
How llamafu uses GBNF grammars and JSON schemas to get reliable structured output from quantized models running on a phone.
- gguf · performance
Picking a GGUF quantization for mobile
Q4_K_M, Q4_0, Q8_0: what they mean, when to pick which, and how llamafu lets you load any of them.
- on-device · mobile
Why on-device LLMs are no longer a science project
Four practical reasons to move LLM inference onto the phone, and what llamafu gives you to do that in Flutter.