Practical QLoRA Fine-tuning: Axolotl & Unsloth | SLM Playbook

← Series hub ← Previous | Next → Full-parameter fine-tuning of a large language model is a luxury. For even an 8B model like Llama 3, updating all weights in 16-bit precision requires massive clusters far beyond the reach of mid-sized teams or startups. To resolve these hardware barriers, Parameter-Efficient Fine-Tuning (PEFT) methods were developed, with LoRA and QLoRA emerging as the dominant paradigms. They allow developers to train multi-billion parameter models on a single consumer GPU (like an RTX 3090, 4090, or A10G) while maintaining near-zero performance degradation compared to full tuning. ...

May 23, 2026 · 7 min · Lê Tuấn Anh

Fine-Tune vs Prompt-Engineer an LLM: Decision Guide

Answer-first: A clear decision framework for AI engineers: when to fine-tune (LoRA/QLoRA), when to prompt-engineer, and when RAG is the right answer instead. Three engineers on the same team are trying to build the same thing: a customer support assistant that answers questions in the company’s specific support style, using terminology from their product documentation. One engineer says “just write a better system prompt.” Another says “we need to fine-tune a model.” The third says “this is clearly a RAG problem.” ...

June 1, 2026 · 12 min · Lê Tuấn Anh