Unsloth

QLoRA fine-tuning lets you adapt a multi-billion parameter model on a single consumer GPU, such as an RTX 3090 or A10G, by combining LoRA adapter training with 4-bit NF4 quantization. This article covers the math, a production Axolotl YAML config, and Unsloth integration for a 3x training speedup.

1. LoRA: Low-Rank Adaptation

Matrix Decomposition

LoRA reduces fine-tuning cost by freezing all original model weights and training only two small adapter matrices (A and B) of rank r, typically 8–64. This cuts trainable parameters by over 99% versus full fine-tuning with near-zero performance loss. ...
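The parameter saving claimed for LoRA can be checked with quick arithmetic. The sketch below assumes a single square projection matrix of size 4096 × 4096 and rank r = 16. Both values are illustrative choices, not taken from the article. The function names are hypothetical helpers, not part of any library.

```python
# Minimal sketch: count trainable parameters for one weight matrix,
# comparing full fine-tuning with a LoRA adapter of rank r.
# Assumed shapes: W is (d_out x d_in), B is (d_out x r), A is (r x d_in).

def full_param_count(d_out: int, d_in: int) -> int:
    """Parameters updated when the whole matrix W is trained."""
    return d_out * d_in

def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two adapter matrices B and A; W stays frozen."""
    return rank * (d_out + d_in)

d = 4096    # assumed hidden size, for illustration only
rank = 16   # assumed rank, inside the 8-64 range mentioned above

full = full_param_count(d, d)
lora = lora_param_count(d, d, rank)
reduction = 1 - lora / full

print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (r={rank}): {lora:,} parameters")
print(f"reduction: {reduction:.2%}")
```

For these assumed values the adapter holds 131,072 trainable parameters against 16,777,216 in the full matrix, which is consistent with the "over 99%" figure. The exact percentage depends on the layer shapes and on which layers receive adapters.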