Fine-Tuning

Data Engineering SFT: NEFTune & SemDeDup | SLM Playbook

In the era of LLMs/SLMs, the classic data science proverb “Garbage In, Garbage Out” has never been more relevant. When performing Supervised Fine-Tuning (SFT) for Small Language Models (SLMs), data quality and format dictate over 90% of the model’s downstream capabilities. Feeding millions of raw, web-scraped dialogue pairs or low-quality synthetic data directly into your model will overfit it to repetitive phrasing, restrict its reasoning capabilities, and waste thousands of GPU hours. ...

Practical QLoRA Fine-tuning: Axolotl & Unsloth | SLM Playbook

QLoRA fine-tuning lets you adapt a multi-billion parameter model on a single consumer GPU (like an RTX 3090 or A10G) by combining LoRA adapter training with 4-bit NF4 quantization. This article covers the math, a production Axolotl YAML config, and Unsloth integration for 3x training speedup.

1. LoRA: Low-Rank Adaptation Matrix Decomposition

LoRA reduces fine-tuning cost by freezing all original model weights and training only two small adapter matrices (A and B) of rank r, typically 8–64. This cuts trainable parameters by over 99% versus full fine-tuning with near-zero performance loss. ...
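The parameter savings from the low-rank decomposition can be checked with simple arithmetic. The sketch below is illustrative only: the 4096×4096 layer size and the helper names are assumptions, not values taken from the article. It counts parameters for a single weight matrix W (d_out × d_in) against its two adapters, B (d_out × r) and A (r × d_in).

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in one dense weight matrix W of shape (d_out, d_in)."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the adapter pair: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def reduction(d_out: int, d_in: int, rank: int) -> float:
    """Fraction of trainable parameters saved for this one matrix."""
    return 1.0 - lora_params(d_out, d_in, rank) / full_params(d_out, d_in)


if __name__ == "__main__":
    # Hypothetical square projection layer; the size is an assumption.
    d_out = d_in = 4096
    for rank in (8, 16, 64):
        print(
            f"r={rank:>2}: adapter params={lora_params(d_out, d_in, rank):>7,} "
            f"of {full_params(d_out, d_in):,} "
            f"({reduction(d_out, d_in, rank):.2%} fewer)"
        )
```

For this one matrix, rank 8 trains 65,536 parameters instead of 16,777,216, a reduction of about 99.6%, while rank 64 gives about 96.9%. Whole-model reductions are usually larger than the per-matrix figure, because adapters are typically attached to only a subset of the layers while everything else stays frozen.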

Prompt Engineering vs Fine-Tuning: 2026 Decision Guide

Answer-first: Choose prompt engineering for rapid prototyping and general domains. Deploy RAG when your application requires real-time retrieval from a frequently updated knowledge base. Commit to QLoRA fine-tuning only when you need strict output formatting, persistent style compliance under adversarial input, or significant prompt token compression.

What You’ll Learn That AI Won’t Tell You

- Production cost-benefit thresholds comparing fine-tuning a 7B model locally versus calling proprietary APIs for structured schema generation.
- How to structure prompt engineering to handle 95% of e-commerce intent recognition, and the exact boundary where fine-tuning becomes cost-effective.

Three engineers on the same team are trying to build the same thing: a customer support assistant that answers questions in the company’s specific support style, using terminology from their product documentation. One engineer says “just write a better system prompt.” Another says “we need to fine-tune a model.” The third says “this is clearly a RAG problem.” ...