
AI Fine-Tuning Guide: How to Train Custom LLMs on Your Data

Posted March 4, 2026 in Technology.


Fine-tuning is where AI stops being a generic tool and becomes a competitive advantage specific to your business. A base model like Llama 3 or Mistral knows a lot about the world in general, but it knows nothing about your specific industry terminology, your internal processes, your compliance requirements, or the way your customers communicate. Fine-tuning bridges that gap, creating a model that responds as if it were trained by your best subject matter experts.

Over the past two years at Petronella Technology Group, we have fine-tuned dozens of models for clients in healthcare, legal, defense contracting, and financial services. This guide distills that experience into practical steps you can follow, whether you are a developer looking to fine-tune your first model or a business leader evaluating whether fine-tuning makes sense for your organization.

What Fine-Tuning Actually Does

A large language model learns patterns from massive amounts of text during its initial training. Fine-tuning takes a pre-trained model and adjusts its weights using a smaller, domain-specific dataset. The result is a model that retains the general capabilities of the base model while becoming significantly better at tasks relevant to your data.

Think of it like hiring an experienced generalist and then training them on your specific business. The generalist already knows how to read, write, analyze, and reason. Fine-tuning teaches them your vocabulary, your standards, and your expectations.

What Fine-Tuning Can Do

  • Teach the model industry-specific terminology and concepts
  • Align outputs with your formatting and style requirements
  • Improve accuracy on domain-specific questions and tasks
  • Reduce hallucinations in your area of expertise
  • Create a model that follows your specific instructions consistently

What Fine-Tuning Cannot Do

  • Give the model access to information that changes frequently (use RAG instead)
  • Fundamentally change the model's reasoning capabilities
  • Make a small model perform like a model 10 times its size
  • Replace the need for proper prompt engineering

Fine-Tuning vs RAG: Choosing the Right Approach

Before investing in fine-tuning, understand whether retrieval-augmented generation might solve your problem more efficiently. RAG keeps the base model unchanged but gives it access to a searchable knowledge base at inference time. Fine-tuning changes the model itself.

Use RAG when your data changes frequently, such as product catalogs, pricing, or policy documents. Use fine-tuning when you need the model to internalize patterns, terminology, or behavioral expectations that are relatively stable. The most powerful approach combines both: a fine-tuned model that understands your domain, enhanced with RAG that provides current information.

Choosing Your Base Model

The base model you fine-tune from matters enormously. Larger models are more capable but require more VRAM, more training data, and longer fine-tuning times. Here are the practical tiers in 2026.

7B to 8B Parameter Models

Llama 3 8B and Mistral 7B are the workhorses of fine-tuning. They fine-tune on a single RTX 5090 with 32GB VRAM using QLoRA, complete training runs in hours rather than days, and deliver surprisingly strong performance for focused tasks. If your use case is well-defined, such as classifying support tickets, extracting entities from contracts, or generating responses in a specific format, a fine-tuned 7B model often outperforms a generic 70B model.

13B to 14B Parameter Models

Models in this class, such as Llama 2 13B or Qwen 2.5 14B, offer a meaningful step up in reasoning capability over the 7B tier while still fitting on a single high-end GPU for QLoRA fine-tuning. This tier is ideal when your task requires more nuanced understanding, such as legal document analysis or medical record summarization. Training takes longer and requires more data to see improvements, but the capability ceiling is higher.

30B to 70B Parameter Models

Fine-tuning at this scale requires multi-GPU setups with 80GB or more of total VRAM. The investment is justified when you need near-human performance on complex reasoning tasks. Most organizations should start with a smaller model and only move to this tier after confirming that the smaller model's limitations are actually blocking their use case.

Preparing Your Training Data

Data quality is the single biggest factor in fine-tuning success. A small dataset of high-quality examples will produce better results than a large dataset of mediocre ones.

Data Format

Most fine-tuning frameworks expect data in a conversational format: pairs of inputs and expected outputs. For instruction-following tasks, this typically looks like a system prompt, a user message, and the expected assistant response. For classification tasks, it might be a text sample paired with the correct category.
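To make the conversational format concrete, here is a minimal sketch of one instruction-following example serialized as a JSONL line. The exact field names vary by framework, so treat the "conversations" key and role names here as an assumption rather than a standard:

```python
import json

# One training example: a system prompt, a user message, and the
# expected assistant response, as a list of role/content messages.
example = {
    "conversations": [
        {"role": "system", "content": "You are a support-ticket classifier."},
        {"role": "user", "content": "My invoice total does not match my order."},
        {"role": "assistant", "content": "Category: billing"},
    ]
}

# JSONL is just one JSON object per line; serialize and round-trip it.
line = json.dumps(example)
loaded = json.loads(line)
print(loaded["conversations"][-1]["content"])  # Category: billing
```

For a classification task, the assistant message would simply be the correct category label, as shown here.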

Data Quality Guidelines

Every example should be correct. Errors in your training data teach the model to make errors. Each example should be representative of the actual inputs the model will encounter in production. Vary your examples to cover edge cases and different phrasings of similar requests. Include negative examples that show the model what not to do, such as refusing to answer questions outside its scope.

Data Volume

For LoRA fine-tuning, you can see meaningful improvements with as few as 100 to 500 high-quality examples. Most production fine-tuning jobs use 1,000 to 10,000 examples. Beyond 10,000 examples, returns diminish unless you are teaching fundamentally new capabilities. Quality always trumps quantity.

Data Sourcing Strategies

The best training data comes from your existing operations. Customer support transcripts, expert-written documents, quality-assured outputs from your best employees, and curated knowledge bases all make excellent fine-tuning data. We have helped clients build training datasets from their SharePoint documents, CRM records, email templates, and internal wikis.

If you lack sufficient existing data, synthetic data generation using a frontier model like Claude or GPT-4o can bootstrap your dataset. Generate examples, have your domain experts review and correct them, and use the corrected versions for fine-tuning. This human-in-the-loop approach produces high-quality data efficiently.

Fine-Tuning Methods Explained

Full Fine-Tuning

Updates all model parameters. Produces the best results but requires the most VRAM, typically 4 to 8 times the size of the 16-bit model weights in GPU memory. For a 7B model, whose weights alone occupy about 14GB in 16-bit precision, that means approximately 56GB of VRAM minimum once gradients and optimizer state are included. Full fine-tuning is rarely necessary for business applications.

LoRA (Low-Rank Adaptation)

Freezes the base model weights and trains small adapter matrices that modify the model's behavior. Requires a fraction of the VRAM of full fine-tuning and trains much faster. A 7B model can be LoRA fine-tuned on 16GB of VRAM. This is the standard approach for most production fine-tuning.

QLoRA (Quantized LoRA)

Combines LoRA with 4-bit quantization of the base model, further reducing VRAM requirements. A 7B model can be QLoRA fine-tuned on as little as 6GB of VRAM, and a 13B model fits on a single RTX 5090. The quality trade-off compared to full LoRA is minimal for most use cases, making QLoRA the most practical choice for organizations without datacenter-grade hardware.
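To make these memory numbers concrete, here is a back-of-envelope estimate for a 7B model. The bytes-per-parameter figures are rough assumptions (16-bit weights and gradients, a compact 8-bit-style optimizer state) and deliberately ignore activations, adapters, and CUDA overhead, which is why real-world requirements run a few gigabytes higher than the raw weight math:

```python
def gb(n_bytes: float) -> float:
    """Convert a byte count to gigabytes (decimal)."""
    return n_bytes / 1e9

params = 7e9  # a 7B-parameter model

# Full fine-tuning: fp16 weights (2 B/param) + fp16 gradients (2 B/param)
# + optimizer state (~4 B/param assumed for a memory-efficient optimizer).
full_ft = gb(params * (2 + 2 + 4))

# LoRA: the fp16 base weights stay frozen; adapter matrices are tiny
# by comparison, so the base weights dominate.
lora = gb(params * 2)

# QLoRA: base weights quantized to 4 bits (0.5 B/param).
qlora = gb(params * 0.5)

print(f"full ~{full_ft:.0f} GB, LoRA base ~{lora:.0f} GB, QLoRA base ~{qlora:.1f} GB")
```

The 56GB full fine-tuning figure and the few-gigabyte QLoRA footprint fall out directly, which is why QLoRA fits a 7B model into roughly 6GB once overhead is added back.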

Step-by-Step Fine-Tuning Process

Step 1: Environment Setup

Install Python 3.11 or later, PyTorch with CUDA support, and the Hugging Face ecosystem including transformers, datasets, peft, and trl. If you are using Unsloth for accelerated fine-tuning, install it via Docker for the cleanest setup. We maintain Unsloth Docker deployments across our NVIDIA-equipped servers at PTG.

Step 2: Data Preparation

Format your training data as JSONL with each line containing a conversations array. Split your data into training and validation sets, typically 90/10 or 80/20. Ensure your validation set is representative of production inputs, not just a random sample.
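A simple reproducible split can be sketched as follows. The random split shown is only a starting point; as noted above, you should then verify (or hand-curate) the validation set so it reflects production inputs:

```python
import random

def split_dataset(rows, val_fraction=0.1, seed=42):
    """Shuffle examples and split into (train, validation) lists.
    A fixed seed keeps the split reproducible across runs."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_val = max(1, int(len(rows) * val_fraction))
    return rows[n_val:], rows[:n_val]

# Demo with stand-in records; in practice each row is one JSONL example.
examples = [{"id": i} for i in range(100)]
train, val = split_dataset(examples)
print(len(train), len(val))  # 90 10
```

Passing val_fraction=0.2 gives the 80/20 split instead.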

Step 3: Configure Training

Key hyperparameters to set:

  • Learning rate: typically 1e-4 to 2e-4 for LoRA
  • Epochs: 1 to 3 for most datasets
  • LoRA rank: 16 to 64 depending on task complexity
  • Batch size: adjusted to fit your VRAM
  • Warmup steps: typically 5 to 10 percent of total steps
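The schedule-related numbers can be derived directly from your dataset size. Here is a small helper that applies the warmup guidance above; the 5 percent default is an assumption within the stated 5 to 10 percent range:

```python
import math

def training_schedule(n_examples, batch_size, epochs, warmup_fraction=0.05):
    """Compute total optimizer steps and warmup steps
    (warmup taken as a fraction of total steps)."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    total_steps = steps_per_epoch * epochs
    warmup_steps = max(1, int(total_steps * warmup_fraction))
    return total_steps, warmup_steps

# 2,000 examples, batch size 8, 2 epochs:
total, warmup = training_schedule(n_examples=2000, batch_size=8, epochs=2)
print(total, warmup)  # 500 25
```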

Step 4: Train

Launch the training job and monitor loss curves. Training loss should decrease steadily. Validation loss should decrease and then plateau. If validation loss starts increasing while training loss continues decreasing, you are overfitting and should stop or reduce epochs.
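The overfitting signal described above can be turned into a simple automated check. This heuristic (and its patience threshold) is our illustration, not a standard API; real training frameworks offer early-stopping callbacks that implement the same idea:

```python
def is_overfitting(train_loss, val_loss, patience=2):
    """Flag overfitting: validation loss has risen for `patience`
    consecutive evaluations while training loss kept falling."""
    if len(val_loss) < patience + 1:
        return False
    val_rising = all(val_loss[-i] > val_loss[-i - 1] for i in range(1, patience + 1))
    train_falling = train_loss[-1] < train_loss[-(patience + 1)]
    return val_rising and train_falling

# Validation loss turns upward while training loss keeps dropping:
print(is_overfitting([1.2, 0.9, 0.7, 0.5], [1.1, 0.9, 0.95, 1.0]))  # True
```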

Step 5: Evaluate

Test the fine-tuned model against your validation set and a separate held-out test set. Compare outputs against the base model on identical prompts. Have domain experts rate the quality of outputs on a rubric specific to your use case. Do not rely solely on automated metrics. Human evaluation is essential for assessing fine-tuning quality.
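For structured tasks such as classification or extraction, the base-versus-fine-tuned comparison can start with a simple exact-match metric like the sketch below; for free-form outputs, pair any such automated score with the human rubric evaluation described above:

```python
def exact_match_rate(predictions, references):
    """Fraction of predictions that exactly match the reference
    output after whitespace normalization."""
    assert len(predictions) == len(references)
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

# Toy comparison on identical prompts (labels are illustrative):
gold  = ["billing", "refunds", "billing"]
base  = ["billing", "shipping", "billing"]   # base model outputs
tuned = ["billing", "refunds", "billing"]    # fine-tuned model outputs
print(exact_match_rate(base, gold), exact_match_rate(tuned, gold))
```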

Step 6: Deploy

Merge the LoRA adapters into the base model for production deployment. Serve the merged model using Ollama, vLLM, or your preferred inference engine. Monitor production performance and collect feedback for future fine-tuning iterations.
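If you serve with Ollama, the merged model is typically exported to GGUF and wrapped in a Modelfile. This is a minimal sketch; the file path, temperature, and system prompt are placeholders you would replace with your own:

```
FROM ./merged-model.gguf
PARAMETER temperature 0.2
SYSTEM "You are the assistant described in the fine-tuning data."
```

You would then register and run it with `ollama create my-model -f Modelfile` followed by `ollama run my-model`. Note that the system prompt here should match the one used in training, per the guidance below on system prompt consistency.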

Common Fine-Tuning Mistakes

Starting with too large a model. Begin with 7B or 8B. You can always scale up if the smaller model proves insufficient, but you cannot get back the time and resources spent training a 70B model that a 7B model could have handled.

Insufficient data curation. Garbage in, garbage out applies more strongly to fine-tuning than almost any other aspect of AI. Every training example should be reviewed by a domain expert.

Training for too many epochs. With LoRA, 1 to 3 epochs is usually optimal. More epochs lead to overfitting, where the model memorizes your training examples rather than learning generalizable patterns.

Ignoring the system prompt. Your fine-tuning data should include the system prompt you plan to use in production. The model learns to follow the system prompt during fine-tuning, so consistency between training and deployment is critical.

Not establishing a baseline. Always measure the base model's performance on your task before fine-tuning. Without a baseline, you cannot quantify the improvement or determine whether fine-tuning was worth the effort.

When to Use PTG's Fine-Tuning Services

Fine-tuning is accessible to organizations with in-house AI expertise, but many businesses benefit from professional guidance. Our LLM fine-tuning services handle the entire pipeline from data preparation through deployment. We bring experience across dozens of industry-specific fine-tuning projects, proprietary evaluation frameworks, and the hardware infrastructure to run training jobs efficiently.

Whether you are fine-tuning your first model or optimizing an existing deployment, the principles in this guide will help you make informed decisions. The key is to start with a clear use case, invest heavily in data quality, and iterate based on real-world performance rather than benchmark scores.

Need help implementing these strategies? Our cybersecurity experts can assess your environment and build a tailored plan.
Get Free Assessment
Craig Petronella
CEO & Founder, Petronella Technology Group | CMMC Registered Practitioner

Craig Petronella is a cybersecurity expert with over 24 years of experience protecting businesses from cyber threats. As founder of Petronella Technology Group, he has helped over 2,500 organizations strengthen their security posture, achieve compliance, and respond to incidents.

Related Service
Enterprise IT Solutions & AI Integration

From AI implementation to cloud infrastructure, PTG helps businesses deploy technology securely and at scale.

Explore AI & IT Services