Fine-Tuning Language Models on NVIDIA DGX Spark

 Complete How-To Guide

Copyright: Sanjay Basu

Overview

This guide provides comprehensive instructions for fine-tuning open-source language models on the NVIDIA DGX Spark personal AI supercomputer. The DGX Spark’s unique 128GB unified memory architecture enables local training of models that would traditionally require cloud infrastructure.

Fine-tuning allows you to customize pre-trained models for specific tasks, domains, or response styles while preserving their general capabilities. This guide covers three fine-tuning strategies: Full fine-tuning for maximum customization, LoRA for memory-efficient adaptation, and QLoRA for training even larger models within memory constraints.

DGX Spark Hardware Advantages

The NVIDIA DGX Spark provides several key advantages for local AI development:

  • 128GB Unified Memory: CPU and GPU share the same memory pool via NVLink-C2C, eliminating memory transfer bottlenecks

  • Grace Blackwell Architecture: Purpose-built for AI workloads with up to 1 PFLOPS performance (FP4)

  • 900 GB/s NVLink-C2C Bandwidth: Ultra-fast CPU-GPU communication for seamless model loading

  • Local Execution: Complete privacy, no cloud dependencies, predictable costs

  • Large Model Support: Train 7B-70B parameter models locally with appropriate methods

Fine-Tuning Methods

Choose the appropriate method based on your model size, available memory, and quality requirements:

  • Full fine-tuning: updates every weight for maximum customization; practical only for small models

  • LoRA: trains small adapter layers on top of frozen base weights for memory-efficient adaptation of mid-size models

  • QLoRA: combines LoRA with a 4-bit quantized base model, allowing the largest models to fit within memory constraints
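
As a rough back-of-envelope check (the per-parameter byte counts below are common rules of thumb, not measured DGX Spark figures), you can estimate why each method fits where it does:

# Rule-of-thumb bytes per parameter: ~16 for full fine-tuning with Adam in bf16,
# ~2 for LoRA (frozen bf16 base plus a small adapter), ~0.5 for QLoRA (4-bit base).
for name, billions in [("7B", 7), ("13B", 13), ("70B", 70)]:
    full, lora, qlora = billions * 16, billions * 2, billions * 0.5  # GB, excluding activations
    print(f"{name}: full ~{full:.0f} GB, LoRA ~{lora:.0f} GB, QLoRA ~{qlora:.0f} GB")

On a 128GB machine this is why full fine-tuning is realistic only for the smallest models, while a 70B model is reachable only through QLoRA.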

Recommended Models

The following open-source models are excellent choices for fine-tuning on DGX Spark, sorted by size:

Small Models (Under 4B Parameters)

Ideal for experimentation, fast iteration, and full fine-tuning:

  • SmolLM 135M/360M/1.7B: HuggingFace’s efficient small models, perfect for testing
  • Qwen 2.5 1.5B: Excellent multilingual capabilities in a small package
  • Phi-3 Mini (3.8B): Microsoft’s compact but capable model

Medium Models (3B-13B Parameters)

Best balance of capability and trainability with LoRA:

  • Qwen 2.5 3B/7B: Strong reasoning and coding abilities
  • Llama 3.2 3B: Meta’s latest efficient model
  • Llama 3.1 8B: Excellent general-purpose model
  • Mistral 7B: Strong performance with fast inference
  • Gemma 2 9B: Google’s high-quality open model

Large Models (13B+ Parameters)

Maximum capability, requires QLoRA for training:

  • Mistral Nemo 12B: Excellent for complex tasks
  • Llama 3.1 70B: State-of-the-art open model (QLoRA required)
  • Qwen 2.5 72B: Powerful multilingual model (QLoRA required)

Quick Start Guide

Follow these steps to fine-tune your first model:

Step 1: Environment Setup

Clone or download the fine-tuning scripts, then run the setup:

chmod +x setup.sh && ./setup.sh

This creates a virtual environment and installs all dependencies.

Step 2: Prepare Your Data

Create a JSON file with your training examples in Alpaca format:

[{"instruction": "Your task", "input": "Optional context", "output": "Expected response"}]

Or use the provided dataset preparation script:

python scripts/prepare_dataset.py --create-sample
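
If you prefer to hand-roll a tiny dataset instead, a minimal sketch in Python (the records are placeholders; the data/sample_data.json path matches the command in Step 3):

import json, os

# Two hypothetical Alpaca-format records; replace with your own task data.
examples = [
    {"instruction": "Summarize the text in one sentence.",
     "input": "The DGX Spark shares 128GB of memory between CPU and GPU over NVLink-C2C.",
     "output": "The DGX Spark gives the CPU and GPU a single 128GB memory pool."},
    {"instruction": "Translate to French.", "input": "Good morning", "output": "Bonjour"},
]

os.makedirs("data", exist_ok=True)
with open("data/sample_data.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)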

Step 3: Run Fine-Tuning

Execute fine-tuning with your chosen model and method:

python scripts/finetune_dgx_spark.py --model qwen2.5-3b --method lora --dataset data/sample_data.json

Or use the convenience script with presets:

./run_finetune.sh small # Qwen 3B with LoRA
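
Under the hood, LoRA fine-tuning boils down to loading the base model and attaching small trainable adapter matrices while the original weights stay frozen. The sketch below is illustrative, not the repository script itself; the model id and LoRA settings are example values:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen2.5-3B-Instruct"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Wrap the frozen base model with LoRA adapters; only the adapters are trained.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
model.print_trainable_parameters()  # typically well under 1% of all parameters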

Step 4: Test Your Model

Run inference with your fine-tuned model:

python scripts/finetune_dgx_spark.py --inference --model-path output/merged_model --prompt "Your test prompt"
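
Equivalently, you can load the merged model directly with the Transformers pipeline API; the prompt and sampling settings here are placeholders:

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="output/merged_model",   # path produced by the fine-tuning run
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
result = pipe("Your test prompt", max_new_tokens=200, do_sample=True, temperature=0.7)
print(result[0]["generated_text"])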

Dataset Preparation

High-quality training data is the most important factor in fine-tuning success. This section covers data formats and best practices.

Supported Formats

Alpaca Format (Recommended)

The standard format for instruction-following datasets:

{"instruction": "task description", "input": "optional context", "output": "expected response"}
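
During training, each record is rendered into a single prompt string. A common convention is the original Alpaca template shown below; the repository script may use a slightly different wording:

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n{output}"
)

def render(example: dict) -> str:
    # Choose the variant based on whether the optional "input" field is present.
    template = PROMPT_WITH_INPUT if example.get("input") else PROMPT_NO_INPUT
    return template.format(**example)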

ShareGPT Format

For conversational/chat-style data:

{"conversations": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
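
Chat-style records map directly onto a tokenizer's chat template, so you can render them with apply_chat_template; the model id below is just an example:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct")  # example model

conversation = [
    {"role": "user", "content": "What is unified memory?"},
    {"role": "assistant", "content": "A single memory pool shared by the CPU and GPU."},
]

# Render the turns with the model's own chat template before tokenization.
text = tokenizer.apply_chat_template(conversation, tokenize=False)
print(text)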

Data Quality Guidelines

  • Aim for 1,000–10,000 high-quality examples for domain adaptation
  • Ensure diverse examples covering the full range of desired behaviors
  • Include both simple and complex examples for robust learning
  • Validate that outputs match instructions accurately
  • Remove duplicates and low-quality examples (see the cleaning sketch after this list)
  • Balance categories if doing multi-task training
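
A minimal cleaning pass for the duplicate and validity checks above might look like this; the file path matches the sample from Step 2, and the checks should be adapted to your own schema:

import json

REQUIRED = ("instruction", "output")

def clean(records):
    # Keep only records with non-empty required fields, dropping exact duplicates.
    seen, kept = set(), []
    for r in records:
        if any(not str(r.get(k, "")).strip() for k in REQUIRED):
            continue
        key = (r["instruction"].strip(), str(r.get("input", "")).strip(), r["output"].strip())
        if key in seen:
            continue
        seen.add(key)
        kept.append(r)
    return kept

with open("data/sample_data.json", encoding="utf-8") as f:
    data = json.load(f)
print(f"kept {len(clean(data))} of {len(data)} examples")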

Training Configuration

Optimal hyperparameters vary by model size and method. The following recommendations are tuned for DGX Spark’s 128GB unified memory:

Recommended Batch Sizes

Batch size depends on model size, method, and sequence length: larger models need smaller per-device batches, with gradient accumulation used to preserve the effective batch size (the TrainingArguments sketch after the Training Parameters list shows the relationship).

LoRA Hyperparameters

  1. Rank (r): 16 for LoRA, 64 for QLoRA — higher rank = more capacity
  2. Alpha: Typically 2x the rank (32 for r=16)
  3. Dropout: 0.05 for LoRA, 0.1 for QLoRA
  4. Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
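
These values translate directly into a PEFT LoraConfig; a sketch (swap in r=64 and dropout 0.1 for QLoRA):

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                  # rank; use 64 for QLoRA
    lora_alpha=32,         # roughly 2x the rank
    lora_dropout=0.05,     # 0.1 for QLoRA
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    bias="none",
    task_type="CAUSAL_LM",
)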

Training Parameters

  • Learning Rate: 2e-4 (adjust based on loss curves)
  • Epochs: 3 for domain adaptation, 1–2 for instruction tuning
  • Warmup Ratio: 0.03 (3% of training steps)
  • Weight Decay: 0.01
  • Scheduler: Cosine with warmup
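
Together with a batch size that fits your model, the parameters above map onto Transformers TrainingArguments roughly as follows; the batch and accumulation values are placeholders chosen to show how the effective batch size is formed:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=4,   # placeholder; shrink for larger models
    gradient_accumulation_steps=8,   # effective batch size = 4 x 8 = 32
    learning_rate=2e-4,
    num_train_epochs=3,
    warmup_ratio=0.03,
    weight_decay=0.01,
    lr_scheduler_type="cosine",
    bf16=True,
    gradient_checkpointing=True,
    logging_steps=10,
)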

Troubleshooting

Out of Memory Errors

  • Reduce batch size by half
  • Increase gradient accumulation to maintain effective batch size
  • Switch from full fine-tuning to LoRA, or from LoRA to QLoRA
  • Reduce sequence length (max_length parameter)
  • Enable gradient checkpointing (enabled by default)

Training Loss Not Decreasing

  • Check data quality and format
  • Increase learning rate by 2–5x
  • Verify tokenization is working correctly
  • Ensure sufficient training examples (1000+ recommended)

Model Produces Nonsense

  • Training may have diverged — reduce learning rate
  • Check for data formatting issues
  • Ensure proper tokenizer configuration
  • Train for more epochs if loss is still high

Next Steps

After successfully fine-tuning your model:

  1. Evaluate on held-out test data to measure improvements
  2. Deploy using LM Studio, Ollama, or vLLM for inference (a vLLM sketch follows this list)
  3. Compare with cloud alternatives to quantify DGX Spark advantages
  4. Iterate on data quality for continued improvement
  5. Consider RLHF or DPO for further alignment
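
For the vLLM option in step 2, a minimal Python sketch might look like this; the model path comes from the Quick Start and the sampling settings are placeholders:

from vllm import LLM, SamplingParams

llm = LLM(model="output/merged_model")          # merged model from the fine-tuning run
params = SamplingParams(temperature=0.7, max_tokens=200)
outputs = llm.generate(["Your test prompt"], params)
print(outputs[0].outputs[0].text)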

The DGX Spark’s unified memory architecture provides unique advantages for local AI development, enabling training of large models without cloud dependencies while maintaining full control over your data and models.

GitHub: https://github.com/sanjbasu/dgxsparkfinetune

