AI Workstation Build Guide 2026: RTX 5090 Deep Learning Setup
Posted: March 11, 2026 to Technology.
An AI workstation is a high-performance computer purpose-built for training, fine-tuning, and running inference on machine learning models locally. The 2026 generation of NVIDIA GPUs, led by the RTX 5090 with 32 GB of GDDR7 VRAM, enables professionals and small businesses to run 70-billion parameter models on a single desktop machine, eliminating the need for cloud GPU rentals that cost $2 to $8 per hour. This guide covers hardware selection, operating system configuration, driver installation, and performance benchmarks based on our production AI workstation builds.
Key Takeaways
- The NVIDIA RTX 5090 with 32 GB GDDR7 is the best value GPU for local AI work in 2026, running quantized 70B models at 40+ tokens per second
- A complete AI workstation capable of running 70B parameter models costs roughly $4,300-$7,300 in parts (see the build lists below), replacing $15,000-$50,000 per year in cloud GPU rental costs
- NixOS provides the most reproducible AI development environment; Ubuntu 24.04 LTS is the easiest to set up for teams new to Linux
- CUDA 12.8 and cuDNN 9.x are required for current PyTorch 2.5 and vLLM 0.7 support
- Proper cooling is critical: sustained AI workloads push GPU power consumption to 575W, requiring adequate case airflow and a 1200W+ power supply
Why Build an AI Workstation in 2026
The economics of cloud GPU rental have shifted dramatically against users who need regular access. A single NVIDIA A100 80GB instance on AWS costs $32.77 per hour. Running it 8 hours per day, 5 days per week, costs roughly $5,680 per month, or about $68,000 per year.
An RTX 5090 workstation that delivers 60-80% of the A100's performance for inference costs $8,000 once. At that usage rate, the workstation pays for itself in about six weeks.
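The break-even arithmetic is easy to sanity-check. A quick sketch, using the $32.77/hour rate and $8,000 build cost from this guide:

```python
# Sanity-check the cloud-vs-local break-even math.
HOURLY_RATE = 32.77      # USD/hour, AWS A100 80GB instance
HOURS_PER_WEEK = 8 * 5   # 8 hours/day, 5 days/week
BUILD_COST = 8_000       # one-time workstation cost

weekly = HOURLY_RATE * HOURS_PER_WEEK
monthly = weekly * 52 / 12
yearly = weekly * 52
payback_weeks = BUILD_COST / weekly

print(f"weekly:  ${weekly:,.0f}")             # ~$1,311
print(f"monthly: ${monthly:,.0f}")            # ~$5,680
print(f"yearly:  ${yearly:,.0f}")             # ~$68,162
print(f"payback: {payback_weeks:.1f} weeks")  # ~6.1 weeks
```

The payback period scales inversely with usage: at 4 hours a day the break-even stretches to about twelve weeks, still well under a year.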
For businesses exploring private AI deployment or custom AI development, an AI workstation is the natural starting point. Test models locally, fine-tune on your data, validate performance, then deploy to production infrastructure.
Hardware Selection
The GPU: Choosing the Right Card
The GPU determines what models you can run and how fast. VRAM is the critical constraint: a model must fit (mostly) in GPU memory for acceptable inference speed.
| GPU | VRAM | MSRP | 70B Model (Q4) | 13B Model (Q8) | Power | Best For |
|---|---|---|---|---|---|---|
| RTX 5090 | 32 GB GDDR7 | $1,999 | Yes (40+ tok/s) | Yes (85+ tok/s) | 575W | Primary AI workstation |
| RTX 5080 | 16 GB GDDR7 | $999 | No (needs quantization below Q4) | Yes (60+ tok/s) | 360W | Budget AI, 7B-13B models |
| RTX 4090 | 24 GB GDDR6X | $1,599 (used) | Tight (Q3 quantization) | Yes (70+ tok/s) | 450W | Used market value pick |
| RTX 5090 x2 | 64 GB total | $3,998 | Yes (70+ tok/s) | Yes (120+ tok/s) | 1150W | Multi-model serving, 123B-class models |
| AMD RX 9070 XT | 16 GB GDDR6 | $549 | No | Yes (via ROCm, 40+ tok/s) | 300W | Budget inference only |
Our recommendation: The RTX 5090 is the clear winner for AI workstations in 2026. Its 32 GB of VRAM is the minimum for comfortably running quantized 70B models, which are the sweet spot for most business AI applications (clinical summarization, code generation, document analysis, customer support).
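The napkin formula behind the table: weight size in GB ≈ parameters (billions) × bits per weight / 8, with the KV cache and activations adding a few GB on top. A minimal sketch (the helper name is illustrative; Q4_K_M averages closer to 4.5-5 bits per weight than a flat 4):

```python
def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of model weights alone (no KV cache/activations)."""
    return params_billions * bits_per_weight / 8

# 70B at 4-bit: ~35 GB of weights -- tight on a 32 GB card, which is why
# measured VRAM use sits in the high 20s once quant overheads shake out.
print(weight_size_gb(70, 4))     # 35.0
print(weight_size_gb(13, 8))     # 13.0 -- a 13B 8-bit model fits a 16 GB card
print(weight_size_gb(8, 4.85))   # 4.85 -- Q4_K_M's effective bits/weight
```

This is why VRAM, not compute, is usually the first wall you hit: halving bits per weight doubles the largest model you can load.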
CPU Selection
For AI inference, the CPU matters less than the GPU. For training and fine-tuning, CPU performance affects data loading and preprocessing speed.
| CPU | Cores/Threads | Price | AI Workstation Rating |
|---|---|---|---|
| AMD Ryzen 9 9950X3D | 16C/32T | $599 | Excellent (best for mixed workloads) |
| AMD Ryzen 9 9950X | 16C/32T | $499 | Excellent |
| AMD Ryzen 9 9900X | 12C/24T | $399 | Good |
| Intel Core Ultra 9 285K | 24C/24T | $589 | Good (high efficiency cores help data loading) |
| AMD Threadripper 7960X | 24C/48T | $1,399 | Overkill for inference, excellent for training |
The AMD Ryzen 9 9950X3D is our default choice. Its massive L3 cache accelerates data preprocessing, and the platform (AM5) supports up to 256 GB DDR5 RAM with excellent PCIe 5.0 bandwidth for GPU communication.
Memory (RAM)
System RAM must be at least 2x the model size for efficient loading. For a 70B Q4 model (~35 GB on disk):
| Config | Price | Recommendation |
|---|---|---|
| 64 GB DDR5-6000 (2x32GB) | $200 | Minimum for 70B models |
| 128 GB DDR5-6000 (4x32GB) | $400 | Recommended (headroom for concurrent tasks) |
| 256 GB DDR5-5600 (4x64GB) | $900 | For training/fine-tuning with large datasets |
DDR5-6000 CL30 provides the best performance-per-dollar. Higher speeds show diminishing returns for AI workloads because the GPU, not system memory, is the bottleneck during inference.
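The 2x rule above as a quick check (hypothetical helper name, using the ~35 GB on-disk size quoted for a 70B Q4 model):

```python
def recommended_ram_gb(model_disk_gb: float) -> float:
    """This guide's rule of thumb: system RAM >= 2x the model's on-disk size."""
    return 2 * model_disk_gb

# 70B Q4 (~35 GB on disk) -> 70 GB. A 64 GB kit is just under the rule
# (workable, with less page cache); 128 GB is the comfortable tier.
print(recommended_ram_gb(35))  # 70
```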
Storage
Model weights for a 70B Q4 model consume approximately 35 GB. Training datasets can be much larger. Fast storage reduces model load time from minutes to seconds.
| Config | Price | Load Time (70B Q4) |
|---|---|---|
| 2 TB PCIe 5.0 NVMe (Crucial T700 or similar) | $180-$250 | 10-15 seconds |
| 4 TB PCIe 4.0 NVMe (Samsung 990 Pro or WD SN850X) | $250-$350 | 15-25 seconds |
| 2 TB SATA SSD | $120 | 45-90 seconds |
PCIe 5.0 NVMe is not strictly necessary (models spend most time in GPU VRAM, not on disk), but the 10-second load times make iterating between models painless. We recommend 2 TB minimum: 1 TB for the OS and applications, 1 TB for model weights and datasets.
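The load-time column follows directly from sustained read throughput. A sketch with illustrative effective rates (real model loading typically runs well below a drive's peak sequential rating):

```python
def load_time_s(model_gb: float, read_gb_per_s: float) -> float:
    """Time to stream model weights from disk into RAM/VRAM."""
    return model_gb / read_gb_per_s

# A 35 GB 70B Q4 model across the three storage tiers:
print(round(load_time_s(35, 3.0), 1))   # 11.7 s -- PCIe 5.0-class effective rate
print(round(load_time_s(35, 1.75), 1))  # 20.0 s -- PCIe 4.0 NVMe
print(round(load_time_s(35, 0.5), 1))   # 70.0 s -- SATA SSD ceiling
```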
Power Supply
AI workloads push GPUs to maximum power draw for sustained periods. Undersized power supplies cause shutdowns, instability, and component damage.
| GPU Config | Minimum PSU | Recommended PSU |
|---|---|---|
| Single RTX 5090 | 1000W | 1200W |
| Dual RTX 5090 | 1600W | 2000W |
| Single RTX 5080 | 750W | 850W |
Choose an 80 Plus Gold or Platinum certified PSU from Corsair, Seasonic, or be quiet!. The RTX 5090 uses a 16-pin 12V-2x6 connector; verify your PSU includes one natively rather than relying on adapters.
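The PSU recommendations work out from component draw plus headroom for GPU transient spikes. A sketch (the 25% transient factor and 100W "everything else" budget are rule-of-thumb assumptions, not a manufacturer formula):

```python
def recommended_psu_w(gpu_w: float, cpu_w: float, other_w: float = 100,
                      transient_factor: float = 1.25) -> float:
    """Steady-state draw plus margin for GPU transient power spikes."""
    return gpu_w * transient_factor + cpu_w + other_w

print(recommended_psu_w(575, 230))       # 1048.75 -> a 1200W unit fits
print(recommended_psu_w(2 * 575, 330))   # 1867.5  -> a 2000W unit fits
```

Sizing so that sustained load lands around 50-70% of the PSU's rating also keeps the unit in its peak efficiency band.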
Case and Cooling
The RTX 5090 Founders Edition is a two-slot card measuring 304mm, but most partner cards are three- to four-slot designs exceeding 340mm. Most mid-tower ATX cases accommodate these lengths, but verify GPU length and slot clearance before purchasing.
Cooling strategy:
- CPU: 280mm or 360mm AIO liquid cooler (the 9950X3D runs hot under sustained load)
- Case: At minimum, 3 front intake fans and 1 rear exhaust. Ideally, 3 front + 3 top exhaust
- GPU: Reference cooler is adequate for single-GPU builds. For dual-GPU, consider a deshrouded approach or liquid cooling
Ambient room temperature matters. A sustained 575W GPU in a 30C/86F room without adequate HVAC will thermal throttle. Target room temperature below 25C/77F.
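Every watt the workstation draws ends up as heat in the room, which is why ambient temperature matters. Converting to BTU/hr, the unit HVAC equipment is rated in (system draw figures are the sustained loads discussed in this guide):

```python
WATTS_TO_BTU_HR = 3.412  # 1 W = 3.412 BTU/hr

print(round(560 * WATTS_TO_BTU_HR))  # 1911 BTU/hr -- sustained inference
print(round(720 * WATTS_TO_BTU_HR))  # 2457 BTU/hr -- sustained training
```

Roughly 2,500 BTU/hr of continuous heat is a meaningful load for a small closed room; a window AC or dedicated vent may be needed for a server closet.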
Complete Build List
Recommended Build: Single RTX 5090
| Component | Model | Price |
|---|---|---|
| GPU | NVIDIA RTX 5090 32 GB | $1,999 |
| CPU | AMD Ryzen 9 9950X3D | $599 |
| Motherboard | ASUS ProArt X870E-Creator WiFi | $499 |
| RAM | 128 GB DDR5-6000 CL30 (4x32GB) | $400 |
| Storage | Samsung 990 Pro 2 TB NVMe | $200 |
| PSU | Corsair HX1200i (1200W, 80+ Platinum) | $260 |
| CPU Cooler | Arctic Liquid Freezer III 360 | $120 |
| Case | Fractal Design Torrent | $190 |
| Fans | 3x Arctic P14 (additional intake) | $30 |
| Total | | $4,297 |
Add $200 for NixOS/Ubuntu installation and configuration, or do it yourself following the instructions below.
High-End Build: Dual RTX 5090
| Component | Model | Price |
|---|---|---|
| GPUs | 2x NVIDIA RTX 5090 32 GB | $3,998 |
| CPU | AMD Ryzen 9 9950X3D | $599 |
| Motherboard | ASUS ProArt X870E-Creator WiFi | $499 |
| RAM | 256 GB DDR5-5600 (4x64GB) | $900 |
| Storage | Samsung 990 Pro 4 TB NVMe | $350 |
| PSU | Corsair HX2000i (2000W, 80+ Platinum) | $500 |
| CPU Cooler | Arctic Liquid Freezer III 360 | $120 |
| Case | Fractal Design Torrent XL | $250 |
| Fans | 6x Arctic P14 | $60 |
| Total | | $7,276 |
Operating System Setup
Option A: NixOS (Our Recommendation)
NixOS provides fully reproducible system configurations. Every package, driver, and service is declared in configuration files that can be version-controlled and deployed identically to other machines.
# /etc/nixos/configuration.nix (AI workstation excerpt)
{ config, pkgs, ... }:
{
  # NVIDIA drivers
  hardware.nvidia = {
    package = config.boot.kernelPackages.nvidiaPackages.stable;
    modesetting.enable = true;
    open = true;  # RTX 50-series (Blackwell) is supported only by the open kernel modules
  };
  hardware.graphics.enable = true;
  # CUDA support
  environment.systemPackages = with pkgs; [
    cudaPackages.cudatoolkit
    cudaPackages.cudnn
    python312
    python312Packages.pip
    python312Packages.torch-bin
    ollama-cuda
    git
    htop
    nvtopPackages.nvidia
  ];
  # Ollama service
  services.ollama = {
    enable = true;
    package = pkgs.ollama-cuda;
    acceleration = "cuda";
  };
}
After editing the configuration:
sudo nixos-rebuild switch
The entire system state is now reproducible. If you build a second workstation, copy the configuration file and run nixos-rebuild switch. Identical environment, guaranteed.
Option B: Ubuntu 24.04 LTS
Ubuntu is the most widely supported platform for AI development. Setup is more manual but well-documented.
# Update system
sudo apt update && sudo apt upgrade -y
# Install NVIDIA driver (RTX 50-series cards need the 570 series or newer)
sudo apt install nvidia-driver-570
# Reboot and verify
sudo reboot
nvidia-smi
# Install CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cuda-toolkit-12-8
# Add CUDA to PATH
echo 'export PATH=/usr/local/cuda-12.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc
# Verify CUDA
nvcc --version
# Install Python environment
sudo apt install python3.12 python3.12-venv python3-pip
# Create AI development environment
python3.12 -m venv ~/ai-env
source ~/ai-env/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install transformers accelerate vllm
Install Docker (Both OS)
Docker simplifies running AI tools and ensures environment isolation:
# Ubuntu
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker
# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi
Benchmarks: RTX 5090 AI Workstation
All benchmarks run on our single RTX 5090 build (9950X3D, 128 GB DDR5, NixOS):
Inference Performance (Ollama)
| Model | Quantization | VRAM Used | Tokens/Second | Time to First Token |
|---|---|---|---|---|
| Llama 3.1 8B Instruct | Q4_K_M | 5.2 GB | 112 tok/s | 0.3s |
| Llama 3.1 70B Instruct | Q4_K_M | 28.4 GB | 42 tok/s | 1.2s |
| Mistral Large 2 123B | Q3_K_M | 30.8 GB | 18 tok/s | 2.8s |
| Qwen 2.5 72B Instruct | Q4_K_M | 29.1 GB | 39 tok/s | 1.4s |
| DeepSeek R1 70B (distilled) | Q4_K_M | 28.9 GB | 35 tok/s | 1.6s |
| CodeLlama 34B | Q5_K_M | 18.2 GB | 58 tok/s | 0.8s |
Fine-Tuning Performance (Unsloth)
| Model | Method | Dataset Size | Training Time | VRAM Used |
|---|---|---|---|---|
| Llama 3.1 8B | QLoRA (4-bit) | 10K samples | 45 minutes | 9.8 GB |
| Llama 3.1 70B | QLoRA (4-bit) | 10K samples | 8.2 hours | 29.6 GB |
| Mistral 7B | Full fine-tune | 10K samples | 3.1 hours | 18.4 GB |
Power Consumption
| Workload | GPU Power Draw | System Total |
|---|---|---|
| Idle | 25W | 85W |
| Inference (70B Q4) | 380-420W | 520-560W |
| Training (QLoRA) | 520-575W | 660-720W |
| Peak transient | 600W | 780W |
Annual electricity cost at $0.12/kWh with 12 hours/day active use: approximately $290-$380.
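The annual electricity estimate follows directly from the power table (a sketch using the guide's $0.12/kWh rate and 12 hours/day duty cycle):

```python
def annual_cost_usd(avg_watts: float, hours_per_day: float,
                    rate_per_kwh: float = 0.12) -> float:
    """Yearly electricity cost for a given average system draw."""
    kwh_per_year = avg_watts / 1000 * hours_per_day * 365
    return kwh_per_year * rate_per_kwh

# Mixed inference/training, ~550-720W system draw, 12 h/day active:
print(round(annual_cost_usd(550, 12)))  # 289
print(round(annual_cost_usd(720, 12)))  # 378
```

At higher residential rates ($0.25-$0.35/kWh in some regions) the same duty cycle lands in the $600-$1,100 range, still far below cloud rental.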
Maintenance and Monitoring
GPU Monitoring
# Real-time GPU monitoring
nvtop
# One-shot status
nvidia-smi
# Continuous monitoring with 1-second refresh
watch -n 1 nvidia-smi
Temperature Management
Set up alerts for thermal throttling:
# Check GPU temperature
nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader
# Alert if temperature exceeds 85C (add to crontab)
TEMP=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader)
if [ "$TEMP" -gt 85 ]; then
echo "GPU temperature critical: ${TEMP}C" | mail -s "GPU Thermal Alert" [email protected]
fi
Normal operating temperatures under sustained AI workload: 65-80C. Above 83C, the GPU begins thermal throttling, reducing performance by 10-30%.
Firmware and Driver Updates
- NVIDIA drivers: Update quarterly. Test on a non-production machine first
- BIOS: Update if the manufacturer releases stability or compatibility fixes for your GPU
- NixOS: run sudo nix-channel --update followed by sudo nixos-rebuild switch to pull the latest driver version from your configured channel
Who Should Build an AI Workstation
An AI workstation makes sense for:
- Software development teams using AI-assisted coding (Copilot-level performance, locally hosted)
- Healthcare practices evaluating private AI for clinical documentation
- Defense contractors who need AI capabilities that cannot touch the cloud
- Research teams prototyping models before production deployment
- Businesses tired of per-seat AI subscription fees ($20-$30/user/month adds up fast)
For organizations that want the capability without the hardware management, Petronella Technology Group builds and deploys AI workstations as part of our private AI service. We handle hardware procurement, OS configuration, model selection, and ongoing support.
Call 919-348-4912 or visit petronellatech.com/contact/ to discuss your AI infrastructure needs.
About the Author: Craig Petronella is the CEO of Petronella Technology Group, Inc., and operates a 19-machine fleet including RTX 5090, RTX 4090, and AMD ROCm AI workstations. With over 30 years of IT infrastructure experience and a CMMC Registered Practitioner credential (RP-1372), Craig builds AI infrastructure that meets both performance and security requirements.
Frequently Asked Questions
Is the RTX 5090 better than the H100 for local AI?
For inference on models up to 70B parameters, the RTX 5090 delivers 60-80% of H100 performance at 5-8% of the cost ($1,999 vs $25,000-$40,000). The H100 wins for training large models (70B+ full fine-tune) and serving high-concurrency workloads (100+ simultaneous users) due to its 80 GB HBM3 memory and higher memory bandwidth. For a business running a private LLM for a team of 10-30 people, the RTX 5090 is the clear value winner.
Can I use AMD GPUs for AI workloads?
AMD GPUs work for inference through ROCm, but the software ecosystem is less mature than NVIDIA CUDA. PyTorch supports ROCm officially, and Ollama has ROCm support for AMD Instinct and RX 7000/9000 series GPUs. However, some AI frameworks (vLLM, TensorRT) have limited or experimental AMD support. For production AI workstations in 2026, NVIDIA remains the safer choice.
How loud is an AI workstation under load?
A well-built AI workstation with a Fractal Design Torrent case and quality fans runs at 35-45 dBA under sustained GPU load. This is comparable to a refrigerator or quiet conversation. The GPU cooler is the primary noise source. Moving the workstation to a server closet or separate room with a remote connection (SSH or remote desktop) eliminates noise entirely.
Should I use NixOS or Ubuntu?
Ubuntu if your team has no NixOS experience and needs the widest software compatibility. NixOS if you want reproducible environments, declarative configuration management, and the ability to replicate the exact workstation setup across multiple machines. NixOS has a steeper learning curve (2-4 weeks to become productive) but pays dividends in long-term maintenance.
Can I run multiple models simultaneously?
Yes, if you have sufficient VRAM. Ollama manages model loading and unloading automatically. With a 32 GB RTX 5090, you can run a 7B model and a 13B model simultaneously, or dedicate all VRAM to a single 70B model. With dual RTX 5090s (64 GB total), you can run multiple large models or serve a single model with higher throughput via tensor parallelism.
How often should I upgrade the GPU?
GPU generations advance every 2 years. Based on current trends, the RTX 5090 will remain competitive for AI inference for 3 to 4 years. The limiting factor is VRAM: as models grow, you need more memory. When 100B+ models become the standard for general-purpose tasks (likely by 2028-2029), an upgrade to whatever offers 48-64 GB per card will make sense.
{
"@context": "https://schema.org",
"@type": "FAQPage",
"mainEntity": [
{
"@type": "Question",
"name": "Is the RTX 5090 better than the H100 for local AI?",
"acceptedAnswer": {
"@type": "Answer",
"text": "For inference on models up to 70B parameters, the RTX 5090 delivers 60-80% of H100 performance at 2.5% of the cost ($1,999 vs $25,000-$40,000). The H100 wins for training large models and high-concurrency serving. For teams of 10-30 people, the RTX 5090 is the clear value winner."
}
},
{
"@type": "Question",
"name": "Can I use AMD GPUs for AI workloads?",
"acceptedAnswer": {
"@type": "Answer",
"text": "AMD GPUs work for inference through ROCm with official PyTorch support. However, the software ecosystem is less mature than NVIDIA CUDA, and some frameworks have limited AMD support. For production AI workstations in 2026, NVIDIA remains the safer choice."
}
},
{
"@type": "Question",
"name": "How loud is an AI workstation under load?",
"acceptedAnswer": {
"@type": "Answer",
"text": "A well-built AI workstation runs at 35-45 dBA under sustained GPU load, comparable to a refrigerator. Moving it to a separate room with remote access eliminates noise entirely."
}
},
{
"@type": "Question",
"name": "Should I use NixOS or Ubuntu?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Ubuntu for teams with no NixOS experience needing wide software compatibility. NixOS for reproducible environments and declarative configuration management. NixOS has a steeper learning curve but pays dividends in maintenance."
}
},
{
"@type": "Question",
"name": "Can I run multiple models simultaneously?",
"acceptedAnswer": {
"@type": "Answer",
"text": "Yes, if you have sufficient VRAM. With a single 32 GB RTX 5090, run a 7B and 13B model simultaneously, or dedicate all VRAM to one 70B model. Dual 5090s (64 GB) enable multiple large models or higher throughput via tensor parallelism."
}
},
{
"@type": "Question",
"name": "How often should I upgrade the GPU?",
"acceptedAnswer": {
"@type": "Answer",
"text": "GPU generations advance every 2 years. The RTX 5090 will remain competitive for AI inference for 3-4 years. The limiting factor is VRAM as models grow. Upgrade when 100B+ models become standard, likely requiring 48-64 GB per card by 2028-2029."
}
}
]
}