
AI Workstation Build Guide 2026: RTX 5090 Deep Learning Setup

Posted March 11, 2026 in Technology.

An AI workstation is a high-performance computer purpose-built for training, fine-tuning, and running inference on machine learning models locally. The 2026 generation of NVIDIA GPUs, led by the RTX 5090 with 32 GB of GDDR7 VRAM, enables professionals and small businesses to run 70-billion-parameter models on a single desktop machine, eliminating the need for cloud GPU rentals that cost $2 to $8 per hour. This guide covers hardware selection, operating system configuration, driver installation, and performance benchmarks based on our production AI workstation builds.


Key Takeaways

  • The NVIDIA RTX 5090 with 32 GB GDDR7 is the best value GPU for local AI work in 2026, running quantized 70B models at 40+ tokens per second
  • A complete AI workstation capable of running 70B parameter models costs $6,000-$10,000, replacing $15,000-$50,000 per year in cloud GPU rental costs
  • NixOS provides the most reproducible AI development environment; Ubuntu 24.04 LTS is the easiest to set up for teams new to Linux
  • CUDA 12.8 and cuDNN 9.x are required for current PyTorch 2.5 and vLLM 0.7 support
  • Proper cooling is critical: sustained AI workloads push GPU power consumption to 575W, requiring adequate case airflow and a 1200W+ power supply

Why Build an AI Workstation in 2026

The economics of cloud GPU rental have shifted dramatically against users who need regular access. A single NVIDIA A100 80GB instance on AWS costs $32.77 per hour. Running it 8 hours per day, 5 days per week (roughly 173 hours per month), costs about $5,700 per month, or roughly $68,000 per year.

An RTX 5090 workstation that delivers 60-80% of the A100's performance for inference costs $8,000 once. The workstation pays for itself in roughly six weeks of regular use.
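The break-even math is simple enough to sanity-check yourself. A quick sketch using the figures above (the $32.77 hourly rate and $8,000 workstation cost are the article's own numbers):

```python
A100_HOURLY = 32.77                # AWS on-demand rate, per hour
HOURS_PER_MONTH = 8 * 5 * 52 / 12  # 8 h/day, 5 days/week, averaged per month
WORKSTATION_COST = 8_000

# Monthly cloud spend at that usage pattern
monthly_cloud = A100_HOURLY * HOURS_PER_MONTH

# Weeks of usage until the one-time workstation cost is recovered
breakeven_weeks = WORKSTATION_COST / (A100_HOURLY * 8 * 5)

print(round(monthly_cloud))       # ~$5,680 per month
print(round(breakeven_weeks, 1))  # ~6.1 weeks to break even
```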

For businesses exploring private AI deployment or custom AI development, an AI workstation is the natural starting point. Test models locally, fine-tune on your data, validate performance, then deploy to production infrastructure.

Hardware Selection

The GPU: Choosing the Right Card

The GPU determines what models you can run and how fast. VRAM is the critical constraint: a model must fit (mostly) in GPU memory for acceptable inference speed.

| GPU | VRAM | MSRP | 70B Model (Q4) | 13B Model (FP16) | Power | Best For |
| --- | --- | --- | --- | --- | --- | --- |
| RTX 5090 | 32 GB GDDR7 | $1,999 | Yes (40+ tok/s) | Yes (85+ tok/s) | 575W | Primary AI workstation |
| RTX 5080 | 16 GB GDDR7 | $999 | No (needs quantization below Q4) | Yes (60+ tok/s) | 360W | Budget AI, 7B-13B models |
| RTX 4090 | 24 GB GDDR6X | $1,599 (used) | Tight (Q3 quantization) | Yes (70+ tok/s) | 450W | Used-market value pick |
| RTX 5090 x2 | 64 GB total | $3,998 | Yes (70+ tok/s) | Yes (120+ tok/s) | 1150W | Multi-model serving, 100B+ models (quantized) |
| AMD RX 9070 XT | 16 GB GDDR6 | $549 | No | Yes (via ROCm, 40+ tok/s) | 300W | Budget inference only |

Our recommendation: The RTX 5090 is the clear winner for AI workstations in 2026. Its 32 GB of VRAM is the minimum for comfortably running quantized 70B models, which are the sweet spot for most business AI applications (clinical summarization, code generation, document analysis, customer support).

CPU Selection

For AI inference, the CPU matters less than the GPU. For training and fine-tuning, CPU performance affects data loading and preprocessing speed.

| CPU | Cores/Threads | Price | AI Workstation Rating |
| --- | --- | --- | --- |
| AMD Ryzen 9 9950X3D | 16C/32T | $599 | Excellent (best for mixed workloads) |
| AMD Ryzen 9 9950X | 16C/32T | $499 | Excellent |
| AMD Ryzen 9 9900X | 12C/24T | $399 | Good |
| Intel Core Ultra 9 285K | 24C/24T | $589 | Good (high efficiency cores help data loading) |
| AMD Threadripper 7960X | 24C/48T | $1,399 | Overkill for inference, excellent for training |

The AMD Ryzen 9 9950X3D is our default choice. Its massive L3 cache accelerates data preprocessing, and the platform (AM5) supports up to 256 GB DDR5 RAM with excellent PCIe 5.0 bandwidth for GPU communication.

Memory (RAM)

System RAM must be at least 2x the model size for efficient loading. For a 70B Q4 model (~35 GB on disk):

| Config | Price | Recommendation |
| --- | --- | --- |
| 64 GB DDR5-6000 (2x32GB) | $200 | Minimum for 70B models |
| 128 GB DDR5-6000 (4x32GB) | $400 | Recommended (headroom for concurrent tasks) |
| 256 GB DDR5-5600 (4x64GB) | $900 | For training/fine-tuning with large datasets |

DDR5-6000 CL30 provides the best performance-per-dollar. Higher speeds show diminishing returns for AI workloads because the GPU, not system memory, is the bottleneck during inference.

Storage

Model weights for a 70B Q4 model consume approximately 35 GB. Training datasets can be much larger. Fast storage reduces model load time from minutes to seconds.

| Config | Price | Load Time (70B Q4) |
| --- | --- | --- |
| 2 TB PCIe 5.0 NVMe | $180-$250 | 10-15 seconds |
| 4 TB PCIe 4.0 NVMe (Samsung 990 Pro or WD SN850X) | $250-$350 | 15-25 seconds |
| 2 TB SATA SSD | $120 | 45-90 seconds |

PCIe 5.0 NVMe is not strictly necessary (models spend most time in GPU VRAM, not on disk), but the 10-second load times make iterating between models painless. We recommend 2 TB minimum: 1 TB for the OS and applications, 1 TB for model weights and datasets.
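Disk footprint, and therefore load time, can be estimated from parameter count and quantization width. A rough rule of thumb, as a sketch (real GGUF files add a few percent of metadata overhead, and K-quants average slightly more bits per weight than their nominal figure):

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate on-disk size of a quantized model in gigabytes."""
    return params_billion * bits_per_weight / 8

def load_time_s(size_gb: float, disk_gb_per_s: float) -> float:
    """Approximate time to read the weights from disk."""
    return size_gb / disk_gb_per_s

size = model_size_gb(70, 4.0)        # 70B at 4-bit -> ~35 GB, matching the figure above
print(size)                          # 35.0
print(round(load_time_s(size, 3.0))) # ~12 s at ~3 GB/s effective sequential read
```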

Power Supply

AI workloads push GPUs to maximum power draw for sustained periods. Undersized power supplies cause shutdowns, instability, and component damage.

| GPU Config | Minimum PSU | Recommended PSU |
| --- | --- | --- |
| Single RTX 5090 | 1000W | 1200W |
| Dual RTX 5090 | 1600W | 2000W |
| Single RTX 5080 | 750W | 850W |

Choose an 80 Plus Gold or Platinum certified PSU from Corsair, Seasonic, or be quiet!. The RTX 5090 uses a 16-pin 12V-2x6 connector; verify your PSU includes one natively rather than relying on adapters.

Case and Cooling

The RTX 5090 Founders Edition is a two-slot card measuring 304mm, but most partner cards are triple-slot designs of 340mm or more. Most mid-tower ATX cases accommodate these, but verify GPU clearance before purchasing.

Cooling strategy:

  • CPU: 280mm or 360mm AIO liquid cooler (the 9950X3D runs hot under sustained load)
  • Case: At minimum, 3 front intake fans and 1 rear exhaust. Ideally, 3 front + 3 top exhaust
  • GPU: Reference cooler is adequate for single-GPU builds. For dual-GPU, consider a deshrouded approach or liquid cooling

Ambient room temperature matters. A sustained 575W GPU in a 30C/86F room without adequate HVAC will thermal throttle. Target room temperature below 25C/77F.

Complete Build List

Recommended Build: Single RTX 5090

| Component | Model | Price |
| --- | --- | --- |
| GPU | NVIDIA RTX 5090 32 GB | $1,999 |
| CPU | AMD Ryzen 9 9950X3D | $599 |
| Motherboard | ASUS ProArt X870E-Creator WiFi | $499 |
| RAM | 128 GB DDR5-6000 CL30 (4x32GB) | $400 |
| Storage | Samsung 990 Pro 2 TB NVMe | $200 |
| PSU | Corsair HX1200i (1200W, 80+ Platinum) | $260 |
| CPU Cooler | Arctic Liquid Freezer III 360 | $120 |
| Case | Fractal Design Torrent | $190 |
| Fans | 3x Arctic P14 (additional intake) | $30 |
| Total | | $4,297 |

Add $200 for NixOS/Ubuntu installation and configuration, or do it yourself following the instructions below.

High-End Build: Dual RTX 5090

| Component | Model | Price |
| --- | --- | --- |
| GPUs | 2x NVIDIA RTX 5090 32 GB | $3,998 |
| CPU | AMD Ryzen 9 9950X3D | $599 |
| Motherboard | ASUS ProArt X870E-Creator WiFi | $499 |
| RAM | 256 GB DDR5-5600 (4x64GB) | $900 |
| Storage | Samsung 990 Pro 4 TB NVMe | $350 |
| PSU | Corsair HX2000i (2000W, 80+ Platinum) | $500 |
| CPU Cooler | Arctic Liquid Freezer III 360 | $120 |
| Case | Fractal Design Torrent XL | $250 |
| Fans | 6x Arctic P14 | $60 |
| Total | | $7,276 |

Operating System Setup

Option A: NixOS (Our Recommendation)

NixOS provides fully reproducible system configurations. Every package, driver, and service is declared in configuration files that can be version-controlled and deployed identically to other machines.

# /etc/nixos/configuration.nix (AI workstation excerpt)
{ config, pkgs, ... }:
{
  # The NVIDIA driver and CUDA are unfree packages; allow them explicitly
  nixpkgs.config.allowUnfree = true;

  # NVIDIA drivers
  services.xserver.videoDrivers = [ "nvidia" ];
  hardware.nvidia = {
    package = config.boot.kernelPackages.nvidiaPackages.stable;
    modesetting.enable = true;
    open = true; # RTX 50-series (Blackwell) requires NVIDIA's open kernel modules
  };
  hardware.graphics.enable = true;

  # CUDA support
  environment.systemPackages = with pkgs; [
    cudaPackages.cudatoolkit
    cudaPackages.cudnn
    python312
    python312Packages.pip
    python312Packages.torch-bin
    ollama-cuda
    git
    htop
    nvtopPackages.nvidia
  ];

  # Ollama service
  services.ollama = {
    enable = true;
    package = pkgs.ollama-cuda;
    acceleration = "cuda";
  };
}

After editing the configuration:

sudo nixos-rebuild switch

The entire system state is now reproducible. If you build a second workstation, copy the configuration file and run nixos-rebuild switch. Identical environment, guaranteed.

Option B: Ubuntu 24.04 LTS

Ubuntu is the most widely supported platform for AI development. Setup is more manual but well-documented.

# Update system
sudo apt update && sudo apt upgrade -y

# Install NVIDIA driver (RTX 50-series requires the 570 branch or newer)
sudo apt install nvidia-driver-570

# Reboot and verify
sudo reboot
nvidia-smi

# Install CUDA toolkit
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cuda-toolkit-12-8

# Add CUDA to PATH
echo 'export PATH=/usr/local/cuda-12.8/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
source ~/.bashrc

# Verify CUDA
nvcc --version

# Install Python environment
sudo apt install python3.12 python3.12-venv python3-pip

# Create AI development environment
python3.12 -m venv ~/ai-env
source ~/ai-env/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
pip install transformers accelerate vllm

# Verify PyTorch can see the GPU
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"

Install Docker (Both OS)

Docker simplifies running AI tools and ensures environment isolation:

# Ubuntu
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
newgrp docker

# Install NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

# Verify GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi

Benchmarks: RTX 5090 AI Workstation

All benchmarks run on our single RTX 5090 build (9950X3D, 128 GB DDR5, NixOS):

Inference Performance (Ollama)

| Model | Quantization | VRAM Used | Tokens/Second | Time to First Token |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B Instruct | Q4_K_M | 5.2 GB | 112 tok/s | 0.3s |
| Llama 3.1 70B Instruct | Q4_K_M | 28.4 GB | 42 tok/s | 1.2s |
| Mistral Large 2 123B | Q3_K_M | 30.8 GB | 18 tok/s | 2.8s |
| Qwen 2.5 72B Instruct | Q4_K_M | 29.1 GB | 39 tok/s | 1.4s |
| DeepSeek R1 70B (distilled) | Q4_K_M | 28.9 GB | 35 tok/s | 1.6s |
| CodeLlama 34B | Q5_K_M | 18.2 GB | 58 tok/s | 0.8s |
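To translate these numbers into user-facing latency, total response time is roughly time to first token plus tokens generated divided by throughput. A sketch using the 70B row above (the 500-token response length is an illustrative assumption):

```python
def response_time(ttft_s: float, tokens: int, tok_per_s: float) -> float:
    """Approximate end-to-end generation latency in seconds."""
    return ttft_s + tokens / tok_per_s

# A 500-token answer from Llama 3.1 70B Q4_K_M (1.2 s TTFT, 42 tok/s)
print(round(response_time(1.2, 500, 42.0), 1))  # ~13.1 s
```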

Fine-Tuning Performance (Unsloth)

| Model | Method | Dataset Size | Training Time | VRAM Used |
| --- | --- | --- | --- | --- |
| Llama 3.1 8B | QLoRA (4-bit) | 10K samples | 45 minutes | 9.8 GB |
| Llama 3.1 70B | QLoRA (4-bit) | 10K samples | 8.2 hours | 29.6 GB |
| Mistral 7B | Full fine-tune | 10K samples | 3.1 hours | 18.4 GB |

Power Consumption

| Workload | GPU Power Draw | System Total |
| --- | --- | --- |
| Idle | 25W | 85W |
| Inference (70B Q4) | 380-420W | 520-560W |
| Training (QLoRA) | 520-575W | 660-720W |
| Peak transient | 600W | 780W |

Annual electricity cost at $0.12/kWh with 12 hours/day active use: approximately $290-$380.
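That estimate follows directly from the power table. A minimal sketch (the $0.12/kWh rate and 12-hour duty cycle are the article's assumptions; plug in your own local rate):

```python
def annual_cost(system_watts: float, hours_per_day: float = 12,
                rate_per_kwh: float = 0.12) -> float:
    """Annual electricity cost in dollars for a given average system draw."""
    kwh_per_year = system_watts / 1000 * hours_per_day * 365
    return kwh_per_year * rate_per_kwh

print(round(annual_cost(550)))  # inference-heavy use (~550W system): ~$289/year
print(round(annual_cost(720)))  # training-heavy use (~720W system): ~$378/year
```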

Maintenance and Monitoring

GPU Monitoring

# Real-time GPU monitoring
nvtop

# One-shot status
nvidia-smi

# Continuous monitoring with 1-second refresh
watch -n 1 nvidia-smi

Temperature Management

Set up alerts for thermal throttling:

# Check GPU temperature
nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader

# Alert if temperature exceeds 85C (save as a script, make it executable,
# and schedule it via crontab; assumes a configured mail command)
TEMP=$(nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader)
if [ "$TEMP" -gt 85 ]; then
  echo "GPU temperature critical: ${TEMP}C" | mail -s "GPU Thermal Alert" [email protected]
fi

Normal operating temperatures under sustained AI workload: 65-80C. Above 83C, the GPU begins thermal throttling, reducing performance by 10-30%.
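For logging or dashboards, the CSV output of nvidia-smi is easy to parse programmatically. A minimal sketch (the field list mirrors the queries above; `nvidia-smi` must be on PATH when querying a live GPU):

```python
import subprocess

FIELDS = ["temperature.gpu", "power.draw", "utilization.gpu", "memory.used"]

def parse_gpu_stats(csv_line: str) -> dict:
    """Parse one line of 'nvidia-smi --query-gpu=... --format=csv,noheader,nounits'."""
    values = [v.strip() for v in csv_line.split(",")]
    return dict(zip(FIELDS, (float(v) for v in values)))

def query_gpu_stats() -> dict:
    """Query the first GPU on a live system and return its stats as a dict."""
    out = subprocess.check_output(
        ["nvidia-smi", f"--query-gpu={','.join(FIELDS)}",
         "--format=csv,noheader,nounits"], text=True)
    return parse_gpu_stats(out.splitlines()[0])

# Example with a captured output line (no GPU needed):
print(parse_gpu_stats("76, 412.35, 98, 28672"))
```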

Firmware and Driver Updates

  • NVIDIA drivers: Update quarterly. Test on a non-production machine first
  • BIOS: Update if the manufacturer releases stability or compatibility fixes for your GPU
  • NixOS: run sudo nix-channel --update followed by sudo nixos-rebuild switch to move to the driver version pinned in your configured channel

Who Should Build an AI Workstation

An AI workstation makes sense for:

  • Software development teams using AI-assisted coding (Copilot-level performance, locally hosted)
  • Healthcare practices evaluating private AI for clinical documentation
  • Defense contractors who need AI capabilities that cannot touch the cloud
  • Research teams prototyping models before production deployment
  • Businesses tired of per-seat AI subscription fees ($20-$30/user/month adds up fast)

For organizations that want the capability without the hardware management, Petronella Technology Group builds and deploys AI workstations as part of our private AI service. We handle hardware procurement, OS configuration, model selection, and ongoing support.

Call 919-348-4912 or visit petronellatech.com/contact/ to discuss your AI infrastructure needs.


About the Author: Craig Petronella is the CEO of Petronella Technology Group, Inc., and operates a 19-machine fleet including RTX 5090, RTX 4090, and AMD ROCm AI workstations. With over 30 years of IT infrastructure experience and a CMMC Registered Practitioner credential (RP-1372), Craig builds AI infrastructure that meets both performance and security requirements.


Frequently Asked Questions

Is the RTX 5090 better than the H100 for local AI?

For inference on models up to 70B parameters, the RTX 5090 delivers 60-80% of H100 performance at 5-8% of the cost ($1,999 vs $25,000-$40,000). The H100 wins for training large models (70B+ full fine-tune) and serving high-concurrency workloads (100+ simultaneous users) due to its 80 GB HBM3 memory and higher memory bandwidth. For a business running a private LLM for a team of 10-30 people, the RTX 5090 is the clear value winner.

Can I use AMD GPUs for AI workloads?

AMD GPUs work for inference through ROCm, but the software ecosystem is less mature than NVIDIA CUDA. PyTorch supports ROCm officially, and Ollama has ROCm support for AMD Instinct and RX 7000/9000 series GPUs. However, some AI frameworks (vLLM, TensorRT) have limited or experimental AMD support. For production AI workstations in 2026, NVIDIA remains the safer choice.

How loud is an AI workstation under load?

A well-built AI workstation with a Fractal Design Torrent case and quality fans runs at 35-45 dBA under sustained GPU load. This is comparable to a refrigerator or quiet conversation. The GPU cooler is the primary noise source. Moving the workstation to a server closet or separate room with a remote connection (SSH or remote desktop) eliminates noise entirely.

Should I use NixOS or Ubuntu?

Ubuntu if your team has no NixOS experience and needs the widest software compatibility. NixOS if you want reproducible environments, declarative configuration management, and the ability to replicate the exact workstation setup across multiple machines. NixOS has a steeper learning curve (2-4 weeks to become productive) but pays dividends in long-term maintenance.

Can I run multiple models simultaneously?

Yes, if you have sufficient VRAM. Ollama manages model loading and unloading automatically. With a 32 GB RTX 5090, you can run a 7B model and a 13B model simultaneously, or dedicate all VRAM to a single 70B model. With dual RTX 5090s (64 GB total), you can run multiple large models or serve a single model with higher throughput via tensor parallelism.

How often should I upgrade the GPU?

GPU generations advance every 2 years. Based on current trends, the RTX 5090 will remain competitive for AI inference for 3 to 4 years. The limiting factor is VRAM: as models grow, you need more memory. When 100B+ models become the standard for general-purpose tasks (likely by 2028-2029), an upgrade to whatever offers 48-64 GB per card will make sense.


