
Proxmox Backup Server: Production Configuration Guide

Posted to Technology.


Proxmox Backup Server (PBS) is the piece of the Proxmox stack most admins stand up last, and most admins regret standing up last. Petronella Technology Group runs PBS as the primary backup target for our internal Proxmox Virtual Environment (PVE) clusters, including the hosts behind the private AI cluster we operate for clients in regulated verticals. Those same clusters run our ten-plus production AI agents (Penny on (919) 348-4912, Peter the chat agent on petronellatech.com, ComplyBot on petronella.ai, and the rest of the fleet), so when we say PBS is our primary backup target, we are talking about the backup target for our own revenue-bearing infrastructure as well as the client workloads we manage. This guide is the version of the PBS install we wish somebody had handed us in 2021: the hardware math, the ZFS layout, the datastore flags, the verify cadence, the restore test, and the boring stuff that decides whether you actually get your VMs back at 3 a.m. on a Sunday.

This walkthrough covers Proxmox Backup Server 4.1.6-1, which is the current stable release published at the time of writing (see the PBS installation docs). Almost everything here applies cleanly to the 3.x line too, with the exception of the .sources DEB822 repository format and the S3 backend that shipped with the 4.x branch.

If you want the short version: use a dedicated box, put the datastore on ZFS, set a real retention policy, turn on client-side encryption, schedule garbage collection and verify jobs during quiet hours, and restore a VM every month to prove the whole thing actually works. The rest of this post is how.

Why PBS instead of vzdump to NFS

PVE has had vzdump for years, and a lot of shops still dump straight to NFS. That works right up until it doesn't. PBS changes three things that matter:

  1. Deduplication and incremental. PBS splits backups into content-addressed chunks and only sends the delta between snapshots. Per the PBS introduction, "reading and sending only the delta reduces the storage and network impact of backups." In practice, a daily backup of a 500 GB VM routinely moves less than 5 GB over the wire after the first full.
  2. Client-side encryption. PBS supports AES-256-GCM encryption with keys that live on the PVE hosts, not on the backup server. The backup server stores opaque ciphertext. Even an attacker with full root on PBS cannot read your data. That single property is why we default to PBS for every CMMC and HIPAA client we onboard.
  3. Verify jobs. PBS can cryptographically re-check every chunk on disk against its stored hash on a schedule. NFS has no equivalent. A silent bit-rot on your backup target is a backup you do not actually have.

The fourth thing that matters, though it is less about PBS itself and more about how we use it, is that deduplicated-and-encrypted backups slot cleanly into a proper 3-2-1 strategy: three copies of every VM (primary on PVE, on-site PBS, off-site PBS sync target), two different media classes (ZFS on the hot datastore, an S3-backed or spinning-rust cold tier off-site), and one copy that is geographically and administratively separate. Every managed IT services engagement we run ships with that topology by default, and the disaster-recovery rehearsals we schedule for clients are built around it. A single PBS box on a NAS in the same closet is a ransomware incident waiting to happen.

With that settled, let's build one.

Planning: hardware sizing and network

The PBS installation guide lists minimums that are honestly too low for anything past a homelab. The recommended production spec is the one to plan against:

  • CPU: modern 64-bit AMD or Intel, 4+ cores. Verification and garbage collection are CPU-bound on chunk hashing.
  • RAM: 4 GB baseline for the OS, plus 1 GB per TB of backup storage. A 40 TB datastore wants 44 GB RAM before you even account for ZFS ARC. This is not optional if you care about GC and verify speed.
  • OS disk: 32 GB minimum with hardware RAID and battery-backed cache, or a redundant ZFS mirror. Keep the OS off the backup pool.
  • Backup disks: enterprise SSDs for anything serious. If you use HDDs, you need a ZFS special device (mirrored SSDs) for the metadata, or GC walks will take days.
  • Network: redundant multi-gigabit. 10 GbE is the sweet spot; 2x 1 GbE LACP is the floor.

For the network layout, do not share your backup traffic with your client-facing VLAN. Even though the PBS network management docs do not mandate a dedicated backup network, every production deployment we have built uses one. A PBS verify job will happily saturate a 10 GbE link. Your users do not want to hear about that. Put PBS on a separate VLAN or physical interface that only the PVE nodes and the admin jump host can reach.

Size the datastore generously. The deduplication ratio depends heavily on your workload, but most mixed-VM Windows and Linux fleets land between 4:1 and 10:1 once you factor in the incremental delta chain. Plan for raw-storage = (average VM size) * (VM count) * 1.3 to give yourself headroom for growth, retention overlap, and GC slack.
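As a worked example of that formula (the fleet numbers here are made up for illustration), the arithmetic is trivial to script:

```shell
# Illustrative sizing: 60 VMs averaging 200 GB each, with 1.3x headroom.
avg_vm_gb=200
vm_count=60
raw_gb=$(( avg_vm_gb * vm_count * 13 / 10 ))   # 1.3x headroom via integer math
echo "plan for at least ${raw_gb} GB raw"       # 15600 GB for this fleet
```

Dedup will claw back most of that over time, but raw capacity is what you have to buy on day one.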

Installation: ISO vs Debian-on-top

PBS ships two install paths:

  1. ISO install. Download the hybrid ISO from proxmox.com, flash it, boot it, done. This is what we use for 95% of production deployments because it pins the kernel and the ZFS version for you.
  2. Debian install. If you have a golden Debian Trixie image or you are deploying into a regulated environment that requires a specific base OS, install PBS on top of a fresh Debian. This is the path documented below because it exposes every moving part.

Debian install procedure

Start with a fresh minimal Debian 13 (Trixie) install on the OS disk. Then, from the PBS installation guide, add the Proxmox repository key and the PBS repository:

# install the archive key
wget https://enterprise.proxmox.com/debian/proxmox-archive-keyring-trixie.gpg \
  -O /usr/share/keyrings/proxmox-archive-keyring.gpg

Create /etc/apt/sources.list.d/proxmox.sources (no-subscription repo, suitable for evaluation and homelab; paid sites should use the enterprise repo):

Types: deb
URIs: http://download.proxmox.com/debian/pbs
Suites: trixie
Components: pbs-no-subscription
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg

If you have a subscription, use /etc/apt/sources.list.d/pbs-enterprise.sources instead:

Types: deb
URIs: https://enterprise.proxmox.com/debian/pbs
Suites: trixie
Components: pbs-enterprise
Signed-By: /usr/share/keyrings/proxmox-archive-keyring.gpg

Then pull the full stack:

apt update
apt install proxmox-backup

The proxmox-backup meta-package brings in the Proxmox-patched kernel, ZFS, and the backup server itself. After a reboot, you should be able to reach the web UI at https://<pbs-ip>:8007. Log in with your Linux root account and you'll land on the dashboard.

Storage: ZFS is the right default

You can build a PBS datastore on ext4 or XFS. Do not. The PBS storage docs formally support ext4, XFS, ZFS, and S3-backed stores, but two operational realities push us to ZFS for on-prem deployments every time: integrated checksumming catches silent corruption early, and snapshots make migration and DR rehearsal trivial.

Build the pool

For a six-disk RAIDZ2 with SSD special device, from the PBS system administration docs:

# verify disks and pick the right /dev/disk/by-id/ paths
ls -la /dev/disk/by-id/ | grep -v part

# create the pool with ashift 12 for 4K-native drives
zpool create -f -o ashift=12 backup-pool \
  raidz2 /dev/disk/by-id/ata-ST16000NE000-XXXXXXXX_1 \
         /dev/disk/by-id/ata-ST16000NE000-XXXXXXXX_2 \
         /dev/disk/by-id/ata-ST16000NE000-XXXXXXXX_3 \
         /dev/disk/by-id/ata-ST16000NE000-XXXXXXXX_4 \
         /dev/disk/by-id/ata-ST16000NE000-XXXXXXXX_5 \
         /dev/disk/by-id/ata-ST16000NE000-XXXXXXXX_6 \
  special mirror /dev/disk/by-id/nvme-SAMSUNG_MZ1LB1T9_1 \
                 /dev/disk/by-id/nvme-SAMSUNG_MZ1LB1T9_2

# tune the pool
zfs set compression=lz4 backup-pool
zfs set atime=off backup-pool
zfs set xattr=sa backup-pool
zfs set recordsize=1M backup-pool

A few notes on the settings above, all sourced from the Proxmox ZFS admin guide:

  • ashift=12 matches 4 KiB sector drives, which is nearly every modern spinning disk and SSD.
  • compression=lz4 adds almost no CPU overhead and routinely nets 20 to 40% space savings on typical VM backups.
  • special mirror on NVMe hosts all the small metadata blocks, which makes GC orders of magnitude faster on HDD pools. Skip this only if your pool is already all-SSD.
  • ARC limit: ZFS will eat RAM by default. In /etc/modprobe.d/zfs.conf, cap ARC at roughly 50% of system RAM. For a 64 GB box: options zfs zfs_arc_max=34359738368 then update-initramfs -u.

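The ARC cap from the last bullet can be derived from the machine's actual RAM rather than hard-coded; a small sketch:

```shell
# Derive a 50% ARC cap from installed RAM and print the modprobe line.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
arc_bytes=$(( mem_kb * 1024 / 2 ))
# Append this line to /etc/modprobe.d/zfs.conf, then run update-initramfs -u
echo "options zfs zfs_arc_max=${arc_bytes}"
```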
Do not put ZFS on a hardware RAID controller with its own cache. Use an HBA in IT mode. The Proxmox docs are explicit about this.

Create the datastore

With the pool in place, create the PBS datastore. From the PBS storage docs:

# create a ZFS dataset for the datastore
zfs create backup-pool/store1

# register it with PBS
proxmox-backup-manager datastore create store1 /backup-pool/store1

Verify it landed:

proxmox-backup-manager datastore list
proxmox-backup-manager datastore show store1

If you want PBS to handle the whole disk setup itself, the one-liner proxmox-backup-manager disk fs create store1 --disk sdX --filesystem ext4 --add-datastore true works too, but it locks you into ext4 and loses the ZFS benefits above.

Retention: actual prune policies

The PBS maintenance docs list every retention knob. The ones that matter:

  • keep-last: absolute number of most-recent backups, regardless of time.
  • keep-hourly, keep-daily, keep-weekly, keep-monthly, keep-yearly: one backup kept per time bucket.

These are additive, not exclusive. A policy of keep-last=3, keep-daily=7, keep-weekly=4, keep-monthly=12, keep-yearly=3 keeps the 3 most recent backups plus one per day for a week plus one per week for a month plus one per month for a year plus one per year for three years. That is a reasonable default for a production VM.

Set the policy and the schedule on the datastore:

proxmox-backup-manager datastore update store1 \
  --keep-last 3 \
  --keep-daily 7 \
  --keep-weekly 4 \
  --keep-monthly 12 \
  --keep-yearly 3 \
  --prune-schedule '01:00' \
  --gc-schedule 'Tue 04:27'

That stanza sets daily pruning and weekly garbage collection. The odd 04:27 is intentional: the PBS docs recommend off-peak times that do not collide with round-numbered schedules used elsewhere.
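Before letting the schedule loose on real data, it is worth simulating the policy with the client's dry-run mode. The repository string and the backup group vm/101 below are placeholders for your own:

```shell
# Simulate the retention policy against one backup group; deletes nothing.
# Repository and group are illustrative -- substitute your own.
export PBS_REPOSITORY='root@pam@localhost:store1'
proxmox-backup-client prune vm/101 --dry-run \
  --keep-last 3 --keep-daily 7 --keep-weekly 4 \
  --keep-monthly 12 --keep-yearly 3
```

The output lists every snapshot in the group with a keep/remove verdict, which is the fastest way to sanity-check a policy before it prunes anything.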

How garbage collection actually works

From the PBS maintenance docs, GC runs in two phases. Phase one walks every index file and touches the access time of every chunk that is still referenced. Phase two deletes any chunk whose access time is older than the cutoff, which is the earlier of (a) the start time of the oldest running backup or (b) 24 hours and 5 minutes before GC started.

That 24h5m grace is what keeps GC safe against in-flight writers. It also means GC does not immediately reclaim space from a backup you just pruned; that space frees up on the next GC pass, which is one reason weekly GC is usually enough.
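GC can also be kicked off and inspected by hand, which is useful the first time you tune a new pool:

```shell
# Start a garbage collection run on store1, then poll its progress.
proxmox-backup-manager garbage-collection start store1
proxmox-backup-manager garbage-collection status store1
```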

Verify jobs: the one you cannot skip

Deduplication means chunks are shared across backup snapshots. A single corrupted chunk can poison dozens of backups. The only defense is to re-hash chunks on a schedule and compare against the stored hashes. PBS calls these verify jobs.

Manual verification:

proxmox-backup-manager verify store1 --ignore-verified false

To make it recurring, configure a verify job through the web UI (Datastore -> store1 -> Verify Jobs -> Add) or via the CLI. Sensible defaults:

  • Schedule: once a week during a low-activity window.
  • Ignore verified: true after the first full pass, so you only re-verify chunks that have never been checked or are past the re-verify age.
  • Re-verify after: 30 days (chunks verified within the last 30 days are skipped).

A verify job that finds a bad chunk marks the affected snapshot as corrupt, which is exactly what you want: you learn about rot before you need the backup, not during the restore.


Client-side encryption

For regulated workloads this is the most important setting on the server. Generate a key on the PVE host (not on PBS), then reference it from the storage config.

From the PBS backup client docs:

# on the PVE host, as root
mkdir -p /etc/pve/priv/backup-keys
proxmox-backup-client key create /etc/pve/priv/backup-keys/pbs-store1.key

The command prompts for a passphrase. Protect the output file: it is the only thing standing between a stolen PBS disk and your customer data. Copy it off the PVE cluster to an offline store (a YubiKey, a printed QR via the proxmox-backup-client key paperkey subcommand, an HSM). The PBS backup client docs also describe master key pairs for enterprise recovery:

proxmox-backup-client key create-master-key

A master key pair lets an administrator recover data even if the per-host passphrase is lost. In regulated environments this is almost always the right setup: the private master key lives in escrow (air-gapped), the public master key is distributed to every PVE host, and every backup is encrypted such that either the per-host passphrase or the master private key can decrypt.
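One way to produce the escrow copies described above is the paperkey subcommand, which renders the key (including QR codes) as printable text or HTML. The output path here is illustrative:

```shell
# Render the encryption key as a printable paper key for the safe.
proxmox-backup-client key paperkey /etc/pve/priv/backup-keys/pbs-store1.key \
  --output-format text > /root/pbs-store1-paperkey.txt
```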

Wire PBS into the PVE cluster

On every PVE node, or once cluster-wide via /etc/pve/storage.cfg which is replicated automatically, add PBS as a storage target. The canonical way is the web UI (Datacenter -> Storage -> Add -> Proxmox Backup Server), but the pvesm CLI works too:

pvesm add pbs pbs-store1 \
  --server pbs.internal.example.com \
  --datastore store1 \
  --username root@pam \
  --password \
  --fingerprint 'AA:BB:CC:...:ZZ' \
  --encryption-key /etc/pve/priv/backup-keys/pbs-store1.key

The fingerprint is required so PVE can validate the PBS certificate. Grab it from the PBS dashboard or with:

proxmox-backup-manager cert info | grep Fingerprint

Once the storage is mounted, schedule backups from the PVE web UI under Datacenter -> Backup. Those jobs land in /etc/pve/jobs.cfg, as documented in the PVE backup and restore wiki, and are executed by pvescheduler. For ad-hoc backups, the classic vzdump still works:

vzdump 101 --storage pbs-store1 --mode snapshot --notes-template '{{vmid}} {{guestname}} adhoc'

Namespaces and multi-tenant PBS

If you are running PBS for multiple clients or environments, use namespaces. A namespace is a logical subtree inside a datastore that gets its own ACLs and quotas. From the PBS maintenance docs, namespaces accept prune and verify configuration independently, so you can keep client-a on a 90-day retention while client-b runs 365 days in the same datastore.

Create a namespace:

proxmox-backup-manager namespace create store1 client-a

Then point the PVE storage at that namespace with --namespace client-a when adding the PBS target. This is also how we keep production, staging, and lab backups from co-mingling on a single box.
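Pairing a namespace with its own user and ACL is what makes the tenant separation real. A sketch, assuming the namespace ACL path format and a dedicated client-a@pbs account (both names are from the example above):

```shell
# Dedicated user for the tenant.
proxmox-backup-manager user create client-a@pbs

# Grant backup rights only inside the client-a namespace.
proxmox-backup-manager acl update /datastore/store1/ns/client-a \
  DatastoreBackup --auth-id client-a@pbs
```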

Network traffic control

The PBS network management docs expose a token bucket filter for bandwidth limits. The one we most often use caps remote sync traffic during business hours:

proxmox-backup-manager traffic-control create office-hours \
  --rate-in 50000000 \
  --rate-out 50000000 \
  --network 0.0.0.0/0 \
  --timeframe 'mon..fri 08:00-18:00'

That limits everyone to 50 MB/s during the business day and removes the cap outside those hours. One gotcha: sync jobs running on PBS itself are not affected by these rules, which matters if you use PBS-to-PBS replication.

Sync jobs and off-site copies

Backups that live in the same room as the primary workload are not backups. PBS supports sync jobs that replicate one datastore to another PBS instance, including a remote one over the internet. The typical topology is:

  1. Primary PBS in the production rack, close to the PVE cluster on the backup VLAN.
  2. Secondary PBS in a separate building (or a hosted facility) pulling from the primary via a nightly sync job.

On the secondary, register the primary as a remote and create the sync job:

proxmox-backup-manager remote create prod-pbs \
  --host pbs.prod.example.com \
  --auth-id sync@pbs \
  --password \
  --fingerprint 'AA:BB:CC:...:ZZ'

proxmox-backup-manager sync-job create prod-to-dr \
  --remote prod-pbs \
  --remote-store store1 \
  --store dr-store1 \
  --schedule '03:30'

Use a dedicated sync@pbs API user with a restricted ACL (read-only on the primary, write on the remote namespace). Never sync as root.
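On the primary, that restricted account might look like this (user and datastore names follow the examples above):

```shell
# On the primary PBS: create the sync account and grant read-only access
# to the datastore the secondary will pull from.
proxmox-backup-manager user create sync@pbs
proxmox-backup-manager acl update /datastore/store1 \
  DatastoreReader --auth-id sync@pbs
```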

Monitoring, alerting, and the boring stuff

A PBS that silently fails is worse than no PBS at all. The stack we deploy always includes:

  • SMART monitoring. smartd from smartmontools, configured to email on any pre-fail attribute change. /etc/smartd.conf gets a line per disk.
  • ZFS scrub schedule. /etc/cron.d/zfsutils-linux ships with a monthly scrub by default on Debian; verify it's there and the email destination is real.
  • zpool status -v checked nightly by a wrapper script that pages on DEGRADED or FAULTED.
  • PBS email notifications. Set /etc/proxmox-backup/notifications.cfg with an SMTP target and enable both job-failure and GC-verify notifications.
  • Prometheus metrics. PBS exposes a metrics endpoint at /api2/json/status/metrics when you enable the built-in metric server. Scrape it from Prometheus and alert on backup_count staleness and verify failures.
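The nightly zpool wrapper mentioned in the list above can be as small as a grep over the pool status. A minimal sketch; wire the alert branch into your own paging tool:

```shell
# Minimal health check: print an alert and return non-zero if any pool
# reports a degraded, faulted, offline, or unavailable state.
check_pools() {
  local status="$1"   # pass in the output of `zpool status`
  if echo "$status" | grep -Eq 'DEGRADED|FAULTED|OFFLINE|UNAVAIL'; then
    echo "ALERT: pool problem detected"
    return 1
  fi
  echo "pools healthy"
  return 0
}

# Fall back to a healthy placeholder so the sketch runs on boxes without ZFS.
check_pools "$(zpool status 2>/dev/null || echo 'state: ONLINE')"
```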

Disaster recovery: the restore drill

The correct measure of a backup system is not throughput or dedup ratio. It is minutes to first restored VM boot during an incident. Schedule a quarterly drill on every PBS you operate:

  1. Pick a non-critical VM at random.
  2. Restore it to an isolated PVE node from last night's snapshot.
  3. Boot it on an isolated VLAN.
  4. Validate: OS boots, application starts, database is consistent.
  5. Time the whole thing. Write the number down.

If the number keeps going up, your dedup chain is getting long or your restore pipe is narrow. Both are fixable and neither will fix itself.
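The drill itself boils down to two commands on the isolated PVE node. The snapshot timestamp, VM IDs, and target storage below are illustrative:

```shell
# List available snapshots for the source VM on the PBS storage.
pvesm list pbs-store1 --vmid 101

# Restore to a scratch VMID and time it; --unique regenerates MAC addresses
# so the clone can boot on the isolated VLAN without conflicts.
time qmrestore 'pbs-store1:backup/vm/101/2025-01-01T02:00:00Z' 9101 \
  --storage local-zfs --unique true
```

The wall-clock number from `time` is the metric you write down each quarter.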

Common pitfalls

A short tour of the failure modes we have personally tripped over:

  • No ZFS special device on an HDD pool. GC walks take 30+ hours on large datastores. Add a mirrored NVMe special device and re-run.
  • ARC too large. Default ZFS eats half your RAM; on a 64 GB PBS that leaves 32 GB for dedup tables, verify threads, and everything else. Cap ARC explicitly.
  • Forgetting the encryption key backup. A stolen PBS disk is harmless if client-side encryption is on and the key is safe. A lost key is just as catastrophic. Print the paperkey. Put it in a safe.
  • Pruning without verifying first. A prune on top of corrupt chunks silently drops your only good copy of the parent snapshot. Verify first, then prune.
  • Shared VLAN for backups and production. A single misconfigured firewall rule sends your nightly 40 GB full across the wrong link at 9 a.m. Dedicate the VLAN.
  • No off-site. Ransomware that compromises PVE will compromise PBS too if they share credentials. Your sync target must run on separate credentials and, ideally, a separate admin boundary.

How Petronella deploys and manages PBS

Petronella Technology Group has been building on Proxmox since the 2.x days. We run managed Proxmox clusters as the foundation for our internal AI workloads and for clients who need private virtualization under CMMC, HIPAA, or contractual data-sovereignty constraints. Every cluster we stand up includes a PBS target built against this guide, client-side encrypted, with verify and sync jobs scheduled before the first production VM ships.

A typical Petronella managed Proxmox engagement includes:

  • Hardware sizing and procurement against the ratios in the Planning section above, with spare-disk inventory and a tested hot-swap runbook.
  • ZFS datastore design with ashift, special device, ARC cap, and recordsize pre-tuned to the workload mix, not left on defaults.
  • A real 3-2-1 backup topology: hot PBS on the production VLAN, cold PBS in a geographically separate building, plus cold-storage copies on a separate administrative boundary so a single credential compromise cannot take out every copy.
  • Disaster-recovery design and rehearsals: quarterly restore drills with timed results, written postmortems, and targeted improvements. If RTO drifts, we find out before the incident, not during.
  • Client-side encryption key escrow with paperkey copies in a physical safe and master-key separation for regulated clients.
  • Forensics-grade chain of custody on every drive that enters a managed datastore. Each disk is serial-logged, SMART-baselined, and entered into an inventory before it is racked. Craig Petronella is a Digital Forensics Examiner (DFE #604180), which means the default documentation standard for Petronella backup infrastructure is the standard a forensic chain of custody has to meet, not the standard an MSP happens to hit.
  • Integration with the broader Petronella stack: the same Proxmox clusters that back up to these PBS targets also host our private AI cluster for regulated clients, run our ten-plus production AI agents (Penny on (919) 348-4912 for inbound sales and scheduling, Peter the site chat agent, ComplyBot for compliance Q&A, and the rest of the fleet), and feed our 24/7 AI plus human threat analysis pipeline.

Credentials behind the work

  • CMMC-AB Registered Provider Organization #1449, verifiable at the official CyberAB registry: https://cyberab.org/Member/RPO-1449-Petronella-Cybersecurity-And-Digital-Forensics. When a CMMC assessor walks through your backup and DR process, the documentation we produce was built to that assessor's checklist.
  • Full team CMMC-RP certified. Not one lead, the whole engagement team.
  • Founder credentials: Craig Petronella, DFE #604180 (Digital Forensics Examiner), CCNA, CWNE. Forensics, network engineering, and wireless engineering in one head means the person designing your backup chain of custody, the VLAN that carries your backup traffic, and the wireless separation around your admin jump host is the same person. That keeps the design coherent.
  • Founded 2002. 23 years of continuously operating IT and cybersecurity practice.
  • BBB A+ accredited since 2003. Published track record, not a LinkedIn bio.
  • Based in Raleigh, North Carolina. On-site capable across the Triangle and eastern North Carolina for rack-and-stack, drive swaps, and tabletop DR rehearsals.

If you want this built for you, or you want a second set of eyes on a PBS deployment that already exists, our team handles both. See our managed IT services for ongoing Proxmox and PBS operations, our private AI cluster solution if you are pairing PBS with regulated AI workloads, and our cybersecurity practice for the hardening and compliance side that surrounds a backup stack. Call Penny at (919) 348-4912 to book a free fifteen-minute assessment, or reach out through our contact form and we will route it to the right engineer the same business day.

Wrap

PBS is one of the highest-value pieces of infrastructure in a virtualized shop. The install is an afternoon. The operations are boring by design. The payoff shows up exactly once, on the day the primary storage array dies or the ransomware hits, and on that day the difference between a good PBS deploy and a lazy one is measured in whether your business keeps running.

Build the box on the right hardware. Put the datastore on ZFS with a special device. Turn on client-side encryption before the first backup runs. Schedule GC, prune, and verify during quiet hours. Replicate off-site. Restore a VM every quarter to prove the whole chain works. Do those six things and you will join the small club of admins who actually have backups, not just a backup job.

Need help implementing these strategies? Our cybersecurity experts can assess your environment and build a tailored plan.
Get Free Assessment

About the Author

Craig Petronella, CEO and Founder of Petronella Technology Group
CEO, Founder & AI Architect, Petronella Technology Group

Craig Petronella founded Petronella Technology Group in 2002 and has spent more than 30 years working at the intersection of cybersecurity, AI, compliance, and digital forensics. He holds the CMMC Registered Practitioner credential (RP-1372) issued by the Cyber AB, is an NC Licensed Digital Forensics Examiner (License #604180-DFE), and completed MIT Professional Education programs in AI, Blockchain, and Cybersecurity. Craig also holds CompTIA Security+, CCNA, and Hyperledger certifications.

He is an Amazon #1 Best-Selling Author of 15+ books on cybersecurity and compliance, host of the Encrypted Ambition podcast (95+ episodes on Apple Podcasts, Spotify, and Amazon), and a cybersecurity keynote speaker with 200+ engagements at conferences, law firms, and corporate boardrooms. Craig serves as Contributing Editor for Cybersecurity at NC Triangle Attorney at Law Magazine and is a guest lecturer at NCCU School of Law. He has served as a digital forensics expert witness in federal and state court cases involving cybercrime, cryptocurrency fraud, SIM-swap attacks, and data breaches.

Under his leadership, Petronella Technology Group has served 2,500+ clients, maintained a zero-breach record among compliant clients, earned a BBB A+ rating every year since 2003, and been featured as a cybersecurity authority on CBS, ABC, NBC, FOX, and WRAL. The company leverages SOC 2 Type II certified platforms and specializes in AI implementation, managed cybersecurity, CMMC/HIPAA/SOC 2 compliance, and digital forensics for businesses across the United States.
