Backup & Recovery — Core Concepts
RPO vs RTO
Two numbers every infrastructure conversation eventually reaches. RPO tells you how much data you can afford to lose. RTO tells you how long you can afford to be down. The business defines them — you design and price the solution that meets them.
The definitions
RPO
Recovery Point Objective
"How much data can we afford to lose?"
Expressed as a time window — the maximum age of data you would restore from. An RPO of 24 hours means you accept losing up to 24 hours of changes in a disaster. Controlled by backup frequency.
RTO
Recovery Time Objective
"How long can the system be down?"
Expressed as a duration — the maximum acceptable outage from failure to restoration. An RTO of 4 hours means the business can survive a 4-hour outage before impact becomes unacceptable. Controlled by recovery infrastructure.
Visualized on a timeline
RPO = data loss window · RTO = recovery duration
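Last backup → Disaster → Restored
RPO: the gap between the last backup and the disaster; data written in this gap is lost
RTO: the gap between the disaster and restoration; the system is down for this duration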
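The timeline arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring tool: the function names and the example timestamps are hypothetical, chosen only to show how the two windows are measured against their objectives.

```python
from datetime import datetime, timedelta

def measure_incident(last_backup, disaster, restored):
    """Return (data_loss_window, downtime) for a single incident."""
    if not (last_backup <= disaster <= restored):
        raise ValueError("expected last_backup <= disaster <= restored")
    # Data loss window: everything written after the last backup is gone.
    # Downtime: how long the system stayed unavailable.
    return disaster - last_backup, restored - disaster

def meets_objectives(data_loss, downtime, rpo, rto):
    """True when the incident stayed within both objectives."""
    return data_loss <= rpo and downtime <= rto

# Hypothetical incident: nightly backup, mid-afternoon failure.
last_backup = datetime(2024, 1, 1, 23, 0)
disaster = datetime(2024, 1, 2, 14, 30)
restored = datetime(2024, 1, 2, 17, 0)

data_loss, downtime = measure_incident(last_backup, disaster, restored)
print(data_loss)  # 15:30:00
print(downtime)   # 2:30:00
print(meets_objectives(data_loss, downtime,
                       rpo=timedelta(hours=24), rto=timedelta(hours=4)))  # True
```

The same incident would fail an RPO of 1 hour even though the recovery itself was fast, which is why the two numbers are tracked separately.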
Why this matters
Making risk tolerance measurable
Anyone can say "we need good backups." RPO and RTO make that statement measurable — and measurable means you can design for it, price it, and hold someone accountable to it.
RPO and RTO are independent dimensions, so each is set on its own. A financial trading system might need an RPO of seconds and an RTO of minutes: every transaction matters and downtime costs millions per hour. A dev environment might accept an RPO of 24 hours and an RTO of 4 hours: losing a day of work is annoying, not catastrophic. The two can also diverge: a records archive might tolerate days of downtime but almost no data loss.
The business defines acceptable values based on the cost of downtime versus the cost of protection. The infrastructure team designs the solution that meets those targets within budget.
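One way to make the cost comparison concrete is to add the annual cost of protection to the expected annual loss from incidents. The sketch below is a deliberately simple model. Every figure in it is a made-up placeholder, not a real price or failure rate, and the function names are hypothetical.

```python
def expected_annual_loss(incidents_per_year, downtime_hours, downtime_cost_per_hour,
                         data_loss_hours, data_loss_cost_per_hour):
    """Expected yearly loss: each incident costs downtime plus redone work."""
    per_incident = (downtime_hours * downtime_cost_per_hour
                    + data_loss_hours * data_loss_cost_per_hour)
    return incidents_per_year * per_incident

def total_annual_cost(protection_cost, **loss_inputs):
    """What the business actually pays: protection plus expected loss."""
    return protection_cost + expected_annual_loss(**loss_inputs)

# Placeholder inputs for one workload (all values are assumptions).
daily_backup = total_annual_cost(
    protection_cost=600,
    incidents_per_year=1, downtime_hours=4, downtime_cost_per_hour=1000,
    data_loss_hours=24, data_loss_cost_per_hour=200,
)
replication = total_annual_cost(
    protection_cost=3000,
    incidents_per_year=1, downtime_hours=1, downtime_cost_per_hour=1000,
    data_loss_hours=0, data_loss_cost_per_hour=200,
)
print(daily_backup)  # 9400
print(replication)   # 4000
```

With these placeholder inputs the more expensive protection is cheaper overall. Lower the hourly cost of downtime far enough and the result flips, which is the decision the business is making when it sets RPO and RTO.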
Azure recovery mechanisms
Azure Backup
VM restore from vault
RPO: Hours
RTO: 30 min – hours
Cost: Low
Use case: Deletion, corruption, ransomware
Azure Site Recovery
Continuous replication
RPO: ~30 seconds
RTO: Minutes
Cost: ~$25+/VM/mo
Use case: Regional outage, DR compliance
Availability Zones
Multi-zone redundancy
RPO: ~Zero
RTO: Seconds
Cost: 2–3× VM count
Use case: Hardware failure, zone outage
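The three cards can be read as a lookup: which mechanism covers a given failure type and still meets the targets? The Python sketch below encodes that reading. The failure types come from the cards, but the numeric RPO and RTO values are rough assumptions picked to stand in for "hours", "minutes", and "seconds"; they are not Azure guarantees, and the `Mechanism` class and `candidates` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass(frozen=True)
class Mechanism:
    name: str
    rpo: timedelta      # assumed achievable data loss window
    rto: timedelta      # assumed achievable recovery time
    covers: frozenset   # failure types listed on the card

# Numeric values are illustrative stand-ins for the cards' rough ranges.
MECHANISMS = [
    Mechanism("Azure Backup", timedelta(hours=24), timedelta(hours=4),
              frozenset({"deletion", "corruption", "ransomware"})),
    Mechanism("Azure Site Recovery", timedelta(seconds=30), timedelta(minutes=15),
              frozenset({"regional outage"})),
    Mechanism("Availability Zones", timedelta(0), timedelta(seconds=30),
              frozenset({"hardware failure", "zone outage"})),
]

def candidates(threat, rpo_target, rto_target):
    """Names of mechanisms that cover the threat and meet both targets."""
    return [m.name for m in MECHANISMS
            if threat in m.covers and m.rpo <= rpo_target and m.rto <= rto_target]

print(candidates("ransomware", timedelta(hours=24), timedelta(hours=8)))
# ['Azure Backup']
print(candidates("ransomware", timedelta(hours=1), timedelta(hours=1)))
# []
print(candidates("regional outage", timedelta(minutes=5), timedelta(hours=1)))
# ['Azure Site Recovery']
```

The empty result in the second call is the useful part: the mechanisms protect against different failures, so a tighter target for ransomware recovery is met by more frequent backups, not by switching to replication.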
How backup policy sets RPO
Frequency is the lever
The backup schedule directly determines your RPO ceiling. With a daily backup at 11:00 PM UTC, a failure at 10:59 PM means restoring from the previous night's backup and losing 23 hours 59 minutes of changes, so the RPO ceiling is roughly 24 hours.
Retention tiers (7-day daily, 4-week weekly, 3-month monthly) do not affect RPO. They control how far back you can go when choosing a restore point. RPO is set by frequency. Retention is set by compliance and business requirements.
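The split between frequency and retention can be checked with a short Python sketch. It assumes a simplified model in which a backup is usable from the moment it is scheduled; the helper names and dates are hypothetical.

```python
from datetime import datetime, timedelta

def backup_times(first, interval, count):
    """Scheduled backup times, starting at `first`, spaced by `interval`."""
    return [first + i * interval for i in range(count)]

def data_loss(failure, backups):
    """Time between the newest backup before the failure and the failure."""
    usable = [b for b in backups if b <= failure]
    if not usable:
        raise ValueError("no backup exists before the failure")
    return failure - max(usable)

first = datetime(2024, 1, 1, 23, 0)
failure = datetime(2024, 1, 7, 22, 59)   # one minute before the nightly run

daily = backup_times(first, timedelta(days=1), 7)
print(data_loss(failure, daily))        # 23:59:00

# Shorter retention: keep only the two newest points. Data loss is unchanged.
print(data_loss(failure, daily[-2:]))   # 23:59:00

# Higher frequency: a backup every 4 hours shrinks the window.
four_hourly = backup_times(first, timedelta(hours=4), 42)
print(data_loss(failure, four_hourly))  # 3:59:00
```

Trimming the list of retained points leaves the data loss window untouched, while changing the interval moves it directly.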
Project 10 policy — pol-daily-7day
| Setting | Value | What it means |
| --- | --- | --- |
| Schedule | Daily · 11:00 PM UTC | RPO ceiling of ~24 hours |
| Daily retention | 7 days | Restore to any daily backup from the last week |
| Weekly retention | 4 weeks | Restore to the start of any week in the last month |
| Monthly retention | 3 months | Restore to the start of any month in the last quarter |
| Vault redundancy | LRS | Survives disk, node, and rack failure within a single datacenter. Does not survive a datacenter or regional outage; use GRS for production |
Interview answer — three parts:
1. Definitions: RPO is how much data you can afford to lose, set by backup frequency. RTO is how long the system can be down, set by recovery infrastructure.
2. The trade-off: Lower RPO and RTO both cost more. The business defines acceptable values based on the cost of downtime vs the cost of protection.
3. Azure mapping: The Azure Backup daily schedule sets RPO. Vault restore time sets the RTO baseline. Azure Site Recovery pushes RPO to seconds and RTO to minutes, at significantly higher cost. The right answer depends on workload criticality.
Quick reference
RPO
Data loss window
Set by backup frequency
RTO
Recovery duration
Set by recovery infrastructure
Lower RPO
More frequent backups → more storage cost
Lower RTO
Faster recovery tooling → more infrastructure cost
Production minimum
GRS vault + daily backup + tested restore
Production DR
ASR for critical workloads only; its cost is justified only where downtime is expensive