ostebovik.net / resources AZ-104 · Interview Reference

Backup & Recovery — Core Concepts

RPO vs RTO

Two numbers every infrastructure conversation eventually reaches. RPO tells you how much data you can afford to lose. RTO tells you how long you can afford to be down. The business defines them — you design and price the solution that meets them.

The definitions

RPO

Recovery Point Objective

"How much data can we afford to lose?"

Expressed as a time window — the maximum age of data you would restore from. An RPO of 24 hours means you accept losing up to 24 hours of changes in a disaster. Controlled by backup frequency.

RTO

Recovery Time Objective

"How long can the system be down?"

Expressed as a duration — the maximum acceptable outage from failure to restoration. An RTO of 4 hours means the business can survive a 4-hour outage before impact becomes unacceptable. Controlled by recovery infrastructure.

Visualized on a timeline

RPO = data loss window · RTO = recovery duration

Last backup → Disaster → Restored

RPO: the gap from last backup to disaster. Data in this gap is lost.

RTO: the gap from disaster to restored. The system is down for this duration.
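The timeline reduces to two subtractions. The sketch below measures one incident against its targets; the function names and the incident timestamps are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timedelta

def measure_incident(last_backup: datetime, disaster: datetime, restored: datetime):
    """Return (data_loss_window, downtime) for a single incident."""
    data_loss = disaster - last_backup  # compared against the RPO
    downtime = restored - disaster      # compared against the RTO
    return data_loss, downtime

def meets_objectives(data_loss: timedelta, downtime: timedelta,
                     rpo: timedelta, rto: timedelta) -> bool:
    """True only if both the data loss and the outage stayed within target."""
    return data_loss <= rpo and downtime <= rto

# Hypothetical incident: backup at 23:00, failure the next afternoon.
last_backup = datetime(2024, 1, 10, 23, 0)
disaster = datetime(2024, 1, 11, 15, 30)
restored = datetime(2024, 1, 11, 18, 0)

data_loss, downtime = measure_incident(last_backup, disaster, restored)
print(data_loss)  # 16:30:00
print(downtime)   # 2:30:00
print(meets_objectives(data_loss, downtime,
                       rpo=timedelta(hours=24), rto=timedelta(hours=4)))  # True
```

The same incident fails a 2-hour RTO even though it meets the 24-hour RPO, which is why the two targets are checked separately.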

Why this matters

Making risk tolerance measurable

Anyone can say "we need good backups." RPO and RTO make that statement measurable — and measurable means you can design for it, price it, and hold someone accountable to it.

RPO and RTO are independent dimensions. A financial trading system might need an RPO of seconds and an RTO of minutes — every transaction matters and downtime costs millions per hour. A dev environment might accept an RPO of 24 hours and an RTO of 4 hours — losing a day of work is annoying, not catastrophic.

The business defines acceptable values based on the cost of downtime versus the cost of protection. The infrastructure team designs the solution that meets those targets within budget.
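One rough way to frame the cost-of-downtime versus cost-of-protection comparison is as expected annual loss. This is a minimal sketch, not a costing method from this reference: the incident rate, outage lengths, hourly cost, and VM count are all hypothetical, and only the roughly $25 per VM per month figure echoes the Site Recovery estimate used elsewhere on this page.

```python
def expected_annual_loss(incidents_per_year: float,
                         downtime_hours: float,
                         cost_per_hour: float) -> float:
    """Expected yearly cost of outages at a given recovery speed."""
    return incidents_per_year * downtime_hours * cost_per_hour

def protection_pays_off(loss_without: float, loss_with: float,
                        annual_protection_cost: float) -> bool:
    """True if the avoided loss exceeds what the protection costs per year."""
    return (loss_without - loss_with) > annual_protection_cost

# Hypothetical workload: one incident every two years, $10,000 per hour of downtime.
loss_backup_only = expected_annual_loss(0.5, 8, 10_000)         # assumed 8-hour restore
loss_with_replication = expected_annual_loss(0.5, 0.5, 10_000)  # assumed 30-minute failover
replication_cost = 10 * 25 * 12  # 10 VMs at about $25 per VM per month

print(loss_backup_only - loss_with_replication)  # 37500.0
print(protection_pays_off(loss_backup_only, loss_with_replication,
                          replication_cost))     # True
```

With a lower hourly cost of downtime the same comparison flips, which is the case for a dev environment.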

Azure recovery mechanisms

Mechanism	Approach	RPO	RTO	Cost	Use case
Azure Backup	VM restore from vault	Hours	30 min – hours	Low	Deletion, corruption, ransomware
Azure Site Recovery	Continuous replication	~30 seconds	Minutes	~$25+/VM/mo	Regional outage, DR compliance
Availability Zones	Multi-zone redundancy	~Zero	Seconds	2–3× VM count	Hardware failure, zone outage
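The three mechanisms above can be treated as a lookup: pick the cheapest option whose RPO and RTO both fit the target. The sketch below is illustrative only. The page gives qualitative values ("Hours", "Minutes", "Seconds"), so the concrete durations and the cost ordering here are assumptions, and the selection ignores the fact that each mechanism protects against a different failure mode.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

@dataclass(frozen=True)
class Mechanism:
    name: str
    rpo: timedelta   # assumed worst-case data loss
    rto: timedelta   # assumed worst-case recovery time
    cost_rank: int   # 1 = cheapest (assumed ordering)

MECHANISMS = [
    Mechanism("Azure Backup", timedelta(hours=24), timedelta(hours=4), 1),
    Mechanism("Azure Site Recovery", timedelta(seconds=30), timedelta(minutes=15), 2),
    Mechanism("Availability Zones", timedelta(0), timedelta(seconds=60), 3),
]

def cheapest_fit(rpo_target: timedelta, rto_target: timedelta) -> Optional[Mechanism]:
    """Return the lowest-cost mechanism meeting both targets, or None."""
    candidates = [m for m in MECHANISMS
                  if m.rpo <= rpo_target and m.rto <= rto_target]
    return min(candidates, key=lambda m: m.cost_rank, default=None)

# Dev environment: RPO 24 hours, RTO 4 hours.
print(cheapest_fit(timedelta(hours=24), timedelta(hours=4)).name)     # Azure Backup
# Line-of-business app: RPO 1 minute, RTO 30 minutes.
print(cheapest_fit(timedelta(minutes=1), timedelta(minutes=30)).name) # Azure Site Recovery
# Trading system: RPO 5 seconds, RTO 5 minutes.
print(cheapest_fit(timedelta(seconds=5), timedelta(minutes=5)).name)  # Availability Zones
```

In practice these are often layered rather than chosen exclusively, since zone redundancy does not protect against deletion or ransomware.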

How backup policy sets RPO

Frequency is the lever

The backup schedule directly determines your RPO ceiling. A daily backup at 11pm UTC means an RPO of up to 23 hours 59 minutes — a failure at 10:59pm means restoring from yesterday.

Retention tiers (7-day daily, 4-week weekly, 3-month monthly) do not affect RPO. They control how far back you can go when choosing a restore point. RPO is set by frequency. Retention is set by compliance and business requirements.
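The 23 hours 59 minutes figure can be checked directly. This sketch assumes a single daily backup that completes instantly at the scheduled hour; the function names are hypothetical.

```python
from datetime import datetime, timedelta

def last_backup_before(failure: datetime, backup_hour: int = 23) -> datetime:
    """Most recent daily backup taken at backup_hour:00, at or before the failure."""
    candidate = failure.replace(hour=backup_hour, minute=0, second=0, microsecond=0)
    if candidate > failure:
        # Today's backup has not run yet, so fall back to yesterday's.
        candidate -= timedelta(days=1)
    return candidate

def data_loss(failure: datetime, backup_hour: int = 23) -> timedelta:
    """Changes lost if the system fails at `failure` and restores the last backup."""
    return failure - last_backup_before(failure, backup_hour)

# Failure one minute before the nightly backup: nearly a full day is lost.
print(data_loss(datetime(2024, 1, 11, 22, 59)))  # 23:59:00
# Failure one minute after the nightly backup: almost nothing is lost.
print(data_loss(datetime(2024, 1, 11, 23, 1)))   # 0:01:00
```

Retention settings never appear in the calculation; only the schedule does.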

Project 10 policy — pol-daily-7day

Setting	Value	What it means
Schedule	Daily · 11:00 PM UTC	RPO ceiling of ~24 hours
Daily retention	7 days	Restore to any point in the last week
Weekly retention	4 weeks	Restore to start of any week in last month
Monthly retention	3 months	Restore to start of any month in last quarter
Vault redundancy	LRS	Survives disk, server, and rack failure within a single datacenter. Does not survive a datacenter or regional outage; use GRS for production
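To see what the retention tiers buy, the sketch below lists which restore points would still exist on a given day. It is a simplified model, not Azure Backup's actual pruning logic: it assumes the weekly point is the Sunday backup and the monthly point is the backup from the first of the month, neither of which is specified in the policy table.

```python
from datetime import date, timedelta

def retained_points(today: date, daily: int = 7, weekly: int = 4,
                    monthly: int = 3, weekly_day: int = 6) -> list:
    """Dates whose backups are still kept on `today` (weekly_day 6 = Sunday)."""
    keep = set()
    # Daily tier: the last `daily` days, including today.
    for n in range(daily):
        keep.add(today - timedelta(days=n))
    # Weekly tier: the most recent `weekly` occurrences of weekly_day.
    d, found = today, 0
    while found < weekly:
        if d.weekday() == weekly_day:
            keep.add(d)
            found += 1
        d -= timedelta(days=1)
    # Monthly tier: the first day of each of the last `monthly` months.
    year, month = today.year, today.month
    for _ in range(monthly):
        keep.add(date(year, month, 1))
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return sorted(keep)

points = retained_points(date(2024, 3, 20))
print(len(points))  # 13
print(points[0])    # 2024-01-01
print(points[-1])   # 2024-03-20
```

The tiers extend how far back a restore can reach, while the newest point is still at most one day old, so the RPO ceiling stays at about 24 hours.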

Interview answer — three parts:

1. Definitions: RPO is how much data you can afford to lose, and backup frequency determines whether you meet it. RTO is how long the system can be down, and recovery infrastructure determines whether you meet it.

2. The trade-off: Lower RPO and RTO both cost more. The business defines acceptable values based on the cost of downtime vs the cost of protection.

3. Azure mapping: Azure Backup daily schedule sets RPO. Vault restore time sets RTO baseline. Azure Site Recovery pushes RPO to seconds and RTO to minutes — at significantly higher cost. Right answer depends on workload criticality.

Quick reference

RPO

Data loss window
Set by backup frequency

RTO

Recovery duration
Set by recovery infrastructure

Lower RPO

More frequent backups → more storage cost

Lower RTO

Faster recovery tooling → more infrastructure cost

Production minimum

GRS vault + daily backup + tested restore

Production DR

ASR for critical workloads only, where the cost of downtime justifies the per-VM spend