DORA Metrics
DORA Metrics are four outcome measures used to understand software delivery performance and operational stability: deployment frequency, lead time for changes, change failure rate, and time to restore service. They help teams reason about system capability and improvement over time, not individual productivity. Key elements: consistent definitions, trend analysis, segmentation by service or value stream, and pairing the metrics with qualitative context from incidents and delivery data. Used well, DORA Metrics balance speed with reliability and guide improvement experiments.
Overview of DORA Metrics
DORA Metrics are outcome measures used to understand software delivery performance and operational stability. In the classic four-key model, they describe what the delivery system produces: how frequently changes are deployed to production, how quickly changes flow from commit to production, how often deployments require immediate intervention because they caused production problems, and how quickly service is restored when a deployment causes impairment. Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Time to Restore Service provide a balanced view of throughput and stability, helping teams see flow constraints and resilience gaps over time.
DORA Metrics are most useful when treated as signals for learning and system improvement, not targets for individuals or teams. They support empiricism: make delivery performance transparent, inspect trends and constraints, and adapt by running focused experiments. Used well, they reinforce small batches, fast feedback, and continuous improvement of the system of work, while keeping attention on customer outcomes and operational risk rather than on “hitting a tier.”
The original four DORA metrics are still widely used. Current DORA guidance has evolved into a five-metric model of software delivery performance by adding deployment rework rate and refining time to restore service into failed deployment recovery time. The classic four metrics remain a practical and recognizable starting point when teams make their definitions explicit and apply them consistently.
The Four Key DORA Metrics
In the classic four-key model, DORA Metrics are commonly defined as the following four measures. Consistency of definition matters more than absolute numbers, because trends and comparisons require stable measurement rules.
- Deployment Frequency (DF) - How often changes are deployed to production, indicating batch size and feedback speed.
- Lead Time for Changes (LT) - Time from code committed to version control to successfully running in production, showing queues, handoffs, and pipeline efficiency.
- Change Failure Rate (CFR) - Percentage of deployments that require immediate intervention such as rollback, hotfix, patch, or other urgent remediation because the deployment caused degraded service or failure in production.
- Time to Restore Service (historically MTTR) - In the classic model, time to recover service after a change-related incident. In current DORA guidance, this is refined into failed deployment recovery time: time to recover when a deployment causes impairment and requires immediate intervention.
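As an illustration, the four classic metrics can be computed directly from deployment records once the definitions above are fixed. The field names and sample data below are assumptions for the sketch, not a standard schema; real pipelines would pull these records from CI/CD and incident tooling.

```python
from datetime import datetime

# Hypothetical deployment records: commit time, production deploy time, and
# whether the deployment caused a failure needing immediate intervention.
deployments = [
    {"committed": datetime(2024, 5, 1, 9), "deployed": datetime(2024, 5, 1, 15),
     "failed": False},
    {"committed": datetime(2024, 5, 2, 10), "deployed": datetime(2024, 5, 3, 11),
     "failed": True, "restored": datetime(2024, 5, 3, 13)},
    {"committed": datetime(2024, 5, 6, 8), "deployed": datetime(2024, 5, 6, 12),
     "failed": False},
]

days_observed = 7  # length of the observation window, an assumption here

# Deployment Frequency: deployments per day over the window.
df = len(deployments) / days_observed

# Lead Time for Changes: median hours from commit to running in production.
lead_times = sorted(
    (d["deployed"] - d["committed"]).total_seconds() / 3600 for d in deployments
)
lt_median = lead_times[len(lead_times) // 2]

# Change Failure Rate: share of deployments that caused a production failure.
cfr = sum(d["failed"] for d in deployments) / len(deployments)

# Time to Restore Service: mean hours from deployment-caused failure to recovery.
restores = [(d["restored"] - d["deployed"]).total_seconds() / 3600
            for d in deployments if d["failed"]]
ttr = sum(restores) / len(restores) if restores else None

print(f"DF={df:.2f}/day, LT={lt_median:.1f}h, CFR={cfr:.0%}, TTR={ttr:.1f}h")
```

Whether to use the median or mean for lead time, and how long the observation window runs, are exactly the kinds of measurement rules the team should agree on up front so that trends stay comparable.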
In the classic four-key framing, DORA Metrics are often grouped into two categories:
- Throughput - Deployment frequency and lead time for changes reflect how quickly changes move through the system.
- Stability - Change failure rate and time to restore service reflect reliability of change and recovery capability.
In current DORA guidance, the grouping is slightly different: throughput includes change lead time, deployment frequency, and failed deployment recovery time, while instability is represented by change fail rate and deployment rework rate.
High performance is not “more speed at any cost.” It is the ability to deliver valuable change frequently while keeping failures rare and recovery fast.
Using DORA Metrics to improve a delivery system
DORA Metrics work best as a learning loop: observe current performance, identify the constraint, run a small experiment, and check whether outcomes improved. The goal is to increase delivery capability and reduce operational risk, not to “optimize the dashboard.”
- Define scope - Measure at application, service, product, or value stream boundaries that match how work flows and incidents occur.
- Make definitions explicit - Agree what counts as a deployment, a failed deployment, rework, and “restore,” so trends are trustworthy.
- Automate data collection - Pull from CI/CD, version control, and incident tooling to avoid manual reporting bias.
- Inspect trends - Track movement over time and look for step-changes after interventions, not single snapshots.
- Segment deliberately - Split by service criticality, architecture, or value stream so constraints are visible.
- Pair with qualitative context - Use incident reviews and customer signals to explain causes, not just symptoms.
- Adapt with experiments - Treat improvement ideas as hypotheses and keep the smallest change that tests them.
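The "inspect trends" and "segment deliberately" steps above amount to bucketing the raw event stream by service and by time window, rather than reading a single aggregate number. A minimal sketch, with an assumed deployment log of (service, date) pairs:

```python
from collections import defaultdict
from datetime import date

# Hypothetical deployment log: (service, deploy date). Segmenting by service
# and bucketing by ISO week makes trends visible instead of single snapshots.
log = [
    ("checkout", date(2024, 5, 6)), ("checkout", date(2024, 5, 8)),
    ("checkout", date(2024, 5, 14)), ("search", date(2024, 5, 7)),
    ("search", date(2024, 5, 15)), ("search", date(2024, 5, 16)),
]

weekly = defaultdict(int)
for service, day in log:
    _, week, _ = day.isocalendar()  # ISO calendar week number
    weekly[(service, week)] += 1

for (service, week), count in sorted(weekly.items()):
    print(f"{service} week {week}: {count} deployments")
```

The same bucketing approach works for the other metrics; the point is that a step-change in one service's weekly numbers after an intervention is far more informative than a blended average across unlike systems.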
Performance Benchmarks
Benchmarks can provide orientation, but they are not the purpose. The most useful benchmark is your own trend in your own context, especially when constraints and system design are changing.
- Use benchmarks as a reference - Start conversations about capability and constraints without turning tiers into targets.
- Avoid tier-chasing - Optimizing for labels can create superficial change and metric gaming.
- Compare like with like - Only compare systems with similar risk profiles, release models, service criticality, and constraints.
DORA Metrics improvement levers
DORA Metrics improve when teams reduce batch size, remove queues, strengthen verification, and improve recovery. Pick levers based on the constraint you observe.
- Limit work in progress - Reduce queues and multitasking to improve lead time and predictability.
- Make changes smaller - Reduce blast radius and speed diagnosis, improving both failure rate and restore time.
- Increase verification quality - Shift learning earlier with reliable automated tests and clear acceptance criteria.
- Use safer release patterns - Progressive delivery, feature flags, and quick rollback reduce customer impact.
- Improve observability - Better signals reduce time to detect and time to restore service.
- Reduce friction in the pipeline - Remove manual steps and reduce wait states so feedback arrives quickly.
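The "safer release patterns" lever can be as simple as gating new code paths behind a flag so that turning the flag off acts as an instant rollback. The in-memory flag store and pricing function below are stand-ins invented for this sketch, not any particular flag service's API:

```python
# Minimal feature-flag sketch: FLAGS is an in-memory stand-in for a real flag
# service. Flipping a flag off restores the prior behavior without a redeploy,
# shrinking blast radius and time to restore.
FLAGS = {"new_pricing_engine": True}

def price(order_total: float) -> float:
    if FLAGS.get("new_pricing_engine", False):
        return round(order_total * 0.95, 2)  # new code path behind the flag
    return order_total  # stable fallback path

print(price(100.0))                   # flag on: new behavior (95.0)
FLAGS["new_pricing_engine"] = False   # "rollback" without deploying
print(price(100.0))                   # flag off: prior behavior (100.0)
```

Because the old path stays deployed alongside the new one, recovery becomes a configuration change measured in seconds rather than a redeployment measured in minutes or hours.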
Interpreting DORA Metrics responsibly
DORA Metrics are outcome measures, but they require interpretation. A rising deployment frequency is not “good” if it comes with increased incidents or customer disruption. A low deployment frequency may be appropriate in high-risk contexts, provided stability and recovery are strong and the trade-off is explicit.
Responsible use includes clear boundaries, accounting for different release models, and avoiding simplistic comparisons across teams with different constraints. These metrics are most meaningful at the application or service level; aggregating unlike systems can hide real bottlenecks. When metrics worsen, treat it as data about the system and its bottlenecks, not as evidence that people are “underperforming.”
Related measures that complement DORA Metrics
DORA Metrics are a strong core, but complementary measures can help explain constraints and guide experiments. If you are using the classic four-key model, these measures add context. If you are using current DORA guidance, deployment rework rate belongs in the core five-metric model rather than only as a supplement.
- Pipeline cycle time - Time in build/test stages, indicating automation performance and queue bottlenecks.
- Defect escape rate - Issues found after release, indicating verification or discovery gaps.
- Error budgets - A reliability boundary that helps decide when to prioritize stability work over feature work.
- Reliability measures (SLIs/SLOs) - Useful alongside DORA Metrics to show whether the service meets user-facing reliability expectations.
- Deployment rework rate - In current DORA guidance, this is a core metric showing the percentage of deployments that are unplanned work to fix production issues.
Current DORA guidance also refines recovery measurement from time to restore service toward failed deployment recovery time, which focuses specifically on restoring service after a deployment causes impairment, not after unrelated outages.
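Deployment rework rate is straightforward to compute once each deployment is labeled as planned work or an unplanned fix. The labeling field below is an assumption for the sketch; in practice it would come from how the team tags deployments in its tooling:

```python
# Hypothetical deployment log with an "unplanned_fix" flag marking deployments
# whose purpose was to remediate a problem a prior deployment caused in
# production.
deployments = [
    {"id": 1, "unplanned_fix": False},
    {"id": 2, "unplanned_fix": True},
    {"id": 3, "unplanned_fix": False},
    {"id": 4, "unplanned_fix": False},
]

rework_rate = sum(d["unplanned_fix"] for d in deployments) / len(deployments)
print(f"deployment rework rate: {rework_rate:.0%}")
```

As with the other metrics, the hard part is not the arithmetic but agreeing on the labeling rule: what counts as an unplanned fix must be defined once and applied consistently.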
Improving DORA Metrics
- Deployment frequency - Reduce batch size and release friction so valuable change reaches users more often with less risk.
- Lead time for changes - Remove queues and handoffs, improve CI performance, and make environments reproducible.
- Change failure rate - Strengthen verification and prevent repeat failures through learning from incidents.
- Time to restore service / failed deployment recovery time - Improve detection, runbooks, rollback paths, and on-call readiness to recover fast when deployments cause impairment.
Common misuses and guardrails
DORA Metrics are often misused as targets or ranking mechanisms. That drives gaming and reduces transparency, which harms improvement.
- Using metrics to judge individuals - Looks like performance management by dashboard; it hurts because people game and hide problems; do instead: measure systems and service-level outcomes and improve constraints, tooling, and ways of working.
- Gaming the metrics - Looks like artificial deployments or splitting changes to inflate frequency; it hurts because it creates noise without improving outcomes; do instead: tie metric movement to customer impact, incident learning, and actual flow improvement.
- Mixing deployment and release definitions - Looks like counting production deployments in one period and user-visible releases in another; it hurts because trends stop meaning the same thing; do instead: define the event boundary once and keep it stable.
- Lack of context - Looks like reacting to a number without understanding causes; it hurts because fixes target symptoms; do instead: pair trends with incident reviews, delivery notes, and qualitative signals.
- Overemphasis on benchmarks - Looks like chasing tiers as goals; it hurts because it encourages shallow compliance; do instead: use benchmarks only as rough reference and set aims based on constraints and outcomes.
- Optimizing speed without stability - Looks like pushing throughput while failures rise; it hurts because trust and resilience degrade; do instead: improve throughput and stability together using safer release and recovery practices.
- Comparing unlike systems - Looks like ranking very different applications, architectures, or risk profiles against each other; it hurts because it hides local constraints and drives unfair conclusions; do instead: compare within similar service boundaries and use trends over time.
DORA Metrics are four measures of delivery speed and stability used to understand how reliably a team can ship changes and recover from failure over time.

