Leadership Archives - Site Reliability Engineer Blog

As a DevOps manager, measuring your team’s performance and impact is crucial for guiding the next year’s roadmap. By applying proven practices like the DORA research combined with SRE tools such as SLOs (Service Level Objectives) and SLIs (Service Level Indicators), you can design a highly practical approach to team performance evaluation.

🚀 DORA and SLO/SLI Practices

Several frameworks can guide you through measuring team performance, but I prefer DORA research combined with SLO/SLI and Error Budget because of their simplicity and practical application.

Used in combination, they give us a three-step plan in which:

SLO/SLI provides the method of tracking and implementing improvements
DORA suggests which high-impact metrics to track, offering valuable industry standards for top-performing teams

Let's look at the steps in detail.

Step 1: Define Key Performance Indicators (KPIs)

Establish team performance indicators that track the core of your day-to-day work.

Delivery Lead Time: How long does it take for a work item (or commit) to be delivered to production? (This measures end-to-end efficiency, excluding non-engineering activities like meetings for simplicity.)
Quality of Work (Defect Rate): How many defects or bugs were identified per feature, Merge Request (MR), or ticket?
Business Value Delivered: If a feature is linked to a business outcome, this value must be quantifiable. An example is the percentage of users adopting a new flow. Use A/B tests or pre/post-release data to determine the net positive or negative business impact.
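
As a rough illustration, the first two indicators can be computed from work-item records. This is a minimal sketch: the `WorkItem` fields are hypothetical and not tied to any specific issue tracker, and it assumes lead time runs from first commit to production deployment.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class WorkItem:
    # Hypothetical record; field names are assumptions, not a real tracker schema.
    first_commit_at: datetime
    deployed_at: datetime
    defects_reported: int


def lead_time_days(item: WorkItem) -> float:
    """Elapsed time from first commit to production deployment, in days."""
    return (item.deployed_at - item.first_commit_at).total_seconds() / 86400


def median_lead_time_days(items: list[WorkItem]) -> float:
    """Median is less sensitive to one long-running item than the mean."""
    return median(lead_time_days(i) for i in items)


def defect_rate(items: list[WorkItem]) -> float:
    """Average number of defects per delivered work item."""
    return sum(i.defects_reported for i in items) / len(items)


# Made-up sample data for illustration only.
items = [
    WorkItem(datetime(2024, 1, 1), datetime(2024, 1, 3), 0),
    WorkItem(datetime(2024, 1, 2), datetime(2024, 1, 6), 1),
    WorkItem(datetime(2024, 1, 5), datetime(2024, 1, 15), 2),
]
print(median_lead_time_days(items))  # 4.0
print(defect_rate(items))            # 1.0
```

Business value is left out on purpose: it usually comes from product analytics or A/B test results rather than from engineering records.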

Step 2: Set Healthy Metrics Thresholds

Involve your team in setting metrics thresholds. These must act as a healthy motivator for improvement.

Setting thresholds too high can lead to negative morale and burnout.
Setting them too low won’t provide sufficient learning, research, or challenging opportunities.

Examples of Thresholds:

Delivery Lead Time: Is our delivery time consistently below one week (or one day)?
Quality of Work: Is our defect rate kept below the team’s historical average?
Business Value: Does a new feature drive at least a 10% increase in customer engagement/conversion?
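
The example thresholds above can be written down as data and checked automatically. The sketch below is one possible shape: the `Threshold` class, the metric names, and the observed values are all assumptions made for illustration, and the defect-rate limit stands in for a team's historical average.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    name: str
    limit: float
    higher_is_better: bool  # True for business value, False for lead time and defects

    def is_met(self, observed: float) -> bool:
        # "At least X" for metrics we want to grow, "below X" for metrics we want to shrink.
        if self.higher_is_better:
            return observed >= self.limit
        return observed < self.limit


thresholds = [
    Threshold("delivery_lead_time_days", 7.0, higher_is_better=False),
    Threshold("defect_rate", 1.2, higher_is_better=False),  # assumed historical average
    Threshold("engagement_uplift_pct", 10.0, higher_is_better=True),
]

# Made-up observations for one review period.
observed = {
    "delivery_lead_time_days": 4.0,
    "defect_rate": 1.5,
    "engagement_uplift_pct": 12.0,
}

failing = [t.name for t in thresholds if not t.is_met(observed[t.name])]
print(failing)  # ['defect_rate']
```

Keeping the limits in one place makes it easier to review and adjust them together with the team.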

Step 3: Create a Data-Driven Improvement Plan

To ensure your improvement plan is robust, collect sufficient evidence of real cases where metrics breached their established thresholds.

Ask targeted questions based on the failing metrics:

If Delivery Lead Time is too high: Are our integration tests too slow? Is the deployment pipeline inefficient?
If Quality of Work is low: Do work items have too many bugs affecting overall quality and delivery speed?
If Business Value is low: Did the delivered feature fail to have a positive impact on the end-user or business?
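
For the lead-time question, it can help to split the total into stages and see where the time actually goes. The following is a sketch under assumptions: the stage names and durations are invented, and your pipeline may expose different stages.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-change stage durations in hours (made-up values).
stage_durations = [
    {"code_review": 6.0, "integration_tests": 1.5, "deployment": 0.5},
    {"code_review": 10.0, "integration_tests": 2.5, "deployment": 0.5},
    {"code_review": 8.0, "integration_tests": 2.0, "deployment": 0.5},
]


def slowest_stage(records: list[dict[str, float]]) -> tuple[str, dict[str, float]]:
    """Return the stage with the highest average duration, plus all averages."""
    by_stage: dict[str, list[float]] = defaultdict(list)
    for record in records:
        for stage, hours in record.items():
            by_stage[stage].append(hours)
    averages = {stage: mean(hours) for stage, hours in by_stage.items()}
    return max(averages, key=averages.get), averages


stage, averages = slowest_stage(stage_durations)
print(stage)     # code_review
print(averages)  # {'code_review': 8.0, 'integration_tests': 2.0, 'deployment': 0.5}
```

In this invented data set neither the tests nor the deployment is the bottleneck, which is exactly the kind of evidence worth collecting before changing the pipeline.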

✅ Final Thoughts: Start Small, Think Big

Focus on the Feedback Loop: Start with just one or two key metrics for a single application. Validate the full feedback loop—from measurement to action—before scaling up.
Drive Decisions: Metrics without action are useless. Use clear thresholds and concepts like the Error Budget to drive immediate, data-backed decisions in collaboration with your Product team.
Focus on Core Metrics: Just like a good coach focuses on the fundamentals, concentrate on tracking the most essential core metrics that define your team’s top-level performance.
Small Adjustments, Big Results: Each metric can drive a small, specific adjustment. Many small, cumulative improvements throughout the year will ultimately bring your team to a new level of performance.
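
To make the Error Budget idea concrete, here is a minimal sketch of how the remaining budget could be computed for a request-based SLO. The function name and the sample numbers are assumptions; the underlying idea is that the budget is the share of events allowed to fail, that is, 1 minus the SLO target.

```python
def error_budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent; a negative value means overspent."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1, exclusive")
    if total_events == 0:
        return 1.0  # nothing measured yet, so nothing spent
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    return 1.0 - actual_bad / allowed_bad


# Made-up numbers: 99% SLO, 100,000 requests, 500 of them failed.
remaining = error_budget_remaining(0.99, good_events=99_500, total_events=100_000)
print(round(remaining, 3))  # 0.5
```

What to do when the budget runs low (for example, pausing feature work in favor of reliability work) is a policy to agree on with the Product team, not something the calculation decides.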

Notable Frameworks

DORA metrics are excellent for measuring delivery performance (throughput and stability), but they intentionally leave out key factors like developer well-being, internal collaboration, and business value alignment.

SPACE

The primary framework designed to address these “human factors” and provide a more holistic view of team health is SPACE. Developed by researchers from GitHub, Microsoft, and the University of Victoria, SPACE is a multi-dimensional approach to measuring developer productivity. It deliberately moves beyond just output to include the human and systemic factors that drive sustainable performance.

Developer Experience (DevEx) Framework

DevEx is a highly focused framework that specifically measures the lived experience of the developer. The core principle is that improving DevEx directly leads to improved productivity, retention, and code quality.

Summary

To create a truly “healthy team” metric set, I recommend using DORA for the technical engine (delivery and stability) and then overlaying key metrics from SPACE (especially Satisfaction and well-being, Communication and collaboration, and Efficiency and flow) to measure the human engine.

Building a tech strategy is a core responsibility of the CTO, VP of Engineering, or Head of Engineering. Involving team leaders in this process ensures a more grounded and effective approach.

Tech team leaders play a crucial role by defining roadmaps for their teams, which, in turn, provide the foundation for an effective high-level strategy. To achieve the best results, continuous collaboration between leadership and team leaders is essential.

Let’s explore how to create a roadmap that is both practical and aligned with the company’s overall vision.

Building an Effective Team Roadmap

A team roadmap is a strategic document that outlines product needs, infrastructure requirements, modernization efforts, and compliance and security considerations, among other critical aspects.

An effective roadmap goes beyond listing high-level initiatives or goals. It expands on each goal using the Diagnosis, Policy, and Actions framework, helping to answer the Why, What, and How of every initiative. This approach fosters trust, alignment, and transparency with top-level leadership.

The Diagnosis, Policy, and Actions framework, developed by Richard Rumelt, consists of:

Diagnosis – Defining the problem that needs to be addressed
Policy – Establishing guiding principles and constraints for the solution
Actions – Defining concrete steps to implement the solution within the given policy

Let’s explore a few examples.

Example 1: Modernizing the Infrastructure

Diagnosis: Our current infrastructure relies on outdated and proprietary components, leading to scalability challenges, high maintenance costs, and slow adoption of new technologies.

Policy: Prioritize open-source and cloud-native solutions for new developments. Maintain legacy systems where necessary but avoid further expansion of proprietary technologies.

Actions:

Identify and replace critical proprietary components with open-source or cloud-native alternatives.
Standardize infrastructure automation and provisioning to improve scalability and maintainability.
Update internal documentation and on-boarding materials to reflect new infrastructure standards.

Example 2: Upgrade the Database

Diagnosis: The current database version has reached end-of-life and is no longer receiving security updates or feature enhancements. An upgrade is necessary to maintain security, stability, and performance.

Policy: The database upgrade must be performed with zero downtime to avoid service disruptions.

Actions:

Test the new database version in the QA environment to ensure compatibility.
Create a full backup of the existing database.
Implement a Blue-Green deployment strategy to minimize risk during the upgrade.
Communicate the upgrade plan and schedule a rollout window.

Example 3: Improve Cloud Cost Efficiency

Diagnosis: Cloud expenses represent a significant portion of overall costs. Unused or underutilized resources contribute to unnecessary costs.

Policy: Optimize cloud usage by right-sizing instances, using auto-scaling, and enforcing cost-control policies.

Actions:

Conduct an audit of cloud resources to identify inefficiencies.
Implement auto-scaling policies for workloads with variable demand.
Use reserved instances for predictable workloads and spot instances for workloads that tolerate interruption.
Set up monitoring and alerts for unexpected cost spikes.
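
The last action can start as a simple comparison against recent history. The sketch below is provider-agnostic and makes several assumptions: the function name, the 7-day window, and the 30% tolerance are illustrative defaults, and the daily cost figures would come from your cloud provider's billing export.

```python
from statistics import mean


def is_cost_spike(daily_costs: list[float], window: int = 7, tolerance: float = 0.3) -> bool:
    """True when the latest day's cost exceeds the trailing average by more than `tolerance`."""
    if len(daily_costs) <= window:
        return False  # not enough history to form a baseline
    baseline = mean(daily_costs[-window - 1:-1])  # the `window` days before the latest one
    return daily_costs[-1] > baseline * (1 + tolerance)


# Made-up daily costs: a stable week followed by one more day.
print(is_cost_spike([100.0] * 7 + [140.0]))  # True
print(is_cost_spike([100.0] * 7 + [120.0]))  # False
```

Most cloud providers also offer built-in budget alerts, which are usually a better long-term choice than a home-grown check.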

Conclusion

Structuring your team’s roadmap using the Diagnosis, Policy, and Actions framework ensures clear prioritization and alignment with the company’s overall strategy.
This approach facilitates productive discussions with top-level leadership, leading to better decision-making.
It improves transparency, trust and accountability across all levels of the organization.

Have you faced challenges when implementing a strategic roadmap? How did you overcome them? Drop a comment below and let’s learn from each other!

Category: Leadership

Improving Team Performance Measurement: A DORA and SLO/SLI Approach

Tech Team Leaders’ Guide to Strategy
