So many productivity metrics 🤯 Which one should you use?
Learn how to choose the best metrics that serve your team
Which metrics do I need? —Advice for leaders
You dread the meeting with your C-levels. They expect you to explain in detail whether the huge investments of time and money into the engineering department yield returns. This is much harder than it seems. You have your CI/CD pipeline, version control and some form of ticket management tool.
How do you translate that into delays, returns, problem areas and growth opportunities?
🔵 Typo—ship better software faster
Typo is a software delivery intelligence platform that gives modern software teams the visibility, insights & tools to code better, deploy faster & stay aligned with business goals.
By seamlessly integrating with the dev tool stack (Git, Issue Tracker, CI/CD, Incident Management, Slack), Typo connects the dots between engineering signals & developer well-being to build efficient teams that deliver on time.
Typo helps tech teams deliver at every level:
For Leaders. Align engineering with business goals & support teams to deliver their best with developer well-being insights
For Managers. Identify delivery bottlenecks, predict release delays & drive continuous improvement with customized team goals & weekly progress reports
For Developers. Ship code faster with automated dev workflows in Slack & measure the impact of your work on team goals.
Typo is a sponsor of Crafting Tech Teams 🙏
Reframe and serve
Understanding these two aspects will eliminate the need to second-guess yourself:
Reframe questions to understand what decisions the metrics enable. Are two options for the roadmap being compared? Is the CFO looking to cut a department? Is headcount increasing or decreasing—in what areas? By when? Are there opportunities for pivots? Does an effort require infrastructural or platform investment?
Understand what metrics serve the growth of the team. Teams face different life cycles. This is where I coach and mentor the most, as each phase in a team’s growth requires a different type of nurture. Do they need more structure? Less structure? Are they improving, or being micromanaged too much? Do they require training in the software design space they work in most? Do they understand the business?
Pick a few metrics to serve these questions and tie them together over a long timeframe, ideally two quarters to a year. Then track their outcomes (a minimal bookkeeping sketch follows the list):
When did you start?
When did you finish?
How did the team grow from this?
What impact did it deliver or enable for the business?
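As a minimal sketch of that bookkeeping (the field names are my own, not taken from any of the frameworks below), each metric can be tied to the decision it serves and reviewed when the timeframe ends:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class MetricInitiative:
    """One metric tied to one decision, tracked over two quarters to a year."""
    metric: str              # e.g. "Mean Lead Time for Changes"
    decision_it_serves: str  # the reframed question from step 1
    started: date
    finished: date | None = None
    team_growth_notes: list[str] = field(default_factory=list)
    business_impact: str = ""

# Hypothetical example: tie Change Failure Rate to a concrete roadmap decision.
initiative = MetricInitiative(
    metric="Change Failure Rate",
    decision_it_serves="Do we invest in a staging environment this year?",
    started=date(2023, 1, 9),
)
```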
Once you clarify your target priorities, you can use a tool like typoapp.io to measure the metrics that serve you best from the plethora of research below. Now let’s explore where all these metrics come from and what they are.
Remember: any metric can and will be gamed. So they have to be continually adapted to foster personal and business growth, rather than being used as a mechanism to PIP or isolate individual poor performers. More on this in part two, which comes out in two weeks.
State of the art productivity metrics
These metrics are produced from a combination of automated tracking, surveys and static analysis of your codebase(s). Imperfect on their own, they are helpful for showing trends of improvement or decline in overlapping areas:
Do your bugfixing SLAs impact your quality?
Does your code review time impact your time to launch?
How does your code coverage mandate impact job satisfaction?
And more…
Google—Project Aristotle (re:Work)
Project Aristotle was, and still is, one of the largest tech-oriented studies focused on psychological safety and its impact on performance in the workplace. It established evidence for the commonly held belief that a team of average individuals can outperform a team of highly skilled ones. The key ingredient is their ability to work as a team. By taking interpersonal risks safely and nurturing an environment where making mistakes publicly is encouraged, the team prioritises learning. This brings a positive impact on their performance—both in measurement and satisfaction.
Summarised, Project Aristotle promotes OKRs and surveys focused on the following hierarchy (a toy survey aggregation follows the list):
Psychological Safety. Refers to an individual’s perception of the consequences of taking an interpersonal risk.
Dependability. On dependable teams, members reliably complete quality work on time (vs the opposite - shirking responsibilities).
Structure and clarity. An individual’s understanding of job expectations, the process for fulfilling these expectations, and the consequences of one’s performance are important for team effectiveness.
Meaning. Finding a sense of purpose in either the work itself or the output is important for team effectiveness.
Impact. The results of one’s work, the subjective judgement that your work is making a difference, is important for teams.
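re:Work publishes the survey themes, not a reference implementation. Purely as an illustration, a 1–5 Likert pulse survey over these five dimensions could be aggregated like this (the scores, labels and the 3.0 threshold are invented for the example):

```python
from statistics import mean

# Hypothetical 1-5 Likert responses per Aristotle dimension (5 = strongly agree).
responses = {
    "psychological_safety": [2, 3, 2, 4, 2],  # e.g. "I can take a risk on this team."
    "dependability":        [4, 4, 5, 4, 3],
    "structure_clarity":    [3, 4, 4, 3, 4],
    "meaning":              [4, 5, 4, 4, 5],
    "impact":               [3, 4, 3, 4, 4],
}

for dimension, scores in responses.items():
    avg = mean(scores)
    # Flag low-scoring dimensions; the hierarchy starts at psychological safety.
    flag = "  <- investigate" if avg < 3.0 else ""
    print(f"{dimension:22s} {avg:.1f}{flag}")
```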
These surveys expose a lack of teamwork, the deadly silence in meetings and any disguised misreporting that may be skewing your other metrics. I wrote about psychological safety in detail during our Book Club month reading The Fearless Organization.
P.S.: Preceding this project was Project Oxygen, which researched what makes a great manager.
G. Kim, J. Humble, Dr. N. Forsgren—DORA metrics
The most widely known acronym in software engineering and DevOps, DORA aims to set a standard for the tactical process involved in continuous delivery—or the lack thereof. Much like lean methodologies, this is a self-governed approach to measurement that exposes problem areas and declining trends, rather than a percentage score where you can aim for 100%.
The DORA acronym stands for DevOps Research and Assessment. It covers a large range of possible metrics, but highlights these four key areas as a starting point (a toy calculation follows the list):
Deployment Frequency (DF). How often you successfully release to production.
Mean Lead Time for Changes (MLT). The aggregate mean time that passes between commit and deployment for an individual change. NOTE: there is a common misunderstanding here. This measure intentionally doesn’t track the time it takes to implement the change, merely focusing on how long it takes to ship it.
Change Failure Rate (CFR). How often releases cause immediately detectable failures. A failure is any release that required an immediate hotfix, rollback or patch, or that caused an outage and disruption with downtime.
Mean Time to Recovery (MTTR). Combines the previous metrics of MLT and CFR: it tracks the mean time it took to fix the failure (e.g: detect and deploy a hotfix).
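The research defines these four metrics but leaves the implementation to you. As a minimal sketch, assuming you can export deployment records with commit and deploy timestamps plus a failure flag (the `Deployment` shape below is my assumption, not a DORA artifact), they reduce to a few lines:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deployment:
    committed_at: datetime                # when the change was committed
    deployed_at: datetime                 # when it reached production
    failed: bool                          # needed a hotfix/rollback or caused an outage
    recovered_at: datetime | None = None  # when service was restored, if it failed

def dora(deploys: list[Deployment], window_days: int = 30) -> dict:
    """Compute the four key DORA metrics over a reporting window.

    Assumes at least one deployment in the window.
    """
    failures = [d for d in deploys if d.failed]
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    recoveries = [d.recovered_at - d.deployed_at for d in failures if d.recovered_at]
    return {
        "deployment_frequency_per_day": len(deploys) / window_days,
        "mean_lead_time": sum(lead_times, timedelta()) / len(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_recovery": (
            sum(recoveries, timedelta()) / len(recoveries) if recoveries else None
        ),
    }
```

Note how the lead time clock starts at the commit, not at the ticket: the sketch mirrors the MLT definition above.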
All four are system metrics: they track your process and say nothing about individual performance. In fact, it is counter-productive to apply these metrics at the individual level because of the gaming effect that rewards and punishments cause.
An automated, all-in-one platform that combines your codebase with your issue tracker and CI/CD pipeline is ideal for this kind of analysis. These metrics highlight the human bias of underestimating how slow and expensive your feedback-loop process actually is. Once you gain clarity on whether it needs improvement, the four key metrics will hint at areas of opportunity for nurture, incentives or investment.
GitHub, Microsoft, et al—SPACE metrics
SPACE metrics are the newest of the bunch, with the research released in 2021. SPACE aims to build on the prior material and expand subjective surveys to incorporate individual-level feedback and to nurture mental and cognitive wellbeing. The acronym stands for the main overarching categories it measures:
Satisfaction and well-being. How team members’ happiness and individual self-expression match the overall goals of the team. The research shows that a greater sense of belonging contributes to higher long-term satisfaction.
Performance. How they perceive themselves within the team’s system. Do they feel productive? Are they growing? What’s holding them back? How does their work affect other departments and customer success? Performance aims to survey and quantify the rate of change of outcomes.
Activity. An overall measure of output. Think of these as generic team KPIs: number of bugs, support tickets, sprint velocity, magical story points. They don’t mean anything on their own. Instead, they establish a baseline of change for the other metrics.
Communication, collaboration. How easily and effectively the team collaborates, integrates, networks, onboards and brainstorms together.
Efficiency and flow. The signal-to-noise ratio between what the team set out to do and what it can actually get done. This can expose a lack of focus time, too much or too little teamwork, or processes that create synthetic tasks through hand-offs and hand-backs (a toy flow-efficiency calculation follows the list).
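The paper describes categories, not formulas. As one invented example for the efficiency-and-flow bucket (the state names and timestamps are hypothetical, not from the SPACE paper), flow efficiency can be estimated from ticket state transitions in your issue tracker:

```python
from datetime import datetime

# Hypothetical ticket history: (state, entered_at) pairs from an issue tracker.
history = [
    ("in_progress", datetime(2023, 3, 1, 9)),
    ("waiting_on_review", datetime(2023, 3, 1, 15)),  # hand-off: the clock keeps running
    ("in_progress", datetime(2023, 3, 3, 10)),        # hand-back
    ("done", datetime(2023, 3, 3, 12)),
]

ACTIVE_STATES = {"in_progress"}

active = total = 0.0
for (state, start), (_, end) in zip(history, history[1:]):
    hours = (end - start).total_seconds() / 3600
    total += hours
    if state in ACTIVE_STATES:
        active += hours

# Flow efficiency: the share of elapsed time spent actively working vs. waiting.
print(f"flow efficiency: {active / total:.0%}")  # a low value hints at hand-off waste
```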
None of these provide a “dial it to eleven” target. SPACE metrics help with the original intention for having metrics that we mentioned in the first chapter: making trade-offs, balancing decisions with as much logic and data as possible. These decisions improve your profitable business outcomes and developer experience (DevEx).
A. Tornhill, M. Borg—Code RED
The Code RED study researches the impact of code quality on business outcomes. Rather than beating the dead horse of technical debt, it reframes the conversation around code health instead.
The study finds that more time, effort and money are spent on areas of 39 proprietary production codebases that score poorly on code health metrics. These code health metrics include (a rough git-based hotspot sketch follows the list):
Cycle Time. Measuring time-in-development of a ticket or feature.
Lead Time. Absolute time on the calendar from request to deployment. Cycle time and lead time are common agile metrics found in Kanban, Scrum and Lean.
Code Smells. Code smells signal a poorly designed code file that is difficult to read and change.
Unnecessary Coupling. Modules and lines of code that always change together, are hard to change, or have many connections to other parts of the system pose a high risk to maintainability.
Duplicated Code. Unnecessary duplication lowers the capital efficiency of the existing codebase to support changing, removing or adding new features. This is the most common layman’s representation of technical debt.
Code HotSpots with no maintainer. There may be lines of code or entire folders of files whose original authors have left the company. A team with such hotspots constantly has to spend effort reacquiring working knowledge of its own system.
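The study itself relies on proprietary analysis tooling. As a rough stand-in for the maintainer-less hotspot check only (the `FORMER_EMPLOYEES` set is an assumption you would fill from offboarding records), change frequency and surviving ownership can be pulled straight from `git log`:

```python
import subprocess
from collections import Counter

# Hypothetical: emails of authors who have left, e.g. from offboarding records.
FORMER_EMPLOYEES = {"alice@example.com"}

def change_counts(repo: str) -> Counter:
    """Count how often each file changed; frequent change marks a hotspot candidate."""
    log = subprocess.run(
        ["git", "-C", repo, "log", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return Counter(line for line in log.splitlines() if line)

def has_no_maintainer(repo: str, path: str) -> bool:
    """True if every author who ever touched `path` has left the company."""
    log = subprocess.run(
        ["git", "-C", repo, "log", "--pretty=format:%ae", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    authors = set(log.splitlines())
    return bool(authors) and authors <= FORMER_EMPLOYEES

# Report the most frequently changed files that nobody remaining can vouch for.
for path, changes in change_counts(".").most_common(20):
    if has_no_maintainer(".", path):
        print(f"{changes:4d} changes, no remaining maintainer: {path}")
```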
In summary, choose the metrics your team needs right now for their next growth opportunity. Adjust them as you go. Remove them once they start being gamed.