Lead by Example
Nik Sumeiko is among the best testing leaders I know and a dear friend, and I find myself following his advice:
“Start. Lead by example.”
I learned testing theory from Kent Beck’s book on TDD, online tutorials and, later, videos.
But I started writing tests only a few years later. The missing ingredient was an important piece of software so mission-critical that you couldn’t afford to make mistakes.
I was lucky enough to be on the right team at the right time.
He’s hosting an Audio Event today.
Monitor Code Health Metrics
Please don’t obsess about code coverage. A well-targeted test that you trust can save you from an important production outage. On coverage metrics it will show up as 0.1%.
This is not a good metric to inspire your team with.
Instead, focus on Code Health with tools like CodeScene, SonarQube, Qodana and others. Integrate them into your CI/CD pipeline and establish clear rules for your team on how and when to follow their recommendations.
Specifically, decide when not following a recommendation mandates a delay in release.
We’re not sponsored yet, by the way!
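One way to make a release rule like this machine-enforced is a small gate step in the pipeline. The sketch below is only an illustration: the HealthReport fields, the 0 to 10 score scale and the 0.5 threshold are assumptions of mine, not the output format or API of CodeScene, SonarQube or Qodana. You would fill the report from whatever your tool actually exports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthReport:
    """Hypothetical per-build summary; the field names are assumptions."""
    blocker_issues: int    # findings the team agreed must never ship
    health_score: float    # assumed scale: 0.0 (worst) to 10.0 (best)
    previous_score: float  # score of the last released build


def release_blockers(report: HealthReport, max_drop: float = 0.5) -> list[str]:
    """Return the reasons to delay a release; an empty list means go ahead."""
    reasons = []
    if report.blocker_issues > 0:
        reasons.append(f"{report.blocker_issues} blocker issue(s) still open")
    drop = report.previous_score - report.health_score
    if drop > max_drop:
        reasons.append(f"health score dropped by {drop:.1f} since last release")
    return reasons


# Example: one open blocker and a full point lost since the last release.
risky_build = HealthReport(blocker_issues=1, health_score=7.0, previous_score=8.0)
for reason in release_blockers(risky_build):
    print("Delay release:", reason)
```

The point is less the numbers than the fact that the rule is written down and applied the same way on a calm day and on a stressful one.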
Make your Operations Transparent
Often the team is busy fighting fires. That leaves little time to improve on code health.
Rest assured that the team is always doing their best to deal with whatever is in front of them.
I often see that in most teams that want tangible improvements to their release cycles, the problem isn’t that they don’t have tests.
The problem they may be facing is that the code and features simply aren’t in front of them enough because they are dealing with operational problems.
Operational problems may present themselves as:
Lots of paperwork
Manual Analytics
Support
High intensity of customer care detail (this is a healthy problem to have!)
Simply put, some of these aren’t transparent, and the team, along with the organisation, underestimates how much capacity operations take away from new features.
Is there a Problem?
There is a mental and effort cost to writing tests. This is very apparent under pressure. In a mentally taxing situation, team members may willingly regress in their operational quality on everything that isn’t strictly mandated or machine-enforced.
In good faith, mind you!
Anything to deal with the problem.
Is there a Recurring Problem?
Sometimes the same problem keeps re-appearing.
The tricky part is noticing that it’s the same one. In a software organisation there is just enough noise to produce seemingly unique errors.
Requests and email threads that seem unrelated can originate from the same underlying issue.
It takes the willing collaboration of many departments in combination with a very mature monitoring and telemetry culture to spot this accurately and early.
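To give a feel for what “noticing it’s the same one” can mean in practice, here is a deliberately crude sketch that groups incoming messages by a rough signature. The fingerprint and recurring functions and the sample messages are hypothetical, and masking numbers is just one simple heuristic I picked for illustration. It is no substitute for the monitoring culture described above.

```python
import re
from collections import defaultdict


def fingerprint(message: str) -> str:
    """Reduce a message to a rough signature by masking the parts that vary."""
    text = message.lower()
    text = re.sub(r"\d+", "#", text)          # ids, amounts, timestamps
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace
    return text


def recurring(messages: list[str], min_count: int = 2) -> dict[str, list[str]]:
    """Group messages by fingerprint and keep groups seen at least min_count times."""
    groups: dict[str, list[str]] = defaultdict(list)
    for message in messages:
        groups[fingerprint(message)].append(message)
    return {key: items for key, items in groups.items() if len(items) >= min_count}


# Hypothetical support inbox: two of the three reports share a signature.
inbox = [
    "Invoice 1042 failed to export",
    "invoice 77 failed to  export",
    "Login page is slow",
]
for signature, reports in recurring(inbox).items():
    print(len(reports), "reports look like:", signature)
```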
What is the Root Cause of the Problem?
There are three steps to any good root cause analysis.
Find the symptom
Find the source that’s causing it
Analyse what human assumptions allowed the source to be created
While coaching Tech Leads across many companies, I realised this last step is often elusive to them. A common example looks something like this:
Symptom: Slow releases due to bugs occurring during launch
Problem: Lack of tests, we were in a hurry
Root cause: The team assumed it would be faster to launch without tests first and that they’d have time to add them later
Are you documenting your Decisions?
Once you find the area you want to improve, it’s important to write your decisions down so you have a record of who made each decision, when and why. This helps you find key people months after implementation and, during incidents or future analysis, pinpoint the assumptions that were made.
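A decision log doesn’t need special tooling. As a minimal sketch, assuming you keep the records in code or export them from a wiki, something like the following captures the who, when and why, plus the assumptions. The Decision class, its field names and the sample entry are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Decision:
    """One entry in a hypothetical decision log."""
    title: str
    decided_by: list[str]  # who
    decided_on: date       # when
    reason: str            # why
    assumptions: list[str] = field(default_factory=list)


def decisions_assuming(log: list[Decision], keyword: str) -> list[Decision]:
    """Find decisions whose recorded assumptions mention the keyword."""
    needle = keyword.lower()
    return [d for d in log if any(needle in a.lower() for a in d.assumptions)]


# Sample entry, made up for illustration.
log = [
    Decision(
        title="Launch without tests",
        decided_by=["tech lead", "product owner"],
        decided_on=date(2024, 1, 15),
        reason="We were in a hurry",
        assumptions=["We will have time to add tests later"],
    ),
]
for decision in decisions_assuming(log, "tests later"):
    print(decision.decided_on, decision.title, "by", ", ".join(decision.decided_by))
```

During an incident review, a lookup like this points you straight at the people and the assumption behind the decision instead of relying on memory.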
Are Decisions made outside of the Engineering Team?
Most teams that I work with follow some amalgamation of scrum and waterfall. In these environments it’s common for the engineering team not to be fully autonomous.
This can manifest in many different ways:
Being pressured to forgo certain practices (like pair programming, TDD or testing)
Unfair negotiation strategies leading to under-representative decisions and estimates
Lack of information sharing causing bad assumptions to be made
Lack of accountability caused by gatekeeping, which prevents the team from moving faster or improving on their own
This is a hard situation to address, as it requires organisational buy-in to alleviate a bad or even toxic environment. Sometimes the improvement process can be localised to a team, but long-term efforts require support and a plan for success along the entire company hierarchy.
A list of 100 Case Studies by Companies you know
This post is long enough already without another long list, so I’ll just leave this here: