Context
In case you haven’t watched 🏔 Part 1 or the First TDD February video, here’s a bit of context. I opened up the thought challenge of deliberate practice for test qualities and TDD. What qualities are good for the business? What do these qualities look like? Which ones does TDD naturally emphasise? What are the trade-offs?
This is part 2 of the recap for the second episode where two of our audience members mustered the courage to join the challenge on stage.
The highlights below are a humble attempt at deriving new insights from their work and code; they carry no ill will or judgement, even where comparison might imply it. At least, I’ll do my best. These matters are often hard.
—Denis
Testing preferences
As you watch the recap, be mindful of the mental gymnastics each of us brings:
What assumptions are our plans based on?
Along which concepts are we abstracting?
What trade-offs did we make?
What trade-offs did we make accidentally?
What mistakes did we make that we noticed and would’ve liked to rework?
Marc & Denis pairing
In wanting to deliberately practice writing a testing plan, we took an approach reminiscent of domain-driven design storming. We were looking to establish the terminology used to describe certain patterns within the exercise.
Notice the back-and-forth between mechanical programming concepts and the domain of poems and their structure.
Here are the key qualities that stand out in hindsight, looking at the code:
deterministic—Marc was adamant about making sure the core linguistic rules were reliable
isolated—we took a bottom-up approach with Marc: atomic pieces first, then piecing them together. This sadly also deferred the final lyrics construction to the very end
inspiring—TypeScript wasn’t the pair’s primary language, so it was important that each step inspired further exploration and confidence, even if the tests were incomplete
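A minimal sketch of what such an isolated, deterministic unit might look like in TypeScript (the name `bottlesPhrase` and the exact rule are illustrative assumptions, not the pair’s actual code):

```typescript
// Hypothetical example of one isolated linguistic rule: pluralising the
// bottle count. Same input always yields the same output, no shared state.
function bottlesPhrase(count: number): string {
  if (count === 0) return "no more bottles";
  if (count === 1) return "1 bottle";
  return `${count} bottles`;
}

console.assert(bottlesPhrase(0) === "no more bottles");
console.assert(bottlesPhrase(1) === "1 bottle");
console.assert(bottlesPhrase(99) === "99 bottles");
```

Each rule can be verified on its own, which is what makes the bottom-up composition possible later.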
Jeff
I have to commend Jeff. Jeff started coding along by himself during the first stream and took the initiative to share his work with us. I invited him to this show in part 2 for this reason: to give his contribution a fair stage. His example is in Java, so it will naturally be harder to read compared to the rest. /s I’m joking!
Jeff’s approach followed a TDD style based on the ZOMBIES principle, the Z-O-M part standing for Zero-One-Many. This helped Jeff destructure (but not decompose) the problem into the 0..99 cases. In the case of the 99 bottles kata, this goes as follows:
Zero: the last line, static (no logic)
One: the first line before the last (requires an if branch)
Many: split logic to handle decrements and looping, etc.
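To make the progression concrete, here is a hedged sketch of how the Zero-One-Many steps could unfold. Jeff worked in Java; this TypeScript version, the `verse` function, and the exact wording are assumptions for illustration only:

```typescript
// Illustrative Zero-One-Many progression for the 99 bottles kata.
function verse(n: number): string {
  // Zero: the final verse is a static string, no logic needed.
  if (n === 0) {
    return "No more bottles of beer on the wall, no more bottles of beer.\n" +
           "Go to the store and buy some more, 99 bottles of beer on the wall.";
  }
  // One: the verse just before the end forces the first if branch.
  if (n === 1) {
    return "1 bottle of beer on the wall, 1 bottle of beer.\n" +
           "Take it down and pass it around, no more bottles of beer on the wall.";
  }
  // Many: the general case handles the decrement, including the 2 -> 1 edge.
  const next = n - 1 === 1 ? "1 bottle" : `${n - 1} bottles`;
  return `${n} bottles of beer on the wall, ${n} bottles of beer.\n` +
         `Take one down and pass it around, ${next} of beer on the wall.`;
}

console.assert(verse(0).startsWith("No more bottles"));
console.assert(verse(1).includes("no more bottles"));
console.assert(verse(2).includes("1 bottle of beer on the wall"));
```

Each step (Zero, One, Many) earns exactly one new piece of logic, which is the point of the principle.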
This kind of structure takes a more end-to-end approach to the testing, highlighting the following qualities:
predictive. The tests were written with the end-result UI in mind, i.e. the printed lyrics
writeable. Having the E2E test in front of you helps scope the cost of writing each test as you have a stringified safety net to rely on. This is what you would call outside-in acceptance testing prior to a TDD loop.
specific. While the tests don’t fail for lyrical reasons, they do fail in the sense of “Line 99 is wrong” or “Line 2 is wrong”.
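A sketch of what such a stringified safety net could look like in TypeScript (`firstWrongLine`, `songLyrics`, and `expectedLyrics` are hypothetical names, not Jeff’s actual code):

```typescript
// Compare generated lyrics to the expected text line by line, so a failure
// reads "Line 99 is wrong" rather than a wall of diff output.
function firstWrongLine(actual: string, expected: string): number | null {
  const a = actual.split("\n");
  const e = expected.split("\n");
  for (let i = 0; i < Math.max(a.length, e.length); i++) {
    if (a[i] !== e[i]) return i + 1; // 1-based line number for the report
  }
  return null; // all lines match
}

// Usage in an acceptance test (hypothetical helpers):
// const line = firstWrongLine(songLyrics(), expectedLyrics);
// if (line !== null) throw new Error(`Line ${line} is wrong`);
```

The outer acceptance test stays failing until the inner TDD loop has produced every line, which is the outside-in rhythm described above.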
Denis solo (off-screen)
Taking the spur of inspiration seriously in between episodes, I decided to create a history of commits that focuses on behavior and composability. Behavior stood out in how the lyrical structure was expressed in plain English in the tests.
Composability stood out in covering the problem fully with tests without relying on an end-to-end test over the entire lyrics. This kind of approach emphasises structure-independent unit testing at the cost of lower predictive confidence (proof of integration happens late).
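A hedged TypeScript sketch of the composability idea (all names here are assumptions, not my actual commits): the song is a composition of small pieces, and the composition itself can be tested with a stub verse instead of the full lyrics.

```typescript
// Build the 99..0 countdown, then map a verse function over it.
const countdown = (from: number, to: number): number[] =>
  Array.from({ length: from - to + 1 }, (_, i) => from - i);

const song = (verse: (n: number) => string, from = 99): string =>
  countdown(from, 0).map(verse).join("\n\n");

// Structure-independent test: cover the composition with a stub verse,
// without ever asserting on the real lyric text.
const stub = (n: number) => `verse ${n}`;
console.assert(song(stub, 2) === "verse 2\n\nverse 1\n\nverse 0");
```

The real `verse` and the composition are each proven separately; what stays unproven until late is that they integrate correctly, hence the lower predictive confidence.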
BONUS: Adrian w/ ChatGPT
As a joke, we figured: if the four of us can spend hours on this kata, what would a quick solution from ChatGPT look like? Surely it would be easy to outperform us. As expected, the result was confident and visually feature-complete. Here are our remarks after spending some time going through Adrian’s ChatGPT-prompted code.
The first iteration was naïve. We didn’t prompt it in any way in regards to the kata or test qualities. It produced seemingly hallucinatory output, behaving as if it wasn’t sure which code was the test and which the implementation. It created two implementations and compared them. That’s interesting! But not very useful.
It saw all solutions. The 99 bottles kata is prominently indexable on the web. So one might think: surely the LLM has seen all the solutions for this kata on GitHub and elsewhere. This is the best it could come up with? In hindsight that seemed awfully underwhelming.
Practice vs business. As the kata is extremely simple (i.e. it can be solved by returning a string), the only business-like problem is the act of deliberate practice by developers. ChatGPT would require precise prompting to take it in this spirit without jumping to solutions that may seem trivial or counter to the intention of the kata.