Before we start today’s topic, I want to extend my huge thanks to you for helping Crafting Tech Teams reach 1000 subscribers. This means a lot to me, and your support encourages me to write amazing content.
I’m currently running a survey on TDD adoption. Would you mind filling it out?
Follow your value streams, model your data streams—Advice for managers
Requirements gathering and human relationships are the hardest parts of software development. It’s far too easy to get distracted in meetings about features and deadlines and avoid having the really important conversations: the ones that contain the crucial detail.
Don’t repeat the mistake many managers make: mistaking detail for technicalities. Methodologies like Domain-driven design and Event modeling exist to deal with this bias head-on. Once you remove technical, software and engineering jargon from the surface-level discussion, you will find an entire swarm of business-level detail. The type of detail that was lurking to be discovered and that an engineer or architect cannot distill on their own: the business domain.
The symptoms of neglecting these conversations are easy to spot:
Engineers not knowing how to name things
The team doesn’t know how to test, testing plans have to be re-discovered constantly
Confusion on “how should X work??”
Low quality database modeling, resulting in low quality database-to-API coupling (today’s topic)
To reiterate: this isn’t about planning the work. Converse about planning to discover the language of the product and features you are building.
I’ve been looking into the work of Yves Goeleven recently. He has a very succinct way of combining the details of value streams and data. We’ll be looking into some of his examples below. Crafting Tech Teams regulars will notice his use of the Event Modeling colours in his diagrams, notably this one to capture business value from a data stream:

Avid event modeling practitioners will notice that this is a standard unit of a vertical slice. It is the link in the long chain of your business processes. A value stream is every capability you need in place to deliver a service to your customers and, beyond that, capture revenue from their gratitude. This can range from having good relations with your hosting provider’s account manager to the manual processes and agreements you have set up with your business’s accounting firm.
“The output of one capability will become the input of another, and so on, until all value is delivered.”—Yves Goeleven
The reason we model the value stream first is to find gaps in our understanding:
Where is the chain incomplete?
Where does it branch?
Where are the time-sensitive, temporal links?
Only when you understand all this can you practically model the software side of the system. And it will be much easier to do so.
To refresh your event modeling + DDD knowledge, I invite you to learn from my summaries:
Migrations and breaking changes are part of your data stream
First principles thinking is very powerful. It allows us to take in the situation as it is and ask questions that challenge an artificial status quo that no longer serves us. In software, this change of status quo almost always shows up as breaking changes and migrations.
You had the truth. But it was created by the legacy system. Now the system has been updated, so the old data is considered old. Legacy. Pre-update. Data adapters and history rewrites in the form of one-time operations often clutter your codebase in an attempt to hold a single source of truth that is compatible with today’s codebase.
But the real world doesn’t work like that. The real world is in constant flux and asynchronous. This gives rise to bi-temporality, where events in the past can be processed in a different order than they occurred: “Was the address change an actual reroute or a typo fix? Did it happen when we logged it, or was the date when we learned about it?”
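As a minimal sketch of what bi-temporality looks like in practice (the event and field names here are illustrative, not taken from any particular system), a bi-temporal event carries two timestamps: one for when the change took effect in the real world and one for when we recorded it.

```typescript
// A bi-temporal event tracks two timelines:
// - effectiveAt: when the change happened in the real world
// - recordedAt:  when our system learned about it
interface AddressChanged {
  type: "AddressChanged";
  customerId: string;
  newAddress: string;
  reason: "relocation" | "typo-fix"; // an actual reroute vs. a correction
  effectiveAt: Date;
  recordedAt: Date;
}

// A typo fix we only learned about two weeks after the fact:
const correction: AddressChanged = {
  type: "AddressChanged",
  customerId: "c-42",
  newAddress: "12 Main Street",
  reason: "typo-fix",
  effectiveAt: new Date("2024-03-01"), // when the address was really this
  recordedAt: new Date("2024-03-15"),  // when we found out
};
```

With both timestamps in the event, a projection can answer either question: what did we believe at the time, and what do we know to be true now.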
Most applications, especially large data products, have no other choice than to accept a before and after state. What makes matters worse is when the owning side of an API that is tightly coupled to a database changes independently of that database, while the database is also used directly in downstream data processing that bypasses the service layer.
In Yves’ terminology, this is data on the inside being shared with the outside. A common issue in separation of concerns and encapsulation, in this case at the scope of systems design and context mapping (my context mapping stream showcase).
Summary
Migration scripts are first-class citizens in your data operations
Data projectors (the things that create tables or joined outputs) know about data and code breaks
Time-based columns follow the main change event (i.e. active_since <date> instead of is_active = true), as sketched below
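To make that last point concrete, here is a hedged sketch (the entity and field names are hypothetical) of a read model and projector that record when a state became true, rather than flipping a boolean flag:

```typescript
// Read-model row: instead of a boolean `is_active`, store the timestamp
// of the change event that made the subscription active.
interface SubscriptionRow {
  subscriptionId: string;
  activeSince: Date | null;    // null = never activated
  cancelledSince: Date | null; // null = never cancelled
}

type SubscriptionEvent =
  | { type: "SubscriptionActivated"; subscriptionId: string; occurredAt: Date }
  | { type: "SubscriptionCancelled"; subscriptionId: string; occurredAt: Date };

// The projector copies the event's own timestamp into the column, so
// "is it active right now?" becomes a question about time, not a flag.
function project(row: SubscriptionRow, event: SubscriptionEvent): SubscriptionRow {
  switch (event.type) {
    case "SubscriptionActivated":
      return { ...row, activeSince: event.occurredAt, cancelledSince: null };
    case "SubscriptionCancelled":
      return { ...row, cancelledSince: event.occurredAt };
  }
}
```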
Event Modeling & Refactoring
I dipped my toe into a hornet’s nest of polarising opinions this week on the topic of database refactoring:
As you can imagine, the comments were quick to take sides and polarise into one of two camps:
The database schema is static. The API serves it, fully coupled (up/down).
DB schema is just a (temporary) storage tool. It serves the API, coupled downstream.
There’s a ton of value in these perspectives, though I could have phrased the original post better. What stands out to me is the lack of deliberate tooling and change-management principles for dealing with what is hard to change: the deep, low-quality coupling between a database and one or several APIs.
This begs the question: Schema of what?
In relational databases you don’t have a lot of choice: your schema is your schema. Old data breaks will have to be handled either with a sparse (nullable) widening of the table or with separate relations. RDBMS coupling, at best, is wide in scope and covers a large span of history.
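To illustrate the first option, here is a hypothetical example (table and field names are mine, shown as row types rather than DDL): once a breaking change is absorbed by widening the table, the new column is nullable forever, and every consumer has to cope with that.

```typescript
// Row type before the breaking change: every order has a single price.
interface OrderRowV1 {
  orderId: string;
  priceCents: number;
}

// After the sparse widening: the new `discountCents` column must be
// nullable, because rows written before the migration have no value for it.
// Every reader of the table now has to handle the null case forever.
interface OrderRowWidened {
  orderId: string;
  priceCents: number;
  discountCents: number | null; // null for all pre-migration rows
}
```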
However, in an event sourced architecture the atomic elements of the schema are much looser. We still have a schema, but it’s much more fragmented and relies on functional conversions and aggregations in the form of projections to create an API-facing derivative. This allows the projection to be updated much more freely while keeping the schema of event data immutable forever. New events mark themselves as a newer version and co-exist with the old, as long as the projectors know how to account for this. Event sourcing coupling at worst is narrow in scope and composable due to the immutable atomic points.
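Here is a minimal sketch of that idea, with event names I’ve made up for illustration rather than taken from Yves’ post: two versions of the same event co-exist immutably in the log, and the projector absorbs the difference when building the API-facing read model.

```typescript
// Two versions of the same fact live side by side in the event store.
type OrderPlacedV1 = { type: "OrderPlaced"; version: 1; orderId: string; amount: number };       // whole currency units
type OrderPlacedV2 = { type: "OrderPlaced"; version: 2; orderId: string; amountCents: number };  // cents

type OrderEvent = OrderPlacedV1 | OrderPlacedV2;

interface OrderView {
  orderId: string;
  amountCents: number;
}

// The projector is where schema knowledge lives: old events are never
// rewritten, the read model simply knows how to interpret both versions.
function projectOrder(event: OrderEvent): OrderView {
  switch (event.version) {
    case 1:
      return { orderId: event.orderId, amountCents: Math.round(event.amount * 100) };
    case 2:
      return { orderId: event.orderId, amountCents: event.amountCents };
  }
}

// Rebuilding the read model is just replaying the projection over the log.
const log: OrderEvent[] = [
  { type: "OrderPlaced", version: 1, orderId: "o-1", amount: 19.99 },
  { type: "OrderPlaced", version: 2, orderId: "o-2", amountCents: 2499 },
];
const views = log.map(projectOrder);
```

When the read model needs to change, you rewrite the projection and replay the log; the stored events themselves are never migrated.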
Here is an illustration in event model colors from Yves’ blog post on CQRS:
Hope that inspired you to think differently about your database schema. Let me know what you took from this. Do you disagree? Is it missing nuance?