Microservices Data Sharing w/ CDC

Nisith Dash
7 min read · Aug 5, 2022

Today, Microservices Architecture (MSA) is the most popular choice for building highly scalable, available and low-latency software systems. Such architectures have two key characteristics -

  • Microservices are modeled around a particular business domain and deliver very specific business capabilities.
  • Microservices own their domain-specific data and are reactive, i.e., they interact with other services and/or the environment only via change events.

“The core architectural tenet is loose coupling. Microservices must have sole ownership of their associated business domain data.”

However, ensuring complete data autonomy is a difficult technical problem to solve and there are many tradeoffs to consider.

In this blog, I’ll focus on the data consistency issues and discuss a simple yet effective strategy that leverages the core ideas of Change Data Capture (CDC) and the Transactional Outbox pattern to address them.

MSA Design Tradeoffs

The primary business drivers behind decomposing a traditional monolithic system into a set of decoupled Microservices are architectural agility and speed to market. But there are some obvious tradeoffs -

  • Decomposition of services leads to lots of moving parts which in turn causes operational complexity and management overhead.
  • Partitioning application functionality needs extremely careful consideration for constraints and consistency requirements. Otherwise, we end up with eventual consistency everywhere!
  • Decomposition of services creates nondeterministic systems. This has the following major implications -

☞ Unexpected failures for inexplicable reasons. Failure modes increase exponentially and cascade quickly, making it harder to track down the origin.

☞ Nondeterministic global application state. The state of the world at any particular instant has to be stitched together from data scattered across multiple databases and the change events propagating through the system.

☞ Eventually consistent state is like rolling the availability dice. A lot of different systems need to be up, running and participating for data changes to eventually propagate.

Ownership & Sharing

In principle, each Microservice should own its data and keep it private. However, in large distributed systems with thousands of services, it is not feasible to provision a separate database for each service. There are other low-overhead mechanisms to achieve data decoupling, like provisioning a database schema-per-service or using private tables-per-service. The general idea is to impose barriers using the DB capabilities at our disposal to enforce strict data encapsulation and prevent any shared data access.

In reality, Microservices need to share data all the time, whether to carry out an end-to-end business transaction or to invoke a cross-cutting capability. For example, a User Profile may be needed by multiple services repeatedly to pre-validate the authorization and site preferences of a logged-in user.

Generally, Microservices share their data with the outside world using the following techniques -

  • Export an immutable/read-only view of their data via public APIs.
  • Embed business domain data in change events messages and broadcast it via a message broker.

The obvious tradeoff in either approach is maintaining strict data consistency. Downstream Microservices must be able to consume and utilize data that may only be eventually consistent at any point in time.
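For the broadcast approach, the change event itself carries the domain data a consumer needs. As a rough illustration (the event shape and field names below are hypothetical, not a standard):

```python
import json

# A hypothetical ORDER-CREATED change event. The payload embeds the
# domain data downstream consumers need, so they can react without
# calling back into the owning service.
event = {
    "event_id": "a1b2c3",
    "event_type": "ORDER-CREATED",
    "aggregate_id": "order-42",
    "payload": {"customer_id": "cust-7", "total": 99.95, "currency": "USD"},
}

# Serialized form, as it would travel through a message broker.
message = json.dumps(event)
print(message)
```

Consumers must tolerate that the embedded data may already be stale by the time the event is processed.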

Database-per-service Pattern

The pattern is self-explanatory, so I’ll simply list some of its advantages over a shared monolithic database.

  • Sharing databases causes tight design-time and run-time coupling, limiting our ability to independently own, develop, test and deploy services.
  • Any schema changes need to be coordinated across services and backward compatibility needs to be maintained at all times. Business logic to maintain data consistency needs to be duplicated and must be changed in lockstep across multiple services.
  • Microservices can have different data storage requirements optimized for specific use cases like analytics, search, recommendations etc.
  • Services can have different compliance, security and scaling expectations from their databases and “one size may not fit all”.

However, there is no free lunch! Again, there are significant tradeoffs to consider with provisioning a database per service.

  • Business transactions that span multiple Microservices are complex to implement. There are solutions like sagas with compensating transactions, but they are non-trivial.
  • Queries that rely on joining data from different databases are no longer possible.
  • Distributed global transactions are slow and complex and degrade system availability. Keeping any two resources in sync in the absence of a global transaction manager (XA/2PC) is impossible.
  • Maintaining data consistency and integrity of business invariants at all times is quite challenging. There are numerous edge scenarios when data can go out of sync without even being detected immediately.

MSA Data Sharing Problems — Dual Writes

Microservices routinely have to atomically update their own state and simultaneously publish a message about their state change to other collaborating services. This concept of dual writes is a very common pattern in any Event-Driven MSA. But it can introduce subtle data consistency issues.

Suppose an Order service receives a new customer order and stores it in the Order DB. Simultaneously, it needs to publish an ORDER-CREATION event about this newly created order to a message broker, to notify downstream services (e.g., a Fulfilment service) to start processing the order. But what happens if the first operation succeeds and the second one fails? We cannot roll back the DB changes, since they have already been committed by a local transaction. But the ORDER-CREATION event never gets published, and the order is stuck and cannot be fulfilled.
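The failure mode above can be sketched in a few lines. This is a minimal illustration using SQLite as a stand-in for the Order DB and a deliberately failing publisher as a stand-in for the broker:

```python
import sqlite3

def publish(event):
    # Stand-in for a message broker client that fails *after*
    # the local DB transaction has already committed.
    raise ConnectionError("broker unavailable")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")

# Write 1: the local transaction commits successfully.
with conn:
    conn.execute("INSERT INTO orders VALUES ('order-42', 'CREATED')")

# Write 2: publishing fails, but the committed insert cannot be rolled back.
try:
    publish({"event_type": "ORDER-CREATED", "order_id": "order-42"})
except ConnectionError:
    stuck = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    print(f"order persisted, but its event was never published ({stuck} row)")
```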

Dual writes can also introduce subtle race conditions that create inconsistent data. Consider the scenario where two services need to update the same data item in two data stores with different values. Depending on how these change events are ordered, it is possible for the same data item to end up with different values at the end of the transaction.

The ordering of events matters!

MSA Data Sharing Solutions

There are two main approaches to sharing data among distributed reactive Microservices while also addressing the Dual Writes issue: Event Sourcing w/ CQRS, and Change Data Capture (CDC) w/ an Outbox table. I’ll focus on the latter approach.

MSA Data Integration: CDC w/ Outbox Table

Here is an outline of an event-driven solution for providing atomic dual writes.

“Write to one resource with strongly consistent semantics and let that write, in turn, drive the update of the second resource with eventually consistent semantics.”

Event Driven Integration:


  • Create a new transactional Outbox table, serving as a repository of change events, in the same DB schema as the source tables.
  • The Microservice updates the source table and creates a change event in the Outbox table in one single ACID transaction.
  • Perform log-based CDC on this Outbox table using Debezium, which creates change events and inserts them into a Kafka topic.
  • Other Microservices then connect to the Kafka topic and consume the events for downstream processing.
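The first two steps amount to a single local transaction. Here is a minimal sketch using SQLite; the table and column names are illustrative (Debezium's outbox support conventionally expects columns along the lines of aggregate type, aggregate id, event type and payload):

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
# The Outbox table lives in the same schema as the source table,
# so both writes can share one ACID transaction.
conn.execute("""
    CREATE TABLE outbox (
        id TEXT PRIMARY KEY,
        aggregate_type TEXT,
        aggregate_id TEXT,
        event_type TEXT,
        payload TEXT
    )
""")

def create_order(order_id, customer_id):
    event = {"order_id": order_id, "customer_id": customer_id}
    # Single transaction: either both rows commit or neither does,
    # eliminating the dual-write gap.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, 'CREATED')", (order_id,))
        conn.execute(
            "INSERT INTO outbox VALUES (?, 'order', ?, 'ORDER-CREATED', ?)",
            (str(uuid.uuid4()), order_id, json.dumps(event)),
        )

create_order("order-42", "cust-7")
print(conn.execute("SELECT event_type FROM outbox").fetchone()[0])
# → ORDER-CREATED
```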

The diagram below provides a good visual explanation.
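For the CDC step, the Debezium connector is pointed at the Outbox table and typically combined with Debezium's outbox Event Router SMT. A rough sketch of the connector registration, assuming Postgres and a recent Debezium release (hostnames, credentials and table names below are placeholders, and exact property names vary across Debezium versions):

```json
{
  "name": "order-outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "order-db",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "<secret>",
    "database.dbname": "orders",
    "topic.prefix": "order-service",
    "table.include.list": "public.outbox",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter"
  }
}
```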

Data Replication:

Data Replication is another extremely popular approach for Microservices data integration. It is relevant when a particular service provides universal, cross-cutting data, e.g., user preferences, which is frequently read yet rarely changes. Such a user preference service would otherwise be called by several other services on every invocation. To avoid the timeouts and errors that come with these repeated network calls, a read-only subset of the user preference data can be replicated over to each service that needs it.

Again, CDC presents an elegant solution. The diagram below provides a good visual explanation.
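On the consuming side, each service applies the change events it receives to its local, read-only replica. A minimal sketch using SQLite and an idempotent upsert (the event shape and table name are assumptions for illustration):

```python
import json
import sqlite3

# Local, read-only replica of user preference data, kept fresh by
# applying change events consumed from a topic.
replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE user_prefs (user_id TEXT PRIMARY KEY, prefs TEXT)")

def apply_change_event(raw_event):
    event = json.loads(raw_event)
    # Upsert so redelivered duplicates converge on the latest value
    # (last-writer-wins; acceptable for rarely changing data).
    with replica:
        replica.execute(
            "INSERT INTO user_prefs VALUES (?, ?) "
            "ON CONFLICT(user_id) DO UPDATE SET prefs = excluded.prefs",
            (event["user_id"], json.dumps(event["prefs"])),
        )

apply_change_event(json.dumps({"user_id": "u1", "prefs": {"theme": "dark"}}))
apply_change_event(json.dumps({"user_id": "u1", "prefs": {"theme": "light"}}))
row = replica.execute("SELECT prefs FROM user_prefs WHERE user_id = 'u1'").fetchone()
print(row[0])
```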

Conclusion:

The CDC w/ Outbox pattern is an effective approach to sharing data among Microservices via the asynchronous propagation of change events. It avoids the potential inconsistencies encountered when simultaneously modifying multiple resources that don’t share one common transactional context. It relies on a highly reliable and scalable messaging backbone like Kafka to propagate changes in near real time and at scale.

References & Further Reading:

Gunnar Morling & Hans-Peter Grahsl — Change Data Streaming Patterns in Distributed Systems.

Implementing the Outbox Pattern with CDC using Debezium — Thorben Janssen : https://thorben-janssen.com/outbox-pattern-with-cdc-and-debezium/
