Microservice Distributed Transactions 101: Guide to Choose the Best Strategy

The Java Trail
8 min readJan 27, 2024

--

Your application relies on multiple services, and maintaining data consistency across distributed transactions is crucial. How would you design and implement a distributed transaction management system to ensure atomicity and isolation?

In a microservices architecture, one of the challenges is managing distributed transactions.

Traditional monolithic applications often use a single, centralized database and can rely on ACID transactions (Atomicity, Consistency, Isolation, Durability) to ensure data consistency.

However, in a microservices environment where each service may have its own database, to complete a whole task (from order create to complete), multiple server needs to complete their respective transactions (deduce money in payment service, deduce quantity in inventory service, then complete order, sends to shipment in shipping service). If any of the service fails meanwhile (failed to deduce quantity in inventory-service) then the previous transactions (order creation, deduce money in payment service) are meaningless (cannot check/update quantity in inventory service but already deducted money from wallet in payment service, order created).

Scenario: Distributed Transaction in E-Commerce

Imagine an e-commerce platform where users can purchase products from various sellers. The system is composed of microservices, including Product Catalog Service, Cart Service, Payment Service, and Order Service.

Distributed Transactions in Order Processing:

When a user attempts to place an order for a product, a sequence of actions involving multiple microservices is initiated:

  1. The Product Catalog Service checks product availability.
  2. The Cart Service adds the product to the user’s cart.
  3. The Payment Service processes the payment.
  4. The Order Service creates an order and updates inventory.

Problems with Distributed Transactions:

Atomicity Problem (two-phase commit/SAGA):

  • If the Product Catalog Service confirms product availability > the Cart Service adds the product to the cart, but the Payment Service fails to process the payment, inconsistency arises where the user’s cart contains items not yet paid for.
  • Conversely, if the Payment Service deducts the amount from the user’s account but encounters an error before creating the order in the Order Service, the user is charged without receiving their purchase, violating atomicity.

**Solution: Implement a two-phase commit protocol/SAGA pattern where the Cart Service and Payment Service coordinate to ensure that either both operations (updating inventory and processing payment) are completed successfully, or none at all. If the payment fails, rollback the inventory changes to maintain consistency.

Consistency Problem (Optimistic Concurrency Control):

  • Inconsistencies may occur if the Product Catalog Service confirms product availability, before the Cart Service can add the product to the cart, the availability changes due to another concurrent transaction.
  • if the Payment Service deducts the amount from the user’s account, but the order creation fails due to an error in the Order Service, the user’s account may be charged without a corresponding order.

**Solution: Utilize optimistic concurrency control mechanisms such as versioning or timestamps to detect conflicts between concurrent updates to inventory. Implement logic to handle such conflicts gracefully, ensuring that only one transaction successfully updates the inventory while the other is notified of the conflict and retries the operation.

Concurrent Requests Problem (Pessimistic Lock/Rate limiter):

During a flash sale event, a surge of users attempts to purchase discounted items simultaneously, overwhelming the system and causing performance degradation.

**Solution: Employ locking mechanisms to serialize access to critical resources, such as product inventory or user carts. Pessimistic locking can prevent conflicts by locking resources during transaction execution.

**Implement throttling or rate limiting mechanisms to control the rate of incoming requests and prevent the system from being overloaded. Throttle requests based on factors such as user session, IP address, or request type to ensure fair access and maintain system stability during peak traffic.

Isolation (Database Isolation Level: Serializable):

A user places an order for a product, and while the order is being processed, another user attempts to modify the same product’s details, leading to inconsistent data.

**Solution: Use transaction isolation levels to control the visibility of data modifications between concurrent transactions. Apply an appropriate isolation level (e.g., Serializable) to ensure that transactions are executed in a manner that prevents interference from concurrent operations, maintaining data integrity and consistency.

Approaches to Ensure Data Consistency Across Microservices:

Approach 1: Synchronous Communication: (Strong Consistency)

When one microservice updates its data, it waits for a response from the dependent microservices before completing the operation.

When a user places an order, the Order Processing microservice updates its database and then communicates with the Inventory Management microservice to deduct the item quantity. The Order Processing microservice waits for a successful response from the Inventory Management microservice before confirming the order.

Approach 2: Two-Phase Commit (2PC): (Strong Consistency, loose coupling compared to synchronized approach)

  • Phase 1: A central coordinator manages the transaction. It sends a prepare message to all participants, waits for their acknowledgment, and then sends a commit or rollback message accordingly.
  • Phase 2: If all participant services agree, the coordinator instructs them to commit the changes atomically. If any participant fails, the coordinator instructs all others to rollback.

(+) Guarantees atomicity

(-) Single point of failure (coordinator): If the coordinator fails, the entire transaction needs to be restarted. (-)Blocking Issue: While 2PC provides atomicity, it may suffer from blocking issues and isn’t suitable for all scenarios due to its blocking nature.

Approach 3: Saga Pattern

The Saga Pattern decomposes a long-lived transaction into a sequence of smaller, loosely coupled transactions. Each step in the sequence represents a microservice operation, and compensating transactions handle rollback in case of failures.

In this scenario, a reservation saga could consist of steps such as checking seat availability, reserving the seat, and processing payment. If any step fails, compensating transactions are executed to undo the changes made by preceding steps.

Saga Pattern with Orchestrator:

A central orchestrator manages the sequence of transactions. Orchestrator sends commands to each service to perform its part of the transaction. If a service fails, the orchestrator sends compensating commands to undo the completed steps.

(+) Centralized Control: Orchestrator manages the flow and state centrally. (+) Compensation: Handles failures gracefully through compensating transactions.

(-) Single Point of Failure: The orchestrator can become a bottleneck or a single point of failure. (-) Complex: As the number of services grows, orchestrating them becomes more complex. (-)Increased tight Coupling: Can introduce tight coupling between services and the orchestrator.

Saga Pattern — Choreography:

In choreography-based Saga, each service involved communicates directly with others through events using a message broker , avoiding a central coordinator.

  • Each service publishes events to broker based on its state changes.
  • Other services subscribe to relevant events and react accordingly.

(+) Decentralization: No single point of control, promotes loose coupling between services, making the system more resilient. (+) Scalability: Easier to scale individual services independently.

(-) Complexity: Requires careful design and coordination for distributed communication.(-) Consistency: Ensuring consistency might require additional effort due to the decentralized nature.

Approach 4: Transactional Outbox Pattern:

Why Choosing this over SAGA Pattern: While applying the Saga pattern, you will have two operations at each step. The local ACID transaction for business logic, and the event publishing. These two operations (Create order in local DB of order-service & publish event to Apache Kafka to process-payment) cannot be in the same single unit of work as they target separate data sources in different microservices. One is the local database, and the other is the event store.

To perform these operations (Saving order-service database & publishing event) consistently, you can apply the Outbox pattern. The Outbox pattern relies on having a local outbox table to hold events in the same database where you run the local transactions for business logic.

Then you can use these two database tables in the same transaction to perform local ACID transaction for business logic create order and event publishing. You can then read the events from the outbox table and publish them asynchronously.

  • Each microservice writes its outgoing messages (events or commands) to a local outbox table within the same database transaction that updates its database state.
  • A separate process (e.g., message dispatcher) reads these messages from the outbox tables and sends them to message brokers or other microservices. If the transaction fails, the outgoing messages are not dispatched, ensuring atomicity.

Choosing the right Approach: Use Cases

Synchronous Communication: when immediate and strong consistency is a strict requirement. In an online banking scenario, where immediate consistency is must for financial transaction, synchronous communication with a blocking mechanism is preferred. Two-Phase Commit (2PC) not useful, it could introduce additional latency and potential blocking

Asynchronous Communication with Event Sourcing + CQRS: Suitable for scenarios where eventual consistency is acceptable, and high throughput, different read/write scaling and low-latency are priorities.

In an e-commerce system, when a customer places an order, the order processing microservice updates its local database immediately. However, inventory management, which can tolerate eventual consistency, subscribes to events related to order placement, updating its inventory over time.

Two-Phase Commit (2PC): Appropriate for situations requiring strong consistency and where blocking is acceptable.

When a passenger reserves a seat on a flight, immediate and strong consistency is required. The reservation system initiates a two-phase commit to ensure that the seat is reserved across multiple services (seat availability, passenger details, payment processing) atomically.

Saga Pattern with Orchestrator: Ideal for complex workflows, multi-step transactions with compensating logic, and when centralized control is necessary.

When a customer places an order, a saga with an orchestrator manages the flow. The orchestrator sends commands to the payment, shipping, and inventory services. If any step fails, compensating transactions are executed by the orchestrator to undo completed steps,

Saga Pattern — Choreography: Suitable for highly distributed systems, independent service scaling is a priority, and loose coupling is crucial.

In a highly distributed and scalable social media platform, user actions like posting, commenting, and liking trigger events. Each service (post service, comment service, like service) subscribes to relevant events and updates its local data accordingly.

The choice of approach depends on factors such as the nature of your application, performance requirements, and the level of consistency required. No one-size-fits-all solution exists, and each approach has trade-offs. It’s essential to carefully analyze the requirements of your system and choose the approach that best fits your specific use case.

--

--

The Java Trail

Scalable Distributed System, Backend Performance Optimization, Java Enthusiast. (mazumder.dip.auvi@gmail.com Or, +8801741240520)