Rahul Jayaraman explorations

Building state from unordered events

NOTE: I had written this article while at Gojek to share with my team members. I haven’t had time to generalize the article sufficiently for an external audience. Apologies in advance.

For one of our features, events had to be ordered to build some state. But our infra didn’t guarantee order and the event producer did not reliably encode order into the event. While trying to solve this problem, we realized that sometimes, it’s possible to order events before they occur.

This article tries to build the intuition that state machines only protect your data against invalid state transitions. They don’t protect your data from becoming inconsistent, if your events are out of order. We explore how one can design an eventually consistent system, with out of order events by removing cycles from your state machine (if feasible for business).

Before going further, here’s some background about our system.

Background

Different teams build different services to offer promotions to their customers. The Pay team might have services to offer cashbacks, while the Food team might have different services to offer SKU, Cart and Delivery discounts to customers, and the Ads team might have services to show ads to customers.

Merchant Promotion Service (MPS) exposes these customer facing promotion services (CPS) to merchants, allowing them to create promotions, and keep track of them.

One aspect merchants keep track of is the status of a promotion ie. if a promotion is activated, completed or deleted. These statuses are managed by CPS, while MPS simply lets merchants keep track of them.

To keep promotion status in MPS synchronized with CPS, CPS publishes status changes and MPS consumes them. eg: When the food sku discounting service changes status of a promotion to activated, they publish the new state against the promotion ID. MPS consumes these changes and keeps it’s state syncrhonized.

Need for order

If MPS doesn’t consume events in the correct order, promotions might end up in the wrong state.

Consider the state machine below. It’s a simplified version of a state machine that CPS implements.

Interactive version

If MPS simply applies the events it consumes, it might end up in the wrong state.

Correct order: CREATED -> <<activate>> -> ACTIVATED -> <<delete>> -> DELETED
Out of order: CREATED -> <<delete>> -> DELETED -> <<activate>> -> ACTIVATED

Implementing the same state machine at MPS to avoid invalid transitions doesn’t help either

Correct order: CREATED -> <<activate>> -> ACTIVATED -> <<complete>> -> COMPLETED
Out of order: CREATED -> <<complete>> -> CREATED -> <<activate>> -> ACTIVATED (complete event is ignored)

<<complete>> event is ignored as there's no transition from `CREATED -> COMPLETED`.

Different ordering of events lead to different states.

NOTE: One way to ensure events are ordered is to rely on systems which preserve order. This might not always be feasible, as the cost of setting up such systems might be high.

Building order

In the absense of systems which preserve event order, MPS needs a way to look at any two CPS events related to a single promotion, and figure out which one “happened before” the other. Using this “happened before” relationship we can come up with an order for events related to a promotion. Order of events between two different promotions is not relevant, and any arbitrary order may be chosen.

For the sake of this discussion, we assume that CPS generates the events and knows the correct order. Let’s look at a way in which CPS may communicate this order to MPS.

Consider two events e1 and e2 for a single promotion. CPS may assign them versions v1 and v2 respectively, such that, if e1 happened before e2, then v1 < v2.

We could rely on the producer (CPS in this case) to encode these versions into events.

<<{1, complete}, {0, activate>>
can be re-ordered to
<<{0, activate}, {1, complete}>>
since 0 < 1

Further reading: Time, Clocks and the Ordering of Events in a Distributed System The paper introduces the concept of logical clocks, and gives us an algorithm, which can be used to build a total order of events.

Problem with time-of-day clocks

If we use time-of-day clocks to order events, the order might not respect the “happened before” relation. In the above case, if v1 and v2 are generated by system clocks, there’s no guarantee that v1 < v2. Clocks might end up giving same values for e1 & e2. Clocks might even jump back (eg: during NTP updates) after assigning e1 a value and give e2 a lower value, making v1 > v2

Unless special care is taken to sync time-of-day clocks across processes, which preserve “happened before” (like in Google’s spanner), relying on them to order events might cause problems.

Further reading: How and why the leap second affected Cloudflare DNS Trouble with timestamps

Ordering events when source doesn’t specify order

In our case, some CPS don’t encode versions into events. But sometimes, it might be possible to totally order events even before they occur.

For promotions, we know that.

activate always happens before complete (activate < complete)
activate always happens before delete (activate < delete)

We could use the above relations to come up with either of the following total orders.

activate < complete < delete
activate < delete < complete 

Above, we may pick any order between delete and complete as they will never appear together in a valid sequence of events. (look at the CPS state machine)

This allows us to hard-code versions on the consumer side, without relying on producer (CPS) to encode versions.

activate => 1
complete => 2
delete => 3
<<complete, activate>> 
can be read as
<<{2, complete}, {1, activate}>> 
can be re-ordered as 
<<{1, activate}, {2, complete}>> 

When will this not work

In state machines where the same event may be repeated more than once in a valid history, order can’t be figured out beforehand. Eg: If the same event is used to transition to different states, or state machines where there are cyclic transitions.

TODO: Prove that if events can’t be repeated, order can always be figured out beforehand.

Correct order: activate, delete, re-activate, delete

If delete > re-activate
activate, delete (Intermediate ACTIVATED state dropped)

If re-activate > delete
activate, delete, re-activate (state stuck in ACTIVATED)

Interactive: Cyclic transitions

Getting rid of cycles

Sometimes it might be possible to remove cyclic transitions from a state machine. Let’s assume that CPS allows transitions from ACTIVATED to TEMP-DEACTIVATED via a temp-deactivate event, and TEMP-DEACTIVATED back to ACTIVATED via a activate event. This may cycle any number of times.

Let’s assume that this is a temporary transition concerning a specific CPS and merchants don’t care about it. MPS may disregard the whole cycle by assigning activate a higher weight than temp-deactivate. Once MPS reached the ACTIVATED state, temp-deactivate event will be ignored.

Building up state real time

In real time systems, sometimes it’s not be feasible to collect all events, order them and then apply them to the state machine everytime a new event comes in. We might want to apply unordered events as and when they come in, by maybe storing only the last processed event.

This might be possible in cases when we don’t care about intermediate states, and if there’s a way to determine next state of a state machine by looking at the event. (eg: like in our case where every event is associated with a single next state, or in cases where the next state might be encoded in the event)

One way to solve this is to only apply an event e2 if

  1. e2 is the first event to be processed, or
  2. If the previous processed event e1 happened before e2 ie. e1 < e2.

    [name=Rahul Jayaraman] above transaction has to be serializable

Here we need to save the previously processed event somewhere for comparison.

The algorithm ensures that consumer state eventually reaches the same state as the producer. Another interesting property is that the system can tolerate loss of any or all intermediate events.

Conclusion

As an event publisher, be nice to downstream systems by encoding monotonic versions in the event, if you can.