The Processing Stage Component

A Processing Stage component of a Mantis Job processes the stream of data by transforming it in some way. You can combine multiple Processing Stages into a single Job, or you can create a Job that consists of a single Processing Stage.

In its simplest form, a Processing Stage is a chain of RxJava operators operating on the Observable provided by the Source component.
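To make the idea of an operator chain concrete, here is a minimal sketch. It is not Mantis code: `java.util.stream.Stream` stands in for the Observable supplied by the Source so that the example runs without RxJava on the classpath, and the class and method names are hypothetical. In RxJava the same shape would typically be written with `map` and `filter` on an Observable.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: Stream is only a stand-in for the Source's Observable.
final class OperatorChainSketch {

    // The "stage" is just a chain of operators applied to whatever the source emits.
    static Stream<String> process(Stream<String> source) {
        return source
                .map(String::trim)                // normalize each event
                .filter(line -> !line.isEmpty())  // drop blank events
                .map(String::toUpperCase);        // transform each remaining event
    }

    public static void main(String[] args) {
        List<String> out = process(Stream.of(" a ", "", "b")).collect(Collectors.toList());
        System.out.println(out); // prints [A, B]
    }
}
```

Each operator sees one element at a time and keeps no state across elements, which is the property that makes a transformation of this kind a scalar ("narrow") one.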

Transformations can be broadly categorized into two types: Scalar or Grouped. There are four varieties of Processing Stage, based on what type of transformation they accomplish:

  1. Scalar-to-Scalar — Also known as “narrow transformation,” this variety of Stage converts an Observable<T> into an Observable<R>. Such a Stage extends the ScalarToScalar class.
    [Figure: the Scalar-to-Scalar stage]
  2. Scalar-to-Key/Scalar-to-Group — Also known as “widening transformation,” this variety of Stage groups each element emitted by the source Observable by key. Typically you use such a Stage when you build a map/reduce-style job in which you need to perform stateful computations but the volume of data is too large to fit on a single worker. In such a model, the incoming data is divided into multiple streams, one per group. A subsequent Stage will typically do the stateful computation (for instance, calculating percentiles for the group). The purpose of this Scalar-to-Key Stage is to tag each incoming element with the group that it belongs to. Mantis then takes care of routing all traffic for the same group to the same worker in the subsequent Stage.
    [Figure: the Scalar-to-Key stage]
    1. Scalar-to-Key — This is the legacy way of grouping data (it is more elegant but comes with a performance penalty). You extend the ScalarToKey class and transform an Observable<T> into an Observable<GroupedObservable<K,R>> (where K is the key). See the RxJava groupBy operator for more information.
    2. Scalar-to-Group — This is a more efficient way to group data. You extend the ScalarToGroup class and transform an Observable<T> into an Observable<MantisGroup<K,R>> (where K is the key). This avoids the overhead associated with creating a GroupedObservable, which can limit the number of groups it is possible to create.
  3. Key-to-Scalar/Group-to-Scalar — Once you have split a stream, you need a stage that can take grouped data and return it to a scalar form.
    [Figure: the Key-to-Scalar stage]
    1. Key-to-Scalar — This is the legacy method and is designed for streams that have been split via a ScalarToKey Stage. You extend the KeyToScalar class and transform a GroupedObservable<K,T> into an Observable<T>.
    2. Group-to-Scalar — This is the newer, faster method. It is less elegant, in that you must transform an Observable<MantisGroup> that contains payloads from all groups, and you must therefore manage the per-group state. Typically you would do this via a map that holds per-group state and evicts entries that haven’t been touched recently. This method has much better performance than the Key-to-Scalar method because it omits the overhead around RxJava’s GroupedObservable.
  4. Key-to-Key/Group-to-Group — You can further split a keyed stream by grouping it again if your use case requires it.
    [Figure: the Key-to-Key stage]
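The grouped varieties above can be illustrated with a small sketch of the two halves of a map/reduce-style job: tagging each element with its group, then keeping per-group state in a map that evicts idle entries. This is plain Java under stated assumptions, not the Mantis API: `Tagged` is a hypothetical stand-in for a keyed element (it is not the real MantisGroup class), `workerFor` is only an illustrative routing rule (Mantis does its own routing), and the "key=value" event format, running-sum state, and idle limit are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

final class GroupedStageSketch {

    // Hypothetical keyed element; stands in for a group-tagged payload.
    static final class Tagged {
        final String key;
        final long value;
        Tagged(String key, long value) { this.key = key; this.value = value; }
    }

    // Scalar-to-Group step: tag a raw "key=value" event with its group key.
    static Tagged tag(String event) {
        String[] parts = event.split("=", 2);
        return new Tagged(parts[0], Long.parseLong(parts[1]));
    }

    // Illustrative routing rule only: the same key always yields the same worker index.
    static int workerFor(String key, int workers) {
        return Math.floorMod(key.hashCode(), workers);
    }

    // Group-to-Scalar step: one stream carries all groups, so state is kept per key.
    static final class PerGroupSums {
        private static final class State { long sum; long lastTouched; }

        private final Map<String, State> states = new HashMap<>();
        private final long idleLimitMillis;

        PerGroupSums(long idleLimitMillis) { this.idleLimitMillis = idleLimitMillis; }

        // Updates the group's running sum and returns it as a scalar result.
        long accept(Tagged t, long nowMillis) {
            evictIdle(nowMillis);
            State s = states.computeIfAbsent(t.key, k -> new State());
            s.sum += t.value;
            s.lastTouched = nowMillis;
            return s.sum;
        }

        // Drops groups not touched within the idle limit. Scanning on every
        // event is a simplification; a real stage would likely do this on a timer.
        private void evictIdle(long nowMillis) {
            states.values().removeIf(s -> nowMillis - s.lastTouched > idleLimitMillis);
        }

        int liveGroups() { return states.size(); }
    }

    public static void main(String[] args) {
        PerGroupSums sums = new PerGroupSums(100);
        sums.accept(tag("a=3"), 0);
        System.out.println(sums.accept(tag("a=4"), 20)); // prints 7
    }
}
```

The explicit map is the price of skipping GroupedObservable: the stage has to decide for itself how long a group's state lives, which is why the eviction rule is part of the sketch rather than an afterthought.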