Processing Stage
A Processing Stage component of a Mantis Job processes the stream of data by [transforming] it in some way. You can combine multiple Processing Stages into a single Job, or you can create a Job that consists of a single Processing Stage.
In its simplest form, a Processing Stage is a chain of RxJava operators operating on the Observable provided by the Source component.
Note
There is an alternate implementation that allows writing a Mantis Job as a series of operators operating directly on a MantisStream
instance which abstracts information about RxJava, Observables from the user and offers a simpler way (hopefully 🤞) to write mantis jobs. Please see Mantis DSL docs for more details
Transformations can be broadly categorized into two types: Scalar or Grouped. There are four varieties of Processing Stage, based on what type of transformation they accomplish:
- Scalar-to-Scalar — Also known as “narrow transformation,” this variety of Stage converts an
Observable<T>
into anObservable<R>
. Such a Stage extends theScalarToScalar
class. - Scalar-to-Key/Scalar-to-Group — Also known as “widening transformation,” this variety of Stage groups each element emitted by the source Observable by key. Typically you use such a Stage when you build a map/reduce-style job in which you need to perform stateful computations but the volume of data is too large to fit on a single worker. In such a model, the incoming data is divided into multiple streams, one per group. A subsequent State will typically to the stateful computation (for instance, calculating percentiles for the group). The purpose of this Scalar-to-Key State is to tag each incoming element with the group that it belongs to. Mantis then takes care of routing all traffic for the same group to the same worker in the subsequent stage.
- Scalar-to-Key — This is the legacy way of grouping data (it is more elegant but comes with a performance penalty). You extend the
ScalarToKey
class and transform anObservable<T>
into anObservable<GroupedObservable<K,R>>
(whereK
is the key). See the RxJavagroupBy
operator for more information. - Scalar-to-Group — This is a more efficient way to group data. You extend the
ScalarToGroup
class and transform anObservable<T>
into anObservable<MantisGroup<K,R>>
(whereK
is the key). This avoids the overhead associated with creating aGroupedObservable
which can limit the number of groups it is possible to create.
- Scalar-to-Key — This is the legacy way of grouping data (it is more elegant but comes with a performance penalty). You extend the
- Key-to-Scalar/Group-to-Scalar — Once you have split a stream, you need a stage that can take grouped data and return it to a scalar form.
- Key-to-Scalar — This is the legacy method and is designed for streams that have been split via a
ScalarToKey
Stage. You extend theKeyToScalar
class and transform aGroupedObservable<K,T>
into anObservable<T>
. - Group-to-Scalar — This is the newer, faster method. It is less elegant, in that you must transform an
Observable<MantisGroup>
that contains payloads from all groups, and you must therefore manage the per-group state. Typically you would do this via a map that holds per-group state and evicts entries that haven’t been touched recently. This method has much better performance than the Key-to-Scalar method because it omits the overhead around RxJava’sGroupedObservable
.
- Key-to-Scalar — This is the legacy method and is designed for streams that have been split via a
- Key-to-Key/Group-to-Group — You can further split a keyed stream by grouping it again if you have some use case that requires this.