Skip to content

Introduction

A Mantis Job is a JVM-based stream processing application that takes in an Observable stream of data items, transforms this stream by using RxJava operators, and then outputs the results as another Observable stream.

RxJava Observables can be visualized by using “marble diagrams”:

RxJava Observables can be represented by “marble diagrams”

You can combine multiple RxJava operators to transform an Observable stream of items in many ways:

An incoming stream is processed by a Mantis Job composed of the operators map and merge, to compose an output stream

The above diagram shows a Mantis Job composed of two operators that process an input stream to compose an output stream. The first operator, map, emits a new Observable for each item emitted by the source Observable. The second operator, merge, emits each item emitted by those Observables as a fresh Observable stream.

There is an enormous wealth of ways in which you can transform streams of data by using RxJava Observable operators.

If the volume of data to be processed is too large for a single worker to handle, you can “divide and conquer” by grouping and dividing the operators across various processing stages, as in the following diagram:

An incoming stream is processed by a Mantis Job composed of three stages, one composed of the groupBy operator, the second by the window and reduce operators, and the third composed of the merge operator, together resulting in an output stream

Mantis Job Clusters

You define a Job Cluster before you submit Jobs. A Job Cluster is a containing entity for Jobs. It defines metadata and certain service-level agreements (SLAs). Job Clusters ease Job lifecycle management and Job revisioning.

For example, by setting the SLA of a Job Cluster to Min=1 and Max=1, you ensure that exactly one Job instance is always running for that Cluster. The Job Cluster also has default Job parameters that any new Jobs submitted to the Cluster inherit. You can update new Job artifacts into the Job Cluster so that the next Job submission picks up the latest version.

Components of a Mantis Job

A Mantis Job has three components, each of which has a corresponding chapter in this documentation:

  1. Source — Fetches data from an external source and makes it available in the form of an Observable.

  2. Processing Stage — Transforms the Observable stream by means of a variety of operators. In the new [DSL], we can think of it as the middle instead of a combination of stages — read more

  3. Sink — Pushes the resulting Observable out in the form of a fresh stream.

Note

There is an alternate implementation that allows writing a Mantis Job as a series of operators operating directly on a MantisStream instance which abstracts information about RxJava, Observables from the user and offers a simpler way (hopefully 🤞) to write mantis jobs. Please see Mantis DSL docs for more details

A Mantis Job is composed of three components: the “Source” fetches data in a streaming, non-blocking, back-pressure-aware manner; the “Stage” transforms the incoming stream by applying high-level functions such as map, scan, reduce, et cetera; the “Sink” pushes the results of the transformation in a non-blocking, asynchronous manner.

Directory Structure of a Mantis Job

In addition to the source files, Mantis requires some meta-files to be present in the Job artifact.

Here is a sample directory structure:

src/
  - main/
    - java/
      - com/
        - myorg/
          - MyJob.java
    - resources/
      - META-INF/
        - services/
          - io.mantisrx.runtime.MantisJobProvider
      - job.properties
      - job-test.properties
  • io.mantisrx.runtime.MantisJobProvider (required) — lists the fully-qualified name of the Java class that implements the MantisJobProvider interface
  • job.properties and job-test.properties (optional) — required only if you are initializing the platform via the .lifecycle() method

Creating a Mantis Job

To create a Mantis Job, call MantisJob…create(). When you do so, interpolate the following builder methods. The first three — .source(), .stage(), and .sink() — are required, they must be the first three of the methods that you call, and you must call them in that order:

  1. .source(AbstractJobSource) — required, see The Source Component
  2. .stage(Stage, stageConfig) — required, (call this one or more times) see The Processing Stage Component. Alternatively,
    .filter|.map|.flatMap|.keyBy — required, (call these operators one or more times) an alternative to .stage to simplify job description. See Mantis DSL
  3. .sink(Sink) — required, see The Sink Component
  4. .lifecycle(Lifecycle) — optional, allows for start-up and shut-down hooks
  5. .parameterDefinition(ParameterDefinition) — optional, (call this zero to many times) define any parameters your job requires here
  6. .metadata(Metadata) — optional, (call this zero to many times) define your job name and description here
of this class this method returns an object of this class
MantisStream init() MantisStream
MantisStream source() MantisStream
MantisStream sink() MantisStream
MantisStream map, flatMap, ... MantisStream
MantisStream keyBy() KeyedMantisStream
KeyedMantisStream map, window, ... KeyedMantisStream
KeyedMantisStream reduce MantisStream
MantisJob source() SourceHolder
SourceHolder stage() Stages
Stages stage() Stages
Stages sink() Config
Config lifecycle() Config
Config parameterDefinition() Config
Config metadata() Config
Config create() Job

Lifecycle Management

You can establish start-up and shut-down procedures for your Mantis Job by means of the .lifecycle(). builder method.

Pass this method a Lifecycle object, that is to say, an object that implements the following methods:

  • startup() — initialize arbitrary application configs, perform dependency injection, and any long running or shared service libraries.
  • shutdown() — gracefully close connections, shut down long running or shared service libraries, and general process cleanup.
  • getServiceLocator() — returns a ServiceLocator that implements the service(key) method. Implement this method to return your dependency injection object such as Guice.

Defining Parameters

To create a Parameter in order to pass it to the .parameterDefinition() builder method of the MantisJob builder, use the following new ParameterVariety()…build() builder methods:

  • .name(string) — a user-friendly name for your Parameter
  • .description(string) — a user-friendly description of your Parameter
  • .defaultValue(value) — the value of this Parameter if the Job does not override it
  • .validator(Validator) — a way of checking the proposed Parameter values for validity so bad values can be rejected before you submit the Job
  • .required() — call this builder method if the Job must provide a value for this Parameter

There are some built-in Parameter varieties you can choose from that correspond to common data types:

  • BooleanParameter
  • DoubleParameter
  • IntParameter
  • StringParameter

Validators

There are some standard Validators you can choose from that cover some common varieties of parameters:

  • Validators.range(start, end) — will reject as invalid Parameter values that do not lie between the indicated start and end numerical values (where start and end themselves are valid Parameter values)
  • Validators.notNullOrEmpty() — will reject empty strings or null values
  • Validators.alwaysPass() — will not reject any Parameter values as invalid

Example

For example:

myStringParameter = new StringParameter()
    .name("MyParameter")
    .description("This is a human-friendlydescription of my parameter")
    .validator(Validators.notNullOrEmpty())
    .defaultValue("SomeValue")
    .required()
    .build();

Defining Metadata

To create metadata in order to pass it to the .metadata() builder method of the MantisJob builder, use the following new Metadata.Builder()…build() builder methods:

  • .name(string)
  • .description(string)

For example:

myMetadata = new Metadata.Builder()
    .name("MyMetadata")
    .description("Description of my metadata")
    .build();