Scaling Limits

MQL Queries that implement aggregates (COUNT, SUM, etc.), as well as those that use windowing (like window or group by), must necessarily maintain state.

The Mantis Publish library) and the Source Jobs do not perform MQL aggregation on the server side; consequently this must take place on the client side. This limits the blast radius if a client runs a long window or runs an aggregate against a large amount of data. However this creates potential scaling problems on the client side. This page will help you address such problems.

Details

There are two primary technical concerns in these cases:

  1. Windowing must necessarily buffer the output until the end of the window.
  2. group by cannot run multithreaded as RxJava does not guarantee pairs with the same key end up on the same thread.

In the first case, as a client you’re simply facing a memory issue as you must have enough memory to buffer until the end of each window. However usually when you window, you are also performing some form of aggregate against the window (otherwise there is little point in windowing rather than just emitting the data immediately).

The second case requires a single thread in order to guarantee that aggregate calculations against groups will be performed together.

A naive solution would be to always run MQL on a single thread. This solves the problem but is the least-scalable solution and also impacts queries do not require such restrictive behavior.

Solutions

There are a number of possible approaches you can take to address this problem. These primarily involve a tradeoff between engineering effort and effectiveness.

Solution A (Implemented)

The MQL library includes isAggregate(), which is a function of StringBoolean that indicates whether or not a given query requires an aggregate operation. This allows you to test a query to determine if if requires single-threading and handle it appropriately if so.

This represents a low-effort / medium-effectiveness tradeoff; only queries that require the serialized behavior will suffer the performance penalty.

Solution B (TBD?)

A more scalable solution would be to enable MQL to emit additional formats. An example of this would be a Mantis topology backend that could emit multi-stage Mantis Jobs that implement the desired operation in a scalable fashion. A group-by stage that distributes values to downstream workers using consistent hashing would be able to scale horizontally based on the number of groups (still vertically for window size).

This solution requires significantly more engineering effort, but represents a very large increase in scalability for these queries. It would be a possibly-fruitful road for the Mantis open source software project to travel.