1. Introduction
This is the reference documentation for Genie version 4.0.0-rc.70. Here you will find high level concept documentation as well as documentation about how to install and configure this Genie version.
For more Genie information see the website.
For documentation of the REST API for this version of Genie please see the API Guide.
For a demo of this version of Genie please see the Demo Guide.
2. 4.0.0-rc.70 Release Notes
The following are the release notes for Genie 4.0.0-rc.70.
2.1. Features
- Upgrade to Spring Boot 2.0
- Switch metric collection to use Micrometer
  - Allows users to plug in their own backend (Atlas, DataDog, etc.)
  - Standard for Boot 2.0
- Switched to JSR 310 classes for date and time manipulation
- Upgrade to JPA 2.2
- Integrate gRPC for agent-to-server communication
- Break 3.x security implementations into genie-security module
2.2. Library Upgrades
- Spring Boot 2.0
  - Many changes; see their Release Notes and Migration Guide
- Upgrade to Hibernate 5.2.x
  - Supports the JPA 2.2 specification and JSR 310 Java 8 time classes (including Instant, which isn't supported by JPA 2.2)
- Switched to HikariCP for the DB connection pool
2.3. Property Changes
Many Spring properties were changed as part of the Boot 2.0 release. Those are not documented here. Please refer to their documentation.
2.3.1. Added
Property | Description | Default Value |
---|---|---|
cloud.aws.credentials.useDefaultAwsCredentialsChain | Whether to attempt creation of a standard AWS credentials chain. See Spring Cloud AWS for more information. | true |
cloud.aws.region.auto | Whether to attempt auto-detection of the AWS region via the AWS metadata service on EC2. See Spring Cloud AWS for more information. | false |
cloud.aws.region.static | The default AWS region. See Spring Cloud AWS for more information. | us-east-1 |
cloud.aws.stack.auto | Whether auto stack detection is enabled. See Spring Cloud AWS for more information. | false |
management.endpoints.web.base-path | The default base path for the Spring Actuator (https://docs.spring.io/spring-boot/docs/current/actuator-api/html/) management endpoints. Switched from default … | /admin |
genie.aws.s3.buckets.[bucketName].roleARN | For the bucket with name … | |
genie.aws.s3.buckets.[bucketName].region | The AWS region the bucket with … | |
genie.grpc.server.services.job-file-sync.ackIntervalMilliseconds | How many milliseconds to wait between checks whether some acknowledgement should be sent to the agent regardless of whether the … | 30,000 |
genie.grpc.server.services.job-file-sync.maxSyncMessages | How many messages to receive from the agent before an acknowledgement message is sent back from the server | 10 |
spring.data.redis.repositories.enabled | Whether Spring Data repositories are enabled on top of Redis as the backend store | false |
genie.agent.filter.enabled | If set to … | |
genie.agent.filter.version.minimum | (Dynamic) The minimum version an agent needs to be (e.g., … | |
genie.agent.filter.version.blacklist | (Dynamic) A regex matched against agent version (e.g., … | |
genie.agent.filter.version.whitelist | (Dynamic) A regex matched against agent version (e.g., … | |
2.3.2. Renamed
Old Name | New Name | Reason |
---|---|---|
genie.tasks.clusterChecker.* | genie.tasks.cluster-checker.* | Modifications to how Spring handled property binding |
genie.tasks.databaseCleanup.* | genie.tasks.database-cleanup.* | Modifications to how Spring handled property binding |
genie.tasks.diskCleanup.* | genie.tasks.disk-cleanup.* | Modifications to how Spring handled property binding |
genie.jobs.clusters.loadBalancers.script.* | genie.jobs.clusters.load-balancers.script.* | Modifications to how Spring handled property binding |
genie.jobs.users.activeLimit.* | genie.jobs.active-limit.* | Modifications to how Spring handled property binding |
genie.zookeeper.enabled | spring.cloud.zookeeper.enabled | Leverage existing code rather than re-invent the wheel |
genie.zookeeper.connectionString | spring.cloud.zookeeper.connectString | Leverage existing code rather than re-invent the wheel |
2.3.3. Removed
Name | Reason |
---|---|
genie.aws.credentials.file | No longer necessary as … |
genie.aws.credentials.role | Replaced with … |
spring.jackson.date-format | Switched to … |
genie.jobs.clusters.load-balancers.script.order | This behavior was to control the order in which the script load balancer was evaluated relative to other cluster load balancers. This logic is more a framework or execution configuration than anything specific to the class itself; it shouldn't know how it's used. The new mechanism is to provide an … |
genie.jobs.cleanup.deleteArchiveFile | Archival is now done in a flat structure to the destination location |
2.4. Metric Changes
Switched to Micrometer
2.4.1. Added
Name | Reason |
---|---|
genie.services.agentJob.handshake.counter | Count usages of the 'handshake' protocol, record agent metadata and decision made |
2.4.2. Renamed
Old Name | New Name | Reason |
---|---|---|
genie.jobs.coordination.clusterCommandQuery.timer | genie.services.specification.clusterCommandQuery.timer | Functionality moved to new service |
genie.jobs.submit.selectCluster.loadBalancer.counter | genie.services.specification.loadBalancer.counter | Functionality moved to new service |
genie.jobs.submit.localRunner.selectApplications.timer | genie.services.specification.selectApplications.timer | Functionality moved to new service |
genie.jobs.submit.localRunner.selectCluster.timer | genie.services.specification.selectCluster.timer | Functionality moved to new service |
genie.jobs.submit.selectCluster.noneSelected.counter | genie.services.specification.selectCluster.noneSelected.counter | Functionality moved to new service |
genie.jobs.submit.selectCluster.noneFound.counter | genie.services.specification.selectCluster.noneFound.counter | Functionality moved to new service |
genie.jobs.submit.localRunner.selectCommand.timer | genie.services.specification.selectCommand.timer | Functionality moved to new service |
3. 3.3.0 Release Notes
The following are the release notes for Genie 3.3.0.
3.1. Features
- Complete database schema and interaction code re-write for more normalization
  - Allows more insights into job and user behavior by breaking apart large JSON blobs and other denormalized fields
  - Improved cluster selection algorithm to speed up selection
  - Projections on tables improve data transfer speeds
  - Merge jobs tables to reduce duplicate data
  - Surrogate primary keys for improved join performance and space usage vs. String-based external unique ids
- New fields added to jobs
  - grouping
    - A way to provide search for jobs related to each other. E.g. the name of an entire workflow in a job scheduler can be set in this field to provide a way to find all the jobs related to this workflow
    - Added to search API as optional field
  - groupingInstance
    - Building on grouping, this provides a field for the unique instance of the grouping, e.g. the run identifier of the workflow
    - Added to search API as optional field
- New field(s) added to Job Request, Job, Cluster, Command, Application
  - metadata
    - Allows users to insert any additional metadata they wish to these resources. MUST be valid JSON.
    - Stored as a blob, so no search is available. Meant for use by higher level systems to take the metadata and parse it themselves for use in building up business use cases (lineage, relationships, etc.) that the Genie data model doesn't support natively
- Switch to H2 for in-memory database
- Turn on Hibernate schema validation at boot
3.2. Upgrade Instructions
Flyway will upgrade the database schema for you. For performance reasons at large scale, job data is not copied over between versions by default. Data for applications, commands and clusters is copied so as not to interrupt operation. If you want to preserve your old job data, the old tables are retained as {tableName}_old, and scripts exist for MySQL and PostgreSQL to copy the job data over. You can execute these scripts on your database; they should be able to run while your application is active and copy over the data in the background.
If you run the data movement scripts, they will remove the old tables. If you don't, the old tables will remain in your schema. The next major Genie release will remove these tables in its schema upgrade scripts if they still exist. Feel free to drop them yourself if they're no longer needed.
3.3. Library Upgrades
- Upgrade Spring Boot to 2.2.5.RELEASE
- Upgrade to Spring Cloud Hoxton.SR3 for cloud dependency management
4. 3.2.0 Release Notes
The following are the release notes for Genie 3.2.0.
4.1. Upgrade Instructions
If upgrading from an existing 3.1.x installation, run the appropriate database upgrade script:
This must be done before deploying the 3.2.0 binary or Flyway will break. Going forward this will no longer be necessary; the Genie binary will package upgrade scripts and Flyway will apply them automatically.
Once the script is run you can deploy the 3.2.0 binary. Once successfully deployed, you should see a new table, schema_version, in your db schema. Do not delete or modify this table; it is used by Flyway to manage upgrades.
4.2. Features
- Database improvements
  - Switch to Flyway for database upgrade management
- Abstract internal eventing behind a common interface
- Bug fixes
4.3. Library Upgrades
- Upgrade Spring Boot to 1.5.7.RELEASE
- Upgrade to Spring Platform IO Brussels-SR5 for library dependency management
- Upgrade to Spring Cloud Dalston.SR3 for cloud dependency management
4.5. Database Upgrades
- Standardize database schemas for consistency
- Switch to Flyway for database upgrade management
- If using MySQL, version 5.6.3+ is now required due to needed properties. See Installation for details
5. 3.1.0 Release Notes
The following are the release notes for Genie 3.1.0.
5.1. Features
- Spring Session support made more flexible
  - Can now support none (off), Redis, JDBC and HashMap as session data stores based on the spring.session.store-type property
- Actuator endpoints secured by default
  - Follows new Spring default
  - Turn off by setting management.security.enabled to false
- Optional cluster load balancer via admin-supplied script
- Add dependencies to the Cluster and Command entities
- Add configurations to the JobRequest entity
5.2. Library Upgrades
- Upgrade Spring Boot from 1.3.8.RELEASE to 1.5.4.RELEASE
- Upgrade to Spring Platform IO Brussels-SR3 for library dependency management
- Upgrade to Spring Cloud Dalston.SR2 for cloud dependency management
- Removal of Spring Cloud Cluster
  - Spring Cloud Cluster was deprecated and the leadership election functionality previously leveraged by Genie was moved to Spring Integration Zookeeper. That library is now used.
- Tomcat upgraded to 8.5 from 8.0
5.3. Property Changes
5.3.1. Added
Property | Description | Default Value |
---|---|---|
genie.jobs.clusters.loadBalancers.script.destination | The location on disk where the script source file should be stored after it is downloaded from … | |
genie.jobs.clusters.loadBalancers.script.enabled | Whether the script based load balancer should be enabled for the system or not. See also: … | false |
genie.jobs.clusters.loadBalancers.script.order | The order in which the script load balancer should be evaluated. The lower this number the sooner it is evaluated. 0 would be the first thing evaluated if nothing else is set to 0 as well. Must be < 2147483647 (Integer.MAX_VALUE). If no value is set it will be given Integer.MAX_VALUE - 1 (default). | 2147483646 |
genie.jobs.clusters.loadBalancers.script.refreshRate | How frequently to refresh the load balancer script (in milliseconds) | 300000 |
genie.jobs.clusters.loadBalancers.script.source | The location of the script the load balancer should load to evaluate which cluster to use for a job request | file:///tmp/genie/loadBalancers/script/source/loadBalance.js |
genie.jobs.clusters.loadBalancers.script.timeout | The amount of time (in milliseconds) that the system will attempt to run the cluster load balancer script before it forces a timeout | 5000 |
genie.tasks.databaseCleanup.batchSize | The number of jobs to delete from the database at a time. Genie will loop until all jobs older than the retention time are deleted. | 10000 |
management.security.roles | The roles a user needs to have in order to access the Actuator endpoints | ADMIN |
security.oauth2.resource.filter-order | The order in which the OAuth2 resource filter is placed within the Spring Security chain | 3 |
spring.data.redis.repositories.enabled | Whether Spring Data repositories should attempt to be created for Redis | true |
spring.session.store-type | The back end storage system for Spring to store HTTP session information. See Spring Boot Session for more information. Currently on the classpath only none, hash_map, redis and jdbc will work. | hash_map |
5.3.2. Changed Default Value
Property | Old Default | New Default |
---|---|---|
genie.tasks.clusterChecker.healthIndicatorsToIgnore | memory,genie,discoveryComposite | memory,genieMemory,discoveryComposite |
management.security.enabled | false | true |
5.3.4. Renamed
Old Name | New Name |
---|---|
multipart.max-file-size | spring.http.multipart.max-file-size |
multipart.max-request-size | spring.http.multipart.max-request-size |
spring.cloud.cluster.leader.enabled | genie.zookeeper.enabled |
spring.cloud.cluster.zookeeper.connect | genie.zookeeper.connectionString |
spring.cloud.cluster.zookeeper.namespace | genie.zookeeper.leader.path |
spring.datasource.min-idle | spring.datasource.tomcat.min-idle |
spring.datasource.max-idle | spring.datasource.tomcat.max-idle |
spring.datasource.max-active | spring.datasource.tomcat.max-active |
spring.datasource.validation-query | spring.datasource.tomcat.validation-query |
spring.datasource.test-on-borrow | spring.datasource.tomcat.test-on-borrow |
spring.datasource.test-on-connect | spring.datasource.tomcat.test-on-connect |
spring.datasource.test-on-return | spring.datasource.tomcat.test-on-return |
spring.datasource.test-while-idle | spring.datasource.tomcat.test-while-idle |
spring.datasource.min-evictable-idle-time-millis | spring.datasource.tomcat.min-evictable-idle-time-millis |
spring.datasource.time-between-eviction-run-millis | spring.datasource.tomcat.time-between-eviction-run-millis |
spring.jpa.hibernate.naming-strategy | spring.jpa.hibernate.naming.strategy |
5.4. Database Upgrades
- Add cluster and command dependencies tables
- Rename MySQL and PostgreSQL schema files
- Index 'name' column of Jobs table
- Switch Job and JobRequest tables' 'description' column to text
- Switch Applications' table 'cluster_criterias' and 'command_criteria' columns to text
- Increase the size of 'tags' column for applications, clusters, commands, jobs, job_requests
- Switch JobRequest table 'dependencies' column to text
- Add job request table configs column
- Double the size of 'config' and 'dependencies' columns for Application, Cluster, Command
6. Concepts
6.1. Data Model
The Genie 3 data model contains several modifications and additions to the Genie 2 data model to enable even more flexibility, modularity and metadata retention. This section will go over the purpose of the various resources available from the Genie API and how they interact together.
6.1.1. Caveats
- The specific resource fields are NOT defined in this document. These fields are available in the REST API documentation
6.1.2. Resources
The following sections describe the various resources available from the Genie REST APIs. You should reference the API Docs for how to manipulate these resources. These sections will focus on the high level purposes for each resource and how they rely and/or interact with other resources within the system.
6.1.2.1. Tagging
An important concept is the tagging of resources. Genie relies heavily on tags for how the system discovers resources like clusters and commands for a job. Each of the core resources has a set of tags that can be associated with it. These tags can be set to whatever you want, but it is recommended to come up with some sort of consistent structure for your tags to make it easier for users to understand their purpose. For example, at Netflix we've adopted some of the following standards for our tags:
- sched:{something}
  - This corresponds to any schedule that this resource (likely a cluster) is expected to adhere to
  - e.g. sched:sla or sched:adhoc
- type:{something}
  - e.g. type:yarn or type:presto for a cluster, or type:hive or type:spark for a command
- ver:{something}
  - The specific version of said resource
  - e.g. two different Spark commands could have ver:1.6.1 vs ver:2.0.0
- data:{something}
  - Used to classify the type of data a resource (usually a command) will access
  - e.g. data:prod or data:test
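As a concrete illustration of this convention, the sketch below (hypothetical helper code, not part of Genie) parses `{category}:{value}` tags and groups a cluster's tags by category:

```python
# Hypothetical helper, not part of Genie: parse "{category}:{value}" tags.
def parse_tag(tag):
    """Split a tag like 'sched:sla' into its category and value."""
    category, _, value = tag.partition(":")
    return category, value

# Example tags for a production YARN cluster.
cluster_tags = ["sched:sla", "type:yarn", "ver:2.7.0", "ver:2.7"]

# Group the cluster's tags by category for easy inspection.
by_category = {}
for tag in cluster_tags:
    category, value = parse_tag(tag)
    by_category.setdefault(category, set()).add(value)

print(sorted(by_category["ver"]))  # ['2.7', '2.7.0']
```

Note how a resource can legitimately carry several tags in the same category (e.g. both a major and a patch version) so that looser or tighter criteria can match it.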
6.1.2.2. Configuration Resources
The following resources (applications, commands and clusters) are considered configuration, or admin, resources. They're generally set up by the Genie administrator and available to all users for use with their jobs.
6.1.2.2.1. Application Resource
An application resource represents pretty much what you'd expect. It is a reusable set of binaries, configuration files and setup files that can be used to install and configure (surprise!) an application. Generally these resources are used when an application isn't already installed and on the PATH on a Genie node.
When a job is run, Genie will download all the dependencies, configuration files and setup files of each application and cluster and store them all in the job working directory. It will then execute the setup script in order to install that application for that job. Genie is "dumb" as to the contents or purpose of any of these files, so the onus is on the administrators to create and test these packages.
Applications are very useful for decoupling application binaries from a Genie deployment. For example you could deploy a Genie cluster and change the version of Hadoop, Hive, Spark that Genie will download without actually re-deploying Genie. Applications can be combined together via a command. This will be explained more in the Command section.
The first entity to talk about is an application. Applications are linked to commands in order for binaries and configurations to be downloaded and installed at runtime. Within Netflix this is frequently used to deploy new clients without redeploying a Genie cluster.
At Netflix our applications frequently consist of a zip of all the binaries uploaded to S3, along with a setup file to unzip and configure environment variables for that application.
It is important to note applications are entirely optional; if you prefer to just install all client binaries on a Genie node beforehand you're free to do that. It will save overhead in job launch times but you will lose flexibility in the trade-off.
6.1.2.2.2. Command Resource
Command resources primarily represent what a user would enter at the command line to run a process on a cluster, and what binaries (i.e., applications) would need to be on the PATH to make that possible.
Commands can have configuration, setup and dependency files just like applications, but primarily they should have an ordered list of applications associated with them if necessary. For example, let's take a typical scenario involving running Hive. To run Hive you generally need a few things:
- A cluster to run its processing on (we'll get to that in the Cluster section)
- A hive-site.xml file which says what metastore to connect to, amongst other settings
- Hive binaries
- Hadoop configurations and binaries
So a typical setup for Hive in Genie would be to have one, or many, Hive commands configured. Each command would have its own hive-site.xml pointing to a specific metastore (prod, test, etc). Alternatively, site-specific configuration can be associated to clusters and will be available to all commands executing against it. The command would depend on Hadoop and Hive applications already configured which would have all the default Hadoop and Hive binaries and configurations. All this would be combined in the job working directory in order to run Hive.
You can have any number of commands configured in the system. They should then be linked to the clusters they can execute on. Clusters are explained next.
6.1.2.2.3. Cluster
A cluster stores all the details of an execution cluster including connection information, properties, etc. Some cluster examples are Hadoop, Spark, Presto, etc. Every cluster can be linked to a set of commands that it can run.
Genie does not launch clusters for you. It merely stores metadata about the clusters you have running in your environment so that jobs using commands and applications can connect to them.
Once a cluster has been linked to commands your Genie instance is ready to start running jobs. The job resources are described in the following section. One important thing to note is that the list of commands linked to the cluster is a priority ordered list. That means if you have two pig commands available on your system for the same cluster the first one found in the list will be chosen provided all tags match. See How it Works for more details.
6.1.2.3. Job Resources
The following resources all relate to a user submitting and monitoring a given job. They are split up from the Genie 2 Job idea to provide better separation of concerns, as usually a user doesn't care about certain things, such as what node a job ran on or its Linux process exit code.
Users interact with these entities directly, though all but the initial job request are read-only in the sense that you can only get their current state back from Genie.
6.1.2.3.1. Job Request
This is the resource you use to kick off a job. It contains all the information the system needs to run a job. Optionally the REST APIs can take attachments. All attachments and file dependencies are downloaded into the root of the job's working directory. The most important aspects are the command line arguments, the cluster criteria and the command criteria. These dictate which cluster, command and arguments will be used when the job is executed. See the How it Works section for more details.
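To make the shape of such a request concrete, here is a sketch of a job request payload. The cluster and command criteria fields are the ones described in this section; the remaining field names and all values are illustrative assumptions, so consult the REST API documentation for the authoritative schema:

```python
# Hypothetical job request payload. clusterCriterias and commandCriteria are the
# criteria fields described in this section; the other field names and all the
# values here are illustrative assumptions, not Genie's exact schema.
import json

job_request = {
    "name": "my-spark-job",       # assumed metadata fields
    "user": "jdoe",
    "version": "1.0",
    "commandArgs": "--num-executors 2 my_script.py",
    "clusterCriterias": [
        {"tags": ["sched:sla", "type:yarn"]},  # preferred: an SLA YARN cluster
        {"tags": ["type:yarn"]},               # fallback: any YARN cluster
    ],
    "commandCriteria": ["type:sparksubmit"],
    # Dependencies must live somewhere Genie can reach (S3, web server, etc.).
    "dependencies": ["s3://my-bucket/my_script.py"],
}

print(sorted(job_request))
```

The ordered `clusterCriterias` list is what enables the fallback behavior described later in the How it Works section.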
6.1.2.3.2. Job
The job resource is created in the system after a Job Request is received. All the information a typical user would be interested in should be contained within this resource. It has links to the command, cluster and applications used to run the job as well as the meta information like status, start time, end time and others. See the REST API documentation for more details.
6.1.2.3.3. Job Execution
The job execution resource contains information about where a job was run and other information that may be more interesting to a system admin than a regular user. Frequently useful in debugging.
A job contains all the details of a job request and execution including any command line arguments. Based on the request parameters, a cluster and command combination is selected for execution. Job requests can also supply necessary files to Genie either as attachments or using the file dependencies field if they already exist in an accessible file system. As a job executes, its details are recorded in the job record within the Genie database.
6.1.2.3.4. Resource configuration vs. dependencies
Genie allows associating files with the resources above so that these files are retrieved and placed in the job execution directory as part of the setup. When creating an Application, a Cluster, a Command or a Job, it is possible to associate configs and/or dependencies. Configs are expected to be small configuration files (XML, JSON, YAML, …), whereas dependencies are expected to be larger and possibly binary (Jars, executables, libraries, etc). Application, Cluster, and Command dependencies are deleted after job completion (unless Genie is configured to preserve them), to avoid storing and archiving them over and over. Configurations are preserved. Job configurations and dependencies are also preserved.
6.1.3. Wrap-up
This section was intended to provide insight into how the Genie data model is thought out and works together. It is meant to be very generic and support as many use cases as possible without modifications to the Genie codebase.
6.2. How it Works
This section is meant to provide context for how Genie can be configured with Clusters, Commands and Applications (see Data Model for details) and then how these work together in order to run a job on a Genie node.
6.2.1. Resource Configuration
This section describes how configuration of Genie works from an administrator point of view. This isn’t how to install and configure the Genie application itself. Rather it is how to configure the various resources involved in running a job.
6.2.1.1. Register Resources
All resources (clusters, commands, applications) should be registered with Genie before attempting to link them together. Any files these resources depend on should be uploaded somewhere Genie can access them (S3, web server, mounted disk, etc).
Tagging of the resources, particularly Clusters and Commands, is extremely important. Genie will use the tags in order to find a cluster/command combination to run a job. You should come up with a convenient tagging scheme for your organization. At Netflix we try to stick to a pattern for tag structures like {tag category}:{tag value}. For example type:yarn or data:prod. This allows the tags to have some context, so that when users look at what resources are available they can find what to submit their jobs with so the jobs are routed to the correct cluster/command combination.
6.2.1.2. Linking Resources
Once resources are registered they should be linked together. By linking we mean to represent relationships between the resources.
6.2.1.2.1. Commands for a Cluster
Adding commands to a cluster means that the administrator acknowledges that this cluster can run a given set of commands. If a command is not linked to a cluster it cannot be used to run a job.
The commands are added in priority order. For example say you have different Spark commands you want to add to a given YARN cluster but you want Genie to treat one as the default. Here is how those commands might be tagged:
Spark 1.6.0 (id: spark16)
* type:sparksubmit
* ver:1.6
* ver:1.6.0
Spark 1.6.1 (id: spark161)
* type:sparksubmit
* ver:1.6.1
Spark 2.0.0 (id: spark200)
* type:sparksubmit
* ver:2.0
* ver:2.0.0
Now if we added the commands to the cluster in this order: spark16, spark161, spark200, and a user submitted a job only requesting a command tagged with type:sparksubmit (as in they don't care what version, just the default), they would get Spark 1.6.0. However, if we later deemed 2.0.0 ready to be the default, we would reorder the commands to spark200, spark16, spark161, and that same job, if submitted again, would now run with Spark 2.0.0.
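The selection rule described above can be sketched as a simple first-match search over the priority-ordered list (illustrative only; Genie's actual matching is implemented server-side):

```python
# Sketch of "first command whose tags contain all requested tags wins".
# Illustrative only; the real matching happens inside the Genie server.
def select_command(ordered_commands, requested_tags):
    """Return the id of the first command, in priority order, whose tags
    contain every requested tag, or None when nothing matches."""
    requested = set(requested_tags)
    for command in ordered_commands:
        if requested <= set(command["tags"]):
            return command["id"]
    return None

# The three Spark commands from the example above, in priority order.
commands = [
    {"id": "spark16", "tags": ["type:sparksubmit", "ver:1.6", "ver:1.6.0"]},
    {"id": "spark161", "tags": ["type:sparksubmit", "ver:1.6.1"]},
    {"id": "spark200", "tags": ["type:sparksubmit", "ver:2.0", "ver:2.0.0"]},
]

print(select_command(commands, ["type:sparksubmit"]))  # spark16 (the default)

# Reordering makes 2.0.0 the new default without users changing anything:
reordered = [commands[2], commands[0], commands[1]]
print(select_command(reordered, ["type:sparksubmit"]))  # spark200
```

Requesting a more specific tag set (e.g. adding `ver:2.0`) skips past the earlier commands, which is why version tags give users an explicit opt-out of the default.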
6.2.1.2.2. Applications for a Command
Linking application(s) to commands means that a command has a dependency on said application(s). The order of the applications added is important because Genie will set up the applications in that order. Meaning if one application depends on another (e.g. Spark depends on Hadoop being on the classpath for YARN mode), Hadoop should be ordered first. All applications must be successfully installed before Genie will start running the job.
6.2.2. Job Submission
The system admin has everything registered and linked together. Things could change but that’s mostly transparent to end users, who just want to run jobs. How does that work? This section walks through what happens at a high level when a job is submitted.
6.2.2.1. Cluster and command matching
In order to submit a job request there is some work a user will have to do up front. What kind of job are they running? What cluster do they want to run on? What command do they want to use? Do they care about certain details like version or just want the defaults? Once they determine the answers to these questions they can decide how they want to tag their job request for the clusterCriterias and commandCriteria fields.
A general rule of thumb for these fields is to use the lowest common denominator of tags that accomplishes what a user requires. This allows the most flexibility for the job to be moved to different clusters or commands as need be. For example, if they want to run a Spark job and don't really care about version, it is better to just use "type:sparksubmit" (assuming this is the tagging structure at your organization) rather than that plus "ver:2.0.0". This way when versions 2.0.1 or 2.1.0 become available, the job moves along with the new default. Obviously if they do care about version they should set it or any other specific tag.
The clusterCriterias field is an array of ClusterCriteria objects. This is done to provide a fallback mechanism. If no match is found for the first ClusterCriteria and commandCriteria combination, it will move on to the second, and so on until all options are exhausted. This is handy if it is desirable to run a job on some cluster that is only up some of the time, while it is fine to run it on some other, always-available cluster the rest of the time.
Only clusters with status UP and commands with status ACTIVE will be considered during the selection process; all others are ignored.
6.2.2.1.1. Cluster matching example
Say the following 3 clusters exist, tagged as follows:
PrestoTestCluster: sched:test, type:presto, ver:0.149
HadoopProdCluster: sched:sla, type:yarn, ver:2.7.0, ver:2.7
HadoopTestCluster: sched:test, type:yarn, ver:2.7.1, ver:2.7
Criteria | Match | Reason |
---|---|---|
… | HadoopProdCluster | HadoopProdCluster satisfies all criteria |
… | HadoopProdCluster or HadoopTestCluster | Two clusters satisfy the criteria; which one is chosen is unspecified |
… | HadoopTestCluster | HadoopTestCluster satisfies all criteria |
… | - | No cluster matches all criteria |
… | PrestoTestCluster | The first criteria does not match any cluster, so fallback happens to the second, less restrictive criteria ("any presto cluster") |
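The fallback behavior illustrated here can be sketched as follows (a conceptual illustration, not Genie's implementation; commandCriteria matching and the UP/ACTIVE status checks are omitted for brevity):

```python
# Conceptual sketch of ClusterCriteria fallback; not Genie's implementation.
# Command criteria matching and UP/ACTIVE status checks are omitted.
CLUSTERS = {
    "PrestoTestCluster": {"sched:test", "type:presto", "ver:0.149"},
    "HadoopProdCluster": {"sched:sla", "type:yarn", "ver:2.7.0", "ver:2.7"},
    "HadoopTestCluster": {"sched:test", "type:yarn", "ver:2.7.1", "ver:2.7"},
}

def match_clusters(cluster_criterias):
    """Try each criteria set in order; return the clusters matching the first
    criteria set that matches anything, else an empty list."""
    for criteria in cluster_criterias:
        matches = [name for name, tags in CLUSTERS.items() if set(criteria) <= tags]
        if matches:
            return matches
    return []

# The first criteria matches nothing (no ver:0.150 cluster), so the second,
# less restrictive criteria ("any presto cluster") is used.
print(match_clusters([["type:presto", "ver:0.150"], ["type:presto"]]))
# ['PrestoTestCluster']
```

When a criteria set matches more than one cluster, which of them is ultimately chosen is where cluster load balancing (including the optional script load balancer) comes in.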
6.2.2.2. User Submits a Job Request
There are other things a user needs to consider when submitting a job. All dependencies which aren't sent as attachments must already be uploaded somewhere Genie can access them, such as S3, a web server, or shared disk.
Users should familiarize themselves with whatever the executable for their desired command includes. It's possible the system admin has set up some default parameters they should know are there, so as to avoid duplication or unexpected behavior. They should also make sure they know all the environment variables that may be available to them as part of the cluster, command and application setup processes.
6.2.2.3. Genie Receives the Job Request
When Genie receives the job request it does a few things immediately:
- If the job request doesn't have an id, it creates a GUID for the job
- It saves the job request to the database so it is recorded
  - If the ID is in use, a 409 will be returned to the user saying there is a conflict
- It creates job and job execution records in the database for consistency
- It saves any attachments in a temporary location
Next Genie will attempt to find a cluster and command matching the requested tag combinations. If none is found it will send a failure back to the user and mark the job failed in the database.
If a combination is found, Genie will then attempt to determine whether the node can run the job. By this we mean it will check the amount of client memory the job requires against the available memory in the Genie allocation. If there is enough, the job will be accepted, will run on this node, and its memory is subtracted from the available pool. If not, it will be rejected with a 503 error message and the user should retry later.
The amount of memory used by a job is not strictly enforced or even monitored. The amount is determined as follows:
- Account for the amount requested in the job request (which must be below an admin-defined threshold)
- If not provided in the request, use the number provided by the admins for the given command
- If not provided in the command, use a global default set by the admins
Successful job submission results in a 202 response to the user stating the job is accepted and will be processed asynchronously by the system.
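The memory resolution order above can be sketched in a few lines. The function and parameter names are illustrative, not Genie's actual API, and the admin threshold check is shown as a simple exception.

```python
def resolve_job_memory(request_memory, command_memory, default_memory, max_memory):
    """Resolve the memory (in MB) for a job using the documented fallback order:
    job request -> command setting -> global admin default.
    Illustrative sketch, not Genie's real API."""
    memory = request_memory or command_memory or default_memory
    # The requested amount must stay below the admin-defined threshold
    if memory > max_memory:
        raise ValueError(
            f"Requested {memory}MB exceeds admin limit of {max_memory}MB"
        )
    return memory
```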
6.2.2.4. Genie Performs Job Setup
Once a job has been accepted to run on a Genie node, a workflow is executed in order to set up the job working directory and launch the job. Some minor steps are left out for brevity.
-
Job is marked in INIT state in the database
-
A job directory is created under the admin-configured jobs directory with a name of the job id
-
A run script file is created with the name run under the job working directory (currently this is a bash script)
-
Kill handlers are added to the run script
-
Directories for Genie logs, application files, command files and cluster files are created under the job working directory
-
Default environment variables are added to the run script to export their values
-
Cluster configuration files are downloaded and stored in the job working directory
-
Cluster related variables are written into the run script
-
Application configuration and dependency files are downloaded and stored in the job directory if any applications are needed
-
Application related variables are written into the run script
-
Cluster configuration and dependency files are downloaded and stored in the job directory
-
Command configuration and dependency files are downloaded and stored in the job directory
-
Command related variables are written into the run script
-
All job dependency files (including configurations, dependencies, attachments) are downloaded into the job working directory
-
Job related variables are written into the run script
6.2.2.5. Genie Launches and Monitors the Job Execution
Assuming no errors occurred during the setup, the job is launched.
-
Job run script is executed in a forked process
-
Script pid is stored in the database job_executions table and the job is marked as RUNNING in the database
-
Monitoring process is created for the pid
Once the job is running Genie will poll the PID periodically waiting for it to no longer be used.
This makes an assumption about the amount of process churn on the Genie node. We're aware PIDs can be reused, but reasonably this shouldn't happen within the poll period given the number of PIDs available relative to the processes a typical Genie node will run. |
Once the pid no longer exists Genie checks the done file for the exit code. It marks the job succeeded, failed or killed depending on that code.
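The exit-code-to-status mapping can be sketched as below. This is a hedged illustration: the `genie.done` JSON shape and the kill exit code of 999 match the sample run script reproduced later in this document, while the function itself is ours, not the actual server implementation.

```python
import json

# Matches the exit code the run script's kill handler writes to genie.done
KILL_EXIT_CODE = 999

def final_status(done_file_contents):
    """Map the exit code recorded in the genie.done file to a terminal job
    status. Illustrative sketch of the behavior the docs describe."""
    exit_code = int(json.loads(done_file_contents)["exitCode"])
    if exit_code == 0:
        return "SUCCEEDED"
    if exit_code == KILL_EXIT_CODE:
        return "KILLED"
    return "FAILED"
```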
6.2.2.6. Genie Performs Job Clean-Up
To save disk space Genie will delete application, cluster and command dependencies from the job working directory after a job is completed. This can be disabled by an admin. If the job is marked for archival the working directory will be zipped up and stored in the default archive location as {jobId}.tar.gz.
6.2.3. User Behavior
Users can check on the status of their job using the status API and get the output using the output APIs. See the REST Documentation for specifics on how to do that.
6.2.4. Wrap Up
This section should have helped you understand how Genie works at a high level from configuration all the way to user job submission and monitoring. The design of Genie is intended to make this process repeatable and reliable for all users while not hiding any of the details of what is executed at job runtime.
6.3. Netflix Example
Understanding Genie without a concrete example is hard. This section attempts to walk through an end to end configuration and job execution example of a job at Netflix. To see more examples or try your own please see the Demo Guide. Also see the REST API documentation which will describe the purpose of the fields of the resources shown below.
This example contains JSON representations of resources. |
6.3.1. Configuration
For the purpose of brevity we will only cover a subset of the total Netflix configuration.
6.3.1.1. Clusters
At Netflix there are tens of active clusters available at any given time. For this example we'll focus on the production (SLA) and adhoc Hadoop YARN clusters and the production Presto cluster. We launch the Hadoop clusters using Amazon EMR, but it really shouldn't matter how clusters are launched provided you can access the proper *site.xml files.
The process of launching a YARN cluster at Netflix involves a set of Python tools which interact with the Amazon and Genie APIs. First these tools use the EMR APIs to launch the cluster based on configuration files for the cluster profile. Then the cluster site XML files are uploaded to S3. Once this is complete all the metadata is sent to Genie to create a cluster resource which you can see examples of below.
Presto clusters are launched using Spinnaker on regular EC2 instances. As part of the pipeline the metadata is registered with Genie using the aforementioned Python tools, which in turn leverage the OSS Genie Python Client.
In the following cluster resources you should note the tags applied to each cluster. Remember that the genie.id and genie.name tags are automatically applied by Genie but all other tags are applied by the admin.
For the YARN clusters note that all the configuration files are referenced by their S3 locations. These files are downloaded into the job working directory at runtime.
6.3.1.1.1. Hadoop Prod Cluster
{
"id": "bdp_h2prod_20161217_205111",
"created": "2016-12-17T21:09:30.845Z",
"updated": "2016-12-20T17:31:32.523Z",
"tags": [
"genie.id:bdp_h2prod_20161217_205111",
"genie.name:h2prod",
"sched:sla",
"ver:2.7.2",
"type:yarn",
"misc:h2bonus3",
"misc:h2bonus2",
"misc:h2bonus1"
],
"version": "2.7.2",
"user": "dataeng",
"name": "h2prod",
"description": null,
"setupFile": null,
"configs": [
"s3://bucket/users/bdp/h2prod/20161217/205111/genie/yarn-site.xml",
"s3://bucket/users/bdp/h2prod/20161217/205111/genie/mapred-site.xml",
"s3://bucket/users/bdp/h2prod/20161217/205111/genie/hdfs-site.xml",
"s3://bucket/users/bdp/h2prod/20161217/205111/genie/core-site.xml"
],
"dependencies": [],
"status": "UP",
"_links": {
"self": {
"href": "https://genieHost/api/v3/clusters/bdp_h2prod_20161217_205111"
},
"commands": {
"href": "https://genieHost/api/v3/clusters/bdp_h2prod_20161217_205111/commands"
}
}
}
6.3.1.1.2. Hadoop Adhoc Cluster
{
"id": "bdp_h2query_20161108_204556",
"created": "2016-11-08T21:07:17.284Z",
"updated": "2016-12-07T00:51:19.655Z",
"tags": [
"sched:adhoc",
"misc:profiled",
"ver:2.7.2",
"sched:sting",
"type:yarn",
"genie.name:h2query",
"genie.id:bdp_h2query_20161108_204556"
],
"version": "2.7.2",
"user": "dataeng",
"name": "h2query",
"description": null,
"setupFile": "",
"configs": [
"s3://bucket/users/bdp/h2query/20161108/204556/genie/core-site.xml",
"s3://bucket/users/bdp/h2query/20161108/204556/genie/hdfs-site.xml",
"s3://bucket/users/bdp/h2query/20161108/204556/genie/mapred-site.xml",
"s3://bucket/users/bdp/h2query/20161108/204556/genie/yarn-site.xml"
],
"dependencies": [],
"status": "UP",
"_links": {
"self": {
"href": "https://genieHost/api/v3/clusters/bdp_h2query_20161108_204556"
},
"commands": {
"href": "https://genieHost/api/v3/clusters/bdp_h2query_20161108_204556/commands"
}
}
}
6.3.1.1.3. Presto Prod Cluster
{
"id": "presto-prod-v009",
"created": "2016-12-05T19:33:52.575Z",
"updated": "2016-12-05T19:34:14.725Z",
"tags": [
"sched:adhoc",
"genie.id:presto-prod-v009",
"type:presto",
"genie.name:presto",
"ver:0.149",
"data:prod",
"type:spinnaker-presto"
],
"version": "1480966454",
"user": "dataeng",
"name": "presto",
"description": null,
"setupFile": null,
"configs": [],
"dependencies": [],
"status": "UP",
"_links": {
"self": {
"href": "https://genieHost/api/v3/clusters/presto-prod-v009"
},
"commands": {
"href": "https://genieHost/api/v3/clusters/presto-prod-v009/commands"
}
}
}
6.3.1.2. Commands
Commands and applications at Netflix are handled a bit differently than clusters. The source data for these command and application resources is not generated dynamically like the cluster configuration files. Instead it is stored in a git repository as a combination of YAML, bash, python and other files. These configuration files are synced to an S3 bucket every time a commit occurs, ensuring Genie always pulls in the latest configuration. The sync is performed by a Jenkins job responding to a commit-hook trigger. The same Jenkins job also registers the commands and applications with Genie via the same Python tool set and Genie Python client used for clusters.
Pay attention to the tags applied to the commands as they are used to select which command to use when a job is run. The presto command includes a setup file which allows additional configuration when it is used.
6.3.1.2.1. Presto 0.149
{
"id": "presto0149",
"created": "2016-08-08T23:22:15.977Z",
"updated": "2016-12-20T23:28:44.678Z",
"tags": [
"genie.id:presto0149",
"type:presto",
"genie.name:presto",
"ver:0.149",
"data:prod",
"data:test"
],
"version": "0.149",
"user": "builds",
"name": "presto",
"description": "Presto Command",
"setupFile": "s3://bucket/builds/bdp-cluster-configs/genie3/commands/presto/0.149/setup.sh",
"configs": [],
"dependencies": [],
"status": "ACTIVE",
"executable": "${PRESTO_CMD} --server ${PRESTO_SERVER} --catalog hive --schema default --debug",
"checkDelay": 5000,
"memory": null,
"_links": {
"self": {
"href": "https://genieHost/api/v3/commands/presto0149"
},
"applications": {
"href": "https://genieHost/api/v3/commands/presto0149/applications"
},
"clusters": {
"href": "https://genieHost/api/v3/commands/presto0149/clusters"
}
}
}
Presto 0.149 Setup File
#!/bin/bash
set -o errexit -o nounset -o pipefail
chmod 755 ${GENIE_APPLICATION_DIR}/presto0149/dependencies/presto-cli
export JAVA_HOME=/apps/bdp-java/java-8-oracle
export PATH=${JAVA_HOME}/bin/:$PATH
export PRESTO_SERVER="http://${GENIE_CLUSTER_NAME}.rest.of.url"
export PRESTO_CMD=${GENIE_APPLICATION_DIR}/presto0149/dependencies/presto-wrapper.py
chmod 755 ${PRESTO_CMD}
6.3.1.2.2. Spark Submit Prod 1.6.1
{
"id": "prodsparksubmit161",
"created": "2016-05-17T16:38:31.152Z",
"updated": "2016-12-20T23:28:33.042Z",
"tags": [
"genie.id:prodsparksubmit161",
"genie.name:prodsparksubmit",
"ver:1.6",
"type:sparksubmit",
"data:prod",
"ver:1.6.1"
],
"version": "1.6.1",
"user": "builds",
"name": "prodsparksubmit",
"description": "Prod Spark Submit Command",
"setupFile": "s3://bucket/builds/bdp-cluster-configs/genie3/commands/spark/1.6.1/prod/scripts/spark-1.6.1-prod-submit-cmd.sh",
"configs": [
"s3://bucket/builds/bdp-cluster-configs/genie3/commands/spark/1.6.1/prod/configs/hive-site.xml"
],
"dependencies": [],
"status": "ACTIVE",
"executable": "${SPARK_HOME}/bin/dsespark-submit",
"checkDelay": 5000,
"memory": null,
"_links": {
"self": {
"href": "https://genieHost/api/v3/commands/prodsparksubmit161"
},
"applications": {
"href": "https://genieHost/api/v3/commands/prodsparksubmit161/applications"
},
"clusters": {
"href": "https://genieHost/api/v3/commands/prodsparksubmit161/clusters"
}
}
}
Spark Submit Prod 1.6.1 Setup File
#!/bin/bash
#set -o errexit -o nounset -o pipefail
export JAVA_HOME=/apps/bdp-java/java-8-oracle
#copy hive-site.xml configuration
cp ${GENIE_COMMAND_DIR}/config/* ${SPARK_CONF_DIR}
cp ${GENIE_COMMAND_DIR}/config/* ${HADOOP_CONF_DIR}/
6.3.1.2.3. Spark Submit Prod 2.0.0
{
"id": "prodsparksubmit200",
"created": "2016-10-31T16:59:01.145Z",
"updated": "2016-12-20T23:28:47.340Z",
"tags": [
"ver:2",
"genie.name:prodsparksubmit",
"ver:2.0",
"genie.id:prodsparksubmit200",
"ver:2.0.0",
"type:sparksubmit",
"data:prod"
],
"version": "2.0.0",
"user": "builds",
"name": "prodsparksubmit",
"description": "Prod Spark Submit Command",
"setupFile": "s3://bucket/builds/bdp-cluster-configs/genie3/commands/spark/2.0.0/prod/copy-config.sh",
"configs": [
"s3://bucket/builds/bdp-cluster-configs/genie3/commands/spark/2.0.0/prod/configs/hive-site.xml"
],
"dependencies": [],
"status": "ACTIVE",
"executable": "${SPARK_HOME}/bin/dsespark-submit.py",
"checkDelay": 5000,
"memory": null,
"_links": {
"self": {
"href": "https://genieHost/api/v3/commands/prodsparksubmit200"
},
"applications": {
"href": "https://genieHost/api/v3/commands/prodsparksubmit200/applications"
},
"clusters": {
"href": "https://genieHost/api/v3/commands/prodsparksubmit200/clusters"
}
}
}
Spark Submit 2.0.0 Setup File
#!/bin/bash
set -o errexit -o nounset -o pipefail
# copy hive-site.xml configuration
cp ${GENIE_COMMAND_DIR}/config/* ${SPARK_CONF_DIR}
6.3.1.3. Applications
Below are the applications needed by the above commands. The most important parts of these applications are the dependencies and the setup file.
The dependencies are effectively the installation package and at Netflix are typically a zip of all the binaries needed to run a client like Hadoop, Hive, Spark, etc. Some of these zips are generated by builds and placed in S3 and others are downloaded from OSS projects and uploaded to S3 periodically. Often minor changes to these dependencies are needed: a new file is uploaded to S3 and the Genie cache on each node is refreshed with the new file on next access. This pattern allows us to avoid upgrading Genie clusters every time an application changes.
The setup file is effectively the installation script for the aforementioned dependencies. It is sourced by Genie and the expectation is that after it runs the application is successfully configured in the job working directory.
6.3.1.3.1. Hadoop 2.7.2
{
"id": "hadoop272",
"created": "2016-08-18T16:58:31.044Z",
"updated": "2016-12-21T00:01:08.263Z",
"tags": [
"type:hadoop",
"genie.id:hadoop272",
"genie.name:hadoop",
"ver:2.7.2"
],
"version": "2.7.2",
"user": "builds",
"name": "hadoop",
"description": "Hadoop Application",
"setupFile": "s3://bucket/builds/bdp-cluster-configs/genie3/applications/hadoop/2.7.2/setup.sh",
"configs": [],
"dependencies": [
"s3://bucket/hadoop/2.7.2/hadoop-2.7.2.tgz"
],
"status": "ACTIVE",
"type": "hadoop",
"_links": {
"self": {
"href": "https://genieHost/api/v3/applications/hadoop272"
},
"commands": {
"href": "https://genieHost/api/v3/applications/hadoop272/commands"
}
}
}
Hadoop 2.7.2 Setup File
#!/bin/bash
set -o errexit -o nounset -o pipefail
export JAVA_HOME=/apps/bdp-java/java-7-oracle
export APP_ID=hadoop272
export APP_NAME=hadoop-2.7.2
export HADOOP_DEPENDENCIES_DIR=$GENIE_APPLICATION_DIR/$APP_ID/dependencies
export HADOOP_HOME=$HADOOP_DEPENDENCIES_DIR/$APP_NAME
tar -xf "${HADOOP_DEPENDENCIES_DIR}/hadoop-2.7.2.tgz" -C "${HADOOP_DEPENDENCIES_DIR}"
export HADOOP_CONF_DIR="${HADOOP_HOME}/conf"
export HADOOP_LIBEXEC_DIR="${HADOOP_HOME}/usr/lib/hadoop/libexec"
export HADOOP_HEAPSIZE=1500
cp ${GENIE_CLUSTER_DIR}/config/* $HADOOP_CONF_DIR/
EXTRA_PROPS=$(echo "<property><name>genie.job.id</name><value>$GENIE_JOB_ID</value></property><property><name>genie.job.name</name><value>$GENIE_JOB_NAME</value></property><property><name>lipstick.uuid.prop.name</name><value>genie.job.id</value></property><property><name>dataoven.job.id</name><value>$GENIE_JOB_ID</value></property><property><name>genie.netflix.environment</name><value>${NETFLIX_ENVIRONMENT:-prod}</value></property><property><name>genie.version</name><value>$GENIE_VERSION</value></property><property><name>genie.netflix.stack</name><value>${NETFLIX_STACK:-none}</value></property>" | sed 's/\//\\\//g')
sed -i "/<\/configuration>/ s/.*/${EXTRA_PROPS}&/" $HADOOP_CONF_DIR/core-site.xml
if [ -d "/apps/s3mper/hlib" ]; then
export HADOOP_OPTS="-javaagent:/apps/s3mper/hlib/aspectjweaver-1.7.3.jar ${HADOOP_OPTS:-}"
fi
# Remove the zip to save space
rm "${HADOOP_DEPENDENCIES_DIR}/hadoop-2.7.2.tgz"
6.3.1.3.2. Presto 0.149
{
"id": "presto0149",
"created": "2016-08-08T23:21:58.780Z",
"updated": "2016-12-21T00:21:10.945Z",
"tags": [
"genie.id:presto0149",
"type:presto",
"genie.name:presto",
"ver:0.149"
],
"version": "0.149",
"user": "builds",
"name": "presto",
"description": "Presto Application",
"setupFile": "s3://bucket/builds/bdp-cluster-configs/genie3/applications/presto/0.149/setup.sh",
"configs": [],
"dependencies": [
"s3://bucket/presto/clients/0.149/presto-cli",
"s3://bucket/builds/bdp-cluster-configs/genie3/applications/presto/0.149/presto-wrapper.py"
],
"status": "ACTIVE",
"type": "presto",
"_links": {
"self": {
"href": "https://genieProd/api/v3/applications/presto0149"
},
"commands": {
"href": "https://genieProd/api/v3/applications/presto0149/commands"
}
}
}
Presto 0.149 Setup File
#!/bin/bash
set -o errexit -o nounset -o pipefail
chmod 755 ${GENIE_APPLICATION_DIR}/presto0149/dependencies/presto-cli
chmod 755 ${GENIE_APPLICATION_DIR}/presto0149/dependencies/presto-wrapper.py
export JAVA_HOME=/apps/bdp-java/java-8-oracle
export PATH=${JAVA_HOME}/bin/:$PATH
# Set the cli path for the commands to use when they invoke presto using this Application
export PRESTO_CLI_PATH="${GENIE_APPLICATION_DIR}/presto0149/dependencies/presto-cli"
6.3.1.3.3. Spark 1.6.1
{
"id": "spark161",
"created": "2016-05-17T16:32:21.475Z",
"updated": "2016-12-21T00:01:07.951Z",
"tags": [
"genie.id:spark161",
"type:spark",
"ver:1.6",
"ver:1.6.1",
"genie.name:spark"
],
"version": "1.6.1",
"user": "builds",
"name": "spark",
"description": "Spark Application",
"setupFile": "s3://bucket/builds/bdp-cluster-configs/genie3/applications/spark/1.6.1/scripts/spark-1.6.1-app.sh",
"configs": [
"s3://bucket/builds/bdp-cluster-configs/genie3/applications/spark/1.6.1/configs/spark-env.sh"
],
"dependencies": [
"s3://bucket/spark/1.6.1/spark-1.6.1.tgz"
],
"status": "ACTIVE",
"type": "spark",
"_links": {
"self": {
"href": "https://genieHost/api/v3/applications/spark161"
},
"commands": {
"href": "https://genieHost/api/v3/applications/spark161/commands"
}
}
}
Spark 1.6.1 Setup File
#!/bin/bash
set -o errexit -o nounset -o pipefail
VERSION="1.6.1"
DEPENDENCY_DOWNLOAD_DIR="${GENIE_APPLICATION_DIR}/spark161/dependencies"
# Unzip all the Spark jars
tar -xf ${DEPENDENCY_DOWNLOAD_DIR}/spark-${VERSION}.tgz -C ${DEPENDENCY_DOWNLOAD_DIR}
# Set the required environment variable.
export SPARK_HOME=${DEPENDENCY_DOWNLOAD_DIR}/spark-${VERSION}
export SPARK_CONF_DIR=${SPARK_HOME}/conf
export SPARK_LOG_DIR=${GENIE_JOB_DIR}
export SPARK_LOG_FILE=spark.log
export SPARK_LOG_FILE_PATH=${GENIE_JOB_DIR}/${SPARK_LOG_FILE}
export CURRENT_JOB_WORKING_DIR=${GENIE_JOB_DIR}
export CURRENT_JOB_TMP_DIR=${CURRENT_JOB_WORKING_DIR}/tmp
export JAVA_HOME=/apps/bdp-java/java-8-oracle
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
# Make Sure Script is on the Path
export PATH=$PATH:${SPARK_HOME}/bin
# Delete the zip to save space
rm ${DEPENDENCY_DOWNLOAD_DIR}/spark-${VERSION}.tgz
Spark 1.6.1 Environment Variable File
#!/bin/bash
#set -o errexit -o nounset -o pipefail
export JAVA_HOME=/apps/bdp-java/java-8-oracle
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
6.3.1.3.4. Spark 2.0.0
{
"id": "spark200",
"created": "2016-10-31T16:58:54.155Z",
"updated": "2016-12-21T00:01:11.105Z",
"tags": [
"type:spark",
"ver:2.0",
"ver:2.0.0",
"genie.id:spark200",
"genie.name:spark"
],
"version": "2.0.0",
"user": "builds",
"name": "spark",
"description": "Spark Application",
"setupFile": "s3://bucket/builds/bdp-cluster-configs/genie3/applications/spark/2.0.0/setup.sh",
"configs": [],
"dependencies": [
"s3://bucket/spark-builds/2.0.0/spark-2.0.0.tgz"
],
"status": "ACTIVE",
"type": "spark",
"_links": {
"self": {
"href": "https://genieHost/api/v3/applications/spark200"
},
"commands": {
"href": "https://genieHost/api/v3/applications/spark200/commands"
}
}
}
Spark 2.0.0 Setup File
#!/bin/bash
set -o errexit -o nounset -o pipefail
start_dir=`pwd`
cd `dirname ${BASH_SOURCE[0]}`
SPARK_BASE=`pwd`
cd $start_dir
export JAVA_HOME=/apps/bdp-java/java-8-oracle
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
SPARK_DEPS=${SPARK_BASE}/dependencies
export SPARK_VERSION="2.0.0"
tar xzf ${SPARK_DEPS}/spark-${SPARK_VERSION}.tgz -C ${SPARK_DEPS}
# Set the required environment variable.
export SPARK_HOME=${SPARK_DEPS}/spark-${SPARK_VERSION}
export SPARK_CONF_DIR=${SPARK_HOME}/conf
export SPARK_LOG_DIR=${GENIE_JOB_DIR}
export SPARK_LOG_FILE=spark.log
export SPARK_LOG_FILE_PATH=${GENIE_JOB_DIR}/${SPARK_LOG_FILE}
export CURRENT_JOB_WORKING_DIR=${GENIE_JOB_DIR}
export CURRENT_JOB_TMP_DIR=${CURRENT_JOB_WORKING_DIR}/tmp
# Make Sure Script is on the Path
export PATH=$PATH:${SPARK_HOME}/bin
# Delete the tarball to save space
rm ${SPARK_DEPS}/spark-${SPARK_VERSION}.tgz
chmod a+x ${SPARK_HOME}/bin/dsespark-submit.py
6.3.1.4. Relationships
Now that all the resources are available they need to be linked together. Commands need to be added to the clusters they can be run on and applications need to be added as dependencies for commands.
6.3.1.4.1. Commands for a Cluster
When commands are added to a cluster they should be in priority order: if two commands both match a user's tags for a job, the one higher in the list will be used. This allows us to switch defaults quickly and transparently.
Note: The lists below leave out a lot of commands and fields for brevity. Only the id of the command is included so it can reference the same command resource defined earlier in this article.
Hadoop Prod Cluster
The Hadoop clusters have both currently supported Spark versions added. Spark 1.6.1 is the default but users can override to Spark 2 using the ver tag.
[
...
{
"id": "prodsparksubmit161"
...
},
{
"id": "prodsparksubmit200"
...
}
...
]
Hadoop Adhoc Cluster
[
...
{
"id": "prodsparksubmit161"
...
},
{
"id": "prodsparksubmit200"
...
}
...
]
Presto Prod Cluster
Presto clusters only really support the Presto command, but it's possible to have multiple backwards-compatible versions of the client available.
[
...
{
"id": "presto0149"
...
}
...
]
6.3.1.4.2. Applications for a Command
Linking applications to a command tells Genie that these applications need to be downloaded and setup in order to successfully run the command. The order of the applications will be the order the download and setup is performed so dependencies between applications should be managed via this order.
Presto 0.149
Presto only needs the corresponding Presto application which contains the Presto Java CLI jar and some setup wrapper scripts.
[
{
"id": "presto0149"
...
}
]
Spark Submit Prod 1.6.1
Since we submit Spark jobs to YARN clusters, running the Spark submit commands requires both the Spark and Hadoop applications to be installed and configured on the job classpath. Hadoop needs to be set up first so that its configurations can be copied to Spark.
[
{
"id": "hadoop272"
...
},
{
"id": "spark161"
...
}
]
Spark Submit Prod 2.0.0
[
{
"id": "hadoop272"
...
},
{
"id": "spark200"
...
}
]
6.3.2. Job Submission
Everything is now in place for users to submit their jobs. This section will walk through the components and outputs of that process. For clarity we're going to follow a PySpark job submission to show how Genie figures out the cluster and command to use based on what was configured above.
6.3.2.1. The Request
Below is an actual job request (with a few obfuscations) made by a production job here at Netflix to Genie.
{
"id": "SP.CS.FCT_TICKET_0054500815", (1)
"created": "2016-12-21T04:13:07.244Z",
"updated": "2016-12-21T04:13:07.244Z",
"tags": [ (2)
"submitted.by:call_genie",
"scheduler.job_name:SP.CS.FCT_TICKET",
"scheduler.run_id:0054500815",
"SparkPythonJob",
"scheduler.name:uc4"
],
"version": "NA",
"user": "someNetflixEmployee",
"name": "SP.CS.FCT_TICKET",
"description": "{\"username\": \"root\", \"host\": \"2d35f0d397fd\", \"client\": \"nflx-kragle-djinn/0.4.3\", \"kragle_version\": \"0.41.11\", \"job_class\": \"SparkPythonJob\"}",
"setupFile": null,
"commandArgs": "--queue root.sla --py-files dea_pyspark_core-latest.egg fct_ticket.py", (3)
"clusterCriterias": [ (4)
{
"tags": [
"sched:sla"
]
}
],
"commandCriteria": [ (5)
"type:sparksubmit",
"data:prod"
],
"group": null,
"disableLogArchival": false,
"email": null,
"cpu": null,
"memory": null,
"timeout": null,
"configs": [],
"dependencies": [ (6)
"s3://bucket/DSE/etl_code/cs/ticket/fct_ticket.py",
"s3://bucket/dea/pyspark_core/dea_pyspark_core-latest.egg"
],
"applications": [],
"_links": {
"self": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/request"
},
"job": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815"
},
"execution": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/execution"
},
"output": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/output"
},
"status": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/status"
}
}
}
Let's look at a few of the fields of note:
1 | The user set the ID. This is a popular pattern in Netflix for tracking jobs between systems and reattaching to jobs. |
2 | The user added a few tags that will allow them to search for the job later. This is optional but convenient. |
3 | The user specifies some arguments to add to the default set of command arguments specified by the command executable field. In this case it's what python file to run. |
4 | The user wants this job to run on any cluster that is labeled as having an SLA which also supports the command selected using the commandCriteria |
5 | The user wants the default Spark Submit command (no version specified) and wants to be able to access production data |
6 | Here you can see that they add the two files referenced in the commandArgs as dependencies. These files will be downloaded in the root job directory parallel to the run script so they are accessible. |
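A request like the one above can be assembled with a small helper. The field names mirror the example JSON shown here; the function itself is illustrative and not part of any official Genie client.

```python
def build_job_request(job_id, user, name, command_args,
                      cluster_tags, command_tags, dependencies):
    """Assemble a minimal Genie 3 job request payload. Field names follow the
    v3 REST API example above; this helper is an illustrative sketch."""
    return {
        "id": job_id,                                   # caller-supplied ID (optional in Genie)
        "user": user,
        "name": name,
        "version": "NA",
        "commandArgs": command_args,                    # appended to the command's executable
        "clusterCriterias": [{"tags": cluster_tags}],   # ordered list of criteria
        "commandCriteria": command_tags,
        "dependencies": dependencies,                   # must be reachable by Genie (e.g. S3)
        "disableLogArchival": False,
    }
```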
6.3.2.2. The Job
In this case the job was accepted by Genie for processing. Below is the actual job object containing fields the user might care about. Some are copied from the initial request (like tags) and some are added by Genie.
{
"id": "SP.CS.FCT_TICKET_0054500815",
"created": "2016-12-21T04:13:07.245Z",
"updated": "2016-12-21T04:20:35.801Z",
"tags": [
"submitted.by:call_genie",
"scheduler.job_name:SP.CS.FCT_TICKET",
"scheduler.run_id:0054500815",
"SparkPythonJob",
"scheduler.name:uc4"
],
"version": "NA",
"user": "someNetflixEmployee",
"name": "SP.CS.FCT_TICKET",
"description": "{\"username\": \"root\", \"host\": \"2d35f0d397fd\", \"client\": \"nflx-kragle-djinn/0.4.3\", \"kragle_version\": \"0.41.11\", \"job_class\": \"SparkPythonJob\"}",
"status": "SUCCEEDED", (1)
"statusMsg": "Job finished successfully.", (2)
"started": "2016-12-21T04:13:09.025Z", (3)
"finished": "2016-12-21T04:20:35.794Z", (4)
"archiveLocation": "s3://bucket/genie/main/logs/SP.CS.FCT_TICKET_0054500815.tar.gz", (5)
"clusterName": "h2prod", (6)
"commandName": "prodsparksubmit", (7)
"runtime": "PT7M26.769S", (8)
"commandArgs": "--queue root.sla --py-files dea_pyspark_core-latest.egg fct_ticket.py",
"_links": {
"self": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815"
},
"output": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/output"
},
"request": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/request"
},
"execution": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/execution"
},
"status": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/status"
},
"cluster": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/cluster"
},
"command": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/command"
},
"applications": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/applications"
}
}
}
Some fields of note:
1 | The current status of the job. Since this sample was taken after the job was completed it's already marked SUCCEEDED |
2 | This job was successful but if it failed for some reason a more human readable reason would be found here |
3 | The time this job was forked from the Genie process |
4 | The time Genie recognized the job as complete |
5 | Where Genie uploaded a zip of the job directory after the job was completed |
6 | The name of the cluster where this job ran and is de-normalized from the cluster record at the time |
7 | The name of the command used to run this job which is de-normalized from the command record at the time |
8 | The total run time in ISO8601 |
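The runtime field is simply finished minus started, rendered as an ISO8601 duration. A hedged Python sketch follows; Genie itself is a Java application, so the real formatting logic differs, and the rendering helper here is our assumption.

```python
from datetime import datetime

def iso8601_runtime(started, finished):
    """Compute the job's runtime field as an ISO8601 duration from the
    started/finished timestamps shown in the job resource. Illustrative."""
    fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
    start = datetime.strptime(started.replace("Z", "+0000"), fmt)
    end = datetime.strptime(finished.replace("Z", "+0000"), fmt)
    total = (end - start).total_seconds()
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    out = "PT"
    if hours:
        out += f"{int(hours)}H"
    if minutes:
        out += f"{int(minutes)}M"
    # Trim trailing zeros so 26.769 stays 26.769 and 26.000 becomes 26
    out += f"{seconds:.3f}".rstrip("0").rstrip(".") + "S"
    return out
```

Applying it to the example job's timestamps reproduces the documented value `PT7M26.769S`.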
6.3.2.2.1. Cluster Selection
Because the user submitted the job with sched:sla, the clusters it can run on are limited to those with that tag applied. In our example only the cluster with ID bdp_h2prod_20161217_205111 has this tag. This isn't enough to make sure the job can run (there also needs to be a matching command). If there had been multiple SLA clusters Genie would have considered them all equal and randomly selected one.
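The selection just described can be sketched as: filter the clusters whose tags contain every criterion tag, then pick randomly among the equals. Structures and names are illustrative, not Genie's internals.

```python
import random

def select_cluster(criteria_tags, clusters):
    """Pick a cluster for a job: every cluster carrying all of the criterion
    tags is considered equal and one is chosen at random. Illustrative."""
    matches = [c for c in clusters if set(criteria_tags).issubset(c["tags"])]
    return random.choice(matches) if matches else None
```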
6.3.2.2.2. Command Selection
The command criteria states that this job needs a command of type sparksubmit that can access prod data. Two commands (prodsparksubmit161 and prodsparksubmit200) match this criteria. Both are linked to the cluster bdp_h2prod_20161217_205111. Since both match, Genie selects the "default" one, which is the first one in the cluster's command list. In this case that was prodsparksubmit161.
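This priority-based selection can be sketched as a first-match scan over the cluster's ordered command list (illustrative structures, not Genie's internals):

```python
def select_command(command_criteria, cluster_commands):
    """Return the first command in the cluster's priority-ordered list whose
    tags contain every criterion tag. Illustrative sketch."""
    for command in cluster_commands:
        if set(command_criteria).issubset(command["tags"]):
            return command
    return None
```

With both Spark submit commands matching `type:sparksubmit` and `data:prod`, the one registered first on the cluster wins.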
6.3.2.3. The Job Execution
Below is the job execution resource. This is mainly for system and admin use but it can have some useful information for users as well. Mainly it shows which Genie node it actually ran on, how much memory it was allocated, how frequently the system polled it for status and when it would have timed out had it kept running.
{
"id": "SP.CS.FCT_TICKET_0054500815",
"created": "2016-12-21T04:13:07.245Z",
"updated": "2016-12-21T04:20:35.801Z",
"hostName": "a.host.com",
"processId": 68937,
"checkDelay": 5000,
"timeout": "2016-12-28T04:13:09.016Z",
"exitCode": 0,
"memory": 1536,
"_links": {
"self": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/execution"
},
"job": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815"
},
"request": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/request"
},
"output": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/output"
},
"status": {
"href": "https://genieHost/api/v3/jobs/SP.CS.FCT_TICKET_0054500815/status"
}
}
}
6.3.2.4. Job Output
Below is an image of the root of the job output directory (displayed via Genie UI) for the above job. Note that the dependency files are all downloaded there and some standard files are available (run, stdout, stderr).
The URIs in this section point to the UI output endpoint; however, the same data is available via the REST API, which the UI is really calling to get the necessary information. We show the UI endpoints for the better looking output and because most users will see this version. |
6.3.2.4.1. The Run Script
Clicking into the run script shows the contents below. This run script is generated specifically for each individual job by Genie. It has some standard bits (error checking, exit process) but also job-specific information like environment variables and what to actually run. Everything is specific to the job working directory. In particular note all the GENIE_* environment variable exports: these can be used when building your setup and configuration scripts to make them more flexible.
#!/usr/bin/env bash
set -o nounset -o pipefail
# Set function in case any of the exports or source commands cause an error
trap "handle_failure" ERR EXIT
function handle_failure {
ERROR_CODE=$?
# Good exit
if [[ ${ERROR_CODE} -eq 0 ]]; then
exit 0
fi
# Bad exit
printf '{"exitCode": "%s"}\n' "${ERROR_CODE}" > ./genie/genie.done
exit "${ERROR_CODE}"
}
# Set function for handling kill signal from the job kill service
trap "handle_kill_request" SIGTERM
function handle_kill_request {
KILL_EXIT_CODE=999
# Disable SIGTERM signal for the script itself
trap "" SIGTERM
echo "Kill signal received"
### Write the kill exit code to genie.done file as exit code before doing anything else
echo "Generate done file with exit code ${KILL_EXIT_CODE}"
printf '{"exitCode": "%s"}\n' "${KILL_EXIT_CODE}" > ./genie/genie.done
### Send a kill signal the entire process group
echo "Sending a kill signal to the process group"
pkill -g $$
COUNTER=0
NUM_CHILD_PROCESSES=`pgrep -g ${SELF_PID} | wc -w`
# Waiting for 30 seconds for the child processes to die
while [[ $COUNTER -lt 30 ]] && [[ "$NUM_CHILD_PROCESSES" -gt 3 ]]; do
echo The counter is $COUNTER
let COUNTER=COUNTER+1
echo "Sleeping now for 1 seconds"
sleep 1
NUM_CHILD_PROCESSES=`pgrep -g ${SELF_PID} | wc -w`
done
# check if any children are still running. If not just exit.
if [ "$NUM_CHILD_PROCESSES" -eq 3 ]
then
echo "Done"
exit
fi
### Reaching this point means the children did not die. If so send kill -9 to the entire process group
# this is a hard kill and will kill this process itself as well
echo "Sending a kill -9 to children"
pkill -9 -g $$
echo "Done"
}
SELF_PID=$$
echo Start: `date '+%Y-%m-%d %H:%M:%S'`
export GENIE_JOB_DIR="/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815"
export GENIE_APPLICATION_DIR="${GENIE_JOB_DIR}/genie/applications"
export GENIE_COMMAND_DIR="${GENIE_JOB_DIR}/genie/command/prodsparksubmit161"
export GENIE_COMMAND_ID="prodsparksubmit161"
export GENIE_COMMAND_NAME="prodsparksubmit"
export GENIE_CLUSTER_DIR="${GENIE_JOB_DIR}/genie/cluster/bdp_h2prod_20161217_205111"
export GENIE_CLUSTER_ID="bdp_h2prod_20161217_205111"
export GENIE_CLUSTER_NAME="h2prod"
export GENIE_JOB_ID="SP.CS.FCT_TICKET_0054500815"
export GENIE_JOB_NAME="SP.CS.FCT_TICKET"
export GENIE_JOB_MEMORY=1536
export GENIE_VERSION=3
# Sourcing setup file from Application: hadoop272
source ${GENIE_JOB_DIR}/genie/applications/hadoop272/setup.sh
# Sourcing setup file from Application: spark161
source ${GENIE_JOB_DIR}/genie/applications/spark161/spark-1.6.1-app.sh
# Sourcing setup file from Command: prodsparksubmit161
source ${GENIE_JOB_DIR}/genie/command/prodsparksubmit161/spark-1.6.1-prod-submit-cmd.sh
# Dump the environment to a env.log file
env | sort > ${GENIE_JOB_DIR}/genie/logs/env.log
# Kick off the command in background mode and wait for it using its pid
${SPARK_HOME}/bin/dsespark-submit --queue root.sla --py-files dea_pyspark_core-latest.egg fct_ticket.py > stdout 2> stderr &
wait $!
# Write the return code from the command in the done file.
printf '{"exitCode": "%s"}\n' "$?" > ./genie/genie.done
echo End: `date '+%Y-%m-%d %H:%M:%S'`
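The GENIE_* environment variables exported in the run script above can be used to make application or command setup scripts cluster-aware. Below is a minimal hypothetical sketch of such a setup script; the queue names and the convention of matching "prod" in the cluster name are assumptions for illustration, not part of Genie:

```shell
#!/usr/bin/env bash
# Hypothetical setup script sourced by the Genie run script.
# Genie exports GENIE_* variables before sourcing setup scripts; default
# values here only make the sketch runnable standalone.
GENIE_CLUSTER_NAME="${GENIE_CLUSTER_NAME:-h2prod}"

# Pick a scheduler queue based on which cluster Genie selected
# (queue names are illustrative).
if [[ "${GENIE_CLUSTER_NAME}" == *prod* ]]; then
  export APP_QUEUE="root.sla"
else
  export APP_QUEUE="root.default"
fi
echo "Using queue ${APP_QUEUE} for cluster ${GENIE_CLUSTER_NAME}"
```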
6.3.2.4.2. Genie Dir
Inside the output directory there is a genie directory. This directory is where Genie stores all the downloaded dependencies and any logs. Everything outside this directory is intended to be user generated, other than the run script. Some commands or applications may put their logs in the root directory as well if desired (like Spark or Hive logs).
Genie system logs go into the logs directory.
Of interest here is the env dump file, which is convenient for debugging jobs: it shows all the environment variables that were available right before Genie executed the final command of the run script.
You can see this file generated in the run script above on this line:
# Dump the environment to a env.log file
env | sort > ${GENIE_JOB_DIR}/genie/logs/env.log
The contents of this file will look something like the below
APP_ID=hadoop272
APP_NAME=hadoop-2.7.2
CURRENT_JOB_TMP_DIR=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/tmp
CURRENT_JOB_WORKING_DIR=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815
EC2_AVAILABILITY_ZONE=us-east-1d
EC2_REGION=us-east-1
GENIE_APPLICATION_DIR=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/genie/applications
GENIE_CLUSTER_DIR=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/genie/cluster/bdp_h2prod_20161217_205111
GENIE_CLUSTER_ID=bdp_h2prod_20161217_205111
GENIE_CLUSTER_NAME=h2prod
GENIE_COMMAND_DIR=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/genie/command/prodsparksubmit161
GENIE_COMMAND_ID=prodsparksubmit161
GENIE_COMMAND_NAME=prodsparksubmit
GENIE_JOB_DIR=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815
GENIE_JOB_ID=SP.CS.FCT_TICKET_0054500815
GENIE_JOB_MEMORY=1536
GENIE_JOB_NAME=SP.CS.FCT_TICKET
GENIE_VERSION=3
HADOOP_CONF_DIR=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/genie/applications/hadoop272/dependencies/hadoop-2.7.2/conf
HADOOP_DEPENDENCIES_DIR=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/genie/applications/hadoop272/dependencies
HADOOP_HEAPSIZE=1500
HADOOP_HOME=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/genie/applications/hadoop272/dependencies/hadoop-2.7.2
HADOOP_LIBEXEC_DIR=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/genie/applications/hadoop272/dependencies/hadoop-2.7.2/usr/lib/hadoop/libexec
HOME=/home/someNetflixUser
JAVA_HOME=/apps/bdp-java/java-8-oracle
LANG=en_US.UTF-8
LOGNAME=someNetflixUser
MAIL=/var/mail/someNetflixUser
NETFLIX_ENVIRONMENT=prod
NETFLIX_STACK=main
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/genie/applications/spark161/dependencies/spark-1.6.1/bin
PWD=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815
SHELL=/bin/bash
SHLVL=1
SPARK_CONF_DIR=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/genie/applications/spark161/dependencies/spark-1.6.1/conf
SPARK_DAEMON_JAVA_OPTS=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
SPARK_HOME=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/genie/applications/spark161/dependencies/spark-1.6.1
SPARK_LOG_DIR=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815
SPARK_LOG_FILE_PATH=/mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/spark.log
SPARK_LOG_FILE=spark.log
SUDO_COMMAND=/usr/bin/setsid /mnt/genie/jobs/SP.CS.FCT_TICKET_0054500815/run
SUDO_GID=60243
SUDO_UID=60004
SUDO_USER=genie
TERM=unknown
TZ=GMT
USER=someNetflixUser
USERNAME=someNetflixUser
_=/usr/bin/env
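When debugging, it can help to pull only the Genie-provided variables out of such a dump. A quick illustrative way to do that is shown below; the sample file is fabricated here so the snippet is self-contained:

```shell
#!/usr/bin/env bash
# Create a small fabricated env.log sample (in a real job this file lives
# at ${GENIE_JOB_DIR}/genie/logs/env.log).
cat > /tmp/sample-env.log <<'EOF'
GENIE_CLUSTER_NAME=h2prod
GENIE_JOB_ID=SP.CS.FCT_TICKET_0054500815
HOME=/home/someNetflixUser
SPARK_HOME=/tmp/spark
EOF

# Show only the variables Genie itself exported
grep '^GENIE_' /tmp/sample-env.log
```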
Finally inside the applications folder you can see the applications that were downloaded and configured.
6.3.3. Wrap Up
This section went over how Genie is configured by admins at Netflix and how users submit jobs and retrieve their logs and output. Anyone is free to configure Genie however suits their needs, in terms of tags and which applications are downloaded vs. already installed on a Genie node, but this method works for us here at Netflix.
6.4. Netflix Deployment
Many people ask how Genie is deployed at Netflix on AWS. This section tries to explain at a high level the components used and how Genie integrates into the environment. Below is a diagram of how deployment looks at Netflix.
6.4.1. Components
Brief descriptions of all the components.
6.4.1.1. Elastic Load Balancer
The Elastic Load Balancer (ELB) is used for a few purposes.
-
Allow a single endpoint for all API calls
-
Distribute API calls amongst all Genie nodes in an ASG
-
Allow HTTPS termination at single point
-
Allow human friendly DNS name to be assigned via Route 53 entry
6.4.1.2. Auto Scaling Group (ASG)
Currently the Genie ASG is a fleet of i2.4xl instances. The primary production ASG runs about thirty instances at any given time. Each Genie instance is configured to allocate 80% of the available system memory for jobs, and Tomcat itself is given 10 GB, leaving the rest for the system and other processes.
The ASG is set to auto scale when the average amount of used job memory exceeds 60% of the memory available for jobs (that 80% of system memory).
For example an i2.4xl image has 122 GB of available memory. For simplicity we allocate 100 GB for jobs. If the average memory used for jobs per node across the ASG exceeds 60 GB for some period of time we will scale the cluster up by one node to allocate resources before we get in trouble.
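The scale-up arithmetic in the example above can be sketched as follows; the numbers mirror the example in the text, not actual production configuration:

```shell
#!/usr/bin/env bash
# Memory allocated to jobs per node (simplified from the 122 GB on an
# i2.4xl, per the example above).
ALLOCATED_GB=100
# Scale up when average used job memory crosses this percentage.
THRESHOLD_PCT=60

SCALE_UP_AT_GB=$(( ALLOCATED_GB * THRESHOLD_PCT / 100 ))
echo "Scale up when average used job memory per node exceeds ${SCALE_UP_AT_GB} GB"
```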
Currently we don’t auto scale down but from time to time we take a look to see if a new ASG needs to be launched at a smaller size.
6.4.1.3. Relational Database (RDS)
We currently use an Amazon Aurora cluster on db.r3.4xl instances. Aurora is MySQL compatible so we use the standard MySQL JDBC driver that is packaged with Genie to talk to the database. We deploy to a Multi-AZ cluster and we have a reader endpoint that we use for reporting and backup.
6.4.1.4. Zookeeper
We use an Apache Zookeeper cluster which is deployed and managed by another team within Netflix for leadership election within our Genie ASG. When the Genie ASG comes up it (using Spring Cloud Cluster) looks in Zookeeper to see if there already is a leader for the app/cluster/stack combination. If there isn’t it elects a new one.
The leader is not involved in serving incoming requests; it performs background cleanup tasks for the entire cluster.
6.4.1.5. ElastiCache
We use AWS ElastiCache to provide a Redis cluster to store our HTTP sessions (via Spring Session). This allows us to have the users only sign in via SAML one time and not have to do it every time the ELB routes them to a new host for the UI.
6.4.1.6. Security Configuration
Internally Genie is secured via OAuth2 (for APIs) and SAML (for UI). We integrate with a Ping Federate IDP service to provide authentication and authorization information.
HTTPS is enabled to the ELB via a Verisign signed certificate tied to the Route 53 DNS address.
See Security for more information.
6.4.1.7. Spinnaker
Genie is deployed using Spinnaker. We currently have a few stacks (prod, test, dev, load, stage, etc) that we use for different purposes. Spinnaker handles all this for us after we configure the pipelines. See the Spinnaker site for more information.
6.4.2. Wrap Up
This section focused on how Genie is deployed within Netflix. Hopefully it helps bring clarity to one way that Genie can be deployed. Genie certainly is not limited to this deployment model and you are free to come up with your own; this should only serve as a template or example.
6.5. Security
In Genie 3.x there existed a security module built on top of Spring Security. This module contained a lot of custom code to make security work in the Netflix security environment at the time. It attempted to abstract this into reusable components but was still pretty specific.
Spring at Netflix has since evolved into the primary Java runtime platform. As part of this process our core runtime teams have produced security modules built on Spring Security which provide paved path integration with Netflix security mechanisms. Genie has been using this paved path since the 4.x line started releasing candidates. As such we’ve made the decision to remove the genie-security module, as it is a burden to support and likely isn’t very useful out of the box for most use cases. If security is desired, the recommendation is to take the genie-web project and build it into your own project in conjunction with the Spring Security implementations that work for your environment. This is how we ship Genie internally. If you want a reference for how this might look, have a look at the 4.0.0-rc.31 code, which is the last candidate release to contain our old genie-security code.
7. Features
This section has documentation pertaining to specific features of Genie that may need specific examples.
7.1. Cluster Selection
Genie allows administrators to tag clusters with multiple tags that can reflect any type of domain modeling the admins wish to accomplish. The purpose of the cluster, the types of data it can access, and the expected workload are just some examples of ways tags could be used to classify clusters. Users submit jobs with a series of tags that help Genie identify which cluster to run a job on. Sometimes the tags submitted with a job match multiple clusters. At this point Genie needs a mechanism to choose a final runtime target from the set of matched clusters. This is where the cluster selection feature of Genie comes into play.
Genie has an interface, ClusterSelector, which can be implemented to plug in custom cluster selection algorithms. The interface has a single method:
/**
* Return best cluster to run job on.
*
* @param clusters An immutable, non-empty list of available clusters to choose from
* @param jobRequest The job request these clusters are being considered for
* @return the "best" cluster to run job on or null if no cluster selected
* @throws GenieException if there is any error
*/
@Nullable
Cluster selectCluster(final List<Cluster> clusters, final JobRequest jobRequest) throws GenieException;
At startup Genie will collect all beans that implement the ClusterSelector interface and, based on their order, store them in an invocation-order list. This means that when multiple clusters are matched from the database based on tags, Genie will send the list of clusters to the implementations in preference order until one of them returns the cluster to use.
Genie currently ships with two implementations of this interface which are described below.
7.1.1. RandomizedClusterSelectorImpl
As the name indicates this selector simply uses a random number generator to select a cluster from the list by index. There is no intelligence to this selection algorithm but it does provide very close to an equal distribution between clusters if the same tags are always used.
This implementation is the "default" implementation. It has the lowest priority order, so if all other active implementations fail to select a cluster this implementation will be invoked and will choose randomly.
7.1.2. ScriptClusterSelector
This implementation, first introduced in 3.1.0, allows administrators to provide a script to be invoked at runtime to decide which cluster to select. Currently JavaScript and Groovy are supported out of the box, but others (like Python, Ruby, etc.) could be supported by adding their implementations of the ScriptEngine interface to the Genie classpath.
7.1.2.1. Configuration
The script selector is disabled by default. To enable it an admin must set the property genie.scripts.cluster-selector.source to a valid script URI (e.g.: file:///myscript.js, classpath:org/my/package/myscript.groovy, s3://my-app/scripts/myscript.js).
It is also recommended to set genie.scripts.cluster-selector.auto-load-enabled to true to enable eager loading of the script (the default behavior is to load and compile lazily).
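For example, an application.yml fragment enabling the script selector might look like the following; the script URI is a placeholder, not a real path:

```yaml
# Illustrative configuration fragment (script URI is a placeholder)
genie:
  scripts:
    cluster-selector:
      source: file:///etc/genie/cluster-selector.js
      auto-load-enabled: true
      timeout: 5000
```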
Other properties:
Property | Description | Default Value |
---|---|---|
genie.scripts.cluster-selector.source |
URI of the script to load. |
null |
genie.scripts.cluster-selector.auto-load-enabled |
If true, the script is eagerly loaded during startup, as opposed to lazily loaded on first use. |
false |
genie.scripts.cluster-selector.timeout |
Maximum script execution time (in milliseconds). After this time has elapsed, evaluation is shut down. |
5000 |
See also the genie.scripts-manager.* properties, which affect this component.
7.1.2.2. Script Contract
The contract between the script and the Java code is as follows:
Parameter | Description |
---|---|
clusters |
Non-empty JSON array of cluster objects (serialized as a string) of the clusters to be evaluated |
jobRequest |
JSON object (serialized as a string) of the job request that kicked off this evaluation |
Result | Description |
---|---|
A string |
The id of the cluster selected by the script algorithm that should be used to run the job |
null |
No cluster was selected and the evaluation should fall back to another selector algorithm |
For most of the script engines the last statement will be the return value.
7.1.2.3. Script Examples
Some simple script examples
7.1.2.3.1. Javascript
var cJson = JSON.parse(clusters);
var jJson = JSON.parse(jobRequest);
var index;
for (index = 0; index < cJson.length; index++) {
var cluster = cJson[index];
if (cluster.user === "h") {
break;
}
}
index < cJson.length ? cJson[index].id : null;
7.1.2.3.2. Groovy
import groovy.json.JsonSlurper
def jsonSlurper = new JsonSlurper()
def cJson = jsonSlurper.parseText(clusters)
def jJson = jsonSlurper.parseText(jobRequest)
def index = cJson.findIndexOf {
cluster -> cluster.user == "h"
}
index == -1 ? null : cJson[index].id
7.1.2.4. Caveats
The script selector provides great flexibility for system administrators to test algorithms for cluster load balancing at runtime. Since the script is refreshed periodically it can even be changed while Genie is running. With this flexibility comes the trade-off that script evaluation will be slower than directly executing JVM bytecode. The selector tries to offset this by compiling and caching the script code between refresh invocations. It is recommended that once an algorithm is well tested it be converted to a true implementation of the ClusterSelector interface if performance is desired.
Additionally, if there is an error in the script, the ScriptClusterSelector will swallow the exception and simply return null from all calls to selectCluster until the script is fixed and refresh is invoked again. The metric genie.jobs.clusters.selectors.script.select.timer, with tag status and value failed, can be used to monitor this situation.
Two more metrics are relevant in this context. genie.scripts.load.timer, a timer for script loading (and reloading), can be used to monitor unavailable resources, compilation errors, etc. genie.scripts.evaluate.timer, a timer for script evaluation, can be used to monitor evaluation errors, timeouts, etc. Both metrics are tagged with scriptUri in case multiple scripts are loaded.
7.1.3. Wrap Up
This section went over the cluster selection feature of Genie. This interface provides an extension point for administrators of Genie to tweak Genie’s runtime behavior to suit their needs.
8. Installation
Installing Genie is easy. You can run Genie either as a standalone application with an embedded Tomcat or by deploying the WAR file to an existing Tomcat or other servlet container. There are trade-offs to these two methods as will be discussed below.
8.1. Standalone Jar
The standalone jar is the simplest to deploy as it has no other real moving parts. Just put the jar somewhere on a system and execute java -jar genie-app-4.0.0-rc.70.jar. The downside is that it’s a little harder to configure or add jars to the classpath if you want them.
Configuration files (application*.yml or application*.properties) can be loaded from the current working directory or from a .genie/ directory stored in the user’s home directory (e.g. ~/.genie/application.yml). Classpath items (jars, .jks files, etc.) can be added to ~/.genie/lib/ and they will be part of the application classpath.
Properties can be passed in on the command line two ways:
-
java -Dgenie.example.property=blah -jar genie-app-4.0.0-rc.70.jar
-
java -jar genie-app-4.0.0-rc.70.jar --genie.example.property=blah
Property resolution goes in this order:
-
Command line
-
Classpath profile specific configuration files (e.g. application-prod.yml)
-
Embedded profile specific configuration files
-
Classpath default configuration file (e.g. application.yml)
-
Embedded default configuration file
For more details see the Spring Boot documentation on external configuration.
8.2. Servlet Container Deployment
If you want to deploy to an existing servlet container you can re-package the genie-web jar inside a WAR. See the Spring Boot docs for an example of how to do this.
8.3. Configuration
Genie has a lot of available configuration options. For descriptions of specific properties you can see the Properties section below. Additionally if you want to know how to configure more parts of the application you should have a look at the Spring Boot docs as they will go in depth on how to configure the various Spring components used in Genie.
8.3.1. Profiles
Spring provides a mechanism for segregating parts of application configuration and activating them only in certain conditions. This mechanism is known as profiles. By default Genie will run with the dev profile activated. This means that all the properties in application-dev.yml will be appended to, or overwrite, the defaults in application.yml. Changing the active profiles is easy: just set the property spring.profiles.active to a comma separated list of active profiles. For example --spring.profiles.active=prod,cloud would activate the prod and cloud profiles.
Properties for specific profiles should be stored in files named application-{profileName}.yml. You can make as many as you want, but Genie ships with dev, s3 and prod profile properties already included. Their properties can be seen in the Properties section below.
8.3.2. Database
By default, since Genie launches with the dev profile active, it will run with an in-memory database as part of its process. This means when you shut Genie down all data will be lost; it is meant for development only. Genie ships with JDBC drivers for MySQL, PostgreSQL and H2. If you want to use a different database you should put the JDBC driver jar somewhere on the Genie classpath.
For production you should probably enable the prod profile, which creates a connection pool for the database, and then override the properties spring.datasource.url, spring.datasource.username and spring.datasource.password to match your environment. The datasource URL needs to be a valid JDBC connection string for your database. You can see examples here or here, or search for your database and JDBC connection string in your search engine of choice.
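As an illustration, a hypothetical application-prod.yml override for an external MySQL database might look like this; the host, schema name and credentials are all placeholders:

```yaml
# Illustrative overrides (host, schema and credentials are placeholders)
spring:
  datasource:
    url: jdbc:mysql://my-db-host:3306/genie
    username: genie
    password: changeme
```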
Genie also ships with database schema scripts for MySQL and PostgreSQL. You will need to load these into your database before you run Genie if you use one of these databases. Genie no longer creates the schema dynamically for performance reasons. Follow the below sections to load the schemas into your database.
Genie 3.2.0+ software is not compatible with previous database schema. Before upgrading existing Genie servers to 3.2.0, follow the steps below to perform database upgrade, or create a new database with 3.1.x schema. Database upgrades beyond 3.2.0 are handled automatically by the Genie binary via Flyway. |
8.3.2.1. MySQL
This assumes the MySQL client binaries are installed |
Genie requires MySQL 5.6.3+ due to certain properties not existing before that version |
Ensure the following properties are set in your my.cnf
:
[mysqld]
innodb_file_per_table=ON
innodb_large_prefix=ON
innodb_file_format=barracuda
Restart MySQL if you’ve changed these properties |
Run:
mysql -u {username} -p{password} -h {host} -e 'create database genie;'
8.3.2.1.1. 3.1.x to 3.2.0 database upgrade
Genie requires MySQL 5.6.3+ due to certain properties not existing before that version |
If you have an existing Genie installation on a database version < 3.2.0 you’ll need to upgrade your schema to 3.2.0 before continuing.
Download the:
Ensure the following properties are set in your my.cnf
:
[mysqld]
innodb_file_per_table=ON
innodb_large_prefix=ON
innodb_file_format=barracuda
Restart MySQL if you’ve changed these properties |
Then run:
mysql -u {username} -p{password} -h {host} genie < upgrade-3.1.x-to-3.2.0.mysql.sql
8.3.2.2. PostgreSQL
This assumes the PSQL binaries are installed |
Run:
createdb genie
8.3.2.2.1. 3.1.x to 3.2.0 database upgrade
If you have an existing Genie installation on a database version < 3.2.0 you’ll need to upgrade your schema to 3.2.0 before continuing.
Download the
Then run:
psql -U {user} -h {host} -d genie -f upgrade-3.1.x-to-3.2.0.postgresql.sql
8.3.3. Local Directories
Genie requires a few directories to run. By default Genie will place them under /tmp; however in production you should probably create a larger directory in which to store the job working directories and other content. These correspond to the genie.jobs.locations.* properties described below in the Properties section.
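For example, a hypothetical set of overrides pointing these directories at a larger volume might look like the following; the paths are placeholders:

```yaml
# Illustrative overrides (paths are placeholders; scheme should be included)
genie:
  jobs:
    locations:
      archives: file:///data/genie/archives/
      attachments: file:///data/genie/attachments/
      jobs: file:///data/genie/jobs/
```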
8.3.3.1. S3
If your commands, applications, or jobs depend on artifacts referenced via S3 URI, you will need to configure the S3 subsystem. If you’re not assuming a role there is nothing you necessarily have to do provided a default credentials provider chain can be created. See here for the rules for that.
If you need to assume a role in order to access Amazon resources from your Genie node, set the property genie.aws.credentials.role to the ARN of the role you’d like to assume. This will force Genie to create an STSAssumeRoleSessionCredentialsProvider instead of the default one.
Example role setting:
genie:
aws:
credentials:
role: <AWS ROLE ARN>
8.4. Wrap Up
This section contains the basic setup instructions for Genie. There are other components that can be added to the system like Redis, Zookeeper and Security systems that are somewhat outside the scope of an initial setup. You can see the Properties section below for the properties you’d need to configure for these systems.
10. Genie Web
This section describes the various properties that can be set to control the behavior of your Genie node and cluster. For more information on Spring properties you should see the Spring Boot reference documentation and the Spring Cloud documentation. The Spring properties described here are ones that we have overridden from Spring defaults.
10.1. Default Properties
10.1.1. Genie Properties
Properties marked 'dynamic' reflect changes to the property value in the environment at runtime, whereas static property values are bound during application startup and do not change while the application is up and running.
Property | Description | Default Value | Dynamic |
---|---|---|---|
genie.agent.configuration.agent-properties-filter-pattern |
Regular expression applied to filter server properties that are forwarded to the agent |
^genie\.agent\.runtime\..* |
no |
genie.agent.configuration.cache-expiration-interval |
Interval after which the agent properties cache is considered stale and re-calculated |
1m |
no |
genie.agent.configuration.cache-refresh-interval |
Interval for after which the agent properties cache is forcefully refreshed in case of no access |
5m |
no |
genie.agent.filestream.max-concurrent-transfers |
Maximum number of concurrent file transfers that a server allows |
100 |
no |
genie.agent.filestream.unclaimed-stream-start-timeout |
Interval after which a transfer stream is shut down if it didn’t send the first chunk of data |
10s |
no |
genie.agent.filestream.stalled-transfer-timeout |
Interval after which a transfer stream is shut down if it didn’t send any more data |
20s |
no |
genie.agent.filestream.stalled-transfer-check-interval |
Interval for checking on stalled downloads |
5s |
no |
genie.agent.filestream.write-retry-delay |
Interval between attempts to write data into a stream buffer |
300ms |
no |
genie.agent.filter.enabled |
If set to |
no |
|
genie.agent.filter.version.blacklist |
A regex matched against agent version (e.g., |
yes |
|
genie.agent.filter.version.minimum |
The minimum version an agent needs to be (e.g., |
yes |
|
genie.agent.filter.version.whitelist |
A regex matched against agent version (e.g., |
yes |
|
genie.agent.heart-beat.send-interval |
Interval for sending heartbeats to all connected clients. |
5s |
no |
genie.agent.launcher.local.additional-environment |
Environment variables to set when spawning an agent (in addition to the inherited server environment) |
no |
|
genie.agent.launcher.local.agent-jar-path |
The location of the agent jar. The value is substituted in the command template if the corresponding placeholder is present. |
/tmp/genie-agent.jar |
no |
genie.agent.launcher.local.host-info-expire-after |
How long after the job information for this host is written into a local cache is it evicted. See Spring Docs for Duration conversion details. |
1m |
no |
genie.agent.launcher.local.host-info-refresh-after |
How long after the job information for this host is written should it be automatically refreshed from the underlying data source. See Spring Docs for Duration conversion details. |
30s |
no |
genie.agent.launcher.local.launch-command-template |
The system command that the launcher should use to launch an agent process. Ordered list of arguments. Contains placeholders that will be replaced at runtime. |
java -jar <AGENT_JAR_PLACEHOLDER> exec --server-host 127.0.0.1 --server-port <SERVER_PORT_PLACEHOLDER> --api-job --job-id <JOB_ID_PLACEHOLDER> |
no |
genie.agent.launcher.local.max-job-memory |
The maximum amount of memory, in megabytes, that a job can be allocated while using the local launcher |
10240 |
no |
genie.agent.launcher.local.max-total-job-memory |
The total number of MB out of the system memory that Genie can use for running agents |
30720 |
no |
genie.agent.launcher.local.process-output-capture-enabled |
Whether to capture stdout and stderr from the forked agent subprocess to a file for debugging purposes |
false |
no |
genie.agent.launcher.local.run-as-user-enabled |
Whether to launch the agent subprocess as the user specified in the job request |
false |
no |
genie.agent.runtime.* |
Properties with this prefix are forwarded to each agent during startup |
yes |
|
genie.aws.credentials.role |
The AWS role ARN to assume when connecting to S3. If this is set Genie will create a credentials provider that will attempt to assume this role on the host Genie is running on |
no |
|
genie.aws.s3.buckets.[bucketName].roleARN |
For the bucket with name |
no |
|
genie.aws.s3.buckets.[bucketName].region |
The AWS region the bucket with |
no |
|
genie.file.cache.location |
Where to store cached files on local disk |
no |
|
genie.grpc.server.services.job-file-sync.ackIntervalMilliseconds |
How many milliseconds to wait between checks whether some acknowledgement should be sent to the agent regardless of
whether the |
30,000 |
no |
genie.grpc.server.services.job-file-sync.maxSyncMessages |
How many messages to receive from the agent before an acknowledgement message is sent back from the server |
10 |
no |
genie.health.maxCpuLoadConsecutiveOccurrences |
Defines the threshold of consecutive occurrences of CPU load crossing the <maxCpuLoadPercent>. Health of the system is marked unhealthy if the CPU load of a system goes beyond the threshold 'maxCpuLoadPercent' for 'maxCpuLoadConsecutiveOccurrences' consecutive times. |
3 |
no |
genie.health.maxCpuLoadPercent |
Defines the threshold for the maximum CPU load percentage to consider for an instance to be unhealthy. Health of the system is marked unhealthy if the CPU load of a system goes beyond this threshold for 'maxCpuLoadConsecutiveOccurrences' consecutive times. |
80 |
no |
genie.http.connect.timeout |
The number of milliseconds before HTTP calls between Genie nodes should time out on connection |
2000 |
no |
genie.http.read.timeout |
The number of milliseconds before HTTP calls between Genie nodes should time out on attempting to read data |
10000 |
no |
genie.jobs.active-limit.count |
The maximum number of active jobs a user is allowed to have. Once a user hits this limit, jobs submitted are rejected. This property is ignored unless |
100 |
no |
genie.jobs.active-limit.enabled |
Enables the per-user active job limit. The number of jobs is controlled by the |
false |
no |
genie.jobs.active-limit.overrides.<user-name> |
The maximum number of active jobs that user 'user-name' is allowed to have. This property is ignored unless |
- |
yes |
genie.jobs.agent-execution.agent-probability |
Likelihood (0 <= x <= 1.0) that an incoming job is randomly selected to execute with the agent, rather than the regular V3 execution codepath |
null |
yes |
genie.jobs.agent-execution.force-agent |
If true, force all jobs to execute in agent mode |
null |
yes |
genie.jobs.agent-execution.force-embedded |
If true, force all jobs to execute in embedded mode |
null |
yes |
genie.jobs.cleanup.deleteDependencies |
Whether or not to delete the dependencies directories for applications, cluster, command to save disk space after job completion |
true |
no |
genie.jobs.completion-check-back-off.factor |
Multiplication factor that grows the delay between checks for job completions. Must be greater than 1. |
1.2 |
no |
genie.jobs.completion-check-back-off.max-interval |
The maximum time between checks for job completion in milliseconds. This is a fallback value, the value used in most
cases is specified as part of the |
10000 |
no |
genie.jobs.completion-check-back-off.min-interval |
The minimum time between checks for job completion in milliseconds. Must be greater than zero. |
100 |
no |
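The three back-off properties above interact as a bounded geometric progression. A minimal Python sketch of that growth, using the documented defaults (this is illustrative only, not Genie's actual implementation):

```python
def backoff_intervals(min_interval=100, max_interval=10000, factor=1.2, n=10):
    """Illustrative growth of the completion-check delay (milliseconds)."""
    delay = float(min_interval)
    intervals = []
    for _ in range(n):
        intervals.append(round(delay))
        # Grow the delay geometrically, capped at the configured maximum
        delay = min(delay * factor, max_interval)
    return intervals

# With the defaults the first few delays are 100, 120, 144, ... up to the 10000ms cap.
```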
genie.jobs.files.filter.case-sensitive-matching |
Whether the regular expressions defined in |
true |
no |
genie.jobs.files.filter.directory-traversal-reject-patterns |
List of regex patterns, if a directory matches any, then its contents are not included in the job files manifest |
[] |
no |
genie.jobs.files.filter.directory-reject-patterns |
List of regex patterns, if a directory matches any, then it is not included in the job files manifest |
[] |
no |
genie.jobs.files.filter.file-reject-patterns |
List of regex patterns, if a file matches any, then it is not included in the job files manifest |
[] |
no |
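A sketch of the file filter properties in YAML form (the patterns are hypothetical examples, not defaults):

```yaml
genie:
  jobs:
    files:
      filter:
        case-sensitive-matching: true
        # Hypothetical example patterns:
        directory-reject-patterns:
          - ".*/dependencies"   # hide dependency directories from the manifest
        file-reject-patterns:
          - ".*\\.tmp"          # hide temporary files from the manifest
```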
genie.jobs.forwarding.enabled |
Whether or not to attempt to forward kill and get output requests for jobs |
true |
no |
genie.jobs.forwarding.port |
The port to forward requests to, as it could be different from the ELB port |
8080 |
no |
genie.jobs.forwarding.scheme |
The connection protocol to use (http or https) |
http |
no |
genie.jobs.locations.archives |
The default root location where job archives should be stored. Scheme should be included. Created if it doesn’t exist. |
no |
|
genie.jobs.locations.attachments |
The default root location where job attachments will be temporarily stored. Scheme should be included. Created if it doesn’t exist. |
no |
|
genie.jobs.locations.jobs |
The default root location where job working directories will be placed. Created by the system if it doesn’t exist. |
no |
|
genie.jobs.max.stdErrSize |
The maximum number of bytes the job standard error file can grow to before Genie will kill the job |
8589934592 |
no |
genie.jobs.max.stdOutSize |
The maximum number of bytes the job standard output file can grow to before Genie will kill the job |
8589934592 |
no |
genie.jobs.memory.maxSystemMemory |
The total number of MB out of the system memory that Genie can use for running jobs |
30720 |
no |
genie.jobs.memory.defaultJobMemory |
The total number of megabytes Genie will assume a job is allocated if not overridden by a command or user at runtime |
1024 |
no |
genie.jobs.memory.maxJobMemory |
The maximum amount of memory, in megabytes, that a job client can be allocated |
10240 |
no |
genie.jobs.submission.enabled |
Whether new job submission is enabled ( |
true |
yes |
genie.jobs.submission.disabledMessage |
A message to return to the users when new job submission is disabled |
Job submission is currently disabled. Please try again later. |
yes |
genie.jobs.users.creationEnabled |
Whether or not Genie should attempt to create a system user to run the job as. The Genie user must have sudo rights for this to work. |
false |
no |
genie.jobs.users.runAsUserEnabled |
Whether or not Genie should run jobs as the user who submitted them. The Genie user must have sudo rights for this to work. |
false |
no |
genie.leader.enabled |
Whether this node should be the leader of the cluster or not. Should only be used if leadership is not being determined by Zookeeper or another mechanism via Spring |
false |
no |
genie.mail.fromAddress |
The e-mail address that should be used as the from address when alert emails are sent |
no |
|
genie.mail.password |
The password for the e-mail server |
no |
|
genie.mail.user |
The user to log into the e-mail server with |
no |
|
genie.notifications.sns.enabled |
Whether to enable SNS publishing of events |
- |
no |
genie.notifications.sns.topicARN |
The SNS topic to publish to |
- |
no |
genie.notifications.sns.additionalEventKeys.<KEY> |
Map of KEYs and corresponding values to be added to the SNS messages published |
- |
no |
genie.redis.enabled |
Whether to enable storage of HTTP sessions inside Redis via Spring Session |
false |
no |
genie.retry.archived-job-get-metadata.initialDelay |
The initial interval between retries to get archived job metadata. Milliseconds |
1000 |
no |
genie.retry.archived-job-get-metadata.multiplier |
The amount the delay should increase on every retry. e.g. start at 1 second → 2 seconds → 4 seconds with a value of 2.0 |
2.0 |
no |
genie.retry.archived-job-get-metadata.noOfRetries |
The number of times to retry requests to get archived job metadata before failure |
5 |
no |
genie.retry.initialInterval |
The amount of time to wait after initial failure before retrying the first time in milliseconds |
10000 |
no |
genie.retry.maxInterval |
The maximum amount of time to wait between retries for the final retry in the back-off policy |
60000 |
no |
genie.retry.noOfRetries |
The number of times to retry requests before failure |
5 |
no |
genie.retry.s3.noOfRetries |
The number of times to retry requests to S3 before failure |
5 |
no |
genie.retry.sns.noOfRetries |
The number of times to retry requests to SNS before failure |
5 |
no |
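Taken together, the retry properties above might be expressed in YAML like this (the values shown are the documented defaults, for illustration):

```yaml
genie:
  retry:
    noOfRetries: 5           # generic retry template
    initialInterval: 10000   # ms to wait before the first retry
    maxInterval: 60000       # ceiling for the back-off delay
    s3:
      noOfRetries: 5
    sns:
      noOfRetries: 5
    archived-job-get-metadata:
      noOfRetries: 5
      initialDelay: 1000     # ms; doubles each retry with multiplier 2.0
      multiplier: 2.0
```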
genie.scripts-manager.refresh-interval |
Interval for the script manager to reload and recompile known scripts (in milliseconds) |
300000 |
no |
genie.scripts.cluster-selector.source |
URI of the script to load. |
null |
no |
genie.scripts.cluster-selector.auto-load-enabled |
If true, the script is eagerly loaded during startup, as opposed to lazily loaded on first use. |
false |
no |
genie.scripts.cluster-selector.timeout |
Maximum script execution time (in milliseconds). After this time has elapsed, evaluation is shut down. |
5000 |
no |
genie.scripts.command-selector.source |
URI of the script to load. |
null |
no |
genie.scripts.command-selector.auto-load-enabled |
If true, the script is eagerly loaded during startup, as opposed to lazily loaded on first use. |
false |
no |
genie.scripts.command-selector.timeout |
Maximum script execution time (in milliseconds). After this time has elapsed, evaluation is shut down. |
5000 |
no |
genie.scripts.execution-mode-filter.source |
URI of the script to load. |
null |
no |
genie.scripts.execution-mode-filter.auto-load-enabled |
If true, the script is eagerly loaded during startup, as opposed to lazily loaded on first use. |
false |
no |
genie.scripts.execution-mode-filter.timeout |
Maximum script execution time (in milliseconds). After this time has elapsed, evaluation is shut down. |
5000 |
no |
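The three selector/filter script groups above share the same shape. A YAML sketch for the cluster selector (the script URI is a hypothetical example):

```yaml
genie:
  scripts:
    cluster-selector:
      source: "file:///etc/genie/cluster-selector.groovy"  # hypothetical location
      auto-load-enabled: true   # compile at startup instead of on first use
      timeout: 5000             # abort evaluation after 5 seconds
```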
genie.s3filetransfer.strictUrlCheckEnabled |
Whether to strictly check an S3 URL for illegal characters before attempting to use it |
false |
no |
genie.swagger.enabled |
Whether to enable Swagger to be bootstrapped into the Genie service so that the endpoint /swagger-ui.html shows API documentation generated by the swagger specification |
false |
no |
genie.tasks.agent-cleanup.enabled |
Whether to enable the task that detects jobs whose agent has gone AWOL, and marks them failed |
true |
no |
genie.tasks.agent-cleanup.launchTimeLimit |
How long a job can stay in ACCEPTED state, waiting for the agent to claim it, before the job is marked failed, in milliseconds |
240000 |
no |
genie.tasks.agent-cleanup.refreshInterval |
How often the AWOL agent task is executed, in milliseconds |
10000 |
no |
genie.tasks.agent-cleanup.reconnectTimeLimit |
How long of a leeway to give a job after its agent disconnected and before the job is marked failed, in milliseconds |
120000 |
no |
genie.tasks.cluster-checker.healthIndicatorsToIgnore |
The health indicator groups from the actuator /health endpoint to ignore when determining whether a node is lost, as a comma-separated list |
mail,genieAgent,localAgentLauncher |
no |
genie.tasks.cluster-checker.lostThreshold |
The number of times a Genie node needs to fail the health check in order for jobs running on that node to be marked lost and failed by the Genie leader |
3 |
no |
genie.tasks.cluster-checker.port |
The port to connect to other Genie nodes on |
8080 |
no |
genie.tasks.cluster-checker.rate |
The number of milliseconds to wait between health checks to other Genie nodes |
300000 |
no |
genie.tasks.cluster-checker.scheme |
The scheme (http or https) for connecting to other Genie nodes |
http |
no |
genie.tasks.database-cleanup.application-cleanup.skip |
Skip the Applications table when performing database cleanup |
false |
yes |
genie.tasks.database-cleanup.cluster-cleanup.skip |
Skip the Clusters table when performing database cleanup |
false |
yes |
genie.tasks.database-cleanup.command-cleanup.skip |
Skip the Commands table when performing database cleanup |
false |
yes |
genie.tasks.database-cleanup.command-deactivation.commandCreationThreshold |
The number of days before the current cleanup run that a command must have been created in the system for it to be considered for deactivation. |
false |
yes |
genie.tasks.database-cleanup.command-deactivation.jobCreationThreshold |
The number of days before the current cleanup run that a command must not have been used in a job for it to be considered for deactivation. |
false |
yes |
genie.tasks.database-cleanup.command-deactivation.skip |
Skip deactivating Commands when performing database cleanup |
false |
yes |
genie.tasks.database-cleanup.enabled |
Whether or not to delete old and unused records from the database at a scheduled interval.
See: |
true |
no |
genie.tasks.database-cleanup.expression |
The cron expression for how often to run the database cleanup task |
0 0 0 * * * |
yes |
genie.tasks.database-cleanup.file-cleanup.skip |
Skip the Files table when performing database cleanup |
false |
yes |
genie.tasks.database-cleanup.job-cleanup.skip |
Skip the Jobs table when performing database cleanup |
false |
yes |
genie.tasks.database-cleanup.job-cleanup.pageSize |
The max number of jobs to delete per transaction |
1000 |
yes |
genie.tasks.database-cleanup.job-cleanup.retention |
The number of days to retain jobs in the database |
90 |
yes |
genie.tasks.database-cleanup.tag-cleanup.skip |
Skip the Tags table when performing database cleanup |
false |
yes |
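The cleanup expression uses Spring's six-field cron syntax (second, minute, hour, day-of-month, month, day-of-week). As a sketch using the documented defaults:

```yaml
genie:
  tasks:
    database-cleanup:
      enabled: true
      # Spring cron: second minute hour day-of-month month day-of-week
      expression: "0 0 0 * * *"   # daily at midnight
      job-cleanup:
        retention: 90    # days of job records to keep
        pageSize: 1000   # max jobs deleted per transaction
```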
genie.tasks.disk-cleanup.enabled |
Whether or not to remove old job directories on the Genie node |
true |
no |
genie.tasks.disk-cleanup.expression |
How often to run the disk cleanup task as a cron expression |
0 0 0 * * * |
no |
genie.tasks.disk-cleanup.retention |
The number of days to leave old job directories on disk |
3 |
no |
genie.tasks.executor.pool.size |
The number of executor threads available for tasks to be run on within the node in an ad-hoc manner. Best to set to the number of CPU cores x 2 + 1 |
1 |
no |
genie.tasks.scheduler.pool.size |
The number of available threads for the scheduler to use to run tasks on the node at scheduled intervals. Best to set to the number of CPU cores x 2 + 1 |
1 |
no |
genie.tasks.user-metrics.enabled |
Whether or not to publish user-tagged metrics |
true |
no |
genie.tasks.user-metrics.refresh-interval |
Publish/refresh interval in milliseconds |
30000 |
no |
genie.zookeeper.discovery-path |
The namespace to use for Genie discovery service (maps agents to the node they’re connected to) |
/genie/discovery/ |
no |
genie.zookeeper.leader-path |
The namespace to use for Genie leadership election of a given cluster |
/genie/leader/ |
no |
10.1.2. Spring Properties
Property | Description | Default Value |
---|---|---|
info.genie.version |
The Genie version to be displayed by the UI and returned by the actuator /info endpoint. Set by the build. |
Current build version |
management.endpoints.web.base-path |
The default base path for the Spring Actuator (https://docs.spring.io/spring-boot/docs/current/actuator-api/html/)
management endpoints. Switched from the default |
/admin |
spring.application.name |
The name of the application in the Spring context |
genie |
spring.banner.location |
Banner file location |
genie-banner.txt |
spring.data.redis.repositories.enabled |
Whether Spring data repositories should attempt to be created for Redis |
false |
spring.datasource.url |
JDBC URL of the database |
jdbc:h2:mem:genie |
spring.datasource.username |
Username for the datasource |
root |
spring.datasource.password |
Database password |
|
spring.datasource.hikari.leak-detection-threshold |
How long to wait (in milliseconds) before a connection should be considered leaked out of the pool if it hasn’t been returned |
30000 |
spring.datasource.hikari.pool-name |
The name of the connection pool. Will show up in logs under this name. |
genie-hikari-db-pool |
spring.flyway.baselineDescription |
Description for the initial baseline of a database instance |
Base Version |
spring.flyway.baselineOnMigrate |
Whether or not to baseline when Flyway is present and the datasource targets a DB that isn’t managed by Flyway |
true |
spring.flyway.baselineVersion |
Initial DB version (the version at which Genie migrated to Flyway; shouldn’t be touched) |
3.2.0 |
spring.flyway.locations |
Where flyway should look for database migration files |
classpath:db/migration/{vendor} |
spring.jackson.serialization.write-dates-as-timestamps |
Whether to serialize instants as timestamps or ISO8601 strings |
false |
spring.jackson.time-zone |
Time zone used when formatting dates. For instance |
UTC |
spring.jpa.hibernate.ddl-auto |
DDL mode. This is actually a shortcut for the "hibernate.hbm2ddl.auto" property. |
validate |
spring.jpa.hibernate.properties.hibernate.jdbc.time_zone |
The timezone to use when writing dates to the database see article |
UTC |
spring.profiles.active |
The default active profiles when Genie is run |
dev |
spring.mail.host |
The hostname of the mail server |
|
spring.mail.testConnection |
Whether to check the connection to the mail server on startup |
false |
spring.redis.host |
Endpoint for the Redis cluster used to store HTTP session information |
|
spring.servlet.multipart.max-file-size |
Max attachment file size. Values can use the suffix "MB" or "KB" to indicate a megabyte or kilobyte size. |
100MB |
spring.servlet.multipart.max-request-size |
Max job request size. Values can use the suffix "MB" or "KB" to indicate a megabyte or kilobyte size. |
200MB |
spring.session.store-type |
The back-end storage system for Spring to store HTTP session information. See Spring Boot Session for more information. With the current classpath only none, redis and jdbc will work. |
none |
10.1.3. Spring Cloud Properties
Properties set by default to manipulate various Spring Cloud libraries.
Property | Description | Default Value |
---|---|---|
cloud.aws.credentials.useDefaultAwsCredentialsChain |
Whether to attempt creation of a standard AWS credentials chain. See Spring Cloud AWS for more information. |
true |
cloud.aws.region.auto |
Whether the AWS region will be attempted to be auto recognized via the AWS metadata services on EC2. See Spring Cloud AWS for more information. |
false |
cloud.aws.region.static |
The default AWS region. See Spring Cloud AWS for more information. |
us-east-1 |
cloud.aws.stack.auto |
Whether auto stack detection is enabled. See Spring Cloud AWS for more information. |
false |
spring.cloud.zookeeper.enabled |
Whether to enable zookeeper functionality or not |
false |
spring.cloud.zookeeper.connectString |
The connection string for the zookeeper cluster |
localhost:2181 |
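To enable Zookeeper-backed leadership and discovery, the two Spring Cloud properties above might be set like this (the hostnames are hypothetical):

```yaml
spring:
  cloud:
    zookeeper:
      enabled: true
      connectString: "zk1.example.com:2181,zk2.example.com:2181"
```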
10.1.4. gRPC Server properties
Property |
Description |
Default Value |
grpc.server.port |
The port on which to bind the gRPC server, if enabled. |
9090 |
grpc.server.address |
The address on which to bind the gRPC server, if enabled. |
0.0.0.0 |
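A minimal sketch binding the gRPC server used for agent-to-server communication (these are the documented defaults):

```yaml
grpc:
  server:
    port: 9090        # port agents connect to
    address: 0.0.0.0  # listen on all interfaces
```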
10.2. Profile Specific Properties
10.2.1. Prod Profile
Property | Description | Default Value |
---|---|---|
spring.datasource.url |
JDBC URL of the database |
jdbc:mysql://127.0.0.1/genie?useUnicode=yes&characterEncoding=UTF-8&useLegacyDatetimeCode=false |
spring.datasource.username |
Username for the datasource |
root |
spring.datasource.password |
Database password |
|
spring.datasource.hikari.data-source-properties.cachePrepStmts |
true |
|
spring.datasource.hikari.data-source-properties.prepStmtCacheSize |
250 |
|
spring.datasource.hikari.data-source-properties.prepStmtCacheSqlLimit |
2048 |
|
spring.datasource.hikari.data-source-properties.serverTimezone |
UTC |
|
spring.datasource.hikari.data-source-properties.userServerPrepStatements |
true |
11. Genie Agent
This section describes the various properties that can be set to control the behavior of the Genie agent.
Unless otherwise noted, properties are loaded from the standard sources (defaults, profiles, other files). The server also has a chance to override them during the 'Agent Configuration' execution stage.
11.1. Default Properties
11.1.1. Genie Properties
Property | Description | Default Value | Notes |
---|---|---|---|
|
Time allowed to the agent to shut down cleanly (archive, cleanup, …) before the JVM is forcefully shut down |
5m |
|
|
Maximum time block when trying to forcefully push a manifest update |
5s |
|
|
Scheduling policy for backoff in case of error during file streaming |
FROM_PREVIOUS_EXECUTION_BEGIN |
|
|
Minimum delay before another attempt during file streaming |
1s |
|
|
Maximum delay before another attempt during file streaming |
10s |
|
|
Multiplication factor for retry delay before another attempt during file streaming |
1.1 |
|
|
Whether to enable compression when transmitting file chunks to the server |
true |
|
|
Max size of a file chunk sent to the server |
1MB |
|
|
Maximum number of files transmitted concurrently to the server |
5 |
|
|
Maximum time a file transfer is allowed to complete before it is terminated during agent shutdown |
15s |
Should be lower than |
|
Interval between heartbeats |
2s |
|
|
Interval to wait before re-establishing the heartbeat stream |
1s |
|
|
Whether to periodically poll the running job status from the server, and to shut down in case the job is marked failed |
true |
|
|
How often to check for files limits |
1m |
|
|
Maximum number of files in the job directory |
64000 |
|
|
Maximum size of the largest file in the job directory |
8GB |
|
|
Maximum total size of the job directory |
16GB |
|
|
Scheduling policy for backoff in case of error during kill request |
FROM_PREVIOUS_EXECUTION_COMPLETION |
|
|
Minimum delay before another attempt during kill request |
500ms |
|
|
Maximum delay before another attempt during kill request |
5s |
|
|
Multiplication factor for retry delay before another attempt during kill request |
1.2 |
|
|
Time allowed to the job execution state machine to shut down cleanly before the JVM is shut down |
60s |
|
|
Time allowed for tasks running on internal task executors to complete before the agent terminates |
30s |
This property is bound during initialization and cannot be modified at runtime by the server. |
|
Time allowed for tasks running on internal task schedulers to complete before the agent terminates |
30s |
This property is bound during initialization and cannot be modified at runtime by the server. |
|
Time allowed for tasks running on Spring’s system task executor to complete before the agent terminates |
60s |
This property is bound during initialization and cannot be modified at runtime by the server. |
|
Time allowed for tasks running on Spring’s system task scheduler to complete before the agent terminates |
60s |
This property is bound during initialization and cannot be modified at runtime by the server. |
12. Metrics
The following is an extensive list of metrics (counters, timers, gauges, …) published organically by Genie, in addition to metrics published by Spring, JVM and system metrics and statistics.
Metrics are collected using Micrometer which allows system admins to plugin a variety of backend collection systems (Atlas, Datadog, Graphite, Ganglia, etc).
See website for more details.
Genie ships with no backend system compiled in.
One will have to be added if desired; otherwise metrics are just published within the local JVM and available on the Actuator /metrics
endpoint.
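As a sketch, wiring in a backend typically means adding the matching Micrometer registry module to the server's build so Spring Boot can auto-configure it (Atlas is used here as an example; the exact coordinates and version management are assumptions):

```groovy
// build.gradle (illustrative): add a Micrometer backend registry
dependencies {
    implementation 'io.micrometer:micrometer-registry-atlas'
}
```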
Name | Description | Unit | Source | Tags |
---|---|---|---|---|
|
Number of agents connected to the node |
count |
|
|
|
Number of agents connected to the node and registered in discovery |
count |
|
|
|
Count of Zookeeper session state changes |
count |
|
|
|
Timing and count of registrations of local agent with discovery service |
count |
|
|
|
Timing and count of unregistrations of local agent with discovery service |
count |
|
|
|
Count of new agent connections to the local node |
count |
|
|
|
Count of new agent disconnections from the local node |
count |
|
|
|
The number of agents sending heartbeats to the server |
count |
|
|
|
Count of file transfer from remote agents to this node |
count |
|
|
|
Count of attempted file transfers that were rejected because too many transfers are already in progress on this node |
count |
|
|
|
Size of the manifest cache |
count |
|
|
|
Number of active agent control streams |
size |
|
|
|
Count of transfers that timed out on this node |
count |
|
|
|
Number of bytes requested from the agent for a given transfer |
distribution (bytes) |
|
|
|
Number of active transfers on this node |
count |
|
|
|
Counts the number of jobs submitted without an attachment |
count |
|
|
|
Counts the number of jobs submitted with one or more attachments |
count |
|
|
|
Time taken to download a file via HTTP |
nanoseconds |
|
|
|
Time taken to retrieve the last modification time for an HTTP document |
nanoseconds |
|
|
|
Time taken to upload a file via HTTP |
nanoseconds |
|
|
|
Time taken to download a file from S3 |
nanoseconds |
|
|
|
Counts the number of times an S3 URL fails strict validation but is allowed through anyway |
count |
|
|
|
Time taken to obtain S3 file metadata (modification time) |
nanoseconds |
|
|
|
Time taken to upload a local file to S3 |
nanoseconds |
|
|
|
Time taken to serve a file |
nanoseconds |
|
|
|
Time taken for each health indicator to report its status |
nanoseconds |
|
|
|
Number of jobs currently active locally |
amount |
|
|
|
Current number of agent jobs whose agent is not connected to any node. |
count |
|
|
|
Counter of jobs terminated because the agent disappeared for too long |
count |
|
|
|
Time taken by the loaded script to select a cluster among the ones passed as input |
nanoseconds |
|
|
|
Time taken to initialize the job database record and resolve applications, command, cluster, based on criteria and cluster selection strategy. |
nanoseconds |
|
|
|
Time taken to perform post-job-completion finalization such as folder cleanup, archival and email notification. |
nanoseconds |
|
|
|
Counts various kinds of nonfatal errors encountered (email, archival, cleanup, …). A single request may increment for multiple errors. |
count |
|
|
|
Counts jobs marked to execute in agent mode (V4) and embedded mode (V3) |
count |
|
|
|
File cache hit ratio |
ratio |
|
|
|
File cache loading exception ratio |
ratio |
|
|
|
File cache miss ratio |
ratio |
|
|
|
Counts the number of jobs that completed (successfully or not) |
count |
|
|
|
Total amount of memory allocated to local jobs (according to job request) |
Megabytes |
|
|
|
Count the number of completed job notifications |
count |
|
|
|
Count the number of job transitions notifications |
count |
|
|
|
Time taken to set up individual applications (creating folders, staging dependencies and configurations) |
nanoseconds |
|
|
|
Time taken to stage all applications that a job depends on |
nanoseconds |
|
|
|
Time taken to set up cluster a job runs on (creating folders, staging dependencies and configurations) |
nanoseconds |
|
|
|
Time taken to set up command a job runs (creating folders, staging dependencies and configurations) |
nanoseconds |
|
|
|
Time taken to set up job environment (creating folder structure, shell environment script) |
nanoseconds |
|
|
|
Time taken to set up run script section that deals with child process termination |
nanoseconds |
|
|
|
Time taken to complete job launch |
nanoseconds |
|
|
|
Time taken to set up job-specific environment (creating folders, staging attachments, dependencies) |
nanoseconds |
|
|
|
Counts the number of jobs killed for exceeding the maximum allowed standard error limit |
count |
|
|
|
Counts the number of jobs killed for exceeding the maximum allowed standard output limit |
count |
|
|
|
Time taken to write a file with details about failure to launch a job |
nanoseconds |
|
|
|
Time taken to create a job working directory (includes failures to create) |
nanoseconds |
|
|
|
Time taken to create the job run script |
nanoseconds |
|
|
|
Time taken to execute the job workflow tasks |
nanoseconds |
|
|
|
Time taken to submit a new job (create workspace and scripts, register in database and kick off) |
nanoseconds |
|
|
|
Time taken to publish the event that announces a job has started |
nanoseconds |
|
|
|
Time taken to persist information about job execution |
nanoseconds |
|
|
|
Time taken to persist the job runtime information in the database |
nanoseconds |
|
|
|
Count of jobs rejected by the server because the user is exceeding the maximum number of running jobs |
count |
|
|
|
Time taken to initialize the job environment (working directory, script) and fork the children |
nanoseconds |
|
|
|
Counts the successful checks made on locally running jobs |
count |
|
|
|
Counts the number of jobs killed for exceeding the maximum allowed run time |
count |
|
|
|
Counts the number of times cancellation of a job's asynchronous task was requested and failed (failure to cancel may be due to the task no longer being running) |
count |
|
|
|
Counts the number of times a Genie node failed to resume monitoring a local job process after a server restart |
count |
|
|
|
Counts the number of times an exception was raised while trying to check on a locally running job |
count |
|
|
|
Counts the number of notifications published to SNS |
count |
|
|
|
Time taken to load (download, read, compile) a given script |
nanoseconds |
|
|
|
Time taken to evaluate a given script (if previously compiled successfully) |
nanoseconds |
|
|
|
Counter for calls to the 'handshake' protocol of the Genie Agent Job Service |
count |
|
|
|
Time taken to generate all the permutations for cluster criteria between the command options and the job request |
nanoseconds |
|
|
|
Time taken to completely resolve the job |
nanoseconds |
|
|
|
Time taken to retrieve applications information for this task |
nanoseconds |
|
|
|
Counter for cluster selector algorithms invocations |
count |
|
|
|
Time taken to resolve the cluster to use for a job |
nanoseconds |
|
|
|
Time taken to resolve the command to use for a job |
nanoseconds |
|
|
|
The time taken to fetch the metadata of an archived job if it isn’t already cached |
nanoseconds |
|
|
|
Counts the number of agent connections the leader reaped due to the host being unhealthy |
count |
|
|
|
Counts the number of times the leader retrieved the health status of a remote node and one of the (non-ignored) indicators had a status different from UP |
count |
|
|
|
Counts the number of times the leader retrieved the health status of a remote node and failed to parse the response |
count |
|
|
|
Number of jobs marked as "lost" due to a consistent failure to contact the Genie node hosting them |
count |
|
|
|
Number of Genie nodes that the leader has currently marked unhealthy |
Current amount |
|
|
|
Counts the number of times the leader failed to retrieve the health status of a remote node (example: socket timeout). |
count |
|
|
|
Time taken to delete application records from the database |
nanoseconds |
|
|
|
Time taken to delete cluster records from the database |
nanoseconds |
|
|
|
Time taken to deactivate command records in the database |
nanoseconds |
|
|
|
Time taken to delete command records from the database |
nanoseconds |
|
|
|
Time taken to delete file records from the database |
nanoseconds |
|
|
|
Time taken to delete tag records from the database |
nanoseconds |
|
|
|
Time taken to cleanup database records for jobs that executed over a given amount of time in the past |
nanoseconds |
|
|
|
Number of deleted application records purged during the last database cleanup pass |
amount |
|
|
|
Number of command records set to INACTIVE during the last database cleanup pass |
amount |
|
|
|
Number of terminated cluster records purged during the last database cleanup pass |
amount |
|
|
|
Number of deleted command records purged during the last database cleanup pass |
amount |
|
|
|
Number of unused file references purged during the last database cleanup pass |
amount |
|
|
|
Number of job records purged during the last database cleanup pass |
amount |
|
|
|
Number of unused tag records purged during the last database cleanup pass |
amount |
|
|
|
Number of job folders deleted during the last cleanup pass |
amount |
|
|
|
Number of failures deleting job folders during the last cleanup pass |
amount |
|
|
|
Counts the number of times a local job folder could not be deleted |
count |
|
|
|
Counts the number of times a local job folder is encountered during cleanup and the corresponding job record in the database cannot be found |
count |
|
|
|
Number of active jobs tagged with owner user. |
count |
|
|
|
Amount of memory used by active jobs tagged with owner user. |
Megabytes |
|
|
|
Number of distinct users with at least one job in RUNNING state. |
count |
|
|
|
Counts exceptions returned to the user |
count |
|
|
(*) Source may add additional tags on a case-by-case basis