1. Introduction
Genie is a complicated service, and it can be hard to understand the value it brings to a data platform without seeing it in action. This set of demo steps shows how Genie fits into a data platform and how it can help both administrators and users.
For high level concept documentation please see the website.
For high level information and installation instructions please see the Reference Guide.
For documentation of the REST API for this version of Genie please see the API Guide.
2. Info
2.1. Prerequisites
Memory
Probably at least 6 GB
Disk Space
About 5.5 GB for 5 images
Available Ports on your local machine
8080 (Genie)
8088, 19888, 50070, 50075, 8042 (YARN Prod Cluster)
8089, 19889, 50071, 50076, 8043 (YARN Test Cluster)
9090 (Trino Cluster)
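Before starting, it can help to confirm that nothing is already listening on these ports. The following is a minimal sketch and not part of the demo itself: it assumes the port list above and simply tries to open a TCP connection to each port on localhost. The names `REQUIRED_PORTS`, `port_in_use` and `busy_ports` are made up for this illustration.

```python
import socket

# Local ports the demo expects to be free, copied from the list above.
REQUIRED_PORTS = {
    "Genie": [8080],
    "YARN Prod Cluster": [8088, 19888, 50070, 50075, 8042],
    "YARN Test Cluster": [8089, 19889, 50071, 50076, 8043],
    "Trino Cluster": [9090],
}


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(required=REQUIRED_PORTS):
    """Map each service name to the subset of its ports already taken."""
    busy = {}
    for service, ports in required.items():
        taken = [p for p in ports if port_in_use(p)]
        if taken:
            busy[service] = taken
    return busy


if __name__ == "__main__":
    conflicts = busy_ports()
    if conflicts:
        for service, ports in conflicts.items():
            print(f"{service}: ports already in use: {ports}")
    else:
        print("All demo ports appear to be free")
```

If a port is reported as in use, stop whatever is bound to it before bringing up the demo containers.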
2.2. Development Environment
For reference, here are the specs of the machine this demo has been tested on:
Mid-2018 MacBook Pro
macOS Catalina 10.15.5
2.9 GHz 6-Core Intel Core i9
32 GB 2400 MHz DDR4
Docker Desktop
Docker Engine 19.03.8
Docker Compose 1.25.5
6 CPUs
1 GB swap
2.3. Caveats
Since all of this runs locally on one machine it can be slow, much slower than you’d expect production-level systems to run.
Networking within the Hadoop UI is unreliable because of how DNS works between the containers. If you click a link in the UI and it doesn’t work, try swapping in localhost for the hostname.
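The hostname swap can be done by hand in the address bar, but as an illustration here is a small sketch of the same rewrite in Python. The function name `to_localhost` and the example hostname are placeholders and do not come from the demo. The optional `port` argument is an assumption for cases where the port published on your machine differs from the one in the link.

```python
from urllib.parse import urlsplit, urlunsplit


def to_localhost(url, port=None):
    """Rewrite url so it points at localhost, keeping path and query.

    If port is given it replaces the original port, which is handy when the
    container port and the port published on your machine differ.
    """
    parts = urlsplit(url)
    new_port = port if port is not None else parts.port
    netloc = "localhost" if new_port is None else f"localhost:{new_port}"
    return urlunsplit(parts._replace(netloc=netloc))


# "some-container-host" is a placeholder, not a hostname from the demo.
print(to_localhost("http://some-container-host:8088/cluster/apps"))
# prints http://localhost:8088/cluster/apps
```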
2.4. Port Usages
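Genie
Endpoint | URL |
UI | http://localhost:8080 |
Actuator | |
Hadoop Interfaces
UI | Prod URL | Test URL |
Resource Manager | http://localhost:8088 | http://localhost:8089 |
Job History Server | http://localhost:19888 | http://localhost:19889 |
NameNode | http://localhost:50070 | http://localhost:50071 |
DataNode | http://localhost:50075 | http://localhost:50076 |
Container Logs | http://localhost:8042 | http://localhost:8043 |
Trino
Endpoint | URL |
Web UI | http://localhost:9090 |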
2.5. Scripts
Script Name | Invocation | Purpose |
Init | | Initialize the configuration data in the Genie system for the rest of the demo |
Move Tags | | Move the production tag sched:sla from the prod cluster to the test cluster |
Reset Tags | | Move the production tag sched:sla back to the prod cluster |
Job Scripts
Job | Invocation | Action |
Hadoop | | Runs grep against input directory in HDFS |
HDFS | | Runs a |
Spark Shell | | Simply prints the Spark Shell help output to stdout |
Spark Submit 2.4.x | | Runs the SparkPi example for Spark 2.4.x with input of 10. Results stored in stdout |
Spark Submit 3.0.x | | Runs the SparkPi example for Spark 3.0.x with input of 10. Results stored in stdout |
Trino | | Sends query ( |
YARN | | Lists all yarn applications from the resource manager into stdout |
3. Demo Steps
Open a terminal
Download the Docker Compose file
Save the Docker Compose file as docker-compose.yml somewhere on your machine
Go to your working directory
Wherever you downloaded the Docker Compose file to
cd YourWorkDir
Start the demo containers
docker-compose up -d
The first time you run this it could take quite a while, as it has to download 5 large images
This uses Docker Compose to bring up 6 containers
Genie app server
Image from the official Genie build, which runs the Genie app server
Maps port 8080 for the Genie UI
Apache web server
Extension of the Apache image that includes files used during the demo, which Genie will download
Client node
Simulates a client node for Genie and includes several Python scripts to configure and run jobs on Genie
Two YARN clusters
Simulates having two clusters available and registered with Genie, with roles as a production and a test cluster
See the Hadoop Interfaces table for the list of available ports
Single node Trino cluster
Web UI bound to local port 9090
Wait for all services to start
Verify that the Genie UI and both Resource Manager UIs are available via your browser
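Instead of refreshing the browser, the wait can be scripted. The following is a rough sketch under stated assumptions: the URLs are built from the ports listed in the prerequisites (8080 for Genie, 8088 and 8089 for the two Resource Managers), and the names `DEMO_UIS`, `http_ok` and `wait_for_all` are invented for this example. Only the Python standard library is used.

```python
import http.client
import time
import urllib.error
import urllib.request

# Assumed URLs, built from the ports listed in the prerequisites.
DEMO_UIS = {
    "Genie UI": "http://localhost:8080",
    "Prod Resource Manager": "http://localhost:8088",
    "Test Resource Manager": "http://localhost:8089",
}


def http_ok(url, timeout=3):
    """Return True if the URL answers with a non-error HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def wait_for_all(urls, probe=http_ok, timeout=300, interval=5, sleep=time.sleep):
    """Poll until every URL passes probe; return the names still down at timeout."""
    pending = dict(urls)
    deadline = time.monotonic() + timeout
    while pending:
        # Keep only the entries that still fail the probe.
        pending = {name: url for name, url in pending.items() if not probe(url)}
        if not pending or time.monotonic() >= deadline:
            break
        sleep(interval)
    return sorted(pending)
```

Calling `wait_for_all(DEMO_UIS)` returns an empty list once all three UIs respond, or the names of the ones that are still unreachable after the timeout.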
Check out the Genie UI
In a browser, navigate to the Genie UI and notice that nothing is listed yet
The different views are available by clicking on the tabs in the top left of the UI
Log in to the client container
From terminal
docker exec -it genie_demo_client_4.3.21 /bin/bash
This should put you into a bash shell within the running container
Initialize the System
Back in the terminal, run the Init script (see the Scripts table) to initialize the configurations for the two clusters (prod and test), 5 commands (hadoop, hdfs, yarn, spark-submit, spark-shell) and two applications (hadoop, spark)
Feel free to inspect the contents of this script to see what is happening
Verify Configurations Loaded
In the browser, browse the Genie UI again and verify that the tabs now have data in them
Run some jobs
See the Job Scripts table for available commands
For example:
./run_hadoop_job.py test
./run_yarn_job.py test
./run_hdfs_job.py test
./run_spark_submit_job.py sla 2.1.3
Pass sla instead of test to run the jobs against the Prod cluster
If any of the Docker containers crash, you may need to increase the default memory available in the Docker preferences. The current default for a fresh installation is 2 GB, which is not sufficient for this demo. Use docker stats to verify the limit is 4 GB or higher.
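Reading the limit off the docker stats screen works fine, but the check can also be automated. This is a minimal sketch, assuming docker stats is called with `--no-stream` and a `--format` template that prints the container name and the `MemUsage` column (for example `512MiB / 5.8GiB`) separated by a tab. The helper names and the sample values in the comments are illustrative only.

```python
import re
import subprocess

# Conversion factors from the binary units docker prints to GiB.
_UNITS_IN_GIB = {
    "B": 1 / 1024**3,
    "KiB": 1 / 1024**2,
    "MiB": 1 / 1024,
    "GiB": 1.0,
    "TiB": 1024.0,
}


def limit_gib(mem_usage):
    """Extract the limit, in GiB, from a 'used / limit' string."""
    limit = mem_usage.split("/")[1].strip()
    match = re.fullmatch(r"([0-9.]+)\s*([A-Za-z]+)", limit)
    if match is None:
        raise ValueError(f"unrecognized memory value: {limit!r}")
    value, unit = match.groups()
    return float(value) * _UNITS_IN_GIB[unit]


def containers_below(stats_output, minimum_gib=4.0):
    """Return {container name: limit in GiB} for limits under minimum_gib."""
    low = {}
    for line in stats_output.splitlines():
        if not line.strip():
            continue
        name, mem_usage = line.split("\t", 1)
        limit = limit_gib(mem_usage)
        if limit < minimum_gib:
            low[name] = limit
    return low


def read_docker_stats():
    """Take one snapshot of docker stats as tab-separated name and memory usage."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.Name}}\t{{.MemUsage}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

With the demo running, `containers_below(read_docker_stats())` returns an empty dictionary when every container sees at least 4 GiB.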
For each of these jobs you can see the status, output and other information via the UIs
In the Jobs tab of the Genie UI you can see all the job history
Clicking any row will expand that job information and provide more links
Clicking the folder icon will bring you to the working directory for that job
Go to the respective cluster Resource Manager UIs and verify the jobs ran on their respective clusters
Move load from prod to test
Let’s say there is something wrong with the production cluster. You don’t want to interfere with users, but you need to fix the prod cluster. Let’s switch the load over to the test cluster temporarily using Genie
In the terminal, run the Move Tags script (see the Scripts table) to switch the prod tag sched:sla from the Prod to the Test cluster
Verify in the Clusters tab of the Genie UI that the sched:sla tag only appears on the GenieDemoTest cluster
Run more of the available jobs
Verify that all jobs went to the GenieDemoTest cluster and none went to the GenieDemoProd cluster, regardless of which env you passed into the job scripts above
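To see why moving a single tag reroutes every job, here is a toy model of tag-based cluster selection. It is a simplified sketch for illustration and not Genie's actual selection logic: a cluster is eligible when its tags include every tag a job asks for. The cluster names and the sched:sla and type:yarn tags come from this guide, while the sched:test tag and the function names are assumptions.

```python
def select_cluster(clusters, criteria):
    """Return the names of clusters whose tags include every tag in criteria."""
    wanted = set(criteria)
    return sorted(name for name, tags in clusters.items() if wanted <= tags)


def move_tag(clusters, tag, source, target):
    """Return a copy of clusters with tag removed from source and added to target."""
    updated = {name: set(tags) for name, tags in clusters.items()}
    updated[source].discard(tag)
    updated[target].add(tag)
    return updated


# Illustrative tag sets; sched:test is an assumed tag for the test cluster.
clusters = {
    "GenieDemoProd": {"type:yarn", "sched:sla"},
    "GenieDemoTest": {"type:yarn", "sched:test"},
}

print(select_cluster(clusters, ["type:yarn", "sched:sla"]))
# prints ['GenieDemoProd']

after_move = move_tag(clusters, "sched:sla", "GenieDemoProd", "GenieDemoTest")
print(select_cluster(after_move, ["type:yarn", "sched:sla"]))
# prints ['GenieDemoTest']
print(select_cluster(after_move, ["type:yarn", "sched:test"]))
# prints ['GenieDemoTest']
```

Users keep submitting jobs with the same criteria; only the cluster metadata changes, which is why no client needs to be reconfigured.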
Reset the system
You’ve resolved the issues with your production cluster. Run the Reset Tags script (see the Scripts table) to move the sched:sla tag back
Verify in the Clusters tab of the Genie UI that the sched:sla tag only appears on the GenieDemoProd cluster
Run some jobs
Verify jobs are again running on the Prod or Test cluster based on the environment you pass
Explore the scripts
Look through the scripts to get a sense of what is submitted to Genie
Log out of the container
Log in to the main Genie app container (which contains the agent CLI)
From terminal
docker exec -it genie_demo_app_4.3.21 /bin/bash
Verify you can launch the agent
java -jar /usr/local/bin/genie-agent.jar help
Verify the agent can connect to the local Genie server
java -jar /usr/local/bin/genie-agent.jar ping --serverHost localhost --serverPort 9090
Launch a Genie job, similar to the ones above
java -jar /usr/local/bin/genie-agent.jar exec --serverHost localhost --serverPort 9090 --jobName 'Genie Demo CLI Trino Job' --commandCriterion 'TAGS=type:trino' --clusterCriterion 'TAGS=sched:adhoc,type:trino' -- --execute 'select * from tpcds.sf1.item limit 100;'
java -jar /usr/local/bin/genie-agent.jar exec --serverHost localhost --serverPort 9090 --jobName 'Genie Demo CLI Spark Shell Interactive Job' --commandCriterion 'TAGS=type:spark-shell' --clusterCriterion 'TAGS=sched:sla,type:yarn' --interactive
This starts an interactive Spark shell. Hit Ctrl-D to exit gracefully
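The two agent commands above share most of their options. As an illustration, the sketch below assembles the same argument lists in Python. It uses only the flags that appear in those commands; the function name `agent_exec_command` is made up, and the reading of the bare `--` as the separator between agent options and the job's own arguments is inferred from the Trino example.

```python
import shlex

AGENT_JAR = "/usr/local/bin/genie-agent.jar"


def agent_exec_command(job_name, command_tags, cluster_tags, job_args=(),
                       interactive=False, host="localhost", port=9090):
    """Build the argv list for a genie-agent.jar exec call like the ones above."""
    argv = [
        "java", "-jar", AGENT_JAR, "exec",
        "--serverHost", host,
        "--serverPort", str(port),
        "--jobName", job_name,
        "--commandCriterion", "TAGS=" + ",".join(command_tags),
        "--clusterCriterion", "TAGS=" + ",".join(cluster_tags),
    ]
    if interactive:
        argv.append("--interactive")
    if job_args:
        # A bare "--" comes before the job's own arguments, as in the Trino example.
        argv += ["--", *job_args]
    return argv


trino = agent_exec_command(
    "Genie Demo CLI Trino Job",
    command_tags=["type:trino"],
    cluster_tags=["sched:adhoc", "type:trino"],
    job_args=["--execute", "select * from tpcds.sf1.item limit 100;"],
)
spark_shell = agent_exec_command(
    "Genie Demo CLI Spark Shell Interactive Job",
    command_tags=["type:spark-shell"],
    cluster_tags=["sched:sla", "type:yarn"],
    interactive=True,
)

# shlex.join (Python 3.8+) quotes each argument for safe pasting into a shell.
print(shlex.join(trino))
print(shlex.join(spark_shell))
```

Building the list first and quoting it at the end avoids the quoting mistakes that are easy to make when editing long command lines by hand.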
In the Genie UI, explore the two jobs
Notice how the first one (non-interactive) dumped the query results in a stdout file
Notice how the second one (interactive) does not create output files, since the streams are presented directly in the shell
Log out of the container
Once you’re done trying everything out you can shut down the demo
docker-compose down
This will stop and remove all the containers from the demo. The images will remain on disk, and if you run the demo again it will start up much faster since nothing needs to be downloaded or built.
4. Feedback
If you have any feedback about this demo feel free to reach out to the Genie team via any of the communication methods listed in the Contact page.