Netflix is committed to open source. Netflix both leverages and provides open source technology focused on providing the leading Internet television network. Our technology focuses on providing immersive experiences across all internet-connected screens. Netflix's deployment technology allows for continuous build and integration into our worldwide deployments serving members in over 50 countries. Our focus on reliability defined the bar for cloud-based elastic deployments with several layers of failover. Netflix also provides the technology to operate services responsibly with operational insight, peak performance, and security. We provide technologies for data (persistent & semi-persistent) that serve the real-time load to our 62 million members and power the big data analytics that allow us to make informed decisions on how to improve our service. To learn more, jump into any of the functional areas below.
Data is invaluable in making Netflix such an exceptional service for our customers. Behind the scenes, we have a rich ecosystem of (big) data technologies facilitating our algorithms and analytics. We use and contribute to broadly-adopted open source technologies including Hadoop, Hive, Pig, Parquet, Presto, and Spark. In addition, we’ve developed and contributed our own tools and services, which have further elevated our data platform. Genie is a powerful, REST-based abstraction to our various data processing frameworks, notably Hadoop. Inviso provides detailed insights into the performance of our Hadoop jobs and clusters. Lipstick shows the workflow of Pig jobs in a clear, visual fashion. And Aegisthus enables the bulk extraction of data out of Cassandra for downstream analytic processing.
Netflix has open sourced many of our Gradle plugins under the name Nebula. Nebula started off as a set of strong opinions to make Gradle simple to use for our developers. But we quickly learned that we could use the same assumptions on our open source projects and on other Gradle plugins to make them easy to build, test and deploy. By standardizing plugin development, we've lowered the barrier to generating them, allowing us to keep our build modular and composable.
We require additional tools to take these builds from the developers' desks to AWS. There are tens of thousands of instances running Netflix. Every one of these runs on top of an image created by our open source tool Aminator. Once packaged, these AMIs are deployed to AWS using our cloud deployment and management tool, Asgard. Asgard is well recognized outside of Netflix, and was even used by President Obama's team in the 2012 election.
The cloud platform is the foundation and technology stack for the majority of the services within Netflix. It consists of cloud services, application libraries and application containers. Specifically, the platform provides service discovery through Eureka, distributed configuration through Archaius, and resilient and intelligent inter-process and service communication through Ribbon. To provide reliability beyond single service calls, Hystrix provides latency isolation and fault tolerance at runtime. These libraries and services can be used with any JVM-based container.
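The kind of fault isolation attributed to Hystrix above is commonly built on the circuit breaker pattern. The following is a minimal, language-agnostic sketch of that pattern in Python, not Hystrix's API or implementation; the class name, parameters, and thresholds are all hypothetical and chosen only for illustration.

```python
import time


class CircuitBreaker:
    """Illustrative circuit breaker (hypothetical names, not the Hystrix API)."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the dependency and use the fallback until the
        # reset timeout has elapsed.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()
            # Timeout elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            # Open the circuit on too many failures, or re-open it if the
            # trial call failed.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        # A success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote call, for example `breaker.call(fetch_recommendations, lambda: [])`, where `fetch_recommendations` is a hypothetical function. Once the dependency has failed repeatedly, callers get the fallback immediately instead of waiting on a service that is already struggling.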
The platform provides JVM container services through Karyon and Governator, and support for non-JVM runtimes via the Prana sidecar. While Prana provides proxy capabilities within an instance, Zuul (which integrates Hystrix, Eureka, and Ribbon as part of its IPC capabilities) provides dynamically scriptable proxying at the edge of the cloud deployment.
The platform works well within the EC2 cloud utilizing the Amazon autoscaler. For container applications and batch jobs running on Apache Mesos, Fenzo is a scheduler that provides advanced scheduling and resource management for cloud native frameworks. Fenzo provides plugin implementations for bin packing and cluster autoscaling, and custom scheduling optimizations can be implemented through user-defined plugins.
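To make the bin packing idea concrete, here is a small Python sketch of a packing-oriented placement rule: among the hosts that can fit a task, prefer the one that ends up fullest. It is an illustration under simplifying assumptions (a single CPU resource, every host has a positive CPU total) and does not use Fenzo's plugin interfaces; the function names and the host data layout are hypothetical.

```python
def bin_packing_fitness(used_cpus, total_cpus, requested_cpus):
    """Score a host for a task; higher means the host ends up fuller.

    Returns None when the task does not fit. Assumes total_cpus > 0.
    """
    if used_cpus + requested_cpus > total_cpus:
        return None
    return (used_cpus + requested_cpus) / total_cpus


def pick_host(hosts, requested_cpus):
    """Pick the best-scoring host.

    hosts maps a host name to a (used_cpus, total_cpus) tuple.
    Returns the chosen host name, or None if no host can fit the task.
    """
    best_name, best_score = None, -1.0
    for name, (used, total) in hosts.items():
        score = bin_packing_fitness(used, total, requested_cpus)
        if score is not None and score > best_score:
            best_name, best_score = name, score
    return best_name
```

Packing tasks tightly onto fewer hosts leaves other hosts idle, which is what makes it practical for a cluster autoscaler to scale them down.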
Handling over a trillion data operations per day requires an interesting mix of “off the shelf OSS” and in-house projects. No single data technology can meet every use case or satisfy every latency requirement. Our needs range from non-durable in-memory stores like Memcached and Redis, to searchable datastores such as Elastic and durable must-never-go-down datastores like Cassandra and MySQL.
Our Cloud usage and the scale at which we consume these technologies have required us to build tools and services that enhance the datastores we use. We’ve created the sidecars Raigad and Priam to help with the deployment, management and backup/recovery of our hundreds of Elastic and Cassandra clusters. We’ve created EVCache and Dynomite to use Memcached and Redis at scale. We’ve even developed the Astyanax and Dyno client libraries to better consume datastores in the Cloud.
Telemetry and metrics play a critical role in the operations of any company, and with more than a billion metrics per minute flowing into Atlas, our time-series telemetry platform, Netflix is no exception. Operational Insight, however, is a broader family of products at Netflix that also includes the ability to understand the current components of our cloud ecosystem via Edda, and the easy integration of Java application code with Atlas via the Spectator library.
Effective performance instrumentation allows engineers to drill down quickly into a massive volume of metrics and make critical decisions efficiently. Vector exposes high-resolution host-level metrics with minimal overhead.
Outside of the operational domain, cost management and visibility into where our resources are used in the cloud is a multi-million-dollar question; we've built Ice to expose ongoing cost and cloud utilization trends to engineers so they can better understand the footprint of their applications in our environment.
Finally, to validate reliability, we have the Simian Army, which tests how our instances hold up against random failures.
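One way to picture random failure testing is a job that, for each group of instances, occasionally picks one member at random to take out of service. The Python sketch below shows only that selection step. It is not the Simian Army's implementation; the function name, the `probability` parameter, and the data layout are assumptions made for illustration.

```python
import random


def pick_victims(groups, probability, rng):
    """Choose at most one instance per group to fail on this run.

    groups maps a group name to a list of instance ids.
    probability is the chance, per group, that a victim is chosen.
    rng is a random.Random instance, passed in so runs can be reproduced.
    Returns a dict mapping group name to the chosen instance id.
    """
    victims = {}
    for group, instances in groups.items():
        # Skip empty groups; otherwise roll the dice for this group.
        if instances and rng.random() < probability:
            victims[group] = rng.choice(instances)
    return victims
```

A real tool would then terminate or degrade the chosen instances and watch whether the service stays healthy. Here the caller decides what "failure" means.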
Security is an increasingly important area for organizations of all types and sizes, and Netflix is happy to contribute a variety of security tools and solutions to the open source community. Our security-related open source efforts generally fall into one of two categories: operational tools and systems that make security teams more efficient and effective when securing large and dynamic environments, and security infrastructure components that provide critical security services for modern distributed systems.
On the operational side, Security Monkey helps monitor and secure large AWS-based environments, and Scumblr leverages Internet-wide targeted searches to surface specific security issues for investigation. On the security infrastructure side, MSL is an extensible and flexible secure messaging protocol that addresses a number of secure communications use cases and requirements.