Although it’s quite easy to start a journey with containers, finding the best way to handle them can be much more challenging – especially if you need to take care of scheduling, deploying and managing containers in an environment more complex than a single instance.
At Apptension we faced the same dilemma while implementing our Continuous Delivery stack from scratch. Eventually, we settled on two solutions that we use in our day-to-day work for clients around the globe.
We’re about to guide you through AWS Elastic Beanstalk and Google Kubernetes Engine, the two services we use for handling containers in production. Why both? Using both of them keeps us flexible and helps us avoid getting trapped in a vendor lock-in situation. This article should help you decide which one meets your needs best.
Understanding AWS Elastic Beanstalk and Google Kubernetes Engine
Both solutions come with advantages stemming from their very structure.
AWS Elastic Beanstalk (or AWS EB) requires less knowledge and experience to start playing with. Google Kubernetes Engine (here referred to as GKE) is a mature solution that is attracting more and more container-focused companies.
Before we describe some significant differences between these two services, let’s talk about their general characteristics.
AWS Elastic Beanstalk
That was the obvious way to go when Apptension started to talk containers and we left behind the VMs and all the Ansible work we did back then. (We really appreciate the way Ansible helped us structure our deployments, though.) We’d been using AWS for a long time, so it was natural to look there for a native container solution. We started out with AWS Elastic Container Service (AWS ECS) and later evolved into AWS Elastic Beanstalk. We’ve been using the latter ever since.
AWS EB is based on the aforementioned AWS ECS. It applies a layer of abstraction and adds some convenient wrapping around the Elastic Container Service’s features. When you start working with ECS directly, you’re required to understand everything, starting from the VPC it’s going to be placed in. Even though we use Terraform for every AWS part we create, it was hard to connect all the pieces together just to get a 200 status code on the environment’s URL.
We managed to make it happen, but Elastic Beanstalk tempted us with a better UI and easier environment management. It also has a setup wizard for beginners.
Managing environments with Elastic Beanstalk
Elastic Beanstalk aggregates environments in applications, and every environment must be included in exactly one application.
There’s no specific configuration for the application itself. It can easily be treated as a logical grouping of environments. The configuration takes place in the environment setup. Here we’re assuming the only environment type we’re talking about is the Multi Container Docker setup (but there are more).
Every environment has to be placed in a VPC, whether in public or private subnets. While setting up the environment, you can choose which instance type to use, and that’s the most important thing to notice: Elastic Beanstalk environments consist of whole instances.
There’s a configuration file called Dockerrun.aws.json which defines the containers and the way they’re connected to each other. It’s just a JSON file, but some specific fields are required. The configuration describes what gets deployed on every instance of the environment.
If your environment is scalable and has, let’s say, 5 instances, the same configuration will be deployed on all 5 of them, resulting in a small “monolith” on each. If your configuration file contains nginx and backend containers, that pair will be deployed on each instance. We figured it would be convenient to define such small monoliths.
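To give a feel for the format, a minimal Dockerrun.aws.json for such an nginx + backend pair might look like the sketch below. The image names and memory limits are placeholders, not our actual configuration:

```json
{
  "AWSEBDockerrunVersion": 2,
  "containerDefinitions": [
    {
      "name": "nginx",
      "image": "nginx:alpine",
      "essential": true,
      "memory": 128,
      "portMappings": [
        { "hostPort": 80, "containerPort": 80 }
      ],
      "links": ["backend"]
    },
    {
      "name": "backend",
      "image": "example/backend:latest",
      "essential": true,
      "memory": 512
    }
  ]
}
```

Version 2 of the file is the one used by Multi Container Docker environments; the `links` entry is what lets the nginx container reach the backend by name.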
A configuration based on different environments connected over the network may complicate things, but it’s often the better way to go. Sticking with the previous example, the monolith containing the nginx and backend containers could be divided: nginx would act as a reverse proxy sitting behind the Load Balancer, and the backend, residing in private subnets, would handle the requests. The backend then connects to the RDS instance and everything works fine.
Google Kubernetes Engine
GKE requires a more complex approach, but it comes with specific advantages you’ll notice when migrating from AWS EB. Google Kubernetes Engine builds on Kubernetes itself, “an open-source system for automating deployment, scaling, and management of containerized applications”.
It’s not a framework, it’s not a library – it’s a whole platform. It can be used almost everywhere and it’s completely infrastructure independent. You can use it natively as Google Kubernetes Engine (formerly Google Container Engine), use the provided “kubeadm” to set up a cluster on bare metal, or “kops”, which is the go-to solution for AWS at the moment. (AWS is currently working on EKS, a native solution similar to GKE; at the time of writing, however, it’s in a closed preview.) You can even set everything up by hand. It will work as long as there’s a network connection between the instances involved in the cluster.
“So why have you moved to Google and GKE when there’s kops for AWS?”, you may ask. Well, that’s what we were thinking back then. But things turned out to be a bit more complicated than we expected.
Truth is, GKE is a no-hassle solution. You can create everything with the setup wizard in the Google Cloud Platform admin panel or with the provided SDK, gcloud. Getting information about the cluster itself is as simple as looking at the Google Cloud admin panel.
Since it’s a native solution, there’s not much to worry about when the very first Kubernetes cluster is being created. If there’s a need for advanced configuration, it’s right there too.
Kops, on the other hand, is a way of running Kubernetes on AWS. It’s not a native solution (hence no official support), and the master instances have to be created, deployed and managed along with the nodes. The master runs the API server which, combined with the other control-plane parts, takes care of the whole cluster – and you may not want to be responsible for that. GKE takes care of it by default. We had to choose, and we chose GKE for its simplicity: it’s simply the safer way to go.
Oh, there’s one more thing. We really wanted to add another big cloud provider to our stack. But we’re still waiting to get our hands on AWS EKS and confront it with GKE.
Managing environments and configuration
Kubernetes itself doesn’t work at the instance level the way AWS EB does. Instances form the cluster Kubernetes operates on, but the smallest entity is a Pod, representing one or more tightly coupled containers. Going up the hierarchy of objects, there’s a ReplicaSet, consisting of Pods and responsible for holding the current Pod configuration and its scaling policy. ReplicaSets can be standalone objects, but nowadays a Deployment object creates a new ReplicaSet with every configuration update. It can then manage scaling, perform rolling updates and offer simple rollbacks.
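As a rough sketch of that hierarchy, a Deployment keeping three replicas of a hypothetical backend Pod running could look like this (the names, labels and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 3              # the ReplicaSet created by this Deployment keeps 3 Pods running
  selector:
    matchLabels:
      app: backend
  template:                # the Pod template the ReplicaSet stamps out
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          image: example/backend:latest
          ports:
            - containerPort: 8000
```

Updating the `image` field creates a new ReplicaSet and triggers a rolling update; `kubectl rollout undo deployment/backend` is the rollback side of the same mechanism.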
Everything is isolated in the cluster by default. If anything needs to be exposed, it’s done with a Service object, which comes in three types: ClusterIP (exposed internally at the cluster level), NodePort (exposed on a specific port number on every node of the cluster), and finally the LoadBalancer type, which stands for a real Load Balancer if the cluster is backed by a cloud. It’s worth noting that with Kubernetes deployed on AWS via kops, setting up a LoadBalancer-type Service is a bit more complicated than in GKE – one of the advantages we took into consideration when choosing the cloud provider for our Kubernetes stack.
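A minimal Service sketch exposing that hypothetical backend through a cloud Load Balancer might look like this (the selector and ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: LoadBalancer   # ClusterIP (the default) and NodePort are the other two types
  selector:
    app: backend       # routes traffic to Pods labeled app=backend
  ports:
    - port: 80         # port exposed by the Load Balancer
      targetPort: 8000 # port the container actually listens on
```

On GKE, applying this manifest is enough for a Google Cloud Load Balancer with an external IP to be provisioned automatically.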
The different approach to environment management these two services take is where all the advantages and disadvantages of both really arise. Let’s describe a few of them below.
Load Balancers and service exposition
An AWS Elastic Beanstalk environment can be represented by just one directly accessible instance – that’s the Single instance type – but the only type we really consider and use is the Scalable one, where everything is placed behind a provisioned Load Balancer (either a Classic or an Application Load Balancer). So on one side there’s a Load Balancer responsible for proxying requests to instances, health checks and scaling activity, and on the other side there are instances that can be easily created and terminated.
Thanks to the Load Balancer, the environment can be updated with zero downtime. Sounds fine, but when an application consists of 3 environments (one for QA, one for developers and one for staging), that means 3 different load balancers, each costing you ~$20 a month. And that’s only when your application is a monolith. Then, depending on the approach chosen, there might be a separate RDS instance for each environment, which adds ~$15 a month even for the smallest type possible. It really adds up. The separation of resources here is good – isolation is always good – but it requires new resources each time an environment is added.
With GKE, you can create one Ingress object with a LoadBalancer-type Service (which results in a Load Balancer being created on Google Cloud Platform), and then all the services for different environments can be proxied through this Ingress based on paths or domains. Since it creates two forwarding rules (the default backend and the Ingress) and Google prices Load Balancers by forwarding rules, this comes to ~$35 a month. That’s more expensive than an AWS Load Balancer, but it can be used for all environments in your project. Actually, it can be used for all projects in your cluster, so you pay for just one Load Balancer.
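As an illustration of such routing, a single Ingress fronting two hypothetical environments by domain could be sketched like this (the hostnames and service names are made up; clusters from around the time of writing used the extensions/v1beta1 API instead of networking.k8s.io/v1):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: environments
spec:
  rules:
    - host: qa.example.com          # QA environment, served by its own Service
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-qa
                port:
                  number: 80
    - host: staging.example.com     # staging environment behind the same Load Balancer
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-staging
                port:
                  number: 80
```

Adding another environment is just another `rules` entry – no new Load Balancer, hence no new monthly cost.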
SSL termination
Gone are the days when one had to update certificate files by hand whenever they reached their expiration dates. Nowadays, we all try to automate as much as possible, and that applies here as well. AWS EB and GKE offer different ways of SSL termination.
SSL certificates can be created for free on Amazon, but those are not files you can take and use elsewhere – it’s just an object you can refer to from your CloudFront distribution or, which is what we want here, your Load Balancer.
Simply add a listener for HTTPS:443, choose the SSL certificate from a dropdown, and it’s done. HTTPS requests will be served, and all requests will reach your EB instances on port 80 (same as for HTTP). Then just check the X-Forwarded-Proto header and set proper redirections. That’s how simple SSL termination is on AWS. And it’s free.
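As a hedged sketch of that redirection step, an nginx server block inside the environment could inspect the header like this (the actual proxy configuration is omitted):

```nginx
# SSL is terminated on the AWS Load Balancer; the instance only ever sees port 80.
# The Load Balancer sets X-Forwarded-Proto to the original scheme, so plain-HTTP
# requests can be redirected to HTTPS here.
server {
    listen 80;

    if ($http_x_forwarded_proto = "http") {
        return 301 https://$host$request_uri;
    }

    # ... regular proxying to the backend container goes here
}
```

The same check can of course live in the application itself instead of nginx; the point is only that the instance, not the Load Balancer, decides about the redirect.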
For GKE, it’s a bit different. There’s no built-in certificate management you can use. Google offers defining certificates manually, as below:
But why would you do that, since automation is key here? We dropped this option right after we discovered it. Thankfully, Kubernetes has some neat extensions which, with some effort, can give you free, self-renewing certificates. Cert-manager is the way to go. It can be configured to use Let’s Encrypt, and it cooperates with Ingress based on different Ingress Controllers – in our case, nginx and gce. I have described both Ingress Controllers on my personal blog; here we’re assuming both are well known to you. Since an Ingress can be annotated as nginx- or gce-based, cert-manager can create the required paths for ACME checks there, so certification is fully automated. The domain you’re using must have an A record pointing at the IP of the Load Balancer itself, or of the one exposing the Ingress to the Internet.
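As a rough sketch of such a setup, a Let’s Encrypt issuer for cert-manager might look like this. The e-mail address is a placeholder, and the API group has changed between cert-manager releases – current ones use cert-manager.io/v1:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com          # placeholder contact address for Let's Encrypt
    privateKeySecretRef:
      name: letsencrypt-account-key   # Secret storing the ACME account key
    solvers:
      - http01:
          ingress:
            class: nginx              # or "gce", matching the Ingress Controller in use
```

With an issuer like this in place, annotating an Ingress with `cert-manager.io/cluster-issuer: letsencrypt` is enough for cert-manager to request, validate and renew the certificate automatically.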
For a Service exposed as the LoadBalancer type, or for an Ingress with the GCE annotation, it will create a certificate object in Google Cloud and set it as the Frontend protocol, as below.
If an Ingress uses the NGINX Ingress Controller, it’s still exposed via a Google Load Balancer, but one working on OSI layer 4 (while the previous example is a layer 7 Load Balancer), so TCP traffic on ports 80 and 443 is passed straight to the instances. The controller then does the routing and uses special objects called Certificates to terminate SSL when needed.
Multiple environments and resources utilization
Then there’s resource utilization – something that has a direct impact on the bill you pay month by month. AWS Elastic Beanstalk operates at the instance level. If anything is scaled up or down, new instances are created or terminated respectively. And each instance has to be paid for.
Google Kubernetes Engine (and Kubernetes itself, thanks to the way it works) defines Pods, and it can schedule multiple Pods on one instance. If an instance has enough resources, Kubernetes can place Pods of multiple environments there and keep them reasonably isolated.
Resource utilization is key when it comes to costs. One instance in Kubernetes can serve multiple environments, or even multiple projects. One instance in AWS EB stands for one “replica” of one environment of one specific project. Most of the time there’s no need to keep as many resources as EB does, hence Kubernetes is the way to go.
Deployment and logs
I believe everyone was waiting for this! Honestly, it’s like heaven and hell, but I can’t really tell which one plays the heaven’s role here.
AWS Elastic Beanstalk, being the AWS-specific solution, relies on AWS’ developers here. As mentioned, it’s based on a JSON configuration file. You just upload the configuration as a zip file, which creates a so-called application version, and this can then be used in a specific environment.
AWS EB follows the configuration steps, provisions the instances according to the update policy that’s set, and reports each step’s results in the Event dashboard. Sometimes it’s hard to tell what broke a deployment – sometimes it’s just a missing environment variable, sometimes an invalid command – but the CloudWatch Logs integration helps here. A special script present on each EB instance pushes logs to CloudWatch (with a little help from us via .ebextensions), so we can easily check our aggregated logs and discover the problem. From the beginning, we’ve been using a Python-based application, written by us, to deploy anything from Jenkins.
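For illustration, one way to enable that log streaming is an .ebextensions config file along these lines (the retention period is an arbitrary example; our actual setup differed in the details):

```yaml
# .ebextensions/cloudwatch-logs.config
# Turns on Elastic Beanstalk's built-in streaming of instance logs to CloudWatch Logs.
option_settings:
  - namespace: aws:elasticbeanstalk:cloudwatch:logs
    option_name: StreamLogs
    value: true
  - namespace: aws:elasticbeanstalk:cloudwatch:logs
    option_name: RetentionInDays
    value: 14
```

Any file with a .config extension dropped into the .ebextensions directory of the application bundle is picked up by EB during deployment.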
GKE aggregates logs by default – no additional configuration needed, which is good. But deployment is a whole new thing. By default, Kubernetes operates on YAML files with a very specific structure. Those files are similar to AWS EB’s Dockerrun.aws.json, but only in theory: there are many more options to use and much more configuration to take care of. That’s not a bad thing, because it opens new doors.
Compared to GKE, AWS EB is actually quite limited. We started with a custom Python application responsible for deployment (it had support for Job and Deployment objects, rollback on failure, and some neat features for printing logs after deployment), run from Jenkins. But that was not enough.
Maintaining a custom script is a hassle, and we really wanted to try something new. Hence, Spinnaker came into play. It’s a tool created by Netflix (as the successor to their earlier Asgard); Google then joined the team, and they’ve been developing it together for a while now. It’s not just about Kubernetes support – it works well for AWS instances too (though it doesn’t support AWS EB).
Spinnaker supports any Kubernetes cluster, independently of the way that cluster was created (so kops or kubeadm would work well here too). At the moment, we just build on Jenkins, then Docker images are pushed to the registries and Spinnaker kicks in automatically, starting its pipeline, which may look like this.
We really like the Spinnaker-based deployment. It’s a new experience for us, and if I had to choose one of those two deployment flows today, I would definitely go for Spinnaker rather than anything done on Jenkins.
It’s a no-brainer that AWS EB and GKE really differ. AWS Elastic Beanstalk is quite simple to start playing with. Kubernetes, when done properly, takes your team to a different level. It’s a pleasure to work with, especially on Google Kubernetes Engine, where everything was meant to be 1:1 as in Kubernetes. While Kubernetes definitely requires more involvement at the very beginning and deeper knowledge in production use, it pays off.
Currently, we work with both AWS EB and GKE, because such a stack gives us more flexibility and lets us easily adapt to different projects’ needs. Of course, we can’t wait to get access to AWS EKS (Elastic Container Service for Kubernetes), because once it’s integrated properly with Spinnaker, we’ll be able to deploy on both clouds with the same flow, the same software and the same confidence.
If you’re looking for a simple container service to use in production, you can go with AWS EB; but when you have to cater for whole infrastructures, different projects, changing requirements and overall variability, Kubernetes (and GKE for sure) is the way to go.