Wednesday, October 02, 2019

Cloud Foundry on Kubernetes

As you probably know, Cloud Foundry (CF) is an opinionated "Open Source Cloud Application Platform" in the PaaS space. It works with existing / pre-allocated VMs, which it then uses to spawn containers in order to increase workload density and speed of creation.
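
To make this concrete, here is a minimal sketch of the typical developer interaction with CF (the app name is a placeholder):

    # Push the code in the current directory; CF builds it and runs it in containers
    cf push my-app
    # Scale out to three container instances
    cf scale my-app -i 3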

"Containers" in the paragraph above should rang a bell and you're probably thinking about similarities with Kubernetes (K8s). If not you should start thinking about this now :) Because there are a lot of similarities. 

"Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications" (#2). As I mentioned this is quite close to what CF does.

Of course there are lots of differences as well. While K8s can easily handle stateful workloads, CF refuses to support them, or at least does not make things easy for you. One consequence of this is that running your own DB or message queue (Kafka, for instance) is much easier done on K8s.
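
For example, a stateful workload on K8s is declared directly in a manifest. Here is a minimal sketch of a StatefulSet (the name, image, and storage size are placeholders):

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: kafka                        # placeholder name
    spec:
      serviceName: kafka
      replicas: 3
      selector:
        matchLabels:
          app: kafka
      template:
        metadata:
          labels:
            app: kafka
        spec:
          containers:
          - name: kafka
            image: my-registry/kafka:2.3 # placeholder image
            volumeMounts:
            - name: data
              mountPath: /var/lib/kafka
      volumeClaimTemplates:              # each replica gets its own persistent volume
      - metadata:
          name: data
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi              # assumed size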

On the other hand, CF offers buildpacks to lift the burden of building secure and up-to-date Docker images for your application, and it can take care of your app's health and scaling via a combination of the opinionated requirements it imposes (the 12-Factor App) and the services offered in the CF ecosystem.
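
To illustrate the difference: without buildpacks you maintain the image recipe yourself, while with CF you just push the source. A rough sketch (the Java app and image names are hypothetical):

    # Dockerfile you would maintain yourself without buildpacks
    FROM openjdk:8-jre
    COPY target/app.jar /app.jar
    CMD ["java", "-jar", "/app.jar"]

    # Versus the buildpack flow: no Dockerfile, the Java buildpack is auto-detected
    cf push my-app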

So while the choice of runtime infrastructure might be clear (VMs if you don't need scale, a K8s cluster for elasticity), the application development space is in turmoil.

There are numerous frameworks covering packaging, deployment, CI/CD, management, and FaaS. You might have heard of or used some of these: Helm, Argo CD, Knative. Not only do these projects change scope and deliverables quite fast, but there is also a consolidation effort needed from developers to make use of all of them. The dynamic is not quite like Node.js modules and frameworks with their twice-per-day releases, but you can still feel the disturbance in the force with every "minor" K8s version update.

While K8s can offer much, there are people (including me) who like the simplicity and restrictions imposed by the 12-Factor approach of CF.

So what would happen if we took the power of K8s and added CF as a layer on top of it? One would expect this to be a quick and easy task based on what we've discussed so far, but there are a number of things to consider.

Applications

Both CF and K8s run apps in containers. No big deal, right? Turns out there are different approaches to spawning containers. 

CF containerization goes back to VMware's VCAP (VMware Cloud Application Platform), which used a shell script and a bit of native C glue code several years before Docker was born. It has since evolved into the Garden project, which allows CF to create containers on different back-ends such as Windows, Linux, and runC. We'll talk about the last one in the next paragraph.

Docker (and K8s) redefined the container world for good. They offer not only isolation but also a standardized way to pack and run your code. To standardize the "run" part, Docker extracted the runtime code and donated it to the Open Container Initiative (OCI) as the runC project.
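
As a quick illustration of how low-level the OCI runtime is: runC runs a "bundle" consisting of a root filesystem plus a config.json. This sketch assumes you already have a rootfs extracted, e.g. from an existing image:

    mkdir -p mycontainer/rootfs        # the bundle directory
    cd mycontainer
    runc spec                          # generates a default config.json
    sudo runc run mycontainer-id       # starts the container from the bundle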

You might already be thinking: "OK, would it be fine just to swap Garden for Docker? Garden already uses the runC back-end by default in CF, so not a big deal, right?" Well, it will work. Actually, it does work, as proven by several attempts. Mostly.

The biggest issues, however, are: 
  • Garden adds more security rules and restrictions than the runC defaults. Some of these restrictions helped avoid CVEs reported for K8s (e.g. CVE-2019-5736).
  • Users lose buildpacks and need to add extra build steps to their CI/CD pipelines. There is an ongoing effort to have buildpacks "translate source code into OCI images" (#3); see the sketch right after this list.
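
Here is that sketch, using the pack CLI from the Cloud Native Buildpacks project (the app name and builder image below are just examples):

    # Build an OCI image directly from local source with a buildpack builder
    pack build my-app --builder cloudfoundry/cnb:bionic
    # The result is a regular image that any OCI runtime can run
    docker run --rm -p 8080:8080 my-app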

Orchestration, APIs

CF uses Diego as its workload scheduler, driven by the CF API component, the Cloud Controller. These two components define a feature set that users expect from every CF installation. We need to support as much of this feature set as possible on top of K8s to consider the "merge" successful.
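
As a rough (and by no means official) mental mapping, a single pushed CF app corresponds to roughly this much K8s machinery (all names and the health endpoint are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 2                      # cf scale my-app -i 2
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: my-app
            image: my-app:latest       # built by a buildpack or CI
            livenessProbe:             # CF health checks map to probes
              httpGet:
                path: /health          # hypothetical endpoint
                port: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
    spec:
      selector:
        app: my-app
      ports:
      - port: 80
        targetPort: 8080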

Routing

Both CF and K8s are experimenting with Istio/Envoy as a way to handle load balancing, security/isolation, and service discovery.
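
For illustration, routing a CF-style app route through Istio might look like this minimal VirtualService sketch (the host and service names are hypothetical):

    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: my-app
    spec:
      hosts:
      - my-app.example.com             # hypothetical route
      http:
      - route:
        - destination:
            host: my-app               # the K8s Service backing the app
            port:
              number: 80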

Scalability

Istio does not scale to handle the 250,000 app instances required by some of the leading CF providers.

K8s does not handle more than 5,000 nodes, and existing CF installations already have VM counts that exceed this limit (some close to 7,500 VMs).

So it's obvious that a single K8s cluster cannot be used to replace the biggest CF instances. Not that creating such a behemoth was a good idea in the first place.

We need to think about using multiple clusters, an approach that has added advantages like better isolation, easier operations, and potentially a better onboarding experience.
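
Operationally, working with multiple clusters is already well supported by the tooling. For example (the context names below are made up):

    kubectl config get-contexts                    # list the clusters we can reach
    kubectl config use-context cf-apps-cluster     # switch to the apps cluster
    kubectl get nodes
    kubectl config use-context cf-services-cluster # and over to the services cluster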

Services

The CF ecosystem offers the OSBAPI (Open Service Broker API) interface to abstract the interaction with external services (such as DBs, machine learning APIs, etc.). This comes in quite handy, as CF apps should not care where services are running.
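
The developer-facing side of OSBAPI is the familiar service workflow (the service offering, plan, and names below are hypothetical):

    cf marketplace                           # list services offered by brokers
    cf create-service postgres small my-db   # ask the broker to provision an instance
    cf bind-service my-app my-db             # inject the credentials into the app
    cf restage my-app                        # pick up the new binding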

Having services run in a separate K8s cluster is a nice idea, considering again the isolation, scalability, and operations aspects.

While we can run all the stateful services in K8s, there are several stateless services that feel quite happy running in CF-managed mode. CF-managed mode here might simply mean a K8s cluster overseen by CF components/pods, in addition to K8s itself.

Provisioning

CF is currently provisioned using BOSH on VMs. I see no reason to keep this provisioning model, especially since most of the cloud providers already offer managed K8s clusters. 

It seems we're moving from tools that try to bridge the IaaS and PaaS worlds (CF on AWS, CF on Azure) to tools that work with managed K8s clusters (CF on GKE, CF on EKS). In other words, the IaaS layer is moving from VMs to K8s clusters / containers.
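
In that world, bootstrapping the "IaaS" becomes a one-liner against a managed offering. For example (cluster names, sizes, and zones are placeholders):

    gcloud container clusters create cf-cluster --num-nodes 3 --zone europe-west1-b
    eksctl create cluster --name cf-cluster --nodes 3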


References:
