Wednesday, October 02, 2019

Cloud Foundry on Kubernetes: missing features

I already laid out some thoughts on the issues we need to deal with when adding a Cloud Foundry (CF) layer on top of Kubernetes (K8s) in my previous blog.

The CF on K8s topic sees a lot of interest, especially since the creation of two projects in the CF Foundation: Eirini, a thin layer that replaces the CF Diego scheduler with K8s, and Quarks, which provides CF components as containers.
Eirini provides a way to run CF apps on K8s by replacing Diego with the K8s scheduler. You might have guessed that this alone is not enough. As we mentioned in the previous blog, we need the Cloud Controller (CC) API to successfully mimic CF.

Well, the simplest way would be to have the CC and all dependent components running in K8s pods. To achieve this, Eirini deploys a containerized CF - SUSE Cloud Foundry (SCF) - on K8s. SCF contains glue code to make everything work on K8s. Quarks aims to get rid of this glue and replace it with components that work natively on K8s, but until then SCF is the way to go.

Additionally, there is an Eirini pod running the glue code between the CF components and K8s. This includes translating CC LRPs (long-running processes) into K8s pods, handling of logs and routes, etc.
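To make that translation more concrete, here is a minimal, hypothetical sketch of the kind of mapping such glue code performs: a simplified LRP descriptor is turned into a K8s StatefulSet manifest. The LRP field names and labels below are illustrative, not the actual Eirini wire format.

```python
# Hypothetical sketch: map a simplified Cloud Controller LRP descriptor
# onto a Kubernetes StatefulSet manifest. Field names on the LRP side
# are illustrative, not the real Eirini/CC schema.

def lrp_to_statefulset(lrp):
    """Translate a simplified LRP dict into a StatefulSet manifest dict."""
    app_label = {"cloudfoundry.org/app_guid": lrp["app_guid"]}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": lrp["process_guid"], "labels": app_label},
        "spec": {
            "replicas": lrp["instances"],          # CF instance count -> replicas
            "selector": {"matchLabels": app_label},
            "template": {
                "metadata": {"labels": app_label},
                "spec": {
                    "containers": [{
                        "name": "opi",
                        "image": lrp["image"],
                        "env": [{"name": k, "value": v}
                                for k, v in lrp["env"].items()],
                    }]
                },
            },
        },
    }

manifest = lrp_to_statefulset({
    "process_guid": "app-1234-v1",   # hypothetical GUIDs
    "app_guid": "app-1234",
    "instances": 3,
    "image": "registry.example.com/app:latest",
    "env": {"PORT": "8080"},
})
print(manifest["spec"]["replicas"])  # 3
```

Scaling an app in CF then becomes a matter of patching `spec.replicas` on the generated object, which is exactly the kind of impedance matching the Eirini glue code exists for.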

Feature parity?

To get a clear picture of what needs to be done to reach feature parity between Diego and Eirini, we need to look at the differences between:
  • Garden and K8s container security
  • Diego and Eirini features
Let's discuss the gaps in these two groups.

Container security

The Eirini and Garden teams have already assessed this, so the table below is based on their investigation:



Remarks:
#1 Possible with mutating webhooks
#2 https://github.com/kubernetes/enhancements/issues/127
#3 Application is restarted after reaching the limit. The limit is configured globally for every application.
#4 Fewer masked paths than Garden/Docker (e.g. /proc/scsi)
#5 Different implementation; not less secure
#6 Not used in Eirini, Garden or K8s; https://github.com/kubernetes/frakti
#7 AppArmor is used


Low-level details and comments are available in the Garden team story.

Diego vs Eirini features

Eirini CI runs the CF Acceptance Tests (CATS), so based on them here is a feature comparison against SAP CP features (Diego-based):


Remarks:
#1: Not tested
#2: Performance and log-cache route as impact; SCF/Quarks issue?
#3: Old SCF version; scalability tests will be needed
#4: use CredHub or K8s secrets? SCF/Quarks issue?
#5: no v3 support in SCF 
#7: part of Eirini migration plans; cluster per tenant/org? What about CF flow?
#8: Bits-service as private registry or K8s re-configuration
#9: Not tested; Flaky tests
#10: Isolation segments dependency; clusters in K8s?
#11: Reuse K8s services
#12: eirinix project; mutating webhooks 
#13: Not tested. Waiting on API v3 in SCF?
#14: eirinix project; (persi-eirini)

There are several features that are not tested with Eirini, mainly due to flaky CATS.

Log Cache is a feature that adds performance improvements over existing functionality, so at least for initial adoption we can live without it. To add this feature we would need to add components to SCF.

Container Networking is supported in most CF offerings out there. It is currently under test on SAP Cloud Platform. However, at scale we have seen scalability issues even in vanilla CF.

CredHub is currently not supported on Eirini. The reason for this is that K8s has built-in secrets support and can also rotate secrets.

Eirini uses an SCF 2.16.4 fork that currently does not come with CF API v3 support, so Container Networking, Rolling Deployments, Security Groups and other v3 features cannot be used or tested.

Docker support is currently integrated in CC and Eirini. It works as long as staging of apps is done with Diego stagers.

Isolation Segments and the corresponding routing are not supported. This feature is needed to migrate Diego-backed apps to Eirini, but it is not very valuable inside the K8s world, where we already have clusters.

Support for a private Docker registry can be achieved in two ways. One can either use the existing Bits Service that is part of SCF (deprecated) or configure K8s itself to pull from a private registry. Currently neither of these is configurable via Eirini.
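The second option, configuring K8s itself, boils down to creating a `kubernetes.io/dockerconfigjson` secret and referencing it from the pod spec via `imagePullSecrets`. Here is a sketch of what building those two objects could look like; the registry URL and credentials are placeholders:

```python
import base64
import json

def docker_registry_secret(name, registry, username, password):
    """Build a kubernetes.io/dockerconfigjson Secret manifest
    for pulling images from a private registry."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    dockerconfig = {"auths": {registry: {"username": username,
                                         "password": password,
                                         "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {
            # K8s expects the config JSON itself to be base64-encoded
            ".dockerconfigjson": base64.b64encode(
                json.dumps(dockerconfig).encode()).decode(),
        },
    }

# Placeholder registry and credentials, for illustration only.
secret = docker_registry_secret("private-registry",
                                "registry.example.com",
                                "ci-user", "s3cret")

# Pods (or pod templates generated by a scheduler layer) then
# reference the secret so the kubelet can authenticate the pull:
pod_spec = {
    "imagePullSecrets": [{"name": secret["metadata"]["name"]}],
    "containers": [{"name": "app",
                    "image": "registry.example.com/app:latest"}],
}
```

For the Eirini case this would mean the glue code injecting such an `imagePullSecrets` entry into every pod it creates, which is presumably why it needs to become configurable.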

The eirinix project brings SSH and Volume Services support to Eirini. This is based on mutating webhooks.
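A mutating webhook works by intercepting pod creation and returning a JSON patch that the API server applies before admission. As an illustration of the mechanism (not the actual eirinix implementation), here is a sketch of an AdmissionReview response that injects a volume and its mount into an app pod; the volume and claim names are hypothetical:

```python
import base64
import json

def mutate_add_volume(admission_review):
    """Return an AdmissionReview response whose JSON patch injects a
    volume into the pod - the mechanism an eirinix-style extension
    could use to add volume-services support. Names are hypothetical."""
    uid = admission_review["request"]["uid"]
    patch = [
        {"op": "add", "path": "/spec/volumes/-",
         "value": {"name": "service-volume",          # hypothetical volume
                   "persistentVolumeClaim": {"claimName": "app-pvc"}}},
        {"op": "add", "path": "/spec/containers/0/volumeMounts/-",
         "value": {"name": "service-volume",
                   "mountPath": "/var/vcap/data"}},   # hypothetical mount path
    ]
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,              # must echo the request UID
            "allowed": True,
            "patchType": "JSONPatch",
            # the patch itself is sent base64-encoded
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }

review = {"request": {"uid": "abc-123", "object": {"kind": "Pod"}}}
resp = mutate_add_volume(review)
```

The appeal of this approach is that Eirini itself stays thin: features like SSH or volumes are bolted on by independent webhooks rather than by patching the scheduler glue.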


Cloud Foundry on Kubernetes

As you probably know, Cloud Foundry (CF) is an opinionated "Open Source Cloud Application Platform" in the PaaS space. It works with existing / pre-allocated VMs, which CF then uses to spawn containers in order to increase workload density and speed of creation.

"Containers" in the paragraph above should ring a bell, and you're probably thinking about similarities with Kubernetes (K8s). If not, you should start thinking about this now :) because there are a lot of similarities.

"Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications" (#2). As I mentioned this is quite close to what CF does.

Of course there are lots of differences as well. While K8s can easily handle stateful workloads, CF refuses to support them, or at least does not make things easy for you. One consequence of this is that running your own DB or queue (Kafka, for instance) is much easier on K8s.

On the other side, CF offers buildpacks to lift the burden of building secure and up-to-date Docker images for your application, and it can take care of your app's health and scaling via a combination of the opinionated requirements it imposes (the 12-Factor App) and services offered in the CF ecosystem.

So while the choice for services might be clear (VMs if you don't need scale, or a K8s cluster for elasticity), the application development space is in turmoil.

There are numerous frameworks, from packaging, deploying and CI/CD to management and FaaS. You might have heard of or used some of these: Helm, Argo CD, Knative. Not only do these projects change scope and deliverables quite fast, but there is also a consolidation effort needed from developers to make use of all of them. The dynamic is not quite like Node.js modules and frameworks with their twice-per-day releases, but you can still feel the disturbance in the force with every update of a "minor" K8s version.

While K8s can offer much, there are people (including me) who like the simplicity and restrictions imposed by the 12-Factor approach of CF.

So what would happen if we took the K8s power and added CF as a layer on top of it? One would expect this to be a quick and easy task based on what we have discussed so far, but there are a number of things to consider.

Applications

Both CF and K8s run apps in containers. No big deal, right? Turns out there are different approaches to spawning containers. 

CF containerization goes back to VMware's VCAP (VMware Cloud Application Platform), which used a shell script and a bit of glue written in native C, several years before Docker was born. It has since evolved into the Garden project, which allows CF to create containers on different back-ends like Windows, Linux and runC. We'll talk about the last one in the next paragraph.

Docker (and K8s) redefined the container world for good. They offer not only isolation but also a standardized way to pack and run your code. To standardize the "run" part, Docker extracted the runtime code and donated it to the Open Container Initiative (OCI) as the runC project.

You might already be thinking: "OK, would it be fine to just swap Garden for Docker? Garden already uses the runC backend by default in CF, so not a big deal, right?" Well, it will work. Actually, it does work, as proven by several attempts. Mostly.

The biggest issues, however, are:
  • Garden adds more security rules and restrictions than the runC defaults. Some of these restrictions helped avoid CVEs reported for K8s (e.g. CVE-2019-5736)
  • Users lose buildpacks and need to add additional build steps to their CI/CD pipelines. There is an ongoing effort to have buildpacks "translate source code into OCI images" (#3)
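On the first point, K8s leaves much of this hardening to the operator via the pod `securityContext`. As a rough illustration of tightening runC/K8s defaults in the direction of what Garden applies out of the box, a container spec can drop capabilities, forbid privilege escalation and enforce a seccomp profile; the exact settings below are illustrative, not Garden's actual rule set:

```python
def hardened_container(name, image):
    """A container spec tightened beyond plain runC/K8s defaults,
    approximating (illustratively) restrictions that Garden applies
    by default."""
    return {
        "name": name,
        "image": image,
        "securityContext": {
            "runAsNonRoot": True,               # refuse to run as root
            "allowPrivilegeEscalation": False,  # block setuid escalation
            "readOnlyRootFilesystem": True,     # immutable root filesystem
            "capabilities": {"drop": ["ALL"]},  # drop every Linux capability
            "seccompProfile": {"type": "RuntimeDefault"},  # default seccomp filter
        },
    }

spec = hardened_container("app", "registry.example.com/app:latest")
```

Incidentally, `allowPrivilegeEscalation: False` and a seccomp profile are the kind of defaults that mitigated runC escape issues like CVE-2019-5736 on hardened platforms.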

Orchestration, APIs

CF uses Diego as its workload scheduler. It is driven by the CF API - the Cloud Controller. These two components define a feature set that users expect from every CF installation. We need to support as much of this feature set as possible on top of K8s to consider the "merge" successful.

Routing

Both CF and K8s are experimenting with Istio/Envoy as a way to handle load balancing, security/isolation and service discovery.

Scalability

Istio does not scale to handle the 250,000 app instances required by some of the leading CF providers.

K8s does not handle more than 5,000 nodes, and existing CF installations already have VM counts that exceed this limit (some close to 7,500 VMs).

So it's obvious that a single K8s cluster cannot replace the biggest CF installations. Not that creating such a behemoth was a good idea in the first place.

We need to think about using multiple clusters, as this approach has added advantages like better isolation, easier operations and potentially a better onboarding experience.

Services

The CF ecosystem offers the OSBAPI (Open Service Broker API) interface to abstract the interaction with external services (such as DBs, machine learning APIs, etc). This comes in quite handy, as CF apps should not care where services are running.
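For a flavor of that abstraction, here is a sketch of the OSBAPI provisioning call: the platform PUTs to `/v2/service_instances/:instance_id` on the broker with the chosen service and plan, and the broker creates the instance wherever it likes (a VM, another K8s cluster, a cloud service). The GUIDs below are placeholders:

```python
import json

def provision_request(instance_id, service_id, plan_id, org_guid, space_guid):
    """Build an OSBAPI service-instance provisioning request
    (PUT /v2/service_instances/:instance_id). IDs are placeholders."""
    return {
        "method": "PUT",
        "path": f"/v2/service_instances/{instance_id}",
        "headers": {
            "X-Broker-API-Version": "2.14",  # OSBAPI version header
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "service_id": service_id,
            "plan_id": plan_id,
            "organization_guid": org_guid,
            "space_guid": space_guid,
        }),
    }

req = provision_request("inst-1", "svc-postgres", "plan-small",
                        "org-1", "space-1")
```

Because the app only ever sees the resulting binding credentials, the broker is free to back the service with whatever infrastructure makes sense, which is exactly why the abstraction survives a move from VMs to K8s clusters.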

Having services run in a separate K8s cluster is a nice idea, considering again the isolation, scalability and operations aspects.

While we can run all stateful services in K8s, there are several stateless services that feel quite happy running in CF-managed mode. CF mode here might simply mean a K8s cluster overseen by CF components/pods, in addition to K8s itself.

Provisioning

CF is currently provisioned using BOSH on VMs. I see no reason to keep this provisioning model, especially since the vast majority of cloud providers already offer managed K8s clusters.

It seems we're moving from tools that try to bridge the IaaS and PaaS worlds (CF on AWS, CF on Azure) to tools that work with managed K8s clusters (CF on GKE, CF on EKS). Or, in other words, the IaaS layer is moving from VMs to K8s clusters / containers.


References:
