Wednesday, October 02, 2019

Cloud Foundry on Kubernetes: missing features

I already laid out some thoughts on the issues we need to deal with when adding a Cloud Foundry (CF) layer on top of Kubernetes (K8s) in my previous blog.

The CF on K8s topic has attracted a lot of interest, especially since the creation of two projects in the CF Foundation: Eirini, a thin layer that replaces the CF Diego scheduler with K8s, and Quarks, which provides CF components as containers.
Eirini provides a way to run CF apps on K8s by replacing Diego with the K8s scheduler. You might have guessed that this alone is not enough. As mentioned in the previous blog, we also need the Cloud Controller (CC) API to successfully mimic CF.

Well, the simplest way is to have the CC and all dependent components running in K8s pods. To achieve this, Eirini deploys a containerized CF, the SUSE Cloud Foundry (SCF), on K8s. SCF contains glue code to make everything work on K8s. Quarks aims to get rid of this glue and replace it with components that work natively on K8s, but until then SCF is the way to go.

Additionally, there is an Eirini pod running the glue code between the CF components and K8s. This includes translating CC LRPs (long-running processes) into K8s pods, handling logs and routes, etc.
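
To make the LRP-to-pod translation more concrete, here is a minimal sketch in Python. It is not Eirini's actual code: the LRP class, its fields and the "app-guid" label key are hypothetical simplifications made up for illustration, and only the layout of the resulting Pod manifest follows the K8s API.

```python
from dataclasses import dataclass, field


@dataclass
class LRP:
    # Hypothetical, simplified stand-in for a long-running process request from the CC.
    app_guid: str
    image: str
    command: list
    memory_mb: int
    env: dict = field(default_factory=dict)


def lrp_to_pod(lrp: LRP, index: int = 0) -> dict:
    """Build a K8s Pod manifest (as a plain dict) for one instance of the LRP."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": f"{lrp.app_guid}-{index}",
            "labels": {"app-guid": lrp.app_guid},  # label key is illustrative only
        },
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": lrp.image,
                    "command": lrp.command,
                    # K8s expects env as a list of name/value pairs
                    "env": [{"name": k, "value": v} for k, v in sorted(lrp.env.items())],
                    # the CF memory quota becomes a container memory limit
                    "resources": {"limits": {"memory": f"{lrp.memory_mb}Mi"}},
                }
            ]
        },
    }
```

A real translation also has to cover instance counts, health checks, routes and logging, which is exactly the glue the Eirini pod provides.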

Feature parity?

To get a clear picture of what needs to be done to reach feature parity between Diego and Eirini, we need to look at the differences between
  • Garden and K8s container security
  • Diego and Eirini features
Let's discuss the gaps in these two groups.

Container security

The Eirini and Garden teams have already assessed this, so the table below is based on their investigation:



Remarks:
#1 Possible with mutating webhooks
#2 https://github.com/kubernetes/enhancements/issues/127
#3 Application is restarted after reaching the limit. The limit is configured globally for every application.
#4 Fewer masked paths than garden/docker (e.g. /proc/scsi)
#5 different implementation; not less secure
#6 not used in Eirini, Garden or K8s; https://github.com/kubernetes/frakti
#7 AppArmor is used


Low-level details and comments are available in the Garden team story.

Diego vs Eirini features

Eirini CI runs the CF Acceptance Tests (CATS). Based on them, here is a feature comparison against the features of SAP CP (Diego-based):


Remarks:
#1: Not tested
#2: Performance and log-cache route are impacted; SCF/Quarks issue?
#3: Old SCF version; scalability tests will be needed
#4: use CredHub or K8s secrets? SCF/Quarks issue?
#5: no v3 support in SCF 
#7: part of Eirini migration plans; cluster per tenant/org? What about CF flow?
#8: Bits-service as private registry or K8s re-configuration
#9: Not tested; Flaky tests
#10: Isolation segments dependency; clusters in K8s?
#11: Reuse K8s services
#12: eirinix project; mutating webhooks 
#13: Not tested. Waiting on API v3 in SCF?
#14: eirinix project; (persi-eirini)

There are several features that are not tested with Eirini, mainly due to flaky CATS.

Log Cache is a feature that adds performance improvements over existing functionality, so at least for initial adoption we can live without it. Adding it requires additional components in SCF.

Container Networking is supported in most CF offerings out there. It is currently being tested on SAP Cloud Platform. However, we have seen scalability issues with it in vanilla CF.

CredHub is currently not supported on Eirini. The reason for this is that K8s has its own secrets support, which can also be used to rotate secrets.

Eirini uses a fork of SCF 2.16.4 that currently does not come with CF API v3 support, so Container Networking, Rolling Deployments, Security Groups and other V3 features cannot be used or tested.

Docker support is currently integrated in CC and Eirini. It is supported as long as staging of apps is done with Diego stagers.

Isolation Segments and routing are not supported. This feature is needed to migrate Diego-backed apps to Eirini, but it is not very valuable in the K8s world, where we already have clusters.

Support for a Private Docker Registry can be achieved in two ways: either use the existing Bits Service that is part of SCF (deprecated), or configure K8s itself to pull from the private repo. Currently neither of these is configurable via Eirini.
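
For the second option, K8s pulls from a private registry when the pod references an image pull secret of type kubernetes.io/dockerconfigjson. The sketch below builds both pieces as plain dicts in Python. The function names and the way it would be wired into Eirini are assumptions for illustration only; the manifest fields themselves follow the K8s API.

```python
import base64
import json


def docker_registry_secret(name: str, registry: str, username: str, password: str) -> dict:
    """Build a K8s Secret manifest holding credentials for a private registry."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {"auths": {registry: {"username": username, "password": password, "auth": auth}}}
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "kubernetes.io/dockerconfigjson",
        # values under "data" must be base64 encoded
        "data": {".dockerconfigjson": base64.b64encode(json.dumps(config).encode()).decode()},
    }


def with_pull_secret(pod: dict, secret_name: str) -> dict:
    """Return a copy of the pod manifest that references the pull secret."""
    spec = dict(pod["spec"], imagePullSecrets=[{"name": secret_name}])
    return dict(pod, spec=spec)
```

The same effect can be had cluster-wide by attaching the pull secret to a service account, which avoids touching every pod.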

The Eirinix project brings SSH and Volume Services support to Eirini. It is based on mutating webhooks.
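
As a rough illustration of what a mutating webhook does, the Python sketch below answers a K8s AdmissionReview request with a JSON patch that appends a sidecar container to the pod. This is not Eirinix code: the sidecar name and image are made up, and the admission.k8s.io/v1 API version is an assumption (older clusters use v1beta1).

```python
import base64
import json


def mutate(review: dict) -> dict:
    """Answer an AdmissionReview request with a patch that adds a sidecar container."""
    request = review["request"]
    patch = [
        {
            "op": "add",
            "path": "/spec/containers/-",  # "-" appends to the containers list
            "value": {
                "name": "ssh-sidecar",  # hypothetical sidecar
                "image": "example.invalid/ssh-sidecar:latest",  # hypothetical image
            },
        }
    ]
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],  # must echo the uid of the request
            "allowed": True,
            "patchType": "JSONPatch",
            # the API server expects the JSON patch base64 encoded
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

In a real deployment this function sits behind an HTTPS endpoint registered through a MutatingWebhookConfiguration, so the API server calls it whenever a matching pod is created.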


1 comment:

Anonymous said...

Services CATS used to work in Eirini.
There was an issue in CAPI that made the tests flaky - https://github.com/cloudfoundry/cloud_controller_ng/issues/1382
The tests themselves were quite flaky and slow, so we have disabled them in our CI. Eirini does not touch this functionality, so it should just work.
