Wednesday, July 24, 2013

SAP HANA Cloud: Updating productive applications

So far, updating productive applications was entirely in the hands of the developers. Not necessarily a bad thing, but it required lots of boilerplate code that every application had to embed.

The new bi-weekly update of the HANA Cloud introduces a small feature that nevertheless enables customers to update their applications with reduced downtime or no downtime at all.

On HANA Cloud you can have one or more instances of your application and each of these instances is called an application process. Previously the allowed actions were:
  • start of a single application process
  • stop of all application processes at once

This meant that you could increase the number of worker application processes (scale up), but you could not scale down short of stopping them all.
 
What's new is that you can finally stop a specific application process running on the HANA Cloud's compute units by specifying its process ID. If you wonder what the heck I mean, here are some glossary-style explanations:
  • compute unit - think of it as a hardware box; this is what you pay for to get more CPUs and memory
  • application process - the software that runs on top of the hardware - basically the SAP server that in turn hosts your own application code
  • process ID - the unique ID associated with every application process. Used to identify the application process in commands (as the term suggests).

So with this change, minimal from the user's perspective, you can now achieve:
  • scaling up & down
  • ageing
  • rolling update / zero downtime

Let's see what these three things mean...

Scaling your application
 

HANA Cloud provided the ability to scale your application from the very beginning of its existence. 

As I said, customers were allowed to start new processes, but didn't have the ability to stop a single process. Now this is fixed and you can simply add the new application process ID parameter to the stop command:
neo stop myapplication.properties --application-process-id <id>
The list of process IDs is displayed after you issue the start command, or you can use the status command to have the list printed. In both cases you'll get something similar to the output below:

Application processes
  ID                                           Status
  a182761d75b18b6fe17ed4285089d6447ae4ab3c     STARTED
  385b2cacd896c45dd39c8f444774329869282b80     PENDING
The next step would be to copy the ID and use it in the stop command like this:
neo stop myapplication.properties --application-process-id a182761d75b18b6fe17ed4285089d6447ae4ab3c
The above command will stop the first process, leaving you only one application process to handle the incoming user requests.
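If you'd rather not copy IDs by hand, the copy-and-paste step can be scripted. Below is a minimal Python sketch, assuming the status output keeps the two-column layout shown above. The helper names parse_process_table and stop_command are mine and not part of the neo tooling; the only flag used is the --application-process-id parameter from this post.

```python
def parse_process_table(output):
    """Return (process_id, status) pairs from the 'Application processes' table.

    Assumes the layout shown in the post: a header row 'ID  Status'
    followed by one row per process.
    """
    processes = []
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if fields[:2] == ["ID", "Status"]:
            in_table = True
            continue
        if in_table:
            if len(fields) != 2:
                break  # a blank line or any other content ends the table
            processes.append((fields[0], fields[1]))
    return processes


def stop_command(properties_file, process_id):
    """Build the argument list for stopping one application process."""
    return ["neo", "stop", properties_file,
            "--application-process-id", process_id]


# Sample output copied from the post.
SAMPLE = """Application processes
  ID                                           Status
  a182761d75b18b6fe17ed4285089d6447ae4ab3c     STARTED
  385b2cacd896c45dd39c8f444774329869282b80     PENDING
"""

processes = parse_process_table(SAMPLE)
started = [pid for pid, status in processes if status == "STARTED"]
print(" ".join(stop_command("myapplication.properties", started[0])))
```

The resulting argument list could be handed to subprocess.run once you are happy with what it prints.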

Ageing

Ageing is a way to deal with applications that have issues with resource consumption: over time they either get too slow or consume too much memory, so you periodically replace the old processes with fresh ones.

This may be due to badly written code, the use of a third-party library that leaks, or whatever other reason you may think of. You may recognize this approach from home routers or other home appliances that need a regular restart because they have poorly written firmware, suffer from bad hardware design or, most often, both :)

In HANA Cloud, thanks to the process ID, you can stop the unhealthy application processes and start fresh ones to replace them.
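How you decide that a process is "unhealthy" is up to you. Here is a rough Python sketch of such a decision, assuming you already collect some per-process numbers yourself. The ProcessHealth fields and both thresholds are hypothetical and purely illustrative; HANA Cloud does not hand you this structure.

```python
from dataclasses import dataclass


@dataclass
class ProcessHealth:
    process_id: str
    heap_used_ratio: float  # hypothetical metric: 0.0 (empty) to 1.0 (full)
    uptime_hours: float     # hypothetical metric: hours since the process started


def needs_recycling(p, max_heap_ratio=0.9, max_uptime_hours=72.0):
    """A process is 'aged' if it is nearly out of memory or simply too old."""
    return (p.heap_used_ratio >= max_heap_ratio
            or p.uptime_hours >= max_uptime_hours)


def processes_to_recycle(processes):
    """Return the IDs of the processes that should be replaced."""
    return [p.process_id for p in processes if needs_recycling(p)]


fleet = [
    ProcessHealth("process-a", heap_used_ratio=0.95, uptime_hours=10.0),
    ProcessHealth("process-b", heap_used_ratio=0.40, uptime_hours=5.0),
    ProcessHealth("process-c", heap_used_ratio=0.30, uptime_hours=100.0),
]
print(processes_to_recycle(fleet))
```

Each ID returned is a candidate for a stop command with the process ID parameter, after a replacement process has been started.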

Rolling update or Zero Downtime

The most interesting application of the new process ID is to update your application. 

In general you can update your application in three ways:
  • without your customers noticing anything (zero downtime)
  • before your customers notice anything (rolling update)
  • after your customers notice a warning (maintenance page)

The maintenance page approach includes adding a banner, a window or in general something flashy to get the customers' attention and inform them that from day/hour 1 to day/hour 2 they will not be able to access the application. This however is quite disruptive, since you'll be out of business while updating, and your customers have to be informed and to (eagerly?!?) expect this.

In most cases customers are quite unhappy with the notice/maintenance approach so you'll want to do the update with one of the next two approaches. They both require that old versions of your application can work together with new versions of the same code and data. If this is not the case then you either have to stick to the maintenance page or redesign your application.

If both old and new versions of your application can work together, you may still decide to disable the new functionality until all processes are updated. This may be needed to keep backward-incompatible data from reaching the database or being sent via some channel.

This means that customers can keep using the application as before, although some may notice the new, still-disabled functions until the rollout is complete.

If there are only minor changes (or your application can cope with the changes) you may decide to simply replace all nodes one by one and have a real zero downtime update.

Should I stop or should I start?

The rolling update and zero downtime approaches require that a new process is started before an old one is stopped. This keeps your application able to process the same number of requests throughout the update. Stopping before starting would effectively scale down your application, so I would recommend start before stop.
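To make the difference concrete, here is a small Python sketch. It assumes that each "neo start" with your properties file brings up one additional process running the new version (deploying that version is out of scope here) and, as a simplification, that a new process counts as capacity right away; in practice you would wait until it reports STARTED. The function names are mine.

```python
def rolling_update_plan(properties_file, old_process_ids):
    """Return the commands for a start-before-stop rolling update."""
    plan = []
    for old_id in old_process_ids:
        # Bring up a fresh process first...
        plan.append(["neo", "start", properties_file])
        # ...and only then retire one old process by its ID.
        plan.append(["neo", "stop", properties_file,
                     "--application-process-id", old_id])
    return plan


def min_capacity(initial, order):
    """Replace `initial` processes one by one; return the lowest number
    of running processes seen at any point during the update."""
    running = initial
    lowest = running
    for _ in range(initial):
        for action in order:
            running += 1 if action == "start" else -1
            lowest = min(lowest, running)
    return lowest


for step in rolling_update_plan("myapplication.properties",
                                ["old-id-1", "old-id-2"]):
    print(" ".join(step))

print(min_capacity(2, ("start", "stop")))  # capacity never drops below 2
print(min_capacity(2, ("stop", "start")))  # capacity dips to 1
```

With two processes, start before stop never leaves you with fewer than two, while stop before start temporarily halves your capacity.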

Of course using the maintenance page approach will in most cases require you to stop the whole application without using process IDs at all.

Killing Me Softly

Before you stop an application process you'll want to stop all incoming requests to it. We have a disable command in the pipeline to help you do this.
 
The problem most operators face is how to tell when it is safe to stop the application or the process without affecting user sessions or data. 

To check the active sessions, you need to configure a JMX check for your application by executing the following command:
neo create-jmx-check --account <your account> --application <application name> --user <e-mail or user> --name "ActiveSessions" --object-name <object name of the MBean you want to call> --attribute activeSessions --host <SAP HANA Cloud host>
This check allows you to view the number of active HTTP sessions per application (per Web context, the context is part of the object name). 

An example invocation that checks for context path /demo would look like:
neo create-jmx-check -a myaccount -b demo -u s1234567 -n "ActiveSessions" -O "Catalina:type=Manager,context=/demo,host=localhost" -A activeSessions --host neo.ondemand.com
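Once you can read the number of active sessions, the "wait until it is safe" logic is a simple polling loop. A minimal Python sketch follows, assuming you supply your own read_active_sessions callable that returns the current value of the activeSessions attribute; how you obtain that value is not covered here, and the function name and default timings are hypothetical.

```python
import time


def wait_until_drained(read_active_sessions, timeout_s=600.0, poll_s=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until there are no active sessions or the timeout expires.

    Returns True if the session count reached zero, False on timeout.
    `read_active_sessions` is a caller-supplied (hypothetical) callable.
    """
    deadline = clock() + timeout_s
    while True:
        if read_active_sessions() == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


# Usage with a fake reader that reports 3, then 1, then 0 sessions.
readings = iter([3, 1, 0])
drained = wait_until_drained(lambda: next(readings), sleep=lambda s: None)
print(drained)
```

The timeout matters: as long as new requests can still reach the process, the count may never drop to zero, so you need a point at which you decide to stop anyway or to give up.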
Currently HANA Cloud has no support for a custom maintenance page or the disable command, but we are working on this.
