Wednesday, July 24, 2013

SAP HANA Cloud: Updating productive applications

Until now, updating productive applications was entirely in the hands of the developers. That is not necessarily a bad thing, but it required lots of boilerplate code that every application had to embed.

The new bi-weekly update of the HANA Cloud will introduce a small feature that nevertheless enables customers to update their applications with reduced downtime or no downtime at all.

On HANA Cloud you can have one or more instances of your application and each of these instances is called an application process. Previously the allowed actions were:
  • start of a single application process
  • stop of all application processes at once

This meant that you could increase the number of worker application processes (scale up), but you could not scale down.
 
What's new is that you can finally stop a specific application process running on the HANA Cloud's compute units by specifying its process ID. If you wonder what the heck I mean, here are some glossary-style explanations:
  • compute unit - think of it as a hardware box; this is what you pay for to get more CPUs and memory
  • application process - the software that runs on top of the hardware - basically the SAP server that in turn hosts your own application code.
  • process ID - the unique ID associated with every application process. It is used as the application process name in commands (as the term suggests).

So with this change, minimal from the user's perspective, you can now achieve:
  • scaling up & down
  • ageing
  • rolling update / zero downtime

Let's see what these three things mean...

Scaling your application
 

HANA Cloud has provided the ability to scale your application from the very beginning of its existence.

As I said, customers could start new processes but could not stop a single process. This is now fixed, and you can pass the new application process ID parameter to the stop command:
neo stop myapplication.properties --application-process-id <id>
The list of process IDs is displayed after you issue the start command, or you can use the status command to have the list printed. In both cases you'll get something similar to the output below:

Application processes
  ID                                           Status
  a182761d75b18b6fe17ed4285089d6447ae4ab3c     STARTED
  385b2cacd896c45dd39c8f444774329869282b80     PENDING
The next step would be to copy the ID and use it in the stop command like this:
neo stop myapplication.properties --application-process-id a182761d75b18b6fe17ed4285089d6447ae4ab3c
The above command will stop the first process, leaving you with only one application process to handle the incoming user requests.
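If you want to script the copy-and-paste step, here is a minimal Python sketch that extracts the process IDs from output like the one shown above and builds the matching stop command. It assumes that every process line consists of a 40-character hexadecimal ID followed by an upper-case status word; the helper names parse_processes and stop_command are made up for this illustration and are not part of the NEO CLI.

```python
import re

# Sample text copied from the status output shown above.
SAMPLE_OUTPUT = """\
Application processes
  ID                                           Status
  a182761d75b18b6fe17ed4285089d6447ae4ab3c     STARTED
  385b2cacd896c45dd39c8f444774329869282b80     PENDING
"""

# Assumption: a process line is a 40-character hex ID followed by a status word.
PROCESS_LINE = re.compile(r"^\s*([0-9a-f]{40})\s+([A-Z]+)\s*$")


def parse_processes(output):
    """Return a list of (process_id, status) pairs found in the output."""
    processes = []
    for line in output.splitlines():
        match = PROCESS_LINE.match(line)
        if match:
            processes.append((match.group(1), match.group(2)))
    return processes


def stop_command(properties_file, process_id):
    """Build the argument list that stops one application process."""
    return ["neo", "stop", properties_file,
            "--application-process-id", process_id]
```

Calling parse_processes(SAMPLE_OUTPUT) yields the two IDs with their statuses, and the list returned by stop_command can be handed to subprocess.run() on a machine where the neo client is installed.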

Ageing

Ageing is a way to deal with applications that have issues with resource consumption. Over time they either get too slow or consume too much memory.

This may be due to badly written code, the use of a third-party library that leaks, or whatever other reason you may think of. You may recognize this approach from home routers or other home appliances that have poorly written firmware, suffer from bad hardware design or, most often, both :)

In HANA Cloud, thanks to the process ID, you can stop the unhealthy application processes and start fresh ones to replace them.

Rolling update or Zero Downtime

The most interesting application of the new process ID is to update your application. 

In general you can update your application in three ways:
  • without your customers noticing anything (zero downtime)
  • before your customers notice anything (rolling update)
  • after your customers notice a warning (maintenance page)

The maintenance page approach means adding a banner, a window or, in general, something flashy to get the customers' attention and inform them that from day/hour 1 to day/hour 2 they will not be able to access the application. This is quite disruptive, however, since you'll be out of business while updating, and your customers have to be informed and have to (eagerly?!?) expect this.

In most cases customers are quite unhappy with the notice/maintenance approach, so you'll want to do the update with one of the other two approaches. They both require that old versions of your application can work together with new versions of the same code and data. If this is not the case, you either have to stick to the maintenance page or redesign your application.

If both old and new versions of your application can work together you may decide to stop/disable the new functionality until all processes are updated. This may be needed to avoid backward incompatible data reaching the database or being sent via some channel. 

This means that customers can keep using the application as they used to, but some of them may notice the new, still disabled functions until you have rolled out the new version to all processes.

If there are only minor changes (or your application can cope with the changes) you may decide to simply replace all nodes one by one and have a real zero downtime update.

Should I stop or should I start?

The rolling update and zero downtime approaches require that a new process is started before an old one is stopped. In general, this preserves your application's capacity to process a certain amount of requests. Stopping before starting would effectively scale down your application, so I recommend starting before stopping.
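The start-before-stop order can be sketched in a few lines of Python. This is only an illustration of the ordering, not a deployment tool: start_process and stop_process are hypothetical callables that you would implement yourself, for example by wrapping the start and stop commands, and the sketch assumes that start_process returns only after the new process is really able to serve requests.

```python
def rolling_update(old_ids, start_process, stop_process):
    """Replace every old process, starting a new one before stopping an old one.

    start_process() is a hypothetical callable that returns the ID of a new,
    fully started process; stop_process(process_id) is a hypothetical callable
    that stops exactly that process.
    Returns the IDs left running and the process count after every step.
    """
    running = list(old_ids)
    capacity_log = [len(running)]
    for old_id in old_ids:
        new_id = start_process()      # start first: capacity goes up by one
        running.append(new_id)
        capacity_log.append(len(running))
        stop_process(old_id)          # only then stop the old process
        running.remove(old_id)
        capacity_log.append(len(running))
    return running, capacity_log
```

With two old processes the recorded process counts are 2, 3, 2, 3, 2, so the capacity never drops below the original level. The price is that you temporarily need one compute unit more than you normally use.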

Of course using the maintenance page approach will in most cases require you to stop the whole application without using process IDs at all.

Killing Me Softly

Before you can stop an application process you'll want to stop all incoming requests to it. We have a disable command in the pipeline to help you do this.
 
The problem most operators would face is how to know when the application or the process can be stopped without affecting user sessions or data.

To check the active sessions, you need to configure JMX checks for your application by executing the following command:
neo create-jmx-check --account <your account> --application <application name> --user <e-mail or user> --name "ActiveSessions" --object-name <object name of the MBean you want to call> --attribute activeSessions --host <SAP HANA Cloud host>
This check allows you to view the number of active HTTP sessions per application (per Web context, the context is part of the object name). 

An example invocation that checks for context path /demo would look like:
neo create-jmx-check -a myaccount -b demo -u s1234567 -n "ActiveSessions" -O "Catalina:type=Manager,context=/demo,host=localhost" -A activeSessions --host neo.ondemand.com
Currently HANA Cloud supports neither a custom maintenance page nor the disable command, but we are working on this.

Wednesday, July 03, 2013

SAP HANA Cloud: Multiple connections deployment

Recently we found out that some networks use shaping for connections to SAP HANA Cloud. Shaping is not really a surprise but what astonished us was that the speed was limited to 700 KiB per second, making deployment of large archives a problem.

For example, we had a case where a 140 MiB archive took 1 hour and 30 minutes to upload. This brought back the times when I spent 5 hours downloading Apple //e disks from a BBS via a 300 kbps modem.

To solve the issue we came up with the idea of using multiple connections to work around the limit. This required changes in the client (NEO CLI in SAP HANA Cloud SDK) and the server.

Once we had the implementation completed we had the following data from our tests:

Slow network


The approach we used reduced the deploy time from ~30 minutes to ~3 minutes. As we can see, the network in Vancouver can handle up to 8 connections, and increasing the number of connections beyond that does not make sense since the upload time increases.

Average network

In Palo Alto we managed to improve the deployment time from ~7 minutes to ~1 minute. This network allows for a great number of connections and the maximum transfer rates were reached with 30 connections.

Fast network


The network in Bulgaria allows for up to 3 connections. Even in this network we can see that the transfer rate is increased by increasing the number of connections.

Possible problems

Some networks will terminate the connections if a limit is reached, or just hold the transfer until the number of connections drops below some threshold. Currently this will break the deployment.

When / How can I try this?

We will use 2 connections by default, but you will be able to use the --connections parameter when deploying to:
  • revert to the old behaviour by specifying 1 connection
  • either stick with the default or increase to the maximum allowed 6 connections
Please keep in mind that we will revert to one connection if your deploy archive is under 5 MiB.
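The rules above can be written down as a small Python function. This is a sketch of the rules as described in this post, not the actual NEO CLI code; the function name effective_connections is made up, and rejecting values outside the range 1 to 6 with an error is my assumption, since the real client may handle them differently.

```python
MIB = 1024 * 1024
DEFAULT_CONNECTIONS = 2          # used when --connections is not given
MAX_CONNECTIONS = 6              # maximum allowed value
SMALL_ARCHIVE_LIMIT = 5 * MIB    # below this size a single connection is used


def effective_connections(archive_size_bytes, requested=None):
    """Return the number of connections a deployment would use."""
    if requested is None:
        requested = DEFAULT_CONNECTIONS
    if not 1 <= requested <= MAX_CONNECTIONS:
        # Assumption: out-of-range values are rejected with an error.
        raise ValueError("connections must be between 1 and %d" % MAX_CONNECTIONS)
    if archive_size_bytes < SMALL_ARCHIVE_LIMIT:
        return 1                 # small archives revert to one connection
    return requested
```

For a 140 MiB archive this gives 2 connections by default and 6 when you ask for the maximum, while a 4 MiB archive always ends up with a single connection.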

We expect this new feature to appear with the next update of SAP HANA Cloud SDK. To check if it is there just try the --connections parameter :)

Monday, July 01, 2013

SAP HANA Cloud: Automating your deployments

Why command line?

Because you can automate the cloud operations using shell scripts or continuous integration servers (Jenkins/Hudson for instance).

Additionally, the command line interface (CLI) allows much faster development cycles and lets us gather customer feedback sooner than a visual tool would. This is partly due to the lack of the complicated UI design required to get a "native" IDE look and feel.

Application context root

You may have noticed that we used "ROOT.war" as archive name in my first blog. The reason for this was that I wanted to use the "/" context root and access the application without context path.

The name of the application determines the context path as mentioned in the official documentation.

Scaling your application

In SAP HANA Cloud you can scale your application:
  • vertically - adding more resources to a single application node
  • horizontally - adding more application nodes

To scale your application vertically you just need to specify the compute unit you'd like to use. HANA Cloud offers several compute unit sizes that increase in both CPU and memory.

Horizontal scaling is possible as well by deploying with two additional parameters:
  • minimum number of nodes that can handle the usual load of requests to your application
  • maximum number of nodes that you can afford to pay for :)

Horizontal scaling can be done, for instance, with the following deploy parameters:
neo deploy mytemplate.properties --minimum-processes 2 --maximum-processes 4
The above command line means that I want to have at least 2 application processes and my application can handle unlimited load with 4 processes (or most probably that I can afford to pay for just 4 compute units).

Please note that the trial landscape supports neither vertical nor horizontal scaling, as mentioned in the account types description.

Start / Stop / Restart

The start command actually provides you with an application process from the deployed binaries. You can repeat the start as many times as the number of your compute units. In other words - if you have bought 4 units you cannot start 5 application processes.

The default behaviour is that the start is asynchronous. This allows you to quickly start as many processes as you want without waiting for them to finish.

However if you want to be sure that the processes are started you can:
  • poll for their status (status command)
  • use the --synchronous flag
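If you choose polling, the loop could look like the Python sketch below. It assumes a hypothetical get_statuses callable that returns the current status strings of all application processes, for instance parsed from the output of the status command. The name, the timeout and the polling interval are illustrative choices and not something the CLI prescribes.

```python
import time


def wait_until_started(get_statuses, timeout_seconds=300.0, poll_interval=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until every process reports STARTED, or give up after the timeout.

    get_statuses is a hypothetical callable returning a list of status
    strings such as "STARTED" or "PENDING".
    Returns True when all processes are started, False on timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        statuses = get_statuses()
        if statuses and all(status == "STARTED" for status in statuses):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

The sleep and clock parameters exist only so the loop can be tried out without real waiting. In a script you would normally leave them at their defaults.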

To stop the whole application use the stop command. At present you cannot stop a single application process but this is in the pipeline.

The restart command is a convenience shortcut and actually does the same as stop with the --synchronous flag followed by a start. The start can be synchronous or not (depending on the parameters you used).

Managing multiple applications

You probably already know the properties file that you can use with all NEO CLI commands. As you may have noticed, this properties file serves two purposes:
  • a placeholder for your settings
  • automation helper

What you probably missed is that the file can be used as well to manage:
  • multiple applications
  • multiple command parameters

Let's assume that your file contains:
application = test
If you use neo start with this file you'll start one application process for the test application. However what if you used:
neo start myfile.properties --application demo
The result from the above command is that the demo application would be started. To allow this the CLI parameters take precedence over the properties file values.
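The precedence rule is easy to mimic in Python if you want to reason about which value a command will end up with. The sketch below is not how the NEO CLI is implemented; it only assumes a simple "key = value" file format like the one shown above, and both helper names are made up.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines; blank lines and # comments are skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def effective_parameters(properties_text, cli_overrides):
    """Merge file values with command-line values; the command line wins."""
    merged = parse_properties(properties_text)
    merged.update(cli_overrides)
    return merged
```

With the file content "application = test" and the override {"application": "demo"} the merged result names the demo application, while any other values from the file are kept untouched.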

The properties file can also hold parameters that are not needed by the currently executed command, but are meant to be used for subsequent commands. 

In this way we can use the same file for most of our needs and amend the parameters on the command line only in some rare cases.

Undeploy

The undeploy command does what its name says (I hope) - it simply deletes the binaries you uploaded with the deploy command.

To successfully execute the command, however, you first have to stop your application. This is a way to protect not only ourselves from requests such as "I deleted my application and now I cannot start it anymore", but also our customers from making such stupid mistakes.

Passwords

The password required for all of the above commands:
  • cannot be stored in the template properties file
  • can be passed as CLI parameter
We currently accept the password as plain text only, so we don't want to force users to store it in a properties file. That would be the shortest road to accidentally emailing the password to someone.

At the same time, to enable automation, we accept the --password parameter. Since this is a CLI parameter, it has an important implication - the password requires some pre-processing if it contains special characters, depending on the shell or command interpreter used.

For example:
  • the ! character has to be escaped as ^! in Windows
  • if you have space in the password (pass word) you have to quote it ("pass word")

We redirect users to "your console/shell user guide on how to use special characters as command line arguments", but with Windows this seems to be a nightmare since there are almost no official resources other than those for the retired Windows XP.
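If your automation is written in Python rather than in a shell script, one way around the escaping problem is to avoid the shell altogether: passing the arguments as a list to subprocess.run() hands the password to the process unchanged. The sketch below shows this, together with shlex.quote from the standard library for the case where you do need a line for a POSIX shell. The helper names are made up, and the quoting shown does not apply to the Windows command interpreter.

```python
import shlex


def deploy_arguments(properties_file, password):
    """Argument list for subprocess.run(); no shell is involved, so no escaping."""
    return ["neo", "deploy", properties_file, "--password", password]


def posix_command_line(arguments):
    """Render the arguments as one line that is safe to paste into a POSIX shell."""
    return " ".join(shlex.quote(argument) for argument in arguments)
```

For the password "pass word!" the rendered line wraps the password in single quotes, which keeps both the space and the exclamation mark intact in POSIX shells.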

Proxy

The proxy settings can also be a pain so my advice here is to use the console to set the proxy. Setting proxy globally for the system, using a fancy UI, almost certainly means that the console is being left out of scope by the developers of this UI :)

You can check our minimal effort to explain how to use a proxy with NEO CLI. Keep in mind that continuous integration servers, VCS tools or other executables may also require the proxy to be set for the user you are running them with.
