Sunday, October 7, 2012

wssh: Handy utility for WebSocket testing

WebSockets are a pretty darn cool evolution of the web, but with any new technology come some challenges in providing adequate tools for testing and exploration.  For HTTP, we have curl.  For plain sockets, we have netcat.  WebSockets now have wssh.  I stumbled upon this very useful tool recently while starting up my own advanced WebSocket project.

Unfortunately, there were some glaring bugs and annoyances, in particular in the area of automation and general netcat compliance.  So, I brushed off my Python skills and I'm happy to report that after the weekend was over, the tool had matured quite a lot and was ready for some test automation.

Hoping to move the project to some QA folks for automation, I decided to whip up a quick recipe that demonstrates how the tool can be used to verify both client and server behaviour with minimal headache:

A simple echo server test

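#!/bin/bash

# Abort on the first failing command, including a cmp mismatch.
set -e
(echo foo; echo bar; echo baz) > testinput
# Feed one line per second to wssh and capture what the server echoes back.
while read -r a; do echo "$a"; sleep 1; done < testinput | wssh -q 0 ws://echo.websocket.org/ > testoutput
# The echoed output must be byte-for-byte identical to the input.
cmp testinput testoutput
rm -f testinput testoutput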

This script tests ws://echo.websocket.org under "real-world" usage by simulating a 1-second delay between each line of input.  This concept can be extended much further with some basic bash-fu, creating some fairly sophisticated unit tests.

Monday, September 17, 2012

Significant keep-alive performance flaw in iOS' NSURLConnection

The problem

I've recently done a lot of research to understand a perplexing performance problem haunting the iOS port of a commercial component I maintain at work (originally developed for Android).  Specifically, the problem presents itself when the user navigates repeatedly to a specific screen in the application which issues an HTTP HEAD request (via HTTPS) to determine if a particular static resource has been modified.  Normally, you would think, a quick and light-weight operation.

However, the performance penalty we were seeing in this simple usage of NSURLConnection was a measurable delay of approximately 600ms, despite a round-trip time to our servers of only ~80ms over our local Wi-Fi network.  Now I'm sure at this point a well-informed reader is going to exclaim: but wait, this is HTTPS, there's a well-known and significant handshaking penalty versus HTTP!  Of course, this same penalty exists for our Android version of the application, yet this workflow on Android does not present any significant delay.

So, what's going on?

The short answer: the connection has been closed.  While NSURLConnection does support HTTP keep-alive, it has a client-imposed timeout of just 12 seconds even for HTTPS.  This may seem like a reasonable timeout when you look at the problem with 1995 goggles: loading a complete web page efficiently without requiring a separate connection for each resource on the page.  However, in today's world the model is much richer.  Web services dominate the landscape of mobile applications (even when the browser is the client), using the web for more discrete data exchange where appropriate and moving the rest client-side.

And this data exchange frequently happens over HTTPS.  Borrowing from Google's own SPDY proposal: the future of the web depends on a secure network connection.  SPDY requires SSL and yet is designed for efficiency.  A paradox only if you don't embrace long keep-alive timeouts.

Does it matter?

The practical implication of NSURLConnection's short timeouts for users is a noticeable but not immediately reproducible delay experienced within applications that utilize web services over HTTPS.  A common user scenario making for an easy repro case on a large number of mobile apps out there (go ahead, try yours) is the login screen.  Logging in implies a secure connection, and often users are expected to take some time to awkwardly type their cat's name: Mr!Cuddl3s.  I timed myself with this exercise: 13 seconds.  In this amount of time, any HTTPS connections already opened to your application's domain have been closed and so here comes the 600ms+ SSL connection penalty before the user sees a frustrating message: "Your username or password does not match".  Returning to the input field, you recheck your username, delete the password, and start again.  If you were fast enough, you're quickly ushered off to the main screen for the application.  If, however, you took more than 12 seconds again to retype things, you will see that penalty one more time.  This is likely to repeat many times in the user's interaction with your application: idling on various screens reading content, setting the phone down for just a moment, etc.
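The arithmetic behind this scenario can be sketched in a few lines of Java. This is only a rough model, not a measurement tool: the 12-second timeout and the ~600ms figure come from the observations above, while treating a request on a reused connection as costing about one ~80ms round trip is my own simplifying assumption. The class and method names are made up for illustration.

```java
// Rough model of the login scenario: a request pays the full connection
// set-up cost whenever the idle gap exceeds the client-side keep-alive timeout.
final class KeepAliveCost {
    static final long KEEP_ALIVE_TIMEOUT_MS = 12_000; // NSURLConnection idle timeout observed above
    static final long NEW_CONNECTION_MS = 600;        // observed delay on a fresh HTTPS connection
    static final long REUSED_CONNECTION_MS = 80;      // assumption: roughly one round trip

    /** Estimated latency of a request issued after the connection sat idle for idleMs. */
    static long latencyMs(long idleMs) {
        return idleMs > KEEP_ALIVE_TIMEOUT_MS ? NEW_CONNECTION_MS : REUSED_CONNECTION_MS;
    }

    public static void main(String[] args) {
        // Hypothetical idle gaps between requests: quick tap, slow password entry, and so on.
        long[] idleGapsMs = {5_000, 13_000, 8_000, 20_000};
        long total = 0;
        for (long idle : idleGapsMs) {
            long cost = latencyMs(idle);
            total += cost;
            System.out.println("idle " + idle + " ms -> request takes ~" + cost + " ms");
        }
        System.out.println("total waiting time: ~" + total + " ms");
    }
}
```

Raising KEEP_ALIVE_TIMEOUT_MS to a few minutes in this model makes every gap in the example cheap, which is the whole argument in miniature.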

TCP/SSL handshaking overhead shown in the Amazon Mobile app (Android left, iOS right)
The above image and linked YouTube video demonstrate the impact from the user's perspective of the 12-second connection timeout.  As you see, the first request is much slower on iOS because Android has re-used the previously established HTTPS connection set up through prior interaction with the app.  Subsequent requests are identical in performance until after the user idles for 12 seconds again, tipping the advantage back to Android once again.

But what about optimization best practices?

There is significant precedent on the web suggesting that this is not a reasonable default.  A variety of libraries that I sampled do not exhibit this behaviour (Apache HttpClient, Android's implementation of HttpURLConnection, and serf).  None had such a short timeout, or indeed any client-initiated shutdown at all.  The same conclusion holds true for HTTPS servers in production: a sampling of Google, Facebook, and Amazon servers suggests they are all willing to let HTTPS connections linger for minutes, not seconds.

Surely there is some reasoned thinking behind this behaviour, right?  Well, perhaps.  I can find little material publicly available except a brief thread on Apple's Mac Network Programming mailing list where an Apple engineer draws the conclusion that the closure is to support the radio's ability to enter an idle state.  While it is true that cellular radios have complex power saving state management (searching the web for material on Radio Resource Control [RRC] or Fast Dormancy has no shortage of white papers), I question the argument that the connection closure truly is in alignment with the details of the radio.

The first piece of evidence against this comes from the nature of RRC's lack of specificity or standard on timing windows.  Apple simply cannot know whether the network would ask the device to enter this idle state in 2 seconds, 10 seconds, or 15 seconds.  So a naive, hardcoded timeout of 12 seconds is seemingly just as likely to incur the worst possible performance as it is the best.

Furthermore, the hardcoded timeout is consistent between 3G and Wi-Fi connections, which of course have very different radio stacks and performance optimizations.  In the case of Wi-Fi, the radio's behaviour appears to have even changed significantly from iOS 4 to iOS 5, with no change to NSURLConnection's behaviour.  In my extremely informal testing on iOS 5, I found that the radio entered a low power state in just a few seconds, indicating that the radio must resume normal operation to handle the connection closure if the user switches off the screen just a few seconds after an HTTP request.  Hardly an edge case.

Admittedly, we're getting into white paper territory ourselves here to properly and convincingly prove that this timeout harms battery life (let alone carrier network performance!).  I'm not prepared to go through that much rigor just yet.  Instead, I'd like to simply make a call to Apple to reconsider this behaviour and re-evaluate their own internal research.  And, if you're reading Apple, please feel free to reach out to me if there is interest in a thorough and more academic study on this topic.  I'd be happy to help.

What next?

NSURLConnection is intentionally opaque, offering no features to customize the underlying mechanics, making it easy for Apple to adjust them without fear of breaking backward compatibility.  Unfortunately this means that there's no convenient way to work around the problem if your app uses NSURLConnection.  While it is possible to switch to a third party such as ASIHTTPRequest, I don't personally recommend this approach as this transition makes it more difficult for Apple to implement or enforce reasonable connection management policies.

The API is designed for Apple to implement best practices, so let's ask that they do exactly that.

Tuesday, April 13, 2010

Logcat, Improved.

Logcat is a staple for most Android developers out there and I'm certainly no exception. I often have at least one terminal dedicated to logcat with sometimes many more with various combinations of options to control how I'm filtering the output.

Recently it occurred to me that most of this work is designed to separate my program from the noise of the entire platform. The numeric argument printed after the tag in the logcat output is the pid responsible for that log line. I had used it in the past to do this sort of filtering, but it was a pain when the app crashed or was reinstalled because the pid would change.

Enter my proclogcat script. This script tracks the pid as the process is killed and restarted and takes care of automating the adb shell ps | grep <process> logic on first launch. The best part is the script can be combined with Jeffrey Sharkey's excellent coloredlogcat script (or my modified version of it) for beautiful results.



Download: proclogcat

To use, simply copy it somewhere in your PATH and invoke either manually as adb logcat | proclogcat <process> or in a function as is discussed in the script source code.
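To make the pid filtering idea concrete, here is a minimal Java sketch. It is not the proclogcat script itself and does none of its pid tracking across restarts; it only shows how the pid can be pulled out of a line and used as a filter. It assumes logcat's "brief" output format (priority, tag, then the pid in parentheses), and the class name, method names, and sample lines are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PidFilter {
    // Assumed "brief" logcat format, e.g. "D/MyTag   ( 1234): message".
    private static final Pattern BRIEF = Pattern.compile("^[VDIWEF]/.*?\\(\\s*(\\d+)\\): ");

    /** Returns the pid of a brief-format logcat line, or -1 if the line does not parse. */
    static int pidOf(String line) {
        Matcher m = BRIEF.matcher(line);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Keeps only the lines logged by the given pid. */
    static List<String> filter(List<String> lines, int pid) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (pidOf(line) == pid) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up sample lines in the assumed format.
        List<String> sample = List.of(
            "D/MyApp   ( 1234): onCreate",
            "I/ActivityManager(   59): Displayed activity",
            "W/MyApp   ( 1234): slow frame");
        filter(sample, 1234).forEach(System.out::println);
    }
}
```

The weakness described above is visible here: the pid is a fixed argument, so the filter goes silent as soon as the process restarts under a new pid.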

Tuesday, December 15, 2009

Gracefully supporting multiple Android platform versions in a single release

The Android platform has been aggressively updating since version 1.0 and now we're starting to see a much more interesting mix of device types, manufacturers, and even platform versions out in the wild. Unfortunately, this can be frustrating for developers who want to adopt new features and conveniences while still supporting devices that are on longer update cycles (like the G1).

The pattern shown here deals with multiple platform versions, although it can easily be applied in other situations. First of all, let's start with a preface about minSdkVersion, targetSdkVersion, and the Eclipse target platform. The *SdkVersion attributes are defined in the manifest <uses-sdk> tag and define the minimum platform version your app can be installed onto (and tested on!), and the highest version that you tested to and were aware of during development. It is important that you test your application on all versions between and including min and target. The Eclipse target platform is the specific version that Eclipse will be compiling against; this is what permits us to compile code that links against the newer platform features. This is usually set the same as your targetSdkVersion.

Now let's consider a practical example of a music player application which needs to put a service in the foreground state during playback. Prior to API level 5, this was done with the Service.setForeground call, but level 5 and beyond deprecated this method due to widespread abuse. Instead, a new method was introduced (Service.startForeground) which can be used to achieve this effect as well as setting an ongoing notification in the status bar. In many ways this is handy, as the notification and foreground state were naturally already tied together and now there's an API combining them. But problems start when you try to test new code using this method on platform versions below 2.0 (API level 5). Specifically, Dalvik will throw a VerifyError when attempting to initialize the class containing the call to startForeground for the first time, even if the call is in a conditional statement. This method does not exist on pre-2.0 devices, and so cannot be included in your code in this way.

A naive approach would be to simply use reflection to test for and execute startForeground, but thankfully Java offers a much more elegant design pattern for just this sort of thing. The basic idea is to create an abstract API that the rest of your application can access which hides the specific implementation of what's being performed, and does so in such a way that prevents the VM from initializing an unsupported class on an older platform. So you might try defining something like this:


public abstract class PlayerNotification {
    public static PlayerNotification getInstance() {
        if (Integer.parseInt(Build.VERSION.SDK) <= 4)
            return PreEclair.Holder.sInstance;
        else
            return EclairAndBeyond.Holder.sInstance;
    }

    public abstract void showNotification(Service context, int id, Notification notification);
    public abstract void hideNotification(Service context, int id);

    private static class PreEclair extends PlayerNotification {
        ...
    }

    private static class EclairAndBeyond extends PlayerNotification {
        ...
    }
}


Now your service could be modified to make use of this new abstract API as such:


public class MyService extends Service {
    private static final int NOTIF_PLAYING = 1;
    private final PlayerNotification mNotification =
            PlayerNotification.getInstance();

    ...

    public void setForegroundAndShowNotification(Notification n) {
        mNotification.showNotification(this, NOTIF_PLAYING, n);
    }

    public void stopForegroundAndHideNotification() {
        mNotification.hideNotification(this, NOTIF_PLAYING);
    }
}


Great, this sounds very simple and easy to follow. Let's return to the full implementation of PlayerNotification:


private static class PreEclair extends PlayerNotification {
    private static class Holder {
        private static final PreEclair sInstance = new PreEclair();
    }
    private NotificationManager getNotificationManager(Context context) {
        return (NotificationManager)context.getSystemService(Context.NOTIFICATION_SERVICE);
    }
    public void showNotification(Service context, int id, Notification n) {
        context.setForeground(true);
        getNotificationManager(context).notify(id, n);
    }
    public void hideNotification(Service context, int id) {
        context.setForeground(false);
        getNotificationManager(context).cancel(id);
    }
}

private static class EclairAndBeyond extends PlayerNotification {
    private static class Holder {
        private static final EclairAndBeyond sInstance = new EclairAndBeyond();
    }
    public void showNotification(Service context, int id, Notification n) {
        context.startForeground(id, n);
    }
    public void hideNotification(Service context, int id) {
        // stopForeground takes a boolean; true also removes the notification.
        context.stopForeground(true);
    }
}


And that's it as far as code goes! Assuming that you have already updated your AndroidManifest.xml to include the appropriate <uses-sdk> attributes, you're ready to start testing. Use the Android SDK tools to create AVDs for each of the major platform releases from your minimum supported version to your current target and deploy your app on each to make sure you have not made any mistakes.

For further reading about how Java guarantees this approach, read about the initialization on demand holder idiom. This is what allows us to prevent the wrong implementing class from initializing in the VM (and thus causing verification errors).
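The lazy initialization that the holder idiom relies on can be seen in plain Java, without any Android classes. The sketch below does not reproduce a VerifyError; it only records which nested Holder classes the VM actually initialized, to show that the unused implementation's holder is never touched. Greeter, Legacy, and Modern are hypothetical names standing in for PlayerNotification and its two implementations.

```java
import java.util.ArrayList;
import java.util.List;

abstract class Greeter {
    // Records which Holder classes the VM has initialized so far.
    static final List<String> INITIALIZED = new ArrayList<>();

    static Greeter getInstance(boolean modern) {
        // Reading Holder.INSTANCE triggers initialization of that Holder only.
        return modern ? Modern.Holder.INSTANCE : Legacy.Holder.INSTANCE;
    }

    abstract String greet();

    private static class Legacy extends Greeter {
        private static class Holder {
            static { INITIALIZED.add("Legacy"); }
            static final Legacy INSTANCE = new Legacy();
        }
        String greet() { return "legacy path"; }
    }

    private static class Modern extends Greeter {
        private static class Holder {
            static { INITIALIZED.add("Modern"); }
            static final Modern INSTANCE = new Modern();
        }
        String greet() { return "modern path"; }
    }

    public static void main(String[] args) {
        System.out.println(getInstance(false).greet());
        // Only the holder for the branch that was taken shows up here.
        System.out.println("initialized holders: " + INITIALIZED);
    }
}
```

Because each instance lives in its own holder class, choosing one branch in getInstance never initializes the other implementation's holder.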

You can find two working examples of this pattern in my Five app: one using reflection and one matching the example explained here.
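For comparison, the reflection approach might look roughly like the following in plain Java. This is not the code from the Five app: OldService and NewService are hypothetical stand-ins with simplified signatures rather than the Android Service API, and they exist only so the lookup-then-fallback logic can run outside Android.

```java
import java.lang.reflect.Method;

final class ReflectionFallback {
    /** Stand-in for an older platform class that only has the legacy call. */
    public static class OldService {
        public String setForeground(boolean on) { return "setForeground(" + on + ")"; }
    }

    /** Stand-in for a newer platform class that also has the replacement call. */
    public static class NewService extends OldService {
        public String startForeground(int id) { return "startForeground(" + id + ")"; }
    }

    /** Uses startForeground when the runtime class has it, otherwise falls back. */
    static String enterForeground(OldService service, int id) {
        try {
            Method m = service.getClass().getMethod("startForeground", int.class);
            return (String) m.invoke(service, id);
        } catch (NoSuchMethodException e) {
            // The method is missing on this "platform": use the legacy call.
            return service.setForeground(true);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("startForeground exists but could not be invoked", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(enterForeground(new NewService(), 1));
        System.out.println(enterForeground(new OldService(), 1));
    }
}
```

It works, but every call site pays for a string-based lookup and extra exception handling, which is why the holder-based abstract class reads more cleanly.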

Tuesday, March 17, 2009

Building, running, and debugging Android source

There is a lot of confusion surrounding the work flow in the Android source tree, so allow me to simplify:
  1. Follow the initial instructions for downloading the source at:

    http://source.android.com/download

  2. Set up your environment to build the engineering build for the generic device and generic product. This is similar to the SDK, but with a few pieces missing.

    $ source build/envsetup.sh
    $ lunch 1

  3. To build for the first time:

    $ make

    If you have a multi-core system, you can build with make -jN where N is twice the number of cores on your machine. This should speed up the first build considerably.

  4. To launch the emulator from your build:

    $ ./out/host/<your-machine-type>/bin/emulator

    On my system <your-machine-type> is linux-x86.

    NOTE: The emulator knows where to find system and data images as a result of running lunch 1 above. This sets the environment variable ANDROID_PRODUCT_OUT to point to the target directory. For this example, it should be out/target/product/generic/.

  5. If you wish to make changes to the source code, there are handy utilities that have been exposed to your environment by source build/envsetup.sh above. For example, if you modify the Email app and just want to rebuild it:

    $ mmm packages/apps/Email

  6. To see your changes in the emulator you can run:

    $ adb remount
    $ adb sync


    Which will copy the regenerated Email.apk file into the emulator's /system/app folder, triggering the PackageManager to automatically reinstall it.

  7. Or if you change framework resources in frameworks/base/core/res/res/ you could regenerate framework-res.apk with:

    $ mmm frameworks/base/core/res

    Or if you modified even the framework itself you could run:

    $ mmm frameworks/base

    To sync these changes you must restart the running framework and sync, as with this handy sequence:

    $ adb remount
    $ adb shell stop
    $ adb sync
    $ adb shell start

  8. Finally, to debug your changes you can use the DDMS tool to select a process for debug and then attach Eclipse to it. If you have the Eclipse Android Development plugin installed, there is a special DDMS perspective which you can use to choose the process for debug. To attach Eclipse to it, see these instructions:

    http://source.android.com/using-eclipse

    This document also describes how to use Eclipse for development. Any IDE should work with the proper finagling though. Just note that the IDE won't really be an integrated environment: the final output of APKs, system.img, and even the generation of R.java files will have to be done by make!

    A note about the processes in Android:

    • system_process houses all things under frameworks/base/services. This includes the PackageManagerService, StatusBarService, etc. It has many, many threads (one for each service, and then one main UI thread), so be wary when debugging.
    • android.process.acore hosts Launcher (home), Contacts, etc. You can determine the apps/providers that run here by looking for android:process="android.process.acore" in the various AndroidManifest.xml files in packages/.

    Also remember that the "framework" (under frameworks/base/core/java) is not hosted by any one process. It is a library used by most processes, so to debug code there you can usually use a simple demo app that takes advantage of whatever you changed and debug that app's process. A useful trick for setting up your debug connection is to call Debug.waitForDebugger() during some startup part of an application or system service.

UPDATE 2009-07-24: The original ONE_SHOT_MAKEFILE line I gave for rebuilding the framework has been deprecated. mmm frameworks/base is now the recommended way to rebuild the framework code.

Wednesday, January 7, 2009

Push services: Implementing persistent mobile TCP connections

As a result of my work on IMAP IDLE support in Android's default mail application, I have been experimenting with various strategies for implementing long-lived services and persistent connections that operate efficiently in a variety of circumstances. Several quirks about Android and mobile devices in general arose that could be of value to anyone implementing similar services.

For most protocols, you will need to implement some type of client-initiated keep-alive at the application layer. For my purposes with IMAP I simply complied with the RFC and elected to leave IDLE mode and then re-enter it after 28 minutes of inactivity. On Android, you must use the AlarmManager service to wake the CPU for this task. You might be tempted to use a Handler for timing or even a simple thread with a looped sleep(); however, it should be noted that unless your application otherwise holds a WakeLock, you cannot rely on any timing mechanism other than the AlarmManager. Once the screen goes blank, the CPU may sleep, and once it does, other timing mechanisms will block until the CPU wakes up again, regardless of any timeout parameters you supply.

After running my test for several days I noticed Android was mysteriously killing processes, claiming that the services implemented in them have "died", then restarting them just a few minutes later. No call to the service's onDestroy method will occur, and even on service restart you will only see a call to onCreate and not onStart. In order to compensate for this you are expected to store your state persistently, check for a discrepancy during onCreate, and then invoke startService yourself if necessary. The SharedPreferences system can be handy for this.
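The reconcile-on-create logic can be modelled in plain Java. This is a sketch of the idea only, not the TestKeepAlive code: a Map stands in for SharedPreferences, start() stands in for calling startService on yourself, and the key name and class name are made up.

```java
import java.util.HashMap;
import java.util.Map;

final class KeepAliveState {
    static final String KEY_STARTED = "isStarted"; // hypothetical preference key

    private final Map<String, Boolean> prefs; // stand-in for SharedPreferences
    private boolean running;                  // in-memory state, lost when the process is killed

    KeepAliveState(Map<String, Boolean> prefs) {
        this.prefs = prefs;
    }

    /** Marks the service as started, both in memory and persistently. */
    void start() {
        running = true;
        prefs.put(KEY_STARTED, true);
    }

    /** Marks the service as deliberately stopped. */
    void stop() {
        running = false;
        prefs.put(KEY_STARTED, false);
    }

    /**
     * Intended to be called from onCreate. Returns true if the persisted state
     * says the service should be running but this fresh instance is not, in
     * which case it restarts itself.
     */
    boolean reconcileOnCreate() {
        boolean wanted = prefs.getOrDefault(KEY_STARTED, false);
        if (wanted && !running) {
            start();
            return true;
        }
        return false;
    }

    boolean isRunning() {
        return running;
    }

    public static void main(String[] args) {
        Map<String, Boolean> prefs = new HashMap<>();
        new KeepAliveState(prefs).start();                    // service started normally
        KeepAliveState recreated = new KeepAliveState(prefs); // process killed, service re-created
        System.out.println("restarted on create: " + recreated.reconcileOnCreate());
    }
}
```

The persisted flag is the only thing that survives the kill, so it has to be written on every deliberate start and stop for the check in onCreate to be trustworthy.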

Source code for a functional demonstration on this topic can be found at my android-random project page, under the module TestKeepAlive.

Thursday, October 23, 2008

Working on IMAP IDLE support for Android's Email app

For all those screaming about the lack of push e-mail updates for regular IMAP accounts with the basic Email app in Android, I have been digging into the code for the past few days and am preparing a patch for IMAP IDLE support. For those that do not know, IMAP IDLE allows a client to maintain a persistent, light connection to the server that is then notified of new messages as they arrive. Many IMAP servers support this extension, including Microsoft Exchange which would thus allow push e-mail for those that need corporate e-mail options on the G1.

I will post an APK for folks to try once I have the support working well. Stay tuned...