Java and openVZ 2.6.32 - Futex issue

1. The context

We are currently in the process of upgrading all our servers.

Thanks to our `HAVEN High Availability architecture`:http://www.personalized-software.ie/Services#Hosting we can migrate virtual servers from one host to its slave with virtually no downtime. So once the new servers were installed, we started testing the migration of containers.

Since HAVEN is based on openVZ, we were using kernels from Proxmox, as they backport all fixes from RedHat 6. The only difference between the hosts was that the new servers had to use the 2.6.32 kernel, whereas the old ones were deployed with 2.6.24.

2. The issue

During the migration testing we noticed that one of our Java applications (BigBlueButton) was hanging. It looked as if it was running fine, but it never opened its ports and only a kill -9 would stop it.

Some quick investigation with strace showed that one of BigBlueButton's requirements (ActiveMQ) was waiting on one of its children, which was stuck in an infinite loop:
  • strace of the parent process
    strace -p PID
    Process PID attached - interrupt to quit
    futex(0xb77dfbd8, FUTEX_WAIT, FIRST_CHILD_PID, NULL^C <unfinished ...>
    Process PID detached
    
  • Get all child processes
    ps -efL | grep PID
    
  • Running strace on each child in turn, I found one that was looping infinitely:
    strace -p CHILD_PID
    futex(0x998e028, FUTEX_WAKE_PRIVATE, 1) = 0
    gettimeofday({1338478277, 139155}, NULL) = 0
    clock_gettime(CLOCK_REALTIME, {1338478277, 139659879}) = 0
    futex(0x998e044, FUTEX_WAIT_PRIVATE, 1, {0, 999495121}) = -1 ETIMEDOUT (Connection timed out)
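
Going through the children by hand gets tedious when a JVM has dozens of threads. The steps above can be roughly sketched as a small script, assuming a Linux /proc filesystem and that strace and timeout are installed. The helper names list_tids and trace_threads are made up for this illustration.

```shell
#!/bin/sh
# Sketch only: list_tids and trace_threads are hypothetical helper names.

# Print the thread IDs (LWPs) of a process, one per line, from /proc/<pid>/task.
list_tids() {
    for dir in /proc/"$1"/task/*; do
        if [ -d "$dir" ]; then
            basename "$dir"
        fi
    done
}

# Attach strace to each thread for two seconds and show its first few calls.
# Needs strace, timeout (coreutils) and permission to trace the process.
trace_threads() {
    for tid in $(list_tids "$1"); do
        echo "== thread $tid =="
        timeout 2 strace -p "$tid" 2>&1 | head -n 5
    done
}
```

Calling trace_threads PID should make a looping thread stand out, since it prints a full screen of futex calls while the waiting ones show a single unfinished call.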
    

A quick search on the openVZ bugzilla showed that it was indeed a `known issue`:http://bugzilla.openvz.org/show_bug.cgi?id=2206 affecting Java applications on openVZ 2.6.32 kernels, but there was neither a bugfix nor a workaround for the problem. Comments from other bug-posters were also less than encouraging.

Having received no answer from the openVZ devs, I decided to investigate the issue further.

3. Investigation and workaround

I first tried to reproduce the problem on a local virtual machine by installing Proxmox 1.9 and creating a container with BigBlueButton in it.

At first I couldn't reproduce the issue locally, even though I could reproduce it on a test container on a remote server, where any threaded Java program would fail.

Because it is a futex-related problem, I wondered whether it was due to SMP, so I added another processor to my virtual machine, and this time I reproduced it.

All our containers in production have the CPUS= parameter set, but for some reason containers running on 2.6.24 were still seeing all the host's CPUs even if only 1 was in the configuration file. This seems to have been corrected in 2.6.32, and this is probably the reason why Java is now failing.
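
A simple way to see this difference is to count the processors a container is shown. The following is a minimal sketch, assuming /proc/cpuinfo lists one "processor" line per visible CPU (as it does on x86 Linux). The helper name visible_cpus is made up for this illustration.

```shell
#!/bin/sh
# Sketch only: visible_cpus is a hypothetical helper name.

# Count the processors listed in /proc/cpuinfo for the current environment.
visible_cpus() {
    grep -c '^processor' /proc/cpuinfo
}
```

Running visible_cpus inside a container and comparing the result with the CPUS= value in its configuration file shows whether the limit is actually reflected inside the guest.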

Java already suffers from very annoying memory issues when running inside containers, which oblige us to run everything with the -Xmx, -Xms, -XX:MaxPermSize etc. parameters, and it seems that even if the container has only 1 CPU, it tries to use more.

Java does not provide any CPU affinity options, as the process scheduler is part of the OS. Fortunately openVZ has a very handy setting called CPUMASK that allows you to force a container to run on only one specific CPU.

After running vzctl set XXX --cpumask 0 --save on my test environment, the issue disappeared!

A quick test showed that it also works for containers that require multiple CPUs, like this:

vzctl set XXX --cpus 2 --cpumask 0 --save

Assigning more than 1 CPU to a container also works around the problem.

I cannot guarantee that it will work for you, but we have had no more problems since implementing either of these tricks. We can now continue our migration testing.
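
With many containers to fix, the cpumask command can be wrapped in a small helper. This is only a sketch: the names pin_container and DRY_RUN and the container IDs 101 to 103 are made up for this illustration, and by default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: pin_container and DRY_RUN are hypothetical names.

# Build the vzctl command from this post for one container ID.
# With DRY_RUN=1 (the default here) the command is only printed, not run.
pin_container() {
    cmd="vzctl set $1 --cpumask 0 --save"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$cmd"
    else
        $cmd
    fi
}

# Example: show what would be run for a few made-up container IDs.
for ctid in 101 102 103; do
    pin_container "$ctid"
done
```

Setting DRY_RUN=0 on the host would actually apply the change, so it is worth checking the printed commands before doing so.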

4. Summary and Conclusion

The issue apparently affects only:
  • hosts with several cores/CPUs running any version of the 2.6.32-openvz kernel (tested with debian squeeze, proxmox 1.9, proxmox 2.0 and vanilla patched kernel).
  • (debian) guests with only one CPU
Solution/workaround:
  • Assign more than 1 CPU to the guest
  • Give CPU affinity (--cpumask) to the guest

This was quite tricky to debug, so I hope this might help other people stuck with the same problem. Unfortunately, once you know what the solution is, you always find that other people `found the same`:http://forum.openvz.org/index.php?t=msg&th=10025&goto=43571&#msg_43571 before you. In any case it cannot hurt to have more documentation about this :o)