Openfire becomes unresponsive

Hi,

I am building a messaging system for my application using Openfire. Recently we started running load tests on the application and observed that Openfire became unresponsive after handling around 2000 messages. On analysing the VM profiles afterwards, we saw these issues:

  1. Openfire shows high usage of the old-generation heap space (~90%).
  2. Perm space grows to ~99%, and we saw a lot of BOSH (7071) connections in the CLOSE_WAIT state. (We reran the tests after bumping the max perm space to 64M; after that, perm usage stays under 60% and the CLOSE_WAIT connections also go away, but the other issues remained.)
  3. Failures while sending messages, often exhibited as SASL authentication exceptions and server timeouts, which are preceded by stray null pointer exceptions (from the PEP service and the SASLAuthentication classes).
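
For anyone trying to reproduce figures like the ones above, per-pool memory usage can be read with the standard java.lang.management API. The following is only a minimal sketch: the class and method names are hypothetical, and it reports the pools of the JVM it runs in, so it would have to run inside the Openfire process (or be adapted to a remote JMX connection) to say anything about Openfire itself. Pool names differ between JVM versions and garbage collectors.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; not part of Openfire.
final class PoolUsageSketch {

    /** Returns used/max as a percentage for every memory pool that defines a maximum. */
    static Map<String, Double> poolUsagePercent() {
        Map<String, Double> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            // getUsage() may be null for an invalid pool; getMax() is -1 when undefined.
            if (usage == null || usage.getMax() <= 0) {
                continue;
            }
            result.put(pool.getName(), 100.0 * usage.getUsed() / usage.getMax());
        }
        return result;
    }

    public static void main(String[] args) {
        poolUsagePercent().forEach(
                (name, pct) -> System.out.printf("%-32s %5.1f%%%n", name, pct));
    }
}
```

A pool that stays near its maximum even after full collections, as the old generation does here, usually points to objects that are still reachable rather than to an undersized heap.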

In this scenario we have one message generator and three message receivers, with as many nodes as receivers. The receivers listen on a BOSH tunnel; each one logs out of the tunnel after receiving around 10 events and then logs in again. The message sender uses the Smack library for connection, node creation, subscription and sending messages. The Openfire version is 3.7.1 for Linux.

On restarting Openfire, things look good again, but it becomes unresponsive once more after handling around 2000 messages.

We have hit a roadblock with this. Any help on how to analyze it further or fix these issues would be greatly appreciated.

Thanks and Regards,

Praveen

Openfire up to and including version 3.6.4 (and, it looks like, 3.7.0 too) suffers from a memory leak in its PEP component. If your Openfire server is crashing with OutOfMemoryError, you might be having this problem.

As a workaround, you can disable PEP by setting the Openfire property xmpp.pep.enabled to false.

More information can be found in this discussion: Openfire 3.6.4 memory leak with Empathy

Does this help?

Hi LG,

Thanks a lot for your fast response.

I tried disabling the PEP service, but after doing that pubsub doesn’t work. I also installed the latest 3.7.1 and reran my load tests; they failed again with the same issues.

Thanks and Regards,

Praveen

If you are able to, try making your own build from trunk.

Strange that pubsub didn’t work though; that property only affects PEP, not the whole pubsub module.

I have found the issue. The leak is created when the PEPService is removed from the DefaultCache: the PublishedItemTask is never cancelled, and the build-up of tasks causes the leak. In case it helps, I am attaching the patch and the binary.
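
To illustrate the pattern (this is not the Openfire code or the attached patch), here is a minimal, self-contained sketch. All names in it are hypothetical, and java.util.Timer is assumed as a stand-in for whatever scheduler the server uses. It shows how a periodic task that is never cancelled keeps its owning object reachable after the object is removed from a cache, and how cancelling the task on eviction releases it.

```java
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the Openfire code base.
final class TaskLeakSketch {

    /** Stand-in for a cached per-user service that owns a periodic task. */
    static final class Service {
        final String owner;
        int flushes = 0;
        final TimerTask task = new TimerTask() {
            @Override
            public void run() {
                flushes++; // touches the outer Service, so the task keeps it reachable
            }
        };

        Service(String owner, Timer timer) {
            this.owner = owner;
            timer.schedule(task, 60_000L, 60_000L); // the timer queue now references the task
        }
    }

    final Timer timer = new Timer("flush-timer", true); // daemon thread
    final Map<String, Service> cache = new ConcurrentHashMap<>();

    Service load(String owner) {
        return cache.computeIfAbsent(owner, o -> new Service(o, timer));
    }

    /** Leaky eviction: the entry is gone, but the timer still holds task and Service. */
    void evictLeaky(String owner) {
        cache.remove(owner);
    }

    /** Fixed eviction: cancel the task, then purge; returns how many tasks were purged. */
    int evict(String owner) {
        Service service = cache.remove(owner);
        if (service != null) {
            service.task.cancel();
        }
        return timer.purge();
    }

    public static void main(String[] args) {
        TaskLeakSketch sketch = new TaskLeakSketch();
        sketch.load("a@example.com");
        sketch.evictLeaky("a@example.com");
        System.out.println("purged after leaky evict: " + sketch.timer.purge());
        sketch.load("b@example.com");
        System.out.println("purged after fixed evict: " + sketch.evict("b@example.com"));
    }
}
```

With the leaky variant, every login/logout cycle that recreates the service leaves one more live task in the timer queue, which would match a heap that fills up after a roughly fixed number of messages.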

The patch has a fix for frequent connection disconnects as well.

Thanks and Regards,

Praveen

Thanks for your effort in tracking that down, but unfortunately your patch is written against some very old code. Pubsub (and to some degree PEP) has been heavily refactored and, as far as we know, the memory leak has been fixed along with it. The task you are referring to doesn’t actually exist any more.

You should try using trunk to confirm it fixes any issues you may have been having.

NOTE: In the future, please put your patches as attachments to a discussion (like here) instead of as a document. It allows for better discussion/feedback as it is not actually a document.