Openfire 3.6.4 memory leak with Empathy

Hey mikeycmccarthy,

Any news on the test? Pulling the network cable should be detected, even without the patch. Doesn’t hurt to give it a try, though.

There doesn’t appear to be a tag for 3.6.4, but there’s a branch: http://www.igniterealtime.org/fisheye/browse/svn-org/openfire/branches/openfire_3_6_4

I guess I was not clear:

- We have a problem when we connect an Empathy client to our production server (60 users, mainly Pidgin + Spark). Once we connect/disconnect an Empathy client, the server becomes unusable. It’s not clear yet whether it’s a memory leak or something else. The processor goes to 100% and the machine is unusable.

- I have installed a new server on a VM to test the patch, but I can’t reproduce the problem there. It works with the fresh install and an Empathy client, so I don’t know whether this patch solves my problem. The only way to test is on the production server, but I won’t do that during the day, to avoid disturbing the users.

Hi Francois,

At first glance, your problem doesn’t look like a memory leak. Memory leaks usually take some time to develop into a real issue - they don’t tend to pop up immediately after one client connects or disconnects.

Perhaps you could install the monitoring plugin described in the blog post New Openfire monitoring plugin. This will give you a bit more detail on the overall health of your environment.

Are there any entries in the logs around the time that the client causing the instability connects?

I didn’t check the logs. I will do more tests when I have some time, and I will use the monitoring plugin. Thanks a lot for your help!

Hi Guus,

We started our load test at 2pm yesterday using the patched version of Openfire. I was going to try the disconnect when I got in this morning, but Openfire died at about 2am, 12 hours after the test started.

The test gradually ramps up to 4,000 people chatting on the server, across 5 rooms, with a throughput of about 3.6 chats per second. Heap memory rises slowly until it’s just over 1 GB at 1:45, then it suddenly spikes and Openfire dies.
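
If it helps with comparing notes, here is a sketch of the GC logging flags we could add to Openfire’s java command for the next run (standard HotSpot options; the log path is just an example):

# One line per collection - makes the slow growth and the final spike
# visible in gc.log after the fact.
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/opt/openfire/logs/gc.log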

I have lots of graphs, all the logs, etc., and would really appreciate it if we could go through them together. I’m a bit worried that the Openfire load test stats on this website only show a short period of testing. Do you find you need to restart Openfire often for Nimbuzz, and what is the typical load?

Many thanks

Michael

Bummer. I had hoped we’d tackled this thing.

You can send all of the raw data (graphs, logs, etc.) to my private address (see my profile). I’m also interested in your test environment - could you send me the details of that too, please?

A non-disclosure agreement forbids me from being exact, but the number of users that Nimbuzz processes is a lot, lot higher than what you’re processing. They do have occasional problems, but restarts are typically required after a few weeks, not hours.

I have moved the software to another machine with more memory. No more CPU at 100% now, but I see the Java heap memory growing slowly. It’s a standard installation; I haven’t tried the patched file yet. Nothing really interesting in the logs, and the new monitoring doesn’t tell me much, mainly because I don’t know how to read it.

I think I will try the patch file to see if it improves anything.

Could you perhaps attach or send us some of the graphs that Java-Monitor gives you?
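
(If the graphs are awkward to export, even the plain output of jstat run against the Openfire process for a few hours would help - assuming a Sun JDK is installed on that box:)

# Prints heap occupancy (eden/old/perm, as percentages) and GC counts
# every 5 seconds; redirect to a file and let it run for a while.
jstat -gcutil <openfire-pid> 5000 > jstat.log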

Hi folks,

I just noticed the same problem with 3.6.4. I was using 3.6.3 before this, until the pubsub problem (I’ve started a thread on this) surfaced. I disabled pubsub in 3.6.4 and gave it a 1024M max heap. With ~200 sessions, the heap was up to 1000M within a few hours. None of the users are on Empathy, though.

Thank god I found this thread - ever since I moved to Empathy in Ubuntu 9.10, my server has been crashing with 100% CPU usage. I changed the server-to-server settings and increased the Java memory size, but no good. I just switched back to Spark on Linux.

I’ve been having this problem for a while, and it makes it more difficult to use Openfire in an Ubuntu environment. Is there any progress being made on this?

Most of the people responding to this thread are reporting that Openfire runs out of memory (causing OutOfMemoryErrors) after at least one user starts using the Empathy client. I’ve created a new bug report specifically for the Empathy issue: OF-82

So far, I’ve not been able to identify where things go wrong. We need your help!

I would like to receive a couple of thread dumps (or possibly, a memory dump) of an instance of Openfire that is about to run out of memory. If you can provide these dumps, please contact me (contact details can be found in my profile).
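
If you’re not sure how to capture them, here is a rough outline (assuming a Sun JDK 6 on the server - jstack and jmap ship with the JDK, not the bare JRE):

# Thread dump: SIGQUIT makes the JVM print every thread's stack to its
# standard output (on a default install this ends up in logs/nohup.out).
kill -3 <openfire-pid>

# Or, with the JDK tools on the PATH:
jstack <openfire-pid> > openfire-threads.txt

# Heap dump (can be large - roughly the size of the used heap):
jmap -dump:format=b,file=openfire-heap.hprof <openfire-pid>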

Most likely, the memory leak is linked to specific functionality. Are there any clues as to what functionality causes this problem?

I’ll set up an Openfire instance in a Solaris zone on our private network specifically to test this. Keen to help if I can.

Cheers,

Dave

Additionally, a list of all plugins that your system is running (if you haven’t provided this list yet) could also be helpful.

I’m noticing that a number of the users experiencing this problem are using the monitoring plugin. This could be coincidence, of course, but let’s check to be sure. Can you reproduce the problem without the monitoring plugin? Can you stop the problem by unloading the monitoring plugin? (Keep an eye on Java-Monitor’s memory usage graphs after you do this!)

Similar checks can be used to eliminate other plugins too.
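
If you’re unsure how to unload a plugin: you can delete it from the Plugins page in the admin console, or - assuming a default Linux layout under /opt/openfire and the usual plugin file names - move its jar out of the plugins directory; Openfire watches that directory and should unload the plugin shortly afterwards. A sketch for the monitoring plugin:

# Move the jar out of the way; put it back later to re-enable the plugin.
mv /opt/openfire/plugins/monitoring.jar /tmp/
# If the exploded plugin directory is left behind, remove it as well.
rm -rf /opt/openfire/plugins/monitoring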

Right, I have the following set up on OpenSolaris snv_127

Server Properties
Version: Openfire 3.6.4

Environment
Java Version: 1.6.0_15 Sun Microsystems Inc. – Java HotSpot™ Server VM
Appserver: jetty-6.1.x
OS / Hardware: SunOS / x86
Java Memory: 36.20 MB of 494.69 MB (7.3%) used

Using the embedded DB and local accounts only.

I’ve disabled anonymous login, inbound account registration, and the server-to-server service. All other settings are at defaults.

The only plug-in present is:

Search 1.4.3 - Provides support for Jabber Search (XEP-0055)

Next step: log in using Empathy on an Ubuntu 9.10 VirtualBox VM to see what happens…

Hi all,

I think I may have started some of this. I’m on holiday at the moment, so this may be my last post for a while, but to sum up some of our findings from running quite an intensive load test on Openfire:

  • We needed to add the stalled-session property that Guus mentioned. When our clients dropped off for various reasons (out-of-memory exceptions on the clients, network errors, etc.), Openfire did not deal with it very efficiently.

  • We set the heap size wrong on the Openfire server itself: we were under the impression the machine had more memory than it did, so we set the heap too high (there’s a sketch of where the heap is configured just after this list).

  • The default settings for logging room conversations don’t really lend themselves to heavy MUC usage (they’re fine for IM). We were generating pretty high traffic (2,000 users, something like 3 chats per second). If you look at the source code, I believe the default is to log in batches of 50; I can’t remember the interval off the top of my head, but basically the number of messages held in memory waiting to be logged was way too high. We’ve now upped the batch size and shortened the interval, and it’s behaving much better, logging 250 messages per minute, although as it’s not a real SQL batch statement (it’s individual inserts), I would imagine the database may be getting a bit of a hammering.
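
For reference, the heap is controlled by the standard -Xms/-Xmx flags on the java command that launches Openfire; where exactly you set them depends on how you start it (the startup script under bin/ for the tar.gz install, or the init script for packaged installs - the paths below just assume a /opt/openfire layout). A rough sketch:

# Example only: 512 MB initial / 1 GB maximum heap. Keep -Xmx comfortably
# below the machine's physical RAM so the OS isn't starved (our mistake above).
java -server -Xms512m -Xmx1024m \
    -DopenfireHome=/opt/openfire -Dopenfire.lib.dir=/opt/openfire/lib \
    -jar /opt/openfire/lib/startup.jar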

Anyway, I believe our Openfire is up for the time being - thanks for all your help, all :)

Hi Guus,

Everything appears fine on the nascent test system I set up (above)… so out of curiosity I logged in to our production server using Empathy 2.28.1.1 from an Ubuntu 9.10 virtual machine. And bingo - it’s all on. Empathy indeed appears to be the kiss of death for Openfire.

I can confirm that what I’m seeing is without the monitoring plugin installed.

The production system details:

Version: Openfire 3.6.4

Environment
Java Version: 1.6.0_07 Sun Microsystems Inc. – Java HotSpot™ Server VM
Appserver: jetty-6.1.x
OS / Hardware: Linux / i386
Java Memory: 171.62 MB of 253.19 MB (67.8%) used

Other differences from the above system I posted:

  • uses an external MySQL DB

  • users authenticate using our corporate AD installation

I haven’t got JConsole hooked up, so I’m monitoring by refreshing the OF admin console page.
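
(If I get a chance to restart it I’ll hook JConsole up properly - as far as I know that is just a matter of adding the standard JMX options to the java command that starts Openfire, something like the following, with an arbitrary port and no authentication, so trusted networks only:)

-Dcom.sun.management.jmxremote \
-Dcom.sun.management.jmxremote.port=9010 \
-Dcom.sun.management.jmxremote.authenticate=false \
-Dcom.sun.management.jmxremote.ssl=false

# then, from another machine:
jconsole <server-host>:9010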

Over the course of an hour I have observed JVM memory usage creep from 20 percent, with half a dozen logged-on users, to 80 percent consumed at the time of writing.

All other users are on Spark 2.5.8 or Pidgin. I am the only user with Empathy.

The only plug-ins installed on this system are:

Red5 v0.1.11

Search v1.4.3

We are not using the Red5 SparkWeb functionality - i.e. we have it installed on a trial basis for users who wish to tinker, but as yet no one is actually using it.

Another observation - if you really want to exacerbate the behaviour, quit Empathy and fire it back up in rapid succession…
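
(If you want to automate that, here is a very rough sketch - it assumes the Empathy account is set to connect automatically on startup, and that killing the process is a close enough stand-in for quitting the client:)

for i in 1 2 3 4 5 6 7 8 9 10; do
    empathy &            # start the client; it logs in on its own
    sleep 10             # give it time to establish the XMPP session
    pkill -x empathy     # kill it without a clean logout
    sleep 2
done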

I’m not going to be able to get you thread dumps with this particular setup (sorry), but will return to my test system tomorrow and try and get some concrete data for you.

Cheers!
Dave

EDIT: JVM heap size just hit 90 percent of capacity…so I’m gonna do a quick bounce of the server before anyone notices…


Hi Guus,

As a matter of fact, I did get a thread dump on our production system - easy when you know how, I guess.

I’ve replied to Gato in this thread:

And I’m attaching the dump as well.

In this case I was able to exhaust JVM memory completely within about five (5) minutes of bringing the OF server up - simply by repeatedly exiting and relaunching Empathy. This was with one user (me), with the remaining three or so users (at this late time of the day, hehe) using Spark or Pidgin. If I’m reading this right, I can easily see how several Empathy users would create headaches in larger deployments.

JVM memory status: 252.80 MB of 253.19 MB (99.8%) used

ps -ef | grep -i java

daemon 30532 1 14 19:05 pts/0 00:01:33 /usr/lib/jvm/java-1.6.0-sun-1.6.0.u7/jre/bin/java -server -Xms128m -Xmx256m -DopenfireHome=/opt/openfire -Dopenfire.lib.dir=/opt/openfire/lib -classpath /opt/openfire/lib/startup.jar -jar /opt/openfire/lib/startup.jar

kill -3 30532

cd /opt/openfire/logs/

more nohup.out

Output is in the attached file.

Lemme know if this helps at all
nohup-out-20091117.txt.zip (5706 Bytes)

Same problem here, but we are not using Empathy. Our server only allows connections from Spark and Pidgin clients.

We don’t use the Sun JDK, but OpenJDK instead.

Java Version: 1.6.0_0 Sun Microsystems Inc. – OpenJDK Server VM
Appserver: jetty-6.1.x
Host Name: openfire-im
OS / Hardware: Linux / i386
Locale / Timezone: en / Central European Time (1 GMT)

These are our enabled plugins:

Client Control 1.0.3

Email Listener 1.0.0

Kraken IM 1.1.2 (with Yahoo, MSN and Gtalk enabled)

Monitoring Service 1.1.1

Registration 1.4.1

Search 1.4.3

User Import/Export 2.2.0

The out-of-memory error happens randomly. It can take up to 15 days… or as little as 15 hours.
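
Since it is so sporadic, it may be worth adding the standard HotSpot option that writes a heap dump automatically when the OutOfMemoryError finally happens (supported by Sun Java 6 and, as far as I know, by OpenJDK too - the path below is just an example):

# Added to the java command that starts Openfire; the resulting .hprof file
# can then be analysed offline (e.g. with jhat or the Eclipse Memory Analyzer).
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/openfire/logs/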