Openfire 3.6.4 memory leak with Empathy

Hi all,

We’ve just installed Openfire 3.6.4 using the RPM from this site (with the bundled JRE). We have about 60 users, using a mixture of Spark, Empathy 2.28.0.1, Pidgin 2.5.5 and something called SJPhone.

We seem to be getting a leak, where memory just keeps going up until the server falls over. The machine has 1GB of RAM, with 768MB Max for the JVM.

The “Environment” section currently looks like this:

Java Version: 1.6.0_03 Sun Microsystems Inc. – Java HotSpot™ Server VM
Appserver: jetty-6.1.x
Host Name: cammesg01.city-link.co.uk
OS / Hardware: Linux / i386
Locale / Timezone: en / Greenwich Mean Time (0 GMT)
Java Memory: 657.56 MB of 759.50 MB (86.6%) used

This seems unacceptably high. The GC from Java releases a little each time, but after about a day or so it’s usually all used up and the server falls over.
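For anyone who wants to log the same figure outside the admin console, here is a minimal sketch using the standard `java.lang.management` API. The class name `HeapUsageReport` and its helper methods are made up for illustration, and this is not necessarily how the Openfire console computes its number; it only reproduces the arithmetic (657.56 MB of 759.50 MB is roughly 86.6%).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Locale;

// Hypothetical helper, not part of Openfire.
class HeapUsageReport {

    /** Percentage of the maximum that is in use, e.g. 657.56 of 759.50. */
    static double percentUsed(double used, double max) {
        if (max <= 0) {
            throw new IllegalArgumentException("max must be positive");
        }
        return 100.0 * used / max;
    }

    /** Formats a heap snapshot in the style "X MB of Y MB (Z%) used". */
    static String describe(MemoryUsage heap) {
        double mb = 1024.0 * 1024.0;
        long max = heap.getMax(); // -1 when the JVM reports no defined maximum
        if (max <= 0) {
            return String.format(Locale.ROOT, "%.2f MB used (max undefined)",
                    heap.getUsed() / mb);
        }
        return String.format(Locale.ROOT, "%.2f MB of %.2f MB (%.1f%%) used",
                heap.getUsed() / mb, max / mb,
                percentUsed(heap.getUsed(), max));
    }

    public static void main(String[] args) {
        // Heap usage of the JVM running this code, not of a remote server.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println(describe(heap));
    }
}
```

Logging this once a minute and comparing the readings taken just after each garbage collection should show whether the baseline keeps climbing (a real leak) or whether the heap is only filling up between collections.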

The plugins installed are: Broadcast, Client Control, Monitoring, Presence, User Search, User Import/Export

Database is MySQL. Remote Services Disabled. Connection Manager Disabled. Media Proxy Disabled.

Current Users: 59 (Low: 0, High: 60)

Active Conversations: 7 (Low: 0, High: 13)

Packets Per Minute: 191 (Low: 8, High: 461)

I’d like to start using this server for more users via the http-bind functionality and strophejs, but need to get this leaky problem sorted out first. Any ideas why this could be happening?

All I can think of doing is upgrading the Java version but I get a feeling this won’t solve the issue.

Many thanks in advance

Dave
conversation.pdf (3610 Bytes)

We are having the same issue. What is interesting is that we did not have this problem until I started using Empathy. The rest of the staff use PSI or Pidgin. I was the only user using Empathy that comes with the Ubuntu 9.10 edition.

It seems to have been around the same time as I upgraded to Empathy here too. That is worrying, seeing that Empathy will be the default for a lot of Ubuntu users in the next couple of weeks!

I’ve disabled my client for the time being but that isn’t a solution in the scheme of things!

From the looks of this ticket it accuses Openfire of not correctly implementing the XMPP protocol:

*Pidgin sends XEP-0199 pings as keepalives.

It sounds like you may be encountering the issue where Openfire just completely ignores some IQ packets (in clear violation of the XMPP specs). There are some older tickets relating to that (I don’t have the #s handy).

Please file a new issue for this and attach the Help->Debug Window output from a computer being disconnected from openfire.*
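For context on what the ticket means: XMPP requires an entity that receives an IQ of type get or set to answer with either a result or an error, and XEP-0199 defines the ping as an IQ get carrying a `<ping xmlns='urn:xmpp:ping'/>` child. A rough sketch of the stanzas involved follows. The class `PingResponder` and its methods are hypothetical and are not Openfire code, and the string building assumes JIDs and ids without XML special characters.

```java
// Hypothetical illustration of the XEP-0199 exchange; not Openfire code.
class PingResponder {

    static final String PING_NS = "urn:xmpp:ping";
    static final String STANZA_ERROR_NS = "urn:ietf:params:xml:ns:xmpp-stanzas";

    /** The keepalive a client such as Pidgin sends. */
    static String pingRequest(String from, String to, String id) {
        return "<iq from='" + from + "' to='" + to + "' id='" + id + "' type='get'>"
                + "<ping xmlns='" + PING_NS + "'/></iq>";
    }

    /**
     * The reply owed for that ping. The addresses are swapped and the id is
     * echoed. Staying silent is the one option the protocol does not allow.
     */
    static String replyTo(String from, String to, String id, boolean pingSupported) {
        String header = "<iq from='" + to + "' to='" + from + "' id='" + id + "'";
        if (pingSupported) {
            return header + " type='result'/>";
        }
        // Entities that do not support ping still have to answer, with an error.
        return header + " type='error'>"
                + "<ping xmlns='" + PING_NS + "'/>"
                + "<error type='cancel'>"
                + "<service-unavailable xmlns='" + STANZA_ERROR_NS + "'/>"
                + "</error></iq>";
    }

    public static void main(String[] args) {
        // Example addresses are placeholders.
        System.out.println(pingRequest("dave@example.com/empathy", "example.com", "ping1"));
        System.out.println(replyTo("dave@example.com/empathy", "example.com", "ping1", true));
        System.out.println(replyTo("dave@example.com/empathy", "example.com", "ping1", false));
    }
}
```

A client that never gets either reply cannot tell a dead connection from a server that is ignoring it, which is why the debug window output requested in the ticket is useful.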

I wonder if Empathy/Telepathy has the same issue? I wonder if it is worth reporting to their issue trackers as well?

It probably is worth reporting to the Empathy team as well.

I’ve reported it to the Ubuntu Empathy team on Launchpad: https://bugs.launchpad.net/ubuntu/+source/empathy/+bug/450184

If you could keep an eye on that and provide additional information if required (to prove I’m not mad!) that’d help. Thanks

Hi Dave,

Are you sure it’s Empathy? We’re also trying to investigate a potential memory leak when we put Openfire under load…

Thanks,

Michael

I can confirm that we did not have an issue with 3.6.4 until I began using Empathy on Ubuntu 9.10. I was also the only user using Empathy at the time.

I think so, yes. It’s only me running Empathy and I’ve had it turned off all day (and night) after a restart yesterday evening. Memory usage on the server is back to normal, about 50MB.

We are also experiencing a memory leak issue.

We have 3GB dedicated to Openfire and the memory just slowly leaks away. Right now Openfire is using 1.7GB and the cache shows that it has cached about 400MB. There are only 2400 people online (relatively quiet).

We are running on Ubuntu server. In the evenings when the site is busy (around 7000 people online) the memory is totally used up and the GC is running continuously. It will crash after a few days as there won’t be enough RAM. If I restart openfire the memory usage will drop right back to a few hundred MB like it should be.

Can anyone advise a good way to hunt down what is going on with all our memory?

Thanks

Daniel

A guy in the team thinks he has found the problem from analyzing a dump - we’ll know a bit more tomorrow so I’ll let you know if anything comes up.
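In case it helps anyone else who wants a dump to analyze, here is a minimal sketch of capturing one from inside a JVM. It assumes a HotSpot-based JVM that exposes `com.sun.management.HotSpotDiagnosticMXBean`, and it uses `ManagementFactory.getPlatformMXBean`, which needs Java 7 or later (so not the 1.6.0_03 runtime mentioned earlier in this thread). The class name `HeapDumpSketch` and the file name are made up. The code dumps the heap of the JVM it runs in, so to dump Openfire itself it would have to run inside the server process, for example from a plugin.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; assumes a HotSpot JVM, Java 7 or later.
class HeapDumpSketch {

    /** Writes an .hprof dump of the current JVM to a fresh temp directory. */
    static Path dumpToTempFile() {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            // Recent JDKs require the .hprof extension, and the file must not exist yet.
            Path file = dir.resolve("sketch.hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            // true = dump only live (reachable) objects, which is what matters for a leak.
            bean.dumpHeap(file.toString(), true);
            return file;
        } catch (IOException e) {
            throw new RuntimeException("heap dump failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Heap dump written to " + dumpToTempFile());
    }
}
```

The resulting file can be opened in a heap analyzer to see which classes hold the most retained memory.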

I’ve read in a few places that Smack has a few memory leaks, and there are some patches floating around JIRA to fix them. In our most recent load tests the load test agents are crashing before the server really gets any good load, possibly due to this Smack bug. Obviously it’s a client problem not a server problem but we need to fix that before we can do any more investigation. And hope XIFF doesn’t have similar problems because that’s what our real clients will be using…

We’re using Grinder to load test, but people using Tsung seem to be pretty happy with it from what I’ve read.

We are also using XIFF, but how could that be causing memory leaks in Openfire? Could it have something to do with how Openfire responds to keep-alives? I’ll have a look into it also.

Hi, this is what the guy in the team has found.

  • Openfire uses MINA, which handles one session per socket. Each session has a queue of outgoing messages with no limit on its size. Every time you write a message to a session, it gets appended to the queue, waiting for the client to process it.

  • If the client hangs and can’t process the message (i.e. send a TCP/IP ACK back), the queue grows and it looks like an Openfire leak. Openfire eventually runs out of memory.

This would mean that any client machine, using any XMPP implementation (XIFF/Smack) could potentially bring down Openfire if it doesn’t send that ACK.
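To make the mechanism concrete, here is a toy model of it. This is not MINA or Openfire code: the `Session` class, the 64 KB cap and the close-on-overflow policy are all assumptions chosen for illustration. It compares a session whose pending queue is unlimited with one that gives up on a client that has stopped draining.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only; Session is a hypothetical stand-in, not MINA's session class.
class SessionQueueSketch {

    static class Session {
        private final Queue<byte[]> pending = new ArrayDeque<byte[]>();
        private final long maxPendingBytes; // 0 or less means "no limit"
        private long pendingBytes = 0;
        private boolean closed = false;

        Session(long maxPendingBytes) {
            this.maxPendingBytes = maxPendingBytes;
        }

        /** Queues a message, or closes the session if the cap would be exceeded. */
        boolean write(byte[] message) {
            if (closed) {
                return false;
            }
            if (maxPendingBytes > 0 && pendingBytes + message.length > maxPendingBytes) {
                close(); // assumed policy: drop the stalled client, free its queue
                return false;
            }
            pending.add(message);
            pendingBytes += message.length;
            return true;
        }

        /** Models the peer accepting one message; a hung client never causes this. */
        void drainOne() {
            byte[] sent = pending.poll();
            if (sent != null) {
                pendingBytes -= sent.length;
            }
        }

        void close() {
            closed = true;
            pending.clear();
            pendingBytes = 0;
        }

        long pendingBytes() { return pendingBytes; }

        boolean isClosed() { return closed; }
    }

    public static void main(String[] args) {
        byte[] stanza = new byte[1024];
        Session unbounded = new Session(0);
        Session capped = new Session(64 * 1024);
        // The client has hung, so drainOne() is never called on either session.
        for (int i = 0; i < 1000; i++) {
            unbounded.write(stanza);
            capped.write(stanza);
        }
        System.out.println("unbounded pending bytes: " + unbounded.pendingBytes());
        System.out.println("capped closed: " + capped.isClosed()
                + ", pending bytes: " + capped.pendingBytes());
    }
}
```

In the unbounded case the memory held for one stalled client grows with every message routed to it, which from the outside is indistinguishable from a leak. Whether a cap or an idle timeout is the right remedy for Openfire is a separate question for someone who knows the server internals.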

I’d like to hear an Openfire developer’s take on this to see if they agree.

That is very interesting. I assume the connection would timeout though if it doesn’t get a response?

Nothing to do with the ACK but is anyone using the User Service Plugin? I just restarted this and our memory usage surprisingly dropped by about 300MB? I will have a look at the source code in this also to see if I can see any potential issues.

I’ve filed a bug upstream against Empathy too: https://bugzilla.gnome.org/show_bug.cgi?id=598522

We are using LoadRunner 8.1.

We cannot solve the memory leak problem.

Tsung is just a tool.

Why doesn’t the team look at this thread?

moviebat wrote:

Why doesn’t the team look at this thread?

Official JiveSoftware developers don’t work actively on this project anymore, so there is only a group of users who try to fix issues and answer basic questions. It seems that only a small number of users are doing load tests, so there is not much information about this.

We’re now using Tsung to load test rather than Grinder and we’re getting the same issues.

Does anyone know of an Openfire installation being used in any large-ish scale production environment? We’re at the stage where I think we’re going to have to abandon Openfire and use something like ejabberd : (

Reported to Empathy mailing list now too.

Open-source can be frustrating sometimes. I’ve now reported this in Ubuntu Launchpad (try upstream), Gnome Bugzilla (try upstream) and now Freedesktop.org. I wish there was only one place to do this.

http://lists.freedesktop.org/archives/telepathy/2009-October/003941.html

If anyone can help with the inner workings of the server for the guys at Telepathy that’d be really useful. Java isn’t my strong point.

mikeycmccarthy wrote:

(…)
Does anyone know of an Openfire installation being used in any large-ish scale production environment?

Nimbuzz currently uses Openfire. Earlier this year, they passed the 10 million download mark of one of their clients.