Openfire 3.6.4 memory leak with Empathy

We are also using XIFF - but how could that be causing memory leaks in Openfire? Could it have something to do with how Openfire responds to keep-alives? I’ll have a look into it as well.

Hi, this is what the guy in the team has found.

  • Openfire uses MINA, which maintains one session per socket. Each session has a queue of outgoing messages with no limit on its size. Every time you write a message to a session, it is appended to the queue, waiting for the client to process it.

  • If the client hangs and can’t process the messages (i.e., never sends a TCP/IP ACK back), the queue grows without bound and it looks like an Openfire leak. Openfire eventually runs out of memory.

This would mean that any client machine, using any XMPP implementation (XIFF/Smack) could potentially bring down Openfire if it doesn’t send that ACK.
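The failure mode described above can be sketched in plain Java. This is illustrative only, not Openfire or MINA code; the `Session` class and its fields are invented to model an unbounded per-session write queue:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model (not Openfire/MINA code) of an unbounded
// per-session write queue growing while the client stops reading.
public class WriteQueueDemo {
    static class Session {
        final Queue<byte[]> writeQueue = new ArrayDeque<>(); // no size limit
        boolean clientReading = true;

        void write(byte[] msg) {
            writeQueue.add(msg); // always appended, never rejected
        }

        void flush() {
            // Messages only leave the queue while the client keeps ACKing.
            while (clientReading && !writeQueue.isEmpty()) {
                writeQueue.poll();
            }
        }
    }

    public static void main(String[] args) {
        Session session = new Session();
        session.clientReading = false; // client hangs: no more TCP ACKs

        for (int i = 0; i < 10_000; i++) {
            session.write(new byte[1024]); // ~1 KiB per stanza
            session.flush();               // drains nothing while client is stuck
        }
        // ~10 MB is now pinned in memory for a single stalled session.
        System.out.println("queued messages: " + session.writeQueue.size());
    }
}
```

One stalled client is enough to pin memory indefinitely, and nothing on the server side ever frees it.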

I’d like to hear an Openfire developer’s take on this to see if they agree.

That is very interesting. I’d assume the connection would time out, though, if it doesn’t get a response?

Nothing to do with the ACK, but is anyone using the User Service Plugin? I just restarted it and our memory usage surprisingly dropped by about 300 MB. I’ll have a look at its source code as well to see if I can spot any potential issues.

I’ve filed a bug upstream against Empathy too: https://bugzilla.gnome.org/show_bug.cgi?id=598522

We are using LoadRunner 8.1.

We cannot solve the memory leak problem.

Tsung is just a tool.

Why doesn’t the team look at this thread?

moviebat wrote:

Why doesn’t the team look at this thread?

Official JiveSoftware developers don’t actively work on this project anymore, so there is only a group of users who try to fix issues and answer basic questions. It seems that only a small number of users run load tests, so there is not much information about this.

We’re now using Tsung to load test rather than Grinder and we’re getting the same issues.

Does anyone know of an Openfire installation being used in any large-ish scale production environment? We’re at the stage where I think we’re going to have to abandon Openfire and use something like ejabberd :(

Reported to Empathy mailing list now too.

Open-source can be frustrating sometimes. I’ve now reported this in Ubuntu Launchpad (try upstream), Gnome Bugzilla (try upstream) and now Freedesktop.org. I wish there was only one place to do this.

http://lists.freedesktop.org/archives/telepathy/2009-October/003941.html

If anyone can help with the inner workings of the server for the guys at Telepathy that’d be really useful. Java isn’t my strong point.

mikeycmccarthy wrote:

(…)
Does anyone know of an Openfire installation being used in any large-ish scale production environment?

Nimbuzz currently uses Openfire. Earlier this year, they passed the 10 million download mark of one of their clients.

I’ve created a new JIRA issue (OF-70) to track this issue.

I wouldn’t be surprised if this was introduced when JM-1066 was fixed. Is anyone able/willing to test a patch, if I provide it as a diff?

(said diff can be found in the JIRA issue)

I will apply this patch and see how it goes. If it fixes the memory leak it will be very obvious in our environment.

Thanks for looking into this issue,

Daniel

I have applied this patch to our live server and we’ll know during peak time in about 12 hours if it fixes the issue.

Thanks

I had to roll this back again.

The patch resulted in everyone getting logged out after about 5 minutes of being connected.

There must be more to it?

Thanks

My patch disconnected all idle clients (clients that had not been sent any data for a while). As you’ve found, that’s not the best of solutions. Instead, the code should detect write timeouts. Luckily, MINA appears to offer that functionality.

I’ve modified the patch to detect write-timeouts. I haven’t been able to test this yet, but could one of you give it a try?

The patch can be found in JIRA issue OF-70.
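The write-timeout idea can be sketched with the standard library (the real patch relies on MINA’s own timeout support; the `Session` class below and its names are invented for illustration, not Openfire code):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Stdlib sketch of write-timeout detection: instead of queueing forever,
// a write that the client does not absorb in time closes the session.
public class WriteTimeoutDemo {
    static class Session {
        // A bounded queue stands in for the socket's send buffer.
        final BlockingQueue<byte[]> writeQueue = new ArrayBlockingQueue<>(100);
        boolean closed = false;

        // Returns false (and closes the session) if the message cannot be
        // queued within the timeout, i.e. the client has stalled.
        boolean write(byte[] msg, long timeoutMs) throws InterruptedException {
            if (closed) return false;
            boolean accepted = writeQueue.offer(msg, timeoutMs, TimeUnit.MILLISECONDS);
            if (!accepted) {
                closed = true; // write timed out: drop the stalled client
            }
            return accepted;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Session session = new Session();
        int sent = 0;
        // The client never reads, so the 101st write times out and the
        // session is closed instead of the queue growing without bound.
        while (session.write(new byte[1024], 50)) {
            sent++;
        }
        System.out.println("writes accepted before timeout: " + sent);
        System.out.println("session closed: " + session.closed);
    }
}
```

Unlike disconnecting all idle clients, this only affects sessions that are actually failing to absorb writes.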

I will test this new patch and let you know how we go.

Thanks

We ran the patch during peak time, but unfortunately we are still leaking memory. It’s very frustrating.

That said, it doesn’t mean your patch hasn’t fixed other issues; it just hasn’t solved the one we are experiencing.

And this latest patch hasn’t caused any new problems which is good.

It would be good if some others with memory leaks could also test this.

Thanks

Daniel

Hi Guus

I’m OK to test it, but I just don’t know how to apply the patch. Would you mind explaining how? Thanks!

We’re testing now. To apply the patch, simply check out the Openfire source, apply the diff (or use Eclipse -> Patch -> Apply), then build the Openfire jar using the Ant script.
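For anyone unfamiliar with the process, the command-line route looks roughly like this (the repository URL and patch filename are illustrative; get the actual diff from JIRA issue OF-70):

```shell
# Check out the Openfire source (URL is illustrative of the svn layout at the time)
svn checkout http://svn.igniterealtime.org/svn/repos/openfire/trunk openfire
cd openfire

# Apply the diff from OF-70 (filename is whatever you saved it as)
patch -p0 < OF-70.patch      # or: Eclipse -> Patch -> Apply

# Build the patched jar with the bundled Ant script
cd build
ant openfire
```

The resulting jar then replaces the one in your existing Openfire installation’s lib directory.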