Nov 17, 2009 2:44 AM
Openfire 3.6.4 memory leak with Empathy
Hi all,
We've just installed Openfire 3.6.4 using the RPM from this site (with the bundled JRE). We have about 60 users, using a mixture of Spark, Empathy 2.28.0.1, Pidgin 2.5.5 and something called SJPhone.
We seem to be getting a leak, where memory just keeps going up until the server falls over. The machine has 1GB of RAM, with 768MB Max for the JVM.
The "Environment" section currently looks like this:
| Java Version: | 1.6.0_03 Sun Microsystems Inc. -- Java HotSpot(TM) Server VM |
| Appserver: | jetty-6.1.x |
| Host Name: | cammesg01.city-link.co.uk |
| OS / Hardware: | Linux / i386 |
| Locale / Timezone: | en / Greenwich Mean Time (0 GMT) |
| Java Memory | |
This seems unacceptably high. The GC from Java releases a little each time, but after about a day or so it's usually all used up and the server falls over.
The plugins installed are: Broadcast, Client Control, Monitoring, Presence, User Search, User Import/Export
Database is MySQL. Remote Services Disabled. Connection Manager Disabled. Media Proxy Disabled.
| Current Users | Low: 0 | 59 | High: 60 |
| Active Conversations | Low: 0 | 7 | High: 13 |
| Packets Per Minute | Low: 8 | 191 | High: 461 |
I'd like to start using this server for more users via the http-bind functionality and strophejs, but need to get this leaky problem sorted out first. Any ideas why this could be happening?
All I can think of doing is upgrading the Java version but I get a feeling this won't solve the issue.
Many thanks in advance
Dave
We are having the same issue. What is interesting is that we did not have this problem until I started using Empathy. The rest of the staff use PSI or Pidgin. I was the only user using Empathy that comes with the Ubuntu 9.10 edition.
It seems to have been around the same time as I upgraded to Empathy here too. Which is worrying seeing that Empathy will be default for a *lot* of Ubuntu users in the next couple of weeks!
I've disabled my client for the time being but that isn't a solution in the scheme of things!
This ticket appears to accuse Openfire of not implementing the XMPP protocol correctly:
Pidgin sends XEP-0199 pings as keepalives.
It sounds like you may be encountering the issue where Openfire just completely ignores some IQ packets (in clear violation of the XMPP specs). There are some older tickets relating to that (I don't have the #s handy).
Please file a new issue for this and attach the Help->Debug Window output from a computer being disconnected from openfire.
I wonder if Empathy/Telepathy has the same issue? I wonder if it is worth reporting to their issue trackers as well?
It probably is worth reporting to the Empathy team as well.
I've reported it to the Ubuntu Empathy team on Launchpad: https://bugs.launchpad.net/ubuntu/+source/empathy/+bug/450184
If you could keep an eye on that and provide additional information if required (to prove I'm not mad!) that'd help. Thanks
Hi Dave,
Are you sure it's Empathy? We're also trying to investigate a potential memory leak when we put Openfire under load...
Thanks,
Michael
I can confirm that we did not have an issue with 3.6.4 until I began using Empathy on Ubuntu 9.10. I was also the only user using Empathy at the time.
I think so, yes. It's only me running Empathy and I've had it turned off all day (and night) after a restart yesterday evening. Memory usage on the server is back to normal, about 50MB.
We are also experiencing a memory leak issue.
We have 3GB dedicated to Openfire and the memory just slowly leaks away. Right now Openfire is using 1.7GB and the cache shows that it has cached about 400MB. There are only 2400 people online (relatively quiet).
We are running on Ubuntu server. In the evenings when the site is busy (around 7000 people online) the memory is totally used up and the GC is running continuously. It will crash after a few days as there won't be enough RAM. If I restart openfire the memory usage will drop right back to a few hundred MB like it should be.
Can anyone advise a good way to hunt down what is going on with all our memory?
Thanks
Daniel
A guy in the team thinks he has found the problem from analyzing a dump - we'll know a bit more tomorrow so I'll let you know if anything comes up.
I've read in a few places that Smack has a few memory leaks, and there are some patches floating around JIRA to fix them. In our most recent load tests the load test agents are crashing before the server really gets any good load, possibly due to this Smack bug. Obviously it's a client problem not a server problem but we need to fix that before we can do any more investigation. And hope XIFF doesn't have similar problems because that's what our real clients will be using...
We're using Grinder to load test, but people using Tsung seem to be pretty happy with it from what I've read.
We are also using XIFF - but how could that be causing memory leaks in Openfire? Could it have something to do with how Openfire responds to keep-alives? I'll have a look into it also.
Hi, this is what the guy in the team has found.
- Openfire uses MINA, which handles one session per socket. Each session has a queue of messages, and there is no limit on its size. Every time you write a message to a session, it gets appended to the queue, waiting for the client to process it.
- If the client hangs and can't process the message (i.e. send a TCP/IP ACK back), the queue grows and it looks like an Openfire leak. Openfire eventually runs out of memory.
This would mean that any client machine, using any XMPP implementation (XIFF/Smack) could potentially bring down Openfire if it doesn't send that ACK.
I'd like to hear an Openfire developer's take on this to see if they agree.
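The mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not MINA's or Openfire's actual code: `Session`, `max_pending` and `backlog_after` are hypothetical names, and closing the session when a limit is hit is just one possible policy.

```python
from collections import deque


class Session:
    """Toy stand-in for a per-socket session with an outbound write queue."""

    def __init__(self, max_pending=None):
        self.pending = deque()          # messages waiting for the client to read
        self.max_pending = max_pending  # None models the unbounded queue described above
        self.closed = False

    def write(self, message):
        """Queue a message; close the session if a bound is set and reached."""
        if self.closed:
            return False
        if self.max_pending is not None and len(self.pending) >= self.max_pending:
            self.pending.clear()  # drop the backlog and give up on the stalled client
            self.closed = True
            return False
        self.pending.append(message)
        return True


def backlog_after(session, writes):
    """Write `writes` messages to a client that never reads; return the queue length."""
    for i in range(writes):
        session.write(f"message {i}")
    return len(session.pending)


unbounded = backlog_after(Session(), 10_000)
bounded = backlog_after(Session(max_pending=100), 10_000)
print(unbounded, bounded)
```

With no limit, every message written to the hung client stays in memory; with a limit, the server trades that memory for a dropped connection.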
That is very interesting. I assume the connection would timeout though if it doesn't get a response?
Nothing to do with the ACK but is anyone using the User Service Plugin? I just restarted this and our memory usage surprisingly dropped by about 300MB? I will have a look at the source code in this also to see if I can see any potential issues.
(said diff can be found in the JIRA issue)
I will apply this patch and see how it goes. If it fixes the memory leak it will be very obvious in our environment.
Thanks for looking into this issue,
Daniel
I have applied this patch to our live server and we'll know during peak time in about 12 hours if it fixes the issue.
Thanks
I had to roll this back again.
The patch resulted in everyone getting logged out after about 5 minutes of being connected.
There must be more to it?
Thanks
My patch disconnected all idle clients (clients that had not been sent any data for a while). As you've found, that's not the best of solutions. Instead, the code should detect write timeouts. Luckily, MINA appears to offer that functionality.
I've modified the patch to detect write-timeouts. I haven't been able to test this yet, but could one of you give it a try?
The patch can be found in JIRA issue OF-70.
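The difference between the two checks discussed above (disconnecting idle clients versus detecting write timeouts) can be sketched as follows. This is an illustrative model only, not the patch from the JIRA issue; the function names, the timestamps in seconds and the timeout values are all assumptions.

```python
def is_idle(now, last_activity, idle_timeout):
    """True when no data has moved on the connection for idle_timeout seconds."""
    return now - last_activity >= idle_timeout


def is_write_stalled(now, oldest_pending_write, write_timeout):
    """True when a queued write has waited write_timeout seconds without completing.

    oldest_pending_write is the time the oldest unsent message was queued,
    or None when the outbound queue is empty.
    """
    if oldest_pending_write is None:
        return False
    return now - oldest_pending_write >= write_timeout


# A healthy but quiet client: nothing sent for 10 minutes, nothing queued.
quiet_idle = is_idle(now=600, last_activity=0, idle_timeout=300)
quiet_stalled = is_write_stalled(now=600, oldest_pending_write=None, write_timeout=30)

# A hung client: a message has been sitting in the queue for 100 seconds.
hung_stalled = is_write_stalled(now=600, oldest_pending_write=500, write_timeout=30)
print(quiet_idle, quiet_stalled, hung_stalled)
```

An idle check flags the quiet client even though it is healthy, which matches the logouts reported after the first patch; a write-timeout check only flags clients that actually fail to accept data.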
I will test this new patch and let you know how we go.
Thanks
We ran the patch during peak time but unfortunately we are still leaking memory. It's very frustrating.
That said, it doesn't mean your patch hasn't fixed other issues; it just hasn't solved the issue we are experiencing.
And this latest patch hasn't caused any new problems which is good.
It would be good if some others with memory leaks could also test this.
Thanks
Daniel
Hi Guus
I'm OK to test it, but I just don't know how to apply the patch. Would you mind explaining how? Thanks!
We're testing now. To apply the patch, simply check out the Openfire source, add the relevant line (or use Eclipse -> Patch -> Apply), then build the Openfire jar using the Ant script.
Just as an aside, what we're planning to do at some point is pull a network cable on our load test clients and see how Openfire deals with the sudden disconnection. If the patch works presumably it shouldn't run out of memory.
PS - where in SVN is the tag of Openfire 3.6.4??!
mikeycmccarthy wrote:
Just as an aside, what we're planning to do at some point is pull a network cable on our load test clients and see how Openfire deals with the sudden disconnection.
Speaking of sudden disconnections: there are more complaints about those than about memory leaks. Such broken sessions stay online, keep updating their last activity (somehow) and confuse users a lot when their contacts do not reply while being shown as online all day, even though their laptops have not been at the desk for hours.
Hey mikeycmccarthy,
Any news on the test?
Pulling the network cable should be detected, even without the patch. Doesn't hurt to give it a try, though.
There doesn't appear to be a tag for 3.6.4, but there's a branch: http://www.igniterealtime.org/fisheye/browse/svn-org/openfire/branches/openfire_3_6_4
Oops, already too complicated for me.
I have never compiled a Java application, so I guess I will need to read some documentation first.
Not quite sure how I'd do it but I can send you the jar with the changes compiled into it. You'd only need to put this into the libs directory of your Openfire server and restart.
That would be much easier for me indeed.
I don't know if it matters, but it's on 32-bit Windows.
I will send you my email address by private message.
I got the file, thanks. I tried it on a virtual machine with a freshly installed server, but I don't see any difference, I can't get any memory leak for the moment... Only our real server seems to have problems. I don't want to test it with too many people connected, so if this VM can't help, I will try the patch one of these evenings.
Would you be able to install/run Ubuntu 9.10RC: http://releases.ubuntu.com/releases/9.10/ and try connecting to your server from Empathy? (I'm sure even from the LiveCD would be ok.)
This was the probable cause of our memory issues.
I was! I know it's the problem. But no way to reproduce the leak on this new server. But as soon as I connect an Empathy client on the main server and disconnect it, it happens. I will try again tomorrow on the VM with more clients.
Hello Francois,
Thanks for helping us out! I'm a bit confused. Did the problem disappear on the environment where you are testing the patch, or do you not have the experience there, even without the patch?
I guess I was not clear:
-we have a problem when we connect an Empathy client on our production server. 60 users, mainly Pidgin + Spark. Once we connect/disconnect an Empathy client, the server becomes unusable. Not clear yet if it's a memory leak or something else. Processor goes 100%, machine unusable.
-I have installed a new server on a VM to test the patch. But I can't reproduce the problem there. It's working with the fresh install and Empathy client. So I don't know if this patch solves my problem. The only way to test is on the production server, but I won't do that during the day to avoid disturbing the users.
Hi Francois,
On first glance, your problem doesn't look like a memory leak. These problems usually take some time to develop into a real issue - they don't usually pop up immediately after one client connects or disconnects.
Perhaps you could install the monitoring plugin described in the blogpost New Openfire monitoring plugin. This will give you a bit more details of the overall health of your environment.
Are there any entries in the logs around the time the client connects that causes the instability?
I didn't check the logs. I will do more tests when I have some time, and I will use the monitor plugin. Thanks a lot for your help!
I have moved the software to another machine with more memory. No more CPU at 100% now, but I see the Java heap memory growing slowly. It's a standard installation; I haven't tried the patched file yet. Nothing really interesting in the logs, and the new monitoring doesn't tell me much, but mainly because I don't know how to read it.
I think I will try the patch file to see if it improves anything.
Could you perhaps attach or send us some of the graphs that Java-Monitor gives you?
Hi Daniel,
Sorry to hear the patch didn't solve your issue. The patch was based on the description that (the guy in the team of) mikeycmccarthy gave. Either I got the fix wrong, or you're suffering from another problem.
I've got the feeling that several issues are being discussed in this thread. Things appear to get mixed up a bit. Perhaps we should discuss the issue that you're experiencing one-on-one. Could you send me a message or chat to me offline? You'll find my contact details in my profile.
Hi Guus,
We started our load test at 2pm yesterday using the patched version of Openfire. I was going to try the disconnect when I got in this morning but Openfire died at about 2am, 12 hours later.
The test gradually ramps up to 4000 people chatting on the server, across 5 rooms, with a throughput of about 3.6 chats / second. Heap memory rises slowly until it's just over 1GB at 1:45, then suddenly it spikes up and Openfire dies.
I have lots of graphs, all logs etc, would really appreciate it if we could go through these together. I'm a bit worried that the Openfire load test stats on this website only show a short period of testing. Do you find you need to restart Openfire often for Nimbuzz and what is the typical load?
Many thanks
Michael
Bummer. I had hoped we'd tackled this thing.
You can send me all of the raw data (graphs, logs, etc) to my private address (see my profile). I'm also interested in your test environment. Could you send me the details of that too, please?
Non-disclosure forbids me to be exact, but the amount of users that Nimbuzz processes is a lot, lot higher than what you're processing. They do have occasional problems, but restarts were typically required after a few weeks, not hours.
We are using LoadRunner 8.1.
We cannot solve the memory leak problem.
Tsung is just a tool.
Why the team don't look at this thread!
moviebat wrote:
Why the team don't look at this thread!
Official Jive Software developers don't work actively on this project anymore, so there is only a group of users who try to fix issues and answer basic questions. It seems that only a small number of users are doing load tests, so there is not much information about this.
We're now using Tsung to load test rather than Grinder and we're getting the same issues.
Does anyone know of an Openfire installation being used in any large-ish scale production environment? We're at the stage where I think we're going to have to abandon Openfire and use something like ejabberd : (
mikeycmccarthy wrote:
(...)
Does anyone know of an Openfire installation being used in any large-ish scale production environment?
Nimbuzz currently uses Openfire. Earlier this year, they passed the 10 million download mark of one of their clients.
I've filed a bug upstream against Empathy too: https://bugzilla.gnome.org/show_bug.cgi?id=598522
Reported to Empathy mailing list now too.
Open-source can be frustrating sometimes. I've now reported this in Ubuntu Launchpad (try upstream), Gnome Bugzilla (try upstream) and now Freedesktop.org. I wish there was only one place to do this.
http://lists.freedesktop.org/archives/telepathy/2009-October/003941.html
If anyone can help with the inner workings of the server for the guys at Telepathy that'd be really useful. Java isn't my strong point.
Hi folks,
I just noticed the same problem with 3.6.4. I was using 3.6.3 before this until the pubsub problem (I've started a thread on this) surfaced. I disabled pubsub in 3.6.4 and gave it a 1024M max heap. With ~200 sessions, the heap was up to 1000M within a few hours. None of the users are on Empathy though.
Thank god I found this thread - ever since I moved to Empathy in Ubuntu 9.10, my server has been crashing with 100% CPU usage. Changed server-to-server setting, increased java memory size, but no good. I just switched back to Linux Spark.
I've been having this problem for awhile, making it more difficult to use Openfire in a Ubuntu environment. Is there any development being made on this?
Most of the people who are responding to this thread are reporting that Openfire runs out of memory (causing OutOfMemoryExceptions) after at least one user started to use the Empathy client. I've created a new bug report for the Empathy issue specifically: OF-82
So far, I've not been able to identify where things go wrong. We need your help!
I would like to receive a couple of thread dumps (or possibly, a memory dump) of an instance of Openfire that is about to run out of memory. If you can provide these dumps, please contact me (contact details can be found in my profile).
Most likely, the memory leak is linked to specific functionality. Are there any clues as to what functionality causes this problem?
I'll set up an Openfire instance in a Solaris zone on our private network specifically to test this. Keen to help if I can.
Cheers,
Dave
Right, I have the following set up on OpenSolaris snv_127
Server Properties
Version: Openfire 3.6.4
Environment
Java Version: 1.6.0_15 Sun Microsystems Inc. -- Java HotSpot(TM) Server VM
Appserver: jetty-6.1.x
OS / Hardware: SunOS / x86
Java Memory 36.20 MB of 494.69 MB (7.3%) used
Using the embedded DB and local accounts only.
I've disabled anonymous login, inbound account registration, and the server-to-server service. All other settings are at defaults.
The only plug-in present is:
Search Provides support for Jabber Search (XEP-0055) 1.4.3
Next step, log in using Empathy on an Ubuntu 9.10 Virtualbox VM to see what happens....
Additionally, a list of all plugins that your system is running (if you didn't provide this list yet) could also be helpful.
I've noticed that a number of users who are experiencing this problem are using the monitoring plugin. This could be a coincidence, of course, but let's check to be sure. Can you reproduce the problem without the monitoring plugin? Can you stop the problem by unloading the monitoring plugin? (Keep an eye on Java-Monitor's memory usage graphs after you do this!)
Similar checks can be used to eliminate other plugins too.
Hi all,
I think I may have started some of this - I'm on holiday at the moment so this may be the last post for a while but just to sum up some of our findings when running quite an intensive load test on Openfire:
- We needed to add the stalled session property that Guus has mentioned. When our clients dropped off for various reasons (out of memory exceptions on clients, network errors etc) Openfire did not deal with it that efficiently.
- We set our heap size wrong on the Openfire server itself, we were under the impression it had more memory than it did so set our heap size too high.
- The settings for the logging of room conversations by default does not really lend itself to heavy MUC usage (but fine for IM). We were generating pretty high traffic (2000 users, something like 3 chats per second). If you look at the source code I believe the default is to log in batches of 50. I can't remember the interval off the top of my head but basically our log of messages stored in memory was way too high. We've now upped the batch size and shortened the interval size and it's behaving much better, logging 250 messages per minute, although as it's not a real SQL batch statement (it's individual inserts) I would imagine the database may be getting a bit of a hammering.
Anyway, I believe our Openfire is up for the time being - thanks for all your help all : )
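The effect of the batch size and flush interval on the in-memory message log can be sketched with some simple arithmetic. This is a toy model, not Openfire's logging code: the one-flush-per-minute interval is an assumption (the post above does not state the real interval), and `backlog_after` is a hypothetical helper. The arrival rate of about 3 chats per second (180 per minute) and the figures of 50 and 250 messages come from the post.

```python
def backlog_after(minutes, arrivals_per_minute, batch_size, flushes_per_minute):
    """Return how many messages are still held in memory after `minutes`."""
    backlog = 0
    for _ in range(minutes):
        backlog += arrivals_per_minute
        # Each flush writes at most batch_size messages to the database.
        backlog -= min(backlog, batch_size * flushes_per_minute)
    return backlog


# Roughly 3 chats per second, as in the load test described above.
arrivals = 3 * 60

default_like = backlog_after(60, arrivals, batch_size=50, flushes_per_minute=1)
tuned = backlog_after(60, arrivals, batch_size=250, flushes_per_minute=1)
print(default_like, tuned)
```

Whenever the logger drains fewer messages per minute than arrive, the backlog grows without bound, which looks like a leak from the outside.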
Hi Guus,
Everything appears fine on the nascent test system I set up (above)... so out of curiosity I logged in to our production server using Empathy 2.28.1.1 from an Ubuntu 9.10 virtual machine. And bingo - it's all on. Empathy indeed appears to be the kiss of death for Openfire.
I can confirm that what I'm seeing is without the monitoring plugin installed.
The production system details:
Version: Openfire 3.6.4
Environment
Java Version: 1.6.0_07 Sun Microsystems Inc. -- Java HotSpot(TM) Server VM
Appserver: jetty-6.1.x
OS / Hardware: Linux / i386
Java Memory 171.62 MB of 253.19 MB (67.8%) used
Other differences from the above system I posted:
- uses an external MySQL DB
- users authenticate using our corporate AD installation
I haven't got JConsole hooked up so am monitoring by refreshing the OF admin console page
Over the course of an hour I have observed JVM memory usage creep from 20 percent with half a dozen logged on users to at the time of writing 80 percent consumed.
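As a rough illustration of what such readings imply, the remaining time before the heap is full can be estimated by linear extrapolation. This is only a sketch: real heap usage is not linear (garbage collection makes it sawtooth), and `minutes_until_full` is a hypothetical helper, not part of any tool mentioned here.

```python
def minutes_until_full(start_pct, end_pct, elapsed_minutes):
    """Linearly extrapolate heap usage; return minutes until 100%, or None if not growing."""
    rate = (end_pct - start_pct) / elapsed_minutes  # percentage points per minute
    if rate <= 0:
        return None
    return (100.0 - end_pct) / rate


# 20 percent to 80 percent over roughly an hour, as observed above.
print(minutes_until_full(20, 80, 60))
```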
All other users are on Spark 2.5.8 or Pidgin. I am the only user with Empathy.
The only plug-ins installed on this system are:
Red5 v0.1.11
Search v1.4.3
We are not using Red5 Sparkweb functionality - i.e we have it set on a trial basis for users if they wish to tinker, but as yet no-one is actually using it.
Another observation - if you really want to exacerbate the behaviour, quit Empathy and fire it back up in rapid succession...
I'm not going to be able to get you thread dumps with this particular setup (sorry), but will return to my test system tomorrow and try and get some concrete data for you.
Cheers!
Dave
EDIT: JVM heap size just hit 90 percent of capacity...so I'm gonna do a quick bounce of the server before anyone notices...
I too am happy to find this thread.
We have a very modest Openfire installation, just 46 users, which ran untended and uninterrupted for hundreds of days until November 5th, when we had the first "out of Java memory" crash. We took the enforced downtime as an opportunity to upgrade to 3.6.4.
It too crashed seven days later - out of Java resources.
Today (five days later) Java memory usage was at nearly 90% (of 1GB) so we restarted Openfire to preempt another crash.
Most of our clients are Pidgin (on Windows & Ubuntu), plus a smattering of OS X iChat. A couple of the Ubuntu users switched to Empathy on release of Ubuntu 9.10 (29 October). This does seem to have coincided with the onset of our problems.
We have requested Empathy users to switch back to Pidgin and are monitoring closely.
Regards,
Hi Guus,
Matter of fact I did get a thread dump on our production system, easy when you know how I guess
I've replied to Gato in this thread:
http://www.igniterealtime.org/community/message/198160#198160
And I'm attaching the dump as well.
In this case I was able to expend JVM memory completely in about five (5) minutes from bringing the OF server up - simply by repeatedly exiting and launching Empathy rapidly. This was with one user (me), with the remaining three or so users (at this late time of the day hehe) using Spark or Pidgin. If I am reading this right then I can easily see how in large deployments several Empathy users would create headaches.
JVM memory status: 252.80 MB of 253.19 MB (99.8%) used
# ps -ef | grep -i java
daemon 30532 1 14 19:05 pts/0 00:01:33 /usr/lib/jvm/java-1.6.0-sun-1.6.0.u7/jre/bin/java -server -Xms128m -Xmx256m -DopenfireHome=/opt/openfire -Dopenfire.lib.dir=/opt/openfire/lib -classpath /opt/openfire/lib/startup.jar -jar /opt/openfire/lib/startup.jar
# kill -3 30532
# cd /opt/openfire/logs/
# more nohup.out
Output in the attached.
Lemme know if this helps at all
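The same steps (find the Java process, then send it signal 3) can be sketched in Python. This is an illustration under assumptions: `find_openfire_pid` and `request_thread_dump` are hypothetical helpers, the `ps -ef` column layout is assumed to match the output shown above, and the sketch only works on Unix-like systems. `kill -3` sends SIGQUIT, which makes a HotSpot JVM print a thread dump to its standard output (here captured in nohup.out).

```python
import os
import signal


def find_openfire_pid(ps_output):
    """Return the PID of the first `ps -ef` line that mentions startup.jar, or None."""
    for line in ps_output.splitlines():
        fields = line.split()
        # In `ps -ef` output the second column is the PID.
        if len(fields) > 1 and "startup.jar" in line and fields[1].isdigit():
            return int(fields[1])
    return None


def request_thread_dump(pid):
    """Send SIGQUIT (the signal `kill -3` sends) to ask the JVM for a thread dump."""
    os.kill(pid, signal.SIGQUIT)


sample = (
    "daemon 30532 1 14 19:05 pts/0 00:01:33 "
    "/usr/lib/jvm/java-1.6.0-sun-1.6.0.u7/jre/bin/java -server "
    "-jar /opt/openfire/lib/startup.jar"
)
print(find_openfire_pid(sample))
```

The sketch deliberately does not call `request_thread_dump` itself; on a live system it would be invoked with the PID found above.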
Same problem here, but we are not using Empathy. Our server only allows connections from Spark and Pidgin clients.
We don't use Sun JDK, but OpenJDK instead
| Java Version: | 1.6.0_0 Sun Microsystems Inc. -- OpenJDK Server VM |
| Appserver: | jetty-6.1.x |
| Host Name: | openfire-im |
| OS / Hardware: | Linux / i386 |
| Locale / Timezone: | en / Central European Time (1 GMT) |
These are our enabled plugins:
Client Control 1.0.3
Email Listener 1.0.0
Kraken IM 1.1.2 (with Yahoo, MSN and Gtalk enabled)
Monitoring Service 1.1.1
Registration 1.4.1
Search 1.4.3
User Import/Export 2.2.0
Out of memory happens randomly. It can take up to 15 days... or 15 hours.
Not necessarily the same problem then - there seems to be a growing body of evidence implicating Empathy in this particular issue, and in my case it appears I have a solid test case I can reproduce this consistently on.
I agree with Dave here. Given the amount of noise surrounding the Empathy client, something must be up there. You are probably not running into the same problem, but into something different with similar effects.
Yes, it could be empathy, of course.
But I have sent you my own problem in order to find a possible pattern not related with empathy.
It probably will be empathy and our problem is another one, but perhaps is a plugin we are all using. I'm only trying to apply a "Dr House differential diagnostic". ;-)
I've also found some exceptions in log regarding email listener and some others related to Gtalk personal groups.
I think there are at least 3-4 different issues being discussed in this thread, and this gets very confusing, for Guus too, I think. It would be better to discuss the Empathy issue in the Empathy thread. Separate threads should be started for the random out-of-memory issues too.
I've changed the thread title to more accurately reflect the issue being discussed here.
I am running Openfire 3.6.2 on Ubuntu Server 8.10.
| Java Version: | 1.6.0_0 Sun Microsystems Inc. -- OpenJDK Server VM |
| Appserver: | jetty-6.1.x |
I started having this same exact issue after I setup a laptop with Ubuntu Desktop 9.10 with the Empathy IM client. Once I saw this post I turned off the Ubuntu laptop and rebooted the Ubuntu Server. The problem has not come back since.
Could this be related? http://www.igniterealtime.org/community/thread/40410?tstart=0
I have noticed those connections too. Empathy queries possible proxy servers, which is why you see a lot of server-to-server connections being created. Although not very nice, it should be of no consequence. I'm running some quick tests to make sure.
This bug has been fixed the day before yesterday by the telepathy-gabble developers. An update should roll out soon.
The fix for http://bugs.freedesktop.org/show_bug.cgi?id=21151 (Should only query SOCK5 proxies when needed) has made it into Ubuntu Jaunty-Proposed repositories now.
It'll stop the random servers being called from Empathy, although doesn't seem related to Guus's PEP theory.
To enable Proposed updates in Ubuntu, go to "Software Sources" in Administration, go to the Updates tab, and select "Proposed". Then reload your sources. telepathy-gabble should then appear in update-manager.
I think I've uncovered the cause of the problem. It appears to be Openfire's implementation of XEP-0163 "Personal Eventing Protocol."
As a workaround, you can disable PEP by adding this Openfire System Property (you can add/modify properties through the Openfire Admin console):
The property xmpp.pep.enabled should be set to: false
Good news! I have added this setting to our server, I will monitor the memory use now. Thanks a lot for your work!
What I forgot to mention: you'll most likely have to restart the server for the setting to take effect.
I've been experiencing memory utilization problems as well: http://www.igniterealtime.org/community/message/197613#197613
I've seen usage of Empathy go up and instead of running out of memory once a month, it's almost daily now. I can make my heap dumps available to the developers if they are interested in profiling it.
A couple of Empathy clients connected, no problem so far; the memory use doesn't show any sudden increase like before. I guess you did it Guus, great job! I will try to add some more Empathy clients to see if it's stable.
Hi Guus,
Are there plans for this to be fixed in the next Openfire release (and if so is there a timescale on this)? Or will we have to stick with disabling the PEP?
Thanks for your help.
Hi Dave,
Sorry that it has taken so long to get back to you.
I've been trying to identify the cause of the problem, but so far, have been unable to do so. I've asked other developers to have a look too, but none of them have been successful either. Sadly, my attempts are severely hindered by lack of time - I'm doing this in my spare time, which is limited.
I'm not comfortable at all releasing the next version of Openfire with this bug in it, but there's going to be a point in the near future where I feel we should be pragmatic, and move forward. The release has been postponed for too long now.
In the meantime, the lead developer of the Empathy client has confirmed that updates have been released that should dramatically reduce the impact of the bug. Thanks, Sjoerd! I haven't tested the new client yet though.
Although I'm having trouble identifying the exact cause, I did manage to make some progress. While investigating, I've discovered a number of smaller and bigger issues in the PEP / pubsub routines of Openfire. There's no obvious direct link between these issues and the memory leak that we've been discussing here, but the issues that I'm addressing now are likely candidates, in the sense that these kinds of (concurrency-related) bugs are known to cause these kinds of problems. I'm currently busy rewriting parts of the PEP routines. I'm hoping that my general improvements make this problem go away (or at least help me to identify the cause). This borders on educated guessing and wishful thinking, but hey, it's the holiday season.
Regards,
Guus
Thanks for the detailed update Guus. Much appreciated. If I could buy you a beer I would
Have a great holiday
Hi, Guus and All!
My configuration:
LDAP (AD) users and groups (around 120 users), rosters are distributed via shared groups, around six various plugins (I've been trying disabling all of them while digging this problem, so the list is not important).
My environment:
Server (current): Openfire 3.7.0 release (started using with Openfire 3.6.4 release and further through some nightly builds and beta).
Clients (current): Miranda IM 0.9.19.0 (started using with Miranda IM 0.9.10.0 and further through stable releases).
I started suffering daily out-of-memory failures after putting 3.6.4 into production; the Java memory configuration was the default, min=max=64 MB. I then tried increasing memory to min=64 MB, max=256 MB, but that only extended the time until memory ran out (to about 2 days). My temporary solution was to schedule a nightly server restart. Even though that was acceptable in my (corporate) environment, it is not good practice for any server.
I've read about Empathy/PEP/Java-memory some time ago, but: firstly, we don't use Empathy at all; and secondly, it was not clear to me what actually PEP is. So, I've been waiting for Openfire 3.7.0 release, hoping that it would solve my memory problem - and when 3.7.0 was finally released - and the problem was not solved - I've started analyzing and digging actively. I must say, that Java Monitor is really a great tool for troubleshooting Java apps & servers (thank Guus for telling about it), it helped to see what's going on inside my server (and keeps telling me that).
When the influence of all of my plugins had been excluded, I recalled this PEP thing and read about it carefully. I found out that it actually overlaps with my other issue, but no one (not even the gurus) had pointed that out to me. Now, I can confirm that disabling PEP solved my long-standing memory exhaustion (leak) problem, and here are a couple of related findings:
1. I think this problem is not limited to Empathy; I suppose it affects any client that supports PEP (i.e. sending/receiving Mood/Activity/Tune information or other "personal events"). This should be pointed out here or somewhere else so that people are clearly aware of it. If clients never actually send "personal events" (i.e. they are not capable of doing so), it doesn't matter whether PEP is enabled; the problem appears only when PEP is actually used. And while the PEP implementation in Openfire is a known memory hog (until the bugs are found and fixed), wouldn't it be a good idea to disable it by default (out of the box)? I guess very few people use it deliberately (and very few clients support it).
2. As far as I understand, setting the property xmpp.pep.enabled to false doesn't actually stop the server from advertising PEP capability; it simply disables processing of events of this type, because after setting this property and restarting the server I still get (while connecting):
<iq type="result" id="****" from="****" to="****@****/****">
<query xmlns="http://jabber.org/protocol/disco#info">
<identity category="server" name="Openfire Server" type="im" />
<identity category="pubsub" type="pep" />
<feature var="http://jabber.org/protocol/pubsub#manage-subscriptions" />...
<feature var="http://jabber.org/protocol/rsm" />
</query>
</iq>
So, is there a way to disable PEP-advertising (i.e. completely disable PEP)? If there is no such way - it would be desirable.
The point is that Miranda IM clients (I don't know about other clients, including Empathy) show the menus for selecting/setting "personal events" only when this capability is advertised by the server; if it's not, those menus simply aren't there (I found this out long ago by connecting to some public non-PEP, non-Openfire servers). So it's not obvious to people why they have these menus but can't set "personal events".
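For anyone who wants to check whether their own server still advertises PEP, the thing to look for in the disco#info result is the pubsub/pep identity shown above. Here is a minimal Python sketch, assuming you have captured the stanza as a string (for example from a client's XML console). The function name advertises_pep and the sample addresses are made up for illustration; nothing here talks to Openfire itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the <query/> element in a service discovery info result.
DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"


def advertises_pep(iq_xml: str) -> bool:
    """Return True if the disco#info result contains a pubsub/pep identity."""
    root = ET.fromstring(iq_xml)
    # <identity/> elements inherit the default namespace of <query/>.
    for identity in root.iter("{%s}identity" % DISCO_INFO_NS):
        if identity.get("category") == "pubsub" and identity.get("type") == "pep":
            return True
    return False


# Hypothetical captured stanza, shaped like the one quoted above.
sample = """<iq type="result" id="x" from="example.org" to="user@example.org/res">
<query xmlns="http://jabber.org/protocol/disco#info">
<identity category="server" name="Openfire Server" type="im" />
<identity category="pubsub" type="pep" />
<feature var="http://jabber.org/protocol/rsm" />
</query>
</iq>"""

print(advertises_pep(sample))
```

If this reports a PEP identity after xmpp.pep.enabled has been set to false, the server is still advertising the capability, which matches what I described above.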
I have attached a patch that should fix this (and related) problems in the JIRA issue (OF-82). I'd be grateful if someone would give it a test.
Anyone?
Your patch did not patch cleanly against the 3.6.4 source. At least not for me.
jsu2,
Did you get the same error I mentioned in the Jira ticket? If so, you need to convert two of the files from CRLF to Unix line endings (dos2unix) and then try to apply the patch.
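If dos2unix isn't installed, the same conversion is easy to do by hand. Below is a rough Python sketch of what the conversion amounts to (replacing CRLF with LF in place). The helper name crlf_to_lf and the demo file name are hypothetical; point it at whichever files the patch complains about.

```python
import tempfile
from pathlib import Path


def crlf_to_lf(path: Path) -> bool:
    """Rewrite the file with LF line endings; return True if it was changed."""
    data = path.read_bytes()
    converted = data.replace(b"\r\n", b"\n")
    if converted == data:
        return False  # already LF-only, leave the file untouched
    path.write_bytes(converted)
    return True


# Demo on a throwaway file (the name is only an example).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "Example.java"
    demo.write_bytes(b"line one\r\nline two\r\n")
    print(crlf_to_lf(demo))
    print(demo.read_bytes())
```

Working on bytes rather than text avoids any guessing about the file's character encoding.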
daryl
I have committed my patch - all of the code is in SVN trunk right now. This should simplify getting things started considerably.
We are having serious memory leak issues that are NOT related to Empathy or PEP. I started another thread for that in case anyone else is experiencing these issues: http://www.igniterealtime.org/community/thread/40791
Thanks
Daniel
I figured out how to capture a memory leak report on the latest build (September 30, 2011). Memory usage was heavily concentrated in PEPService, TaskQueue, and the language modules.
I can confirm that setting the property xmpp.pep.enabled to false prevents a memory leak in Openfire 3.7.
We had some folks using a few different clients (Adium, Pidgin, etc.), and at least one of them was definitely causing a memory leak; we had to restart Openfire at least once a week with a heap size of around 1 GB.
Now that we've put that property in place, it's been smooth sailing ever since. BTW, we have 3 other servers where we tightly controlled which clients were allowed to connect and they did not suffer from the memory leak presumably because the client used didn't use PEP features.
In my opinion, this is a MAJOR bug in Openfire: a client program should not be able to take down your server by triggering a memory leak. The server should correctly handle whatever the client does to prevent this.
I'm rather surprised that this issue has persisted for so long and that it hasn't been addressed...
It was addressed, but this project doesn't have much manpower (developers) to do patches, and this issue seems to be a complicated one. Patches are welcome.
I believe that this is in fact fixed with the changes I made to pubsub about a month ago. The problem was in fact in pubsub, not PEP specifically.
I would encourage you to try the nightly build if at all possible, as I would like to have some help in verifying that it is indeed no longer an issue.
I want to report that the memory leak still exists on 3.7.1 and setting xmpp.pep.enabled=false seems to solve the problem.
Yes, that is already known. I was referring to the next version, the as-yet-unreleased 3.7.2, which is why I mentioned that you would need to download the nightly build.
There are instructions related to that in the bottom of this thread. Unfortunately, the nightly build from the downloads page has not been updated since December, so that isn't of any use.
Not sure if this will help anyone, but I was able to fix the memory issues I was having by pointing Openfire to a different, 64-bit JRE. The OS is RHEL 6.1. Here is my config file from /etc/sysconfig/openfire:
JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre
OPENFIRE_OPTS="-Xms256m -Xmx1024m -Xss128k -Xoss128k -XX:ThreadStackSize=128 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:NewRatio=2 -XX:+PrintGCDetails -Xloggc:/opt/openfire/logs/gc.log"
I run Openfire 3.7.0. Other than a few administrative changes yesterday, I haven't had to restart Openfire in many, many months since switching to that JRE. I keep the GC log enabled since it barely takes up any space (now that there are no memory errors). That was how I was able to figure out where the problem was originally.