Re: Openfire 3.9.3 severe memory leak?!

jmap -dump:format=b,file=/1gb-fs/file.bin openfire-pid

Maybe you need to run this as the daemon user.

Would openfire-pid be the Java PID? I’m running Openfire as a daemon (openfired), which runs /bin/openfire.sh.

jmap is not recognized as a command. Anyway, I wasn’t able to take a dump. I’ve already downgraded.

jmap is included in the JDK. I assume you use the JRE bundled with Openfire.
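If you have a full JDK available, something like this usually works (the JDK path below is just an example, adjust it to your install, and take the dump as the same user Openfire runs under):

ps aux | grep -i openfire                                        # note the PID of the Java process running Openfire
/opt/jdk/bin/jmap -dump:format=b,file=/1gb-fs/file.bin <pid>     # use that PID here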

Actually, I’m using the JRE bundled with the Spark .tar.gz version. Arch Linux removed Oracle’s Java from its repositories, so I decided to update my Java manually by just replacing the old java folder.

I believe the change for OF-764 may be adversely affecting memory utilization in some circumstances. Can you confirm whether you have significant MUC traffic (or history) with rooms that have conversation logging enabled?

The change I made was to load full history for MUC rooms by default rather than limiting it to two days, and also provided a new system property (“xmpp.muc.history.reload.limit”) to optionally limit the history to a given number of days. I am planning to modify this slightly for the next release to restore the original limit of two days by default, while still allowing an override via the system property.
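Roughly, the idea is along these lines (a simplified sketch, not the exact code that shipped; the property name is real, but the surrounding names and the default value are only for illustration):

import java.util.Date;
import org.jivesoftware.util.JiveGlobals;

public class HistoryReloadSketch {
    // a negative value means "no day limit, load the full history" (the 3.9.3 behavior)
    static Date reloadCutoff() {
        int days = JiveGlobals.getIntProperty("xmpp.muc.history.reload.limit", -1);
        if (days < 0) {
            return null; // no cutoff: everything in the conversation log is reloaded
        }
        return new Date(System.currentTimeMillis() - days * 24L * 60L * 60L * 1000L);
    }
}

Rows older than that cutoff (when one is set) would simply be skipped when the room history is repopulated at startup.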

Note that this issue would only affect deployments that use MUC rooms with conversation logging enabled.

We don’t actually have high usage of rooms. There are only 3 persistent rooms which are rarely in use (and probably a few sporadic dynamic rooms now and then). The service is configured to show the last 15 messages, and the default setting is to log rooms. Actually, I like the OF-764 change, because without it the room would show nothing after a few days (the 2-day limit). Now it shows the last 15 messages no matter how many days have passed. Changing this setting back is OK, but it won’t help those who need it, and they will face the memory hog. Though I doubt it is related.

We didn’t use rooms and still saw the issue. Strangely enough, Messages on OS X Mavericks became completely unusable and had to be force quit.

OK - agree, this seems unrelated to the MUC change.

[FYI when we do ship the next update, you will still be able to set a limit for the number of days (180, 365, 1000) you’re willing to load using the new system property (“xmpp.muc.history.reload.limit”). However, for existing installations it seems prudent to keep the original default in place. Daryl mentioned one test system he had that had a very large MUC history, and Openfire failed to start (after 90 minutes or so) because it was trying to load everything into memory.]

In any case, seems like we still have a memory consumption issue to deal with. I’ll be nosing around, but feel free to post any additional clues you might come up with.

Wondering if this feature could be more flexible/dynamic. Maybe it could check the “show x messages” setting of the service and load only that number of messages into memory, instead of the whole 1000-day history, when perhaps only the last 15 messages are actually pushed to the client. Then starting the server would only be a problem for the admin who sets “show 1000 messages”.
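Something along these lines (hypothetical names, not the real API, just to illustrate the idea):

// cap what gets reloaded at startup by the room's "show x messages" setting
static int rowsToLoad(int maxShownSetting, int storedHistoryCount) {
    // e.g. maxShownSetting = 15 in my case, even if 1000 days of history exist in the database
    return Math.min(maxShownSetting, storedHistoryCount);
}
// only the newest rowsToLoad(...) rows would then be read from the conversation log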

Can we treat this report as a real issue in the code and not a setup-related problem? I could not find a related bug in the issue tracker. Is a heap dump available for debugging purposes? I am reluctant to move my user base to 3.9.3 and be hit by this bug. On the other hand: if we do not have a suitable heap dump, this might be a way to go…

There is no ticket or heap dump available. As usual with memory leaks, it’s very hard to pinpoint the cause. So you can just wait, or try upgrading and watch memory consumption (I’ve been checking JVM peak memory once per day; it grew to 400 MB in a few days and the server stalled after 6 days, whereas on 3.9.1 it ran fine for 30 days, though it still came to a stall after a month). There was a leak in 3.9.1 (or even an earlier version), but now it is leaking faster. Maybe it is even the same leak, but its effect is now tripled because of some recent changes in the code.

Maybe we should file a ticket for this? Though I’m not sure which versions to list as affected. As I said, the problem started long ago (at least for me), though I can’t point to exactly when it started, or whether it started after one of the Openfire upgrades almost a year ago. It coincided with our move to Spark, so I’m thinking it could be related to some Spark activity which, at a larger scale, drains the server’s memory.

Ryan, what client do you use, and how many users do you have?

We are an overwhelmingly Mac environment (Mavericks 10.9.3) with a few PCs running Pidgin.

I know that OpenFire 3.9.1 didn’t have this memory leak issue. I didn’t try 3.9.2 as 3.9.3 was already out when I went to upgrade.

Nobody replied, so I went ahead and created a Blocker ticket for this: OF-813.

Is this maybe related to this report?

Do you see a huge amount of (dead?) sessions when memory is high?

Can’t tell now (using 3.9.1 on the production server), though I don’t remember seeing an unusual session count when looking at the Sessions tab occasionally. It will probably be hard to reproduce on a test server.

Also, our users use only one client, so there is almost no chance of them having multiple resources in the session list.

Follow-up: this Monday I moved my server to a virtual Windows Server 2008 R2 x64 host (with 2 GB of RAM). I have installed Openfire 3.9.3 for a test, running as a service, and I’m using the latest Java 1.8.0_20 x86. I have also set vmoptions for the service to 512-1024 MB. So far everything runs smoothly. At first, JVM memory climbed to 300-400 MB and I thought it would run out of memory the next day, but the next day it was just 100-200 MB, and it usually stays around 200-300 MB. Around 150 connected users every day.
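In case it helps anyone: on my setup the vmoptions are just a plain text file in the bin folder next to the service executable, named openfire-service.vmoptions if I remember correctly, with one JVM option per line:

-Xms512m
-Xmx1024m

Double-check the exact file name for your install; it differs between the service and the launcher.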

I think maybe this issue is platform-related. Maybe on Linux it behaves a bit differently. Or maybe the vmoptions helped (on Linux I had them set to 256-768, I think). Or maybe the latest Java is helping. On my old Arch Linux box (also virtual) I had problems updating Java since they removed it from the repositories, so I was running some 1.7.0.0 version. Also, the Arch install itself was old, as I couldn’t keep up with them fundamentally changing how the system starts and works (one of the reasons I ditched it). Anyway, I will continue to watch memory usage and report if memory problems arise again. Last time, on the old server, Openfire 3.9.3 halted after running half a day or less. Maybe that also explains why there were so few memory leak selections in the recent poll.

Another bonus is that logs are now working (on Linux there is some bug with them; Daryl found and fixed it in some version, but it still wasn’t working for me). I see lots of warnings that the roster and vcard caches are reaching their limits and being reduced to 90%. I’m planning to try increasing these caches a bit, which might also affect memory usage.
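If I understand it right, the cache limits are plain system properties with sizes in bytes. I still need to double-check the exact property names on the Cache Summary page before changing anything, but from what I’ve read they look roughly like the following (the values are only examples, not a recommendation):

cache.username2roster.size = 2097152
cache.vcardCache.size = 1048576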

Another observation: installing and removing a plugin drops JVM memory drastically, to 50-60 MB, and then it starts to grow again slowly. I’m thinking maybe one of the plugins is responsible for the leak. I’m now only using Broadcast, Client Control and ServerInfo. On the old server I was probably also using Just married and the Kraken XMPP gateway at some point, though not recently. Not sure which plugins were in use when I experienced the leak.