Openfire hangs after X days; requires restart

Hello all,

I am hoping to get some direction with my Openfire installation. I have a CentOS 6.3 x64 install that runs, among other things, Openfire 3.7.1. I have approximately 150 users, but only 50 or 60 are connected at any one time. The problem is, after so many days, Openfire hangs and everyone loses their connection. The only way I can get it up and running again is to kill the java process Openfire runs in, remove the openfire pid file, and restart Openfire.

I’ve tried a few things. One, I removed the webchat plugin, because it was producing error messages in the log files and I am not currently using it. That didn’t help. Two, I switched Openfire from the included 32-bit JRE to the 64-bit OpenJDK Java environment that came with the operating system (the system default). That didn’t help either. Three, I increased the memory available to the JVM to more than a gigabyte. Unfortunately, I’m still having trouble.

Here are my startup parameters in /etc/sysconfig/openfire:

# Set this to the path where openfire lives.
# If this is not set the script will look for /usr/local/openfire, then
# /opt/openfire.
OPENFIRE_HOME="/opt/openfire"

# If there is a different user you would like to run openfire as,
# change the following line.
OPENFIRE_USER="daemon"

# If you wish to change the location of the openfire pid file,
# change the following line.
OPENFIRE_PIDFILE="/var/run/openfire.pid"

# If you wish to set any specific options to pass to the JVM, you can
# set them with the following variable.
OPENFIRE_OPTS="-Xmx1024m"

# If you wish to override the auto-detected JAVA_HOME variable, uncomment
# and change the following line.
JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/jre/

This is what I was seeing in my Openfire error log:

2012.11.11 07:20:11 org.jivesoftware.openfire.container.PluginManager - Error loading plugin: /opt/openfire/plugins/webchat

java.lang.NoClassDefFoundError: org/mortbay/jetty/servlet/Context

at java.lang.Class.getDeclaredConstructors0(Native Method)

at java.lang.Class.privateGetDeclaredConstructors(Class.java:2406)

at java.lang.Class.getConstructor0(Class.java:2716)

at java.lang.Class.newInstance0(Class.java:343)

at java.lang.Class.newInstance(Class.java:325)

at org.jivesoftware.openfire.container.PluginManager.loadPlugin(PluginManager.java:420)

at org.jivesoftware.openfire.container.PluginManager.access$300(PluginManager.java:80)

at org.jivesoftware.openfire.container.PluginManager$PluginMonitor.run(PluginManager.java:1067)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)

at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

at java.lang.Thread.run(Thread.java:679)

Caused by: java.lang.ClassNotFoundException: org.mortbay.jetty.servlet.Context

at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:205)

at java.lang.ClassLoader.loadClass(ClassLoader.java:321)

at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

… 16 more

I have since removed the plugin, however, and I’m still seeing the Openfire connection drop. I have wondered if there is a memory leak somewhere, but I’m not terribly familiar with administering Java, so I’m not sure how I would investigate that.
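As a rough sketch of how heap usage could be watched on a setup like this, the JDK tools jstat and jstack can be pointed at the pid recorded in the Openfire pid file. The function names below are hypothetical, and the sketch assumes that the JDK tools are installed (on CentOS they typically ship in the OpenJDK -devel package rather than with the JRE) and that the commands are run as root or as the user Openfire runs as.

```shell
#!/bin/sh
# Assumption: this path matches OPENFIRE_PIDFILE in /etc/sysconfig/openfire.
OPENFIRE_PIDFILE="${OPENFIRE_PIDFILE:-/var/run/openfire.pid}"

# Print the pid of the Openfire JVM; fail if the pid file is missing or empty.
openfire_pid() {
    [ -s "$OPENFIRE_PIDFILE" ] || return 1
    cat "$OPENFIRE_PIDFILE"
}

# Hypothetical helper: sample heap and GC utilisation every 5000 ms, 12 times.
sample_heap() {
    pid=$(openfire_pid) || { echo "no usable pid file: $OPENFIRE_PIDFILE" >&2; return 1; }
    jstat -gcutil "$pid" 5000 12
}

# Hypothetical helper: print a thread dump, best taken while the server is hung
# and before the process is killed.
dump_threads() {
    pid=$(openfire_pid) || { echo "no usable pid file: $OPENFIRE_PIDFILE" >&2; return 1; }
    jstack "$pid"
}
```

If the O (old generation) column reported by jstat stays close to 100 while the full GC count (FGC) keeps rising, the heap is probably exhausted. A thread dump taken during the hang helps tell that case apart from threads that are blocked on something else.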

One thing I need to mention is that I am using MySQL for storage.

I also see this in my warning log (does this indicate some kind of a problem?):

2012.11.12 10:34:22 org.jivesoftware.util.cache.DefaultCache - Cache Roster was full, shrinked to 90% in 0ms.

I’m not seeing anything else in my logs that looks suspect, so I’m at a loss at this point. Any help is most appreciated.

There is a known memory leak (check the Announcements at the top of the community home page) related to PEP. You should disable it if you are not actually using it.

Some discussion here.
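To confirm whether a leak is really what brings the server down, it may help to let the JVM record evidence itself. The following is a sketch of how OPENFIRE_OPTS in /etc/sysconfig/openfire could be extended with standard HotSpot/OpenJDK options. The log directory is an assumption based on OPENFIRE_HOME=/opt/openfire, and it needs to be writable by the user Openfire runs as.

```shell
# Assumption: Openfire lives in /opt/openfire and can write to its logs directory.
OPENFIRE_LOGS="/opt/openfire/logs"

# Keep the existing 1 GB heap limit.
OPENFIRE_OPTS="-Xmx1024m"

# Write a heap dump (.hprof) when the JVM throws an OutOfMemoryError.
OPENFIRE_OPTS="$OPENFIRE_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$OPENFIRE_LOGS"

# Log every garbage collection with details and timestamps.
OPENFIRE_OPTS="$OPENFIRE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:$OPENFIRE_LOGS/gc.log"
```

After the next hang, a GC log that ends in back-to-back full collections reclaiming almost nothing, or a heap dump file in the logs directory, would point to heap exhaustion. If neither shows up, the cause is more likely somewhere else.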