Clustering plugin for Openfire is now open source

We are happy to announce that the clustering plugin is now available as an open source plugin. The clustering plugin adds support for running multiple redundant Openfire servers together in a cluster. By running Openfire in a cluster, you can distribute the load among several servers and gain redundancy in the event that one of your servers dies.

By making this functionality open source, we have now made 100% of the old Enterprise plugin open source. The clustering plugin came last because it relies on Oracle Coherence, which is a commercial product, so making it open source was a little tricky. In the end, we open sourced the functionality we implemented, but to use this plugin you will need a valid Oracle Coherence license. The readme file explains the steps to follow to install this plugin. It also explains how to set up your environment if you plan to develop new versions of the plugin.

Have fun,

– Gato

That is great to hear Gato. I’m excited to see how all the other clustering efforts pan out as well.

There are several Oracle Coherence flavors available. Which one is recommended here?

We used to bundle the Enterprise Edition. But I would say that the other 2 editions (Standard and Grid) should also work fine.

Sounds good. I’m cheap, so the Standard edition looks like a good place to start. By the way, I’ve set up the 1st node with the free development version, and I get the following exception when clicking on the Users/Groups tab in the admin UI.

HTTP ERROR: 500

org.jivesoftware.util.cache.DefaultCache cannot be cast to com.jivesoftware.util.cache.ClusteredCache

RequestURI=/user-summary.jsp

Caused by:

java.lang.ClassCastException: org.jivesoftware.util.cache.DefaultCache cannot be cast to com.jivesoftware.util.cache.ClusteredCache
     at com.jivesoftware.util.cache.CoherenceClusteredCacheFactory.getLock(CoherenceClusteredCacheFactory.java:351)
     at org.jivesoftware.util.cache.CacheFactory.getLock(CacheFactory.java:363)
     at org.jivesoftware.openfire.spi.PresenceManagerImpl.loadOfflinePresence(PresenceManagerImpl.java:526)
     at org.jivesoftware.openfire.spi.PresenceManagerImpl.getLastActivity(PresenceManagerImpl.java:155)
     at org.jivesoftware.openfire.admin.user_002dsummary_jsp._jspService(user_002dsummary_jsp.java:361)
     at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
     at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
     at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1093)
     at com.opensymphony.module.sitemesh.filter.PageFilter.parsePage(PageFilter.java:118)
     at com.opensymphony.module.sitemesh.filter.PageFilter.doFilter(PageFilter.java:52)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
     at org.jivesoftware.util.LocaleFilter.doFilter(LocaleFilter.java:66)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
     at org.jivesoftware.util.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:42)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
     at org.jivesoftware.admin.PluginFilter.doFilter(PluginFilter.java:70)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
     at org.jivesoftware.admin.AuthCheckFilter.doFilter(AuthCheckFilter.java:146)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1084)
     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360)
     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:726)
     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206)
     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
     at org.mortbay.jetty.Server.handle(Server.java:324)
     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:829)
     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:514)
     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
     at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)

Nice release Gato.

At least one can look at the plugin implementation and modify it to work with Terracotta or any other free clustering solution.

LG

This is a very welcome release for the Openfire community, thank you!

One clarifying question I have though… does each server that’s going to join the cluster connect to their own instance of the database, or do they all connect to the same instance? The documentation and various posts in the forums don’t make this very clear.

jerwillk, from what I’ve read on the forums, all the nodes of an Openfire cluster connect to the same database. If you cluster the database as well, you also avoid a single point of failure.

Does anybody know how to fix the casting issue mentioned above? I get it when clicking on the Users/Groups tab and also when somebody tries to connect to my server.

Hey Yuriy,

You are 100% correct about the database setup. All nodes need to be connected to the same DB. Running a clustered DB avoids a single point of failure, if you have that requirement.
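A quick way to sanity-check the shared-database point is to compare the JDBC URL each node is configured with. The sketch below is only an illustration: it assumes each node's conf/openfire.xml keeps the URL under database/defaultProvider/serverURL (check your own file, since the layout may differ), and the function names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumed location of the JDBC URL inside conf/openfire.xml (relative to the
# root element). Verify against your own file before relying on it.
URL_PATH = "database/defaultProvider/serverURL"


def database_url(openfire_xml_path):
    """Return the JDBC URL found in one node's openfire.xml, or None if absent."""
    root = ET.parse(openfire_xml_path).getroot()
    node = root.find(URL_PATH)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def same_database(config_paths):
    """True only if every config has a URL and all of the URLs are identical."""
    urls = {database_url(path) for path in config_paths}
    return None not in urls and len(urls) == 1
```

Copy each node's openfire.xml to one machine and pass the list of paths to `same_database`. A `False` result means at least one node points somewhere else or has no URL at the assumed location.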

About the ClassCastException: I think I saw it when I quickly tried the very latest version of Coherence. In readme.html I mention the version I tested clustering with, and that one works fine. Having said that, IIRC I had to remove an XML file from a Coherence jar file. When I have some free time I will post instructions on how to modify the jar file and/or fix that ClassCastException.

– Gato

Just wanted to note for those who may be struggling with the 3.5 Oracle Coherence Java libs that they are not a straight drop-in replacement for the 3.3 libs. I tried to build the plugin using the latest 3.5 libs and it failed. Based on the build errors and a review of the API docs for 3.3 and 3.5, the changes seemed to be a bit more than I was prepared to address and test at this time. I ended up pulling the 3.3 libs from one of our existing clustered setups to get the plugin to compile, since I couldn’t find the 3.3 version of the libs on Oracle’s site anywhere.

Not sure if this has any bearing, but we’re also running an older 3.6 OF build and are not experiencing the 500 errors as noted above.

The clustering plugin seems to be working fine, although we haven’t finished testing yet.

Thanks Gato for releasing the clustering plugin code. It will certainly help with our efforts going forward.

Thomas, would you be able to share the 3.3 libs somewhere, please? I can’t get hold of them anywhere on the website either.

I’m with Yuriy on this one.

Are they available somewhere else? It seems Oracle only offers their latest 3.5.2 release.

Yes, being able to look at the base implementation here has helped speed up the development of the clustering plugin we have been working on.

I have started a new branch for the JBoss Cache clustering solution. It is available here:

http://github.com/macdiesel/openfire-jboss-clustering/tree/optimization

I’ve run load tests today against a 3 node cluster connecting over 63,000 users. It’s starting to look very promising and with some additional work could be a great contribution to the community.

About the ClassCastException: last weekend I was playing with Coherence 3.5.2 and the plugin, and I figured out what needs to be done. I will update the documentation to explain how to install brand new coherence.jar files and use them with the plugin. Follow these steps for now:

  • Download Coherence’s jar files as explained in the readme.html
  • Back up coherence.jar (since we are going to modify it)
  • Edit coherence.jar and remove the coherence-cache-config.xml file
  • Use the modified coherence.jar in Openfire’s lib folder
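For anyone who would rather script the backup and edit steps than use an archive tool, here is a minimal sketch. It assumes coherence-cache-config.xml sits at the root of coherence.jar (the thread does not say where in the jar it lives), and `strip_entry` is a name made up for this example. A jar is a zip archive, so the standard zipfile module can rewrite it without the entry.

```python
import shutil
import zipfile
from pathlib import Path

# Entry named in the steps above; assumed to sit at the root of the jar.
ENTRY = "coherence-cache-config.xml"


def strip_entry(jar_path, entry=ENTRY):
    """Back up the jar (once), then rewrite it without `entry`.

    Returns True if the entry was present and removed, False otherwise.
    """
    jar_path = Path(jar_path)
    backup = jar_path.with_name(jar_path.name + ".bak")
    if not backup.exists():
        # Keep the untouched original; never overwrite an earlier backup.
        shutil.copy2(jar_path, backup)

    tmp = jar_path.with_name(jar_path.name + ".tmp")
    removed = False
    with zipfile.ZipFile(jar_path) as src, \
            zipfile.ZipFile(tmp, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == entry:
                removed = True  # skip this entry, copy everything else
                continue
            dst.writestr(item, src.read(item.filename))
    tmp.replace(jar_path)  # swap the rewritten jar into place
    return removed
```

Calling `strip_entry("coherence.jar")` leaves a `coherence.jar.bak` next to the modified jar. Copying the modified jar into Openfire's lib folder is still a manual step.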

Hope that helps. Let me know how it went.

Regards,

– Gato

Works like a charm!

Thanks Gato

So, what does it take to add a new node to a cluster? Just create another instance of Openfire and make sure that it uses the same database as the primary cluster node?

Gato, do you still need to use connection managers in conjunction with the clustering solution, or would this be excessive and counterintuitive to the whole notion of clustering?

Thanks Gato. Now clustering is working, but users can’t connect to any server.

“Can’t connect to server: invalid name or server not reachable”

What may be wrong?

Same result here. Apparently it is working, but you can’t connect because none of the Openfire servers in the cluster is listening on the XMPP ports anymore…

I tried restarting it (with the clustering option deactivated via openfire.xml) and then turning clustering back on. That way it will listen for XMPP requests, but it gives an error 500 and a null pointer exception as soon as you log in with a user (using any XMPP client).
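To tell "nothing is listening" apart from a client-side problem, a plain TCP probe of each node can help. This is a rough sketch, not an Openfire tool: the function names are made up, and the ports are the standard registered XMPP ones (5222 for clients, 5269 for server-to-server), which your install may have changed.

```python
import socket

# Standard XMPP ports; an assumption here, adjust if your servers use others.
DEFAULT_PORTS = (5222, 5269)


def is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or name not resolvable.
        return False


def check_nodes(hosts, ports=DEFAULT_PORTS):
    """Map each (host, port) pair to whether something accepts connections."""
    return {(host, port): is_listening(host, port)
            for host in hosts for port in ports}
```

Running `check_nodes` with your own node hostnames before and after enabling clustering shows whether the listeners really went away. A `True` only means the port accepts TCP connections, not that login works.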

Thank you. This helped me to get started!

Nicely done, Xenz. I have done pretty much the same as you, following your instructions: a couple of VMs running Debian 5.0 with Openfire installed, with the database hosted on another machine.

It works great: users see each other, messages arrive, and the caches work flawlessly.

Anyway, I wonder if step 8 is mandatory. I already noticed that the coherence-cache-config.xml included in Coherence and the one included in the plugin are completely different, but I suppose that the plugin works with the XML provided with the clustering plugin, which overrides the one in the JAR.

I hope someone can shed a bit of light on this.

Thanks to all for the nice work.

Sergio,

You are right. Step 8 is not required. However, as I said earlier in this thread, this is what you have to do:

    • Edit coherence.jar and remove the coherence-cache-config.xml file
    • Use the modified coherence.jar in Openfire’s lib folder

Coherence will find and use the coherence-cache-config.xml file that we provide as part of the clustering plugin.
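If you want to confirm that no stray copy of the file is left in the jars you deployed, a small scan like the one below can list the offenders. It is only a sketch: `jars_containing` is a made-up helper, it looks only at jar files directly inside the folder you give it, and it assumes the entry sits at the root of the jar.

```python
import zipfile
from pathlib import Path


def jars_containing(lib_dir, entry="coherence-cache-config.xml"):
    """Return the names of the *.jar files in lib_dir that contain `entry`."""
    hits = []
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            if entry in zf.namelist():
                hits.append(jar.name)
    return hits
```

Pointed at Openfire's lib folder after the edit, the result should no longer include coherence.jar.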

Regards,

– Gato