1.2.1_beta Clustering Plugin

With the newest release of Openfire (3.7.0), the clustering plugin no longer works. That is understandable, but it made my life pretty difficult, so I decided to dive in and get it working with 3.7.0. Attached is the JAR file that I’ve compiled to get clustering working once again. However, I am still seeing some uncommon issues, which I’ve described below (so use the plugin at your own risk).

Notes:

Note that this has been tested in a Linux environment only! I don’t believe that will be a huge issue, but further testing by the user community will tell.

Pre-req’s:

There are a few pre-req setup steps you must complete before getting this up and running:

  • You must take the “tangosol-coherence-override.xml” file found within the plugin and place it in the root of your server (example: “/” on Linux); otherwise Coherence will use its default configuration, which yields Openfire errors.
  • You must take the “coherence-cache-config.xml” file found within the plugin and place it in Openfire’s {openfire_home}/lib directory; otherwise Coherence will use its default configuration, which yields Openfire errors.
    Neither of the above pre-req’s is desirable. If anyone has advice on how to get Coherence to use the existing XML files already within the plugin, without having to push them to other locations on the server, that would be awesome. I am guessing the reason this is needed has something to do with how Openfire currently loads classes into its associated plugin classloaders?
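One avenue that may be worth trying for the question above: Coherence documents two JVM system properties, tangosol.coherence.override and tangosol.coherence.cacheconfig, that point it at explicit configuration files. The sketch below is an illustration under assumptions, not something tested with this plugin: the class name CoherenceConfigArgs and the default plugin directory are hypothetical, and whether Openfire 3.7.0 with this plugin honours the properties is unverified. It only builds the -D arguments and warns about missing files.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds JVM arguments that point Coherence at
// explicit config files instead of relying on copies in "/" or lib/.
class CoherenceConfigArgs {

    static final String OVERRIDE_FILE = "tangosol-coherence-override.xml";
    static final String CACHE_CONFIG_FILE = "coherence-cache-config.xml";

    // Returns the two -D arguments for files expected inside configDir.
    static List<String> jvmArgs(File configDir) {
        List<String> result = new ArrayList<String>();
        result.add("-Dtangosol.coherence.override="
                + new File(configDir, OVERRIDE_FILE).getAbsolutePath());
        result.add("-Dtangosol.coherence.cacheconfig="
                + new File(configDir, CACHE_CONFIG_FILE).getAbsolutePath());
        return result;
    }

    public static void main(String[] argv) {
        // Assumed location of the unpacked plugin; adjust to your install.
        File dir = new File(argv.length > 0 ? argv[0] : "/opt/openfire/plugins/clustering");
        for (String name : new String[] {OVERRIDE_FILE, CACHE_CONFIG_FILE}) {
            if (!new File(dir, name).isFile()) {
                System.err.println("warning: " + name + " not found in " + dir);
            }
        }
        for (String arg : jvmArgs(dir)) {
            System.out.println(arg);
        }
    }
}
```

The printed arguments would then be added to the Java options used to start Openfire on each cluster node; how those options are set depends on the install.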

Issues:

  • The sessions page at times (quite randomly, in fact) complains that it cannot find information held within the session class (address information). Although this might simply be a rendering issue, it does suggest that there could be synchronization issues with Coherence. BUT we have not had any user complaints, so I am going to weigh this against the possibility that the session either expired or that something else under the Openfire hood is simply not being caught. I’d love some feedback on this one.

This plugin has been running in our production environment for a few weeks, in a cluster of 4 Openfire servers.

I hope this helps those who need this plugin and prompts further feedback on the issues I’ve described.

Also, since this is a pre-existing plugin I am more than happy to commit the source to the repo for review. What’s the process to get something like this underway?

Cheers,

-Pat
p1.zip (421 Bytes)
clustering.jar (121437 Bytes)
clustering_src.zip (94100 Bytes)


Hi Pat,

We are desperate for more Openfire developers and will do just about anything to get more people committing code. Do you have a Jira account? If so, a first step is to get you elevated perms there and then review a few patches, and voilà! Sounds good?

daryl

Sounds great Daryl, sign me up!

I encourage people that have issues with this plugin to let me know what they experience. I will be happy to help investigate wherever possible.

Hi,

Great! What’s your Jira account?

daryl

patweb99 is my account.

You’re all set. Open some tickets, submit a few patches!

Hello Pat.

What version of Coherence do you use? I use Coherence 3.4.2b411 but get this error:

ERROR com.jivesoftware.util.cache.CoherenceClusteredCacheFactory.startCluster(CoherenceClusteredCacheFactory.java:161) - Unable to start clustering - continuing in local mode

(Wrapped: Failed to load the factory) java.lang.ClassNotFoundException: com.jivesoftware.util.cache.JiveConfigurableCacheFactory

at java.net.URLClassLoader$1.run(URLClassLoader.java:202)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:190)

at java.lang.ClassLoader.loadClass(ClassLoader.java:307)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)

at java.lang.ClassLoader.loadClass(ClassLoader.java:248)

when enabling cluster.
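The trace above shows the lookup going through sun.misc.Launcher$AppClassLoader, the JVM's application class loader. A possible reading is that the class lives in a plugin class loader that the application loader cannot see, but that is an assumption. The sketch below is a generic diagnostic using only JDK calls; the class name ClassVisibility is hypothetical. It reports which loader, if any, can resolve a class name.

```java
// Diagnostic sketch: checks whether a class name is visible to a loader.
class ClassVisibility {

    // Returns the name of the defining loader's class, "bootstrap" for
    // core JDK classes, or null when the given loader cannot see the class.
    static String whoLoads(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running static init.
            Class<?> c = Class.forName(className, false, loader);
            ClassLoader defining = c.getClassLoader();
            return defining == null ? "bootstrap" : defining.getClass().getName();
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "com.jivesoftware.util.cache.JiveConfigurableCacheFactory";
        System.out.println("system loader:  "
                + whoLoads(name, ClassLoader.getSystemClassLoader()));
        System.out.println("context loader: "
                + whoLoads(name, Thread.currentThread().getContextClassLoader()));
    }
}
```

Run standalone, this only reflects the classpath of that JVM; to be meaningful it would have to run with the same classpath Openfire is started with.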

hi santora

Your JAR file is compiled with JDK 1.7*. How can I use it with JDK 1.6*? Can you compile it for me, or provide an address where we can download the source?

Hi Pat

I am currently testing your beta version of the plugin.

The problem I have been experiencing with the Openfire clustering plugin is the following.

First: we use the PubSub feature of XMPP heavily. A server publishes to up to 10 PubSub topics every 10 seconds, and sometimes in a burst.

Openfire itself can handle this well, but the problems start when I use clustering.

The Setup

2 node cluster (node1 and node2)

1 loadbalancer (round robin setup to node1 and node2)

1 pubsub server that connects to node1

Clients: a ramp-up test with up to 2500 clients, each subscribing to 1 or 2 pubsub topics.

The problem.

50% of clients connect to node1, 50% connect to node2.

Ramping up to 2500 users, everything runs fine. When I start to remove the users, the problem starts.

The problem seems to be the following (it can be seen when debug mode is enabled).

node1, which has the connection from the PubSub server, always tries to send the pubsub notifications to the clients on node2, even after they have disconnected. This leaves node1 with heavy CPU usage. It seems that node1 still holds the sessions of clients that have disconnected from node2.
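The suspected behaviour can be pictured with a toy model. The sketch below is not Openfire's routing code; StaleRouteDemo and its methods are invented for illustration, and it assumes the cause really is a disconnect on node2 that never reaches node1's routing view. Under that assumption, each publish walks every stale route and fails once per departed client.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: node1's view of which cluster node hosts each client.
class StaleRouteDemo {

    private final Map<String, String> routes = new HashMap<String, String>(); // JID -> node
    private final Set<String> online = new HashSet<String>();                 // JIDs actually connected

    void connect(String jid, String node) {
        routes.put(jid, node);
        online.add(jid);
    }

    // propagated=false models a disconnect that node1 never learns about.
    void disconnect(String jid, boolean propagated) {
        online.remove(jid);
        if (propagated) {
            routes.remove(jid);
        }
    }

    // Attempts one delivery per known route; returns how many failed.
    int publish() {
        int failed = 0;
        for (String jid : routes.keySet()) {
            if (!online.contains(jid)) {
                failed++;
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        StaleRouteDemo node1 = new StaleRouteDemo();
        for (int i = 0; i < 100; i++) {
            node1.connect("user" + i + "@example.org", "node2");
        }
        for (int i = 0; i < 100; i++) {
            node1.disconnect("user" + i + "@example.org", false);
        }
        System.out.println("failed deliveries per publish: " + node1.publish());
    }
}
```

With publishes arriving every 10 seconds on up to 10 topics, the failed attempts in such a model grow with the number of stale routes, which would match the CPU load described.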

Can you provide support for the clustering plugin?

Can you share the source of your clustering plugin?

Hi,

The trunk repo is situated at: http://svn.igniterealtime.org/svn/repos/openfire/trunk/src/plugins/clustering/ .

I’ll have a look at the session code; something might come up… :slight_smile:

BTW: have you tried configuring the node? XEP-0060 (pubsub) supports different configurations for the sessions/users and publishing of events.

E.g. pubsub#purge_offline and pubsub#presence_based_delivery could be important pointers to this problem. :wink:

The latter determines whether to deliver notifications to available users only.
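For reference, XEP-0060 lets a node owner submit configuration options as a data form inside an IQ to the pubsub service. The sketch below only builds that stanza as a string; it is not tied to any XMPP library, the class name PubSubConfigStanza and the node name example_node are placeholders, and whether Openfire 3.7.0 acts on the option is not something this sketch can confirm.

```java
// Sketch: builds an XEP-0060 owner request that submits one node
// configuration option. Service and node names are placeholders.
class PubSubConfigStanza {

    // Escapes the characters that are unsafe inside XML attributes and text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("'", "&apos;").replace("\"", "&quot;");
    }

    static String configure(String service, String node, String option, String value) {
        return "<iq type='set' to='" + escape(service) + "' id='config1'>"
             + "<pubsub xmlns='http://jabber.org/protocol/pubsub#owner'>"
             + "<configure node='" + escape(node) + "'>"
             + "<x xmlns='jabber:x:data' type='submit'>"
             + "<field var='FORM_TYPE' type='hidden'>"
             + "<value>http://jabber.org/protocol/pubsub#node_config</value>"
             + "</field>"
             + "<field var='" + escape(option) + "'>"
             + "<value>" + escape(value) + "</value>"
             + "</field>"
             + "</x></configure></pubsub></iq>";
    }

    public static void main(String[] args) {
        System.out.println(configure("pubsub.openfire.betware.com", "example_node",
                "pubsub#presence_based_delivery", "1"));
    }
}
```

The stanza has to be sent by an owner of the node over an authenticated XMPP session.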

-Cheers!

/Steffen

I guess Openfire does not support pubsub#purge_offline.

I have thought of “pubsub#presence_based_delivery” and tried to get it to work, but in that case it seems that no pubsub notifications are delivered. I might be doing something wrong there. I just changed the node configuration to presence-based delivery in the database, then connected and published as usual, but users that are online with presence and subscribed to a pubsub node do not get any messages.

Debug log: node 1 trying to pubsub to a disconnected user that was connected to node 2. If 100 users were on node 2 and disconnected, this is repeated for all of them…

{code}

May 26 10:45:19.659 localhost DEBUG: 2011-05-26 10:45:19,659 (client-3:) [org.jivesoftware.openfire.spi.RoutingTableImpl] Unable to route packet. No session is available so store offline. <message from=“pubsub.openfire.betware.com” to=“fia@openfire.betware.com” id=“home/openfire.betware.com/livebetting/TOPIC_LiveBettingEventUpdateEngine_EVENT_DTO_ALL__fia@openfire.betware.com__ckC8k”>eAGdl1ts21QYx3N1m0yl69Ikq9Zu7ja6jrVenN5WwdTbujVla6el3QCNgRufpGkdO9hO2g0x4QISCKQ9c3tAXJ7QXtAkHtgDIMTLeOGFV/YIEhKwSYiLBt+52E6yZEvxtNSXc77v+//O/xwfh1tmsgraFArIMKRcXs0JeU2Y0nXp8oymKChj5jU15PeGX51ZzGgFYQWZG5KOBFkrSHlVQG…

May 26 10:45:19.659 localhost …WkmoJsasL0xiw+P7G02FHUtZwO4aIKknIltCAVUPtmQTkhmdIOI6P:

May 26 10:45:19.659 localhost …v/uQ3X0yUpU5ckFTssZKSc/h/ruhf6t32vAlpT9f9nBdFQwn4L7+1:

May 26 10:45:19.659 localhost …vyAjXPJdc8EPHhPCYjtvtoADqwbjGOeWg0QufmL8/W0HHbU/O8MQ4N2MLj0MEvPf8xoGN1kgV7jeFooXNpTODJpiMLr4HPdVhsUBY+nXhRYBt6tod78HaBTWPOWqM4fFB/c682skLNOR07oajGmyqrzquNZaxeom/+9Bn4JOK: xmlns=“http://jabber.org/protocol/shim”>j0pVu4oHbXH74OXC0H9ywB696oC9lb4c1BisL3jd

May 26 10:45:19.659 localhost DEBUG: 2011-05-26 10:45:19,659 (client-3:) [org.jivesoftware.openfire.spi.RoutingTableImpl] RoutingTableImpl: Failed to route packet to JID: fia@openfire.betware.com packet: <message from=“pubsub.openfire.betware.com” to=“fia@openfire.betware.com” id=“home/openfire.betware.com/livebetting/TOPIC_LiveBettingEventUpdateEngine_EVENT_DTO_ALL__fia@openfire.betware.com__ckC8k”>eAGdl1ts21QYx3N1m0yl69Ikq9Zu7ja6jrVenN5WwdTbujVla6el3QCNgRufpGkdO9hO2g0x4QISCKQ9c3tAXJ7QXtAkHtgDIMTLeOGFV/YIEhKwSYiLBt+52E6yZEvxtNSXc77v+//O/xwfh1tmsgraFArIMKRcXs0JeU2Y0nXp8oymKChj5jU15PeGX51ZzGgFYQWZG5…

May 26 10:45:19.659 localhost …KOBFkrSHlVQGWkmoJsasL0xiw+P7G02FHUtZwO4aIKknIltCAVUPtmQTk:

May 26 10:45:19.659 localhost …4axOztpZv/uQ3X0yUpU5ckFTssZKSc/h/ruhf6t32vAlpT9f9nBdFQwn:

May 26 10:45:19.659 localhost …bbwOEvyAjXPJdc8EPHhPCYjtvtoADqwbjGOeWg0QufmL8/W0HHbU/O8MQ4N2MLj0MEvPf8xoGN1kgV7jeFooXNpTODJpiMLr4HPdVhsUBY+nXhRYBt6tod78HaBTWPOWqM4fFB/c682skLNOR07oajGmyqrzquNZaxeom/+9Bn4JOK: xmlns=“http://jabber.org/protocol/shim”>j0pVu4oHbXH74OXC0H9ywB696oC9lb4c1BisL3jd

{code}

My apologies for not keeping up on this thread. I’ve been literally swamped. I’ve gone ahead and attached the plugin source code so you can see how it works and compile it under a different Java version.

This plugin was solely experimental, and my time to look into it further has all but vanished.

To be frank, I was hoping to hear something from the Openfire folks about an updated plugin by now, and I am disappointed that there is nothing to support the 3.7.0 version.

Can anyone from the Openfire team comment on this?

Hello,

There is no Openfire “team”. Guus kindly helps out when he can, and there really isn’t anybody else. We need help!

daryl

Hi Pat,

I’m using your new clustering plugin and followed the other instructions you specified.

I’m using Openfire 3.7.0 and Coherence 3.3.1 (should this version of Coherence be used?).

When enabling the cluster I get the following exception:

2011.11.06 16:02:59 Unable to start clustering - continuing in local mode

(Wrapped: Failed to load the factory) java.lang.ClassNotFoundException: com.jivesoftware.util.cache.JiveConfigurableCacheFactory

at java.net.URLClassLoader$1.run(URLClassLoader.java:217)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:205)

at java.lang.ClassLoader.loadClass(ClassLoader.java:321)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)

at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:186)

at com.tangosol.net.CacheFactory.getConfigurableCacheFactory(CacheFactory.java:602)

at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:689)

at com.tangosol.net.CacheFactory.getCache(CacheFactory.java:667)

at com.jivesoftware.util.cache.CoherenceClusteredCacheFactory.startCluster(CoherenceClusteredCacheFactory.java:112)

at org.jivesoftware.util.cache.CacheFactory.startClustering(CacheFactory.java:580)

at org.jivesoftware.openfire.cluster.ClusterManager.startup(ClusterManager.java:270)

at org.jivesoftware.openfire.cluster.ClusterManager.setClusteringEnabled(ClusterManager.java:320)

at org.jivesoftware.openfire.admin.system_002dclustering_jsp._jspService(system_002dclustering_jsp.java:103)

at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)

at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:530)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1216)

at com.opensymphony.module.sitemesh.filter.PageFilter.parsePage(PageFilter.java:118)

at com.opensymphony.module.sitemesh.filter.PageFilter.doFilter(PageFilter.java:52)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)

at org.jivesoftware.util.LocaleFilter.doFilter(LocaleFilter.java:74)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)

at org.jivesoftware.util.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:50)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)

at org.jivesoftware.admin.PluginFilter.doFilter(PluginFilter.java:78)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)

at org.jivesoftware.admin.AuthCheckFilter.doFilter(AuthCheckFilter.java:164)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)

at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:425)

at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)

at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:494)

at org.eclipse.jetty.server.session.SessionHandler.handle(SessionHandler.java:182)

at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:933)

at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:362)

at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:867)

at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)

at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:245)

at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)

at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)

at org.eclipse.jetty.server.Server.handle(Server.java:334)

at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:559)

at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:1007)

at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:747)

at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:209)

at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:406)

at org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:462)

at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)

at java.lang.Thread.run(Thread.java:636)

Any ideas?

Eitan

I’ve recompiled the plugin with Java 1.6 using the attached code, and got the same problem.

Eitan

Great to see development on the clustering support.

Do you know if there are any issues with the beta plugin and the 3.7.1 patch release?

Thanks,

Nick

I can enable the clustering plugin (1.2.1 beta), but it doesn’t see my other server. I have the firewall turned off and I have clustering enabled on the other server. The admin console also breaks on service restart when the clustering plugin is enabled. Is there any way to fix this?

I get a Java exception when I try to restart and log in to the admin console.

Exception:

java.lang.ExceptionInInitializerError

at org.jivesoftware.openfire.lockout.LockOutManager.getInstance(LockOutManager.java:73)

at org.jivesoftware.openfire.auth.AuthFactory.authenticate(AuthFactory.java:172)

at org.jivesoftware.openfire.admin.login_jsp._jspService(login_jsp.java:149)

at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)

at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:530)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1216)

at com.opensymphony.module.sitemesh.filter.PageFilter.doFilter(PageFilter.java:39)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)

at org.jivesoftware.util.LocaleFilter.doFilter(LocaleFilter.java:74)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)

at org.jivesoftware.util.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:50)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)

at org.jivesoftware.admin.PluginFilter.doFilter(PluginFilter.java:78)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)

at org.jivesoftware.admin.AuthCheckFilter.doFilter(AuthCheckFilter.java:164)

at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1187)

at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:425)

at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)

at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:494)

at org.eclipse.jetty.server.session.SessionHandler.handle(SessionHandler.java:182)

at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:933)

at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:362) …

Here is the error log message I get:

org.jivesoftware.openfire.cluster.ClusterManager - Unable to access backing cache for Routing User Sessions. BackingMapManager is a com.tangosol.net.DefaultConfigurableCacheFactory$Manager and backing map is com.tangosol.net.cache.LocalCache

java.lang.IllegalStateException: Unable to access backing cache for Routing User Sessions. BackingMapManager is a com.tangosol.net.DefaultConfigurableCacheFactory$Manager and backing map is com.tangosol.net.cache.LocalCache

Any suggestions?

It sounds like you may have the Coherence configuration files in the wrong locations, meaning that the cluster is starting up with default settings and not configuring the cluster containers properly for Openfire.

Within the plugin there are 2 files, described above, that need to be moved to specific locations. Can you confirm that this has been done?

Pre-req’s:

There are a few pre-req setup steps you must complete before getting this up and running:

  • You must take the “tangosol-coherence-override.xml” file found within the plugin and place it in the root of your server (example: “/” on Linux); otherwise Coherence will use its default configuration, which yields Openfire errors.
  • You must take the “coherence-cache-config.xml” file found within the plugin and place it in Openfire’s {openfire_home}/lib directory; otherwise Coherence will use its default configuration, which yields Openfire errors.
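
A rough way to check where the two files are visible from: the sketch below lists every copy of each file name that a class loader can find as a resource. It assumes Coherence resolves these names through the classpath by default, which matches my understanding of Coherence 3.x but is not confirmed in this thread. The class name ConfigResourceCheck is hypothetical, and only JDK calls are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Diagnostic sketch: lists every location from which a class loader can
// load a given resource name. An empty list means it is not visible.
class ConfigResourceCheck {

    static List<URL> locate(String resource, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "tangosol-coherence-override.xml",
            "coherence-cache-config.xml"
        };
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        for (String name : names) {
            List<URL> found = locate(name, loader);
            System.out.println(name + ": "
                    + (found.isEmpty() ? "NOT FOUND on classpath" : found.toString()));
        }
    }
}
```

Run with the same classpath Openfire is started with. A cache config that resolves only to the copy inside the Coherence JAR would be consistent with the DefaultConfigurableCacheFactory and LocalCache names in the error message.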