Introducing Hazelcast ... a new way to cluster Openfire!

To ensure that the member nodes in an Openfire cluster are able to find each other, try using TCP-based discovery in lieu of the default UDP/multicast configuration (described above) and see if you get a better result. The upside of this approach is a reliable point-to-point communication path between servers; the main drawback of using TCP is the need for static configuration of the well-known cluster member(s).
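As a rough illustration (assuming the Hazelcast 2.x XML schema; the file name, ports, and member addresses are placeholders, not official plugin documentation), a static TCP/IP join configuration might look like this:

```xml
<hazelcast>
  <network>
    <port auto-increment="true">5701</port>
    <join>
      <!-- Disable the default UDP multicast discovery -->
      <multicast enabled="false"/>
      <!-- Enable static TCP/IP discovery with well-known members -->
      <tcp-ip enabled="true">
        <member>192.168.1.11:5701</member>
        <member>192.168.1.12:5701</member>
      </tcp-ip>
    </join>
  </network>
</hazelcast>
```

Each node lists the well-known members; any node not in the list can still join as long as it can reach one of the listed members.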

As for your database cluster, in theory it should work just fine, but I personally have not used a multi-master setup for MySQL. Perhaps you could give it a whirl and report back with your findings.

Please also note that the new Hazelcast clustering plugin is only compatible with Openfire 3.7.2 Beta (not yet released) and newer. If you would like to try it with an older Openfire, use this custom build, which has been back-ported by Dele (one of our helpful @community advocates).

I ran into a critical bug while using a Hazelcast-based cluster.

I always get the following exception. Then the Openfire server hangs: the process still exists, but it no longer works.

I am using Openfire 3.7.2 beta with Hazelcast 2.3.1 (Hazelcast 2.4 still has this error).

2012.10.29 14:57:47 org.jivesoftware.util.cache.CacheFactory - Hazelcast Instance is not active!
java.lang.IllegalStateException: Hazelcast Instance is not active!
    at com.hazelcast.impl.FactoryImpl.initialChecks(FactoryImpl.java:711)
    at com.hazelcast.impl.MProxyImpl.beforeCall(MProxyImpl.java:102)
    at com.hazelcast.impl.MProxyImpl.access$000(MProxyImpl.java:49)
    at com.hazelcast.impl.MProxyImpl$DynamicInvoker.invoke(MProxyImpl.java:64)
    at $Proxy0.getLocalMapStats(Unknown Source)
    at com.hazelcast.impl.MProxyImpl.getLocalMapStats(MProxyImpl.java:258)
    at com.jivesoftware.util.cache.ClusteredCache.getCacheSize(ClusteredCache.java:140)
    at org.jivesoftware.util.cache.CacheWrapper.getCacheSize(CacheWrapper.java:73)
    at com.jivesoftware.util.cache.ClusteredCacheFactory.updateCacheStats(ClusteredCacheFactory.java:344)
    at org.jivesoftware.util.cache.CacheFactory$1.run(CacheFactory.java:636)

Can anybody give me some advice?

I managed to see the stdout via a script I created.

Both nodes can see each other:

INFO: [192.168.1.11]:5701 [openfire]
Members [2] {
    Member [192.168.1.11]:5701 this
    Member [192.168.1.12]:5701
}

Any clue about my first problem, the fact that I can’t see online contacts when I enable the plugin (even with only one server)?

Strange; once I got the two nodes seeing each other, users, sessions, and MUC rooms all started working correctly. What does the Openfire console say? Does it list both servers?

Does that mean that if the clustering plugin is enabled, we need at least two nodes for OF to work?

We have successfully tested our cluster using a single node, so that should not be an issue. Do you have any other plugins installed that may be conflicting with the cluster’s cache configuration?

I don’t know what happened, but OF crashed, and then each time I tried to go to the admin panel it asked me to set up OF, so I reinstalled OF from scratch. Now the clustering plugin is the only plugin installed and enabled.

No, I guess it can work with a single node, too (you obviously have two nodes now, as indicated in the output). I meant that when everything is OK, you can see both nodes in OF’s console. That’s a good way of checking whether everything is in order. It is possible that the Hazelcast instances can see each other but OF didn’t connect them properly, for some reason.

Based on this exception, it appears that the cluster failed to start, but the cache statistics thread started anyway. Were there any other errors reported in the console or error logs? My guess is that you have a configuration issue, perhaps short on memory, or maybe a classpath conflict. Are there any other plugins deployed in your test cluster?

I can see both nodes in the OF admin console. I can join a MUC from other servers, but I can’t see anyone; same in my roster: I can only see contacts from my own server.

Maybe all connections go to one server due to a DNS setting?

I think the “master” doesn’t share the connections with the other node when it receives one.

No other plugin except admin. The configuration is rather simple too: only two multicast nodes with the default configuration. The cluster starts correctly, but one or two nodes would hang after a stress test.

I also set up a cluster with 4 nodes (all nodes started successfully), and even without the stress test, all 4 nodes hung after one or two days.

Stupid question: are all users in the same group? I mean, I had that once, users didn’t see each other, only to discover they weren’t in each other’s roster to begin with.

I’m not sure I understand.

How can users be in the same group if they are on different servers?

Well, there’s a single DB regardless. It’s like users on Yahoo messenger: they may be online, but until you add them to your own roster, you can’t see them.

What you need to do is add each user to the other’s roster first.
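For reference (this is standard XMPP presence subscription behavior, not specific to Openfire; the JIDs below are placeholders), adding someone to your roster starts with a subscription request like:

```xml
<!-- Sent by user1 to request a presence subscription to user2;
     user2's client replies with type="subscribed" to approve it. -->
<presence to="user2@example.org" type="subscribe"/>
```

Until that mutual subscription exists, neither user receives the other's presence, regardless of whether the servers are clustered.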

Well, it’s not necessary. For example, I can see online people in a MUC if I don’t enable the clustering plugin.

In my roster, I can see that users from Google, jabber.org, etc. are online when the plugin is disabled.

Ok, so users are in the same group. If I understood correctly, the same users stop seeing each other once you enable clustering and they connect to different servers. I’m out of ideas, this worked fine when I tried it.

Yes, if the plugin is enabled, users of my server can see the presence of other users from my server in their rosters, but not that of users on other servers.

Edit: There is another issue, PubSub doesn’t work when the plugin is enabled.

I’ve encountered an issue in using MUC which I think could either be a misconfiguration on my part, or possibly a bug?

I’ve got 2 nodes set up with the clustering plugin running on OF 3.7.1, talking to each other over direct TCP rather than multicast. There are only 2 users on node 2 so far, with all the other 2000 users connecting to node 1.

In a MUC room, when either of us on node 2 join the room, we only get a partial list of current members of the room. Those of us on node 2 can still see anything that anyone on node 1 says in the room, but until they leave and rejoin, they don’t appear to be in the room. If those of us on node 2 leave/rejoin the room, then the list of current members in the room shrinks back down again.

Does this sound like a configuration issue? Or is it a problem with the members of rooms not being synced across the nodes?

I haven’t used the MUC components, but it sounds like a glitch with the cached presence information that is shared across the cluster for multi-user chat. Have you noticed any error messages in your log files that might help isolate the issue?

I wonder if your case would work using the older Coherence-based clustering plugin … might be worth a try if you have the time/inclination to set it up. That would also help clarify whether the issue is with the MUC component or the Hazelcast plugin.

Hello

Can I use a Connection Manager in front of Openfire with Hazelcast?

I tested it just now, but it failed. Clients can’t log in.

Thanks

Nick