Introducing Hazelcast ... a new way to cluster Openfire!

Anyhow, deleting nodes is a completely different thing from adding nodes, so I would not mix these. It’s likely not even possible to try to delete a non-existent node, so there will never be an exception during deletion.
We need to straighten out the terminology here: we are talking about items, not nodes (nodes are what the items are published to, the messaging queue if you will). Anyway, functionally speaking they are obviously different, but that is not really the issue here. The code in question is really just a means of flushing the in-memory cache to the persistent store, and in that case we are actually doing the same db operations. That being said, at the bottom you will see that I propose we change that to something in line with what you are suggesting.

@ “The fact is, if an exception occurs when we do a flush, we will lose any new items that have been added, up to the size of the cache.” I wonder whether this is in sync with the XEP.
This sort of thing isn’t covered in the spec, as it is an implementation detail. Of course we don’t ever want to lose data, but I am not sure that can be accomplished with a simple in-memory cache and the generic database access that OF currently supports (I don’t claim to be an expert in this regard, though). We can, of course, set the cache size to 0 as I said earlier, thus forcing db access on every publish and delete. This would probably work fine for many cases, and maybe should be the default. It only becomes a problem when we get into massive amounts of publishing, as the IO will become a bottleneck. This would then put the onus on the system owner to configure the cache with the full knowledge that the boost in performance comes at a cost.

To be honest, when this refactoring occurred, the intent was to make it much more scalable than it was before by eliminating the memory leak and of course making it clusterable. The guarantee of zero data loss wasn’t really taken into consideration. In that respect the new code (with a cache size of 0) happens to be better than the old, but the current defaults suffer from the same data-loss issues as the old version.

So, as I have been writing this reply, I have been thinking and this is what I propose.

  1. Have two separate cache size properties, one for inserts and a separate one for deletes. I propose this because I suspect that if someone wants to use a cache to improve performance, the vast majority of use cases would only require it on the publish, not the delete. This also means, of course, that flushes would be separate operations for delete and add.
  2. We set the default cache size to 0, thus forcing a db call for every publish and delete, meaning no possibility of data loss.
  3. Refactor the persistence calls for publish and delete to throw an exception to the caller (the node) so an appropriate error can be relayed back to the user on the failed request (a rough sketch follows this list).
  4. Document the side effect of setting the cache size properties.
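
To make that concrete, here is a rough sketch of what items 1-3 could look like in the persistence manager. This is not the actual Openfire code; the property names, the cache fields, and the method signatures are all illustrative:

```java
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: separate, configurable caches for publishes and deletes,
// both defaulting to 0 (write-through), with flush failures propagated to the caller.
public class PubSubPersistenceSketch {

    // Proposals 1 & 2: independent cache sizes, defaulting to 0 (every call hits the db).
    // The property names are hypothetical.
    private final int publishCacheSize = Integer.getInteger("xmpp.pubsub.cache.publish.size", 0);
    private final int deleteCacheSize = Integer.getInteger("xmpp.pubsub.cache.delete.size", 0);

    private final List<Object> itemsToAdd = new ArrayList<>();
    private final List<Object> itemsToDelete = new ArrayList<>();

    // Proposal 3: let persistence failures bubble up so the node can report an error.
    public synchronized void savePublishedItem(Object item) throws SQLException {
        itemsToAdd.add(item);
        if (itemsToAdd.size() > publishCacheSize) {
            flushAdds();
        }
    }

    public synchronized void removePublishedItem(Object item) throws SQLException {
        itemsToDelete.add(item);
        if (itemsToDelete.size() > deleteCacheSize) {
            flushDeletes();
        }
    }

    private void flushAdds() throws SQLException {
        // batch INSERT of itemsToAdd in one transaction, then clear the list
        itemsToAdd.clear();
    }

    private void flushDeletes() throws SQLException {
        // batch DELETE of itemsToDelete in one transaction, then clear the list
        itemsToDelete.clear();
    }
}
```

With both sizes at 0, every publish and delete goes straight to the database, so a failure can be reported back on the very request that caused it.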

As a possible future enhancement, we could also allow the persistence manager to maintain its own pool of connections, either create them outright or “borrow” them from the db connection manager. I don’t think there are any other components in OF that have the same potentially heavy db access needs as pubsub may require. Thus it makes sense that it could have its own dedicated db connections so it won’t have to compete with other modules/components.
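
As a very rough illustration of that idea (purely hypothetical, not an existing Openfire component), a small dedicated pool could borrow a handful of connections from DbConnectionManager up front and reuse them for pubsub only:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import org.jivesoftware.database.DbConnectionManager;

// Hypothetical pubsub-only pool; in practice the sizing, and the choice between borrowing
// and creating connections outright, would need care so the shared pool isn't starved.
public class PubSubConnectionPool {

    private final BlockingQueue<Connection> pool;

    public PubSubConnectionPool(int size) throws SQLException {
        pool = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            pool.add(DbConnectionManager.getConnection()); // "borrow" from the shared manager
        }
    }

    public Connection take() throws InterruptedException {
        return pool.take();      // blocks until a pubsub connection is free
    }

    public void release(Connection con) {
        pool.offer(con);         // hand it back for the next pubsub operation
    }
}
```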


Sorry for the item / node confusion, you are of course right.

Currently it’s hard to tell how this will impact the performance of the server and the database. The cluster communication also takes some time. So there may be a need for small local queues/caches or other asynchronous operations to avoid a bottleneck.

One could write a small test program to measure SQL performance for one 1000-row transaction versus 1000 single-row transactions (without reads) - this should help to decide whether one can completely remove the cache or set its size to 0.
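
Something along these lines would do for a first measurement; the table, columns, and JDBC URL below are just placeholders that would need to match the test database:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Quick-and-dirty timing of one 1000-row transaction vs. 1000 auto-committed single-row inserts.
public class InsertBenchmark {
    private static final String SQL = "INSERT INTO testItem (id, payload) VALUES (?, ?)";

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost/openfire", "user", "password")) {

            // Variant A: one transaction containing 1000 inserts
            long start = System.nanoTime();
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(SQL)) {
                for (int i = 0; i < 1000; i++) {
                    ps.setInt(1, i);
                    ps.setString(2, "payload-" + i);
                    ps.executeUpdate();
                }
            }
            con.commit();
            System.out.println("1 x 1000-row tx: " + (System.nanoTime() - start) / 1_000_000 + " ms");

            // Variant B: 1000 transactions with one insert each (different ids to avoid PK clashes)
            con.setAutoCommit(true);
            start = System.nanoTime();
            for (int i = 1000; i < 2000; i++) {
                try (PreparedStatement ps = con.prepareStatement(SQL)) {
                    ps.setInt(1, i);
                    ps.setString(2, "payload-" + i);
                    ps.executeUpdate();
                }
            }
            System.out.println("1000 x 1-row tx: " + (System.nanoTime() - start) / 1_000_000 + " ms");
        }
    }
}
```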

@“3. … so an appropriate error can be relayed back to the user on the failed request.” I don’t think that this is really possible. There is no error defined for such a failure, and the user has likely already received an OK message.

But there is also the flush timer; maybe it is fine to set it to one second. This would not eliminate the potential data loss, but in most cases it is acceptable to lose the last second. Is this a reasonable alternative?
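
For what it’s worth, the timer variant is trivial to sketch; flushPendingItems() below is just a placeholder for the actual persistence-manager flush:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal sketch: flush whatever is sitting in the cache every second, bounding the
// window of potential data loss to roughly one second.
public class FlushTimerSketch {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public void start() {
        scheduler.scheduleAtFixedRate(this::flushPendingItems, 1, 1, TimeUnit.SECONDS);
    }

    private void flushPendingItems() {
        // write any cached adds/deletes to the database, then clear the cache
    }
}
```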

So we could also keep the current cache sizes.

The performance tests are a great idea. I actually have a small testing project I was using when I initially refactored pubsub. I was mainly aiming at memory usage, but it will also work quite well for testing throughput.

Even without testing though, I can’t imagine that caching will make any difference in a use case where there are only a couple of items published per second, or less. Now when we get into 10s or 100s per second, that is a completely different story. In any case, some metrics would be valuable, I will have to see what I can do.

The error case should be easy enough, as the db call becomes part of the flow of processing the publish anyway, thus its failure would bubble up and can be reported as an **internal-server-error** to the publisher. The db call is already part of the processing chain whenever the published item triggers a flush by exceeding the cache limit.
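
Roughly, the publish path could look like the sketch below (simplified; handlePublish() is a placeholder for the real processing, and PacketError is the Tinder class Openfire already uses for stanza errors):

```java
import org.xmpp.packet.IQ;
import org.xmpp.packet.PacketError;

// Sketch of how a synchronous persistence failure could bubble up to the publisher.
public class PublishErrorSketch {

    public IQ process(IQ publishRequest) {
        try {
            handlePublish(publishRequest);            // includes the (possibly flushing) db call
            return IQ.createResultIQ(publishRequest); // normal ack to the publisher
        } catch (Exception e) {
            IQ error = IQ.createResultIQ(publishRequest);
            error.setType(IQ.Type.error);
            error.setError(PacketError.Condition.internal_server_error);
            return error;                             // publisher learns the item was not stored
        }
    }

    private void handlePublish(IQ publishRequest) throws Exception {
        // validate, update the in-memory node, and persist the item (may throw)
    }
}
```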

Without the cache, the timer would not be required, so it simply would not be turned on. With a cache, having an extremely short timer would cause the same problem as having no cache: overly frequent db access. In a constantly busy system, the timer is actually not needed, as we would be constantly flushing due to the size restriction. I think the only use case where the timer would be needed is when we have bursts of traffic: the burst would trigger flushing due to size, but when the burst is over and there are still items in the cache, the timer would take care of persisting them (and any subsequent low-volume publishing between bursts). This is assuming that no cache is the best option in a low-volume system.

I have attached to this post a slightly modified version of the Hazelcast plugin that is backwards compatible with Openfire 3.6.4 and Openfire 3.7.1. I had to rename the plugin and some classes.

Unzip and copy the clustering.jar file to your plugins folder.
hazelcast-364-1.0.0.zip (1680594 Bytes)

I’ve just taken the plugin for a spin and it seems to be in good working condition. Tried session replication, presence changes, MUC, pubsub, all worked across nodes.

Now, the question is, is the version of the plugin that works with Openfire 3.7.1 available from an official source?

What qualifies as an official source?

It is the same source code. I just renamed the plugin and some classes to appease Openfire 3.7.1.

@Dele Olajide, @Alex: Please tell me what you did to make this plugin work. I installed it on my 2 Openfire servers and it doesn’t show any cluster members other than the current host. Thanks in advance!

The default cluster configuration uses multicast (UDP) to discover member nodes, which can be problematic in certain deployments. If you are unable to use multicast due to your network configuration, you can configure the unicast (TCP) settings as documented in the Hazelcast plugin’s readme file:


The Hazelcast plugin uses the XML configuration builder to initialize the cluster from the XML configuration file (hazelcast-cache-config.xml). By default the cluster members will attempt to discover each other via multicast at the following location:

  • IP Address: 224.2.2.3
  • Port: 54327

Note that these values can be overridden in the plugin’s /classes/hazelcast-cache-config.xml file (via the multicast-group and multicast-port elements). Many other initialization and discovery options exist, as documented in the Hazelcast configuration docs noted above. For example, to set up a two-node cluster using well-known DNS name/port values, try the following alternative:

```xml
...
<join>
   <multicast enabled="false"/>
   <tcp-ip enabled="true">
     <hostname>of-node-a.example.com:5701</hostname>
     <hostname>of-node-b.example.com:5701</hostname>
   </tcp-ip>
   <aws enabled="false"/>
</join>
...
```

Hope that helps!


Thank you very much for your reply! I did the configuration you suggested in the *hazelcast-cache-config.xml* file on both servers. Unfortunately it still doesn’t work for me. This is weird though, because there is no firewall in between and I can connect using telnet to TCP port 5701 and ping the hostnames from both servers.

Later edit: finally got the plugin working by enabling multicast on the local network.

I was thinking about something we could check out and integrate into a build process. I have added the JAR for now, hopefully we won’t have to look at the source code…

I have a problem with that: the configuration file must be inside the plugin JAR file. Thus, when nodes are added or removed, we have to edit the file inside the JAR. Openfire exposes a property to specify a different configuration file, but this seems to be useless, because the plugin class loader will not look outside the JAR in the first place.

Am I missing something here? Is it possible to use a file outside the JAR?

I installed the plugin this morning for my domain. When I enable the plugin, users from my server don’t see online contacts from the other servers. Besides, they can’t join MUC rooms on other servers. Finally, PubSub doesn’t work: we can’t fetch our network to see the messages, but we can see publications from other servers.

I am wondering how the traffic is distributed. For now, all connections go to one server, the first one I installed, never to the second one.

It uses more than twice as much memory when I enable the plugin.

I was in a similar situation, but was able to fix it by doing two things:

  1. enable clustering from OF console (it’s disabled by default, even if you have the plugin installed)

  2. make sure all servers are using the same DB (cluster), as some session info is saved there (e.g. offline status)

There are a few options for configuring Hazelcast:

  1. The default configuration for the Hazelcast plugin allows cluster members to join and leave the cluster dynamically, using multicast messages (UDP) to announce these membership changes. However, as described above, this approach may not be ideal for every deployment.
  2. After the Hazelcast plugin is installed, a custom configuration file can be copied into the plugin’s /classes/ directory. Note that the “hazelcast.config.xml.filename” system property should be set to the name of this file.
  3. The Hazelcast clustering plugin uses a custom class loader that searches the combined classpaths of all installed plugins for classes and other resources. As such, a simple plugin that includes a custom cluster configuration could be deployed. This is in fact the approach we use in our environment. Our plugin further customizes its plugin class loader to add an external directory to the classpath where we manage our various configurations (e.g. test vs. prod).

Hope that helps!

I tried #2, but the folder is erased and recreated on each restart. Copying a configuration file inside the JAR isn’t very promising either. I should probably take a closer look at #3, but it still looks like the configuration file must be placed inside a JAR (or not, if the plugin can add a folder to the classpath). Maybe it makes more sense to patch Openfire to include the conf folder in the plugins’ classpaths? It would certainly be much cleaner than placing configuration files inside archives.

Edit: It seems Hazelcast already offers an alternative: specify the configuration file via the hazelcast.config system property. Unfortunately, this is bypassed by the clustering plugin, which goes directly to ClasspathXmlConfig.
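
To illustrate the difference (simplified, not the plugin’s actual code): building the config from the classpath only finds files visible to the plugin class loader, whereas Hazelcast can just as well load a config from an arbitrary path on disk:

```java
import java.io.FileNotFoundException;

import com.hazelcast.config.ClasspathXmlConfig;
import com.hazelcast.config.Config;
import com.hazelcast.config.FileSystemXmlConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ConfigLoadingSketch {

    public HazelcastInstance startFromClasspath() {
        // What the plugin effectively does: the file must be on the plugin's classpath
        Config config = new ClasspathXmlConfig("hazelcast-cache-config.xml");
        return Hazelcast.newHazelcastInstance(config);
    }

    public HazelcastInstance startFromFile(String path) throws FileNotFoundException {
        // Alternative: load from any path, e.g. one taken from the hazelcast.config property
        Config config = new FileSystemXmlConfig(path);
        return Hazelcast.newHazelcastInstance(config);
    }
}
```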

The plugin is enabled on both OF nodes and they use the same DB: the list of users is the same on both servers.

Other ideas?

Edit: it’s better without French in the message ^^

Not sure what ‘plugin is enabled’ means, but I was talking about going into OF’s console (the one that runs on ports 9094/9095) and making sure clustering is enabled. Simply copying the JAR under plugins folder won’t do it.

Don’t worry, I understand a little.

I meant I uploaded the jar file and then enabled clustering in the “Clustering” tab (by default, the OF admin panel runs on ports 9090 and 9091).

Check nohup.out. You should find something like this:

```
INFO: [10.46.37.118]:5701 [OF37-cluster]
Members [1] {
    Member [AA.BB.CC.DD]:5701 this
}
```

If nodes can see each other, there should be more than one member in the group.

I can’t find this file.