Tom Evans Silver 113 posts since
Oct 4, 2011

Sep 22, 2012 8:35 PM

Introducing Hazelcast ... a new way to cluster Openfire!

A few of you more intrepid Openfire fans may have noticed a bit of recent activity in one of the branches of the Openfire SVN repository. Some of your fellow developers have been working behind the scenes to provide clustering support for PubSub, perhaps one of the lesser-known modules of our beloved real-time collaboration server. PubSub is an implementation of the XEP-0060 specification, which extends XMPP Core with publish-subscribe functionality. However, if you have ever tried to use this module in Openfire, you may have been disappointed to discover that it was not designed to work in a clustered deployment. In fact, PubSub was forcibly disabled when deployed in a cluster! The main focus of the development effort was to address OF-205 and implement clustering support for the PubSub module. This work is now complete, and the PubSub module is cluster-enabled and ready for action.
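
For those who haven't used it, publishing an item to a PubSub node is a simple IQ exchange; subscribers to the node then receive the payload as a notification. A minimal publish request, adapted from the examples in XEP-0060, looks like this:

    <iq type='set' from='hamlet@denmark.lit/blogbot' to='pubsub.shakespeare.lit' id='publish1'>
      <pubsub xmlns='http://jabber.org/protocol/pubsub'>
        <publish node='princely_musings'>
          <item id='bnd81g37d61f49fgn581'>
            <!-- application-specific payload goes here -->
          </item>
        </publish>
      </pubsub>
    </iq>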

 

My Kingdom for a Cluster!

 

However, during the course of this development effort, the team also took a critical look at the current clustering implementation itself (the "clustering" plugin). This plugin is currently the only way to run Openfire in a clustered configuration (where multiple servers share the load). Unfortunately, it is inextricably tied to Oracle Coherence, an enterprise-class (and enterprise-priced) middleware component. A recent quote from Oracle put the price of Coherence (EE) at well over $300K for a smallish deployment ... clearly an untenable solution and an incompatible partnership for the Openfire project.

 

We looked around for clustering alternatives that would have better affinity with Openfire, and landed on Hazelcast (Community Edition). Hazelcast is an open source clustering and highly scalable data distribution platform for Java. It enjoys a large deployment base and is licensed under the community-friendly Apache 2.0 license. There are also commercial licensing options available for deployments where professional support and enterprise security (among other features) are must-haves. This looked like a perfect fit for our needs, and likely for the Openfire community as well.

 

Where Two or Three are Gathered...

 

We are pleased to announce the immediate availability of a new Hazelcast-based clustering plugin for Openfire. Starting today, you will find the new plugin in the trunk of the Openfire SVN repository (/src/plugins/hazelcast/). Note that you will also need to set up the latest version of the Openfire core (currently 3.7.2-Beta) to use the new plugin.

 

We are looking for a few brave Openfire aficionados who can take the latest build and give it a whirl with your various deployment scenarios:

  • How many users and/or cluster member nodes do you have?
  • Which modules/components of Openfire are you using?
  • What is your typical JVM configuration? Preferred OS? Network topology (load balancer, LAN/WAN, etc.)?

Your feedback is very important and will help ensure that this new clustering solution is a robust and stable component in the next Openfire release.

 

Those who have wrestled with the existing clustering plugin will hopefully find the new solution much simpler to configure and deploy ... and certainly much lower in cost! There is a README file included with the new Hazelcast plugin that documents the basic steps for setting up an Openfire cluster, including links to the supporting Hazelcast documentation (if needed).
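
For the curious, the clustering configuration lives in the plugin's hazelcast-cache-config.xml file. As a rough sketch (the hostnames are placeholders; see the README and the Hazelcast documentation for the authoritative settings), a two-node TCP/IP-based cluster might use a join section like this:

    <join>
        <multicast enabled="false"/>
        <tcp-ip enabled="true">
            <hostname>xmpp-node1.example.com:5701</hostname>
            <hostname>xmpp-node2.example.com:5701</hostname>
        </tcp-ip>
        <aws enabled="false"/>
    </join>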

 

Testing ... Testing ... Is this thing on?

 

Please take the new build for a spin and report your feedback here. We will be posting an article to the main community page before long, but would love to have some initial feedback from the core developers before engaging a wider audience. No doubt there will be some bugs and configuration glitches ... can you help us find and fix them?

 

Thanks in advance for your consideration and assistance.

  • Dele Olajide KeyContributor 1,182 posts since
    Apr 10, 2006

    I checked out the latest SVN changes to PubSub and ran into an issue.

     

    Even though XEP-0060 does say that when an item is published to a node it MUST have a unique key, the Openfire PubSub implementation did in fact check for duplicates and remove them. This was a very useful feature, as it enables the use of Openfire PubSub as an application datastore with real-time replication. I use a backbone.js XMPP datastore that works well with Openfire.

     

    The problem is in org.jivesoftware.openfire.pubsub.LeafNode.java

     

     

        // Check for and remove any existing item with a matching ID;
        // generated IDs won't match since we already checked.
        PublishedItem duplicate = itemsByID.get(newItem.getID());
        if (duplicate != null) {
            removeItem(findIndexById(duplicate.getID()));
        }

     

    This code is gone, and now data-integrity exceptions occur because of the attempt to write a duplicate database record.

     

    I appreciate that the code is being optimized for clustering, but is there any chance of keeping backward compatibility by reusing node item IDs? I do not wish to fork the code.

    • rcollier KeyContributor 982 posts since
      Mar 4, 2009

      I will take a look as well, since I actually removed that. 

       

      What is supposed to happen when a duplicate ID occurs is that the original record gets overwritten; there should be no error, since this is a valid use case. This is how items get updated in PubSub.

       

      The original code was changed because it relied on every persisted item in every node being held in memory, which is why I logged a Jira task describing this as a memory leak and refactored how it worked as a precursor to the clustering work. So the code in question was removed before clustering was actually started, since I changed how PubSub does its in-memory caching and persistence.

      • Dele Olajide KeyContributor 1,182 posts since
        Apr 10, 2006

        Thanks :-)

         

        I noticed the memory map was gone, but could not work out how to discover duplication without it. I'll wait for you to fix it.

         

        I have been running 3.7.2 alpha for a while for some of my projects, but because I needed CORS, I have been using PubSub mostly in a custom build of 3.7.1, and had just returned to 3.7.2 trunk for PubSub after the CORS fix went in.

         

        The changes look very good and should be a performance boost for Openfire PubSub. Excellent work!

      • Dele Olajide KeyContributor 1,182 posts since
        Apr 10, 2006

        Thanks Tom,

         

        I did patch my copy of the code to do that, and also swapped the position of the batch deletion and addition code.

         

        I am inclined to go with Rob's suggestion to perform a direct DB update instead of a delete/add. It sounds more efficient, but then I am not a DB expert. As we are performing the SQL operations in batches, maybe it does not matter.

      • Dele Olajide KeyContributor 1,182 posts since
        Apr 10, 2006

        The changes work fine. Thank you!

      • LG KeyContributor 6,403 posts since
        Dec 13, 2005

        Looking at 13303, I wonder how good it is to use two transactions, or the new line "itemsToDelete.addLast(item); // delete stored duplicate (if any)". Now there are two places where items are added to the itemsToDelete list, which makes things much more complicated.

        a) Items which were added using removePublishedItem() should log an error and roll back the transaction.

        b) Items which were added by savePublishedItem() should not log an error, and there is also no need to roll back the transaction.

         

        My suggestion:

        1) remove the "itemsToDelete.addLast(item); // delete stored duplicate (if any)" line

        2) Leave the 1st transaction (delList) as it is. I assume that it is still needed. (?)

        3) Use the 2nd transaction to delete the items on the addList and then add the items. Do not log delete errors or roll back the transaction if deleting fails. If the insert fails, then roll back everything to make sure that the old nodes still exist. This is what most users expect when their update fails: the old value should still exist and not be deleted. Hopefully this matches the PubSub spec.

         

        PS: In case of exceptions, I prefer to log the SQL statement and a reference to the affected item; this makes resolving errors much easier. E.g. log.error(ADD_ITEM + " (" + item.getNode().getService().getServiceID() + ", " + ... + ") " + sqle.getMessage(), sqle);

        • rcollier KeyContributor 982 posts since
          Mar 4, 2009
          Sep 29, 2012 12:39 PM (in response to LG)
          Re: Introducing Hazelcast ... a new way to cluster Openfire!

          I don't think there is really any issue with adding the published item to itemsToDelete, but the delete and the add should definitely be in the same transaction. This is the only way to guarantee no duplicate-ID problems during the add when operating in a cluster. It would also ensure that any existing data remains untouched: since the delete and add are now part of the same transaction, if either fails the database will remain untouched.
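
          To make that concrete, here is a minimal sketch of the delete-plus-insert-in-one-transaction approach using plain JDBC. The table and column names are illustrative and may not match Openfire's actual schema; the point is only that the commit makes both statements visible atomically:

              import java.sql.Connection;
              import java.sql.PreparedStatement;
              import java.sql.SQLException;

              public class PublishFlushSketch {
                  // Replaces any stored duplicate and inserts the new row in ONE
                  // transaction, so a failure leaves the database untouched.
                  public static void replaceItem(Connection con, String serviceID, String nodeID,
                                                 String itemID, String payload) throws SQLException {
                      boolean oldAutoCommit = con.getAutoCommit();
                      con.setAutoCommit(false);
                      try (PreparedStatement del = con.prepareStatement(
                               "DELETE FROM pubsubItem WHERE serviceID=? AND nodeID=? AND id=?");
                           PreparedStatement add = con.prepareStatement(
                               "INSERT INTO pubsubItem (serviceID, nodeID, id, payload) VALUES (?,?,?,?)")) {
                          del.setString(1, serviceID);
                          del.setString(2, nodeID);
                          del.setString(3, itemID);
                          del.executeUpdate();   // no-op if there is no stored duplicate

                          add.setString(1, serviceID);
                          add.setString(2, nodeID);
                          add.setString(3, itemID);
                          add.setString(4, payload);
                          add.executeUpdate();

                          con.commit();          // delete + insert become visible atomically
                      } catch (SQLException e) {
                          con.rollback();        // neither statement takes effect
                          throw e;
                      } finally {
                          con.setAutoCommit(oldAutoCommit);
                      }
                  }
              }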

           

          I am about to make some changes due to a different transaction-related issue, and that will include this fix as well (since the flush will be part of another operation).

           

          As for what to do if there is an exception: I am not sure. Logically, the only way this can fail is if we lose the connection to the DB or hit some other system-related error, which makes the rollback a moot point. The only question then is whether the service itself should be shut down because it can no longer function properly. Exceptions would of course be logged.

           

          The fact is, if an exception occurs when we do a flush, we will lose any new items that have been added, up to the size of the cache. The only way to prevent this from happening is to set the cache size to 1; then each item gets persisted as soon as it is published. Even for that case, we need to change the error handling during the flush so failures can be reported back to the publisher.

          • LG KeyContributor 6,403 posts since
            Dec 13, 2005

            My XMPP/XEP PubSub knowledge is very limited, as I have read only parts of the XEP. Anyhow, deleting nodes is a completely different thing than adding nodes, so I would not mix these. Likely it's not possible to try to delete a non-existing node, so there will never be an exception during deletion. But we have all seen unbelievable things that the brain compensated for to keep us sane. That's why I would like to separate these two tasks.

             

            @ "The fact is, if an exception occurs when we do a flush, we will lose any new items that have been added, up to the size of the cache." I wonder whether this is in sync with the XEP. May be another reason not to mix deletes and updates. And for me it looks like one wants to set the write cache size to 1 (without yet notifying the user, while this would be nice). Actually Openfire should be able to recover from database exceptions and power losses without losing information.

             

            That way one does not need to care about reinserting failed items, multiple retries, and so on. I thought of writing all inserts and deletes to disk so one can re-execute them, but keeping these things synchronized with the database within a cluster may be very hard. To me it seems that no write cache, or a cache size of 1, is the much better approach.

             

            Inserting 1000 rows in one transaction is usually faster than getting 1000 database connections for single-row transactions, but maybe the performance impact is acceptable if one can simplify the code.

            • rcollier KeyContributor 982 posts since
              Mar 4, 2009
              Sep 30, 2012 5:26 PM (in response to LG)
              Re: Introducing Hazelcast ... a new way to cluster Openfire!
              Anyhow, deleting nodes is a completely different thing than adding nodes, so I would not mix these. Likely it's not possible to try to delete a non-existing node, so there will never be an exception during deletion.

              We need to straighten out the terminology here: we are talking about items, not nodes (nodes are what the items are published to, the messaging queue if you will). Anyway, functionally speaking, they are obviously different, but that is not really the issue here. The code in question is really just a means of flushing the in-memory cache to the persistent store, and in that case we are actually doing the same DB operations. That being said, at the bottom you will see that I propose we change that to something in line with what you are suggesting.

               

              @ "The fact is, if an exception occurs when we do a flush, we will lose any new items that have been added, up to the size of the cache." I wonder whether this is in sync with the XEP.

              This sort of thing isn't covered in the spec, as it is an implementation detail. Of course we don't want to ever lose data, but I am not sure that can be accomplished with a simple in-memory cache and the generic database access that OF currently supports (I don't claim to be an expert in this regard, though). We can, of course, set the cache size to 0 as I said earlier, thus forcing DB access on every publish and delete. This would probably work fine for many cases, and maybe should be the default. It only becomes a problem when we get into massive amounts of publishing, as the IO will become a bottleneck. This would then put the onus on the system owner to configure the cache with the full knowledge that the boost in performance comes at a cost.

               

              To be honest, when this refactoring occurred, the intent was to make it much more scalable than it was before by eliminating the memory leak and, of course, making it clusterable. With regards to the guarantee of zero data loss, that wasn't really taken into consideration. In that respect the new code happens to be better than the old (with a 0 cache size), but the current defaults suffer the same problem as the old version, which has the same issues with respect to data loss.

               

              So, as I have been writing this reply, I have been thinking and this is what I propose.

               

               

              1. Have two separate cache size properties: one for inserts and a separate one for deletes. I propose this because I suspect that if someone wants to use a cache to improve performance, the vast majority of use cases would only require it on the publish, not the delete. This also means, of course, that flushing would be a separate operation for delete and add. (A rough sketch of what this could look like follows this list.)
              2. We set the default cache size to 0, thus forcing a DB call for every publish and delete, meaning zero possibility of data loss.
              3. Refactor the persistence calls for publish and delete to throw an exception to the caller (the node) so an appropriate error can be relayed back to the user on the failed request.
              4. Document the side effects of setting the cache size properties.
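
              As a rough sketch of items 1 and 2 (the names and structure here are illustrative, not Openfire's actual code): a write cache whose size-0 default degenerates to write-through, with one instance for publishes and one for deletes:

                  import java.util.ArrayDeque;
                  import java.util.ArrayList;
                  import java.util.List;
                  import java.util.Queue;
                  import java.util.function.Consumer;

                  public class WriteCacheSketch<T> {
                      private final int maxSize;                 // 0 == write-through (no data loss)
                      private final Consumer<List<T>> persister; // writes a batch in one transaction
                      private final Queue<T> pending = new ArrayDeque<>();

                      public WriteCacheSketch(int maxSize, Consumer<List<T>> persister) {
                          this.maxSize = maxSize;
                          this.persister = persister;
                      }

                      public synchronized void add(T item) {
                          pending.add(item);
                          if (pending.size() > maxSize) {
                              flush();                           // with maxSize 0, every add flushes
                          }
                      }

                      public synchronized void flush() {
                          if (pending.isEmpty()) return;
                          persister.accept(new ArrayList<>(pending)); // failures propagate to the caller (item 3)
                          pending.clear();                            // only forget items once persisted
                      }
                  }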

               

              As a possible future enhancement, we could also allow the persistence manager to maintain its own pool of connections, either creating them outright or "borrowing" them from the DB connection manager. I don't think there are any other components in OF that have the same potentially heavy DB access needs as PubSub may require, so it makes sense for it to have its own dedicated DB connections and not compete with other modules/components.

              • LG KeyContributor 6,403 posts since
                Dec 13, 2005

                Sorry for the item / node confusion, you are of course right.

                Currently it's hard to tell how this will impact the performance of the server and the database. Also, the cluster communication needs some time, so there may be a need for small local queues/caches or other asynchronous operations to avoid a bottleneck.

                One could write a small test program to measure SQL performance for one 1000-row transaction versus 1000 single-row transactions (without reads); this should help decide whether one can completely remove the cache or set its size to 0. (A sketch of such a test follows.)
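
                A minimal sketch of such a measurement, assuming a scratch table named scratch(id INT, payload VARCHAR) and a placeholder JDBC URL and credentials:

                    import java.sql.Connection;
                    import java.sql.DriverManager;
                    import java.sql.PreparedStatement;

                    public class BatchVsSingle {
                        static final String URL = "jdbc:mysql://localhost/test"; // placeholder
                        static final String SQL = "INSERT INTO scratch (id, payload) VALUES (?, ?)";

                        public static void main(String[] args) throws Exception {
                            try (Connection con = DriverManager.getConnection(URL, "user", "pass")) {
                                long t0 = System.nanoTime();
                                con.setAutoCommit(false);            // one big transaction
                                try (PreparedStatement ps = con.prepareStatement(SQL)) {
                                    for (int i = 0; i < 1000; i++) {
                                        ps.setInt(1, i);
                                        ps.setString(2, "item-" + i);
                                        ps.addBatch();
                                    }
                                    ps.executeBatch();
                                }
                                con.commit();
                                long t1 = System.nanoTime();
                                con.setAutoCommit(true);             // 1000 auto-committed writes
                                try (PreparedStatement ps = con.prepareStatement(SQL)) {
                                    for (int i = 1000; i < 2000; i++) {
                                        ps.setInt(1, i);
                                        ps.setString(2, "item-" + i);
                                        ps.executeUpdate();          // each statement is its own transaction
                                    }
                                }
                                long t2 = System.nanoTime();
                                System.out.printf("batched: %d ms, single: %d ms%n",
                                        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
                            }
                        }
                    }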

                 

                @"3. .. so an appropriate error can be relayed back to the user on the failed request." I don't think that this is really possible. There is no error defined for such a failure and the user did likely already receive an OK message.

                 

                But there is also the flush timer; maybe it is fine to set it to one second. This would not eliminate the potential data loss, but in most cases it is acceptable to lose the last second. Is this a reasonable alternative?

                So we could also keep the current cache sizes.

                • rcollier KeyContributor 982 posts since
                  Mar 4, 2009
                  Oct 3, 2012 5:02 PM (in response to LG)
                  Introducing Hazelcast ... a new way to cluster Openfire!

                  The performance tests are a great idea. I actually have a small testing project I was using when I initially refactored PubSub. I was mainly aiming at memory usage, but it will work quite well for testing throughput as well.

                   

                  Even without testing, though, I can't imagine that caching will make any difference in a use case where there are only a couple of items published per second or less. When we get into tens or hundreds per second, that is a completely different story. In any case, some metrics would be valuable; I will have to see what I can do.

                   

                  The error case should be easy enough, as the DB call becomes part of the flow of processing the publish anyway; thus its failure would bubble up and can be reported as an internal-server-error to the publisher. The DB call is already part of the processing chain whenever the item being published triggers a flush by exceeding the cache limit.

                   

                  Without the cache, the timer would not be required, so it would simply not be turned on. With a cache, having an extremely short timer would cause the same problem as having no cache: overly frequent DB access. In a constantly busy system, the timer is actually not needed, as we would be constantly flushing due to the size restriction. I think the only use case where the timer is needed is when we have bursts of traffic: the burst would trigger flushing due to size, but when the burst is over and there are still items in the cache, the timer would take care of persisting them (and any subsequent low-volume publishing between bursts). This is assuming that no cache is the best option in a low-volume system. A small sketch of that timer follows.
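
                  A minimal sketch of such a flush timer (the period and the flush task are illustrative; Openfire's actual property names may differ):

                      import java.util.concurrent.Executors;
                      import java.util.concurrent.ScheduledExecutorService;
                      import java.util.concurrent.TimeUnit;

                      public class FlushTimer {
                          private static final long FLUSH_PERIOD_SECONDS = 120; // illustrative default

                          // The size limit triggers flushes during bursts; this timer
                          // persists whatever is left in the cache once a burst subsides.
                          public static ScheduledExecutorService start(Runnable flushTask) {
                              ScheduledExecutorService timer =
                                      Executors.newSingleThreadScheduledExecutor();
                              timer.scheduleAtFixedRate(flushTask, FLUSH_PERIOD_SECONDS,
                                      FLUSH_PERIOD_SECONDS, TimeUnit.SECONDS);
                              return timer;
                          }
                      }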

  • Dele Olajide KeyContributor 1,182 posts since
    Apr 10, 2006

    I have attached to this post a slightly modified version of the Hazelcast plugin that is backwards compatible with Openfire 3.6.4 and Openfire 3.7.1. I had to rename the plugin and some classes.

     

    Unzip and copy the clustering.jar file to your plugins folder.

  • Alex Mateescu Bronze 12 posts since
    Jun 22, 2012

    I've just taken the plugin for a spin and it seems to be in good working condition. I tried session replication, presence changes, MUC, and PubSub; all worked across nodes.

    Now, the question is: is the version of the plugin that works with Openfire 3.7.1 available from an official source?

    • Dele Olajide KeyContributor 1,182 posts since
      Apr 10, 2006

      What qualifies as an official source?

       

      It is the same source code. I just renamed the plugin and some classes to appease Openfire 3.7.1.

    • don cipo Bronze 2 posts since
      Oct 16, 2012

      @Dele Olajide, @Alex: Please tell me what you did to make this plugin work. I installed it on my 2 Openfire servers, and it doesn't show any cluster members other than the current host. Thanks in advance!

        • don cipo Bronze 2 posts since
          Oct 16, 2012

          Thank you very much for your reply! I applied the configuration you suggested in the hazelcast-cache-config.xml file on both servers. Unfortunately, it still doesn't work for me. This is weird though, because there is no firewall in between, and I can connect via telnet to TCP port 5701 and ping the hostnames from both servers.

           

          L.E.: I finally got the plugin working by enabling multicast on the local network.
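
          For anyone else who hits this, a multicast-based join section looks something like the following sketch (224.2.2.3:54327 are Hazelcast's defaults; adjust for your network):

              <join>
                  <multicast enabled="true">
                      <multicast-group>224.2.2.3</multicast-group>
                      <multicast-port>54327</multicast-port>
                  </multicast>
                  <tcp-ip enabled="false"/>
                  <aws enabled="false"/>
              </join>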

        • Alex Mateescu Bronze 12 posts since
          Jun 22, 2012

          I have a problem with that: the configuration file must be inside the plugin JAR file. Thus, when nodes are added or removed, we have to edit the file inside the JAR. Openfire exposes a property to specify a different configuration file, but this seems to be useless, because the plugin class loader will not look outside the JAR in the first place.

          Am I missing something here? Is it possible to use a file outside the JAR?

            • Alex Mateescu Bronze 12 posts since
              Jun 22, 2012

              I tried #2, but the folder is erased and recreated on each restart. Copying a configuration file inside the JAR isn't very promising either. I should probably take a closer look at #3, but it still looks like the configuration file must be placed inside a JAR (or not, if the plugin can add a folder to the classpath). Maybe it makes more sense to patch Openfire to include the conf folder in the plugins' classpaths? It would certainly be much cleaner than placing configuration files inside archives.

              Edit: It seems Hazelcast already offers an alternative: specify the configuration file via the hazelcast.config system property. Unfortunately, this is bypassed by the clustering plugin, which goes directly to ClasspathXmlConfig.
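
              For illustration, a plugin could honor that property with something like the sketch below. XmlConfigBuilder and Hazelcast.newHazelcastInstance are Hazelcast's actual API; the surrounding fallback logic is hypothetical, not the plugin's current code:

                  import com.hazelcast.config.Config;
                  import com.hazelcast.config.XmlConfigBuilder;
                  import com.hazelcast.core.Hazelcast;
                  import com.hazelcast.core.HazelcastInstance;

                  import java.io.FileInputStream;
                  import java.io.InputStream;

                  public class ConfigLoaderSketch {
                      public static HazelcastInstance start() throws Exception {
                          String external = System.getProperty("hazelcast.config");
                          Config config;
                          if (external != null) {
                              // load a file outside the plugin JAR
                              try (InputStream in = new FileInputStream(external)) {
                                  config = new XmlConfigBuilder(in).build();
                              }
                          } else {
                              // fall back to the hazelcast XML on the (plugin) classpath
                              config = new XmlConfigBuilder().build();
                          }
                          return Hazelcast.newHazelcastInstance(config);
                      }
                  }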

        • switchtower Bronze 9 posts since
          Dec 5, 2011

          Like others, I'm having issues getting the tcp-ip settings in Hazelcast to work properly. I have added both of my Openfire servers to the config like so:

           

              <join>
                  <multicast enabled="false"/>
                  <tcp-ip enabled="true">
                      <hostname>jabbertest.switchtower.com:5701</hostname>
                      <hostname>jabbertest2.switchtower.com:5701</hostname>
                  </tcp-ip>
                  <aws enabled="false"/>
              </join>

           

          I've verified the server is listening on the 5701 port:

              netstat -lan | grep 5701

          tcp        0      0 0.0.0.0:5701                0.0.0.0:*                   LISTEN     

           

          Doing a tcpdump between the servers, I see no communication between them on that port. There are no iptables rules, and the HW firewalls these servers sit behind are configured to allow full IP communication.

  • Clément Bronze 24 posts since
    Jan 12, 2010

    I installed the plugin this morning for my domain. When I enable the plugin, users from my server don't see the online contacts of the other servers. Besides, they can't join MUC rooms on other servers. Finally, PubSub doesn't work: we can't fetch our network to see the messages, but we can see publications from other servers.

     

    I am wondering how the traffic is distributed. For now, all connections go to one server, the first one I installed, never to the second.

     

    It also uses more than twice as much memory when I enable the plugin.

  • Mickey Bronze 8 posts since
    Sep 6, 2012

    Hi,

    I have been looking for a clustering plugin for some time now, and it was really good news to see this post the other day.

     

    2 questions:

     

    So I have 2 Openfire servers with exactly the same setup. I have installed the clustering plugin on both, and I see the local node on each, but I don't see a second node in either of them!

     

    What else do I need to configure to see the other Openfire node?

     

    Currently they use the same database, pointing at the same data node in a synchronized MySQL cluster replication setup (Galera).

    But at some point I want one Openfire server to point at one data node and the other Openfire server to point at another data node, so that if a data node fails I am still in business.

     

    Is this kind of database cluster setup possible with the clustering plugin?

     

    I really appreciate any feedback.

     

    Thank you!

  • weixing_zou Bronze 2 posts since
    Oct 29, 2012

    I ran into a critical bug while using a Hazelcast-based cluster.

    I always get the following exception. Then the Openfire server hangs, which means the process still exists but it does not work.

    I am using Openfire 3.7.2 beta with Hazelcast 2.3.1 (Hazelcast 2.4 still has this error).

     

    2012.10.29 14:57:47 org.jivesoftware.util.cache.CacheFactory - Hazelcast Instance is not active!
    java.lang.IllegalStateException: Hazelcast Instance is not active!
            at com.hazelcast.impl.FactoryImpl.initialChecks(FactoryImpl.java:711)
            at com.hazelcast.impl.MProxyImpl.beforeCall(MProxyImpl.java:102)
            at com.hazelcast.impl.MProxyImpl.access$000(MProxyImpl.java:49)
            at com.hazelcast.impl.MProxyImpl$DynamicInvoker.invoke(MProxyImpl.java:64)
            at $Proxy0.getLocalMapStats(Unknown Source)
            at com.hazelcast.impl.MProxyImpl.getLocalMapStats(MProxyImpl.java:258)
            at com.jivesoftware.util.cache.ClusteredCache.getCacheSize(ClusteredCache.java:140)
            at org.jivesoftware.util.cache.CacheWrapper.getCacheSize(CacheWrapper.java:73)
            at com.jivesoftware.util.cache.ClusteredCacheFactory.updateCacheStats(ClusteredCacheFactory.java:344)
            at org.jivesoftware.util.cache.CacheFactory$1.run(CacheFactory.java:636)

     

    Can anybody give me some advice?

      • weixing_zou Bronze 2 posts since
        Oct 29, 2012

        No other plugins except admin. The configuration is rather simple too: only 2 multicast nodes with the default configuration. The cluster starts correctly, but one or two nodes hang after a stress test.

        I also set up a cluster with 4 nodes (all nodes started successfully) but without the stress test; all 4 nodes hung after one or two days.

  • Simon Beale Bronze 1 posts since
    Nov 6, 2012

    I've encountered an issue using MUC which I think could be either a misconfiguration on my part or possibly a bug.

     

    I've got 2 nodes set up with the clustering plugin running on OF 3.7.1, talking to each other over direct TCP rather than multicast. There are only 2 users on node 2 so far, with all the other 2000 users connecting to node 1.

     

    In a MUC room, when either of us on node 2 joins the room, we only get a partial list of the room's current members. Those of us on node 2 can still see anything that anyone on node 1 says in the room, but until they leave and rejoin, they don't appear to be in the room. If those of us on node 2 leave and rejoin the room, the list of current members shrinks back down again.

     

    Does this sound like a configuration issue? Or is it a problem with the members of rooms not being synced across the nodes?

  • nickchang Bronze 4 posts since
    Sep 14, 2012

    Hello

     

    Can I use a Connection Manager in front of Openfire with Hazelcast?

     

    I tested it now, but it failed. Clients can't log in.

     

    Thanks

    Nick

  • gsink Bronze 3 posts since
    Nov 22, 2012

    Not a developer here, just wanting to test this out.  Where can I get the hazelcast.jar file from?

     

    I'm on 3.7.2 Beta and looking to put this in place behind Microsoft NLB.

     

    I found the hazelcast folder in SVN, and I'm unfamiliar with the process, but do the contents of this folder need to be built into hazelcast.jar? If that is the case, I don't have the tools, environment, or knowledge to do it.

     

    I tried modifying a separate .jar file by removing its contents and placing all the items from the hazelcast folder into it, but the plugin would not load (error messages in the log).

     

    Can anyone host the hazelcast.jar file or send it my way?

    • rcollier KeyContributor 982 posts since
      Mar 4, 2009
      Nov 22, 2012 6:45 AM (in response to gsink)
      Introducing Hazelcast ... a new way to cluster Openfire!

      The required jars for the plugin are already in the plugin's lib directory.  As you are not a developer, I would suggest you use a nightly build instead of trying to build it yourself. 

       

      Just select the appropriate build for your platform and drill down into the build artifacts.

      • gsink Bronze 3 posts since
        Nov 22, 2012

        Thanks for the direction. I am using the method mentioned above to specify nodes instead of multicast.

         

        I'm getting this in the logs:

         

        2012.11.22 11:03:37 org.jivesoftware.openfire.plugin.BroadcastPlugin - Hazelcast Instance is not active!
        java.lang.IllegalStateException: Hazelcast Instance is not active!
                at com.hazelcast.impl.FactoryImpl.initialChecks(FactoryImpl.java:711)
                at com.hazelcast.impl.MProxyImpl.beforeCall(MProxyImpl.java:102)
                at com.hazelcast.impl.MProxyImpl.get(MProxyImpl.java:112)
                at com.jivesoftware.util.cache.ClusteredCache.get(ClusteredCache.java:86)
                at org.jivesoftware.util.cache.CacheWrapper.get(CacheWrapper.java:121)
                at org.jivesoftware.openfire.spi.RoutingTableImpl.removeComponentRoute(RoutingTableImpl.java:821)
                at org.jivesoftware.openfire.component.InternalComponentManager.removeComponent(InternalComponentManager.java:238)
                at org.jivesoftware.openfire.component.InternalComponentManager.removeComponent(InternalComponentManager.java:211)
                at org.jivesoftware.openfire.plugin.BroadcastPlugin.destroyPlugin(BroadcastPlugin.java:120)
                at org.jivesoftware.openfire.container.PluginManager.shutdown(PluginManager.java:146)
                at org.jivesoftware.openfire.XMPPServer.shutdownServer(XMPPServer.java:948)
                at org.jivesoftware.openfire.XMPPServer.access$700(XMPPServer.java:145)
                at org.jivesoftware.openfire.XMPPServer$ShutdownHookThread.run(XMPPServer.java:896)

  • olderboy Bronze 2 posts since
    Jan 21, 2013

    Hello

     

    My Openfire version is 3.7.1. I found this package for 3.7.1 and installed it.

    It works; I can log in to the server.

     

    I tested MUC with two Openfire servers and the cluster package. It works fine, no problems.

     

    But if I add a third server, I get an error message and it does not work. The server always tells me "No response from server".

     

    Here is my error log:

    org.jivesoftware.openfire.IQRouter - Could not route packet
    java.lang.ClassCastException: com.hazelcast.impl.MemberImpl cannot be cast to java.lang.Comparable
            at java.util.TreeMap.put(Unknown Source)
            at java.util.TreeSet.add(Unknown Source)
            at com.jivesoftware.util.cache.CoherenceClusteredCacheFactory.doClusterTask(CoherenceClusteredCacheFactory.java:221)
            at org.jivesoftware.util.cache.CacheFactory.doClusterTask(CacheFactory.java:499)
            at org.jivesoftware.openfire.component.InternalComponentManager.process(InternalComponentManager.java:512)
            at org.jivesoftware.openfire.spi.RoutingTableImpl.routeToComponent(RoutingTableImpl.java:352)
            at org.jivesoftware.openfire.spi.RoutingTableImpl.routePacket(RoutingTableImpl.java:237)
            at org.jivesoftware.openfire.IQRouter.handle(IQRouter.java:324)
            at org.jivesoftware.openfire.IQRouter.route(IQRouter.java:121)
            at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:76)
            at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:68)
            at org.jivesoftware.openfire.component.InternalComponentManager.sendPacket(InternalComponentManager.java:281)
            at org.jivesoftware.openfire.plugin.SearchPlugin.processPacket(SearchPlugin.java:243)
            at org.jivesoftware.openfire.component.InternalComponentManager.checkDiscoSupport(InternalComponentManager.java:458)
            at org.jivesoftware.openfire.component.InternalComponentManager.addComponent(InternalComponentManager.java:171)
            at org.jivesoftware.openfire.plugin.SearchPlugin.initializePlugin(SearchPlugin.java:164)
            at org.jivesoftware.openfire.container.PluginManager.loadPlugin(PluginManager.java:483)
            at org.jivesoftware.openfire.container.PluginManager.access$300(PluginManager.java:80)
            at org.jivesoftware.openfire.container.PluginManager$PluginMonitor.run(PluginManager.java:1067)
            at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
            at java.util.concurrent.FutureTask$Sync.innerRunAndReset(Unknown Source)
            at java.util.concurrent.FutureTask.runAndReset(Unknown Source)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(Unknown Source)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(Unknown Source)
            at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
            at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
            at java.lang.Thread.run(Unknown Source)

     

    How can I solve this problem?

    Thanks for your help.

      • olderboy Bronze 2 posts since
        Jan 21, 2013

        Hello

         

        Thanks for your reply.

         

        I tried to re-compile Openfire 3.7.1 with the clustering plugin, but I got this error.

         

        I'm not a developer and I don't know why this happens. Can you give me a suggestion? Thanks a lot.

         

            [javac] Compiling 27 source files to /opt/openfire_src/work/plugins-dev/hazelcast/target/classes
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/openfire/session/RemoteSession.java:172: cannot find symbol
            [javac] symbol  : method getClusterNodeInfo(byte[])
            [javac] location: class org.jivesoftware.util.cache.CacheFactory
            [javac]             ClusterNodeInfo info = CacheFactory.getClusterNodeInfo(nodeID);
            [javac]                                                ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/openfire/session/RemoteSession.java:189: cannot find symbol
            [javac] symbol  : method getClusterNodeInfo(byte[])
            [javac] location: class org.jivesoftware.util.cache.CacheFactory
            [javac]             ClusterNodeInfo info = CacheFactory.getClusterNodeInfo(nodeID);
            [javac]                                                ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/CacheListener.java:67: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]         NodeID nodeID = NodeID.getInstance(StringUtils.getBytes(event.getMember().getUuid()));
            [javac]                                                       ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusterListener.java:417: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]                     byte[] nodeID = StringUtils.getBytes(event.getMember().getUuid());
            [javac]                                                ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusterListener.java:442: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]                     byte[] nodeID = StringUtils.getBytes(event.getMember().getUuid());
            [javac]                                                ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusterListener.java:475: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]             byte[] nodeID = StringUtils.getBytes(event.getMember().getUuid());
            [javac]                                        ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusterListener.java:613: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]             nodePresences.put(NodeID.getInstance(StringUtils.getBytes(event.getMember().getUuid())),
            [javac]                                                             ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusterListener.java:616: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]             ClusterManager.fireJoinedCluster(StringUtils.getBytes(event.getMember().getUuid()), true);
            [javac]                                                         ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusterListener.java:623: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]         byte[] nodeID = StringUtils.getBytes(event.getMember().getUuid());
            [javac]                                    ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusterExternalizableUtil.java:43: com.jivesoftware.util.cache.ClusterExternalizableUtil is not abstract and does not override abstract method readSerializableMap(java.io.DataInput,java.util.Map<java.lang.String,? extends java.io.Serializable>,java.lang.ClassLoader) in org.jivesoftware.util.cache.ExternalizableUtilStrategy
            [javac] public class ClusterExternalizableUtil implements ExternalizableUtilStrategy {
            [javac]        ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusteredCacheFactory.java:229: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]             return StringUtils.getBytes(oldest.getUuid());
            [javac]                               ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusteredCacheFactory.java:238: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]             return StringUtils.getBytes(cluster.getLocalMember().getUuid());
            [javac]                               ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusteredCacheFactory.java:253: members is already defined in doClusterTask(org.jivesoftware.util.cache.ClusterTask)
            [javac]         Set<Member> members = new HashSet<Member>();
            [javac]                     ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusteredCacheFactory.java:286: cannot find symbol
            [javac] symbol  : method getString(byte[])
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]             logger.warn("Requested node " + StringUtils.getString(nodeID) + " not found in cluster");
            [javac]                                                        ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusteredCacheFactory.java:351: cannot find symbol
            [javac] symbol  : method getString(byte[])
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]             logger.warn("Requested node " + StringUtils.getString(nodeID) + " not found in cluster");
            [javac]                                                        ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cache/ClusteredCacheFactory.java:369: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]                     if (Arrays.equals(StringUtils.getBytes(member.getUuid()), nodeID)) {
            [javac]                                                  ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cluster/HazelcastClusterNodeInfo.java:48: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]         nodeID = NodeID.getInstance(StringUtils.getBytes(member.getUuid()));
            [javac]                                                ^
            [javac] /opt/openfire_src/src/plugins/hazelcast/src/java/com/jivesoftware/util/cluster/HazelcastClusterNodeInfo.java:50: cannot find symbol
            [javac] symbol  : method getBytes(java.lang.String)
            [javac] location: class org.jivesoftware.util.StringUtils
            [javac]         seniorMember = ClusterManager.getSeniorClusterMember().equals(StringUtils.getBytes(member.getUuid()));
            [javac]                                                                                   ^
            [javac] Note: Some input files use or override a deprecated API.
            [javac] Note: Recompile with -Xlint:deprecation for details.
            [javac] Note: Some input files use unchecked or unsafe operations.
            [javac] Note: Recompile with -Xlint:unchecked for details.
            [javac] 18 errors
        [trycatch] Caught exception: Compile failed; see the compiler error output for details.
             [echo] Error building plugin: hazelcast. Exception:
             [echo] /opt/openfire_src/build/build.xml:1310: Compile failed; see the compiler error output for details.

  • Youngho Kim Bronze 2 posts since
    Apr 21, 2013

    Hi Tom Evans.

     

    Good to see you!

     

    I have been building an Openfire HA cluster on Amazon AWS using the Hazelcast cluster plugin.

    Unfortunately, AWS does not support direct connections between its regions.

     

     

    So I used the Hazelcast cluster plugin on Openfire, and at first it seemed to work without a problem.

     

     

    But increasing the number of Hazelcast cluster nodes caused a big problem.

     

     

    First, the service is very unstable between cluster nodes.

    Second, messages between clients show a latency of about 2~3 seconds, even with just two clients.

     

     

    So I found the "wan-replication" feature in the Hazelcast documentation and configured it in your Hazelcast configuration XML file.

     

     

    But it does not work. What is the problem with my configuration?

     

     

    like this:

     

     

        <wan-replication name="openfire-wan-cluster">
            <target-cluster group-name="tokyo" group-password="tokyo-pass">
                <replication-impl>com.hazelcast.impl.wan.WanNoDelayReplication</replication-impl>
                <end-points>
                    <address>10.0.0.50:5701</address>
                </end-points>
            </target-cluster>
            <target-cluster group-name="virginia" group-password="virginia-pass">
                <replication-impl>com.hazelcast.impl.wan.WanNoDelayReplication</replication-impl>
                <end-points>
                    <address>10.1.0.50:5701</address>
                </end-points>
            </target-cluster>
        </wan-replication>

  • kentzen Bronze 7 posts since
    Aug 6, 2013

    Issues after deploying a cluster with the Hazelcast Clustering Plugin for Openfire.

    I have three server nodes: A (10.0.1.113), B (10.0.1.176), and C (10.0.1.158).

    All of them share the same MySQL database. Initially I installed Openfire on servers B and C; after that, I installed Openfire on A. This installation made the domain of every server the same as server A's domain, so all server domains are now 10.0.1.113. After deploying the cluster with the Hazelcast Clustering Plugin, I was able to access node A successfully via BOSH using Strophe.js. But, perhaps because the domains of nodes B and C are the same as A's, I cannot access B and C via BOSH with Strophe.js, although client software like Spark can access all of them and works fine. Accessing B and C via BOSH produces the error below:

     

    <body xmlns='http://jabber.org/protocol/httpbind'><failure xmlns="urn:ietf:params:xml:ns:xmpp-sasl"><not-authorized/></failure></body>

     

    Now my issue is that I need to access the cluster using BOSH. Is it enough to access only node A, or do I need to be able to access all nodes via BOSH? If I want to access all nodes via BOSH at the same time, how should I deploy the Openfire cluster?

    Here is the Hazelcast configuration:

    <join>
        <multicast enabled="false">
            <multicast-group>224.2.2.3</multicast-group>
            <multicast-port>54327</multicast-port>
        </multicast>
        <tcp-ip enabled="true">
            <hostname>10.0.1.113</hostname>
            <hostname>10.0.1.176</hostname>
            <hostname>10.0.1.158</hostname>
        </tcp-ip>
        <interfaces enabled="true">
            <interface>10.0.1.113</interface>
        </interfaces>
        <aws enabled="false"/>
    </join>

     

    I appreciate any help! I hope you can forgive my awful English and understand my issue description.
