Scalability and concurrency best practices

I wonder if anybody has experience scaling an Openfire-based community to a seven-digit user base, and what Eureka moments you encountered along the way? I would really appreciate any thoughts, best practices, observations and reflections on how to create and support an XMPP-based infrastructure for 1,000,000+ logged-in users.

After reading XMPP: The Definitive Guide and a lot of threads on this forum, I envision the architecture as a cluster of Openfire servers that use Coherence as their clustering middleware, backed by a MySQL cluster. As my project involves numerous custom IQ requests being sent back and forth, I will group those logically and physically into separate external components, each with redundant standby instances running in the background in case of traffic spikes or system failure.
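
A minimal sketch of that grouping idea, in Python for brevity. It is not Openfire's API: the class names, namespaces and hostnames are all hypothetical, and it only illustrates mapping custom IQ namespaces to a component group that falls back to a standby instance when the primary is marked down.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One external component instance (hypothetical, not an Openfire class)."""
    address: str
    healthy: bool = True


@dataclass
class ComponentGroup:
    """A primary instance plus standby replicas serving the same IQ namespaces."""
    primary: Component
    standbys: list = field(default_factory=list)

    def pick(self):
        # Prefer the primary; otherwise use the first healthy standby.
        for candidate in [self.primary, *self.standbys]:
            if candidate.healthy:
                return candidate
        return None


class IqRouter:
    """Maps custom IQ namespaces to the component group that handles them."""

    def __init__(self):
        self._routes = {}

    def register(self, namespace, group):
        self._routes[namespace] = group

    def route(self, namespace):
        group = self._routes.get(namespace)
        if group is None:
            raise KeyError(f"no component registered for {namespace}")
        target = group.pick()
        if target is None:
            raise RuntimeError(f"all instances down for {namespace}")
        return target.address


# Example wiring with made-up namespaces and hostnames.
router = IqRouter()
profiles = ComponentGroup(
    primary=Component("profiles-1.example.org"),
    standbys=[Component("profiles-2.example.org")],
)
router.register("urn:example:profile", profiles)
router.register("urn:example:profile-search", profiles)
```

In a real deployment the server itself routes stanzas to external components by their address (for example over the XEP-0114 component protocol), so the sketch only shows the grouping and failover logic, not how it would be wired into Openfire.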

Any precautions, advice, links or case studies would be tremendously appreciated!

Thanks

Yuriy

Hi Yuriy,

I am not aware of any Openfire installations of that size. You probably want to check out ejabberd.

daryl

I remember Guus mentioning on this forum that he maintained a seven-digit Openfire-based community in the past. So, apparently, it is achievable.

What is it that makes you recommend ejabberd? Is it only because it is written in C, so that each installation can handle more concurrent sessions under the same hardware and network conditions?

Sorry, it is written in Erlang, not in C. My bad.

Do you have personal experience with scaling ejabberd? How modular is it compared to Openfire?

It’s definitely possible to run with a pretty large user base, although it can be a challenge. How big a challenge depends greatly on the number of concurrent users that are online and on the functionality that you use (both functionality provided by Openfire and custom-developed extensions).

The biggest problem you’re likely to hit is something that I’ve described as the Achilles’ heel of Openfire. I just published a document describing the problem: Openfire’s Achilles’ heel

What ARE the sizes of some of the largest Openfire deployments, and has anyone tested Openfire for response time under different loads? Is any of this information around? I heard Google was using Openfire for Wave; if so, I am sure it was modified, but that should still lend Openfire some credibility.