Since the upgrade from 3.8.x to 3.9.x, I get reproducible problems with messages not being transmitted correctly.
The problem is as follows:
You send someone a message and after some time you ask why he hasn’t responded yet. It turns out he never got the first message.
As we are in the same office, I checked their clients and their message logs; there was indeed no message received.
As soon as a discussion is started, everything is fine and no more messages are lost.
I have now reproduced this with 4 different people. At first I thought it was a client problem because it mostly happened between a Miranda and a Pidgin client, but the last case was between 2 Pidgin clients.
As the clients are all on desktop PCs with Ethernet connections, I doubt it’s a stream management problem.
The Openfire logs don’t show any problems around that time.
Any idea how this could be fixed? This is kind of critical, as we can’t be sure that anything we write is transmitted.
There is one similarity among all users with this problem: every one of them has multiple closed sessions/resources, which I asked about here:
Yes, it seems like in that case, Openfire routes messages to one of the old (dead) sessions.
It could be related to OF-818, but I doubt it, because the sessions likely don’t have a negative priority (which is set by sending a negative prio presence from the client, which they probably don’t do).
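To illustrate the routing rule mentioned above: under the XMPP rules in RFC 6121, a message addressed to a bare JID (user@domain) must not be delivered to a resource with negative priority, while a message addressed to a full JID (user@domain/resource) goes to that exact resource. The sketch below is a simplified Python model, not Openfire’s actual routing code; the `Session` class, its `alive` flag and the `route` function are hypothetical names, and the tie-breaking rule is an assumption. It shows how a dead session that is still registered with a non-negative priority could swallow a first message, and why messages inside an ongoing conversation (which clients typically address to the full JID) would still arrive.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    resource: str
    priority: int
    alive: bool = True  # hypothetical flag: False models a dead ("ghost") session


def route(sessions: List[Session], resource: Optional[str] = None) -> Optional[Session]:
    """Pick the session a message would be handed to in this toy model."""
    if resource is not None:
        # Full JID: deliver to exactly that resource, if it is registered.
        return next((s for s in sessions if s.resource == resource), None)
    # Bare JID: resources with negative priority are never eligible.
    candidates = [s for s in sessions if s.priority >= 0]
    # Highest priority wins; on a tie the earliest registered session is chosen
    # (an assumption of this sketch, a real server may break ties differently).
    return max(candidates, key=lambda s: s.priority, default=None)


sessions = [
    Session("pidgin-old", priority=1, alive=False),  # leftover session from an earlier login
    Session("pidgin", priority=1),                   # the session the user is actually on
]

first = route(sessions)                     # first message, sent to the bare JID
reply = route(sessions, resource="pidgin")  # later message, sent to the full JID

print(first.resource, first.alive)
print(reply.resource, reply.alive)
```

In this model the first message lands on the dead session, while the full-JID message reaches the live one, which would match the "fine once a discussion is started" observation.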
I have no concrete idea right now, maybe it’s also related to LDAP.
If you want, you could try with the latest nightly build:
This is similar to what I reported. I call the dead sessions ghost sessions. The dead/ghost sessions do not show up in the admin interface. The only way I have gotten rid of them is to restart Openfire. No LDAP is in use.
I can somewhat follow the problem and I think it’s related to the memory leak issues that were reported.
There are a lot of unclosed connections/sessions, which result in the leak and in your problem.
It is said that the memory leaks don’t occur in 3.9.1 (or at least they are not that severe).
Maybe you can try 3.9.1 and report back whether the problem is gone. That way, we could narrow it down to a 3.9.2 problem and confirm that both problems have the same cause.
As CSH said, a downgrade should work (it worked for me). So, I see that such ghost sessions are not even shown on the Sessions page? That could explain why I didn’t see anything unusual in the Admin Console and yet it was running out of memory after a few days (compared to 20-30 days on 3.9.1).
Interesting info about hibernating, though it is strange to hear that hibernate is so widely used. We don’t have such an issue with ghost sessions (we are not using Pidgin). Our users are all on a nightly build of Spark (2.7.0 632 build or something), and as I have set a low value for xmpp.client.idle (30000, which is 30 seconds), I often see this behavior: the PC goes into standby after a long idle period, after some time Spark loses the connection for a second, the user goes offline for a moment, then instantly comes online and goes back to away. I notice this because I usually use the “Notify when user is available” option in Spark, and it notifies me that the user is online, but when I check, he’s away. It looks like XMPP Ping can’t reach a client on a PC in standby/sleep mode and closes the connection, but Spark is still active and restores the connection. So it is constantly switching between online and offline every 30 seconds. Very annoying.
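A small sketch of the timing described above, assuming the idle timeout behaves as a simple "close after N milliseconds without traffic" check. The property name `xmpp.client.idle` and the value 30000 come from the post; the function `is_idle` and the timestamps are hypothetical, and this is not meant to reflect how Openfire implements the check internally.

```python
IDLE_TIMEOUT_MS = 30000  # value of xmpp.client.idle from the post (30 seconds)


def is_idle(last_activity_ms: int, now_ms: int, timeout_ms: int = IDLE_TIMEOUT_MS) -> bool:
    """Return True once a connection has been silent for at least timeout_ms."""
    return now_ms - last_activity_ms >= timeout_ms


# A PC in standby stops answering, so the last activity stays at t = 0.
for now_ms in (10_000, 29_999, 30_000, 45_000):
    print(now_ms, is_idle(0, now_ms))
```

With a timeout this short, a client that stays silent for one standby interval is already treated as idle, which would fit the repeated offline/online flapping.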
There are two sides to the coin. I completely understand (as an admin who was hit by the memory leak in 3.9.2) that this is annoying, that you have to apologize to your users for downtime in a service, do downgrades, etc. On the other hand, if we just pull it down, it will be harder to pinpoint where the problem is. Without your reports we wouldn’t know about ghost sessions. Also, it looks like only (or mostly) Pidgin users are hit by this. It’s not like we get dozens of reports on the forums about this, so for some it probably works OK. We need more reports to find out exactly what is happening.
I think I will post an announcement (which only 1% of users read) about this issue and maybe a poll to find out how many are running 3.9.3 (which again won’t get many answers)…