Bug: This avatar causes clients to disconnect

Tested on 3.8.0, 3.8.1 + MySQL on Debian 6.0.7

I pulled the attached scaled image from the ofVCard table. When it is pushed to clients during presence updates, the clients disconnect with ‘Input/output error’ or some variation of it. It is 100% reproducible at will in my environment and disconnects (at least) Trillian, Pidgin, and Psi. While the clients are the ones sending the TCP RST, I can’t tell whether the problem actually begins within the server. Debug logs on Trillian and Pidgin both show no issues until the disconnect. I was unsuccessful in forcing the clients to connect unencrypted and unsuccessful in decrypting the TLS stream (Wireshark hates me - decrypt_ssl3_record: no decoder available), so I don’t know what is actually being sent to the client.

I have attached both the original image and the auto-scaled one.

Ha! I had a feeling TLS failure was to blame for the disconnects. Don’t know how that’s related to the avatar, but here’s the stack trace:

2013.03.19 10:43:28 org.jivesoftware.openfire.nio.ConnectionHandler - ConnectionHandler reports IOException for session: (SOCKET, R: /10.200.4.235:54472, L: /10.200.5.65:5222, S: 0.0.0.0/0.0.0.0:5222)

javax.net.ssl.SSLException: SSLEngine error during encrypt: BUFFER_OVERFLOW src: java.nio.HeapByteBuffer[pos=16103 lim=16324 cap=32768]outNetBuffer: java.nio.DirectByteBuffer[pos=16170 lim=32768 cap=32768]

at org.apache.mina.filter.support.SSLHandler.encrypt(SSLHandler.java:377)

at org.apache.mina.filter.SSLFilter.filterWrite(SSLFilter.java:479)

at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:361)

at org.apache.mina.common.support.AbstractIoFilterChain.access$1300(AbstractIoFilterChain.java:53)

at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterWrite(AbstractIoFilterChain.java:659)

at org.apache.mina.filter.executor.ExecutorFilter.filterWrite(ExecutorFilter.java:255)

at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:361)

at org.apache.mina.common.support.AbstractIoFilterChain.access$1300(AbstractIoFilterChain.java:53)

at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterWrite(AbstractIoFilterChain.java:659)

at org.apache.mina.filter.codec.ProtocolCodecFilter.filterWrite(ProtocolCodecFilter.java:210)

at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:361)

at org.apache.mina.common.support.AbstractIoFilterChain.access$1300(AbstractIoFilterChain.java:53)

at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterWrite(AbstractIoFilterChain.java:659)

at org.apache.mina.common.IoFilterAdapter.filterWrite(IoFilterAdapter.java:90)

at org.jivesoftware.openfire.net.StalledSessionsFilter.filterWrite(StalledSessionsFilter.java:62)

at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:361)

at org.apache.mina.common.support.AbstractIoFilterChain.access$1300(AbstractIoFilterChain.java:53)

at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterWrite(AbstractIoFilterChain.java:659)

at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.filterWrite(AbstractIoFilterChain.java:587)

at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:361)

at org.apache.mina.common.support.AbstractIoFilterChain.fireFilterWrite(AbstractIoFilterChain.java:355)

at org.apache.mina.transport.socket.nio.SocketSessionImpl.write0(SocketSessionImpl.java:166)

at org.apache.mina.common.support.BaseIoSession.write(BaseIoSession.java:177)

at org.apache.mina.common.support.BaseIoSession.write(BaseIoSession.java:168)

at org.jivesoftware.openfire.nio.NIOConnection.deliver(NIOConnection.java:263)

at org.jivesoftware.openfire.session.LocalClientSession.deliver(LocalClientSession.java:843)

at org.jivesoftware.openfire.session.LocalSession.process(LocalSession.java:281)

at org.jivesoftware.openfire.spi.RoutingTableImpl.routeToLocalDomain(RoutingTableImpl.java:306)

at org.jivesoftware.openfire.spi.RoutingTableImpl.routePacket(RoutingTableImpl.java:234)

at org.jivesoftware.openfire.net.SocketPacketWriteHandler.process(SocketPacketWriteHandler.java:68)

at org.jivesoftware.openfire.spi.PacketDelivererImpl.deliver(PacketDelivererImpl.java:56)

at org.jivesoftware.openfire.handler.IQHandler.process(IQHandler.java:67)

at org.jivesoftware.openfire.IQRouter.handle(IQRouter.java:374)

at org.jivesoftware.openfire.IQRouter.route(IQRouter.java:121)

at org.jivesoftware.openfire.spi.PacketRouterImpl.route(PacketRouterImpl.java:76)

at org.jivesoftware.openfire.net.StanzaHandler.processIQ(StanzaHandler.java:337)

at org.jivesoftware.openfire.net.ClientStanzaHandler.processIQ(ClientStanzaHandler.java:93)

at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:302)

at org.jivesoftware.openfire.net.StanzaHandler.process(StanzaHandler.java:194)

at org.jivesoftware.openfire.nio.ConnectionHandler.messageReceived(ConnectionHandler.java:181)

at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.messageReceived(AbstractIoFilterChain.java:570)

at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:299)

at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:53)

at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:648)

at org.apache.mina.common.IoFilterAdapter.messageReceived(IoFilterAdapter.java:80)

at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:299)

at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:53)

at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:648)

at org.apache.mina.filter.codec.support.SimpleProtocolDecoderOutput.flush(SimpleProtocolDecoderOutput.java:58)

at org.apache.mina.filter.codec.ProtocolCodecFilter.messageReceived(ProtocolCodecFilter.java:185)

at org.apache.mina.common.support.AbstractIoFilterChain.callNextMessageReceived(AbstractIoFilterChain.java:299)

at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:53)

at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.messageReceived(AbstractIoFilterChain.java:648)

at org.apache.mina.filter.executor.ExecutorFilter.processEvent(ExecutorFilter.java:239)

at org.apache.mina.filter.executor.ExecutorFilter$ProcessEventsRunnable.run(ExecutorFilter.java:283)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)

at java.lang.Thread.run(Thread.java:636)

We have pretty much the same issue here: Openfire 3.8.1 running in AWS. Any time a user sets their avatar to one of certain images, including the two you posted here, that user can no longer connect. What is very bizarre, though, is that it also breaks connections for other users at the same time, even though they haven’t changed anything at all. This too is consistent and repeatable for us. In other words, a few users changing their avatars can bring down 5-10 other users as well. What did fix it was completely disabling encryption (TLS and SSL both off). Of course we don’t want to leave it set that way, and we were finally able to enforce an avatar policy, but that still doesn’t explain what’s going on with the avatar bug. Please fix!

Peter, I have come across several more images which cause the problem as well. In my case, it’s not 5-10 users, it’s more like 60. The avatar is published to users during presence updates. When the image data is encrypted for delivery, MINA’s SSLHandler.java fails with a BUFFER_OVERFLOW (a javax.net.ssl.SSLException, per the stack trace above). This exception causes the thread handling the innocent user’s SSL connection to terminate abnormally. I also found that disabling encryption corrected the problem, which makes sense as the issue is SSL breaking.

I use OpenJDK 6; what do you use? I don’t know if the issue is Openfire handing bad data off to the SSL engine or if it’s the JDK itself.

Interesting. We too were getting the buffer overflow errors. We use OpenJDK 1.6.0_27, and I’m not sure where things are breaking down either… hope this gets one of the devs’ attention, though!

Looks like I too am affected by this. I’ve been working on it for hours; initially I thought that my self-signed cert was to blame. One brand new cert and lots of fighting with keytool later, the problem still exists. I finally noticed that I had been missing more information in info.log (and by the way, these things should be in error.log, not info.log):

2013.07.29 16:34:17 org.jivesoftware.openfire.nio.ConnectionHandler - ConnectionHandler reports IOException for session: (SOCKET, R: /192.168.80.21:55239, L: /192.168.77.211:5222, S: 0.0.0.0/0.0.0.0:5222)

javax.net.ssl.SSLException: SSLEngine error during encrypt: BUFFER_OVERFLOW src: java.nio.HeapByteBuffer[pos=15847 lim=15965 cap=16384]outNetBuffer: java.nio.DirectByteBuffer[pos=15914 lim=32768 cap=32768]

at org.apache.mina.filter.support.SSLHandler.encrypt(SSLHandler.java:377)

at java.lang.Thread.run(Unknown Source)

2013.07.29 16:34:17 org.jivesoftware.openfire.nio.ConnectionHandler - ConnectionHandler reports IOException for session: (SOCKET, R: /192.168.80.21:55239, L: /192.168.77.211:5222, S: 0.0.0.0/0.0.0.0:5222)

javax.net.ssl.SSLException: Received fatal alert: bad_record_mac

at sun.security.ssl.Alerts.getSSLException(Unknown Source)

at java.lang.Thread.run(Unknown Source)

2013.07.29 16:34:17 org.jivesoftware.openfire.nio.ConnectionHandler - ConnectionHandler reports IOException for session: (SOCKET, R: /192.168.80.21:55239, L: /192.168.77.211:5222, S: 0.0.0.0/0.0.0.0:5222)

javax.net.ssl.SSLException: bad record MAC

at sun.security.ssl.Alerts.getSSLException(Unknown Source)

at java.lang.Thread.run(Unknown Source)

2013.07.29 16:34:17 org.jivesoftware.openfire.nio.ConnectionHandler - ConnectionHandler reports IOException for session: (SOCKET, R: /192.168.80.21:55239, L: /192.168.77.211:5222, S: 0.0.0.0/0.0.0.0:5222)

java.io.IOException: Connection reset by peer

at sun.nio.ch.FileDispatcherImpl.read0(Native Method)

at java.lang.Thread.run(Unknown Source)

I was also affected by this, but I’m not entirely confident that it is Avatar related.

Our org recently updated their certificate, and I also know that OpenSSL has since been updated from 0.9.8x to 1.0.1. I think the combination of these may have been the problem. I re-deployed our Debian 6 server as Debian 7 Wheezy (the current release) and performed our install, and things are working again for us. I don’t see the same connection handler messages anymore; I think I caught one, but no others thus far.

We are affected by this on a regular basis. From time to time, 5-10% of our users can’t connect due to a recently uploaded avatar. The workaround is to ask for all recently updated avatars to be reset to a standard image.

Pretty much the same issue here. I cannot log into a company IM server; the very same error appears.

No idea whether it is avatar-related. (I did not change mine recently but maybe someone else did.)

Does anyone know a solution, or at least a workaround?

Actually, I discovered that mine wasn’t related to an avatar at all, but to the roster size reaching a particular number of users. I duplicated the roster (making it larger) and this solved my problem. I now just tell users to collapse the ‘Users’ group, which contains everyone sorted by name instead of grouped by department.

The details are identified here: http://issues.igniterealtime.org/browse/OF-496

It seems I found the cause of this error, at least for my setup. Posting here in hope it could help someone else.

The class SSLHandler (from Apache Mina library) contains the following code:

if (src.remaining() > ((outNetBuffer.capacity() - outNetBuffer.position()) / 2)) {
    // We have to expand outNetBuffer
    // Note: there is no way to know the exact size required, but encrypted data
    // shouldn't need to be larger than twice the source data size?
    outNetBuffer = SSLByteBufferPool.expandBuffer(outNetBuffer, src.capacity() * 2);
}

It seems that for some messages, the encrypted data ARE longer than twice the source. This fits the “avatar” topic very well: your scaled PNG image cannot be compressed (you can try; the file gets larger if you (g)zip it), so it is a good candidate to occupy a lot of space when encrypted.
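To make that condition concrete, here is a minimal sketch in plain Java that plugs in the buffer positions from the first BUFFER_OVERFLOW log line in this thread. It assumes the check is evaluated against the buffer state shown in that log line, which the trace itself does not confirm. The class and method names are made up for illustration and are not part of MINA.

```java
class OriginalExpansionCheck {
    /** Mirrors the quoted condition: expand only if the source needs more than half the free output space. */
    static boolean wouldExpand(int srcRemaining, int outCapacity, int outPosition) {
        return srcRemaining > (outCapacity - outPosition) / 2;
    }

    public static void main(String[] args) {
        // Values copied from the first BUFFER_OVERFLOW log line
        // (src pos=16103 lim=16324, outNetBuffer pos=16170 cap=32768).
        int srcRemaining = 16324 - 16103;
        int outCapacity = 32768;
        int outPosition = 16170;

        System.out.println("source bytes left:   " + srcRemaining);
        System.out.println("free output space:   " + (outCapacity - outPosition));
        System.out.println("expansion triggered: "
                + wouldExpand(srcRemaining, outCapacity, outPosition));
    }
}
```

With those numbers only 221 source bytes are left and 16598 bytes are free, so the condition is false and no expansion happens, yet the engine still reported BUFFER_OVERFLOW. If the assumption holds, the room the engine wants is not a simple multiple of the remaining source bytes.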

Unfortunately, I do not see any solution other than a programmatic one, though it is relatively easy. In my case, I changed the multiplying factor to 4 instead of 2 (and added a fixed number to be safer).

Then it is necessary to recompile at least that one class.

I wonder if this problem would disappear by upgrading to a newer version of Apache MINA…

Kacer,

Thanks, I’ve added this to the OF-496 ticket. To clarify, you just changed the multiplier to 4 and it resolved your issue?

daryl

Yes, it did. More exactly, my code now looks like this:

if (src.remaining() > ((outNetBuffer.capacity() - outNetBuffer.position()) / 4) - 128) {
    outNetBuffer = SSLByteBufferPool.expandBuffer(outNetBuffer, src.capacity() * 4);
}

but I believe the 128 is not really needed.

I did not have time to experiment with this more. Some better formula should be used, since 4 is probably too much and, on the other hand, may sometimes be insufficient if the message is really short. (Just guessing, I really did not study the issue much.)
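As a rough illustration of why a fixed multiplier is hard to get right, the sketch below compares the original factor-2 condition with the factor-4 hotfix in two situations. The first is a 15965-byte write into an empty 32768-byte output buffer (the empty starting buffer is an assumption; the sizes come from the second log in this thread). The second uses the state logged at the time of that failure: 118 source bytes left and 16854 bytes free. All names are hypothetical.

```java
class HotfixCheckComparison {
    /** Original condition: source larger than half the free output space. */
    static boolean originalCheck(int srcRemaining, int free) {
        return srcRemaining > free / 2;
    }

    /** Hotfix condition from this thread: a quarter of the free space, minus 128. */
    static boolean hotfixCheck(int srcRemaining, int free) {
        return srcRemaining > free / 4 - 128;
    }

    static void report(String label, int srcRemaining, int free) {
        System.out.println(label
                + ": original=" + originalCheck(srcRemaining, free)
                + ", hotfix=" + hotfixCheck(srcRemaining, free));
    }

    public static void main(String[] args) {
        // Assumed starting state: the whole stanza still unread, output buffer empty.
        report("large write, empty buffer", 15965, 32768);
        // State from the second log: src pos=15847 lim=15965, outNetBuffer pos=15914 cap=32768.
        report("short write, half-full buffer", 15965 - 15847, 32768 - 15914);
    }
}
```

Under these assumptions the hotfix expands the buffer up front for the large write, where the original condition does not, but for the short write neither condition triggers. That matches the worry that a fixed factor can still be insufficient for short messages.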

I am not sure if OF-496 is caused by the same problem.

Hope this helps.

I too am not sure if OF-496 is related (although part of the comments in that issue are). I do have the feeling that two different issues are being commented on in that ticket.

The problem that you identified is also identified in the issue tracker of the MINA project; https://issues.apache.org/jira/browse/DIRMINA-914. As the branch of the MINA project that Openfire uses is no longer being updated, I’ll manually apply the patch and build a new library to be included in Openfire. That, at the very least, will solve some of the issues that people are experiencing.

Martin, what’s your take on the fix presented in DIRMINA-914? My concern with your fix is that it’s a rather arbitrary increase. The fix in DIRMINA-914 uses a buffer size as obtained from the SSLEngine. If my interpretation of the related javadoc is correct, that might be foolproof:

The SSLEngine produces/consumes complete SSL/TLS packets only, and does not store application data internally between calls to wrap()/unwrap(). Thus input and output ByteBuffers must be sized appropriately to hold the maximum record that can be produced. Calls to SSLSession.getPacketBufferSize() and SSLSession.getApplicationBufferSize() should be used to determine the appropriate buffer sizes.

I am thinking of replacing the entire block that conditionally resizes the outNetBuffer with something like this:

if (outNetBuffer.remaining() < sslEngine.getSession().getPacketBufferSize()) {
    outNetBuffer = SSLByteBufferPool.expandBuffer(
            outNetBuffer,
            sslEngine.getSession().getPacketBufferSize() + outNetBuffer.position());
}

I don’t see the point in adding that code to the existing expansion code as is done in DIRMINA-914 (and instead would like to replace that bit). What are your thoughts?
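For anyone who wants to try the idea outside of MINA, here is a minimal, self-contained sketch using plain java.nio.ByteBuffer. The expand helper is a hypothetical stand-in for SSLByteBufferPool.expandBuffer (whose exact behavior is not shown in this thread), and the 32768/16170 starting state is borrowed from the first log above purely for illustration. SSLContext.getDefault(), createSSLEngine(), getSession() and getPacketBufferSize() are standard JDK calls; the value they report depends on the JDK in use.

```java
import java.nio.ByteBuffer;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;

class PacketBufferSizing {
    /** Hypothetical stand-in for MINA's buffer pool: copies the bytes written so far into a larger buffer. */
    static ByteBuffer expand(ByteBuffer buf, int newCapacity) {
        ByteBuffer bigger = ByteBuffer.allocate(newCapacity);
        buf.flip();       // switch the old buffer to reading what was written so far
        bigger.put(buf);  // copy it; bigger.position() now equals the old position
        return bigger;
    }

    /** Guarantees room for at least one full TLS record after the current position. */
    static ByteBuffer ensureRoom(ByteBuffer outNetBuffer, int packetBufferSize) {
        if (outNetBuffer.remaining() < packetBufferSize) {
            return expand(outNetBuffer, outNetBuffer.position() + packetBufferSize);
        }
        return outNetBuffer;
    }

    public static void main(String[] args) throws Exception {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine();
        int packetSize = engine.getSession().getPacketBufferSize();

        ByteBuffer out = ByteBuffer.allocate(32768);
        out.position(16170); // pretend earlier records already occupy this much

        ByteBuffer sized = ensureRoom(out, packetSize);
        System.out.println("packet buffer size:    " + packetSize);
        System.out.println("remaining after check: " + sized.remaining());
    }
}
```

The design point is that the check depends only on the free space and the engine's own maximum record size, not on how much source data is left, so it should not matter whether the pending write is large or small.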

I have checked in a fix based on the approach described in my last comment. Give it a try! Details can be found in the JIRA issue.

(The site was temporarily down for maintenance when I tried to post this a couple of hours ago, so doing it again now.)

I agree completely.

My snippet is a quick hotfix, the first thought that crossed my mind (since I was somewhat in a hurry).

If the SSL engine provides the maximum packet size, it definitely should be used here instead.

I too do not see any reason to keep the old expansion code (as suggested in DIRMINA-914) once the packet buffer size is used - so go ahead and replace it with the new one. Of course, I did not study the issue very thoroughly, so no guarantee.

Hello friend,

I have the same problem with a FIX session.

Disconnecting: Socket exception (/192.168.141.143:62608): javax.net.ssl.SSLException: SSLEngine error during encrypt: BUFFER_OVERFLOW src: java.nio.HeapByteBuffer[pos=15847 lim=16191 cap=16191]outNetBuffer: java.nio.DirectByteBuffer[pos=15914 lim=32382 cap=32382]

Can someone confirm an official solution for this issue?

Thanks in advance for your answers.