XMPPDecoder has a decode problem for UTF-8

A UTF-8 character that fits in a Java char (a code point in the Basic Multilingual Plane) is encoded in 1-3 bytes. The current standard allows at most 4 bytes per code point; the original design allowed up to 6. See http://en.wikipedia.org/wiki/UTF-8 .

From here on, I assume a character that takes 3 bytes.

Openfire uses MINA NIO to process the network stream, and implements an XMPPDecoder to decode bytes into Strings/stanzas.

When a ByteBuffer is decoded, it may end with an incomplete character. Of the last few bytes, you may have received one, two or all three bytes of a character; with only 1 or 2 of them, the character is incomplete. This happens at random, depending on where the network splits the stream. The more 3-byte characters the input contains, the more likely it becomes.

Let's look at org.jivesoftware.openfire.nio.XMLLightweightParser (Openfire 3.6.4):
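To make the failure concrete, here is a minimal sketch of a buffer boundary falling inside a 3-byte character. The class name SplitCharDemo and the 4/2 split are only illustrative assumptions, not Openfire code. Each chunk is decoded on its own with Charset.decode, which replaces malformed input with U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

class SplitCharDemo {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    // Decodes one chunk in isolation, as a per-buffer decoder would.
    static String decodeChunk(byte[] chunk) {
        return UTF8.decode(ByteBuffer.wrap(chunk)).toString();
    }

    public static void main(String[] args) {
        // "\u4f60\u597d" is the two-character string used in the test case: e4 bd a0 e5 a5 bd
        byte[] all = "\u4f60\u597d".getBytes(UTF8);
        byte[] first = Arrays.copyOfRange(all, 0, 4);  // ends inside the second character
        byte[] second = Arrays.copyOfRange(all, 4, 6); // the two remaining continuation bytes

        String joined = decodeChunk(first) + decodeChunk(second);
        System.out.println(joined.equals("\u4f60\u597d")); // false: the second character is lost
        System.out.println(joined.indexOf('\uFFFD') >= 0); // true: replacement characters appear
    }
}
```

Decoding the same six bytes in one piece gives the original string back, so the damage comes only from where the stream was cut.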

Charset encoder = Charset.forName(charset);
CharBuffer charBuffer = encoder.decode(byteBuffer.buf());
char[] buf = charBuffer.array();
int readByte = charBuffer.remaining();
char lastChar = buf[readByte-1];
if (lastChar >= 0xfff0) { // you think it's incomplete, then position-1, readByte-1
    byteBuffer.position(byteBuffer.position()-1); // error
    readByte--; // error
}

The code above does not handle an incomplete UTF-8 sequence correctly.

If a character takes 3 bytes, the ByteBuffer may end with only its first one or two bytes.

If one byte of the incomplete character has arrived, the ByteBuffer's position should go back by 1. If two bytes have arrived, it should go back by 2.

So if the position goes back by 1 while two bytes have arrived, the next decode starts at the second byte of the character. Only the last two bytes of the 3-byte character are decoded, and they are replaced with two U+FFFD ("FD") characters.

Conversely, if the position goes back by 2 while only one byte has arrived, the next decode sees 4 bytes (the last byte of the previous character plus the 3 bytes of this one), which yields one extra U+FFFD followed by the character.
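The rule "go back by as many bytes as have arrived" can be sketched by looking at the lead byte of the last sequence. This is only an illustration, not the code from the patch; the class Utf8Tail and the method incompleteTailLength are hypothetical names.

```java
class Utf8Tail {
    // Returns how many bytes at the end of buf[0..len) start a multi-byte
    // UTF-8 sequence without finishing it, or 0 if the data ends on a
    // character boundary (or is malformed, which is left to the decoder).
    static int incompleteTailLength(byte[] buf, int len) {
        // A sequence is at most 4 bytes long, so at most 3 bytes can be pending.
        for (int back = 1; back <= 3 && back <= len; back++) {
            int b = buf[len - back] & 0xFF;
            if ((b & 0xC0) == 0x80) {
                continue; // continuation byte (10xxxxxx): keep looking for the lead byte
            }
            int need;
            if ((b & 0x80) == 0x00) need = 1;      // 0xxxxxxx: ASCII
            else if ((b & 0xE0) == 0xC0) need = 2; // 110xxxxx
            else if ((b & 0xF0) == 0xE0) need = 3; // 1110xxxx
            else if ((b & 0xF8) == 0xF0) need = 4; // 11110xxx
            else return 0;                         // invalid lead byte
            return back < need ? back : 0;
        }
        return 0;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xe4, (byte) 0xbd, (byte) 0xa0, (byte) 0xe5, (byte) 0xa5, (byte) 0xbd};
        System.out.println(incompleteTailLength(bytes, 4)); // 1: go back by 1
        System.out.println(incompleteTailLength(bytes, 5)); // 2: go back by 2
        System.out.println(incompleteTailLength(bytes, 6)); // 0: complete
    }
}
```

The value returned is exactly the amount by which the position would have to be rewound, instead of the fixed 1 used in the parser.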

See also java.nio.charset.CharsetDecoder and its two decode methods.

Note:

In decode(ByteBuffer in, CharBuffer out, boolean endOfInput), the last parameter tells the decoder whether the ByteBuffer is the end of the input (complete) or more bytes may follow (incomplete).

CodingErrorAction has three instances for the decoder: IGNORE, REPLACE and REPORT.

Charset.decode(bb) is equivalent to Charset.newDecoder() .onMalformedInput(CodingErrorAction.REPLACE) .onUnmappableCharacter(CodingErrorAction.REPLACE) .decode(bb);
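One way to use the endOfInput parameter for a network stream is to always pass false and carry the undecoded tail over to the next chunk. The following is a rough sketch under that assumption; ChunkedUtf8Decoder and feed are hypothetical names, and this is not necessarily how the patch does it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

class ChunkedUtf8Decoder {
    private final CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Bytes of an unfinished character left over from the previous chunk.
    private byte[] pending = new byte[0];

    // Decodes one network chunk and returns only the complete characters.
    String feed(byte[] chunk) {
        ByteBuffer in = ByteBuffer.allocate(pending.length + chunk.length);
        in.put(pending).put(chunk);
        in.flip();
        // UTF-8 never produces more chars than it consumes bytes.
        CharBuffer out = CharBuffer.allocate(in.remaining());
        // endOfInput = false: an unfinished trailing sequence is left in the
        // buffer instead of being treated as malformed.
        decoder.decode(in, out, false);
        pending = new byte[in.remaining()];
        in.get(pending);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        ChunkedUtf8Decoder d = new ChunkedUtf8Decoder();
        String a = d.feed(new byte[]{(byte) 0xe4, (byte) 0xbd, (byte) 0xa0, (byte) 0xe5});
        String b = d.feed(new byte[]{(byte) 0xa5, (byte) 0xbd});
        System.out.println((a + b).equals("\u4f60\u597d")); // true
    }
}
```

Because the decoder itself reports how many bytes it consumed, there is no need to guess the rewind distance from the last decoded char.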

My test case for java.nio.charset.CharsetDecoder.decode(ByteBuffer, CharBuffer, boolean):

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.util.Arrays;

public class MyTest {

    public static void main(String[] args) throws Exception {
        // Replaces malformed input with U+FFFD
        CharsetDecoder replaceDecoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // Silently drops malformed input
        CharsetDecoder ignoreDecoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);

        String input = "你好";
        byte[] fullBytes = input.getBytes("UTF-8");
        System.out.println("input : " + input);
        System.out.print("input bytes in utf8 : ");
        print(fullBytes);
        System.out.println();
        System.out.println("=========================================");

        // First character plus 1 byte of the second one
        byte[] bytes0_4 = Arrays.copyOfRange(fullBytes, 0, 4);
        decodeAndPrint(replaceDecoder, bytes0_4, true);
        decodeAndPrint(replaceDecoder, bytes0_4, false);
        decodeAndPrint(ignoreDecoder, bytes0_4, true);
        decodeAndPrint(ignoreDecoder, bytes0_4, false);

        // First character plus 2 bytes of the second one
        byte[] bytes0_5 = Arrays.copyOfRange(fullBytes, 0, 5);
        decodeAndPrint(replaceDecoder, bytes0_5, true);
        decodeAndPrint(replaceDecoder, bytes0_5, false);
        decodeAndPrint(ignoreDecoder, bytes0_5, true);
        decodeAndPrint(ignoreDecoder, bytes0_5, false);
    }

    // Prints each byte as unsigned hex
    private static void print(byte[] bytes) {
        for (byte b : bytes) {
            System.out.print(Integer.toString(b & 0xFF, 16) + " ");
        }
    }

    // Decodes the bytes once; 'complete' is passed as endOfInput
    private static void decodeAndPrint(CharsetDecoder decoder, byte[] bytes,
            boolean complete) {
        decoder.reset();
        CharBuffer charBuffer = CharBuffer.allocate(bytes.length);
        ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
        CoderResult coderResult = decoder.decode(byteBuffer, charBuffer, complete);
        System.out.println("CoderResult: " + coderResult);
        System.out.println("input " + bytes.length + " bytes");
        System.out.println("decode " + byteBuffer.position() + " bytes");
        System.out.println("decode " + charBuffer.position() + " characters");
        System.out.println(Arrays.copyOf(charBuffer.array(), charBuffer.position()));
        System.out.println("-----------------------------------------------");
    }
}

The console shows the results:

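input : 你好
input bytes in utf8 : e4 bd a0 e5 a5 bd
=========================================
CoderResult: UNDERFLOW
input 4 bytes
decode 4 bytes
decode 2 characters
你?
-----------------------------------------------
CoderResult: UNDERFLOW
input 4 bytes
decode 3 bytes
decode 1 characters
你
-----------------------------------------------
CoderResult: UNDERFLOW
input 4 bytes
decode 4 bytes
decode 1 characters
你
-----------------------------------------------
CoderResult: UNDERFLOW
input 4 bytes
decode 3 bytes
decode 1 characters
你
-----------------------------------------------
CoderResult: UNDERFLOW
input 5 bytes
decode 5 bytes
decode 2 characters
你?
-----------------------------------------------
CoderResult: UNDERFLOW
input 5 bytes
decode 3 bytes
decode 1 characters
你
-----------------------------------------------
CoderResult: UNDERFLOW
input 5 bytes
decode 5 bytes
decode 1 characters
你
-----------------------------------------------
CoderResult: UNDERFLOW
input 5 bytes
decode 3 bytes
decode 1 characters
你
-----------------------------------------------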

MyTest.java.zip (861 Bytes)

This problem sometimes causes garbled text in Openfire 3.6.4 and earlier.

Openfire 3.7.0 uses strict mode instead:

encoder = Charset.forName(charset).newDecoder()

.onMalformedInput(CodingErrorAction.REPORT)

.onUnmappableCharacter(CodingErrorAction.REPORT);

encoder.decode(byteBuffer.buf());

This throws an exception whenever the buffer ends with an incomplete character, so the check that follows,

if (lastChar >= 0xfff0)

is never reached.

That is why connections drop at random!
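The strict-mode behaviour is easy to reproduce outside Openfire. A small sketch, with StrictDecodeDemo and strictDecodeFails as made-up names: the one-argument decode(ByteBuffer) treats its input as complete, so a truncated character is reported as a CharacterCodingException.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

class StrictDecodeDemo {
    // Returns true if a strict one-shot decode of the bytes throws.
    static boolean strictDecodeFails(byte[] bytes) {
        CharsetDecoder strict = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strict.decode(ByteBuffer.wrap(bytes));
            return false;
        } catch (CharacterCodingException e) {
            return true; // MalformedInputException for a truncated sequence
        }
    }

    public static void main(String[] args) {
        byte[] complete = {(byte) 0xe4, (byte) 0xbd, (byte) 0xa0};
        byte[] truncated = {(byte) 0xe4, (byte) 0xbd, (byte) 0xa0, (byte) 0xe5};
        System.out.println(strictDecodeFails(complete));  // false
        System.out.println(strictDecodeFails(truncated)); // true
    }
}
```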

I am uploading my patch. It resolves the problem and has been working correctly for a month.
3.6.4_XMLLightweightParser.java.patch.zip (1813 Bytes)
3.7.0_XMLLightweightParser.java.patch.zip (1719 Bytes)

Filed as OF-458

Upload it where? Sorry for such a newbie question.

Upload as a plugin?

I think he meant "I'm uploading my patch". This patch can only be applied to the source code, and then Openfire has to be recompiled.

Oh, OK. But when can we expect the new release from you guys?

Maybe someone has recompiled the .exe with this fix? If so, could you upload it somewhere? It would be hugely appreciated!

I confirm the same problem on Openfire 3.7.0 + Pandion. Please, somebody, upload compiled patched Openfire if that’s possible!

This patch fails to apply to current Openfire svn copy in Netbeans for me.

I applied this patch to the selected source file (org.jivesoftware.openfire.nio.XMLLightweightParser) in Eclipse, and it worked.

So I am uploading the final source code; you can simply replace the target file.
XMLLightweightParser.java.zip (4034 Bytes)

OK. Now I was able to apply your patch to the selected source file. I am attaching the compiled openfire.jar, which should be copied into the /openfire/lib folder. I haven't tested this, though, and I'm not sure whether replacing only openfire.jar is enough. But here you have it. Make a backup of the original openfire.jar!
openfire.jar (7195618 Bytes)


I just pass away…

Thank you! I’m going to test this jar today =)

Well, I’m testing Openfire 3.7.1 Alpha for 18 hrs. It works fine with no errors or warnings (clients are using Pandion). I’ve simply replaced openfire.jar with patched one.

Thank you, guys, for this update!

hanguokai, it looks like you are using Arrays.copyOf in your patch, which is not supported by Java 5, and it looks like we are still supporting this obsolete version. Whether we should keep supporting this version, which reached end of life a long time ago, has already been questioned; see this poll (and maybe vote): http://community.igniterealtime.org/polls/1025

But maybe you can make this patch with some other function that is still available in Java 5? Meanwhile we have reverted this patch in SVN. But as I said, this can change.

In the last patch I did not notice the Java 5 restriction; I usually work in a Java 6 environment.

This time I am uploading a new patch for Java 5.

The old statement:

char[] buf = Arrays.copyOf(charBuffer.array(), charBuffer.position());

It can be replaced in several other ways:

  1. char[] buf = charBuffer.flip().toString().toCharArray();

  2. char[] buf = new char[charBuffer.position()];
     charBuffer.flip(); charBuffer.get(buf);

  3. Use the implementation of Arrays.copyOf:
     char[] copy = new char[newLength];
     System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));

I used method 2.
XMLLightweightParser.java.patch.zip (1706 Bytes)
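For anyone who wants to try the Java 5 compatible variant in isolation, here is a small self-contained sketch of method 2. The class CharBufferCopy and the method writtenChars are hypothetical names used only for this illustration.

```java
import java.nio.CharBuffer;

class CharBufferCopy {
    // Copies the characters written so far (indices 0 to position)
    // without Arrays.copyOf, which only exists since Java 6.
    // Side effect: the buffer is flipped and then fully read.
    static char[] writtenChars(CharBuffer charBuffer) {
        char[] buf = new char[charBuffer.position()];
        charBuffer.flip();
        charBuffer.get(buf);
        return buf;
    }

    public static void main(String[] args) {
        CharBuffer cb = CharBuffer.allocate(8); // capacity larger than the content
        cb.put("abc");
        char[] copy = writtenChars(cb);
        System.out.println(copy.length);      // 3
        System.out.println(new String(copy)); // abc
    }
}
```

Unlike Arrays.copyOf on charBuffer.array(), this changes the buffer's position and limit, so it is only suitable when the buffer is not written to again afterwards.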

Thank you! We are always looking for more openfire SVN committers, please consider it

daryl