Wolf_P,
Thanks for making those changes. We’ve tried this, but it does not seem to handle 8-bit characters properly.
We have a roster URI, which is
…roster;name=%C3%84%C3%A4kk%C3%B6nen%20%C3%96ky
This represents the name “Ääkkönen Öky”, which is UTF-8 encoded and then pct-encoded. This follows step 2 of section 3.1 of RFC 3987, which states:
    Step 2. For each character in ‘ucschar’ or ‘iprivate’, apply steps
            2.1 through 2.3 below.
      2.1. Convert the character to a sequence of one or more octets
           using UTF-8 [RFC3629].
      2.2. Convert each octet to %HH, where HH is the hexadecimal
           notation of the octet value. Note that this is identical
           to the percent-encoding mechanism in section 2.1 of
           [RFC3986]. To reduce variability, the hexadecimal notation
           SHOULD use uppercase letters.
      2.3. Replace the original character with the resulting character
           sequence (i.e., a sequence of %HH triplets).
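To make the quoted steps concrete, here is a rough Java sketch of them. The names PctEncoder and encodeNonAscii are made up for illustration. As a simplification it treats every non-ASCII character as if it were in ‘ucschar’ or ‘iprivate’, which is broader than the RFC’s actual ranges, and it leaves ASCII characters (including spaces) untouched.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of any existing code base.
final class PctEncoder {

    // Applies steps 2.1 to 2.3 to every non-ASCII character of the input.
    // Assumption: "non-ASCII" stands in for the RFC's 'ucschar'/'iprivate' ranges.
    static String encodeNonAscii(String input) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            int len = Character.charCount(cp); // 2 for surrogate pairs, else 1
            if (cp < 0x80) {
                // ASCII is copied through unchanged.
                sb.appendCodePoint(cp);
            } else {
                // 2.1: convert the character to its UTF-8 octets.
                byte[] octets = input.substring(i, i + len)
                                     .getBytes(StandardCharsets.UTF_8);
                // 2.2 and 2.3: write each octet as an uppercase %HH triplet.
                for (byte b : octets) {
                    sb.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += len;
        }
        return sb.toString();
    }
}
```

With this sketch, “Ä” (U+00C4) becomes the two octets C3 84, which is where the %C3%84 at the start of our roster name comes from.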
If the data is sent without UTF-8 encoding, i.e. the 8-bit characters are sent as is, then the name will appear in the Nickname.
The %20 is decoded to a space, but UriIManager does not seem to decode the String in any other way: the only decoding is the call to replace for the %20 encoding.
I am not sure what the right fix is, as I don’t know how the command line parameters are handled. From my reading of the RFC, though, all the URI mapping handlers should treat the Java uriMapping String as a pct-encoded UTF-8 byte stream, so they should do something like:
a) Decode each pct-encoded %HH triplet to a byte
b) Create a byte array from the command line, including the decoded bytes
c) Create a Java String from the byte array with new String(bytes, Charset.forName("UTF-8"));
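A minimal sketch of steps a) to c), in case it helps the discussion. The names PctDecoder and decode are made up for illustration and are not taken from the existing code. I have also assumed that a malformed “%” sequence should be passed through literally, which may not be the behaviour you want.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not taken from the existing code.
final class PctDecoder {

    // Decodes %HH triplets to bytes, then interprets the whole byte
    // sequence as UTF-8.
    static String decode(String encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < encoded.length()) {
            char c = encoded.charAt(i);
            if (c == '%' && i + 2 < encoded.length()) {
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    // a) decode the pct-encoded triplet to a single byte
                    out.write((hi << 4) | lo);
                    i += 3;
                    continue;
                }
            }
            // b) any other character goes into the byte array as its UTF-8
            //    octets (assumption: a malformed '%' is kept as a literal)
            int cp = encoded.codePointAt(i);
            byte[] raw = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
            out.write(raw, 0, raw.length);
            i += Character.charCount(cp);
        }
        // c) create the Java String from the byte array;
        //    StandardCharsets.UTF_8 is the same as Charset.forName("UTF-8")
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Calling PctDecoder.decode("%C3%84%C3%A4kk%C3%B6nen%20%C3%96ky") should give back “Ääkkönen Öky”. I deliberately did not use java.net.URLDecoder here, because it implements form decoding and would also turn a “+” into a space, which is not what a URI parameter wants.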
Do you agree?
Antony