XHTMLExtensionProvider un-escapes restricted XHTML chars when parsing

Hello,

After upgrading to smack 4 (specifically 4.1.2), we noticed that some incoming XHTML messages are failing XHTML 1.0 validation. It seems that the parsing changed in version 4. In XHTMLExtensionProvider, the body is populated from the text returned by the XmlPullParser. The pull parser's MXParser.parseEntityRef() method unescapes sequences like:

&amp; &lt; &gt; &apos; &quot; etc.

So if smack receives a message like this:

Sending restricted XHTML char &amp;

Isome_thread_id

Sending restricted XHTML char &amp;

The body that smack provides in the parsed XHTMLExtension will be:

Sending restricted XHTML char &

Which is not compliant XHTML.
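To illustrate the behaviour, here is a minimal sketch that uses the JDK's built-in StAX reader rather than Smack's XmlPullParser/MXParser (so it is an analogy, not the actual Smack code path). The class name EntityUnescapeDemo and the method readText are made up for this example. It shows that a pull-style parser hands back the unescaped character for a predefined entity:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; uses JDK StAX as a stand-in for XmlPullParser.
public class EntityUnescapeDemo {

    // Concatenates all character data found in the given XML snippet.
    static String readText(String xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Deliver adjacent text and entity content as a single event.
            factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            StringBuilder sb = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    // The parser has already replaced &amp; with & here.
                    sb.append(reader.getText());
                }
            }
            reader.close();
            return sb.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readText("<body>Sending restricted XHTML char &amp;</body>"));
    }
}
```

Appending that returned text to an XHTML body as-is produces a bare & in the markup, which is the validation failure described above.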

Is there a way to force the pull parser to preserve the XML escaped character sequences within the XHTML body?

Hmm, right, that is not ideal. I don’t think it’s the provider that should change; we should simply re-escape the body text when adding it.
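As a rough sketch of what re-escaping means here: the helper below is a hypothetical stand-in written for this example, not the implementation of Smack's StringUtils.escapeForXML. It assumes only the five predefined XML entities need to be restored before the text is appended to the XHTML body.

```java
// Hypothetical sketch; class and method names are illustrative only.
public class XmlEscapeSketch {

    // Replaces the five XML-restricted characters with their entity references.
    static String escapeForXml(CharSequence input) {
        StringBuilder sb = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Text as the pull parser would return it (already unescaped).
        System.out.println(escapeForXml("Sending restricted XHTML char &"));
    }
}
```

Escaping only text events (and not the markup itself) keeps the XHTML tags intact while restoring the entity references in the character data.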

SMACK-680

I’ve uploaded Smack 4.1.3-SNAPSHOT with https://github.com/Flowdalic/Smack/commit/b9c87cef7377b0ff56f9bda452ba69111a41ed10. It’s completely untested, but it may fix the issue.

As always, please test and report back.

I had to make a small change to line 500:

            CharSequence text = parser.getText();
            if (event == XmlPullParser.TEXT) {
                // TODO the toString() can be removed in Smack 4.2.
                text = StringUtils.escapeForXML(text.toString());
                sb.append(text);
            } else {
                sb.append(parser.getText());
            }

That seems to work (but I still have a lot of testing/tinkering to do).

My fault, fixed with https://github.com/Flowdalic/Smack/commit/e6a403fb1c869e9c6a1b5edf3d5b99d3cff2cb98

Uploading new 4.1.3-SNAPSHOT while writing this.

Thanks! That checkin fixes the problem.

Do you have any idea of how frequently point releases like that will come out? Or more specifically do you have a target date for smack 4.1.3?