After upgrading to Smack 4 (specifically 4.1.2), we noticed that some incoming XHTML messages fail XHTML 1.0 validation. The parsing seems to have changed in version 4. In XHTMLExtensionProvider, the body is populated from the text returned by the XmlPullParser. The pull parser's MXParser.parseEntityRef() method unescapes sequences like:
&amp; &lt; &gt; &apos; &quot; etc…
So if Smack receives a message like this:
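<message>
  <body>Sending restricted XHTML char &amp;</body>
  <thread>Isome_thread_id</thread>
  <html xmlns='http://jabber.org/protocol/xhtml-im'>
    <body xmlns='http://www.w3.org/1999/xhtml'>Sending restricted XHTML char &amp;</body>
  </html>
</message>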
The body that Smack provides in the parsed XHTMLExtension will be:
Sending restricted XHTML char &
This is not compliant XHTML, because a literal & in text content must be written as &amp;.
Is there a way to force the pull parser to preserve the XML escaped character sequences within the XHTML body?
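CharSequence text = parser.getText();
if (event == XmlPullParser.TEXT) {
    // The parser has already unescaped entity references in text events,
    // so escape them again before appending to the XHTML body.
    // TODO the toString() can be removed in Smack 4.2.
    text = StringUtils.escapeForXML(text.toString());
    sb.append(text);
} else {
    // Other event types are appended unchanged.
    sb.append(parser.getText());
}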
That seems to work, but I still have a lot of testing and tinkering to do.
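
For anyone who wants to reproduce the effect outside Smack, here is a minimal sketch. It uses the JDK's built-in StAX parser (javax.xml.stream) instead of Smack's XmlPullParser, so it only approximates what XHTMLExtensionProvider does. The class name XhtmlEscapeSketch and the helper escapeForXml are hypothetical stand-ins for illustration; escapeForXml is not Smack's StringUtils.escapeForXML, it just covers the same five characters.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XhtmlEscapeSketch {

    // Hypothetical stand-in for an XML escaping helper:
    // replaces the five XML-reserved characters with entity references.
    static String escapeForXml(CharSequence input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Collects all character data in the document. The parser hands back
    // text with entity references already resolved; when reEscape is true,
    // the text is escaped again before it is appended.
    static String bodyText(String xml, boolean reEscape) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            StringBuilder sb = new StringBuilder();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.CHARACTERS) {
                    String text = reader.getText();
                    sb.append(reEscape ? escapeForXml(text) : text);
                }
            }
            reader.close();
            return sb.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<body>Sending restricted XHTML char &amp;</body>";
        System.out.println(bodyText(xml, false)); // text as the parser returns it
        System.out.println(bodyText(xml, true));  // text after re-escaping
    }
}
```

The first call shows the text with the entity resolved to a bare &, and the second shows it with &amp; restored, which is the form that can safely be embedded in an XHTML body again.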