Bug?: Server passes along malformed XML (illegal UTF8) to clients

Some clients currently use strict XML parsers to parse s2c messages. I’ve found that if a client gets an illegal (non-UTF8) character to the server, the server will pass it along (in what amounts to non-well-formed XML) to the intended recipient. If the recipient uses a strict parser, the connection (which is essentially “one big XML document”) terminates.

http://www.w3.org/TR/REC-xml/#sec-references (in particular http://www.w3.org/TR/REC-xml/#NT-Char) defines which characters are valid. For example, &#01; (the SOH control character, U+0001) would be an illegal character (I saw the character entity sent by the server when a client sent the unescaped character in the message body).
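To make the rule concrete, here is a minimal sketch of a check against the XML 1.0 Char production linked above. The class and method names (XmlCharCheck, isXmlChar, firstIllegalIndex) are made up for illustration; this is not code from Wildfire or from any client.

```java
// Hypothetical helper, for illustration only.
final class XmlCharCheck {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns the index (in UTF-16 code units) of the first character that
    // is not allowed in XML 1.0, or -1 if the whole string is acceptable.
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstIllegalIndex("hello"));     // no illegal character
        System.out.println(firstIllegalIndex("a\u0001b"));  // SOH at index 1
    }
}
```

Note that an unpaired surrogate also falls outside the allowed ranges, so this check flags it as well.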

I observed this in Wildfire Server 3.0.1 (running on Ubuntu Dapper Drake, though that should not matter). I have not tested it with newer versions of Wildfire, and I have not tested it with other servers. Also, I have only tested this in one-to-one type=“chat” messages (or perhaps type=“message”; I wasn’t paying close enough attention at the time).

I am somewhat (but not very) hesitant to post a test case, since the result is a fairly annoying DoS attack against strict clients.

I already reported it to some of the clients, but since it seems like legitimately invalid XML, maybe the server can play a role in the solution. (I’m not sure what way would be appropriate, and obviously, addressing this warrants looking at how various clients send and otherwise cope with non-ISO-8859-1 characters.)

- Tim

Hi Tim,

One can read in the document you mentioned above: “The behavior of a validating XML processor is highly predictable; it must read every piece of a document and report all well-formedness and validity violations.” So a strict XML parser, too, must not terminate the connection but continue to read every piece of the document. Otherwise it can find only one error in the document.

One could add a strict parser to Wildfire, but this would just decrease performance, so I can’t see a need to do this.

LG

I think Wildfire should try to protect against this if at all possible. I know you’re hesitant to post a test case, but doing so would help us find and fix the problem much more easily. If you’d like, you could email it to me or Gato directly.

Thanks,

Matt

Hi Matt,

In any editor you can use “ALT+0 + ALT+1” to create the special “SOH” character, then copy it into Spark and send it. You have just received a “SOH SOH SOH” message, which looks empty within Spark. If you use an editor like Notepad++ to view “Spark\user\matt@jivesoftware.com\transcripts\lg@jivesoftware.com.xml” you’ll see it.

But please tell me what Wildfire should do? Warn the server administrator and filter the special characters? There is already a Content Filter Plugin which allows one to do exactly this - and these checks will decrease the server performance a lot.
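Just to illustrate what “filter the special characters” could mean, here is a rough sketch that strips everything outside the XML 1.0 Char ranges from a message body. The class and method names are invented for this example; it is neither the Content Filter Plugin’s code nor anything that exists in Wildfire, and it says nothing about the performance cost.

```java
import java.util.regex.Pattern;

// Hypothetical filter, for illustration only.
final class XmlCharFilter {

    // Matches any code point NOT allowed by the XML 1.0 Char production
    // (tab, LF, CR, 0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF).
    private static final Pattern ILLEGAL = Pattern.compile(
            "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    // Returns the text with all illegal characters removed.
    static String stripIllegal(String text) {
        return ILLEGAL.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // The "SOH SOH SOH" message becomes an empty string.
        System.out.println(stripIllegal("\u0001\u0001\u0001").isEmpty());
        System.out.println(stripIllegal("a\u0001b"));
    }
}
```

Whether to silently drop such characters, replace them, or reject the whole stanza is a separate decision; the sketch only shows the dropping variant.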

LG