Spark Chat History XML Bug

If this has already been identified or fixed just ignore me…

Environment

Windows 7 Pro

Spark 2.6.3

Issue Summary

Sending XML data in a Spark message causes Spark to stop parsing the chat history XML file at the message that contains the XML markup. New history was still appended to the file correctly, but it could not be displayed.

Specific Issue

See attached picture (The message board also encountered an error when posting this data.)

My chat history ended right before this comment even though there were hundreds of additional messages after this comment.

Temporary Fix

Removing the message body containing the markup from the history file caused the Spark history to display correctly again.

I have filed this as SPARK-1513. As there are almost no active developers at the moment, I can’t say when or whether it will be fixed.

Hello keith,

I’m working on a patch right now and hope to have it committed in the very near future. It needs some refining and some testing.

If you could provide me with any additional XML data or HTML data that is known to break the current release of Spark transcript logs, it would help me in building a more robust solution.

Thanks!

Hi

This would be very helpful. Thanks.

Walter

Hey Jason,

Unfortunately, the only real issue I have had with my dev team is the one instance where CDATA was present. The only test I can think of running would be to make sure you can pass valid XML in chat and have it be parsed.

Hello Keith,

Thanks for getting back to me with your findings.

Spark already has built-in escaping of XML and HTML characters; however, CDATA doesn’t appear to have been anticipated, which caused the issue you ran into.

I have a WIP patch as of now, but it needs some more tweaking before I’m comfortable calling it Working. I’ll just run some production XML data from my company through Spark and see if I can get it to break the transcript logs… we have some… interesting XML data over here!

Essentially Spark should be able to store anything entered into the chat in the transcript logs without incident. The more test cases we have, the easier it will be to achieve that goal.

Thanks again, and check SPARK-1513 for progress updates on this issue.
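The goal stated above (anything typed into chat should survive a trip through the transcript log) can be checked with a small round-trip harness. The following is only a sketch in Python, not Spark's code: the element names `transcript`, `message` and `body` and the helper names are assumptions made for illustration, and Spark's real transcript layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical transcript layout; Spark's real element names may differ.
def write_transcript(bodies):
    root = ET.Element("transcript")
    for text in bodies:
        message = ET.SubElement(root, "message")
        ET.SubElement(message, "body").text = text
    # The serializer escapes &, < and > in text nodes on the way out.
    return ET.tostring(root, encoding="unicode")

def read_transcript(xml_text):
    root = ET.fromstring(xml_text)
    return [body.text or "" for body in root.iter("body")]

# Message bodies of the kind discussed in this thread.
SAMPLES = [
    "plain text",
    "<note><to>Keith</to></note>",
    "<![CDATA[ raw <b>markup</b> & more ]]>",
    "closing marker only: ]]>",
    "AT&T &amp; already-escaped entity",
]

def round_trips(bodies):
    # True when every body comes back exactly as it was written.
    return read_transcript(write_transcript(bodies)) == list(bodies)

if __name__ == "__main__":
    print("round trip ok:", round_trips(SAMPLES))
```

Adding each newly reported problem message to `SAMPLES` turns it into a regression case.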

You should probably also check what impact this additional filtering has on history processing when you send a lot of XML data (in separate messages). That said, Spark already has history loading problems, and one of Walter’s colleagues was preparing a patch with history pagination to mitigate them, but it looks like that work has stopped. See SPARK-1407 (related: SPARK-1404).

Hello,

I’ve tested by sending large XML documents through the chat, some one at a time, others pasted twice into the chat box. None of them seem to break the formatting of the transcript log, and all of them are displayed correctly in the history view window.

I did run into an issue when pasting in a large XML file in its entirety (about 3800 lines). If I sent this file twice (by pasting it into the chat box), loading the history slowed dramatically. However, this also happens for me with simple plain text, which does not get filtered by my patch. I believe it is due to the massive amount of data Spark must parse before displaying the history window.

Here’s an example of a large XML I used for testing: http://www.enetpulse.com/wp-content/uploads/summer_olympics_mens_road_race.xml

If you copy and paste that XML into the chat box and send it twice, then close and reopen your session and click the History button, you will notice the performance hit. This seems relevant to both tickets you just posted, Wroot, so I conclude that my patch doesn’t aggravate this issue.
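For reference, the pagination idea mentioned for SPARK-1407 could look roughly like the sketch below. It is written in Python rather than as Spark code, it is not the patch from that ticket, and the `message` and `body` element names are assumptions. It streams the transcript and keeps only the most recent entries, so the history window has fewer messages to render. The whole file is still read once, so this limits memory and display work rather than parse time.

```python
import io
import xml.etree.ElementTree as ET
from collections import deque

def last_messages(source, limit=50):
    """Return the bodies of the last `limit` messages in a transcript stream."""
    recent = deque(maxlen=limit)  # older entries fall off automatically
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "message":  # hypothetical element name
            recent.append(elem.findtext("body", default=""))
            elem.clear()  # drop the children of a message we are done with
    return list(recent)

if __name__ == "__main__":
    # Build a small in-memory transcript to exercise the function.
    parts = ["<transcript>"]
    parts += [f"<message><body>m{i}</body></message>" for i in range(10)]
    parts.append("</transcript>")
    data = io.BytesIO("".join(parts).encode("utf-8"))
    print(last_messages(data, limit=3))
```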

As a side note, during my testing I noticed that Spark will not let you paste the clipboard into the chat box if it holds more than a certain amount of data. I was testing with XML files of 1 MB and larger, and Spark would not let me paste them into the window and send them. However, the XML from the link above (about 3800 lines) could be pasted into the chat box multiple times and sent. When I pasted it into the chat box a few times at once and then sent it, the server side seemed to choke hard and actually disconnected me for several minutes. I tried closing Spark and reopening it, but the same server would not accept my login for several minutes. During this time I was able to connect to other servers without issues, so it wasn’t a Spark problem. The server was an Openfire server. Maybe Openfire’s server-side logging chokes on very large data in the same way Spark’s transcript loading does? Just an observation.

I just uploaded a .patch file for this. Can it be merged into mainline? I’ve tested it and use this build now personally with no issues.

Thanks.

Review is requested, as well as a merge to trunk.

I noticed that SPARK-1513 was closed, but there is still a problem with CDATA blocks. The conversation history is truncated when a message contains the closing CDATA marker: ]]>. In the XML saved by Spark, the opening < is encoded, but the final > is not, and that unencoded > seems to be what causes the problem. If I encode it in the saved file, the history is displayed correctly.

Spark 2.7.0
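The XML 1.0 specification requires the > of a literal ]]> in character data to be written as &gt; (or a character reference), which is consistent with the behaviour described above. The sketch below, in Python rather than Spark's own code, contrasts the two escaping strategies. `escape_partial` is a hypothetical stand-in that only imitates the reported symptom; it is not taken from Spark's source.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def escape_partial(text):
    # Imitates the reported symptom: & and < are encoded, > is left alone.
    return text.replace("&", "&amp;").replace("<", "&lt;")

def escape_full(text):
    # saxutils.escape encodes &, < and >, so "]]>" becomes "]]&gt;".
    return escape(text)

def parses(body_markup):
    # Wrap the escaped text in a <body> element and try to parse it.
    try:
        ET.fromstring("<body>" + body_markup + "</body>")
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    message = "<![CDATA[hello]]>"
    for name, fn in (("partial", escape_partial), ("full", escape_full)):
        escaped = fn(message)
        print(name, escaped, "parses:", parses(escaped))
```

With full escaping, the parser hands back the original message text, CDATA markers included, as ordinary character data.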


Thanks for bringing this up, and sorry. I wasn’t sure how to test it and thought it might already be fixed. It was a bit confusing to work out which of the patches was the final one, and Jason is not around anymore. I also had to apply it manually. But after some testing I think it works now. You can test it with build 681 (the fix will eventually be in 2.7.1): Install4j

Note: it won’t fix an already corrupted history. So if you have some of these characters in your history, remove them from the transcript file manually or clear the history to start with fresh one.
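As an alternative to deleting the affected messages, the earlier observation that encoding the final > makes the history readable again suggests a scripted cleanup. The following Python sketch is only an illustration under stated assumptions: the transcript is UTF-8, it contains no genuine CDATA sections (their terminators must stay unencoded), and the function names are made up here. It keeps a `.bak` copy and rewrites the file only if the repaired text parses.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

def repair_text(xml_text):
    # Encode the ">" of every literal "]]>". Assumes the file contains no
    # genuine CDATA sections, whose terminators must stay unencoded.
    return xml_text.replace("]]>", "]]&gt;")

def is_well_formed(xml_text):
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def repair_file(path, encoding="utf-8"):
    """Back up the transcript, then rewrite it only if the repair parses."""
    path = Path(path)
    original = path.read_text(encoding=encoding)
    repaired = repair_text(original)
    if repaired == original or not is_well_formed(repaired):
        return False  # nothing to change, or a problem this sketch cannot fix
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(repaired, encoding=encoding)
    return True
```

Close Spark before running something like this against a transcript file, so the client does not write to the file while it is being replaced.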


I confirm that build 681 works as expected. Thanks! :smiley: