HTML to String

I have a String field that contains HTML text. I want to remove all HTML markup so that only the line breaks (the 'Enters') remain in the converted String value. Does anyone know how to convert this String?
3 answers

Perhaps you can use a Java action. See the linked Stack Overflow question.


This is actually much harder than it seems. I suggest looking at the link David gave you and picking one of the libraries suggested there. Don't try to code this all yourself, or you will most likely overlook many corner cases.


I've personally used this in a Java Action in which the HTML is stripped with the following expression:

Content.replaceAll("<br/>", "\n").replaceAll("&nbsp;", "").replaceAll("</p>","\n").replaceAll("<[^>]*>", "")

with Content being the String that is to be stripped. So far it hasn't caused any issues as far as I can remember, so it might be a decent starting point. (Depending on the potential input, you might also have to deal with other entities such as &rsquo; and &lsquo;. Note too that the expression only matches the exact spelling <br/>, not <br> or <br />.)

Just noticed, by the way: if the code shows up as replaceAll(" ", ""), the entity being replaced there is the nbsp one (&nbsp;), which the forum interprets and renders as a space :p
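
For anyone who wants to put the expression from this answer into a Java action, here is a minimal sketch of how it could be wrapped in a helper. The class name HtmlStripper and the method name stripHtml are made up for illustration; they are not part of Mendix or any library. It differs from the one-liner in a few ways that are my own assumptions: it matches <br>, <br/> and <br /> case-insensitively, it replaces &nbsp; with a space rather than removing it, and it decodes a handful of entities (including rsquo and lsquo). It is still a regex-based approach, so the corner-case warning given earlier in this thread applies.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a complete HTML parser.
public class HtmlStripper {
    // Matches <br>, <br/>, <br /> in any letter case.
    private static final Pattern BR = Pattern.compile("(?i)<br\\s*/?>");
    // Matches a closing paragraph tag such as </p> or </P>.
    private static final Pattern P_CLOSE = Pattern.compile("(?i)</p\\s*>");
    // Matches any remaining tag.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    public static String stripHtml(String content) {
        if (content == null) {
            return "";
        }
        // Turn line-breaking tags into newlines before removing all other tags.
        String text = BR.matcher(content).replaceAll("\n");
        text = P_CLOSE.matcher(text).replaceAll("\n");
        text = TAG.matcher(text).replaceAll("");
        // Decode a few common entities; &amp; goes last so that text like
        // "&amp;lt;" is not decoded twice.
        return text
                .replace("&nbsp;", " ")
                .replace("&lsquo;", "\u2018")
                .replace("&rsquo;", "\u2019")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(stripHtml("<p>First line<br>Second&nbsp;line</p>"));
    }
}
```

In a Java action you would call HtmlStripper.stripHtml(Content) and return the result. If the input can contain scripts, comments, or malformed markup, a dedicated HTML library is probably the safer route.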