I had the same problem.
Keeping only the content between <body> and </body> (excluding the body tags themselves) worked for my document generation with XHTML.
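As a rough illustration of keeping only what sits between the body tags, here is a minimal sketch in plain Java. The class name BodyExtractor and the method extractBody are hypothetical names made up for this example; they are not part of Mendix or any library. It assumes a single body element and no ">" character inside the attributes of the opening tag, so treat it as a starting point rather than a robust HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only, not a Mendix or library class.
public class BodyExtractor {

    // Matches <body ...> ... </body>, ignoring case and spanning line breaks.
    private static final Pattern BODY = Pattern.compile(
            "<body\\b[^>]*>(.*?)</body\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed content between the body tags.
    // If no body element is found, the input is returned unchanged.
    public static String extractBody(String html) {
        if (html == null) {
            return null;
        }
        Matcher m = BODY.matcher(html);
        return m.find() ? m.group(1).trim() : html;
    }

    public static void main(String[] args) {
        String doc = "<!DOCTYPE html><html><head><title>t</title></head>"
                + "<body class=\"x\"><p>Hello</p></body></html>";
        System.out.println(extractBody(doc)); // prints <p>Hello</p>
    }
}
```

Because everything outside the body element is dropped, the DOCTYPE declaration is dropped as well.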
I found a library for the userlib folder, commons-text-1.10.0, here: https://commons.apache.org/proper/commons-text/download_text.cgi
I downloaded the jar file and added it to my userlib folder. Then I created a Java action that I use to unescape the HTML:
// BEGIN USER CODE
return StringEscapeUtils.unescapeHtml4(this.InputHTML_Unescaped);
// END USER CODE
where the class is imported at the top of the Java file like this:
import org.apache.commons.text.StringEscapeUtils;
This is really more of a Stack Overflow question, similar to this one: https://stackoverflow.com/questions/60327377/getting-doctype-is-disallowed-when-the-feature-http-apache-org-xml-features-d
Regards,
Ronald