One of our customers has noticed that XML files which are imported (XML-to-Domain), converted to a different format using microflows, and exported again (Domain-to-XML) in Mendix lose special characters such as Ä or Ü. Instead, these show up as "ï¿½" and similar sequences. I have found a few forum posts about this, but the advice in them is quite old. We are using 126.96.36.199 for this.

The incoming XML files declare encoding="WINDOWS-1252", but Mendix seems to change this to UTF-8 somewhere in the mapping process. Can I get Mendix to read incoming XML files containing umlauts and similar characters correctly, and how can I make sure that the exported XML also uses the correct encoding?

Thanks

UPDATE: Achiel helped me out. Some of the XML files are actually encoded differently from what their encoding declaration states. Not a Mendix problem :)
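The "ï¿½" sequence is a typical fingerprint of one specific mix-up. The minimal sketch below uses plain JDK classes, not Mendix code, and the class and method names are illustrative. It assumes that Windows-1252 bytes get decoded as UTF-8 somewhere along the way. The invalid byte becomes U+FFFD (the Unicode replacement character), whose UTF-8 form EF BF BD displays as "ï¿½" when it is viewed as Windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates the round trip: Windows-1252 bytes are read as UTF-8,
    // written back out as UTF-8, and then viewed as Windows-1252 again.
    static String garble(String text) {
        // e.g. "Ä" is the single byte 0xC4 in Windows-1252
        byte[] cp1252Bytes = text.getBytes(CP1252);
        // A lone 0xC4 (not followed by a continuation byte) is invalid UTF-8,
        // so the decoder substitutes U+FFFD (REPLACEMENT CHARACTER)
        String misread = new String(cp1252Bytes, StandardCharsets.UTF_8);
        // U+FFFD encoded as UTF-8 is the three bytes EF BF BD
        byte[] exported = misread.getBytes(StandardCharsets.UTF_8);
        // EF BF BD shown as Windows-1252 is the familiar "ï¿½"
        return new String(exported, CP1252);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the string literals ASCII-only: "Ärger" and "ï¿½rger"
        System.out.println(garble("\u00C4rger").equals("\u00EF\u00BF\u00BDrger")); // true
    }
}
```

The information is lost at the first decoding step. Once a character has become U+FFFD, no later choice of output encoding can bring the original umlaut back.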
The Javadoc for org.xml.sax.InputSource describes how the SAX parser chooses its input:

"The SAX parser will use the InputSource object to determine how to read XML input. If there is a character stream available, the parser will read that stream directly, disregarding any text encoding declaration found in that stream. If there is no character stream, but there is a byte stream, the parser will use that byte stream, using the encoding specified in the InputSource or else (if no encoding is specified) autodetecting the character encoding using an algorithm such as the one in the XML specification."
The parser then hands the XML data to the XML importer one Java char at a time, i.e. as text that has already been decoded. This should result in correctly decoded XML.
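The quoted behaviour can be checked with the JDK's built-in SAX parser. The sketch below has hypothetical class and method names and is not how the Mendix importer is invoked internally. It assumes the JDK parser recognises the name windows-1252. It parses the same Windows-1252 bytes twice. The first pass uses a byte stream, so the parser honours encoding="windows-1252". The second pass goes through a Reader that has already decoded the bytes as UTF-8. In that case the declaration is ignored and the umlauts are lost before the parser ever sees them.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InputSourceDemo {
    // A document whose bytes really are Windows-1252, matching its declaration.
    static byte[] sampleXml() {
        String xml = "<?xml version=\"1.0\" encoding=\"windows-1252\"?><name>\u00C4\u00DC</name>";
        return xml.getBytes(Charset.forName("windows-1252"));
    }

    // Parses the source and returns all character data the SAX parser reports.
    static String parseText(InputSource source) {
        StringBuilder text = new StringBuilder();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(source, new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    // Already-decoded Java chars; may arrive in several chunks
                    text.append(ch, start, length);
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return text.toString();
    }

    // Byte stream: the parser decodes the bytes itself and honours the declaration.
    static String viaByteStream() {
        return parseText(new InputSource(new ByteArrayInputStream(sampleXml())));
    }

    // Character stream: the Reader already decoded the bytes as UTF-8,
    // so the declaration is ignored and each umlaut has become U+FFFD.
    static String viaUtf8CharacterStream() {
        InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(sampleXml()), StandardCharsets.UTF_8);
        return parseText(new InputSource(reader));
    }

    public static void main(String[] args) {
        System.out.println(viaByteStream().equals("\u00C4\u00DC"));          // true
        System.out.println(viaUtf8CharacterStream().equals("\uFFFD\uFFFD")); // true
    }
}
```

In other words, a correct result depends on the bytes reaching the parser untouched and on the declaration telling the truth about those bytes.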
That being said, are you sure the data is parsed incorrectly? What does the data look like inside the database?
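The root cause here was files whose bytes do not match their declaration, so a quick pre-check on the raw file can help narrow things down before looking at the database. This is a rough sketch, not a Mendix feature. The regex-based declaredEncoding helper is a hypothetical simplification that only handles ASCII-compatible encodings without a byte order mark. Strict decoding can only prove a mismatch. A single-byte charset such as Windows-1252 accepts almost any byte sequence, so passing the check does not prove the text is right.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodingCheck {
    // Simplified: expects <?xml ... encoding="..." ?> at the very start of the file
    // (no byte order mark, ASCII-compatible encoding such as UTF-8 or Windows-1252).
    private static final Pattern DECLARED_ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    // Returns the encoding named in the XML declaration, or null if none is found.
    static String declaredEncoding(byte[] xml) {
        // ISO-8859-1 maps every byte to exactly one char, so the ASCII
        // declaration can be read before the real encoding is known.
        String head = new String(xml, 0, Math.min(xml.length, 256), StandardCharsets.ISO_8859_1);
        Matcher m = DECLARED_ENCODING.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    // True if the bytes are valid in the given charset. Strict decoding reports
    // errors instead of silently substituting U+FFFD.
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Checks the bytes against the declared encoding; without a declaration,
    // XML defaults to UTF-8 (UTF-16 with a byte order mark is not handled here).
    // Charset.forName throws if the declared name is unknown to the JVM.
    static boolean matchesDeclaration(byte[] xml) {
        String declared = declaredEncoding(xml);
        Charset charset = declared == null ? StandardCharsets.UTF_8 : Charset.forName(declared);
        return decodesCleanly(xml, charset);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // Declared UTF-8, but the umlaut was written as a Windows-1252 byte
        byte[] mislabelled = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u00C4</a>".getBytes(cp1252);
        // Declaration and bytes agree
        byte[] consistent = "<?xml version=\"1.0\" encoding=\"windows-1252\"?><a>\u00C4</a>".getBytes(cp1252);
        System.out.println(matchesDeclaration(mislabelled)); // false
        System.out.println(matchesDeclaration(consistent));  // true
    }
}
```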