How can I resolve Unicode hex value mismatches between WordML and XSL-FO?

Question

We have an important legal document that our app generates in WordML, with foreign characters represented via Unicode. These characters vary widely and include non-Latin scripts such as Korean (Hangul) and Cyrillic.

We have all of the Unicode hex values for WordML, but our print room has informed us that they can't accept .doc files at all, only PDFs. So we're now converting the entire file into an XSL-FO document.

The problem is that XSL-FO doesn't seem to use the same Unicode hex values. In fact, when we try to produce the XSL-FO document, the affected characters come out as "#" symbols, indicating that no proper value was found.

Not all of the Unicode characters failed to render. In particular, the special characters for French and Spanish displayed just fine, but none of the Cyrillic or Korean characters did.

Is there a library of hex character codes for XSL-FO, or some kind of simple conversion we could do to make our hex codes match the XSL-FO Unicode values?

Unicode is designed such that each character has a unique code independent of the programs you use. It is more likely that you get those # symbols because your FO processor can't find an appropriate symbol for those characters in the fonts it has access to/is told to use. — Bart van Ingen Schenau, Aug 05 '15 at 16:09
@BartvanIngenSchenau So it's more likely to be a library problem than a template problem? — Zibbobz, Aug 05 '15 at 16:22
Just curious, you have doc files (MS Word) which look fine, and you just need PDF? Why don't you just use the inbuilt PDF conversion of MS Word which is available since Office 2010? — Doc Brown, Aug 05 '15 at 16:39
I would rather classify it as a tooling problem. But see also the comment by DocBrown. — Bart van Ingen Schenau, Aug 05 '15 at 16:47
... and concerning xsl-fo: which fo processor are you using? For example, Apache FOP 1.1 lists under the "known issues" section " characters whose code points are greater than 65535 not yet supported", see https://xmlgraphics.apache.org/fop/1.1/knownissues_overview.html. Don't know if Cyrillic or Korean are in that range. — Doc Brown, Aug 05 '15 at 16:49
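
To check the two points raised in the comments (that a code point is the same regardless of the program, and whether Cyrillic or Korean fall above 65535), here is a minimal Python sketch. The sample characters and the `describe` helper are assumptions chosen for illustration; they are not taken from the original document. It prints each character's code point, the XML numeric character reference that would be valid in both WordML and XSL-FO, and whether the code point is at or below 65535.

```python
import unicodedata

# 65535 is the limit quoted from the Apache FOP 1.1 known-issues page.
BMP_MAX = 0xFFFF


def describe(ch: str) -> dict:
    """Return code point details for a single character (hypothetical helper)."""
    cp = ord(ch)
    return {
        "char": ch,
        "hex": f"{cp:04X}",
        # Numeric character reference; XML defines this, not WordML or XSL-FO.
        "xml_ref": f"&#x{cp:04X};",
        "name": unicodedata.name(ch, "<unnamed>"),
        "at_or_below_65535": cp <= BMP_MAX,
    }


# Illustrative samples: French, Spanish, Cyrillic, Korean.
samples = ["\u00E9", "\u00F1", "\u0416", "\uD55C"]

for ch in samples:
    info = describe(ch)
    print(info["xml_ref"], info["name"], info["at_or_below_65535"])
```

If the Cyrillic and Korean samples report code points at or below 65535, the FOP limitation quoted above would not explain the "#" output, which is consistent with the suggestion that the configured fonts lack glyphs for those characters.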
