This is the Message Centre for Jim Lynn
Question for you about encoding
Felonious Monk - h2g2s very own Bogeyman Started conversation Feb 3, 2007
I've started work on GuideDog again after a long break (mainly to learn C++ so I can write it properly this time). It's actually coming along rather nicely at the moment. I have some XML-related questions for you.
What character encodings does the DNA engine support? Does it store Unicode characters directly or does it use some other kind of internal coding scheme? Also, what encodings will it handle when someone posts an entry?
Jim Lynn Posted Feb 5, 2007
We expect ISO-8859-1. The database doesn't yet handle Unicode. If you pass any other encoding, expect the characters to be mangled or the whole thing to fail outright. The best bet for any and all out-of-range characters is to XML-escape them as numeric character references (for example, &#1234;).
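As a rough illustration of the escaping advice above, here is a minimal C++ sketch. The function name `toLatin1WithEscapes` is hypothetical and is not part of DNA or GuideDog. It assumes the text is already available as Unicode code points, and it handles only the out-of-range characters; escaping of XML markup characters such as `&` and `<` is left to the caller.

```cpp
#include <string>

// Hypothetical helper (not part of DNA or GuideDog): encode Unicode code
// points as ISO-8859-1 bytes, writing anything above U+00FF as a decimal
// numeric character reference such as &#8212;.
// Note: markup characters like '&' and '<' are passed through untouched.
std::string toLatin1WithEscapes(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            // ISO-8859-1 maps code points 0..255 directly onto single bytes.
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            // Out of range for ISO-8859-1: emit &#NNNN; instead.
            out += "&#";
            out += std::to_string(static_cast<unsigned long>(cp));
            out += ';';
        }
    }
    return out;
}
```

For example, passing `U"a\u2014b"` (an em dash between two letters) should yield the ASCII-only string `a&#8212;b`, which survives a round trip through an ISO-8859-1 store.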
Felonious Monk - h2g2s very own Bogeyman Posted Feb 5, 2007
So what about HTML named entities such as &mdash;? Will they work, or should I go for the escaped syntax in preference?
The reason I ask is that I need to validate the XML against a DTD on import; otherwise the XslCompiledTransform class throws an exception on encountering special characters. As I have access to the DTD, and hence the entity mappings, it's pretty easy to put the named form back into the output.
Jim Lynn Posted Feb 5, 2007
I would use the escaped syntax in preference. Under the hood, for our XSLT transformations, we have to use a DTD to convert the named entities into the escaped Unicode characters anyway.
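The named-to-numeric conversion described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the function name `namedToNumeric` is hypothetical, and the small entity table is hand-written here, whereas a real implementation would take its mappings from the DTD. It is not the DNA engine's code.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper: rewrite known named entities (e.g. &mdash;) as
// decimal numeric character references (e.g. &#8212;). The table below is
// a small illustrative subset; a real tool would load it from the DTD.
std::string namedToNumeric(const std::string& xml)
{
    static const std::map<std::string, unsigned long> entities = {
        {"nbsp", 160}, {"ndash", 8211}, {"mdash", 8212}, {"hellip", 8230}
    };

    std::string out;
    std::size_t pos = 0;
    while (pos < xml.size()) {
        if (xml[pos] == '&') {
            const std::size_t end = xml.find(';', pos + 1);
            if (end != std::string::npos) {
                const auto it = entities.find(xml.substr(pos + 1, end - pos - 1));
                if (it != entities.end()) {
                    // Known name: emit the numeric form and skip past the ';'.
                    out += "&#" + std::to_string(it->second) + ";";
                    pos = end + 1;
                    continue;
                }
            }
        }
        // Anything else (including &amp; and existing &#NNNN;) is copied as-is.
        out.push_back(xml[pos]);
        ++pos;
    }
    return out;
}
```

Entities that are not in the table, including the XML built-ins such as `&amp;`, are deliberately left untouched, so the output stays well-formed.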
Felonious Monk - h2g2s very own Bogeyman Posted Feb 5, 2007
Well, that makes my life a lot simpler as the XmlTextWriter can be created with the ISO encoding scheme and will handle this automatically. The only advantage I can see to retaining the named syntax is to allow article authors to identify special characters in the raw XML.
Felonious Monk - h2g2s very own Bogeyman Posted Feb 7, 2007
BTW: do you happen to have any GuideML test documents that exercise character encodings, so I can have a muck about with them? Something along the lines of 'lorem ipsum' with different symbols in, please?
Felonious Monk - h2g2s very own Bogeyman Posted Feb 7, 2007
Thanks: but I can't get at the original GuideML, which makes testing difficult.
Felonious Monk - h2g2s very own Bogeyman Posted Feb 7, 2007
Thank you...I shall try it out.
Felonious Monk - h2g2s very own Bogeyman Posted Feb 8, 2007
I've created a project at http://www.codeplex.com/guidedog to manage the source code for the project. You will need VS.NET 2005 Pro to be able to program it, but there is other work, such as creating stylesheets, that requires no more than a text editor.
This time around I have a much better feeling about it: MFC provides me with most of the necessary tools to construct the application just the way I want it. C++ allows me to get deep down into MSHTML. And there are none of those pesky interop issues, and the exceptions they cause, to worry about.
Question for you about encoding
- 1: Felonious Monk - h2g2s very own Bogeyman (Feb 3, 2007)
- 2: Jim Lynn (Feb 5, 2007)
- 3: Felonious Monk - h2g2s very own Bogeyman (Feb 5, 2007)
- 4: Jim Lynn (Feb 5, 2007)
- 5: Felonious Monk - h2g2s very own Bogeyman (Feb 5, 2007)
- 6: Felonious Monk - h2g2s very own Bogeyman (Feb 7, 2007)
- 7: SEF (Feb 7, 2007)
- 8: Felonious Monk - h2g2s very own Bogeyman (Feb 7, 2007)
- 9: SEF (Feb 7, 2007)
- 10: Felonious Monk - h2g2s very own Bogeyman (Feb 7, 2007)
- 11: Felonious Monk - h2g2s very own Bogeyman (Feb 8, 2007)