This is the Message Centre for Jim Lynn

Question for you about encoding

Post 1

Felonious Monk - h2g2s very own Bogeyman

I've started work on GuideDog again after a long break (mainly to learn C++ so I can write it properly this time). It's actually coming along rather nicely at the moment. I have some XML-related questions for you.

What character encodings does the DNA engine support? Does it store Unicode characters directly or does it use some other kind of internal coding scheme? Also, what encodings will it handle when someone posts an entry?



Post 2

Jim Lynn

We will expect ISO-8859-1. The database doesn't yet handle Unicode. If you pass any other encoding, expect the characters to be mangled or the post to fail altogether. The best bet for any and all out-of-range characters is to XML-escape them (&#1234;).



Post 3

Felonious Monk - h2g2s very own Bogeyman

So what about HTML named entities such as &mdash;? Will they work, or should I prefer the escaped syntax?

The reason I ask is that I need to validate the XML against a DTD on import; otherwise the XslCompiledTransform class throws an exception when it encounters special characters written as named entities. As I have access to the DTD, and hence the entity mappings, it's pretty easy to put the named form back into the output.




Post 4

Jim Lynn

I would use the escaped syntax in preference. Under the hood, our XSLT transformations have to use a DTD to convert the named entities into escaped Unicode characters anyway.



Post 5

Felonious Monk - h2g2s very own Bogeyman

Well, that makes my life a lot simpler, as the XmlTextWriter can be created with the ISO-8859-1 encoding and will then write out-of-range characters as character references automatically. The only advantage I can see to retaining the named syntax is that it lets article authors identify special characters in the raw XML.



Post 6

Felonious Monk - h2g2s very own Bogeyman

BTW: do you happen to have any GuideML test documents that exercise character encodings, so I can have a muck about with them? Something along the lines of 'lorem ipsum' with different symbols in, please?



Post 7

SEF

There's this one: <./>GuideML-Characters</.>



Post 8

Felonious Monk - h2g2s very own Bogeyman

Thanks, but I can't get at the original GuideML, which makes testing difficult.



Post 9

SEF

<./>test1098876</.>



Post 10

Felonious Monk - h2g2s very own Bogeyman

Thank you... I shall try it out.



Post 11

Felonious Monk - h2g2s very own Bogeyman

I've created a project at http://www.codeplex.com/guidedog to manage the source code for the project. You will need VS.NET 2005 Pro to be able to program it, but there is other work, such as creating stylesheets, that requires no more than a text editor.

This time around I have a much better feeling about it: MFC provides most of the tools I need to build the application just the way I want it, C++ lets me get deep down into MSHTML, and there are none of those pesky interop issues throwing exceptions to worry about.

