This is the Message Centre for Jim Lynn

Why doesn't the W3C Validator like your space?

Post 1

TRiG (Ireland) A dog, so bade in office

I've been playing around with the W3C Validator for a while, and found to my surprise that it doesn't like h2g2. (Come to that, it doesn't like the vast majority of the web, as far as I can tell. But it's happy with wikis.)

I've mentioned this, as a mild curiosity, in a couple of places. But today I tried it out on your user page, and got this:

"Sorry, I am unable to validate this document because on line 88, 91 it contained one or more bytes that I cannot interpret as utf-8 (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication."

http://validator.w3.org/check?uri=http%3A%2F%2Fh2g2.com%2Fu6

Defining the encoding as iso-8859-1 solves that problem, and reveals a few more.

http://validator.w3.org/check?uri=http%3A%2F%2Fwww.bbc.co.uk%2Fdna%2Fh2g2%2Fu6&charset=iso-8859-1&doctype=Inline

I'm just wondering what's so special about your userpage.

Cheeky, I know.

F2124165?thread=3436971&skip=29&show=1

TRiG.smiley - smiley


Post 2

Jim Lynn

The validator is unilaterally deciding that a page is UTF-8 without that being specified in the header or anywhere else. The norm when our skin was developed was iso-8859-1 (and, in fact, an attempt to use UTF-8 would probably have broken most browsers in use at the time).

And I should point out that, once you specify the actual encoding of the page, my space actually generates fewer errors than the Google homepage.

I'd truly like the h2g2 skin to be more standards compliant, but since there's been no budget for any kind of work on the skin for many years now, that's not been possible.

"I'm just wondering what's so special about your userpage."

Nothing. Try the BBC homepage and you'll get exactly the same result. If you don't find any encoding specification and therefore assume UTF-8, you'll generally be assuming wrongly.
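The point about a missing encoding specification can be sketched in a few lines of Python. This is only an illustration: the helper names `declared_charset` and `decode_page` are made up for the example, and the ISO-8859-1 fallback is an assumption taken from the post above, not a description of what the Validator or h2g2 actually does.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset named in a Content-Type header value, or None if absent."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None.
    return msg.get_content_charset()


def decode_page(body, content_type, fallback="iso-8859-1"):
    """Decode page bytes with the declared charset, else the assumed fallback."""
    charset = declared_charset(content_type) or fallback
    return body.decode(charset)


# No charset in the header, so the (assumed) fallback is used.
print(declared_charset("text/html"))
print(decode_page(b"caf\xe9", "text/html"))
```

Swapping the fallback to "utf-8" makes the second call raise UnicodeDecodeError, which is roughly the situation the Validator reported.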


Post 3

TRiG (Ireland) A dog, so bade in office

I wonder why it picks utf-8, so.

But it does pass on some pages. It doesn't mind my personal space, though it doesn't like Wilma's:

http://validator.w3.org/check?uri=http%3A%2F%2Fwww.bbc.co.uk%2Fdna%2Fh2g2%2FU2192551&charset=iso-8859-1&doctype=Inline&outline=1&verbose=1

I really don't have a clue what I'm talking about. And I should be in bed.

TRiG.


Post 4

Jim Lynn

It will fail if your page happens to contain a byte sequence that isn't valid UTF-8. UTF-8 is what's known as a multibyte character encoding. It can encode every single possible Unicode character (and there are hundreds of thousands), which it does by using the byte values above 127 as 'lead' and 'continuation' bytes, so that one character (or code point, as they are known in Unicode) can span several bytes instead of being limited to 256 possible values. This means that if you're trying to interpret an ISO-8859-1 encoded string as UTF-8 you might hit one of those lead bytes, but the bytes following it will not make sense as UTF-8. If your page happens not to contain any of these bytes (in ISO-8859-1 they will usually be accented characters) it will still pass as UTF-8. And if the high bytes happen to line up as a valid UTF-8 sequence, the page will pass too, although some of the characters will look wrong.
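A small Python sketch shows the effect. The helper name `is_valid_utf8` and the sample strings are invented for the illustration; only the standard `encode` and `decode` methods are used.

```python
def is_valid_utf8(data):
    """Return True if the bytes can be decoded as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ISO-8859-1 stores the accented e as the single byte 0xE9.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")

# In UTF-8, 0xE9 announces a three-byte sequence, but nothing follows it.
print(is_valid_utf8(latin1_bytes))

# Pure ASCII is byte-for-byte identical in both encodings.
print(is_valid_utf8(b"plain ASCII text"))

# Two ISO-8859-1 characters whose bytes (0xC3 0xA9) happen to form one
# valid UTF-8 character: the page passes, but the text reads differently.
lucky_bytes = "\u00c3\u00a9".encode("iso-8859-1")
print(lucky_bytes.decode("utf-8"))
```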


Post 5

TRiG (Ireland) A dog, so bade in office

Oooookay......

Thanks for taking the time to explain this. I'm out of my depth, but I do understand what you're saying, I think. I'm just wondering why the validator picks UTF-8 as the default if, as you say, the ISO one is more common. But I suppose that's not your problem.

Anyway, this place works on most browsers, doesn't it? And I think 2legs is using a screenreader, and he seems to manage. So it can't be too bad.

TRiG.smiley - smiley


Post 6

Jordan

Jim, does h2g2 use XSLT to transform GuideML into HTML? Gnomon just led me to realise the blindingly obvious, that GuideML can be transformed into XML simply by lowercasing the tag names.

Personally, I would also love to see h2g2 being more standards compliant. You couldn't possibly have done it back then, but it would certainly help with the server load if it were coded with modern browsers in mind!


The only major accessibility beef I've spotted with h2g2 is that the title text for the smileys is just the smiley code. Makes sense from a certain perspective (i.e. "how do I make that smiley?"), but I wish it were something a little more helpful. For example, 2legs didn't know that the handcuffs were pink until I told him a couple of years ago at a Meet, and I wondered earlier how he could know that the "cheerup" smiley is a picture of a smiley offering a flower, say.

smiley - space—Jordan


Post 7

TRiG (Ireland) A dog, so bade in office

GuideML is an XML dialect. XML is case-sensitive, but there's no requirement for it to be all lower case. XHTML is another XML dialect, and that one is all lower case.
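The case-sensitivity point is easy to check with any XML parser. Here is a rough sketch using Python's standard `xml.etree.ElementTree`; the helper `is_well_formed` and the sample markup are made up for the illustration and are not real GuideML.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Upper-case and lower-case tag names are both fine on their own.
print(is_well_formed("<P>Hello</P>"))
print(is_well_formed("<p>Hello</p>"))

# But <P> and </p> are different names, so this is a mismatched tag.
print(is_well_formed("<P>Hello</p>"))
```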

Many GuideML tags are equivalent to XHTML tags (or to HTML tags), but some are rather special.

=
=
= ?
= ?

TRiG.smiley - smiley


Post 8

Jordan

"XML is case-sensitive, but there's no requirement for it to be all lower case."

smiley - blush I should have known that. I assumed that since all the XML I've seen is in lowercase, that was the way it had to be.

Yes, the ',' 'smiley - smiley' and '' tags (for example) don't map to anything particular in (X)HTML, and they're a nice addition. I've also noticed that I can use most valid (X)HTML tags, and include inline styles in my GuideML, which is cool. (If a little dangerous to IE6 users! smiley - evilgrin)

smiley - space—Jordan


Post 9

Jordan

Dang it! s/smiley - smiley/< smiley >/


Post 10

TRiG (Ireland) A dog, so bade in office

I think the parser lets anything it doesn't recognise through, so GuideML is infinitely extendable. And as long as what you've written makes sense in HTML 4.0, it is, in a sense, valid GuideML.

Or something like that.

TRiG.smiley - smiley


Post 11

Jordan

Well, except for the closing ''s and the like. Though I do like closing all my tags explicitly, it would save time for researchers if they could just elide, say, closing paragraph tags.

(Just tested to make sure. smiley - smiley)

smiley - space—Jordan


Post 12

TRiG (Ireland) A dog, so bade in office

Apparently, is correct XHTML 1.0, but, if you're declaring the page as text/html (instead of application/xml), you should put a space in, to save confusing older browsers.

TRiG.smiley - smiley smiley - geek

