This is a Journal entry by Skankyrich [?]
Skankyrich [?] Started conversation Oct 19, 2009
This is going to be quite a dull bit of news for most people, but I've just noticed that the robots tag on all h2g2 pages now says 'index, follow'.
This, in theory, means that we should get nudged rather higher in search results, as the whole site is indexed and all the internal links between Entries come into play. And, as conversation forums will be indexed by Google from now on, we're going to get conversation threads starting to appear in Google searches much more frequently.
I wonder if this is the first step in getting the search facility fixed/improved? If not, it will be a hell of a lot easier to design a custom Google search box that works.
Gnomon - time to move on Posted Oct 19, 2009
Yay!
It would certainly be great if they would turn on the "Search conversations" function - it's a long time since they turned it off.
Skankyrich [?] Posted Oct 19, 2009
It does seem that conversations are being indexed already. I googled 'what can we blame 2legs for' and the Ask thread came up, having last been indexed a week ago. My recent journals have been indexed, too.
It would have been interesting to know before the change was made, though. When I suggested the nofollow be dropped, I offered to select 50 Edited Entries at random and track their placings in search rankings to see if there was any tangible benefit, but they never got back to me. A little too geeky, perhaps.
Secretly Not Here Any More Posted Oct 19, 2009
Does mean we'll now be a target for link farmers though. A dofollow site on a BBC domain that lets you specify anchor text? It's a dream.
Skankyrich [?] Posted Oct 19, 2009
This string will search h2g2 threads for the term 'whatever' if you plug it into Google:
site:www.bbc.co.uk/dna/h2g2 inurl:?thread= whatever
The 'site' part restricts the search to h2g2 only, while the 'inurl' section only searches for conversations (which all have the term '?thread=' in the url).
I've made custom Google search boxes in the past, and I think it would be fairly simple to create one with those specific filters.
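For anyone who wants to script the search string quoted in this post, here is a minimal sketch in Python. The function names are made up for illustration, and it assumes Google's ordinary `https://www.google.com/search?q=...` address; it only builds the link and does not fetch any results.

```python
from urllib.parse import urlencode


def h2g2_thread_query(term):
    """Build the search string from the post for a given term."""
    # 'site:' restricts results to h2g2; 'inurl:' keeps only URLs
    # containing '?thread=', which is how conversations are addressed.
    return f"site:www.bbc.co.uk/dna/h2g2 inurl:?thread= {term}"


def google_search_url(term):
    """Wrap the query in a Google search link (hypothetical helper)."""
    return "https://www.google.com/search?" + urlencode({"q": h2g2_thread_query(term)})


print(google_search_url("whatever"))
```

`urlencode` takes care of escaping the colons, slashes and question mark, so the term can contain spaces or punctuation without breaking the link.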
Skankyrich [?] Posted Oct 19, 2009
It's not quite going to be as easy as I thought, because it seems that custom Google search engines won't let you use an inurl: filter. That seems really silly, but there you go.
I think I can see how to do it on a webpage, but it's getting a bit late in the day to experiment. Watch this space.
You can call me TC Posted Oct 20, 2009
If they revive the "Search Conversations" they shouldn't forget to include the subject line of the conversations as well.
When it was working, I tried out a search for an unusual-ish word which occurred hundreds of times in the subject lines of a thread, but, as it never ever occurred in the posting, that thread did not appear in the search. Can't remember what it was for the moment, but it was a very well-known one.
Elentari Posted Oct 20, 2009
I'm not techy enough to really understand this, but people who are techy enough say it's a good thing for the site, so yay.
Skankyrich [?] Posted Oct 20, 2009
The next question is: will the site ever actually get fully indexed, and if so how long will it take?
There are six million Guide Entries onsite, plus I don't know how many conversations. I also don't know whether each individual posting is a separate page or each conversation is essentially one long or short page. But that's a hell of a lot of pages to be indexed - don't forget that just because every page on h2g2 *can* be found and indexed, it doesn't mean it *will* be.
Essentially, each page on the site has to be seen by a search engine spider before it will appear in search results. A conversation attached to the bottom of a Front Page entry might be indexed very quickly because it is very visible; a conversation between two long-Elvised Researchers might never be indexed, simply because it's so far from the spider's entry point that it will never actually get there.
Although I think the tag was only changed in the last couple of weeks (judging from the rate of indexing against the number of pages that have been indexed), it's going to take a very long time before we start to see the site indexed to any meaningful level.
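The point about distance from the spider's entry point can be sketched as a toy model. This is not how Google actually crawls; it simply assumes a breadth-first walk that gives up after a fixed number of pages. The page names and the `page_budget` parameter are invented for illustration.

```python
from collections import deque


def crawl(links, entry_point, page_budget):
    """Return pages in the order a breadth-first spider would reach them."""
    indexed = []
    queued = {entry_point}
    frontier = deque([entry_point])
    while frontier and len(indexed) < page_budget:
        page = frontier.popleft()
        indexed.append(page)
        for target in links.get(page, []):
            if target not in queued:
                queued.add(target)
                frontier.append(target)
    return indexed


# Toy site: every name here is hypothetical.
toy_site = {
    "front_page": ["entry_a", "entry_b"],
    "entry_a": ["thread_on_entry_a"],
    "entry_b": ["researcher_space"],
    "researcher_space": ["old_journal"],
    "old_journal": ["old_private_thread"],
}

indexed = crawl(toy_site, "front_page", page_budget=4)
print(indexed)
print("old_private_thread" in indexed)
```

With a budget of four pages, the thread attached to a front-page entry is reached, while the thread buried several links deep is never visited, even though nothing forbids indexing it.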
Skankyrich [?] Posted Oct 20, 2009
Incidentally, Google is indexing the site considerably more quickly than Yahoo.
Gnomon - time to move on Posted Oct 20, 2009
According to the Info page on h2g2, there are 10,000 Edited Entries and 220,000 Unedited ones.
TRiG (Ireland) A dog, so bade in office Posted Oct 20, 2009
This is one reason why creating your own search engine can be more efficient than relying on the standard internet search engines: you know your own data. You know how it's structured. Google doesn't.
I went from my conversations list to a conversation: F19585?thread=7012862&show=20&skip=0#pi1.
Then I went to the conversation list (forum): F19585?thread=7012862.
Then back to the conversation: F19585?thread=7012862.
Then to the entry the conversation is attached to: A148907.
And on to the conversation list (forum): F19585?thread=143046.
I went to the bottom of the page, and clicked the little blue down arrow on the bottom post to get to page two of the conversation: F19585?thread=143046&post=1343890#p1343890
And then I clicked the "next list" button to get onto page three of the conversation: F19585?thread=143046&skip=40&show=20.
And we haven't even mentioned the different skins!
And that's just the automatically generated links, coded into the skins themselves. People will post all kinds of other links. Whenever you post to a conversation, you are taken to a URL with post=[your new post id] in it. There will be twenty of these for each page of the conversation. If you drop such a link elsewhere in the site, Google will find and index that too.
There's no reason why Google shouldn't find each post fifty times or so. And that makes searching inefficient. It's nice to have, but a search which understands the database would be more efficient.
Or, Can we please have conversation search back!
TRiG.
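To illustrate what "a search which understands the database" could do with the URLs listed in this post, here is a rough sketch that collapses the different addresses onto one key per conversation. It assumes only what the URLs above show: a forum ID in the path and a `thread` parameter in the query string. The helper names are hypothetical, and a real fix would also have to decide how to treat separate pages of a long conversation.

```python
from urllib.parse import urlsplit, parse_qs


def thread_key(url):
    """Return (forum, thread_id) for a conversation URL, or None otherwise."""
    parts = urlsplit(url)
    forum = parts.path.rsplit("/", 1)[-1]
    thread_ids = parse_qs(parts.query).get("thread")
    if not thread_ids:
        return None  # e.g. an entry such as A148907
    return (forum, thread_ids[0])


def group_by_thread(urls):
    """Group URLs that point at the same conversation."""
    groups = {}
    for url in urls:
        key = thread_key(url)
        if key is not None:
            groups.setdefault(key, []).append(url)
    return groups


urls = [
    "F19585?thread=7012862&show=20&skip=0#pi1",
    "A148907",
    "F19585?thread=143046",
    "F19585?thread=143046&post=1343890#p1343890",
    "F19585?thread=143046&skip=40&show=20",
]

for key, members in group_by_thread(urls).items():
    print(key, len(members))
```

The `show`, `skip` and `post` parameters and the `#` fragment are ignored, so the three addresses for thread 143046 end up in one group instead of three separate search results.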
Skankyrich [?] Posted Oct 20, 2009
Gnomon, I've never understood the numbering system for Entries. Why are the last two on the info page A58597726 and A58593180 rather than two consecutive numbers?
Malabarista - now with added pony Posted Oct 20, 2009
It's similar to an ISBN, the last digit is a check digit so not all numbers are possible. I think.
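As an illustration of the ISBN comparison, here is how the ISBN-10 check digit works. This is the ISBN rule, not h2g2's own numbering scheme, which the thread doesn't spell out; the function names are made up for the example.

```python
def isbn10_check_digit(first_nine):
    """Compute the ISBN-10 check character for the first nine digits."""
    # Weights run from 10 down to 2; the check digit brings the total
    # weighted sum to a multiple of 11.
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn):
    """True if a ten-character ISBN ends with the right check character."""
    return len(isbn) == 10 and isbn10_check_digit(isbn[:9]) == isbn[9]


# For any nine-digit stem, exactly one final character is valid.
stem = "030640615"
valid_endings = [c for c in "0123456789X" if is_valid_isbn10(stem + c)]
print(valid_endings)
```

Because only one ending is accepted for each stem, most ten-character strings are not valid ISBNs. A scheme of that kind would explain why valid numbers are never consecutive.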
Skankyrich [?] Posted Oct 20, 2009
Ah! I know why. It's because you have to be able to perform some mathematical calculation on the digits to make 42, or it isn't a true Guide Entry.
- 1: Skankyrich [?] (Oct 19, 2009)
- 2: Gnomon - time to move on (Oct 19, 2009)
- 3: Skankyrich [?] (Oct 19, 2009)
- 4: TRiG (Ireland) A dog, so bade in office (Oct 19, 2009)
- 5: Secretly Not Here Any More (Oct 19, 2009)
- 6: Skankyrich [?] (Oct 19, 2009)
- 7: Malabarista - now with added pony (Oct 19, 2009)
- 8: Skankyrich [?] (Oct 19, 2009)
- 9: You can call me TC (Oct 20, 2009)
- 10: Elentari (Oct 20, 2009)
- 11: Skankyrich [?] (Oct 20, 2009)
- 12: Skankyrich [?] (Oct 20, 2009)
- 13: Gnomon - time to move on (Oct 20, 2009)
- 14: TRiG (Ireland) A dog, so bade in office (Oct 20, 2009)
- 15: Skankyrich [?] (Oct 20, 2009)
- 16: Malabarista - now with added pony (Oct 20, 2009)
- 17: Skankyrich [?] (Oct 20, 2009)
- 18: Titania (gone for lunch) (Oct 20, 2009)
- 19: Baron Grim (Oct 20, 2009)
- 20: Galaxy Babe - eclectic editor (Oct 20, 2009)