This is a Journal entry by Skankyrich [?]

index, follow

Post 1

Skankyrich [?]

This is going to be quite a dull bit of news for most people, but I've just noticed that the robots tag on all h2g2 pages now says 'index, follow'.

This, in theory, means that we should get nudged rather higher in search results, as the whole site is indexed and all the internal links between Entries come into play. And, as conversation forums will be indexed by Google from now on, conversation threads should start to appear in Google searches much more frequently.

I wonder if this is the first step in getting the search facility fixed/improved? If not, it will be a hell of a lot easier to design a custom Google search box that works.


Post 2

Gnomon - time to move on

Yay!

It would certainly be great if they would turn on the "Search conversations" function - it's a long time since they turned it off.


Post 3

Skankyrich [?]

It does seem that conversations are being indexed already. I googled 'what can we blame 2legs for' and the Ask thread came up, having last been indexed a week ago. My recent journals have been indexed, too.

It would have been interesting to know before the change was made, though. When I suggested the nofollow be dropped, I offered to select 50 Edited Entries at random and track their placings in search rankings to see if there was any tangible benefit, but they never got back to me. A little too geeky, perhaps smiley - smiley


Post 4

TRiG (Ireland) A dog, so bade in office

That's brilliant news!

TRiG. smiley - geek smiley - cool


Post 5

Secretly Not Here Any More

Does mean we'll now be a target for link farmers though. A dofollow site on a BBC domain that lets you specify anchor text? It's a dream.


Post 6

Skankyrich [?]

This string will search h2g2 threads for the term 'whatever' if you plug it into Google:

site:www.bbc.co.uk/dna/h2g2 inurl:?thread= whatever

The 'site' part restricts the search to h2g2 only, while the 'inurl' section only searches for conversations (which all have the term '?thread=' in the url).

I've made custom Google search boxes in the past, and I think it would be fairly simple to create one with those specific filters.
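
For anyone who would rather script the search than type it, here is a minimal Python sketch that turns the string above into a Google search link. The function name `h2g2_thread_search_url` is made up for illustration; the only things taken from the post are the `site:` and `inurl:` filters, and it assumes Google's ordinary `/search?q=` address.

```python
from urllib.parse import urlencode


def h2g2_thread_search_url(term: str) -> str:
    """Build a Google search URL restricted to h2g2 conversation threads."""
    # The site: and inurl: filters are the ones quoted in the post above.
    query = f"site:www.bbc.co.uk/dna/h2g2 inurl:?thread= {term}"
    # urlencode percent-escapes the query so it is safe to use in a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(h2g2_thread_search_url("whatever"))
```

Pasting the printed link into a browser should give the same results as typing the string into Google by hand.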


Post 7

Malabarista - now with added pony

Oooh, yay! That is good news smiley - biggrin

Though I'm not sure the journals should be available on google. smiley - doh


Post 8

Skankyrich [?]

It's not going to be quite as easy as I thought, because it seems that custom Google search engines won't let you use an inurl: filter. That seems really silly, but there you go.

I think I can see how to do it on a webpage, but it's getting a bit late in the day to experiment. Watch this space.


Post 9

You can call me TC

If they revive "Search Conversations", they shouldn't forget to include the subject lines of the conversations as well.

When it was working, I tried out a search for an unusual-ish word which occurred hundreds of times in the subject lines of a thread, but, as it never ever occurred in the posting, that thread did not appear in the search. Can't remember what it was for the moment, but it was a very well-known one.


Post 10

Elentari

I'm not techy enough to really understand this, but people who are techy enough say it's a good thing for the site, so yay. smiley - smiley


Post 11

Skankyrich [?]

The next question is: will the site ever actually get fully indexed, and if so how long will it take?

There are six million Guide Entries onsite, plus I don't know how many conversations. I also don't know whether each individual posting is a separate page or each conversation is essentially one long or short page. But that's a hell of a lot of pages to be indexed - don't forget that just because every page on h2g2 *can* be found and indexed, it doesn't mean it *will* be.

Essentially, each page on the site has to be seen by a search engine spider before it will appear in search results. A conversation attached to the bottom of a Front Page entry might be indexed very quickly because it is very visible; a conversation between two long-Elvised Researchers might never be indexed, simply because it's so far from the spider's entry point that it will never actually get there.

Although I think the tag was only changed in the last couple of weeks (judging from the rate of indexing against the number of pages that have been indexed), it's going to take a very long time before we start to see the site indexed to any meaningful level.
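
To illustrate the point about distance from the spider's entry point, here is a toy Python sketch. The page names and the link graph are invented, and a fixed click limit (`max_depth`) stands in for whatever budget a real search engine actually uses, which I don't know. It only shows why a deeply buried page can go unvisited.

```python
from collections import deque


def crawl(links, start, max_depth):
    """Breadth-first crawl; returns {page: clicks from start} for pages reached."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        # Stop following links once the (assumed) click limit is reached.
        if depth[page] == max_depth:
            continue
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


# Hypothetical link graph: which pages each page links to.
links = {
    "front_page": ["front_page_entry"],
    "front_page_entry": ["busy_conversation", "researcher_a"],
    "researcher_a": ["researcher_b"],
    "researcher_b": ["old_conversation"],
}

reached = crawl(links, "front_page", max_depth=2)
print(sorted(reached))
print("old_conversation" in reached)
```

The conversation hanging off the Front Page entry is two clicks away and gets visited; the old conversation is four clicks away and never does.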


Post 12

Skankyrich [?]

Incidentally, Google is indexing the site considerably more quickly than Yahoo.


Post 13

Gnomon - time to move on

According to the Info page on h2g2, there are 10,000 Edited Entries and 220,000 Unedited ones.


Post 14

TRiG (Ireland) A dog, so bade in office

This is one reason why creating your own search engine can be more efficient than relying on the standard internet search engines: you know your own data. You know how it's structured. Google doesn't.

I went from my conversations list to a conversation: F19585?thread=7012862&show=20&skip=0#pi1.

Then I went to the conversation list (forum): F19585?thread=7012862.

Then back to the conversation: F19585?thread=7012862.

Then to the entry the conversation is attached to: A148907.

And on to the conversation list (forum): F19585?thread=143046.

I went to the bottom of the page, and clicked the little blue down arrow on the bottom post to get to page two of the conversation: F19585?thread=143046&post=1343890#p1343890

And then I clicked the "next list" button to get onto page three of the conversation: F19585?thread=143046&skip=40&show=20.

And we haven't even mentioned the different skins!

And that's just the automatically generated links, coded into the skins themselves. People will post all kinds of other links. Whenever you post to a conversation, you are taken to a URL with post=[your new post id] in it. There will be twenty of these for each page of the conversation. If you drop such a link elsewhere in the site, Google will find and index that too.

There's no reason why Google shouldn't find each post fifty times or so. And that makes searching inefficient. It's nice to have, but a search which understands the database would be more efficient.
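
As a rough sketch of what "understanding the database" buys you, the Python below collapses the URLs listed above into one key per conversation. The function name `thread_key` is made up, and it assumes the forum ID is the last path segment and the conversation ID is the `thread` parameter, which is only what the URLs in this post suggest.

```python
from urllib.parse import urlsplit, parse_qs


def thread_key(url):
    """Reduce a conversation URL to (forum, thread), ignoring paging and anchors."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    # Assumption: the forum ID (e.g. F19585) is the last segment of the path.
    forum = parts.path.rsplit("/", 1)[-1]
    return forum, params["thread"][0]


# The URLs quoted earlier in this post.
urls = [
    "F19585?thread=7012862&show=20&skip=0#pi1",
    "F19585?thread=7012862",
    "F19585?thread=143046",
    "F19585?thread=143046&post=1343890#p1343890",
    "F19585?thread=143046&skip=40&show=20",
]

unique_threads = {thread_key(u) for u in urls}
print(len(urls), "URLs,", len(unique_threads), "conversations")
```

A general-purpose spider has no way of knowing that `skip`, `show`, `post` and the anchor all lead to the same conversation, so it treats each variant as a separate page.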

Or, can we please have conversation search back!

TRiG. smiley - geek


Post 15

Skankyrich [?]

Gnomon, I've never understood the numbering system for Entries. Why are the last two on the info page A58597726 and A58593180 rather than two consecutive numbers?


Post 16

Malabarista - now with added pony

It's similar to an ISBN: the last digit is a check digit, so not all numbers are possible. I think.


Post 17

Skankyrich [?]

Ah! I know why. It's because you have to be able to perform some mathematical calculation on the digits to make 42, or it isn't a true Guide Entry.


Post 18

Titania (gone for lunch)

/lurk

smiley - laugh

smiley - lurk


Post 19

Baron Grim

heheh smiley - laugh smiley - towel

smiley - cheers smiley - pggb


Post 20

Galaxy Babe - eclectic editor

smiley - headhurts I'm with Elentari smiley - ok

