When we mention a typo, that's usually a polite euphemism for an error of spelling or perhaps punctuation - although just because someone happens to key in 'teh' that doesn't necessarily mean that they don't know how to spell 'the'. But to a professional typesetter there is much more to typesetting than spelling and punctuation alone, and there are far more opportunities for making a typographical error (which is what 'typo' is short for).
Until the advent of the word processor (WP), typesetting was a highly skilled job for specialists. But computer typesetting has largely made the old skills, and the people who spent many years acquiring them, redundant. Now the WP program for the personal computer has brought non-experts into contact with the typesetter's arcane terminology.
Many of the terms still in use in the computer age are relics of an era when the Web was not World Wide but meant a roll of paper (or, in papermaking, a wire mesh), and typesetters would have to work with molten lead (or 'hot metal'), silk screens, and other strange processes.
Here's a guide to some of the most commonly encountered terms. It must be borne in mind that in matters of grammar, punctuation, layout and style, while there are certain rules that should always be observed there are also 'grey areas'. There are many different opinions, often varying not only from one Anglophone country to another but also between different publishers in the same country. The accepted norms are also liable to change over time. If you are writing for publication you should always check the publisher's preferences or style guide.
Typeface or Font
Deriving from an old word for the casting of metal (as in a 'foundry'), a font is a set of characters of a particular design and size - a typeface. A character could be a letter, a numeral, a punctuation mark, a symbol, or one of several kinds of space. Some fonts have little bits called serifs added on at the ends of the letters, which seem to help to lead the eye forward through the text.
Typesetters would use different fonts for different jobs, and each font would be stored neatly away in drawers until needed. These drawers were called cases, and for each font there would be two cases, one stored above the other. The upper case would contain capitals, the lower case the small letters. Today we still call capital letters 'upper case' and small letters 'lower case'. Typesetters abbreviate these to uc and lc, though they also refer to capitals as 'caps'.
In engineering, pitch refers to the distance between items which are usually evenly spaced, like the cogs on a wheel or the thread of a screw. In typesetting it refers to the density of characters on a line.
Early typewriters would usually have a pitch of either 10 or 12 characters per inch. Such machines would be described as having a fixed pitch, which meant that each character would have the same width, usually one-tenth or one-twelfth of an inch. On a page of typescript produced by such a machine all the characters would line up vertically as well as horizontally, which is particularly useful for columns of figures, but looks rather stiff and regimented if it is a novel or a news report.
The reason it doesn't look so pleasing to the eye is that in handwriting, different letters have different widths. They vary from narrow letters such as 'i' to wide ones such as 'm'. Capitals are usually wider than small letters, a full stop is tiny, and so on.
More advanced typewriter designs, and computers, are able to take account of these differences and produce text with a much more pleasing and 'professional' appearance by employing what is called 'proportional spacing'. With this system the width allocated to each character depends on which character it is. This means that it is no longer useful to measure the number of characters per inch, since this will vary from one word to the next.
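To put rough numbers on the difference, here is a minimal Python sketch. The character widths in ASSUMED_WIDTHS are invented purely for illustration and are not taken from any real font, and the function names are just labels chosen for this example.

```python
# Hypothetical character widths in thousandths of the point size.
# These figures are made up for illustration; real fonts carry their own tables.
ASSUMED_WIDTHS = {"i": 278, "l": 278, "m": 833, ".": 278, " ": 278}
DEFAULT_WIDTH = 556  # assumed width for any character not listed above


def fixed_pitch_width_in(text: str, pitch: int) -> float:
    """Width in inches of text typed at a fixed pitch (characters per inch)."""
    return len(text) / pitch


def proportional_width_pt(text: str, point_size: float) -> float:
    """Width in points when each character is given its own width."""
    units = sum(ASSUMED_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text)
    return units * point_size / 1000


for word in ("illicit", "mammoth"):
    print(word, fixed_pitch_width_in(word, 10), proportional_width_pt(word, 10))
```

At a fixed pitch of 10 both seven-letter words occupy exactly the same width, whereas with the made-up proportional widths 'mammoth' comes out a good deal wider than 'illicit'.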
So What's the Point?
The point (abbreviated to 'pt') is simply a measurement - there were originally 72 points to the inch. In the USA and UK a point is 0.351 millimetres, whereas for some reason in continental Europe it is 0.376 mm.
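As a quick check on those figures, the following Python sketch converts points to millimetres and works out how many points of each kind fit into an inch. The only inputs are the two values quoted above and the fact that an inch is 25.4 mm; the dictionary keys and function names are merely illustrative.

```python
# Millimetres per point, using the two figures quoted in the text above.
MM_PER_POINT = {"usa_uk": 0.351, "continental": 0.376}
MM_PER_INCH = 25.4


def points_to_mm(points: float, system: str = "usa_uk") -> float:
    """Convert a measurement in points to millimetres."""
    return points * MM_PER_POINT[system]


def points_per_inch(system: str = "usa_uk") -> float:
    """How many points of the given system fit into one inch."""
    return MM_PER_INCH / MM_PER_POINT[system]


for system in MM_PER_POINT:
    print(system, round(points_to_mm(12, system), 3), round(points_per_inch(system), 1))
```

On these figures the USA and UK point works out at a shade over 72 to the inch, while the continental point comes to between 67 and 68.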
Rather confusingly, in typesetting a point is also another name for a full stop (also known as a full point, or period) or a decimal point.
Font sizes are described in points: the higher the point size, the larger the printed characters. The size of a font is measured from the top of an ascender (eg the vertical line of the letter 'b') to the bottom of a descender (eg the 'tail' of the letter 'p'). In modern English and in most modern fonts, however, there is no single letter that has both an ascender and a descender, so it is not always very easy to measure the point size of a font!
The 'x height' of a font is, as you might expect, the height of the lower case letter x (the same as all letters without ascender or descender). The bottom of the x, and of all the letters in this sentence (since none of them has a descender), sits on what is called the 'baseline'. Letters with descenders extend below the baseline, as do some punctuation marks and - in some fonts - some of the numerals.
Capitals are usually the same height as letters with an ascender, but of course appreciably wider. Small capitals are x height.
Above and Below
As well as the tops and tails of letters that need them, the space above and below the 'x' line can be used for what are rather grandly called, respectively, superscript and subscript characters.
Examples of subscripts are punctuation marks such as the comma ( , ) and scientific symbols such as the 2 in the chemical symbol for water - H₂O. Examples of superscripts are the quotation mark ( ' ) and exponents such as the 2 in the abbreviation for a square centimetre - cm².
Twelve points make a pica (abbreviated to 'p'), which is therefore one-sixth of an inch. Rather confusingly, however, 'pica' can also refer to a fixed pitch of 10 characters to the inch, and is the name of a font of that pitch. By the way, pica is an old word for a magpie!
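The arithmetic of picas and points can be sketched in a few lines of Python. This assumes the round figures given in the text (12 points to the pica, 6 picas to the inch); the function names are hypothetical.

```python
POINTS_PER_PICA = 12  # as stated in the text
PICAS_PER_INCH = 6    # follows from 72 points to the inch


def points_to_picas(points: int) -> tuple[int, int]:
    """Split a whole number of points into (picas, leftover points)."""
    return divmod(points, POINTS_PER_PICA)


def picas_to_inches(picas: float) -> float:
    """Convert picas to inches."""
    return picas / PICAS_PER_INCH


picas, points = points_to_picas(30)
print(f"30 points is {picas} picas and {points} points")
print(f"6 picas is {picas_to_inches(6)} inch")
```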
Some Font Attributes
• Roman type is the technical term for normal, ordinary, upright type, whatever the font. This sentence is in roman type.
• Italic type slopes to the right. It can be used for emphasis, and is also typically used to indicate the title of a book, a film, or other major work, and for foreign words. For further information on using italic type see English Usage in the Edited Guide under 'Using Italics', and also see the note under 'Using Capital Letters'.
• Gothic type is derived from the angular style of handwriting with broad vertical downstrokes used in western Europe from the 13th Century. An example is Fraktur, the German style of black-letter type.
• Bold type is a thick or 'heavy' version of a typeface, and is used for headings. It is not a good idea to sprinkle it too liberally within text, however, as it does tend to jump off the page at you. Some fonts include different 'weights', such as medium bold and extra bold. Medium bold, in those fonts that have it, can be useful to emphasise words without having them hit you in the eye. Extra bold would probably be used for screaming headlines.
  In the h2g2 Edited Guide, Bold is normally used only for mini-headings, as in this section.
• Underline or underscore is not often used any more, though it sometimes crops up in textbooks, and of course is still useful in manuscript and on typewriters that do not have bold or italic.
And of course the blobs used in lists such as the one just above are called bullets, and come in many different shapes and sizes.
Leading (it rhymes with 'heading', not 'pleading') is something that is usually dealt with automatically by a typewriter or WP program, so you might never even be consciously aware of it. A good WP program will, however, allow you to adjust the leading. This can be very useful at times, so it's good to know about it.
If you have ever tried to read anything where the lines were jammed up tight against each other you will know how difficult and tiring that can be. Leading is simply extra space between lines of type, to help the eye move from the end of one line to the beginning of the next. In the old days a typesetter would insert a thin strip of lead after each line of type.
Leading is measured in points. A 10 pt font, for example, would typically have 3 pts of leading between the lines, and would be known in the trade as '10 on 13'. More leading would be used with larger fonts, so for example a 15 pt font might have 4 pts of leading and be known as '15 on 19'.
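The 'size on total' naming is simple addition, as this small Python sketch shows. The function names are invented for the example, and the 720 pt column depth is just an assumed figure (ten inches at 72 points to the inch).

```python
def trade_name(font_pt: int, leading_pt: int) -> str:
    """Describe a setting the way the trade does, eg '10 on 13'."""
    return f"{font_pt} on {font_pt + leading_pt}"


def lines_in_depth(depth_pt: float, font_pt: float, leading_pt: float) -> int:
    """Whole lines that fit into a given depth, measured in points."""
    return int(depth_pt // (font_pt + leading_pt))


print(trade_name(10, 3))
print(trade_name(15, 4))
# Assumed column depth of 720 pt, purely for illustration.
print(lines_in_depth(720, 10, 3), "lines of 10 on 13")
print(lines_in_depth(720, 10, 2), "lines of 10 on 12")
```

Shaving a single point off the leading is enough to gain several lines in a column of that depth, which is why small adjustments can rescue a tight fit.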
Careful leading adjustments can improve the appearance of a page, and can also be used to create a better fit.
In a business letter, for example, many people just put an extra line feed or 'hard return' at the end of each paragraph. But have you ever got to the end of a letter, or even a paragraph, only to find that the very last line doesn't fit and starts a new page? Very annoying, especially if all that is in the last line is your name below the signature space. In typesetting, such a line is called a widow (see below).
You don't really need a whole extra blank line between paragraphs, however. So instead of pressing the Enter key twice to get the extra line feed, you can adjust the leading to give you, say, an extra 3 pts of leading when you press Enter once. Unless the letter consisted of just one page-long paragraph, that would give you extra space at the foot for your name. It would also almost certainly improve the appearance of the whole page.
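To see roughly how much room this saves, here is a hedged Python sketch comparing the two approaches. The letter of four paragraphs and the 13 pt line depth are assumptions made up for the example; only the extra 3 pts comes from the text.

```python
def depth_with_blank_lines(paragraph_lines: list[int], line_pt: float) -> float:
    """Total depth in points when a whole blank line separates paragraphs."""
    blank_lines = len(paragraph_lines) - 1
    return (sum(paragraph_lines) + blank_lines) * line_pt


def depth_with_extra_leading(
    paragraph_lines: list[int], line_pt: float, extra_pt: float
) -> float:
    """Total depth in points when extra leading separates paragraphs."""
    gaps = len(paragraph_lines) - 1
    return sum(paragraph_lines) * line_pt + gaps * extra_pt


letter = [4, 6, 5, 3]  # assumed number of lines in each paragraph
blank = depth_with_blank_lines(letter, 13)
tight = depth_with_extra_leading(letter, 13, 3)
print(blank, tight, blank - tight)
```

In this made-up letter the saving is 30 pts, or a little over two lines of '10 on 13', which could be just enough to keep the name and signature on the first page.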
Widows and Orphans
In typesetting, a widow is the last line of a paragraph, or a short line of text, overflowing to the top of the next page. An orphan is a heading or the first line of a paragraph or verse at the foot of a page. A good typesetter will usually seek to avoid leaving widows and orphans.
Messing about with the leading to achieve a desired result is known as feathering. In the example above, instead of using extra leading on hard return, you could slightly reduce the overall leading, or line spacing, to give you the extra room. Don't overdo it, though, or it won't look very good.
You can also use feathering if you are working on a page with columns and you insist on having all the columns exactly the same depth. But here again, the result might not be as pleasing to the eye as you might think. You might need to experiment.
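The definitions of widow and orphan given in this section can be turned into a simple check. The Python sketch below is only one possible reading of them: it assumes every page holds the same number of lines, that paragraphs follow one another with no extra space, and it ignores headings and short lines. The function name is hypothetical.

```python
def find_widows_and_orphans(
    paragraph_lines: list[int], lines_per_page: int
) -> tuple[list[int], list[int]]:
    """Return (widows, orphans) as lists of paragraph positions, counting from 0.

    paragraph_lines gives the number of set lines in each paragraph, in order.
    """
    widows, orphans = [], []
    first = 0  # position of the paragraph's first line, counted from line 0
    for index, count in enumerate(paragraph_lines):
        last = first + count - 1
        if count > 1:
            # Widow: the paragraph's last line is the first line on a page.
            if last % lines_per_page == 0:
                widows.append(index)
            # Orphan: the paragraph's first line is the last line on a page.
            if first % lines_per_page == lines_per_page - 1:
                orphans.append(index)
        first += count
    return widows, orphans


print(find_widows_and_orphans([4, 7, 5, 3, 6], lines_per_page=10))
```

With ten lines to the page, the second paragraph in this example spills its last line onto page two (a widow) and the fifth starts on the very last line of page two (an orphan). Feathering the leading would change the number of lines per page and so move those breaks.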
Are You Fully Justified?
Justifying type means adjusting it so that it fills a space evenly, or lines up at one or both side margins.
Left justified means that the lines generally start at the left-hand margin, or at a fixed distance from it, but end at different points at or before the right-hand margin. Strictly speaking, this should be referred to as 'aligned left' or 'ranged left'. Most of this entry is ranged left.
Right justified (or aligned or ranged right) would mean all the lines ending at the right-hand margin, but with a ragged left-hand edge.
Centred text would mean that the centre of each line coincides with the centre of the page, but both left- and right-hand edges would be ragged. Long text passages that are centre-justified are not very easy to read, so centring is usually best confined to text with short lines and generous amounts of leading. Examples include headings and short passages, also menus, invitations, business cards, flyers and similar material.
Fully justified text is where all lines start at the left-hand margin and end at the right-hand margin. Long used in print, where it could be manually set out by a highly skilled typesetter, this became extremely popular for general and business use in the earlier days of WP documents and web pages, since computers could perform full justification automatically. Its popularity later declined, however. It will perhaps come into and out of vogue at various times.
Full justification is usually achieved by subtle adjustments to the spaces between words. It probably works best where word breaks are allowed. Although word breaks bring their own problems, full justification without word breaks can result in some lines being very tightly packed while others are excessively spread out.
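The idea of sharing out the spare space between words can be sketched in Python. This is a crude, fixed-pitch version that can only add whole character spaces, whereas a real typesetter or WP program adjusts the gaps far more finely; the function name and the choice to give any leftover spaces to the leftmost gaps are assumptions made for the example.

```python
def justify_line(words: list[str], width: int) -> str:
    """Pad the gaps between words so the line is exactly `width` characters."""
    if len(words) < 2:
        # Nothing to spread out: a lone word is simply ranged left.
        return " ".join(words).ljust(width)
    gaps = len(words) - 1
    spaces = width - sum(len(word) for word in words)
    if spaces < gaps:
        raise ValueError("words do not fit on a line of this width")
    base, extra = divmod(spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        # The first `extra` gaps each take one more space than the rest.
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


print(repr(justify_line(["full", "justification", "works"], 30)))
print(repr(justify_line(["a", "b", "c"], 6)))
```

A line with only a few long words has few gaps to absorb the spare space, which is why narrow columns without word breaks tend to end up 'excessively spread out'.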
Kerning, Letter Spacing, and Ligatures
Speaking of words being spread out brings us neatly to kerning, which is concerned with adjusting the spacing between letters. Kerning can be used to achieve a better fit between slanted or overhanging letters.
It is sometimes desirable to pad out a line very subtly, by adding a very thin 'hair space' between letters. This also improves the appearance of a line - perhaps a heading - that is all in capitals. Conversely, there are occasions when it might be good to reduce very slightly the space between letters, perhaps to avoid a word break at the end of the line.
Ligatures are pairs of letters (or, very occasionally, three letters) that are joined together. One example that can be displayed here is: Æ
When a word at the end of a line is too long to fit on that line, either the whole word must be carried over to the next line or the word must be divided between two successive lines, creating a 'word break'. It's best to avoid word breaks whenever possible. But when space is tight, for example, or perhaps when text is set in narrow columns, word breaks might be needed. In such cases, try to avoid having word breaks at the ends of more than two consecutive lines - it doesn't look good.
In English, at least, there are no firm rules which tell you where best to break a word. It's true that there are special dictionaries, used by typesetters but also generally available. These give spellings and word breaks (they don't give the meanings or derivations of the words). But the dictionaries do not agree among themselves. For example, one dictionary might prefer to break a word according to its sound - eg, 'diag-nostic' - while another suggests a break according to etymology - 'dia-gnostic'.
WP programs sometimes have their own ideas about where to break words, but should allow you to override their choice. If they work according to an algorithm which says, for instance, that it's OK to break after prefixes such as 'dis-' or 'pre-', you might get some bizarre results (eg, 'dis-hcloth', 'pre-aches'), so beware! Some classic howlers to avoid are 'reap-pear', 'leg-end', and 'the-rapist', and there are many more.
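The sort of rule that produces these howlers is easy to reproduce. The Python sketch below is a deliberately naive illustration, not a description of how any particular WP program works; the prefix list, the override table and the function names are all assumptions.

```python
ASSUMED_PREFIXES = ("dis", "pre")

# Hand-made overrides; an entry without a hyphen means 'do not break this word'.
MANUAL_BREAKS = {"dishcloth": "dish-cloth", "preaches": "preaches"}


def naive_break(word: str) -> str:
    """Break after any listed prefix, whether or not it really is a prefix."""
    for prefix in ASSUMED_PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            return f"{prefix}-{word[len(prefix):]}"
    return word


def break_word(word: str) -> str:
    """Prefer a manual override, falling back on the naive rule."""
    return MANUAL_BREAKS.get(word, naive_break(word))


for word in ("disagree", "dishcloth", "preaches"):
    print(word, naive_break(word), break_word(word))
```

The rule works well enough for 'disagree', but only the manual override saves 'dishcloth' and 'preaches', which is why the ability to overrule the program matters.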
Mind Your Ms and Ns
Whereas a pica is a fixed-width space (one-sixth of an inch, remember) irrespective of the font being used, the width of other spaces will vary according to the font. An em space, or an em dash (which is a dash of the same width as an em space, and is sometimes called an em rule) is based on the width of the capital M in the font being used.
The em space, or simply the em, can be used as a unit of measure when specifying layout. For example, you could specify that paragraphs are to be indented two ems. In GuideML, using the <BLOCKQUOTE> tag indents the paragraph four ems.
The em dash can be used as punctuation. Some matters of punctuation are the subject of endless (and sometimes heated!) debate, and the suggestions here are offered only as suggestions.
One common usage is to replace commas or parentheses (brackets) either side of a phrase, as in, for example:
This is how—if you like—you could use a pair of em dashes.
The em dash could also be used instead of an ellipsis (...) to indicate something unfinished:
She exclaimed, 'What the— !'
The en space and the en dash are nominally half the width of an em. When used parenthetically the en dash, unlike the em dash, usually has a space either side of it:
This is how – if you like – you could use a pair of en dashes.
The en dash can also be used, without flanking spaces, to replace the word 'to' in expressions such as:
There were about 20–25 people there.
I caught the New York–Boston flight.
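If you are producing text by program rather than by keyboard, the two dashes are separate Unicode characters. The Python sketch below builds the examples above using the spacing conventions suggested in this section; the helper names are hypothetical, and other style guides may prefer different spacing.

```python
import unicodedata

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def en_range(start, end) -> str:
    """Join two values with an unspaced en dash standing in for 'to'."""
    return f"{start}{EN_DASH}{end}"


def en_parenthetical(before: str, aside: str, after: str) -> str:
    """Set an aside between en dashes with a space either side."""
    return f"{before} {EN_DASH} {aside} {EN_DASH} {after}"


def em_parenthetical(before: str, aside: str, after: str) -> str:
    """Set an aside between unspaced em dashes."""
    return f"{before}{EM_DASH}{aside}{EM_DASH}{after}"


for dash in (EN_DASH, EM_DASH):
    print(f"U+{ord(dash):04X}", unicodedata.name(dash))
print(en_range(20, 25))
print(en_parenthetical("This is how", "if you like", "you could use a pair of en dashes."))
```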
Both human typesetters and computers will normally assume that it's OK to start a new line where there is a space. Usually that's fine, but there are certain circumstances where you wouldn't want to risk that happening. In such cases you need to indicate that a space is 'hard' or non-breaking.
For example, some publications use a space rather than a comma as the thousands separator, so they would want to print 10 000 rather than 10,000. It would be very odd and perhaps confusing if the 10 came at the end of the line and the 000 was on the next line, so a non-breaking space is needed to make sure that the whole of the number appears on the same line.
Similarly, some publishers like to insert a space between the quantity and the unit, as in for instance 50 km. Again, a line break at the space is not what is wanted, so a non-breaking space must be used.
You'll remember that full justification is done by adjusting the width of the spaces, but in the above examples you would want to keep the 10 and the 000, or the 50 and the km, not only on the same line but also not too far away from each other. So you would need a space that is not only non-breaking but also of fixed width for the point size used. In practice it is usually preferable to use a fixed-width space which is a bit less than the normal spacing between words. This is called a 'thin space'.
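The effect of a hard space on line breaking can be demonstrated with Python's standard textwrap module, which only breaks lines at ordinary (ASCII) whitespace. The sample sentence and the 25-character line width are invented for the example. Treating Unicode's NARROW NO-BREAK SPACE as the equivalent of the non-breaking thin space described above is an assumption, and how narrow it actually appears depends on the font.

```python
import textwrap

NBSP = "\u00a0"         # NO-BREAK SPACE
NARROW_NBSP = "\u202f"  # NARROW NO-BREAK SPACE, assumed here to serve as a hard thin space


def hard_join(*parts: str, space: str = NBSP) -> str:
    """Join parts with a non-breaking space so they stay on one line."""
    return space.join(parts)


plain = "The distance is about 10 000 km"
hard = "The distance is about " + hard_join("10", "000", "km")

print(textwrap.wrap(plain, width=25))  # may split the number across two lines
print(textwrap.wrap(hard, width=25))   # keeps '10 000 km' together
print(repr(hard_join("50", "km", space=NARROW_NBSP)))
```

With ordinary spaces the first line ends after the '10'; with hard spaces the whole of '10 000 km' is carried over to the second line together.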
Non-breaking Spaces in HTML and GuideML
Finally, it is worth mentioning that the non-breaking space can be very useful in HTML and GuideML, especially in tables. It can prevent unwanted or unsightly line breaks within a table cell, and give greater control over the final appearance of the table. Use it with caution, however, since if overused it can result in over-long lines.
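In HTML the non-breaking space is normally written as the entity &nbsp;. The Python sketch below prepares the text of a table cell in that way; it assumes standard HTML entities (whether GuideML treats them identically is not confirmed here), and the function name is hypothetical.

```python
import html


def no_break_cell(text: str) -> str:
    """Escape text for HTML, then turn ordinary spaces into &nbsp; entities."""
    # Escaping comes first so that the '&' of each '&nbsp;' is left alone.
    return html.escape(text).replace(" ", "&nbsp;")


cell = no_break_cell("10 000 km")
print(f"<td>{cell}</td>")
print(repr(html.unescape(cell)))
```

Applying this to every cell is exactly the kind of overuse the caution above warns about; it is better kept for short items such as numbers with their units.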
Printers' Symbols and Their Origins
In the fullness of time a number of symbols that printers use have become modified, and sometimes their original meaning has too. Here are a few.
At - the @ sign used to be known as the Commercial At. It is used to indicate a unit price, as in '2 reams @ 3s 9d per quire'. It is just a quick way of writing the word 'at' without lifting pen from paper. More recently, of course, it is used in an email address to indicate that what follows is a domain name.
Question mark - the ? was originally written as a small 'q' over a small 'o'. Gradually the 'q' changed into the curly bit, and the 'o' became a dot under it. The 'q' and 'o' are the first and last letters of the Latin quaestio, question.
Exclamation mark - the ! was originally written as a capital 'I' over a small 'o', making the Latin word Io, an exclamation of joy or wonderment.
Paragraph symbol - the ¶ is a stylised form of the letter P, reversed.
Section symbol - the § is a double letter 'S'.
Asterisk - the * comes from the Greek asteriskos, small star. It was originally used to draw attention to something unusual or striking. These days it is used to indicate the presence of a footnote, or to show that something is missing.
Dagger - the † is also known as an obelisk or obelus (also from Greek). It was originally used to mark something that was spurious, doubtful, or obsolete. These days it is used to indicate either the presence of a footnote (if the * has already been used) or, particularly after a person's name, that the person is deceased.
Double dagger - the ‡ is used for a third footnote.
Hash - the # is the number sign. Probably a corruption of 'hatch', the symbol represents a cross-hatching in which the digits 1 to 9 can be written.
Ampersand - the & means 'and'. 'Ampersand' is a corruption of the phrase 'and per se and', meaning the symbol '&' in itself means 'and'. The symbol derives from a stylised form of Et, the Latin word for 'and' - in some fonts you can see more clearly how it was derived from an E and a t. Its invention is traditionally credited to Marcus Tullius Tiro in the 1st Century BC. He was Cicero's editor and scribe, and devised the Tironian shorthand system. The Latin et cetera (meaning 'and other things') can be shortened to &c.