XML and Friends
Created | Updated Jun 14, 2003
Coming Soon! This is still a work-in-progress so please, no comments until it's in Peer Review. ~ OwlofDoom |
XML (Extensible Markup Language) is a language used to mark up bodies of text with information about their meaning. It was developed by the World Wide Web Consortium (W3C), initially to meet the demands of large-scale online publishing. Because of its flexibility, it has since been used for many other purposes, both on and off the World Wide Web.
A Short History of XML
In the early 1970s, a language for marking up text with useful information about its structure and meaning (known as semantic information) was emerging. This language came to be known as the Standard Generalized Markup Language (SGML) and was standardised as ISO 8879 in 1986.
In the early 1990s, the World Wide Web came on the scene, and an application of SGML known as Hypertext Markup Language (HTML) was used to mark up text documents for presentation and linking within the Web.
By the mid-1990s, it was evident that HTML was not a permanent solution to the problem of marking up huge bodies of text (and the World Wide Web is a huge body of text). The language had no strict formal guidelines to follow, leading to badly written code and inconsistencies between browsers. It concentrated too much on presentation rather than meaning, and it could target only a limited number of applications (mostly Web browsers), so users could not deal with the content in a way appropriate to them.
In 1998, the first XML Specification was published as a W3C Recommendation. It defined a language that was a simplified subset of SGML, and therefore compatible with it, but that was also much stricter, with only a few rules to learn. Documents could be marked up with any semantic information the author wished, and then processed by machines, either to be presented in a Web browser or to be used in any number of other ways (for example, the Mozilla web browser describes its user interface, including the layout of its toolbars, in XML documents).
The name XML is an abbreviation of the language's full name, Extensible Markup Language. The name was chosen from a number of alternatives, including MGML for Minimal Generalized Markup Language, because it made the language sound free and unrestricted. The 'X' has since become a symbol for most of the W3C's Extensible Technologies, including XSL, XHTML and XLink, discussed later.
The Structure of XML
The structure of XML is what makes it so inviting as a language. As you will see, the whole language can be discussed and understood very easily. An XML document consists of four major constructs; each is discussed below.
Elements
Elements are the basic building blocks of XML. An element typically consists of a start tag, content and an end tag. The content can be other elements, character data, or a mixture of both. The start tag may contain attributes. If the element is empty (that is, it contains no content), the start tag and end tag may be replaced by a single tag that ends with a forward slash (/) just before the closing greater-than symbol, to signify that there is no end tag. Tags are delimited by the less-than (<) and greater-than (>) symbols. The following are all well-formed (see later) XML elements:
- <body id="hello">Hello, world!</body>
- <body><name>hello</name><content>Hello, world!</content></body>
- <body id="hello" content="Hello, world!" />
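As a rough illustration, the three elements above can be handed to an XML parser to confirm that each is well-formed and to see how it breaks down into tag, attributes, text and child elements. The sketch below uses Python's standard xml.etree.ElementTree module; the choice of Python and the helper names describe and is_well_formed are assumptions made for this example, not anything defined by XML itself.

```python
import xml.etree.ElementTree as ET

# The three example elements from the list above, copied verbatim.
examples = [
    '<body id="hello">Hello, world!</body>',
    '<body><name>hello</name><content>Hello, world!</content></body>',
    '<body id="hello" content="Hello, world!" />',
]


def describe(source):
    """Parse one element and summarise its parts (hypothetical helper)."""
    element = ET.fromstring(source)  # raises ET.ParseError if not well-formed
    return {
        "tag": element.tag,
        "attributes": dict(element.attrib),
        "text": element.text,  # None when there is no leading character data
        "children": [child.tag for child in element],
    }


def is_well_formed(source):
    """Return True if the parser accepts the text (hypothetical helper)."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


summaries = [describe(source) for source in examples]
for summary in summaries:
    print(summary)
```

By contrast, is_well_formed('<body>Hello') returns False in this sketch, because the element has a start tag but neither an end tag nor the trailing forward slash of an empty element.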
Attributes
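Attributes are qualities of an element type that can vary. Each one is written in the element's start tag as a name followed by a quoted value, such as id="hello", and two elements of the same type may carry different values for the same attribute.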
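To make this concrete, the sketch below reads and changes the id attribute on two elements of the same type, again using Python's xml.etree.ElementTree. The wrapping entries element, the second body element and the lang attribute are hypothetical, invented only for this illustration.

```python
import xml.etree.ElementTree as ET

# Two elements of the same type (body) whose id attribute differs.
# The <entries> wrapper and the "goodbye" element are made up for this sketch.
document = ET.fromstring(
    '<entries>'
    '<body id="hello">Hello, world!</body>'
    '<body id="goodbye">Goodbye, world!</body>'
    '</entries>'
)

# Collect the value of the id attribute from each body element.
ids = [body.get("id") for body in document.findall("body")]
print(ids)

# An attribute that is absent can be read with a fallback value.
first = document.find("body")
lang = first.get("lang", "unspecified")

# Attribute values can be changed without touching the element's content.
first.set("id", "greeting")
serialised = ET.tostring(first, encoding="unicode")
print(serialised)
```

The element type stays the same throughout; only the value attached to the attribute name changes, which is the sense in which an attribute is a quality that can vary.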