The start tag of an element can optionally contain any number of attributes. By convention these usually store values for reference or processing purposes, or indicate specific properties of the element's content. Attributes come in ‘name/value’ pairs. There is often only a very fine distinction between the use of an element or an attribute – this issue is examined in detail later on.
We could add attributes to our recipe:
Gently fry 2 finely sliced onions in 30 g of unsalted butter for . Add the sausage meat, chopped liver and cooked chestnuts
…
…
Here we can see that the elements are the same: the two documents are made up of the same semantic building blocks. However, the properties of some of the element content are different, hence there are different values for the attributes.
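As a minimal sketch of how name/value pairs sit inside a start tag, the following Python fragment parses two versions of a recipe step with the standard library's `xml.etree.ElementTree`. The element names (`step`, `ingredient`) and attribute names (`quantity`, `form`) are hypothetical, chosen purely for illustration; they are not the markup of the recipe above.

```python
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same step: the elements are identical,
# only the attribute values differ.
version_a = '<step><ingredient quantity="2" form="sliced">onions</ingredient></step>'
version_b = '<step><ingredient quantity="3" form="diced">onions</ingredient></step>'


def describe(xml_text):
    """Return (element name, attributes, text) of the first ingredient."""
    ingredient = ET.fromstring(xml_text).find("ingredient")
    return ingredient.tag, dict(ingredient.attrib), ingredient.text


tag_a, attrs_a, text_a = describe(version_a)
tag_b, attrs_b, text_b = describe(version_b)

print(tag_a == tag_b)      # same element name in both documents
print(attrs_a, attrs_b)    # different name/value pairs
```

Both documents are built from the same elements and carry the same text; only the attribute values distinguish them.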
An XML document is a continuous and sequential stream of characters. This stream is punctuated with ‘markup’, consisting of:
• Elements: the basic, labelled information container that we have seen already
• Attributes: reference or other control information
• Comments: unprocessed text that serves only human readers
• Processing instructions or PIs: an ‘escape mechanism’ from the stream, allowing non-XML processing or manipulation of non-XML content
• Entities: placeholders for a block of text, a set of rules, or even entire XML documents that are defined outside the character stream
We have already encountered the first two: the following sections explain them in more detail.
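The five kinds of markup listed above can be seen together in one small document. The sketch below is illustrative only: the recipe markup, the comment text, the `layout` processing instruction and the `butter` entity are all invented for the example. It assumes Python 3.8 or later, where `TreeBuilder` accepts the `insert_comments` and `insert_pis` options.

```python
import xml.etree.ElementTree as ET

# Hypothetical document containing all five kinds of markup.
doc = """<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!ENTITY butter "unsalted butter">
]>
<recipe serves="6">
  <!-- check quantities before printing -->
  <?layout page-break?>
  <step>Fry the <ingredient>onions</ingredient> in <ingredient>&butter;</ingredient>.</step>
</recipe>"""

# ElementTree discards comments and PIs by default; ask the builder to keep them.
builder = ET.TreeBuilder(insert_comments=True, insert_pis=True)
root = ET.fromstring(doc, parser=ET.XMLParser(target=builder))

found = {"elements": [], "attributes": dict(root.attrib), "comments": [], "pis": []}
for node in root.iter():
    if node.tag is ET.Comment:
        found["comments"].append(node.text.strip())
    elif node.tag is ET.ProcessingInstruction:
        found["pis"].append(node.text)
    else:
        found["elements"].append(node.tag)

# The entity reference &butter; is replaced by its declared text during parsing.
ingredients = [item.text for item in root.iter("ingredient")]

print(found)
print(ingredients)
```

The comment and the processing instruction travel with the document but are not part of its element content, while the entity reference is resolved before the application ever sees the text.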
Elements
At its most primitive, an XML document contains text enclosed in any number of perfectly-nested elements. Such containment within elements is indicated to human and machine processor alike with a pair of tags.
Gently fry 2 finely sliced onions in 30g of unsalted butter for . Add the sausage meat, chopped liver and cooked chestnuts
…
…
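The requirement that elements be perfectly nested can be checked mechanically. The following sketch, using hypothetical `step` and `ingredient` elements, asks Python's standard parser whether a fragment is well-formed; it illustrates the rule and is not a full validator.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Each start tag is closed before its parent is closed.
nested = "<step>Add the <ingredient>sausage meat</ingredient>.</step>"

# The tags overlap: <step> is closed while <ingredient> is still open.
overlapping = "<step>Add the <ingredient>sausage meat</step></ingredient>"

print(is_well_formed(nested))
print(is_well_formed(overlapping))
```

The first fragment is accepted and the second is rejected, because an element must be closed inside the element that contains its start tag.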
The tag label can currently contain only letters, digits and a handful of other characters (such as the hyphen, underscore and full stop), and the permitted letters are limited to those known to Unicode 2.0, while any Unicode character can be used elsewhere in an XML document. There is, however, an initiative under way within the W3C to change this: the ‘Blueberry’ initiative.
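A quick way to get a feel for the naming rules is to try a few candidate labels against a parser. The sketch below wraps each hypothetical name in an empty element and reports whether Python's standard (expat-based) parser accepts it; the names are invented for illustration, and the result reflects that parser's behaviour rather than a complete statement of the specification.

```python
import xml.etree.ElementTree as ET


def name_is_accepted(name):
    """Return True if <name/> parses as a well-formed document."""
    try:
        ET.fromstring("<" + name + "/>")
        return True
    except ET.ParseError:
        return False


candidates = ["onions", "onions-2", "sausage_meat", "2onions", "sausage meat"]
results = {name: name_is_accepted(name) for name in candidates}

for name, accepted in results.items():
    print(repr(name), accepted)
```

Names that begin with a digit or contain a space are rejected, while letters, digits after the first character, hyphens and underscores are accepted.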
XML is not a technology, but rather a standard that serves as a powerful medium for describing, communicating and implementing a true information management strategy. It is a computing standard that does not – or should not – belong to the IT specialists, although their support is vital, as we will see. Its basic concepts – that allow you clearly to define and express business entities and terminology – are close to management concerns. Its expressiveness is of immense value to the growing field of information architecture.
As a potentially powerful and expressive management medium, therefore, XML, and the family of standards around it, is too important a business asset to be treated as merely a technical issue. The expressiveness of XML-based systems will come from their ability to associate real meaning with simple text and thus facilitate knowledge management. If we accept that knowledge is an organized progression and aggregation of data and information, we must come to terms with what this implies, particularly as more and more data and information are available only in electronic form.
Computer programming languages put processing – actions – at the centre of their concerns. XML, on the other hand, is and should remain centred on what can be processed: content – objects. Processing is only a means to an end. An enterprise’s information – the content, whether text, data or information in all its guises – is often an end in itself. It is an increasingly valuable business asset, and XML can help manage it wisely and profitably. XML is not a programming language: it is a powerful standard that allows you to package and label your objects in such a way that they can be processed in whatever way is most to your advantage.