XML Entities

June 3rd, 2009

Entities can be thought of as a type of placeholder. When an XML processor encounters an entity in a data stream, it is required to replace the placeholder with the ‘target’ content indicated by the entity. Different types of entity are recognized by XML. Examples include:

• Predefined entities. These allow the five special characters used in XML markup, <, >, &, ' and ", to be included as normal characters in content rather than as indicators of markup. For example, the symbol ‘<’ cannot be used directly within content, as the processor would read it as the start of a new tag. Its corresponding entity ‘&lt;’ is used instead.

In this first example, the following:
<example>Using the < entity in content</example>
would cause an error, as the processor would treat the ‘<’ in the text as the start of a tag and expect a tag label to follow it. The following, however, would process normally:
<example>Using the &lt; entity in content</example>
and the ‘&lt;’ would be preserved and understood as representing the character ‘<’ when needed later.
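The difference between the two forms can be checked with any XML parser. The following is a minimal sketch using Python's standard library; the helper name `is_well_formed` and the sample string passed to `escape` are illustrative assumptions, not part of the original example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The two versions of the example element discussed above.
raw = "<example>Using the < entity in content</example>"
escaped = "<example>Using the &lt; entity in content</example>"


def is_well_formed(text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


raw_ok = is_well_formed(raw)          # the bare '<' is rejected
escaped_ok = is_well_formed(escaped)  # the entity reference is accepted

# After parsing, the entity reference has been replaced by the character.
content = ET.fromstring(escaped).text

# escape() converts &, < and > into their predefined entities
# before text is placed inside an element.
safe = escape("fish < chips & peas")
```

In practice it is usually safer to let a library function such as `escape` produce the entity references than to write them by hand.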

• User-defined entities. These, as their name suggests, permit a user to specify a ‘shorthand’ to signify a target piece of text, markup or mixture of both. This is particularly useful when a document contains much repetition, such as a long or often-used name or title, or when a particular text needs to be referred to indirectly, for example for security reasons or to protect personal data.
Supposing that the entity is defined (we’ll see how and where later), we could have a document that includes:
<proceedings>In the case against &X; little evidence has been…
where the entity reference ‘&X;’ would only be replaced when the document is processed. If the definition and resolution of the reference are kept secure, the actual value of the information referred to can be made available only when and how you wish.
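As a rough illustration of how such a reference is resolved, the sketch below declares the entity in an internal DTD subset, which is one of the places where XML allows entities to be defined. The replacement text "Jane Doe" and the wording of the sample sentence are hypothetical; only the entity name `X` and the `proceedings` element come from the example above.

```python
import xml.etree.ElementTree as ET

# Assumption: the entity X is declared in the document's internal DTD subset.
# "Jane Doe" is a made-up value used only for this illustration.
DOCUMENT = """<!DOCTYPE proceedings [
  <!ENTITY X "Jane Doe">
]>
<proceedings>In the case against &X; little evidence has been ...</proceedings>"""

# The parser replaces the reference with the declared text while parsing.
resolved = ET.fromstring(DOCUMENT).text

# Without a declaration, the same reference cannot be resolved
# and the parser rejects the document.
try:
    ET.fromstring("<proceedings>In the case against &X; ...</proceedings>")
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
```

The second half of the sketch shows why the definition matters: a reader who holds the document but not the declaration sees only the reference, never the value.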

XML Attributes

May 27th, 2009

The start tag of an element can optionally contain any number of attributes. By convention these usually store values for reference or processing purposes, or indicate specific properties of a given piece of content. Attributes come in ‘name/value’ pairs. There is often only a very fine distinction between the use of an element and the use of an attribute; this issue is examined in detail later on.

We could add attributes to our recipe:

<recipe>
<step n="1">Gently fry 2 finely sliced
<ingredient ref="0687">onions</ingredient> in
<quantity measure="metricWeight">30</quantity>
<measure>g</measure> of
<ingredient ref="0688">unsalted butter</ingredient> for
<time measure="informal">several minutes</time>. Add the
<ingredient ref="0689">sausage meat</ingredient>,
<ingredient ref="0690">chopped liver</ingredient> and
<ingredient ref="0691">cooked chestnuts</ingredient>
...
</step>
...
</recipe>

Here we can see that the elements are the same: the two documents are made up of the same semantic building blocks. However, the properties of some of the element content are different, hence there are different values for the attributes.
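To show how a program might read these name/value pairs, here is a minimal sketch using Python's standard library. It works on a shortened copy of the recipe and assumes the step number is held in a plain attribute called `n`; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the recipe with attributes (assumed attribute name: n).
RECIPE = """<recipe>
  <step n="1">Gently fry 2 finely sliced
    <ingredient ref="0687">onions</ingredient> in
    <quantity measure="metricWeight">30</quantity>
    <measure>g</measure> of
    <ingredient ref="0688">unsalted butter</ingredient> for
    <time measure="informal">several minutes</time>.
  </step>
</recipe>"""

root = ET.fromstring(RECIPE)
step = root.find("step")

# A single attribute value is read by name; a missing one yields None.
step_number = step.get("n")
missing = step.get("difficulty")

# Map each ingredient's reference code to the text it labels.
refs = {ing.get("ref"): ing.text for ing in step.iter("ingredient")}

# All attributes of an element are available as a dictionary.
time_attributes = dict(step.find("time").attrib)
```

Note that attribute values always arrive as strings: `step_number` is the text "1", not the number 1, so any conversion is left to the application.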

XML Elements

May 25th, 2009

An XML document is a continuous and sequential stream of characters. This stream is punctuated with ‘markup’, consisting of:
• Elements: the basic, labelled information container that we have seen already
• Attributes: reference or other control information
• Comments: unprocessed text that serves only human readers
• Processing instructions (PIs): an ‘escape mechanism’ from the stream, allowing non-XML processing or manipulation of non-XML content
• Entities: placeholders for a block of text, a set of rules, or even entire XML documents that are defined outside the character stream

We have already encountered the first two: the following sections explain them in more detail.
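The kinds of markup listed above can be seen side by side in a very small document. The sketch below uses Python's standard `xml.dom.minidom` parser, which keeps comments and processing instructions in the parsed tree; the sample document, including the PI target `render-hint`, is invented purely for illustration.

```python
from xml.dom import minidom

# A made-up document containing a processing instruction, a comment,
# an element with an attribute, and a predefined entity reference.
SAMPLE = (
    '<?render-hint mode="print"?>'
    '<!-- a note for human readers only -->'
    '<recipe id="r1">Salt &amp; pepper</recipe>'
)

doc = minidom.parseString(SAMPLE)

# The top-level nodes appear in document order, one per kind of markup.
kinds = [type(node).__name__ for node in doc.childNodes]

pi = doc.childNodes[0]       # the processing instruction
comment = doc.childNodes[1]  # the comment
root = doc.documentElement   # the recipe element

pi_target = pi.target
comment_text = comment.data.strip()
attribute_value = root.getAttribute("id")

# The entity reference &amp; has already been replaced by '&' in the content.
text = "".join(child.data for child in root.childNodes)
```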

Elements

At its most primitive, an XML document contains text enclosed in any number of perfectly-nested elements. Such containment within elements is indicated to human and machine processor alike with a pair of tags.

<recipe>
<step>Gently fry 2 finely sliced
<ingredient>onions</ingredient> in
<quantity>30g</quantity> of
<ingredient>unsalted butter</ingredient> for
<time>several minutes</time>. Add the
<ingredient>sausage meat</ingredient>,
<ingredient>chopped liver</ingredient> and
<ingredient>cooked chestnuts</ingredient>
...
</step>
...
</recipe>
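The requirement that elements be perfectly nested is something a parser enforces. As a minimal sketch, again using Python's standard library, the three fragments below differ only in their tags; the helper name `check` and the shortened recipe text are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Every start tag has a matching end tag, closed in reverse order of opening.
nested = "<recipe><step>Gently fry <ingredient>onions</ingredient></step></recipe>"

# The step is never closed: a second start tag appears where </step> belongs.
unclosed = "<recipe><step>Gently fry <ingredient>onions</ingredient><step></recipe>"

# The tags overlap: step is closed while ingredient is still open.
overlapping = "<recipe><step>Gently fry <ingredient>onions</step></ingredient></recipe>"


def check(text):
    """Report whether the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return "well-formed"
    except ET.ParseError as err:
        return f"rejected: {err}"


results = {name: check(text) for name, text in
           [("nested", nested), ("unclosed", unclosed), ("overlapping", overlapping)]}

# For the well-formed version, the containment hierarchy can be listed.
outline = [element.tag for element in ET.fromstring(nested).iter()]
```

Only the first fragment is accepted; the parser stops at the first mismatched tag in the other two rather than trying to guess what was meant.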

In XML 1.0, the tag label can contain only letters, digits and a handful of other characters (such as the hyphen, underscore, full stop and colon), and cannot begin with a digit, hyphen or full stop, whereas almost any Unicode character can be used elsewhere in an XML document. The ‘Blueberry’ initiative within the W3C set out to widen the range of characters allowed in labels, and fed into XML 1.1.

Why choose XML?

May 23rd, 2009

XML is not a technology, but rather a standard that serves as a powerful medium for describing, communicating and implementing a true information management strategy. It is a computing standard that does not – or should not – belong to the IT specialists, although their support is vital, as we will see. Its basic concepts – that allow you clearly to define and express business entities and terminology – are close to management concerns. Its expressiveness is of immense value to the growing field of information architecture.

As a potentially powerful and expressive management medium, therefore, XML, and the family of standards around it, is too important a business asset to be treated as merely a technical issue. The expressiveness of XML-based systems will come from their ability to associate real meaning to simple text and thus facilitate knowledge management. If we accept that knowledge is an organized progression and aggregation of data and information, we must come to terms with what this implies, particularly as more and more data and information are available only in electronic form.

Computer programming languages put processing – actions – at the centre of their concerns. XML, on the other hand, is and should remain centred on what can be processed: content – objects. Processing is only a means to an end. An enterprise’s information – the content, whether text, data or information in all its guises – is often an end in itself. It is an increasingly valuable business asset, and XML can help manage it wisely and profitably. XML is not a programming language: it is a powerful standard that allows you to package and label your objects in such a way that they can be processed in whatever way is most to your advantage.