How to create a Document Type Definition file for XML

In this article we look at how to create a DTD (Document Type Definition). To make sense of DTD files, this introduction first reviews the basics of XML (Extensible Markup Language). A basic understanding of XML is assumed for this how-to.

XML - The Extensible Markup Language is a text-based, cross-platform language that enables you to store data (like addresses in an address book) in a structured manner. An XML document must use correct syntax; a document that does is called well-formed. A well-formed document follows these rules:

  • Every tag is closed (<hello></hello> or <hi />)
  • Attribute values are enclosed in quotes, usually double quotes (<candy price="0.50">)
  • Tag names are case sensitive, so the beginning and ending tags must use the same case
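
To make these rules concrete, here is a small sketch of a well-formed document. The element names are only illustrative, and the trailing comment lists a few variations that a parser would be expected to reject.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Well-formed: one root element, every tag closed, attribute values quoted. -->
<greetings>
    <hello>Hi there</hello>
    <hi />
    <candy price="0.50">Lollipops</candy>
</greetings>
<!-- Each of the following lines would make the document NOT well-formed:
       <hello>Hi there</Hello>     start and end tags differ in case
       <candy price=0.50>          attribute value is not quoted
       <hello>Hi there             end tag is missing
-->
```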

A very simple example of an XML document is given below for a candy store menu.

<candystore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <candy>
        <productName company="XYZ">Lollipops</productName>
        <id>634847</id>
        <price>0.50</price>
    </candy>
    <candy>
        <productName company="ABC">Mints</productName>
        <id>634812</id>
        <price>0.75</price>
    </candy>
</candystore>

A complete XML document normally begins with an XML declaration, <?xml version="1.0" encoding="UTF-8"?>, which tells the parser which version of XML we are working with; it is left out of the snippet above and appears in the finished file in Step 4. The rest of the document is composed of our tags and elements. In the example above the element "productName" has an attribute named "company". A great way to remember an attribute is to think of it as a property that something may have. A candy has a name. The name is associated with a particular company.

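In our how-to we are going to take the example above and create a DTD for it. The DTD defines the rules our document must follow: which elements and attributes may appear, in what order, and whether they are required. (Questions such as what format is expected for our id, or whether it can contain alphabetical characters, help us think about our data, although a DTD can enforce only structural rules, not formats like these.)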

DTD (Document Type Definition) - A DTD, like an XML Schema, declares which elements and attributes may appear in an XML document and how they may be nested. Unlike an XML Schema, a DTD has no simple or complex data types for element content: the text inside an element is just character data.

Step 1:

Let's take a look again at our example for the candy store.

<candystore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <candy>
        <productName company="XYZ">Lollipops</productName>
        <id>634847</id>
        <price>0.50</price>
    </candy>
    <candy>
        <productName company="ABC">Mints</productName>
        <id>634812</id>
        <price>0.75</price>
    </candy>
</candystore>

Our candystore element has two candy sub-elements. Each candy element has the sub-elements productName, id, and price. The productName element also has an attribute called company. When building rules of syntax we must define constraints for the format of our data. These constraints may be based on how the data will be used. If, for example, our candy's ID must fit on a product invoice within 6 characters, then that will help determine our constraints. Ask yourself a few questions about your data:

1. Does the data have to be only digits or alphabetical letters?
2. Does the data have to be a certain length? Or a minimum or maximum length more specifically?
3. Does the data have to have a particular attribute or subelement specified? Or are they optional?

Now that we have some ideas on what questions we can ask ourselves, we can move forward. Let's build a set of rules before we start writing our DTD.

1. The name of candy is required.
2. The company name can be no more than 30 characters long. It is optional.
3. The id must be all digits and must be exactly 6 digits long. It is required.
4. The price is required.

Step 2:

With our list of rules we can now create our document type definition. Let's review the basics.

A helpful way to think about creating a DTD is to compare it to creating a table in a database. From step 1 we know what is necessary, from the name of the candy to the price. With this list we can define our elements. Elements in a DTD file are declared as follows:

<!ELEMENT elementname (content-type or content-model)>

  • elementname specifies the name of the element
  • Content-type or content-model specifies whether the element contains textual data or other elements

Our elements can be one of three types: empty, unrestricted, or container.

Empty elements have no content (no name, numerical price, etc.) and are marked up as <empty-element/>.

<!ELEMENT storemanager EMPTY>

Unrestricted elements are the opposite of empty elements. They are declared with the keyword ANY and can contain character data and any elements declared elsewhere in the DTD file. Container elements list exactly what they may hold in parentheses: character data (#PCDATA), other elements, or a mix of both.
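
As a rough sketch of the three kinds of declaration, the self-contained document below carries its DTD in an internal subset (the part between [ and ]) instead of a separate file. The notes and note elements are hypothetical and are not part of the candy store example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE candystore [
    <!-- Container: candystore holds a storemanager followed by a notes element -->
    <!ELEMENT candystore (storemanager, notes)>
    <!-- Empty: storemanager may not have any content -->
    <!ELEMENT storemanager EMPTY>
    <!-- Unrestricted: notes may hold text and any declared elements, in any order -->
    <!ELEMENT notes ANY>
    <!-- Container holding only text -->
    <!ELEMENT note (#PCDATA)>
]>
<candystore>
    <storemanager/>
    <notes>Open late on Fridays. <note>Restock mints</note></notes>
</candystore>
```

ANY is convenient while drafting, but it gives up most of the checking a DTD can do, which is why the candy store DTD in this how-to spells out its content models explicitly.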

Before we specify our elements, we need to be introduced to some symbols used in DTD files:

  • ? - The element occurs zero or one time.
  • , - Separates multiple children, which must appear in the order listed.
  • | - The pipe symbol is used as an OR statement. This element or that element is acceptable.
  • + - The element occurs one or more times.
  • * - The element occurs zero or more times.
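
The sketch below exercises each symbol in one small, self-contained document with an internal DTD. The elements order, giftwrap, cash, card, item, and note are hypothetical names chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE order [
    <!-- ,  children must appear in this order
         ?  giftwrap is optional (zero or one)
         |  exactly one of cash or card
         +  at least one item
         *  any number of note elements, including none -->
    <!ELEMENT order (giftwrap?, (cash | card), item+, note*)>
    <!ELEMENT giftwrap EMPTY>
    <!ELEMENT cash EMPTY>
    <!ELEMENT card EMPTY>
    <!ELEMENT item (#PCDATA)>
    <!ELEMENT note (#PCDATA)>
]>
<order>
    <card/>
    <item>Lollipops</item>
    <item>Mints</item>
</order>
```

This instance should validate: giftwrap and note are omitted, one payment element is present, and there are two items. Removing both item elements, or placing card after them, would be expected to fail validation.

With the symbols in mind, here are the declarations for the candy store: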

<!ELEMENT productName (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT candystore ((candy+))>
<!ELEMENT candy ((productName, id, price))>

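After each of the first three element names we have the text #PCDATA, which stands for parsed character data, meaning plain text. A DTD cannot restrict that text any further, so rules from step 1 such as "exactly 6 digits" or "no more than 30 characters" have to be checked by other means. Our element candystore has the child element candy. We use the + symbol since there must be at least one and there can be more. The element candy has the child elements productName, id, and price, in that order. These are contained in parentheses after candy. One of our elements, productName, has an attribute. We will have to declare that as well.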

<!ATTLIST elementname attributename valuetype [attributetype] ["default"]>

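The attributename valuetype [attributetype] ["default"] section is repeated as often as necessary to declare multiple attributes for an element. The attribute type is a keyword such as #REQUIRED (the attribute must be present), #IMPLIED (the attribute is optional), or #FIXED (the attribute must have the given default value). Our declaration below uses #REQUIRED, which is stricter than rule 2 in step 1; use #IMPLIED instead if the company should be optional.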

<!ATTLIST productName company CDATA #REQUIRED>
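
A minimal sketch of the common attribute defaults follows, using an internal DTD so the example is self-contained. The attributes flavor, size, and currency, along with their values, are hypothetical and do not appear in the candy store document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE candy [
    <!ELEMENT candy (#PCDATA)>
    <!-- company:  must always be present
         flavor:   optional, no default value
         size:     must be small or large; defaults to small when omitted
         currency: if present, must be exactly USD; defaults to USD -->
    <!ATTLIST candy
        company  CDATA          #REQUIRED
        flavor   CDATA          #IMPLIED
        size     (small|large)  "small"
        currency CDATA          #FIXED "USD">
]>
<candy company="XYZ" size="large">Lollipops</candy>
```

An enumerated list such as (small|large) is one of the few ways a DTD can limit a value to a known set; CDATA accepts any text.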

Step 3:

Now let's build our DTD file.

A DTD file may begin with a text declaration. It is optional, looks like the XML declaration, and states the encoding of the file.

<?xml version="1.0" encoding="UTF-8"?>

Next we need to specify the productName element and its attribute.

<!ELEMENT productName (#PCDATA)>
<!ATTLIST productName company CDATA #REQUIRED>

Now we can specify our other elements:

<!ELEMENT price (#PCDATA)>
<!ELEMENT id (#PCDATA)>

Next is our candystore container with its child element candy:

<!ELEMENT candystore ((candy+))>

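Next, we need to declare the attribute of candystore. This attribute is one that might throw you off: xmlns:xsi looks special, but to a DTD it is just another attribute name, and a validating parser expects every attribute that appears in the document to be declared. Recall the candystore element line in the XML file: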

<candystore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

We need to define this in our DTD as an attribute:

<!ATTLIST candystore
    xmlns:xsi CDATA #FIXED "http://www.w3.org/2001/XMLSchema-instance"
>

Finally, we declare the element candy with its child elements:

<!ELEMENT candy ((productName, id, price))>

Step 4:

Putting it all together, here are the two finished files.

candystore.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE candystore SYSTEM "candystore.dtd">
<candystore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <candy>
        <productName company="XYZ">Lollipops</productName>
        <id>634847</id>
        <price>0.50</price>
    </candy>
    <candy>
        <productName company="ABC">Mints</productName>
        <id>634812</id>
        <price>0.75</price>
    </candy>
</candystore>

candystore.dtd

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT productName (#PCDATA)>
<!ATTLIST productName company CDATA #REQUIRED>
<!ELEMENT price (#PCDATA)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT candystore ((candy+))>
<!ATTLIST candystore
    xmlns:xsi CDATA #FIXED "http://www.w3.org/2001/XMLSchema-instance"
>
<!ELEMENT candy ((productName, id, price))>
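
To see what the DTD catches, consider the hypothetical document below. It is well-formed, but it should fail validation against candystore.dtd, and the comments mark each problem. A validating parser would be expected to report them; one option, assuming libxml2 is installed, is to run xmllint with its --valid option on the file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE candystore SYSTEM "candystore.dtd">
<candystore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <candy>
        <!-- Invalid: the required company attribute is missing -->
        <productName>Lollipops</productName>
        <!-- Invalid: price comes before id, but the DTD requires productName, id, price -->
        <price>0.50</price>
        <id>634847</id>
    </candy>
    <candy>
        <productName company="ABC">Mints</productName>
        <!-- Accepted by the DTD even though it is not six digits: #PCDATA allows any text -->
        <id>63A</id>
        <!-- Invalid: the price element is missing -->
    </candy>
</candystore>
```

The id of 63A passing is a reminder that the length and digits-only rules from step 1 are outside what a DTD can express.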