How To Implement an XML Schema

In this article, we are going to look at how to create an XML Schema. This how-to assumes a basic understanding of XML (Extensible Markup Language), but to make XML Schemas easier to follow, this introduction first reviews the basics.

XML - The Extensible Markup Language is a text-based, cross-platform language that enables you to store data (like addresses in an address book) in a structured manner. An XML document must follow XML's syntax rules, and a document that does is called well-formed. A well-formed document has the following:

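  • Closed tags (<hello></hello> or <hi />)
  • Attribute values enclosed in quotes, either double or single (<candy price="0.50">)
  • Matching case in start and end tags. XML is case sensitive, so <Price> cannot be closed with </price>
  • Properly nested elements (<a><b></b></a>, not <a><b></a></b>)
  • Exactly one root element that contains all the others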
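
For contrast, each fragment below breaks one of these rules, so a parser would reject any document containing it. The element names are only for illustration.

    <!-- Not well-formed: the <name> start tag is never closed -->
    <candy><name>Lollipops</candy>

    <!-- Not well-formed: the attribute value is not quoted -->
    <candy price=0.50></candy>

    <!-- Not well-formed: the end tag's case differs from the start tag's -->
    <Price>0.50</price>
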
A very simple example of an XML document is given below for a candy store menu.

<?xml version="1.0" encoding="utf-8"?>
<candystore>
    <candy>
        <productName company="XYZ">Lollipops</productName>
        <id>634847</id>
        <price>0.50</price>
    </candy>
    <candy>
        <productName company="ABC">Mints</productName>
        <id>634812</id>
        <price>0.75</price>
    </candy>
</candystore>

The first line is the XML declaration. It tells the parser which version of XML the document uses and how its characters are encoded. The rest of the document is made up of our elements, each written as a start tag and a matching end tag. In the example above, the productName element has an attribute called company. A good way to think of an attribute is as a property that something has: a candy has a product name, and that name is associated with a particular company.

In this how-to we will take the example above and create an XML Schema for it. The schema defines rules for each element's content, in other words its data. (For example, what format is expected for our id? May it contain letters?)

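XML Schema - An XML Schema (XSD, for XML Schema Definition, which is also the usual file extension) describes the structure of an XML document: which elements and attributes may appear, in what order and how often, and what kind of data each one holds. It does this by assigning every element and attribute a type, either simple or complex.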
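
To make this concrete, here is a complete but deliberately tiny schema. It is a sketch built around a hypothetical note element, not part of the candy store example, and it declares that the element's content must be a string:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <!-- A single element, note, whose content must be plain text -->
    <xsd:element name="note" type="xsd:string"/>
</xsd:schema>

A document consisting of <note>Buy more mints</note> would be valid against this schema, while <note><b>Buy</b> more mints</note> would not, because a string type allows no child elements.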

Step 1

Let's take a look again at our example for the candy store.

<?xml version="1.0" encoding="utf-8"?>
<candystore>
    <candy>
        <productName company="XYZ">Lollipops</productName>
        <id>634847</id>
        <price>0.50</price>
    </candy>
    <candy>
        <productName company="ABC">Mints</productName>
        <id>634812</id>
        <price>0.75</price>
    </candy>
</candystore>

Our candystore element has two candy sub-elements. Each candy element has the sub-elements productName, id, and price, and the productName element has an attribute called company. When building rules for our document, we must define constraints on the format of our data. These constraints may depend on how the data will be used. If, for example, our candy's id must fit within 6 characters on a product invoice, that helps determine our constraints. Ask yourself a few questions about your data:

1. Must the data contain only digits, or only letters?
2. Must the data be an exact length, or fall within a minimum or maximum length?
3. Must a particular attribute or sub-element be present, or is it optional?
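
Each of these questions maps onto an XML Schema construct. The fragments below, which would sit inside an xsd:schema element, sketch one answer per question; the names digitsOnlyType, shortTextType, exampleType, note, and code are hypothetical and not part of the candy store schema.

    <!-- Question 1: only digits? A pattern facet restricts the allowed characters. -->
    <xsd:simpleType name="digitsOnlyType">
        <xsd:restriction base="xsd:string">
            <xsd:pattern value="[0-9]+"/>
        </xsd:restriction>
    </xsd:simpleType>

    <!-- Question 2: length limits? The length, minLength, and maxLength facets set them. -->
    <xsd:simpleType name="shortTextType">
        <xsd:restriction base="xsd:string">
            <xsd:minLength value="1"/>
            <xsd:maxLength value="30"/>
        </xsd:restriction>
    </xsd:simpleType>

    <!-- Question 3: required or optional? minOccurs controls elements, use controls attributes. -->
    <xsd:complexType name="exampleType">
        <xsd:sequence>
            <!-- minOccurs="0" makes the note element optional -->
            <xsd:element name="note" type="xsd:string" minOccurs="0"/>
        </xsd:sequence>
        <!-- use="required" makes the code attribute mandatory -->
        <xsd:attribute name="code" type="xsd:string" use="required"/>
    </xsd:complexType>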

Now that we have some questions to ask ourselves, we can move forward. Let's build a set of rules before we start writing our XML Schema code.

1. The candy's name (productName) is required.
2. The company name is optional and can be no more than 30 characters long.
3. The id must be all digits and exactly 6 digits long. It is required.
4. The price is required and must be a decimal number.

Step 2

With our list of rules, we can now create our schema. First, we declare the XML Schema namespace, http://www.w3.org/2001/XMLSchema, and bind it to a prefix. We could choose any prefix we like, such as candyxsd, but xsd is short, conventional, and easy to understand. The xsd prefix is then used throughout the schema to mark the elements and built-in types that belong to the XML Schema namespace.

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
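
The prefix itself carries no meaning; only the namespace URI it is bound to matters. As a sketch, this schema uses the candyxsd prefix instead and works the same way, as long as every schema element and built-in type uses that prefix consistently:

<?xml version="1.0" encoding="UTF-8"?>
<candyxsd:schema xmlns:candyxsd="http://www.w3.org/2001/XMLSchema">
    <!-- Built-in types take the same prefix, e.g. candyxsd:decimal -->
    <candyxsd:element name="price" type="candyxsd:decimal"/>
</candyxsd:schema>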

Within your schema you need to give each element a particular type. There are two kinds of types you can use:

Simple:
<xsd:element name="nameOfElement" type="xsd:nameOfBuiltInSimpleType"/>

or

<xsd:simpleType name="nameOfSimpleType">
    <!-- derived type constraints (an xsd:restriction) go here -->
</xsd:simpleType>

Complex:
<xsd:complexType name="nameOfComplexType">
    <xsd:sequence>
        <!-- element declarations go here -->
    </xsd:sequence>
</xsd:complexType>

The difference between the two is simple: a simple type can contain only text, with no attributes or child elements, while a complex type can have attributes, child elements, or both.
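
In terms of the candy store document, the difference looks like this:

    <!-- Text only, no attributes or children: a simple type is enough -->
    <price>0.50</price>

    <!-- Has an attribute: needs a complex type, even though its content is just text -->
    <productName company="XYZ">Lollipops</productName>

    <!-- Has child elements: needs a complex type -->
    <candy>
        <productName company="ABC">Mints</productName>
        <id>634812</id>
        <price>0.75</price>
    </candy>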

Step 3 covers the built-in simple types.

Step 3

Now that you are familiar with the two kinds of types, let's look at a few of the built-in simple types:

  • date 
  • string
  • decimal
  • time
  • integer, long, int, positiveInteger, double, float

The full list of built-in simple types is given in the W3C specification XML Schema Part 2: Datatypes.
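
As a sketch, here are element declarations that use some of these types, each with a sample value in a comment. The element names are hypothetical and not part of the candy store schema.

    <xsd:element name="expiryDate" type="xsd:date"/>      <!-- e.g. 2025-12-31 -->
    <xsd:element name="deliveryTime" type="xsd:time"/>    <!-- e.g. 14:30:00 -->
    <xsd:element name="description" type="xsd:string"/>   <!-- e.g. Sugar-free mints -->
    <xsd:element name="stockCount" type="xsd:int"/>       <!-- e.g. 250 -->
    <xsd:element name="weight" type="xsd:decimal"/>       <!-- e.g. 12.5 -->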

With a built-in type we can declare a simple element like price or id, which has neither attributes nor sub-elements.

<xsd:element name="price" type="xsd:decimal" />
<xsd:element name="id" type="xsd:positiveInteger"/>
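
With price declared as xsd:decimal and id as xsd:positiveInteger, a validating parser checks each value against its type. A sketch of some values and the expected outcome:

    <price>0.50</price>         <!-- valid: a decimal number -->
    <price>fifty cents</price>  <!-- invalid: not a decimal number -->
    <id>634847</id>             <!-- valid: a positive integer -->
    <id>0</id>                  <!-- invalid: zero is not positive -->
    <id>63A847</id>             <!-- invalid: contains a letter -->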

Step 4

The picture given illustrates the structure of our XML Schema. Before building our complex types, we add constraints on the format of our id and of the company attribute by creating two simple types: the id element will be assigned the type idType, and the company attribute will be assigned the type companyType.

    <xsd:simpleType name="idType">
        <xsd:restriction base="xsd:positiveInteger">
            <xsd:totalDigits value="6"/>
        </xsd:restriction>
    </xsd:simpleType>

    <xsd:simpleType name="companyType">
        <xsd:restriction base="xsd:string">
            <xsd:maxLength value="30"/>
        </xsd:restriction>
    </xsd:simpleType>

The xsd:restriction element inside a simple type declares that type's facets. A facet constrains a simple type's values: it can limit anything from the total number of digits in a numeric value to the minimum or maximum length of a string value. The base attribute names the type being restricted, and that base type determines which facets may be applied. For example, maxLength is not valid on numeric types such as decimal or double; for decimal and the integer types derived from it, totalDigits limits the number of digits instead. Refer to the facets listed for each built-in simple type in the specification. Note that totalDigits sets a maximum, so idType accepts any positive integer of up to six digits, including shorter ids such as 42.
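
To enforce rule 3 strictly (exactly six digits, not merely at most six), one option is a pattern facet. This is a sketch under the hypothetical name exactIdType, so it does not clash with idType:

    <xsd:simpleType name="exactIdType">
        <xsd:restriction base="xsd:positiveInteger">
            <!-- The pattern is matched against the text as written: exactly six digits, no sign -->
            <xsd:pattern value="[0-9]{6}"/>
        </xsd:restriction>
    </xsd:simpleType>

Using xsd:minInclusive value="100000" together with xsd:maxInclusive value="999999" would also work, and it additionally rejects ids with leading zeros such as 012345, which the pattern accepts.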

Step 5

Now we need to build the complex types that contain everything else. As mentioned earlier, complex types can have sub-elements and attributes. Our candystore element has the candy sub-element. The candy element has three sub-elements (productName, id, and price). The productName element has an attribute, so its type must accommodate both the attribute and the element's text.

Our candystore element contains everything:

    <xsd:element name="candystore" type="candystoreType"/>
    <xsd:complexType name="candystoreType">
        <xsd:sequence>
            <xsd:element name="candy" type="candyType" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>

First, we declare a candystore element whose type, candystoreType, is a complex type. Recall that our candy element can be repeated as many times as desired. That is where the minOccurs and maxOccurs attributes come in: they specify the minimum and maximum number of occurrences. If, for example, you wanted candy to appear only once or not at all, minOccurs would still be 0 and maxOccurs would be 1. Our candy sub-element is declared with a type called candyType, which is the largest type in the schema.

    <xsd:complexType name="candyType">
        <xsd:sequence>
            <!-- The productName element goes here; it is explained below. -->
            <xsd:element name="id" type="idType" minOccurs="1" maxOccurs="1"/>
            <xsd:element name="price" type="xsd:decimal" minOccurs="1" maxOccurs="1"/>
        </xsd:sequence>
    </xsd:complexType>

The id element refers to the simple type we defined for it earlier, idType.

Here is the productName element, which was left out of the snippet above so it could be explained separately:

    <xsd:element name="productName" minOccurs="1" maxOccurs="1">
        <xsd:complexType>
            <xsd:simpleContent>
                <xsd:extension base="xsd:string">
                    <xsd:attribute name="company" type="companyType"/>
                </xsd:extension>
            </xsd:simpleContent>
        </xsd:complexType>
    </xsd:element>

Inside the productName element we declare an unnamed complexType, because only complex types can carry attributes. The element must also hold the name of the candy as text, so the complex type uses xsd:simpleContent, which means its content is character data plus attributes rather than child elements. Without simpleContent, we could not put the candy's name between the productName tags. The xsd:extension element then does two things: its base attribute sets the type of the text (here xsd:string; it could also be another built-in type such as long, a named simple type, or a complex type that itself has simple content), and its child xsd:attribute adds the company attribute. Because that attribute declaration has no use="required", company is optional, which matches rule 2. Once we put it all together, we are done!
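
A sketch of productName elements and how they should fare against this declaration, with the expected outcome in comments (the long company name is made up to exceed the limit):

    <!-- valid: text content plus a company of 30 characters or fewer -->
    <productName company="XYZ">Lollipops</productName>

    <!-- valid: the company attribute is optional -->
    <productName>Lollipops</productName>

    <!-- invalid: companyType allows at most 30 characters -->
    <productName company="Extremely Long Confectionery Company Incorporated">Mints</productName>

    <!-- invalid: xsd:string content cannot contain child elements -->
    <productName company="ABC"><b>Mints</b></productName>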

 Make sure your XSD file has this at the top:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

The xsd:schema element must be closed with </xsd:schema> at the very end, after everything else.

Save your completed XSD file as candystore.xsd. Assembled as shown in Step 6, it is well-formed and a valid XML Schema.

Step 6

Our final step is to apply our XML Schema to our original XML file.

candystore.xml

<?xml version="1.0" encoding="UTF-8"?>
<candystore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="candystore.xsd">
    <candy>
        <productName company="XYZ">Lollipops</productName>
        <id>634847</id>
        <price>0.50</price>
    </candy>
    <candy>
        <productName company="ABC">Mints</productName>
        <id>634812</id>
        <price>0.75</price>
    </candy>
</candystore>

candystore.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="candystore" type="candystoreType"/>
    <xsd:complexType name="candystoreType">
        <xsd:sequence>
            <xsd:element name="candy" type="candyType" minOccurs="0" maxOccurs="unbounded"/>
        </xsd:sequence>
    </xsd:complexType>
    
    <xsd:complexType name="candyType">
        <xsd:sequence>
            <xsd:element name="productName" minOccurs="1" maxOccurs="1">
                <xsd:complexType>
                    <xsd:simpleContent>
                        <xsd:extension base="xsd:string">
                            <xsd:attribute name="company" type="companyType"/>
                        </xsd:extension>
                    </xsd:simpleContent>
                </xsd:complexType>
            </xsd:element>
            <xsd:element name="id" type="idType" minOccurs="1" maxOccurs="1"/>
            <xsd:element name="price" type="xsd:decimal" minOccurs="1" maxOccurs="1"/>
        </xsd:sequence>
    </xsd:complexType>
    
    <xsd:simpleType name="idType">
        <xsd:restriction base="xsd:positiveInteger">
            <xsd:totalDigits value="6"/>
        </xsd:restriction>
    </xsd:simpleType>
    <xsd:simpleType name="companyType">
        <xsd:restriction base="xsd:string">
            <xsd:maxLength value="30"/>
        </xsd:restriction>
    </xsd:simpleType>
</xsd:schema>

Note: Make sure your entire XML schema is within the <xsd:schema> </xsd:schema> tags.
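
To confirm that the schema rejects bad data, it helps to validate a deliberately broken copy of candystore.xml. This is a sketch; each candy element below should be rejected for the reason given in its comment:

<?xml version="1.0" encoding="UTF-8"?>
<candystore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="candystore.xsd">
    <candy>
        <productName company="XYZ">Lollipops</productName>
        <!-- invalid: idType allows digits only -->
        <id>63484A</id>
        <price>0.50</price>
    </candy>
    <candy>
        <!-- invalid: productName is required (minOccurs="1"), so a candy cannot start with id -->
        <id>634812</id>
        <price>0.75</price>
    </candy>
    <candy>
        <productName company="ABC">Mints</productName>
        <!-- invalid: xsd:sequence fixes the order, so id must come before price -->
        <price>0.75</price>
        <id>634813</id>
    </candy>
</candystore>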

 
