How To Use an XML Parser Module

Parsing an XML Document Using DOM in VC++

eXtensible Markup Language (XML) is a widely used standard for exchanging information. XML allows users to define their own tags, which are then used to describe the data. For example, <author>Syed Feroz Zainvi</author>.

An XML document has a hierarchical structure. At the top is the document root, followed by other nodes, which may in turn contain further nodes. Attributes and data may be associated with any of these nodes.
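
To make the hierarchy concrete, here is a minimal sketch of how such a tree could be modelled in plain C++. The XmlNode type, the helper functions, and the sample <book> document are all hypothetical and used only for illustration; they are not part of MSXML.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an XML node; this is not an MSXML type.
struct XmlNode {
    std::string name;                               // tag name, e.g. "author"
    std::string text;                               // data directly inside the node
    std::map<std::string, std::string> attributes;  // attribute name -> value
    std::vector<XmlNode> children;                  // nested nodes, in document order
};

// Count this node plus all of its descendants.
std::size_t CountNodes(const XmlNode& node) {
    std::size_t total = 1;
    for (const XmlNode& child : node.children) {
        total += CountNodes(child);
    }
    return total;
}

// Return the first direct child with the given tag name, or nullptr if none.
const XmlNode* FindChild(const XmlNode& parent, const std::string& name) {
    for (const XmlNode& child : parent.children) {
        if (child.name == name) {
            return &child;
        }
    }
    return nullptr;
}

// Build the tree for this made-up sample document:
//   <book id="b1"><title>XML Basics</title><author>Syed Feroz Zainvi</author></book>
XmlNode MakeSampleBook() {
    XmlNode title;
    title.name = "title";
    title.text = "XML Basics";

    XmlNode author;
    author.name = "author";
    author.text = "Syed Feroz Zainvi";

    XmlNode book;                    // the document root
    book.name = "book";
    book.attributes["id"] = "b1";    // an attribute attached to the root
    book.children.push_back(title);
    book.children.push_back(author);
    assert(book.children.size() == 2);  // sanity check: root has two children
    return book;
}
```

Calling CountNodes(MakeSampleBook()) visits the root and its two children, and FindChild(book, "author") returns the node that carries the author's name as its data.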

XML parsers are tools used to read from or write to an XML document. These parsers follow either the DOM (Document Object Model) or the SAX (Simple API for XML) approach. Comparisons of the two approaches can be found in many books and on websites.
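
As a rough illustration of the difference, the sketch below computes the maximum nesting depth of a document in both styles. In the SAX style the parser pushes start and end events to a handler and no tree is kept; in the DOM style the whole tree is in memory and can be walked at will. DepthHandler, DomNode, and MaxDepth are hypothetical names invented for this example, not MSXML or SAX library types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// SAX style: the parser would call these methods while reading the input.
// Only a running counter is stored, never the document itself.
class DepthHandler {
public:
    void StartElement(const std::string& /*name*/) {
        ++current_;
        if (current_ > max_) {
            max_ = current_;
        }
    }
    void EndElement(const std::string& /*name*/) {
        assert(current_ > 0);  // an end tag must follow a matching start tag
        --current_;
    }
    std::size_t MaxDepth() const { return max_; }

private:
    std::size_t current_ = 0;  // depth of the element currently open
    std::size_t max_ = 0;      // deepest level seen so far
};

// DOM style: the complete tree is held in memory.
struct DomNode {
    std::string name;
    std::vector<DomNode> children;
};

// Walk the in-memory tree recursively; a leaf has depth 1.
std::size_t MaxDepth(const DomNode& node) {
    std::size_t deepest = 0;
    for (const DomNode& child : node.children) {
        const std::size_t d = MaxDepth(child);
        if (d > deepest) {
            deepest = d;
        }
    }
    return 1 + deepest;
}
```

The handler needs only constant memory but sees each element once, in order. The tree version costs memory proportional to the document but allows repeated and random access, which is what the rest of this article relies on.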

In this article, you will learn how to read and extract information from an XML document in a VC++ application. It is assumed that you have basic experience of using Microsoft Visual Studio 6.0 (MSIDE) to create VC++ applications.

  1. Download the latest MSXML SDK from http://www.microsoft.com/. Double-click the installer to install the downloaded SDK. The installation creates a folder whose name starts with MSXML at the location you choose, such as C:\ or C:\Program Files. This folder contains the sub-folders inc and lib, which hold the header files and libraries required for application development. In addition, msxmlX.dll and msxmlXr.dll, where X is the version number (the latest being 6), are copied to the C:\Windows\System32 directory.
  2. Create a new project using MSIDE. It can be of any type: MFC Application, Win32 Console app, etc. Note that the code below uses CString and AfxMessageBox, so the project needs MFC support.
  3. Next, create a header file and a source file to hold your XML parsing code.
  4. Include the MSXML headers and libraries either manually or automatically. To do it manually, add the installation path and the libraries under the respective tabs of the dialog opened from the Project Settings menu of MSIDE. It is simpler to do it automatically by adding the following two lines to your header file:
    #import <msxml6.dll> named_guids
    using namespace MSXML2;
  5. Also, put the following declarations in your header file:
    IXMLDOMDocumentPtr m_plDomDocument;
    IXMLDOMElementPtr m_pDocRoot;

    void DisplayChildren(IXMLDOMNodePtr pParent);
    void DisplayChild(IXMLDOMNodePtr pChild);
    void DisplayAttributes(IXMLDOMNodePtr pChild,string parent,string &apname);
    void InitialiseXMLCOM(CString strFileName);

    bool loaded;

  6. Since std::string is used, also add the following lines to your header file:
    #include <string>
    using namespace std;
  7. With the project set up, write the code that loads an XML document. MSXML is accessed through Microsoft COM (Component Object Model). Each COM call below is preceded by a short comment; interested readers can consult the COM documentation for details. The statements use the file name passed in as strFileName, so they belong in InitialiseXMLCOM.
    loaded = true;

    // initialize the COM library for this thread
    ::CoInitialize(NULL);

    // create an instance of the XML document
    HRESULT hr = m_plDomDocument.CreateInstance(L"Msxml2.DOMDocument.6.0");

    // if instance creation failed, display the error and abort
    if (FAILED(hr))
    {
        _com_error er(hr);
        AfxMessageBox(er.ErrorMessage());
        loaded = false;
        EndDialog(1);   // only valid when this code runs inside a CDialog-derived class
        return;         // do not touch the document pointer after a failure
    }

    // load synchronously so that the document is fully parsed when load returns
    m_plDomDocument->async = VARIANT_FALSE;

    // convert the XML file name to something COM can handle (BSTR);
    // constructing _bstr_t directly avoids leaking the BSTR that AllocSysString would allocate
    _bstr_t bstrFileName(static_cast<LPCTSTR>(strFileName));

    // call the IXMLDOMDocumentPtr's load function to load the XML document
    VARIANT_BOOL bLoaded = m_plDomDocument->load(bstrFileName);

    // if loading succeeded, get the root element of the document
    if (bLoaded == VARIANT_TRUE) // success!
    {
        // now that the document is loaded, initialize the root pointer
        m_pDocRoot = m_plDomDocument->documentElement;

        // recursively traverse the XML tree, visiting child and sibling nodes
        DisplayChildren(m_pDocRoot);
    }
    else
    {
        loaded = false;
        AfxMessageBox(_T("Document FAILED to load!"));
    }


  8. Now it's time to traverse the tree contained in the XML document. The following functions do this recursively.
    void DisplayChildren(IXMLDOMNodePtr pParent)
    {
        // process the current node
        DisplayChild(pParent);

        // simple for loop to visit all children
        for (IXMLDOMNodePtr pChild = pParent->firstChild;
             NULL != pChild;
             pChild = pChild->nextSibling)
        {
            // for each child, call this function so that we get
            // its children as well
            DisplayChildren(pChild);
        }
    }

    void DisplayChild(IXMLDOMNodePtr pChild)
    {
        IXMLDOMElementPtr pElem1;
        string cIDs, apname;

        // check that it is an element node; the second condition shows how to check the node's name
        if (NODE_ELEMENT == pChild->nodeType && (pChild->nodeName) == _bstr_t("Somename"))
        {
            // obtain the element interface, which provides getAttribute
            HRESULT hr1 = pChild->QueryInterface(MSXML2::IID_IXMLDOMElement, (void **)&pElem1);
            if (FAILED(hr1)) { _com_raise_error(hr1); }

            // getAttribute returns a VARIANT; convert it to std::string via _bstr_t
            cIDs = static_cast<string>((_bstr_t)(pElem1->getAttribute(_T("name"))));

            // process all of its children and extract their attributes
            for (IXMLDOMNodePtr pChild1 = pChild->firstChild;
                 NULL != pChild1;
                 pChild1 = pChild1->nextSibling)
            {
                DisplayAttributes(pChild1, cIDs, apname);
            }
        }
    }

  9. For each node, extract its attributes. The following function uses the same calls as the function above.
    void DisplayAttributes(IXMLDOMNodePtr pChild, string parent, string &apname)
    {
        IXMLDOMElementPtr pElem;

        if (NODE_ELEMENT == pChild->nodeType && (pChild->nodeName == _bstr_t("somename1") || pChild->nodeName == _bstr_t("attribute")))
        {
            HRESULT hr1 = pChild->QueryInterface(MSXML2::IID_IXMLDOMElement, (void **)&pElem);
            if (FAILED(hr1)) { _com_raise_error(hr1); }

            // return the value of the "name" attribute through the reference parameter
            apname = static_cast<string>((_bstr_t)(pElem->getAttribute(_T("name"))));
        }
        else
        {
            // handle other cases here, for example nodes with a different name
        }
    }

  10. Interested readers can go on to explore creating a DOM tree dynamically, querying an XML document with XPath, and editing and saving an XML document.
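
The recursive walk from steps 8 and 9 can also be tried without MSXML or COM. The sketch below is a portable approximation under stated assumptions: MockNode is a hypothetical stand-in that models only the nodeName, firstChild, and nextSibling members used in this article, and AppendChild and CollectNames are helper names invented for the example. It collects the "name" attribute of every element with a given tag, in document order.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for IXMLDOMNode; only the members used in the
// traversal are modelled. This is not the MSXML interface.
struct MockNode {
    std::string nodeName;
    std::map<std::string, std::string> attributes;
    std::unique_ptr<MockNode> firstChild;   // first nested node, if any
    std::unique_ptr<MockNode> nextSibling;  // next node at the same level, if any
};

// Append a new child with the given tag name at the end of the parent's
// child chain and return a non-owning pointer to it.
MockNode* AppendChild(MockNode& parent, const std::string& name) {
    std::unique_ptr<MockNode>* slot = &parent.firstChild;
    while (*slot) {
        slot = &(*slot)->nextSibling;
    }
    slot->reset(new MockNode);
    (*slot)->nodeName = name;
    return slot->get();
}

// Depth-first walk, same shape as DisplayChildren: handle the current node,
// then follow firstChild and nextSibling to reach every descendant.
void CollectNames(const MockNode* node, const std::string& wantedTag,
                  std::vector<std::string>& out) {
    assert(node != nullptr);
    if (node->nodeName == wantedTag) {
        // Elements without a "name" attribute are skipped.
        auto it = node->attributes.find("name");
        if (it != node->attributes.end()) {
            out.push_back(it->second);
        }
    }
    for (const MockNode* child = node->firstChild.get();
         child != nullptr;
         child = child->nextSibling.get()) {
        CollectNames(child, wantedTag, out);
    }
}
```

To use it, build a small tree with AppendChild, set attributes on the returned nodes, and call CollectNames with the root, the tag to look for, and an empty vector to receive the values. The same loop structure carries over to the MSXML smart pointers, where the comparison against nullptr becomes the NULL check shown in step 8.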

 
