Introduction to LBXML Operator, A C# API-Based Tool for XML Insertion, Modification, Searching, and Removal

Environment: XML

Overview

LBXML Operator is a C# API-based XML tool that supports insertion, modification, searching, and removal on XML files. Many XML parsers and XML tools exist today. However, none of them provides APIs that cover all four operations once the manipulation conditions become complex; their APIs support only simple operations on XML. With LBXML Operator, after conditions are specified using C# data structures, any particular value between tags in an XML file can be reached and returned in C# data structures.

Application Situations of LBXML Operator

XML is a standard, platform-independent data description language. With the development of the Internet and distributed technologies, XML has become increasingly important in data representation, automatic data exchange over heterogeneous platforms, data modeling, search engines, and even data storage. When developers use XML for this work, they need to access the values between tags. XML is usually used to describe complex data structures, so developers want to manipulate it flexibly. LBXML Operator is designed for these situations.

Representing Data in XML

Before XML, developers had to represent data in plain text. To describe the data, they defined their own formats using special symbols such as “|” and “#”. Such a format is not standard and can be understood only by the developers who defined it. Without proper comments, even those developers forget what the format means after some time. With XML, developers can leave this problem behind because they can define meaningful tags and put the corresponding values between them. By using XML's hierarchical structures, they can represent data in a human-readable format that carries much more meaning.

However, developers then need to access those XML files from their programs. For example, if a configuration file is written in XML, the program has to read initial data from it at startup. LBXML Operator can retrieve that data from the XML configuration file. Some information in the configuration file may also need to be updated while the program runs, and LBXML Operator can modify the corresponding data.

Data Storage

Another exciting feature that XML brings to developers is that they can store data in XML temporarily or even permanently. A program often needs a large amount of initial data to start up, or generates a great deal of output data while it runs. Storing that data in XML files temporarily is a good solution because it reduces the number of database accesses. Sometimes, if system capacity allows or the amount of data is not very large, it is feasible to store the data in XML files instead of a database.

Therefore, it is necessary to manipulate those XML files just like accessing databases. Fortunately, LBXML Operator provides C# APIs, which are similar to SQL statements, for developers to access data in those XML files.

Automatic Data Exchanging

The most important utilization of XML is to implement data exchanging over heterogeneous platforms because XML is independent of platforms. However, data in the exchanged XML files must be processed. If this procedure is automatic, it is really beneficial to the procedure of data exchanging. Therefore, it is necessary to design a program that can access XML files flexibly and powerfully. Because LBXML Operator is a bunch of C# APIs, developers are able to program with those APIs to design an interface that is capable of processing exchanging data in XML.

Search Engine

More and more data on the Internet is described in XML instead of HTML because XML can express much richer information than HTML. Retrieving that data efficiently and effectively is therefore an issue for developers who design Internet search engines. LBXML Operator provides APIs that can reach any value between any tags in XML files, so it can help developers build such a search engine.

Data Modeling

Another important utilization of XML is to describe data models. For example, in the area of E-Commerce/Business, developers and scientists describe business process models in XML. In the area of workflow management, developers describe workflows in XML. In short, in almost all kinds of areas, data models can be represented in XML.

Usually, those data models are used to drive a system. Without a tool such as LBXML Operator, accessing those data models at run time would be difficult. The APIs of LBXML Operator provide runtime support for developers to access data models in XML.

Existing Java-Based XML Tools

There are some similar existing Java-based XML tools. But none of them supports all the manipulations, i.e., insertion, modification, searching, and removal. The major difference between LBXML Operator and those tools is the approach to specify manipulation conditions. Those existing tools define such conditions using XQuery, XPath, or their own script languages. However, LBXML Operator depicts conditions through APIs, i.e., a particular API handles a specific case to manipulate XML files.

One major advantage of specifying manipulation conditions with XQuery, XPath, or script languages is that those descriptions have standard specifications and cover all the situations in XML searching, including basic queries, range queries, Max/Min, and so on. The searching of LBXML Operator includes basic queries only. Furthermore, those tools search XML based on not only tags but also attributes, whereas the latest version of LBXML Operator searches on tags only.

However, the major disadvantage of the existing tools is that each of them covers only one type of manipulation. Some tools are used for searching and some are used for modification. LBXML Operator supports all the fundamental operations: insertion, modification, searching, and removal. Another disadvantage is that they regard XML files as tagged plain text files and therefore do not utilize an important feature of XML, its hierarchy. LBXML Operator views XML files as structured plain text files. An XML file is organized as a series of similarly structured pieces of text, each of which is similar to a record in a relational database, and each XML file has a unique key or multiple keys. Thus, manipulating XML files with LBXML Operator is equivalent to manipulating records in a database table.

None of the available XML tools is convenient for developers who program against XML. First, developers have to write scripts that are separate from their code, which costs extra effort. LBXML Operator instead uses C#’s rich data structures to specify operation conditions, so developers need nothing beyond C# code. Second, the existing tools do not return the results of manipulations in C# data structures. Most tools produce an XML document, which is not convenient for developers to use. Results of LBXML Operator are put into C# data structures, such as Hashtable, HashSet, and String, which are easy to program with.

Overall Approach of LBXML Operator

LBXML Operator is implemented for C# programmers because XML is applied widely in the current Internet world. XML is becoming the standard Internet data description language. There are a great number of C# programmers. Because of the feature of platform independence, C# is one of the first choices for Internet programmers. Therefore, LBXML Operator is a good candidate for those Web developers to program with C# and XML.

Features of LBXML Operator

LBXML Operator has two important features. First, LBXML Operator regards XML files as hierarchical structure-based plain text files. Each XML file consists of a series of similar structure sub-XML files and has its own keys. This architecture looks close to that of relational databases. Based on this understanding, LBXML Operator provides a bunch of APIs that manipulate XML just like SQL statements for tables in databases.

The second important feature is that the goal of LBXML Operator is to provide convenient XML tools for programmers. This tool should be compatible with a particular programming language—C#. Thus, programmers can specify operation conditions through C# data structures, manipulate XML with C# APIs, and get results in C# data structures.

Concepts Used in LBXML Operator

Several terms (tag and value, updating tag, key tag, parent tag, sibling tag, and ParentSibling tag) are defined for processing XML files with LBXML Operator. These concepts are used to describe the conditions that determine which tag’s value is to be accessed in an XML file. They work like navigators that help LBXML Operator find the tag and then operate on it.

Tags and Their Values

When performing a certain operation on an XML file, a tag and its value are two issues that are always taken into account. In an XML file, some tags have attached values and some tags do not have any values. By default, all the operations provided by LBXML Operator are concentrated only on tags that have attached values. LBXML Operator regards the combination of a tag and its value as a basic node or unit of XML, which is different from the view of XML parsers. From the view of XML parsers, a tag and its value are looked at as two separate nodes.
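The difference between the two views can be illustrated with a short sketch. It does not use LBXML Operator itself; it relies on .NET's LINQ to XML as a stand-in parser, and the class and method names (TagValueSketch, TagValuePairs, ParserNodeCount) are hypothetical names chosen for this illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

// Illustrative only: not part of LBXML Operator.
public static class TagValueSketch
{
    // The "tag plus value as one unit" view: one (tag, value) pair for every
    // element that has no child elements and holds non-empty text.
    public static (string Tag, string Value)[] TagValuePairs(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants()
                  .Where(e => !e.HasElements && !string.IsNullOrWhiteSpace(e.Value))
                  .Select(e => (Tag: e.Name.LocalName, Value: e.Value.Trim()))
                  .ToArray();
    }

    // The parser view: the element is one node and each of its direct
    // child nodes (here, the text) is counted as a separate node.
    public static int ParserNodeCount(string xml)
    {
        XElement element = XElement.Parse(xml);
        return 1 + element.Nodes().Count();
    }
}
```

For the fragment <Version>2.10</Version>, ParserNodeCount counts the element and its text as separate nodes, while TagValuePairs reports the single pair (Version, 2.10). Tags without values, such as containers or empty tags, are skipped, which mirrors the default described above.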

Updating Tag

An updating tag is the tag whose value is accessed by an LBXML Operator operation, that is, by insertion, modification, searching, or removal.

Key Tag

This concept is borrowed from the domain of a relational database. In each table of a relational database, there is at least one field that behaves as a key. SQL statements identify each row in a table based on the key field. The key is the field through which a row in the table is different from others. Either a unique field or multiple fields can form a key for a table.

Similarly, a particular tag can be defined as a key tag for a specific structure in an XML file. In general, an XML file represents information in hierarchical structures, and each XML file is organized as multiple similar structures. Each structure usually has one or more tags whose values distinguish it from the others. Such tags are called key tags for that structure.

For example, in Listing 1, the Version tag is the key tag for the entire XML file. The Organization tag is the key tag for the structure between <SAT> and </SAT>. For the structure between <Form> and </Form>, the key tag is <Request>. The Organization, User, and Workflow tags are all key tags for the structure between <SAT> and </SAT> because their values distinguish one such structure from the other structures between <SAT> and </SAT>.

By this definition, each key tag has a value. A tag without a value is not a key tag in LBXML Operator.

Parent Tag

Because an XML file can be interpreted as a tree (DOM) structure, each tag resides in a node of the tree and therefore has a level. If tag A is one level higher than tag B and contains it, tag A is the parent tag of tag B. For example, in Listing 1, the NewSAT tag is the parent tag of the SAT tag. The Form tag is the parent tag of the Request tag.

Sibling Tag

Similar to the concept of a parent tag, if tag A resides in the same level as tag B, tag A is the sibling tag of tag B. For example, in Listing 1, the Request tag is the sibling tag of the Response tag. Two sibling tags can have the same name. For example, in Listing 1, between the <Vocabulary> and </Vocabulary> tags, there are two VocabularyName tags. Because the two reside in the same level of the DOM tree, they are sibling tags of each other.

ParentSibling Tag

If tag A is the parent tag of tag B and tag C is the sibling tag of tag A, tag C is the ParentSibling tag of tag B. In Listing 1, the Vocabulary tag is the parent of the VocabularyName tag and the Organization tag is the sibling of the Vocabulary tag, so the Organization tag is the ParentSibling tag of the VocabularyName tag.
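The three relationships can be written down compactly against a DOM-like tree. The following is a minimal sketch using .NET's LINQ to XML rather than LBXML Operator; the class and method names are hypothetical, and it assumes that sibling tags share the same parent, as in the examples from Listing 1.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Illustrative only: not part of LBXML Operator.
public static class TagRelationSketch
{
    // Parent tag: the name of the element one level above, or null for the root.
    public static string ParentTag(XElement tag) => tag.Parent?.Name.LocalName;

    // Sibling tags: the other elements under the same parent (assumption:
    // "same level" means "same parent"). Names may repeat.
    public static List<string> SiblingTags(XElement tag) =>
        tag.Parent == null
            ? new List<string>()
            : tag.Parent.Elements()
                 .Where(e => e != tag)
                 .Select(e => e.Name.LocalName)
                 .ToList();

    // ParentSibling tags: the sibling tags of the parent tag.
    public static List<string> ParentSiblingTags(XElement tag) =>
        tag.Parent == null ? new List<string>() : SiblingTags(tag.Parent);
}
```

Applied to the first VocabularyName element of a <SAT> structure from Listing 1, ParentTag gives Vocabulary, SiblingTags contains the second VocabularyName, and ParentSiblingTags contains Organization, which matches the definitions above.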

Limitations of LBXML Operator

LBXML Operator does not support XML manipulations based on attributes. A future version should add this feature because attribute values are also important for representing data. In our experience, all the attributes of XML tags can be converted to tags, and XML without attributes looks neater. So, we suggest that XML developers use tags instead of attributes.
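As an illustration of converting attributes to tags, the sketch below rewrites each attribute as a child element with the same name. It is a hypothetical helper written with .NET's LINQ to XML, not a feature of LBXML Operator. It suits attributes on container tags; an attribute on a tag that already holds a text value would produce mixed content and needs a different layout.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Illustrative only: not part of LBXML Operator.
public static class AttributeToTagSketch
{
    // Returns a copy of the tree in which every attribute has become a
    // child element placed before the existing children.
    public static XElement AttributesToTags(XElement root)
    {
        XElement copy = new XElement(root);   // deep copy; the input is left untouched
        foreach (XElement element in copy.DescendantsAndSelf().ToList())
        {
            // Namespace declarations are kept as they are.
            List<XAttribute> attributes = element.Attributes()
                .Where(a => !a.IsNamespaceDeclaration)
                .ToList();
            List<XElement> asTags = attributes
                .Select(a => new XElement(a.Name, a.Value))
                .ToList();

            element.AddFirst(asTags);
            foreach (XAttribute attribute in attributes)
            {
                attribute.Remove();
            }
        }
        return copy;
    }
}
```

For example, <SAT id="1"><User>guest</User></SAT> becomes <SAT><id>1</id><User>guest</User></SAT>, after which the id value can be reached through tag-based operations.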

Another issue is resource usage when LBXML Operator runs. Because the entire XML file is loaded into memory while it is manipulated, a large XML file costs a large amount of memory. A possible improvement is to load a large XML file into memory part by part and combine the results after the whole file is processed.
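One way the part-by-part idea could look in .NET is sketched below. This is not how LBXML Operator works today; it is an assumption-laden illustration that uses XmlReader to stream the file and materializes one structure at a time. The class name StreamingSketch and the method ReadStructures are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Illustrative only: not part of LBXML Operator.
public static class StreamingSketch
{
    // Yields each <structureTag> element in document order. Only the
    // structure currently being yielded is held in memory.
    public static IEnumerable<XElement> ReadStructures(TextReader input, string structureTag)
    {
        // Ignore the DOCTYPE so that no external DTD has to be resolved.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == structureTag)
                {
                    // ReadFrom consumes the whole element and leaves the
                    // reader on the node that follows it.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }
}
```

Calling ReadStructures with a reader over the file in Listing 1 and the tag name SAT would hand back the two <SAT> structures one after another, so a search could inspect each structure and collect the results without loading the full document.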

LBXML Operator provides powerful operations in searching and modification. Relatively, the operations of insertion and removal are limited. In the future release, it is necessary to add more APIs for insertion and removal.

Case Study

This section shows two cases of manipulating XML with LBXML Operator. The first modifies the value between a pair of tags, and the second searches for a value between a pair of tags. Both require operation conditions to be specified in advance.

changeByKeyTagKeyValueSiblingTagUpdateTagNewValue()

For the XML file that follows, sometimes it is necessary to modify a tag’s value based on both the key tags and sibling tags. For example, users would like to modify the CreditRequestNo value of the VocabularyName tag.

<?xml version="1.0"?>
<!DOCTYPE RequirementInSAT SYSTEM "requirement_in_sat.dtd" >
<RequirementInSAT>
  <Version>2.10</Version>
  <NewSAT>
    <SAT>
      <Organization>BigBug.com</Organization>
      <User>customer</User>
      <Workflow>Customer-Retailer</Workflow>
      <Form>
        <Request>OrderForm</Request>
        <Response>ReceiptForm</Response>
      </Form>
      <Vocabulary>
        <VocabularyName>OrderedNumber</VocabularyName>
        <VocabularyName>InputCreditCardNumber</VocabularyName>
      </Vocabulary>
      <Contract>
        <ContractName>WholesaleContract</ContractName>
        <ContractName>CreditCheckingContract</ContractName>
      </Contract>
    </SAT>
  </NewSAT>
  <ExistingSAT>
    <SAT>
      <Organization>RequiredCreditChecking.com</Organization>
      <User>guest</User>
      <Workflow>Customer-Creditor</Workflow>
      <Form>
        <Request>CreditRequestForm</Request>
        <Response>CreditResponseForm</Response>
      </Form>
      <Vocabulary>
        <VocabularyName>OrderedNumber</VocabularyName>
        <VocabularyName>CreditRequestNo</VocabularyName>
      </Vocabulary>
      <Contract>
        <ContractName>CreditCheckingContract</ContractName>
      </Contract>
    </SAT>
  </ExistingSAT>
</RequirementInSAT>

Listing 1. The XML file used as an example in the article

The changeByKeyTagKeyValueSiblingTagUpdateTagNewValue() method is used to handle this problem. The format of the method follows.

bool changeByKeyTagKeyValueSiblingTagUpdateTagNewValue(
        String xmlFile, String keyTag, String keyValue,
        String siblingTag, String siblingValue,
        String updateTag, String newValue)

Listing 2. The signature of changeByKeyTagKeyValueSiblingTagUpdateTagNewValue()

For example, to change the CreditRequestNo value of the VocabularyName tag to CreditNo, the code is written as follows.

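// Assumes the XML file from Listing 1 is saved as ./xmlfile.xml.
LBXMLOperator lbxmlOperator = new LBXMLOperator();
// Within the structure keyed by Organization = RequiredCreditChecking.com,
// find the VocabularyName whose sibling VocabularyName is OrderedNumber
// and set its value to CreditNo.
lbxmlOperator.changeByKeyTagKeyValueSiblingTagUpdateTagNewValue(
    "./xmlfile.xml",
    "Organization", "RequiredCreditChecking.com",   // keyTag, keyValue
    "VocabularyName", "OrderedNumber",              // siblingTag, siblingValue
    "VocabularyName", "CreditNo");                  // updateTag, newValue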

Listing 3. Calling changeByKeyTagKeyValueSiblingTagUpdateTagNewValue() to change CreditRequestNo to CreditNo

After the operation, the changed XML file is shown as follows.

<?xml version="1.0"?>
<!DOCTYPE RequirementInSAT SYSTEM "requirement_in_sat.dtd" >
<RequirementInSAT>
  <Version>2.10</Version>
  <NewSAT>
    <SAT>
      <Organization>BigBug.com</Organization>
      <User>customer</User>
      <Workflow>Customer-Retailer</Workflow>
      <Form>
        <Request>OrderForm</Request>
        <Response>ReceiptForm</Response>
      </Form>
      <Vocabulary>
        <VocabularyName>OrderedNumber</VocabularyName>
        <VocabularyName>InputCreditCardNumber</VocabularyName>
      </Vocabulary>
      <Contract>
        <ContractName>WholesaleContract</ContractName>
        <ContractName>CreditCheckingContract</ContractName>
      </Contract>
    </SAT>
  </NewSAT>
  <ExistingSAT>
    <SAT>
      <Organization>RequiredCreditChecking.com</Organization>
      <User>guest</User>
      <Workflow>Customer-Creditor</Workflow>
      <Form>
        <Request>CreditRequestForm</Request>
        <Response>CreditResponseForm</Response>
      </Form>
      <Vocabulary>
        <VocabularyName>OrderedNumber</VocabularyName>
        <VocabularyName>CreditNo</VocabularyName>
      </Vocabulary>
      <Contract>
        <ContractName>CreditCheckingContract</ContractName>
      </Contract>
    </SAT>
  </ExistingSAT>
</RequirementInSAT>

Listing 4. The XML file after the operation, changeByKeyTagKeyValueSiblingTagUpdateTagNewValue()
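To make the semantics of this operation concrete, here is a minimal sketch of one plausible reading of it, written with .NET's LINQ to XML. It is not the tool's implementation. The class and method names are hypothetical, the sketch works on an in-memory element instead of a file path, and it assumes that the structure belonging to a key tag is the key tag's parent.

```csharp
using System.Linq;
using System.Xml.Linq;

// Illustrative only: not the implementation used by LBXML Operator.
public static class ChangeSketch
{
    // Inside every structure whose key tag has keyValue, finds each
    // siblingTag element with siblingValue and sets the value of its
    // sibling elements named updateTag. Returns true if anything changed.
    public static bool ChangeByKeyAndSibling(
        XElement root, string keyTag, string keyValue,
        string siblingTag, string siblingValue,
        string updateTag, string newValue)
    {
        bool changed = false;

        // Assumption: a structure is the parent of a matching key tag.
        var structures = root.Descendants(keyTag)
                             .Where(k => k.Value == keyValue && k.Parent != null)
                             .Select(k => k.Parent)
                             .ToList();

        foreach (XElement structure in structures)
        {
            var anchors = structure.Descendants(siblingTag)
                                   .Where(s => s.Value == siblingValue)
                                   .ToList();
            foreach (XElement anchor in anchors)
            {
                // The anchor itself is excluded; only its siblings are updated.
                var targets = anchor.ElementsBeforeSelf(updateTag)
                                    .Concat(anchor.ElementsAfterSelf(updateTag))
                                    .ToList();
                foreach (XElement target in targets)
                {
                    target.Value = newValue;
                    changed = true;
                }
            }
        }
        return changed;
    }
}
```

With the arguments from Listing 3, this sketch leaves the <SAT> structure of BigBug.com untouched and replaces CreditRequestNo with CreditNo in the structure of RequiredCreditChecking.com, which is the change shown in Listing 4.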

selectByMultipleTagsAndWhere()

selectByMultipleTagsAndWhere() is a powerful searching approach. By using this method, users can specify complex conditions to retrieve a value of a tag. The format of the method is shown as follows.

String selectByMultipleTagsAndWhere(String xmlFile,
                                    Hashtable keyTagHash,
                                    Hashtable keyValueHash)

Listing 5. The signature of selectByMultipleTagsAndWhere()

There are two Hashtables in the parameters of the method. The two Hashtables are used to store complex conditions to retrieve a tag’s value. The first one, keyTagHash, is used to store key tags and the second, keyValueHash, is used to store corresponding key values. With those constraints, the method is able to retrieve the value of a particular tag exactly.

To demonstrate the utilization of the method, an XML file is shown as follows.

<?xml version="1.0"?>
<!DOCTYPE RequirementInSAT SYSTEM "requirement_in_sat.dtd" >
<RequirementInSAT>
  <Version>2.10</Version>
  <NewSAT>
    <SAT>
      <Organization>BigBug.com</Organization>
      <User>customer</User>
      <Workflow>Customer-Retailer</Workflow>
      <Contract>
        <ContractName>WholesaleContract</ContractName>
        <ContractName>CreditCheckingContract</ContractName>
      </Contract>
    </SAT>
  </NewSAT>
  <ExistingSAT>
    <SAT>
      <Organization>BigBug.com</Organization>
      <User>guest</User>
      <Workflow>Retailer-Wholesaler</Workflow>
      <Contract>
        <ContractName>CreditCheckingContract</ContractName>
      </Contract>
    </SAT>
    <SAT>
      <Organization>BigBug.com</Organization>
      <User>customer</User>
      <Workflow>Retailer-Wholesaler</Workflow>
      <Contract>
        <ContractName>CreditCheckingContract</ContractName>
      </Contract>
    </SAT>
  </ExistingSAT>
</RequirementInSAT>

Listing 6. A more complex XML file for searching example

For example, to retrieve the value of the ContractName tag in the last <SAT> structure of Listing 6 (Organization BigBug.com, User customer, Workflow Retailer-Wholesaler), selectByMultipleTagsAndWhere() can be used. The corresponding code is written as follows.

using System;
using System.Collections;

SelectorByMultipleTagsAndWhere selectorByMultipleTagsAndWhere =
        new SelectorByMultipleTagsAndWhere();
Hashtable keyTagHash = new Hashtable();
Hashtable keyValueHash = new Hashtable();
// Entries with the same index pair a key tag with its required value.
keyTagHash.Add("0", "Organization");
keyTagHash.Add("1", "User");
keyTagHash.Add("2", "Workflow");
keyTagHash.Add("3", "ContractName");
keyValueHash.Add("0", "BigBug.com");
keyValueHash.Add("1", "customer");
keyValueHash.Add("2", "Retailer-Wholesaler");
keyValueHash.Add("3", "?");   // "?" stands in for the value being searched for
String contractName =
       selectorByMultipleTagsAndWhere.selectByMultipleTagsAndWhere(
               "./xmlfile.xml", keyTagHash, keyValueHash);
Console.WriteLine("contractName = " + contractName);

Listing 7. Calling selectByMultipleTagsAndWhere() on the XML file in Listing 6

The result of the above code is displayed as follows.

contractName = CreditCheckingContract

Listing 8. The output of the code in Listing 7
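The matching logic behind such a multi-condition search can be sketched as follows. This is an illustration built on .NET's LINQ to XML, not the code inside LBXML Operator. The names SelectSketch and SelectByMultipleTags are hypothetical, the conditions are passed as a single dictionary instead of two Hashtables, and a structure is assumed to match when its direct child tags satisfy every condition.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Illustrative only: not the implementation used by LBXML Operator.
public static class SelectSketch
{
    // Finds the first structure whose direct child tags carry all the
    // required values, then returns the value of the first searchTag
    // below that structure, or null when nothing matches.
    public static string SelectByMultipleTags(
        XElement root, IDictionary<string, string> conditions, string searchTag)
    {
        XElement structure = root.DescendantsAndSelf().FirstOrDefault(candidate =>
            conditions.All(c =>
                candidate.Elements(c.Key).Any(e => e.Value == c.Value)));

        return structure?.Descendants(searchTag).FirstOrDefault()?.Value;
    }
}
```

With the conditions Organization = BigBug.com, User = customer, and Workflow = Retailer-Wholesaler, only the last <SAT> structure of Listing 6 qualifies: the first one fails on Workflow and the second one fails on User. Checking direct children matters here, because the enclosing <ExistingSAT> tag contains all three values spread over different structures.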

Appendix

  • Tool: LBXML Operator
  • DLL: com.lblabs.xmltool.dll, com.lblabs.tools.csharp.dll
  • Class: CSharpLBXMLOperator
  • Constructor Summary: LBXMLOperator()

Method Summary

Method Description
bool insertNodeByParentTag(String xmlFile, String parentTag, String updateTag, String updateValue)
bool insertNodeParentTagAndParentSibling(String xmlFile, String parentTag, String parentSiblingTag, String parentSiblingValue, String updateTag, String updateValue)
bool removeNodeByTagValue(String xmlFile, String tag, String value)
bool changeByValue(String xmlFile, String oldValue, String newValue)
bool changeByTag(String xmlFile, String tag, String newValue)
bool changeByTagNewValue(String xmlFile, String tag, String oldValue, String newValue)
bool changeByNoTagNewValue(String xmlFile, String tag, int no, String newValue)
bool changeBySiblingTagNewValue(String xmlFile, String siblingTag, String siblingValue, String newValue)
bool changeBySiblingTagUpdateTagNewValue(String xmlFile, String siblingTag, String siblingValue, String updateTag, String newValue)
bool changeByKeyTagKeyValueUpdateTagNewValue(String xmlFile, String keyTag, String keyValue, String updateTag, String newValue)
bool changeByKeyTagKeyValueSiblingTagUpdateTagNewValue(String xmlFile, String keyTag, String keyValue, String siblingTag, String siblingValue, String updateTag, String newValue)
bool changeByMultipleTagsAndWhere(String xmlFile, Hashtable keyTagHash, Hashtable keyValueHash)
String selectByKeyTag(String xmlFile, String keyTag)
String selectByTagAndWhere(String xmlFile, String keyTag, String searchTag, String keyValue)
String selectByBelowTagAndWhere(String xmlFile, String belowKeyTag, String searchTag, String belowKeyValue)
Hashtable selectHash(String xmlFile, String hashTag)
Hashtable selectSet(String xmlFile, String setTag)
Hashtable selectByTagAndWhereForHash(String xmlFile, String keyTag, String searchTag, String keyValue)
Hashtable selectByTagAndWhereForSet(String xmlFile, String keyTag, String searchTag, String keyValue)
String selectByMultipleTagsAndWhere(String xmlFile, Hashtable keyHash, Hashtable keyValueHash)
Hashtable selectByMultipleTagsAndWhereForHash(String xmlFile, Hashtable keyHash, Hashtable keyValueHash)
Hashtable selectByMultipleTagsAndWhereForSet(String xmlFile, Hashtable keyHash, Hashtable keyValueHash)

References

  1. XPATH: http://www.w3.org/TR/xpath
  2. GMD-IPSI XQL: http://xml.darmstadt.gmd.de/xql/
  3. XSet: http://www.cs.berkeley.edu/~ravenben/xset/
  4. Fxgrep: http://www.informatik.uni-trier.de/~aberlea/Fxgrep/
  5. Quip: http://developer.softwareag.com/tamino/quip/
  6. XML:QL: http://theoryx5.uwinnipeg.ca/mod_perl/cpan-search?dist=XML-QL

Downloads


Download source code – 487 Kb
