In less than three years, XML has gained the acceptance and support of virtually every type
of organization that uses, sells or supports Information Technology (IT).
What is XML, and what is the perceived value that has caused it to move so quickly from the
standards drafting committees of the World Wide Web Consortium (W3C) to the headlines of
every IT-related publication? What path must organizations follow, and what challenges must
they overcome, before they can harvest meaningful value from XML? This document attempts to
answer these questions.

XML is the acronym for eXtensible Markup Language. It is a customized subset of the
Standard Generalized Markup Language (SGML), from which HTML (HyperText Markup Language) was
also derived. Markup languages evolved from the process that layout designers used to pass
information to typesetters. These designers wrote notations ("tags") on typewritten text
to tell the typesetter how to lay out the copy. As electronic publishing systems such as
word processing and desktop publishing software evolved, they also used a type of internal
markup language that is embedded in a document and describes how the document should be
laid out. This is called procedural markup: it specifies how the information is presented.
Evolution of SGML and HTML
The SGML standard was first published in 1986 and included not only procedural markup
standards but also descriptive markup standards. Descriptive markup describes the purpose
of the information in a document, whereas procedural markup describes the appearance of
that information. SGML has many positive attributes, but it was essentially too complicated
and too versatile for browser developers to support, and it was not endorsed as a standard
on the Web. This precipitated the creation of HTML.
HTML is a small subset of the SGML markup language. It provides a standard method (using
markup "tags") for delivering document information over the World Wide Web (WWW).
Because there are standard tags, different browsers like Netscape Navigator and Microsoft
Internet Explorer can process information from a web site and display this information
consistently on a local desktop PC.
HTML has become extremely popular because it is very simple, provides a standard yet
powerful way to display text and graphics, and supports flexible, easy linking to other
information on the same or other web sites. HTML has been instrumental in the explosive
growth of the Internet. But HTML has serious limitations when something more sophisticated,
such as the exchange of data, is required. HTML has no descriptive markup features. Because
it focuses on formatting and linking, HTML actually strips away most of the meaning
(semantics) of any data passed to it.
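The difference between procedural and descriptive markup can be illustrated with a minimal Python sketch that uses only the standard library's `xml.etree.ElementTree`. It parses the same hypothetical bond quote marked up both ways; the tag names (`BondQuote`, `Issuer`, `Price`, `InterestRate`) and the values are invented for illustration. The HTML fragment is written as well-formed XHTML so that one parser can read both.

```python
import xml.etree.ElementTree as ET

# Procedural markup: the tags say how to display the text, not what it means.
# Written as well-formed XHTML so the same parser can read it.
html_snippet = "<p>XYZ Corp bond: <b>101.25</b>, <i>6.5</i>% interest</p>"

# Descriptive markup: hypothetical tags name the meaning of each value.
xml_snippet = (
    "<BondQuote>"
    "<Issuer>XYZ Corp</Issuer>"
    "<Price>101.25</Price>"
    "<InterestRate>6.5</InterestRate>"
    "</BondQuote>"
)

html_root = ET.fromstring(html_snippet)
xml_root = ET.fromstring(xml_snippet)

# From the HTML, a program can only ask for "the bold text".
bold_text = html_root.find("b").text

# From the XML, a program can ask for the price and the rate by name.
price = float(xml_root.findtext("Price"))
rate = float(xml_root.findtext("InterestRate"))

print("HTML bold text:", bold_text)
print("XML price:", price, "rate:", rate)
```

A program reading the HTML can find the bold number but cannot tell that it is a price; a program reading the XML can ask for the price by name.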
XML to the Rescue!
XML was created and defined by the World Wide Web Consortium to address many of the
inadequacies of HTML. It is a meta-markup language with descriptive markup capabilities,
providing a format for describing structured data using markup tags. XML retains a great
many of the features of SGML, except in the area of document creation. Its great strength
is that it is much simpler than the full SGML specification, yet fully capable when it
comes to document and data delivery. XML discloses the semantics of data content more
precisely, and this precision enables accurate data interchange and more meaningful search
results across multiple platforms. XML is triggering a new generation of Web-based
applications oriented toward data viewing and data manipulation.
Extensible
HTML uses a fixed set of tags, whereas in XML an unlimited set of tags can be defined.
While XML provides an environment for tagging structured data, HTML tags can only be used
for functions such as displaying a word in italics or bold or dictating a certain font
size. The restricted set of tags accounts for many of HTML's limitations and is a main
factor in the evolution of XML. An XML tag can declare its associated data to be a bond
price, an interest rate, an expiration date, today's gain or loss, or any other data
element. As industry-standardized XML tags are embraced, there will be a corresponding
capability for searching, sharing and manipulating data regardless of where it is located.
Once data has been found, it can be presented by a browser in a variety of ways, or it can
be passed to other applications for further processing. This is leading to true
Internet-enabled applications.
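A minimal Python sketch of searching and sharing data by tag, under stated assumptions: two hypothetical sources publish bond quotes using the same agreed tags (`Bond`, `Issuer`, `Price`, `DailyChange`, all invented here), so one search routine can collect data from both and pass it to another step for processing.

```python
import xml.etree.ElementTree as ET

# Two documents from different (hypothetical) sources that share an agreed tag set.
SOURCE_A = """
<Quotes>
  <Bond><Issuer>Alpha Inc</Issuer><Price>98.50</Price><DailyChange>-0.25</DailyChange></Bond>
  <Bond><Issuer>Beta Co</Issuer><Price>102.00</Price><DailyChange>0.75</DailyChange></Bond>
</Quotes>
"""
SOURCE_B = """
<Quotes>
  <Bond><Issuer>Gamma Ltd</Issuer><Price>100.10</Price><DailyChange>0.10</DailyChange></Bond>
</Quotes>
"""

def bonds_from(xml_text):
    """Yield (issuer, price, daily change) for every Bond element in a document."""
    root = ET.fromstring(xml_text.strip())
    for bond in root.iter("Bond"):
        yield (
            bond.findtext("Issuer"),
            float(bond.findtext("Price")),
            float(bond.findtext("DailyChange")),
        )

# Because both sources use the same tags, one search routine works on both.
all_bonds = list(bonds_from(SOURCE_A)) + list(bonds_from(SOURCE_B))

# Hand the extracted data to another step: here, a filter for today's gainers.
gainers = [issuer for issuer, _price, change in all_bonds if change > 0]
print(gainers)
```

The search code depends only on the shared tag names, not on where each document came from.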

Application integration, both intra- and inter-business, is the number one priority for
virtually every IT organization as we enter the twenty-first century. Application
integration includes such activities as combining and correlating previously freestanding
data, uniting previously disjointed steps, and replacing manual processes with automation.
Initiatives such as ERP package implementations, totally electronic end-to-end equity and
derivative trade processing ("Straight-Through-Processing") and electronic supply-chain
management in the IT hardware industry are examples of the sophisticated application
integration under way today.
"The greatest challenge to IT in the large enterprise is finding better and simpler ways of
making application systems work together more effectively," says Roy Schulte of the
Gartner Group. XML answers this challenge by providing standard frameworks for exchanging
different types of data, so that the information, whether it travels in a transaction, via
an Application Program Interface (API), through web automation, or in a database portal,
catalog, workflow document or message, can be searched, decoded, manipulated and displayed
consistently and correctly.
Middleware is the Glue, XML is the Foundation
While middleware technologies provide the "glue", the foundation of
application integration is data interchange. One of the primary barriers to achieving
successful data interchange has been removed with the global acceptance of the Internet
(and, internally, an Intranet) as a secure and reliable communication link over which data
can be exchanged.
XML removes a second major barrier by ensuring that structured data will be uniform and
independent of applications, platforms or vendors, and that it is optimized for delivery
over the Internet. The resulting ability to communicate across myriad software and
hardware platforms is encouraging a whole new generation of business and electronic-commerce
"super" applications that take advantage of reliable, secure and effective interchange of
data between applications. Corporations that are able to develop these super-applications
view them as a strategic competitive advantage.
The Basics Look Good, Too
XML provides a standard that can be used for tagging ordinary documents, structured records
(such as purchase orders, receivers or invoices) or data derived from a database query.
XML-tagged data gathered from many different sources can, once received, be manipulated,
edited and presented in a variety of ways without re-accessing the source(s) of the data.
Because fewer requests go back to the server, networks and servers become more responsive
and can handle larger workloads.
For example, assume
that you want to reserve an airline flight and the reservation system site sends you a
long list of flight information in XML format along with a program that will allow you to
manipulate the data you receive. Off-line from the server, you can manipulate the data and
look at it in various ways until you've made a decision about your flight. An HTML version
of this approach would require constant interaction with the server to achieve these same
results.
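A minimal Python sketch of the client side of this example, assuming a hypothetical flight-list format (the `Flight` element and its `number`, `depart`, `stops` and `fare` attributes are invented for illustration). The Python code stands in for the program the site would send; the document is parsed once, and every view after that is computed locally.

```python
import xml.etree.ElementTree as ET

# A hypothetical flight list, as a reservation site might send it, downloaded once.
FLIGHTS_XML = """<Flights>
  <Flight number="UA100" depart="08:15" stops="0" fare="420.00"/>
  <Flight number="AA215" depart="11:40" stops="1" fare="310.00"/>
  <Flight number="DL330" depart="17:05" stops="0" fare="385.00"/>
  <Flight number="NW442" depart="06:30" stops="2" fare="275.00"/>
</Flights>"""

def load_flights(xml_text):
    """Parse the flight list once into plain dictionaries."""
    root = ET.fromstring(xml_text)
    return [
        {
            "number": f.get("number"),
            "depart": f.get("depart"),
            "stops": int(f.get("stops")),
            "fare": float(f.get("fare")),
        }
        for f in root.findall("Flight")
    ]

flights = load_flights(FLIGHTS_XML)

# Every view below is computed locally; none of them contacts the server again.
cheapest_first = sorted(flights, key=lambda f: f["fare"])
nonstop_only = [f for f in flights if f["stops"] == 0]
# Zero-padded HH:MM strings compare correctly as text.
morning = [f for f in flights if f["depart"] < "12:00"]

print([f["number"] for f in cheapest_first])
print([f["number"] for f in nonstop_only])
print([f["number"] for f in morning])
```

An HTML page would typically need a new server request for each of these three views.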
Separation of Data from Presentation and Process
Because XML tags only the data and not how the data is to be presented or processed, it
allows a high degree of flexibility in how the information is used. XML defines data
content, whereas HTML specifies how a browser should display data. With XML, it becomes
possible to use physically separate style sheets, written with facilities such as the
Extensible Stylesheet Language (XSL) and Cascading Style Sheets (CSS), to present data in a
browser. These style sheets can be changed (personalized) independently of the XML data,
enabling, for example, the same XML document to be presented differently on a computer
screen and on a printer.
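A minimal Python sketch of the principle, using a hypothetical invoice document. Real systems would express presentation in XSL or CSS; here two plain Python functions stand in for two style sheets, to show only that one unchanged XML document can be rendered for a screen and for a printer.

```python
import html
import xml.etree.ElementTree as ET

# The data: one XML document with no presentation information in it (hypothetical).
INVOICE_XML = """<Invoice>
  <Number>10452</Number>
  <Customer>Acme Supply</Customer>
  <Total currency="USD">1250.00</Total>
</Invoice>"""

# Two stand-in "style sheets". Real systems would use XSL or CSS; these plain
# functions only show that presentation is kept outside the data.
def screen_style(invoice):
    """Render for a browser as a small HTML fragment (text is HTML-escaped)."""
    total = invoice.find("Total")
    return (
        f"<h2>Invoice {html.escape(invoice.findtext('Number'))}</h2>"
        f"<p>{html.escape(invoice.findtext('Customer'))}: "
        f"<b>{html.escape(total.get('currency'))} {html.escape(total.text)}</b></p>"
    )

def print_style(invoice):
    """Render for a printer as plain text lines."""
    total = invoice.find("Total")
    return "\n".join([
        f"INVOICE NO. {invoice.findtext('Number')}",
        f"CUSTOMER:   {invoice.findtext('Customer')}",
        f"TOTAL:      {total.get('currency')} {total.text}",
    ])

invoice = ET.fromstring(INVOICE_XML)
screen_view = screen_style(invoice)
printer_view = print_style(invoice)
print(screen_view)
print(printer_view)
```

Changing either rendering function leaves the invoice data, and every other consumer of it, untouched.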

"The biggest barrier to application integration is data consistency," states the Gartner
Group. As it turns out, the biggest barrier to the successful use of XML within an
organization is also data consistency. So what is "data consistency" and how do we
achieve it?
Data Consistency
Data consistency is the state of understanding the meaning, that is, the semantics, of your
data: not just its descriptive attributes, such as data type, length, scale and domain of
values, but also its contextual attributes, such as the meaning of each valid value and how
and where it is used in computations and decision-making.
In order for two
parties to define an XML document (set of tags) to facilitate data exchange between them,
they must first reach an agreement on data consistency - on the semantics of every tag in
the document. This is the first data consistency challenge.
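As an illustration of what such an agreement might record for a single tag, the Python sketch below keeps both kinds of attributes described above: descriptive ones (data type, length, scale, domain) and contextual ones (the agreed meaning of each valid code). The `TagDefinition` class, the tag names and the codes are all hypothetical.

```python
from dataclasses import dataclass, field
from decimal import Decimal, InvalidOperation

@dataclass
class TagDefinition:
    """Agreed definition of one tag (hypothetical layout for illustration)."""
    name: str
    data_type: str                 # "decimal" or "code"
    max_length: int                # descriptive: maximum characters
    scale: int = 0                 # descriptive: digits after the decimal point
    meanings: dict = field(default_factory=dict)  # contextual: code -> agreed meaning

    def check(self, text):
        """Return a list of problems with a value; an empty list means it conforms."""
        problems = []
        if len(text) > self.max_length:
            problems.append(f"{self.name}: longer than {self.max_length} characters")
        if self.data_type == "decimal":
            try:
                value = Decimal(text)
            except InvalidOperation:
                return problems + [f"{self.name}: not a number"]
            if not value.is_finite():
                return problems + [f"{self.name}: not a finite number"]
            if -value.as_tuple().exponent > self.scale:
                problems.append(f"{self.name}: more than {self.scale} decimal places")
        elif self.data_type == "code" and text not in self.meanings:
            problems.append(f"{self.name}: {text!r} is not an agreed code")
        return problems

interest_rate = TagDefinition("InterestRate", "decimal", max_length=8, scale=4)
rate_basis = TagDefinition(
    "RateBasis", "code", max_length=3,
    meanings={"FIX": "fixed for the life of the deal", "FLT": "floating, reset periodically"},
)

print(interest_rate.check("6.25"))
print(interest_rate.check("6.12345"))
print(rate_basis.check("FLT"), rate_basis.meanings["FLT"])
print(rate_basis.check("VAR"))
```

Both parties can run the same checks, so a disagreement about a value surfaces as a specific message rather than as a silent misinterpretation.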
Industry-Oriented Standards
Because XML allows an unlimited set of tags, broad usage can be achieved only by creating
XML tag and document standards that support data interchange for a particular domain of
users. For internal application users, this can usually be achieved easily and rapidly. As
the number of users or companies, or the complexity of the data interchange, increases, the
effort and time required to design the tag and document rules increase substantially. As
the designers work through the semantics of the data to be exchanged, they need to deal
with three primary areas:
- Which tags are going to be allowed?
- How is the nesting of tagged elements going to be structured?
- How will the tagged items be processed?
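The first two questions can be captured as machine-checkable rules. In practice such rules are usually written as a DTD or another schema; the Python sketch below uses a plain dictionary instead (the `PurchaseOrder` vocabulary is hypothetical) and checks a document against it after parsing with the standard library, whose `ElementTree` parser does not itself validate against DTDs.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreed rules: the allowed tags, and which children each tag may contain.
ALLOWED_CHILDREN = {
    "PurchaseOrder": {"OrderNumber", "Buyer", "Item"},
    "OrderNumber": set(),
    "Buyer": set(),
    "Item": {"PartNumber", "Quantity"},
    "PartNumber": set(),
    "Quantity": set(),
}
ROOT_TAG = "PurchaseOrder"

def check_structure(xml_text):
    """Return rule violations; an empty list means the document follows the rules."""
    root = ET.fromstring(xml_text)
    errors = []
    if root.tag != ROOT_TAG:
        errors.append(f"root must be <{ROOT_TAG}>, found <{root.tag}>")
    # Visit every element and compare its children with the allowed set.
    for parent in root.iter():
        if parent.tag not in ALLOWED_CHILDREN:
            errors.append(f"<{parent.tag}> is not an allowed tag")
            continue
        for child in parent:
            if child.tag not in ALLOWED_CHILDREN[parent.tag]:
                errors.append(f"<{child.tag}> may not appear inside <{parent.tag}>")
    return errors

good = ("<PurchaseOrder><OrderNumber>PO-77</OrderNumber>"
        "<Item><PartNumber>A-1</PartNumber><Quantity>5</Quantity></Item></PurchaseOrder>")
bad = ("<PurchaseOrder><Item><Quantity>5</Quantity><Color>red</Color></Item>"
       "<Quantity>2</Quantity></PurchaseOrder>")

print(check_structure(good))
for problem in check_structure(bad):
    print(problem)
```

The third question, how tagged items will be processed, is a matter of agreed behavior and is not captured by structural rules like these.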
To this end, various
committees are being established to develop the standards for industry-specific needs.
These industry-focused standards are the key to effective business-to-business use of XML.
One example is FpML (Financial products Markup Language). FpML is a new XML-based protocol
for Internet-based electronic dealing and information sharing of financial derivatives,
initially focused on interest rate and foreign exchange products. The standards committee
was founded by J.P. Morgan & Co. Incorporated and PricewaterhouseCoopers LLP, and its
technical committee includes International Business Machines Corp. (NYSE: IBM), webMethods,
Inc. and Forecross Corporation. The FpML specification, which will be freely
licensed, is expected to become a standard in the rapidly growing field of
business-to-business, Internet-based integration for a range of services, from electronic
trading and confirmations to portfolio specification for risk analysis.
Getting There from Legacy System Data
Once agreement is reached, either internally or within an industry, on the XML tags and
documents to use, the major remaining challenge is to map (match) existing data to the XML
tags. For most IT organizations, this will be a difficult task because they lack data
consistency.
What is required is to find
the data item(s) within a legacy system that directly or indirectly map to each tag in the
XML document(s) that are to be retrieved or created. Doing this manually is difficult and
error-prone for all but the smallest systems.
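To illustrate direct and indirect mappings, the Python sketch below assumes a hypothetical fixed-width legacy record and a hypothetical set of tag assignments. `CustomerNumber` maps directly to one field, while `OrderDate` maps indirectly to three two-digit fields combined by a transformation rule. The century window in that rule (years below 50 taken as 20xx) is an assumption made for the example.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical legacy record layout: field name -> (start, end) character positions.
LAYOUT = {
    "CUST-NO": (0, 6),
    "CUST-NAME": (6, 26),
    "ORD-YY": (26, 28),
    "ORD-MM": (28, 30),
    "ORD-DD": (30, 32),
    "ORD-AMT": (32, 41),  # amount in cents, zero-padded
}

def expand_year(yy):
    """Assumed century window: 00-49 -> 20xx, 50-99 -> 19xx."""
    return ("20" if int(yy) < 50 else "19") + yy

# Hypothetical tag assignments: XML tag -> (legacy field(s), transformation rule).
TAGGING_RULES = {
    "CustomerNumber": (["CUST-NO"], lambda no: no),                 # direct
    "CustomerName": (["CUST-NAME"], lambda name: name.strip()),     # direct, trimmed
    "OrderDate": (["ORD-YY", "ORD-MM", "ORD-DD"],                   # indirect: three fields
                  lambda yy, mm, dd: f"{expand_year(yy)}-{mm}-{dd}"),
    "OrderAmount": (["ORD-AMT"], lambda cents: str(Decimal(cents).scaleb(-2))),
}

def publish(record):
    """Create (publish) an XML document from one fixed-width legacy record."""
    fields = {name: record[start:end] for name, (start, end) in LAYOUT.items()}
    order = ET.Element("Order")
    for tag, (sources, rule) in TAGGING_RULES.items():
        ET.SubElement(order, tag).text = rule(*(fields[s] for s in sources))
    return ET.tostring(order, encoding="unicode")

record = "004711" + "Acme Supply".ljust(20) + "990314" + "000125000"
xml_out = publish(record)
print(xml_out)
```

Keeping the rules in a table rather than in procedural code makes each mapping easy to review against the agreed tag definitions.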
Automation to the Rescue!
To overcome the complexity of tag assignment and data transformation rule creation, a
dictionary must be created in which all of an application's data semantics ("meta-data")
are stored. This is accomplished by semantically parsing all of the application's source
code, analyzing all of its data, and storing the resulting meta-data in the dictionary. In
addition, the target XML tags and documents, including their data semantics, are parsed
and stored in the dictionary.
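As a much-simplified illustration of harvesting meta-data from source code, the Python sketch below assumes, hypothetically, that the legacy application is written in COBOL, and extracts only the descriptive attributes (type, sign, length, scale and enclosing group) from a copybook's PIC clauses. It is a toy regular-expression parser, not the semantic analysis described above, and it handles only the simple clause forms shown.

```python
import re

# A hypothetical COBOL copybook from a legacy application.
COPYBOOK = """
       01  CUSTOMER-ORDER.
           05  CUST-NO          PIC 9(6).
           05  CUST-NAME        PIC X(20).
           05  ORD-DATE.
               10  ORD-YY       PIC 99.
               10  ORD-MM       PIC 99.
               10  ORD-DD       PIC 99.
           05  ORD-AMT          PIC S9(7)V99.
"""

# Level number, data name and an optional PIC clause, ending with a period.
ITEM = re.compile(r"^\s*(\d{2})\s+([A-Z0-9-]+)(?:\s+PIC\s+(\S+?))?\.\s*$")
REPEAT = re.compile(r"(.)\((\d+)\)")  # e.g. 9(6) means 999999

def parse_copybook(text):
    """Store each elementary item's descriptive attributes in a dictionary."""
    dictionary = {}
    parents = []  # stack of (level, name) for the enclosing group items
    for line in text.splitlines():
        match = ITEM.match(line)
        if not match:
            continue
        level, name, pic = int(match.group(1)), match.group(2), match.group(3)
        while parents and parents[-1][0] >= level:
            parents.pop()  # leave groups at the same or a deeper level
        if pic is None:
            parents.append((level, name))  # a group item has no PIC clause
            continue
        expanded = REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), pic)
        dictionary[name] = {
            "type": "alphanumeric" if "X" in expanded else "numeric",
            "signed": expanded.startswith("S"),
            "length": sum(expanded.count(c) for c in "9X"),
            "scale": expanded.partition("V")[2].count("9"),
            "group": "/".join(group for _, group in parents),
        }
    return dictionary

meta_data = parse_copybook(COPYBOOK)
for item, attributes in meta_data.items():
    print(item, attributes)
```

Contextual attributes, such as what each value means and where it is used in computations, require analysis of the procedure code and the data itself, which this sketch does not attempt.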
Using a graphical interface
supported by sophisticated analysis and query engines, each XML tag is used as a
"search argument" to drill down into the dictionary meta-data to discover the
legacy system data item(s) that are most likely to map to it. After performing any
required research using the interface functions, the user identifies the correct data
item(s) and defines any required transformation rules to finalize the mapping. These
"tagging rules" are then stored in the dictionary.
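The search step can be sketched in Python with a deliberately simple technique swapped in for the analysis engines described above: candidate data items are filtered by type and ranked by string similarity (`difflib.SequenceMatcher`) between the tag name and each item's name and description. The dictionary contents are hypothetical, and the ranking only proposes candidates; the user still confirms the mapping.

```python
import re
from difflib import SequenceMatcher

# A hypothetical slice of the meta-data dictionary: legacy data item -> attributes.
META_DATA = {
    "CUST-NO": {"type": "numeric", "description": "customer number"},
    "CUST-NAME": {"type": "alphanumeric", "description": "customer name"},
    "ORD-AMT": {"type": "numeric", "description": "order amount in cents"},
    "ORD-DATE": {"type": "numeric", "description": "order date yymmdd"},
    "SHIP-DATE": {"type": "numeric", "description": "ship date yymmdd"},
}

def normalize(name):
    """Turn 'OrderDate' into 'order date' and 'ORD-DATE' into 'ord date'."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)
    return spaced.replace("-", " ").lower()

def candidates(tag, expected_type, top=3):
    """Rank legacy data items by how likely they are to map to an XML tag."""
    target = normalize(tag)
    scored = []
    for item, attrs in META_DATA.items():
        if attrs["type"] != expected_type:
            continue  # a type mismatch rules the item out
        score = max(
            SequenceMatcher(None, target, normalize(item)).ratio(),
            SequenceMatcher(None, target, attrs["description"]).ratio(),
        )
        scored.append((item, round(score, 2)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]

# The tag acts as the search argument; a person confirms the final choice.
print(candidates("OrderDate", "numeric"))
print(candidates("CustomerName", "alphanumeric"))
```

Note that `SHIP-DATE` still appears among the `OrderDate` candidates; resolving look-alike items like this is exactly the research step the user performs before recording a tagging rule.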
Once all of the tagging rules for the required documents have been defined, it becomes
possible to create the source code required to retrieve (use) or create (publish) the
target XML document(s). This can be done either manually or automatically; either way, the
key to success is data consistency in the mapping of the legacy system data to the XML
tags.
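A Python sketch of the retrieve (use) direction, assuming hypothetical tagging rules that map each tag back to a fixed-width legacy field. Rather than generating source code, it applies the stored rules directly at run time, a simpler alternative chosen for illustration. The two-digit legacy date field is also an assumption.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical tagging rules for the "use" direction, in legacy record order:
# (XML tag, legacy field, field width, transformation back to legacy form).
RETRIEVE_RULES = [
    ("CustomerNumber", "CUST-NO", 6, lambda text: text.zfill(6)),
    ("CustomerName", "CUST-NAME", 20, lambda text: text.ljust(20)[:20]),
    # Assumed legacy field holds only a two-digit year: 1999-03-14 -> 990314.
    ("OrderDate", "ORD-DATE", 6, lambda text: text[2:4] + text[5:7] + text[8:10]),
    # Amount is stored in cents, zero-padded: 1250.00 -> 000125000.
    ("OrderAmount", "ORD-AMT", 9, lambda text: str(round(Decimal(text) * 100)).zfill(9)),
]

def retrieve(xml_text):
    """Use an XML document: rebuild the fixed-width legacy record it describes."""
    order = ET.fromstring(xml_text)
    parts = []
    for tag, field, width, rule in RETRIEVE_RULES:
        text = order.findtext(tag)
        if text is None:
            raise ValueError(f"missing <{tag}> for legacy field {field}")
        value = rule(text)
        if len(value) != width:
            raise ValueError(f"{field}: expected {width} characters, got {value!r}")
        parts.append(value)
    return "".join(parts)

document = ("<Order><CustomerNumber>4711</CustomerNumber>"
            "<CustomerName>Acme Supply</CustomerName>"
            "<OrderDate>1999-03-14</OrderDate>"
            "<OrderAmount>1250.00</OrderAmount></Order>")
legacy_record = retrieve(document)
print(repr(legacy_record))
```

The width check turns a mapping error into an explicit failure instead of a silently shifted record.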

L2X SmartXML is a state-of-the-art software product from
Forecross Corporation that facilitates the migration / integration of legacy applications
with the Internet. Among its benefits are:
- Reduced time to create and use industry-standard XML documents
- Increased accuracy of tag creation and data transformations
- Improved control of XML tag creation and maintenance
Features and functions, some of which are exemplified
in the figure below, include:
- Visual confirmation of tag assignments
- Data-transformation rule creation / maintenance
- Generation of data interchange programs to/from XML
- Selective or full redeployment of existing system data stores
- Superior semantic analysis using the Sentinel Semantic Analyzer (SSA)

Mr. Kim O. Jones is president and CEO of Forecross Corporation, a San Francisco, California
firm that specializes in the development and use of automated migration technologies and
associated methodologies.
(c)1999 Forecross Corporation