XML Legacy Migration
XML - The most important data-related standard to be introduced since SQL.

In less than three years, XML has gained the acceptance and support of virtually every type of organization that uses, sells or supports Information Technology (IT).

What is XML and what is the perceived value that has caused it to move so quickly from the standards drafting committees of the World Wide Web Consortium (W3C) to the headlines of every IT-related publication? What is the path and what are the challenges that organizations face before they can harvest meaningful value from XML? This document attempts to answer these questions.

What is XML?
XML is the acronym for eXtensible Markup Language. It is a simplified subset of the Standard Generalized Markup Language (SGML), from which HTML (HyperText Markup Language) was also derived. Markup languages evolved from the process that layout designers used to pass information to typesetters. These designers would use notations ("tags") on typewritten text to indicate to the typesetter how to lay out copy. As electronic publishing systems such as word processing and desktop publishing software evolved, they also used a type of internal markup language that is embedded in a document and describes how the document should be laid out. This is called procedural markup - a way of presenting the information.

Evolution of SGML and HTML
The SGML standard was first published in 1986 and included not only procedural markup standards but also descriptive markup standards. Descriptive markup describes the purpose of the information in a document, whereas procedural markup describes the appearance of that information. SGML has many positive attributes but essentially was too complicated and too versatile to be supported by browser developers and was not endorsed as a standard on the Web. This precipitated the creation of HTML.

HTML is a simple markup language defined using SGML. It provides a standard method (using markup "tags") for delivering document information over the World Wide Web (WWW). Because there are standard tags, different browsers like Netscape Navigator and Microsoft Internet Explorer can process information from a web site and display this information consistently on a local desktop PC.

HTML has become extremely popular because it is very simple, provides a standard but powerful way to display text and graphics, and supports flexible, powerful and easy linking to other information on the same or other web sites. HTML has been instrumental in the explosion of the Internet. But HTML has some serious limitations when something more sophisticated is desired such as the exchange of data. HTML has no descriptive markup features. Because of its simplicity and focus on formatting and linking, HTML actually removes most of the meaning (semantics) associated with any data that it is passed.

XML to the Rescue!
XML was created and defined by the World Wide Web Consortium to address many of the inadequacies of HTML. It is a meta-markup language that has descriptive markup capabilities, providing a format for describing structured data using markup tags. XML has a great many of the features of SGML except in the area of document creation. Its great strength is that it is much simpler than the full SGML specification but it is fully capable when it comes to document/data delivery. XML facilitates more precise disclosure of the semantics of data content. This precision facilitates accurate data interchange and more meaningful search results across multiple platforms. A new generation of Web-based applications oriented toward data viewing and data manipulation is being triggered because of XML.

Extensible
HTML uses a fixed set of tags, whereas in XML an unlimited set of tags can be defined. While XML provides an environment for tagging structured data, HTML tags can only be used for functions such as displaying a word in italics or bold or dictating a certain font size. The restricted set of tags accounts for many of HTML's limitations and is a main factor in the evolution of XML. An XML tag can declare its associated data to be a bond price, an interest rate, an expiration date, today's gain or loss, or any other data element. As industry-standardized XML tags are embraced there will be a corresponding capability for searching, sharing and manipulating data regardless of where it is located. Once data has been found, it can be presented by a browser in a variety of ways, or it can be passed to other applications for further processing. This is leading to true Internet-enabled applications.
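A minimal sketch of what descriptive tagging makes possible, using Python's standard library. The tag names (`bondQuote`, `bondPrice`, `interestRate`, `expirationDate`) and the values are invented for illustration and do not come from any industry standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag names and values, invented for illustration only.
QUOTE_XML = """
<bondQuote>
  <issuer>Example Corp</issuer>
  <bondPrice currency="USD">98.75</bondPrice>
  <interestRate>6.25</interestRate>
  <expirationDate>2009-06-30</expirationDate>
</bondQuote>
""".strip()

quote = ET.fromstring(QUOTE_XML)

# Because each value is labeled by its tag, a program can ask for it by name
# instead of inferring it from position or formatting on a page.
bond_price = float(quote.findtext("bondPrice"))
interest_rate = float(quote.findtext("interestRate"))
currency = quote.find("bondPrice").get("currency")

print(f"{currency} {bond_price:.2f} at {interest_rate}%")
```

The same values marked up in HTML as bold or italic text would carry no indication of which number is the price and which is the rate.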

The XML Value Proposition
Application integration - both intra- and inter-business - is the number one priority for virtually every IT organization as we enter the twenty-first century. Application integration includes such activities as combining and correlating previously freestanding data, uniting previously disjointed steps or replacing manual processes with automation. Initiatives like ERP package implementations, totally electronic end-to-end equity and derivative trade processing ("Straight-Through-Processing") and electronic supply-chain management in the IT hardware industry are examples of the type of sophisticated application integration that is going on today.

"The greatest challenge to IT in the large enterprise is finding better and simpler ways of making application systems work together more effectively," says Roy Schulte of the Gartner Group. XML answers this challenge by providing standard frameworks to exchange different types of data so that the information - be it in a transaction, exchanged via an Application Program Interface (API), web automation, database portal, catalog, a workflow document or message - can be searched, decoded, manipulated, and displayed consistently and correctly.

Middleware is the Glue, XML is the Foundation
While middleware technologies provide the "glue", the foundation of application integration is data interchange. One of the primary barriers to achieving successful data interchange has been removed with the global acceptance of the Internet (and, internally, an Intranet) as a secure and reliable communication link over which data can be exchanged.

XML removes a second major barrier by ensuring that structured data will be uniform and independent of applications, platforms or vendors - and that it is optimized for delivery over the Internet. The resulting ability to communicate between myriad software and hardware platforms is encouraging a whole new generation of business and electronic-commerce "super" applications that take advantage of the reliable, secure and effective interchange of data between applications. Corporations that are able to develop these super-applications view them as a strategic competitive advantage.

The Basics Look Good, Too
XML provides a standard that can be used for tagging ordinary documents, structured records (such as purchase orders, receivers or invoices) or data derived from a database query. Once received, XML-tagged data can be gathered from many different sources, manipulated, edited, and presented in a variety of ways without re-accessing the source(s) of the data. This enables networks and servers to become more responsive and handle larger workloads.

For example, assume that you want to reserve an airline flight and the reservation system site sends you a long list of flight information in XML format along with a program that will allow you to manipulate the data you receive. Off-line from the server, you can manipulate the data and look at it in various ways until you've made a decision about your flight. An HTML version of this approach would require constant interaction with the server to achieve these same results.
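The flight scenario can be sketched as follows. This is an illustration only: the tag names, flight numbers and fares are made up, and the "program sent along with the data" is represented here by a few lines of Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical flight list; tag names, flight numbers and fares are made up.
FLIGHTS_XML = """
<flights>
  <flight><number>FX101</number><departs>08:15</departs><stops>1</stops><fare>310.00</fare></flight>
  <flight><number>FX205</number><departs>11:40</departs><stops>0</stops><fare>425.00</fare></flight>
  <flight><number>FX330</number><departs>17:05</departs><stops>0</stops><fare>389.00</fare></flight>
</flights>
""".strip()


def load_flights(xml_text):
    """Parse the document once into plain dictionaries held locally."""
    root = ET.fromstring(xml_text)
    return [
        {
            "number": f.findtext("number"),
            "departs": f.findtext("departs"),
            "stops": int(f.findtext("stops")),
            "fare": float(f.findtext("fare")),
        }
        for f in root.findall("flight")
    ]


flights = load_flights(FLIGHTS_XML)

# Each view below is computed locally; no further request goes to the server.
cheapest_first = sorted(flights, key=lambda f: f["fare"])
nonstop_only = [f for f in flights if f["stops"] == 0]

print([f["number"] for f in cheapest_first])
print([f["number"] for f in nonstop_only])
```

Once the document has been received, re-sorting or filtering it costs the server nothing, which is the workload benefit described above.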

Separation of Data from Presentation and Process
Because XML tags only the data and not how the data is to be presented or processed, it allows a high degree of flexibility in how the information is used. XML defines data content, whereas HTML specifies how a browser should display data. With XML, it becomes possible to present data in a browser using physically separate style sheets, written with facilities such as the Extensible Stylesheet Language (XSL) and Cascading Style Sheets (CSS). These style sheets can be changed (personalized) independently of the XML data, enabling, for example, the same XML document to be presented differently on a computer screen versus a printer.
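The idea can be illustrated with a small sketch in which two Python functions stand in for two style sheets. This is not XSL or CSS; the function names and the sample purchase order are assumptions made for the example. The point is only that the data is written once and the presentation is chosen separately.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order used only for illustration.
ORDER_XML = """
<purchaseOrder>
  <item><name>Widget</name><quantity>4</quantity></item>
  <item><name>Gasket</name><quantity>12</quantity></item>
</purchaseOrder>
""".strip()


def render_screen(root):
    """Compact one-line view, e.g. for a screen."""
    return ", ".join(
        f"{i.findtext('name')} x{i.findtext('quantity')}" for i in root.findall("item")
    )


def render_print(root):
    """Column-aligned view, e.g. for a printed page."""
    lines = [f"{'Item':<10}{'Qty':>5}"]
    for i in root.findall("item"):
        lines.append(f"{i.findtext('name'):<10}{i.findtext('quantity'):>5}")
    return "\n".join(lines)


order = ET.fromstring(ORDER_XML)
screen_view = render_screen(order)  # same data ...
print_view = render_print(order)    # ... different presentation

print(screen_view)
print(print_view)
```

Either rendering can be changed without touching the XML document, and the document can change without touching either rendering.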

The Path and the Challenges that Lie Ahead
"The biggest barrier to application integration is data consistency," states the Gartner Group. And it turns out, the biggest barrier to the successful use of XML within an organization is also data consistency. So what is "data consistency" and how do we achieve it?

Data Consistency
Data consistency is the state of understanding the meaning - the semantics - of your data. This covers not just its descriptive attributes, like data type, length, scale and domain of values, but also its contextual attributes, such as the meaning of each valid value and how and where it is used in computations and decision-making.

In order for two parties to define an XML document (set of tags) to facilitate data exchange between them, they must first reach an agreement on data consistency - on the semantics of every tag in the document. This is the first data consistency challenge.

Industry Oriented Standards
Although an unlimited set of tags can be used in XML, broad usage can only be achieved by creating XML tag and document standards that support data interchange for a particular domain of users. For internal application users, this can usually be achieved easily and rapidly. As the number of users or companies, or the complexity of the data interchange, increases, the effort and time required to design the tag and document rules increase substantially. As the designers work through the semantics of the data that will be exchanged, they need to deal with three primary areas:

  • Which tags are going to be allowed?
  • How is nesting of tagged elements going to be structured?
  • How will the tagged items be processed?
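The first two questions in the list above can be pictured as a small rule table. The sketch below is a hand-rolled check written for illustration, with invented tag names. In practice such rules are normally expressed in a DTD or schema and enforced by a validating parser rather than by application code, and the third question (processing) is outside what this sketch covers.

```python
import xml.etree.ElementTree as ET

# Hypothetical document rules: each allowed tag maps to the child tags it may contain.
ALLOWED_CHILDREN = {
    "invoice": {"customer", "lineItem"},
    "customer": set(),
    "lineItem": {"sku", "amount"},
    "sku": set(),
    "amount": set(),
}


def find_violations(element, path=""):
    """Return a list of rule violations found in element and its descendants."""
    here = f"{path}/{element.tag}"
    if element.tag not in ALLOWED_CHILDREN:
        return [f"{here}: tag not allowed"]
    problems = []
    for child in element:
        known = child.tag in ALLOWED_CHILDREN
        if known and child.tag not in ALLOWED_CHILDREN[element.tag]:
            # The tag exists in the vocabulary but is nested in the wrong place.
            problems.append(f"{here}/{child.tag}: not permitted inside <{element.tag}>")
        else:
            # Either correctly nested (keep checking deeper) or unknown (reported by the call).
            problems.extend(find_violations(child, here))
    return problems


good = ET.fromstring(
    "<invoice><customer>Acme</customer>"
    "<lineItem><sku>A1</sku><amount>5.00</amount></lineItem></invoice>"
)
bad = ET.fromstring("<invoice><sku>A1</sku><discount>1</discount></invoice>")

print(find_violations(good))
print(find_violations(bad))
```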

To this end, various committees are being established to develop the standards for industry-specific needs. These industry-focused standards are the key to effective business-to-business use of XML.

One example is FpML (Financial products Markup Language). FpML is a new XML-based protocol for Internet-based electronic dealing and information sharing of financial derivatives, initially focused on interest rate and foreign exchange products. The standards committee was founded by J.P. Morgan & Co. Incorporated and PricewaterhouseCoopers LLP and includes International Business Machines Corp. (NYSE: IBM), webMethods, Inc. and Forecross Corporation on the technical committee. The FpML specification, which will be freely licensed, is expected to become a standard in the rapidly growing field of business-to-business, Internet-based integration for a range of services, from electronic trading and confirmations to portfolio specification for risk analysis.

Getting There from Legacy System Data
Once agreement is reached - either internally or within an industry - regarding the XML tags and documents to use, the major remaining challenge is to map (match) existing data with the XML tags. For most IT organizations, this will be a difficult task because of their lack of data consistency.

What is required is to find the data item(s) within a legacy system that directly or indirectly map to each tag in the XML document(s) that are to be retrieved or created. Doing this manually is difficult and error-prone for all but the smallest systems.

Automation to the Rescue!
To overcome the complexity of tag assignment and data transformation rule creation, a dictionary must be created in which all of an application's data semantics ("meta-data") are stored. This is accomplished by semantically parsing all of the source code for the application, and by analyzing all of the application data and storing the resulting meta-data in the dictionary. In addition, the target XML tags and documents, including their data semantics, are also parsed and stored in the dictionary.

Using a graphical interface supported by sophisticated analysis and query engines, each XML tag is used as a "search argument" to drill down into the dictionary meta-data to discover the legacy system data item(s) that are most likely to map to it. After performing any required research using the interface functions, the user identifies the correct data item(s) and defines any required transformation rules to finalize the mapping. These "tagging rules" are then stored in the dictionary.
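As a rough illustration of using a tag as a "search argument", the sketch below ranks legacy data items by keyword overlap with the tag name. This is a deliberately naive stand-in, not the analysis performed by any actual product. The dictionary contents, the COBOL-style field names and the scoring rule are all assumptions.

```python
# Hypothetical meta-data dictionary: legacy data item -> descriptive keywords.
METADATA = {
    "CUST-NM": {"customer", "name"},
    "CUST-ADDR-1": {"customer", "address", "street"},
    "ORD-TOT-AMT": {"order", "total", "amount"},
    "ORD-DT": {"order", "date"},
}


def split_tag(tag):
    """Split a camelCase tag such as 'orderTotal' into a set of lowercase words."""
    words, current = [], ""
    for ch in tag:
        if ch.isupper() and current:
            words.append(current.lower())
            current = ch
        else:
            current += ch
    if current:
        words.append(current.lower())
    return set(words)


def candidates(tag):
    """Rank legacy items by how many keywords they share with the tag."""
    wanted = split_tag(tag)
    scored = [(len(wanted & keywords), item) for item, keywords in METADATA.items()]
    # Highest score first; ties broken alphabetically; non-matching items dropped.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [item for score, item in ranked if score > 0]


print(candidates("orderTotal"))
print(candidates("customerName"))
```

The ranked list is only a starting point: as described above, the user still confirms the correct data item and defines any transformation rule before the mapping is stored.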

Once all of the tagging rules for the required documents have been defined, it becomes possible to create the source code required to retrieve (use) or create (publish) the target XML document(s). This can be done either manually or automatically; the key to success is data consistency in the mapping of the legacy system data to the XML tags.
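A minimal sketch of the "publish" direction, assuming tagging rules have already been agreed. The rule table, field names, record layout and transformations are hypothetical, and generated code for a real system would look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagging rules: XML tag -> (legacy field, transformation function).
TAGGING_RULES = {
    "customerName": ("CUST-NM", str.strip),                            # drop fixed-width padding
    "orderDate": ("ORD-DT", lambda v: f"{v[0:4]}-{v[4:6]}-{v[6:8]}"),  # YYYYMMDD -> YYYY-MM-DD
    "orderTotal": ("ORD-TOT-AMT", lambda v: f"{int(v) / 100:.2f}"),    # implied two decimals
}


def publish(record, root_tag="order"):
    """Build an XML document from one legacy record by applying each tagging rule."""
    root = ET.Element(root_tag)
    for tag, (field, transform) in TAGGING_RULES.items():
        ET.SubElement(root, tag).text = transform(record[field])
    return ET.tostring(root, encoding="unicode")


# A made-up legacy record with fixed-width, unpunctuated values.
legacy_record = {
    "CUST-NM": "ACME SUPPLY     ",
    "ORD-DT": "19991115",
    "ORD-TOT-AMT": "0012550",
}

xml_text = publish(legacy_record)
print(xml_text)
```

Keeping the rules in a table rather than in program logic means a changed mapping is a change to data, which is one way to read the role of the dictionary described above.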

L2X SmartXML
L2X SmartXML is a state-of-the-art software product from Forecross Corporation that facilitates the migration / integration of legacy applications with the Internet. Among its benefits are:

  • Reduced time to create and use industry standard XML documents
  • Increased accuracy of tag creation and data transformations
  • Improved control of XML tag creation and maintenance

Features and functions, some of which are exemplified in the figure below, include:

  • Visual confirmation of tag assignments
  • Data-transformation rule creation / maintenance
  • Generation of data interchange programs to/from XML
  • Selective or full redeployment of existing system data stores
  • Superior semantic analysis using the Sentinel Semantic Analyzer (SSA)

[Figure: Forecross L2X SmartXML]

Mr. Kim O. Jones is president and CEO of Forecross Corporation, a San Francisco, California firm which specializes in the development and use of automated Migration technologies and associated methodologies.

(c)1999 Forecross Corporation


Forecross is a registered trademark of Forecross Corporation.
Copyright © 1996-2008 Forecross Corporation
All Rights Reserved.