1.6 XML Schema and Data Integrity

B2B e-commerce is the ultimate challenge in program-to-program data sharing. Implementing XML as the data exchange mechanism addresses a critical part of the problem with its loosely coupled architecture, but another critical part remains: data integrity. Successful general-purpose data exchange requires absolute data integrity.
Data integrity must be considered at four levels:
- Physical layer: the hardware
- Logical layer: basic data attributes
- Context layer: where and how the data is used
- Semantic layer: the meaning of the data
The physical layer, which ensures that parity errors and other hardware-level failures cannot contaminate our data, is usually taken for granted.

The logical layer requires programmers and database analysts to coordinate the definition of each data item with regard to its basic attributes, such as type (numeric, alpha, date, etc.), size, format, scale, and valid values. Getting the logical layer right has been an ongoing challenge since the advent of computers, but diligence combined with solid tools and procedures generally makes a failure at this layer an unusual event at most sites.

At most sites, the context layer is the focus of data integrity issues. It involves referential integrity among data files and database tables, data flow between modules, and where and how the data items are used. Given that the logical layer is under control, this is the layer where most data-related failures will occur.

The semantic layer is seldom the focus of data integrity concerns, but this will change with a vengeance as B2B data exchange becomes common. When all data is exchanged internally, there may be differences in the meaning of a given data item's definition, but such differences are readily resolved when issues arise, because both sides of the exchange are entirely under one roof. This will not be so with B2B data exchange. When data must be exchanged among partners and competitors, among dissimilar cultures and languages, and among differing hardware and software platforms, we are facing a digital version of the Biblical Tower of Babel.
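The logical-layer attributes described above (type, size, scale, valid values) are exactly what both sides of an exchange must agree on before any value is accepted. In XML Schema these rules are expressed declaratively as facets such as `xs:maxLength`, `xs:totalDigits`, `xs:fractionDigits` and `xs:enumeration`. The Python sketch below checks equivalent rules procedurally; it is illustrative only, the `FieldDefinition` record and `validate` function are hypothetical names, and it covers just the numeric and alpha types (dates and formats are left out for brevity).

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

# Optional sign, integer digits, optional fractional digits (no exponents).
NUMERIC = re.compile(r"-?(\d+)(?:\.(\d+))?")


@dataclass(frozen=True)
class FieldDefinition:
    """Hypothetical logical-layer definition of one data item."""
    name: str
    kind: str                 # "numeric" or "alpha" in this sketch
    size: int                 # max characters (alpha) or max total digits (numeric)
    scale: int = 0            # numeric only: max digits after the decimal point
    valid_values: Optional[FrozenSet[str]] = None  # optional enumeration


def validate(defn: FieldDefinition, raw: str) -> List[str]:
    """Return the logical-layer violations for one value (empty list if it conforms)."""
    errors: List[str] = []
    if defn.kind == "numeric":
        match = NUMERIC.fullmatch(raw)
        if match is None:
            return [f"{defn.name}: {raw!r} is not numeric"]
        int_part, frac_part = match.group(1), match.group(2) or ""
        if len(int_part) + len(frac_part) > defn.size:
            errors.append(f"{defn.name}: more than {defn.size} digits")
        if len(frac_part) > defn.scale:
            errors.append(f"{defn.name}: more than {defn.scale} decimal places")
    elif defn.kind == "alpha":
        if len(raw) > defn.size:
            errors.append(f"{defn.name}: longer than {defn.size} characters")
    else:
        errors.append(f"{defn.name}: unknown kind {defn.kind!r}")
    if defn.valid_values is not None and raw not in defn.valid_values:
        errors.append(f"{defn.name}: {raw!r} is not a valid value")
    return errors


# Hypothetical definitions that two trading partners might agree on.
price = FieldDefinition("unit_price", "numeric", size=7, scale=2)
currency = FieldDefinition("currency", "alpha", size=3,
                           valid_values=frozenset({"USD", "EUR", "JPY"}))

print(validate(price, "1234.50"))  # []
print(validate(price, "12.345"))   # ['unit_price: more than 2 decimal places']
print(validate(currency, "usd"))   # ["currency: 'usd' is not a valid value"]
```

Note that these checks say nothing about what `unit_price` means (per item? per case? before tax?); that question belongs to the semantic layer, which no amount of logical-layer validation can settle.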
XML provides the mechanism to address this issue through the concept of data tags. Regardless of the name or names by which a data item is identified in different contexts, it is assigned a universal data tag that uniquely identifies its semantic meaning wherever it is used. Our recent Y2000 experience, for example, could be characterized as a data tagging exercise: we had to identify dates wherever they appeared and ensure that both their format and their use were correct. Now we need to do the same for every data item that is a candidate for data exchange in B2B e-commerce. However, Y2000 was comparatively simple, because there was little difficulty agreeing on what MMDDYY or YYYYMMDD meant.

Consider a simple tag such as "name". Does it mean "first name + space + last name", or "last name + comma + space + first name", or something else? And that is a USA-centric view of the problem. How do you account for the fact that Spanish names typically include two family names? What an American would consider the family name is not the last name in Spanish usage and elsewhere. In many Asian countries the family name comes first. Then there are cultures where some people have only one name. Globalization, with its complementary problem of localizing display and usage, is beginning to expose this sort of problem even before B2B e-commerce issues are considered.
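One way to see why a single flat "name" tag is not enough is to compare it with a structured record in which each part of the name is tagged by its meaning rather than its position. The Python sketch below is illustrative only: the `PersonName` class, the `PersonName`, `Given` and `Family` element names, and the sample names are hypothetical, not a standard vocabulary, and the sketch ignores many real-world cases (particles, patronymics, honorifics).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


def naive_family_name(full_name: str) -> str:
    """The USA-centric assumption: the family name is the last word."""
    return full_name.split()[-1]


@dataclass
class PersonName:
    """Hypothetical structured name: each part is tagged by meaning, not position."""
    given: List[str] = field(default_factory=list)
    family: List[str] = field(default_factory=list)  # Spanish names usually carry two

    def to_xml(self) -> ET.Element:
        # One child element per name part, tagged by what the part means.
        root = ET.Element("PersonName")
        for part in self.given:
            ET.SubElement(root, "Given").text = part
        for part in self.family:
            ET.SubElement(root, "Family").text = part
        return root

    def display(self, family_first: bool = False) -> str:
        # Display order is a localization choice, kept separate from the tagged data.
        parts = self.family + self.given if family_first else self.given + self.family
        return " ".join(parts)


spanish = PersonName(given=["María"], family=["López", "Fernández"])
japanese = PersonName(given=["Taro"], family=["Yamada"])
single = PersonName(given=["Budi"])  # one name only; tagging it "Given" is itself an assumption

print(naive_family_name(spanish.display()))  # Fernández (the primary family name is López)
print(spanish.family[0])                      # López
print(japanese.display())                     # Taro Yamada
print(japanese.display(family_first=True))    # Yamada Taro
print(single.display())                       # Budi
print(ET.tostring(spanish.to_xml(), encoding="unicode"))
```

Keeping display order out of the tagged data means the same record can be rendered "Taro Yamada" or "Yamada Taro" as the locale requires, while a receiving partner never has to guess which word is the family name. The hard part, as the text notes, is agreeing on the tags themselves.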