| 3.5 XML Validation Over The Life Cycle
XML usage will evolve with the
applications using it over their life cycle. Let's consider that usage in more detail. We
will have programs that will encode and decode complete files or database tables, as a
batch task. These documents will persist for relatively long periods of time. We will have
programs that encode or decode documents in a real-time data exchange mode, both with
applications on the same platform and on different platforms, so that the documents exist
only as a byte stream in memory and only for a few milliseconds. Data elements in programs
that will be involved in XML data exchange need to be matched to the global dictionary of
valid data tags. Document encoding and decoding routines have to be written and
maintained. Schemas have to be developed and maintained within the scope of the relevant
schema definition schema. Finally, provision must be made to keep all of these
elements in step as they change over time, so that any discrepancies among them
are detected and corrected before any harm can occur.
The most important capability of XML with regard to
disciplined use is the schema validation procedure. XML documents that persist on disk for
extended periods of time can be validated against their respective schema as a batch
process. However, XML documents that are transient in nature also require validation.
Therefore, programs that encode and decode these documents must either include a mechanism
to capture the transient documents and write them to a persistent disk file for subsequent
validation, or call a real-time validation engine as needed.
This validation must be in place during all testing of new program versions before they
enter production. In addition, some percentage of production XML documents, up to 100% if
possible, should be validated. Error-handling logic must be defined for documents that
fail validation, particularly as fields become strongly typed and data validity tests
are centralized in XML schemas.
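The two options for transient documents can be sketched as follows. This is a minimal illustration only: the `Validator` interface, the function names, and the file-naming pattern are all hypothetical, and `root_tag_validator` is a stand-in that checks only well-formedness and the root tag, not a real schema validation engine.

```python
import random
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical validator interface: takes the raw document bytes and
# returns a list of error messages (an empty list means "valid").
Validator = Callable[[bytes], List[str]]


def root_tag_validator(expected_root: str) -> Validator:
    """Stand-in for a real schema validator: well-formedness and root tag only."""
    def validate(doc: bytes) -> List[str]:
        try:
            root = ET.fromstring(doc)
        except ET.ParseError as exc:
            return [f"not well-formed: {exc}"]
        if root.tag != expected_root:
            return [f"unexpected root element: {root.tag}"]
        return []
    return validate


def validate_inline(doc: bytes, validator: Validator,
                    sample_rate: float = 1.0,
                    rng: Callable[[], float] = random.random) -> List[str]:
    """Real-time option: validate a sampled fraction of transient documents."""
    if rng() >= sample_rate:
        return []  # document was not selected for validation
    return validator(doc)


def capture(doc: bytes, capture_dir: Path, sequence: int) -> Path:
    """Capture option: persist a transient document for later batch validation."""
    path = capture_dir / f"doc-{sequence:08d}.xml"
    path.write_bytes(doc)
    return path


def validate_captured(capture_dir: Path, validator: Validator) -> Dict[str, List[str]]:
    """Batch pass over captured documents; returns errors keyed by file name."""
    failures: Dict[str, List[str]] = {}
    for path in sorted(capture_dir.glob("*.xml")):
        errors = validator(path.read_bytes())
        if errors:
            failures[path.name] = errors
    return failures
```

A `sample_rate` of 1.0 corresponds to validating every production document. What to do with a non-empty error list (reject, quarantine, alert) is the error-handling logic that each site must define.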
In addition, some type of versioning procedure is needed to
ensure that the validation is occurring as expected. Depending on the site configuration,
validation may be by a universal routine or by a routine generated with customized logic
for a particular schema. The latter will be much faster, but it can then fall out of
synchronization with revisions to the schema. In both cases, updates to the schema
definition schema will require revalidation of the validators.
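One way such a versioning check might look is sketched below, assuming each generated validator records a fingerprint of the schema it was built from and the version of the schema definition schema in force at the time. The `GeneratedValidator` record, its field names, and the use of SHA-256 as the fingerprint are assumptions made for illustration, not part of any XML facility.

```python
import hashlib
from dataclasses import dataclass
from typing import List


def fingerprint(schema_text: bytes) -> str:
    """Content hash used to detect any revision of a schema."""
    return hashlib.sha256(schema_text).hexdigest()


@dataclass(frozen=True)
class GeneratedValidator:
    """Hypothetical record kept for each custom-generated validator."""
    schema_name: str
    schema_fingerprint: str    # fingerprint of the schema it was generated from
    meta_schema_version: str   # schema definition schema version at generation time


def stale_reasons(validator: GeneratedValidator,
                  current_schema: bytes,
                  current_meta_version: str) -> List[str]:
    """Return the reasons a validator can no longer be trusted (empty if current)."""
    reasons: List[str] = []
    if validator.schema_fingerprint != fingerprint(current_schema):
        reasons.append("schema revised since the validator was generated")
    if validator.meta_schema_version != current_meta_version:
        reasons.append("schema definition schema updated; revalidate the validator")
    return reasons
```

Running a check like this at build or deployment time would flag a customized validator that has drifted from its schema before it reaches production.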
XML schemas are a passive validation device, since validation
occurs only when requested, and it can be bypassed. Most database systems have active
validation, such that discrepancies between the current version of the database and older
versions of programs using the database are detected at run time, and defined error
procedures are executed. We recommend that XML validation operate like active
database validation. This can be implemented by a modest extension to the schema
definition schema to create a version or a time stamp attribute for each schema that is
defined to match an equivalent attribute in each XML document. Each custom validation
program will also have the version or time stamp compiled into its object code, so that a
mismatch will be reported as an immediate error. In this way, normal XML facilities can be
used to provide the active validation that the XML specification omits in the
name of the greatest possible flexibility. |
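The recommended active check might be sketched as follows. The attribute name `schemaVersion`, the constant `COMPILED_SCHEMA_VERSION` and its value, and the exception type are hypothetical choices for illustration. The sketch assumes the version attribute sits on the root element of both the schema and the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical: the version "compiled into" this custom validation program.
COMPILED_SCHEMA_VERSION = "2.3"
# Hypothetical attribute name carried by both the schema and each document.
VERSION_ATTRIBUTE = "schemaVersion"


class SchemaVersionMismatch(Exception):
    """Raised when validator, schema, and document versions do not all agree."""


def check_versions(doc: bytes, schema: bytes) -> None:
    """Fail immediately unless validator, schema, and document share one version."""
    seen = {
        "validator": COMPILED_SCHEMA_VERSION,
        "schema": ET.fromstring(schema).get(VERSION_ATTRIBUTE),
        "document": ET.fromstring(doc).get(VERSION_ATTRIBUTE),
    }
    # A missing attribute yields None, which also counts as a mismatch.
    if len(set(seen.values())) != 1:
        raise SchemaVersionMismatch(f"version mismatch: {seen}")
```

Calling `check_versions` before any other processing means that an out-of-date program or document is reported as an error at run time rather than silently bypassing validation.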