United States Patent Application
20030149934
Kind Code
A1
Worden, Robert Peel
August 7, 2003
Computer program connecting the structure of a xml document to its underlying meaning
Abstract
A computer program which uses a set of mappings between XML logical structures and business information model logical structures, in which the mappings describe how a document in a given XML based language conveys information in a business information model.
Inventors:
Worden; Robert Peel
(Cambridge, GB)
Correspondence Name and Address:
Woodbridge & Associates P O Box 592
Richard C Woodbridge
Princeton
NJ
08542-0592
US
Series Code:
275310
Claims
1. A computer program which uses a set of mappings between XML logical structures and business information model logical structures, in which the mappings describe how a document in a given XML based language conveys information in a business information model.
2. The computer program of claim 1 which achieves some functionality using XML, in which the same functionality can be achieved with different XML based languages by using a set of mappings appropriate to each language.
3. The computer program of claim 1 in which the set of mappings is embodied in an XML document.
4. The computer program of claim 1 adapted to generate XSL using the sets of mappings for a first and a second XML based language to enable a document in the first XML based language to be translated automatically to a document in the second XML based language.
5. The computer program of claim 4 in which using the set of mappings involves the step of reading XML documents defining the sets of mappings between XML logical structures and business information model logical structures.
6. The computer program of claim 1 adapted to translate dynamically a message in one XML language to another using the sets of mappings for the two languages to some common business information model.
7. The computer program of claim 6 in which using the set of mappings involves the step of reading XML documents defining the sets of mappings between XML logical structures and business information model logical structures.
8. The process of automatically generating a computer program, using information from the mappings as defined in claim 1, so that the generated programs will work with different XML languages depending on which set of mappings each program was generated from.
9. The computer program of claim 1 as used in an interface layer providing an API which insulates code written in a high level language which accesses or creates documents in XML based languages from the structure of those XML based languages.
10. An API computer program comprising an interface layer adapted to insulate code written in a high level language from a given XML based language to enable an application written in the high level language to interface with the XML based language by using the program of claim 1, so that the code in the application is not dependent on the structure of the XML language.
11. A computer program in which an interface layer adapted to insulate code written in a high level language from XML based languages takes as an input a document in a XML based language and converts in one or both directions between a tree mirroring the structure of the XML based language and business information model logical structures by using the mappings between them as described in claim 1.
12. A computer program in which an interface layer uses the mappings of a first XML language onto a business model to read in data in the first XML language and convert it to an internal form reflecting the logical structures of the business model, and in which the interface layer uses the mappings of a second XML language onto the same business information model to convert data from the internal form reflecting the logical structures of the business information model to the structures of the second XML language.
13. A method of translating between a first and a second XML based language by using the computer program of claim 12.
14. The method of claim 13 adapted to allow runtime translations, allowing the choice of the input and output XML languages to be made dynamically by the use of the appropriate mappings.
15. The computer program of claim 11 in which the code written in a high level language allows users to submit queries in terms which reflect the logical structures of the business information model, not requiring knowledge of the structure of an XML language, and the translation layer allows a document in an XML based language to be queried, using the mappings of that XML language onto the business information model.
16. The query program of claim 15 in which the same query can be run against documents in different XML languages by using the sets of mappings appropriate for each such language.
17. The computer program of claim 1 in which the logical structures of the business information model categorise the information relevant to the operations of the business organisation in terms of (a) classes of entities, (b) attributes of the entities of each class and (c) relations between these entities.
18. The computer program of claim 1 in which the mappings are specifications of what nodes need to be visited and paths traversed in the XML to retrieve information about given objects of classes, attributes and relations.
19. The computer program of claim 1 in which the XML logical structures are objects classified according to XML element types, XML attributes and XML content model links.
20. The computer program of claim 1 in which the XML logical structures are derived from schema notations.
21. The computer program of claim 1 in which the business information model logical structures categorise information in terms of ontological knowledge representation techniques.
22. A method of performing e-commerce transactions between several organisations using different XML-based languages of XML, in which a computer program as defined in claim 1 is used.
23. A method of enterprise application integration within an organisation using different XML-based languages, in which a computer program as defined in claim 1 is used.
24. A method of enabling a business organisation to alter an e-commerce business model reliant on XML interoperability, comprising the use of a computer program as defined in claim 1.
25. A method of creating a XML-based language comprising the following steps: (a) creating a business information model (b) defining requirements for an XML-based language in terms of classes, attributes and relations in the business information model that need to be represented in documents in the language (c) automatically generating a schema definition of the XML-based language which meets those requirements, applying automatically various choices as to how different pieces of business information in the requirement are to be represented in XML.
26. The method of claim 25 comprising the further step of, as the schema is generated, recording the automatically generated mappings between the elements, attributes and content model links of the schema and the classes, attributes and relations which the schema is required to represent in the business information model.
Description
FIELD OF THE INVENTION
[0001] This invention relates to a computer program connecting the structure of an XML document to its underlying meaning.
DESCRIPTION OF THE PRIOR ART
[0002] To conduct e-business transactions, companies need a common language through which to exchange structured information between their computer systems. HTML, the first-generation language of the Internet, is not suited for this task as it defines only the formatting of information, not its meaning. Extensible Markup Language--XML--has been developed to address this deficiency: XML itself is not a language, but gives a facility for users to define their own languages ("XML-based languages"), by defining the allowed elements, attributes and their structure. Like HTML, XML consists of text delimited by element markers or `tags`, so it is easily conveyed over the Internet. In XML however, the tags can define the meaning and structure of the information, enabling computer tools to use that information directly. By defining an XML-based language through a "schema", users may define that XML messages which conform to the schema have certain defined meanings to the computer systems or people who read those messages. For instance, a schema may define an element `customer` with the effect that text which appears between `customer` tags, in a form such as <customer>J. Smith</customer>, gives the name of a customer. A message is simply a document or documents communicated between computer systems.
[0003] XML has been designed to convey many different kinds of information in a way that can be analysed by computer programs, using a set of tags (as explained above) which determine what kind of information is being conveyed. Information in XML documents can also be viewed by people, using a variety of techniques--for instance, transforming the XML into HTML which can be viewed on a browser.
[0004] However, in order to view such information, or to write computer applications which use the information in XML documents, it is necessary to know how the XML language encodes different kinds of information.
[0005] For instance, one of the most common application programming interfaces (APIs) to XML is the Document Object Model (DOM), in which XML structure in a document is converted to an internal tree structure in the computer memory, and the API gives facilities to navigate this tree. To use a DOM interface, the application designer needs to know the structure of the DOM tree and how to navigate the DOM tree to extract each kind of information he needs.
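The dependence on structure can be illustrated with a minimal sketch in Python using the standard `xml.dom.minidom` module. The two document formats, their tag and attribute names, and the function names are hypothetical and chosen only for illustration; they are not taken from any real XML-based language. Both documents convey the same fact (a customer's name), yet each needs its own DOM navigation code.

```python
from xml.dom.minidom import parseString

# Two hypothetical XML-based languages carrying the same fact: a customer's name.
DOC_A = "<order><customer>J. Smith</customer></order>"
DOC_B = '<purchase><party role="buyer" name="J. Smith"/></purchase>'

def customer_name_a(xml_text):
    # Language A: the name is the text content of a <customer> element.
    dom = parseString(xml_text)
    node = dom.getElementsByTagName("customer")[0]
    return node.firstChild.data

def customer_name_b(xml_text):
    # Language B: the name is an attribute of the <party> element whose role is "buyer".
    dom = parseString(xml_text)
    for party in dom.getElementsByTagName("party"):
        if party.getAttribute("role") == "buyer":
            return party.getAttribute("name")
    return None

name_a = customer_name_a(DOC_A)
name_b = customer_name_b(DOC_B)
```

The application code has to change whenever the document structure changes, even though the information retrieved is the same.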
[0006] As another example, the current W3C candidate for an end-user query language for XML, whereby users may ask questions and retrieve the answers from an XML document, is called XQuery. In order to use XQuery effectively, a user needs to understand the structure of an XML document, and how that structure encodes information.
[0007] The result is that in order to adapt XML applications to different XML languages, very often either the source code of the application needs to be changed or the users need to understand the structure of a new XML language. As XML languages proliferate, these changes can be very expensive.
[0008] As noted above, the allowed elements, attributes and structures for an XML-based language are defined in the `schema` for that language. The W3C-approved standard schema `notation` for XML schemas is the Document Type Definition, or DTD. Several other schema notations are in use, including XML Data Reduced (XDR) and XML Schema, which is now a W3C recommendation. For any given schema notation, such as DTD, XDR and XML Schema, many schemas will have been written. Each schema defines a particular XML-based language.
[0009] This open-ended facility to define XML-based languages, each language having a well-defined set of possible meanings, has led to a proliferation of industry applications of XML, each with its own language definition or `syntax`, where syntax means the structure of elements, attributes and content model links in an XML message, which should conform to the structure required for the language in the schema. A schema defines the applicable syntax; there can be different schemas defining the same syntax in different schema notations.
[0010] XML has been embraced enthusiastically by all of the major IT suppliers and user groups. Its standardization and rapid uptake have been a major development in IT over the past three years. Industry rivals like IBM, Microsoft, Sun, and Oracle all support the core XML 1.0 standard, are developing major products based on it, and collaborate to develop related standards. XML can therefore be thought of as the standard vehicle for all Business-to-Business (B2B) e-commerce applications. It is also rapidly becoming a standard foundation for enterprise application integration (EAI) within the corporation.
[0011] A major problem is that of XML `interoperability`, i.e. enabling a computer system `speaking` XML in one XML-based language to communicate with another system using a different XML-based language. In this context, the two computer systems may be in different organisations (for e-commerce) or the same organisation (for application integration). XML interoperability can be a problem within a single organisation too: if different package suppliers favour different XML-based languages, all their applications may still need to be integrated within that one organisation.
[0012] Any XML interoperability solution must include some form of translation between the different XML-based languages (i.e. translation of documents in one XML-based language to another XML-based language): there is a standardised XML-based technology, XSL, and its XML-to-XML component XSLT, for doing so. However, translating between many XML-based languages is difficult, even using XSL, for the following reasons:
[0013] If there are N different XML based languages which a company may have to use, then in principle up to N × (N-1) XSL translation files may be needed to inter-operate between them. The numbers can be forbidding. On the BizTalk repository site (see below), there are 13 different XML formats for a `purchase order`. If even a small fraction of the 156 XSL translations are needed, this is a challenging requirement.
[0014] XSL is a complex Programming Language. To write an error-free translation between two XML-based languages, one must understand the semantics of both XML-based languages in depth; and understand the rich facilities of the XSL language, and use them without error.
[0015] There is a significant problem of version control between changing XML-based languages. As each XML-based language is used and evolves to meet changing business requirements, it goes through a series of versions. As a pair of XML-based languages each go through successive versions, out of synch with each other, and some users stay back at earlier versions, a different XSL translation is needed for every possible pair of versions--just to translate between those two XML-based languages. While much of a version change may consist of simple extensions and additions, some of it will involve changes to existing structures, and may require fundamental changes in the XSL.
[0016] The XML translation problem is often portrayed as an issue of different `vocabularies`, in that different XML-based languages may use different terminology--tag names and attribute names--for the same thing. However, the differences between XML-based languages go much deeper than this, because different XML-based languages can use different structures to represent the same business reality. These structural differences between XML based languages are at the heart of the translation problem. Just as in translating between natural languages such as English and Chinese, translation is not just a matter of word substitution; deep differences in syntax make it a hard problem. Finally, it might be impossible to translate from one XML-based language to another not just in practice, but in principle: the meanings may just not overlap.
[0017] The track record of XSL translation to date is not encouraging. For instance, the BizTalk website (see below) is intended to be a repository for XSL translations between XML-based languages, as well as for the XML-based languages themselves. But while (at the time of writing) over 200 XML-based languages have been lodged at BizTalk, there are few if any XSL translations between XML-based languages. In practice it seems to be a forbidding task to understand both your own XML-based language and somebody else's XML-based language in enough depth to translate between them. Suppliers of XML-based languages are not to date stepping up to this challenge.
[0018] A similar problem of interoperability arose in the 1980s with the emergence of relational databases. In spite of the existence of an underlying technology to solve it (Relational Views), it has in practice not been solved in twenty years. The result has been an information Babel within every major company, which has multiplied their information management and IT development costs by a large factor.
[0019] A significant feature of XSL is that it makes no explicit mention of the underlying meanings of the XML actually being translated: it in effect typically comprises statements such as "translate tag A in XML-based language 1 to tag B in XML-based language 2". Hence, it nowhere attempts to capture the equivalence in meaning between tags A and B, or indeed what they actually mean.
[0020] Further reference may also be made to the following.
[0021] (1) Techniques to capture the meaning and structure of business information in implementation-independent terms, going back to data modelling and entity-relationship diagrams, including also UML class models, the W3C recommendation RDF-Schema, and AI-based ontology representations such as KIF and the DAML+OIL notation.
[0022] (2) Sun's XML-Java initiative, which aims to provide developers with automatically generated Java classes which reflect the structure of an XML-based language. This operates at the level of the XML syntax, not the semantics.
[0023] (3) The OASIS backed ebXML repository initiative, which talks about using UML to capture information about XML-based languages.
[0024] (4) XML parsers, which can convert XML from an external character-based file form into an internal tree form called `Document Object Model` (DOM) standardised by W3C; and can also validate that an XML message conforms to some schema, or language definition.
[0025] (5) XSL translators, which can read in an XSLT file, store it internally as a DOM tree, then use that DOM tree to translate from an input XML message in one language to an output XML message in another language.
[0026] (6) The W3C XPath Recommendation, which is a method of describing navigational paths within an XML document; XSLT makes use of XPath.
SUMMARY OF THE PRESENT INVENTION
[0027] In a first aspect of the invention, there is a computer program which uses a set of mappings between XML logical structures and business information model logical structures, in which the mappings describe how a document in a given XML based language conveys information in a business information model.
[0028] Hence, the present invention envisages in one implementation using a set of mappings between an XML language and a semantic model of classes, attributes and relations, when creating or accessing documents in the XML language. In this implementation, a mapping is a specification of which nodes should be visited and which paths (e.g. XPaths) traversed in an XML document to retrieve information about a given class, attribute or relation in the class model.
[0029] The set of mappings between an XML language and a class model may be embodied in an XML form called Meaning Definition Language (MDL), which is described in more detail in this specification.
[0030] Using the mappings, a piece of software (the interface layer) can convert automatically between an XML structural representation of information (such as the Document Object Model, DOM) and a representation of the same information in terms of a class model of classes, attributes (sometimes referred to as `properties`) and relations (sometimes referred to as `associations`). This conversion can be in either direction: XML structure to class model, or vice versa.
[0031] The key benefit of mappings is: If applications are interfaced to XML via mappings (which are read by software as data, not `hard-coded` in software), then any application can be adapted to a new XML language by simply using the mappings (i.e. data) for the new language, without changing software.
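The idea of mappings read as data can be sketched as follows in Python, using the standard `xml.etree.ElementTree` module. This is only an illustration under stated assumptions: the `MAPPINGS` table, the language names `lang_a` and `lang_b`, the property name `customer.name` and the function `get_property` are all hypothetical, and the table format is not MDL syntax. Each mapping records which path to traverse and whether the value is held in an attribute or in element text.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping data, one table per XML language.
# class-model property -> (path to the node, attribute name or None for element text)
MAPPINGS = {
    "lang_a": {"customer.name": ("./customer", None)},
    "lang_b": {"customer.name": ("./party[@role='buyer']", "name")},
}

def get_property(xml_text, language, prop):
    """Retrieve a class-model property from a document, guided only by mapping data."""
    path, attribute = MAPPINGS[language][prop]
    node = ET.fromstring(xml_text).find(path)
    if node is None:
        return None
    return node.get(attribute) if attribute else node.text

doc_a = "<order><customer>J. Smith</customer></order>"
doc_b = '<purchase><party role="buyer" name="J. Smith"/></purchase>'

name_from_a = get_property(doc_a, "lang_a", "customer.name")
name_from_b = get_property(doc_b, "lang_b", "customer.name")
```

Supporting a third language in this sketch would mean adding one more entry to `MAPPINGS`; `get_property` itself would stay unchanged.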
[0032] Using mappings and an appropriate interface layer, three important applications are possible, as described in depth in the Detailed Description of this specification:
[0033] Meaning-level query language: queries are stated in terms of the class model. The query tool retrieves data from an XML file via the mappings, so (a) users do not need to know about XML structure, (b) the same query can be run against multiple XML languages.
[0034] Meaning-Level API: Applications in e.g. Java use an API (to the interface layer) which refers only to the class model, not to XML structure. The interface layer uses mappings for a language to translate class-model-based API calls into XML structure accesses for the language. Applications can adapt to new XML languages by simply changing the mappings, i.e. with no change to software.
[0035] Translation: The interface layer gets information from an XML document in language 1 and converts it into class model terms. Then the interface layer converts the same information from class model terms back to language 2--so the information is translated in two steps from language 1 to language 2. Or a tool can use mappings to generate XSL which translates documents from language 1 to language 2.
[0036] If we focus for the time being on the application of the present invention to translation, this invention has several advantages over the prior art approaches to solving XML interoperability: First, it solves the N × (N-1) proliferation of translations problem, since the effort required to define the mappings for N languages is proportional to N, not N × (N-1). Secondly, it places the XML interoperability solution in the hands of individual business organisations, removing the need to wait for a common business vocabulary to arise (as required by many of the repository or supra-standards initiatives). The term `business organisation` should be construed to cover not just a single organisation but also a group of organisations. The term `XML logical structures` is defined in section 3 of the W3C XML specification.
[0037] The business information model preferably categorises the information relevant to the operations of a business organisation in terms of the following logical structures: classes of entities, the attributes of those entities of each class and the relations between the entities of each class. This trilogy of structures, referred to in this specification as `classes, attributes and relations`, are examples of business information model logical structures. These classes, attributes and relations may be contained in a Unified Modeling Language (UML) class diagram, or similar notation. The mappings between the logical structures in each XML-based language and the logical structures in the business information model may define how syntactic structures in each XML-based language relate to the business information model: the syntactic structures may readily be derived from Document Type Definitions (DTDs) or from any other form of schema notation such as an XDR file or XML Schema file. The business information model may categorise the information used by one or more organisations not only in terms of Unified Modeling Language class diagrams, but also in terms of ontological knowledge representation techniques, such as an RDF Schema model or a DAML+OIL model.
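The difference in scale can be checked with a short calculation, sketched here in Python. The function names are illustrative only. The figure of 13 languages is the number of `purchase order` formats cited earlier in this specification.

```python
def pairwise_translations(n):
    # One directed translation for every ordered pair of distinct languages.
    return n * (n - 1)

def mapping_sets(n):
    # One set of mappings per language, each onto the shared business information model.
    return n

translations_for_13 = pairwise_translations(13)  # 156, the figure quoted for purchase orders
mappings_for_13 = mapping_sets(13)               # 13
```

Adding a fourteenth language would require 26 further pairwise translations (182 in total), but only one further set of mappings.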
[0038] Each XML-based language may be described in its schema definition as a set of element types, attributes and content model links. Elements, attributes and content model links will be referred to collectively as `XML objects`. XML objects are an example of XML logical structures. The way in which each XML-based language conveys information in the business information model may then be defined by mappings between XML objects and the classes, attributes and relations (i.e. `logical structures`) of the business information model. Information about the mappings may be stored in an intermediate file, XML or otherwise. One such XML-based language for storing definitions of mappings is, as noted earlier, called Meaning Definition Language (MDL) and makes use of the W3C XPath recommendation. In MDL, XPath is used to define which paths in an XML document need to be traversed in order to extract the different entities, attributes and relations of a business information model.
[0039] In one implementation, it is possible to generate XSL using the sets of mappings for a first and a second XML based language to enable a document in the first XML based language to be translated automatically to a document in the second XML based language. Using the set of mappings involves the step of reading XML documents defining the sets of mappings between XML logical structures and business information model logical structures. Messages can be dynamically translated from one XML language to another using the sets of mappings for the two languages to some common business information model.
[0040] As noted above, the mappings can be expressed in an intermediate mapping file in Meaning Definition Language, MDL. One implementation of the present invention is therefore a tool which reads the MDL files (embodying the mappings of two XML languages) and uses them to generate XSLT to translate between them. It is also possible to provide a tool which can read MDL and, instead of using the mappings to generate XSLT, dynamically translates a message in one XML language to another. This implementation is described in more detail in this specification as a `direct translation embodiment`.
[0041] The XSL generated automatically may be in a file format and that file used by an external XSL processor to transform a document in the first XML-based language to a document in the second XML-based language. Alternatively, the XSL may be retained in some internal form such as the W3C-standard Document Object Model, and then acted on by software which performs the same XML translation function as an XSL processor, acting directly on this internal form. Another possibility is that, instead of XSL, the system may generate source code in Java or some other programming language, which then performs the same translation functions as performed by an XSL processor.
[0042] The present invention envisages in one implementation an interface layer which uses the mappings of a first XML language onto a business model to read in data in the first XML language and convert it to an internal form reflecting the logical structures of the business model, and in which the interface layer uses the mappings of a second XML language onto the same business information model to convert data from the internal form reflecting the logical structures of the business information model to the structures of the second XML language. This can be used for translating between a first and a second XML based language. It can also be used to allow runtime translations, allowing the choice of the input and output XML languages to be made dynamically by the use of the appropriate mappings.
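A minimal sketch of this two-step translation, in Python with the standard `xml.etree.ElementTree` module, is given below. It assumes a business model with a single class (`customer`) and a single attribute (`name`). The `LANGUAGES` table, its keys, the tag names and the function names are hypothetical, and the internal form is simply a Python dictionary. The input and output languages are chosen at run time by name.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings of two XML languages onto one class ("customer")
# with one attribute ("name"). name_attr is None when the name is element text.
LANGUAGES = {
    "lang_a": {"root": "order", "element": "customer", "name_attr": None},
    "lang_b": {"root": "purchase", "element": "party", "name_attr": "name"},
}

def read_customer(xml_text, language):
    """Step 1: XML structure of the source language -> internal class-model form."""
    m = LANGUAGES[language]
    node = ET.fromstring(xml_text).find(m["element"])
    name = node.get(m["name_attr"]) if m["name_attr"] else node.text
    return {"class": "customer", "name": name}

def write_customer(obj, language):
    """Step 2: internal class-model form -> XML structure of the target language."""
    m = LANGUAGES[language]
    root = ET.Element(m["root"])
    node = ET.SubElement(root, m["element"])
    if m["name_attr"]:
        node.set(m["name_attr"], obj["name"])
    else:
        node.text = obj["name"]
    return ET.tostring(root, encoding="unicode")

def translate(xml_text, source, target):
    # Source and target languages are selected dynamically by their mapping names.
    return write_customer(read_customer(xml_text, source), target)

translated = translate("<order><customer>J. Smith</customer></order>", "lang_a", "lang_b")
round_trip = translate(translated, "lang_b", "lang_a")
```

Neither `read_customer` nor `write_customer` refers to a specific language; all language-specific knowledge sits in the mapping table.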
[0043] There are two important applications of MDL:
[0044] First, a meaning-based XML query language. This enables a user to interactively ask questions about XML documents in a form such as "display student.name where student attends course and course.name=`French`"--so that the form of the question is dependent only on the business information model and is independent of any particular XML language. A tool then uses the MDL for some XML language to answer the question from an XML document in that language. The advantages over current XML-based query languages are (1) the user does not need to know about the structure of the XML and (2) the same query can be run against XML documents in many different languages. Hence, more formally, another aspect of the present invention covers a computer program in which an interface layer adapted to insulate code written in a high level language from XML based languages takes as an input a document in an XML based language and converts information from a tree form (such as DOM) mirroring the structure of the XML based language to a form reflecting the business information model logical structures by using the mappings between them. This information is then displayed to the user, answering the query. The code written in a high level language allows users to submit queries in terms which reflect the logical structures of the business information model, not requiring knowledge of the structure of an XML language, and the translation layer allows a document in an XML based language to be queried, using the mappings of that XML language onto the business information model. The same query can be run against documents in different XML languages by using the sets of mappings appropriate for each such language.
[0045] The other important use of MDL is in a meaning-level application programming interface (API). This enables people developing an XML application in, say, Java, to write their programs making reference only to the classes and objects in the business information model, without reference to the XML structure. The advantages are that programmers would not need to know about the structure of the XML, and the same program could (by using MDL) run unaltered with several different XML languages. The benefits are therefore not to do with translation between XML languages per se; but with `internal` translation from any XML to a form which depends only on the business information model--insulating developers from the vagaries of any one language. Hence, this invention covers an interface layer using the set of mappings described above and providing an API which insulates code written in a high level language which accesses or creates documents in XML based languages from the structure of those XML based languages. The interface layer may take as an input a document in an XML based language and convert in one or both directions between a tree mirroring the structure of the XML based language and business information model logical structures by using the mappings between them as described above.
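The example query can be sketched in Python as follows. The two document formats, the `EXTRACTORS` table and all function names are hypothetical. For brevity, the per-language mappings are written here as small Python functions rather than being read from a mapping file, which is a simplification of the scheme described. Each function yields the instances of the relation `student attends course` as (student.name, course.name) pairs, and the query itself is written only in terms of that relation.

```python
import xml.etree.ElementTree as ET

# Hypothetical language 1: students are nested inside the course they attend.
DOC_1 = """<college>
  <course name="French"><student>Ann</student><student>Bob</student></course>
  <course name="Physics"><student>Cy</student></course>
</college>"""

# Hypothetical language 2: flat lists, with the relation held as id references.
DOC_2 = """<data>
  <student id="s1" name="Ann"/><student id="s2" name="Bob"/><student id="s3" name="Cy"/>
  <course id="c1" title="French"/><course id="c2" title="Physics"/>
  <enrolment student="s1" course="c1"/><enrolment student="s2" course="c1"/>
  <enrolment student="s3" course="c2"/>
</data>"""

def attends_lang1(root):
    # The relation is conveyed by nesting of <student> within <course>.
    for course in root.findall("course"):
        for student in course.findall("student"):
            yield student.text, course.get("name")

def attends_lang2(root):
    # The relation is conveyed by <enrolment> elements linking ids.
    students = {s.get("id"): s.get("name") for s in root.findall("student")}
    courses = {c.get("id"): c.get("title") for c in root.findall("course")}
    for e in root.findall("enrolment"):
        yield students[e.get("student")], courses[e.get("course")]

EXTRACTORS = {"lang1": attends_lang1, "lang2": attends_lang2}

def students_attending(xml_text, language, course_name):
    """display student.name where student attends course and course.name = course_name"""
    root = ET.fromstring(xml_text)
    return sorted(s for s, c in EXTRACTORS[language](root) if c == course_name)

answer_1 = students_attending(DOC_1, "lang1", "French")
answer_2 = students_attending(DOC_2, "lang2", "French")
```

The same call, with only the language name changed, gives the same answer for both documents.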
[0046] Further aspects and details of the present invention are particularised in the appended claims.
[0047] Definitions
[0048] Throughout this patent specification these terms have the following meanings:
[0049] "XML-based language" is a specification of the allowed elements, attributes and content model links in a set of XML documents, as defined by a schema notation such as a DTD, XML Data Reduced or XML Schema.
[0050] "XML" is the industry standard SGML derivative language standardised by the WorldWideWeb Consortium (W3C) used to transfer and handle data. (XML derives from SGML, Standard Generalised Markup Language. HTML is an application of SGML.)
[0051] "DTD" or "Document Type Definition" is a definition of the allowed syntax of an XML document. DTD is one example of a schema notation.
[0052] "Document": A document is any file of characters. "XSL" is the industry standard translation language for translating documents between one XML-based language of XML and another. An example XSL document is given in this patent specification.
[0053] "XSLT" is that part of XSL which is intended mainly for translating one form of XML to another form of XML. The other part is for translation from XML to HTML and other formatting languages.
[0054] A "Programming Language" and "Computer Program" is any language used to specify instructions to a computer, and includes (but is not limited to) these languages and their derivatives: Assembler, Basic, Batch files, BCPL, C, C+, C++, Delphi, Fortran, Java, JavaScript, Machine code, operating system command languages, Pascal, Pearl, PL/1, scripting languages, Visual Basic, meta-languages which themselves specify programs, and all first, second, third, fourth, and fifth generation computer languages. Also included are database and other data schemas, and any other meta-languages. For the purposes of this definition, no distinction is made between languages which are interpreted, compiled, or use both compiled and interpreted approaches. For the purposes of this definition, no distinction is made between compiled and source versions of a program. Thus reference to a program, where the programming language could exist in more than one state (such as source, compiled, object, or linked) is a reference to any and all states. The definition also encompasses the actual instructions and the intent of those instructions.
[0055] "Schema" is a set of statements in a schema notation such as DTDs, XDR etc which defines the allowed elements, attributes and content model links in an XML-based language.
[0056] "Schema Notation": a given schema notation is a notation which defines how schemas compatible with that notation must be written. Schema notations include DTDs, XDRs, and XML Schema. Many schemas can be written in any one schema notation.
[0057] "XPath" is the W3C recommendation for a standard specification of navigational paths in an XML document.
[0058] "XMuLator" is a software embodiment of this invention.
BRIEF DESCRIPTION OF THE FIGURES
[0059] The invention will be described with reference to the accompanying Figures in which FIGS. 1-9 illustrate concepts relating to Meaning Definition Language and FIGS. 10-82 illustrate concepts relating to the XMuLator implementation of the present invention.
DETAILED DESCRIPTION
[0060] Meaning Definition Language--MDL
[0061] XML is designed to make meanings explicit in the structure of XML languages. However, when we build XML applications today, we interface to XML at the level of structure, not meaning. We navigate document structure by interfaces such as DOM, XPath and XQuery. Therefore every developer or user has to re-discover for himself `how the structure conveys meaning` for each XML language he uses. This is wasteful and error-prone. We need to develop tools so that XML developers and users can work at the level of meaning, not structure--with the tools providing the bridge between meaning and structure.
[0062] Schema languages such as XML Schema and TREX are about structure of XML documents. UML, RDF Schema, and DAML+OIL are about meaning. None of these notations provide the link between structure and meaning. Meaning Definition Language (MDL) is the bridge between XML structure and meaning--expressed precisely, in XML.
[0063] Using MDL, the language designer can write down--once and for all--how the structure of an XML language conveys its meaning. From then on, MDL-based tools allow users and developers to interface to that language at the level of meaning. The tools can automatically convert a meaning-based request into a structure-based provision of the answer. This chapter explains how, by introducing MDL and describing three working applications of MDL:
[0064] A Meaning-Level Java API to XML: allowing developers to build applications with Java classes that reflect XML meaning, not structure; then to interface those applications automatically to any XML language which expresses that meaning.
[0065] A Meaning-level XML Query Language: allowing users to express queries in terms of meaning, without reference to XML structure; to run the same query against any XML language which expresses that meaning, and to see the answer expressed in meaning-level terms.
[0066] Automated XML translation, based on meaning: allowing precise, automatic generation of XSLT to translate messages between any two XML languages which express overlapping meanings.
[0067] The benefits of the meaning-level approach to XML are far-reaching:
[0068] Users and developers can work at the level of meaning--which they understand--rather than grappling with XML structures, where they may poorly understand the language designer's intention or make mistakes in the detail (particularly for large complex languages).
[0069] Applications, XML queries and presentations of XML information can be developed once at the meaning level, and then applied to any XML language whose MDL exists, without further changes.
[0070] So whenever a new XML language comes along--as will frequently happen--all you need do is find (or if need be, write down) the MDL definition of that language. Then all your systems and users, using that MDL, will be immediately adapted to the new language, without any further effort. As XML usage grows and languages proliferate, the cost-savings from this easy adaptation will be huge.
[0071] The W3C Semantic Web initiative aims to make web-based information usable by automated agents. Currently, such automated agents are not able to use information from most XML documents, because of the diverse ways in which XML expresses meanings. So the semantic web depends on RDF, which expresses meanings in a more uniform manner than XML. MDL would enable agents on the web to extract information from XML documents, as long as their MDL was known--thus extending the scope of the Semantic Web from the RDF world to the larger world of XML documents on the web.
[0072] 1. XML--MEANING AND STRUCTURE
[0073] In this section we introduce the Meaning Definition Language and show how it provides a precise bridge between XML Structure and XML Meaning--defining how XML structures convey meanings.
[0074] Before we build the bridge, we need first to describe the two pillars which MDL spans--Structure and Meaning. Before we do that, we shall introduce a sample problem which has great practical importance. The examples in this chapter will use that sample problem.
[0075] 1.1 Example--Thirteen Purchase Orders
[0076] E-commerce is one of the killer apps which have propelled XML to fame over the past three years. Central to the conduct of much e-commerce is the electronic exchange of purchase orders. So a large number of XML message formats for purchase orders have been developed. Many of these can be found at the main XML repositories such as XML.org and Biztalk.org.
[0077] The core meaning of a purchase order is fairly simple. A buying organisation sends an order to a selling organisation, committing to buy certain quantities of goods or products. There is one order line for each distinct type of goods, specifying the product and the amount required. The purchase order may also define who authorised or initiated the purchase, whom the goods are to be delivered to, and who will pay. Many other pieces of information may be given in specific purchase orders, but that is the basic framework.
[0078] We shall see below how the scope of this `core purchase order meaning` can be defined, and the range of ways in which the core meaning is conveyed in XML. For the moment we note that many different XML languages--certainly many more than thirteen--can be found which convey more or less the same `core purchase order` meaning in different XML structures. We have studied thirteen of them in some detail. Typical of the purchase order formats we have analysed with MDL are:
[0079] The BASDA purchase order message format, part of the BASDA eBIS-XML suite of schemas available from the Business & Accounting Software Developer's association (BASDA) at www.basda.org.
[0080] The cXML protocol and data formats, used by Ariba in their e-commerce platform.
[0081] Purchase order messages generated from an Oracle database by Oracle's XML SQL Utility (XSU); these have a relatively flat structure which mirrors the database structure directly.
[0082] The Navision purchase order message format from Navision Software a/s in Denmark, (http://www.navision.com/), a part of the Navision WebShop e-commerce solution.
[0083] Purchase order message formats from the Open Applications Group (OAG) in the OAGIS framework for application integration.
[0084] Now imagine you are setting up to sell goods by XML-based e-commerce, and your clients tell you what purchase order message formats they use. They are the customers, and you cannot tell them to use your own favorite XML format, so your systems must be able to accept all these formats--and others, as new e-commerce frameworks emerge. That is the test problem used for the examples in this chapter.
[0085] 1.2 Defining XML Structure
[0086] There is a proliferation of ways to define XML structures. In spite of W3C support for XML Schema, the proliferation shows little sign of abating, with other candidates such as TREX and RELAX supported by many. We will have to learn to live with a diversity of schema-defining languages. Despite this diversity, two points remain true:
[0087] Schema languages are mainly about structure, not meaning. For all the work that has gone on to define data types in XML Schema and other Schema languages, type is only a small part of meaning. It is of little use to know that some element has type `date` if I do not know what the date relates to, or how it relates to it. Is it the date of a purchase order, or someone's birthday? Is it the date the order was sent, or approved, or received? Data type on its own tells you none of these things.
[0088] The most important structure information remains `what XML trees are allowed`. All schema languages basically define allowed nesting structures of elements. Even the elaborate apparatus in XML Schema for deriving complex types by extension or restriction serves only to define what nodes can be nested inside other nodes, and their sequence restrictions.
[0089] So the most important tool for understanding XML structure is a tree diagram, showing the possible nesting structure of elements (without repetition of the repeatable elements). A typical tree diagram, for one of the published purchase order formats we have analysed, is shown in FIG. 1.
[0090] This XML purchase order structure, from Exel Ltd, is one of the simpler purchase order structures available. It shows most of the core purchase order meaning components in a fairly self-evident way. For instance, the `Header` element contains information about the whole purchase order, such as the order date. Each order line is represented by an `Item` element which gives the quantity, unit price and so on of the order line.
[0091] Attribute nodes are marked with `@`. The number of distinct nodes in this tree diagram (with repeatable nodes not repeated) is 55. Not all of these are shown in the diagram; the `+` boxes show where sub-trees for `Address` and `Contact` have not been expanded in the diagram.
[0092] Other purchase order message formats can be much more complex--having hundreds or even thousands of distinct nodes, even without repeating any repeatable nodes. To fully understand even a few of these formats is a non-trivial exercise.
[0093] 1.3 Defining What XML Documents Mean
[0094] A minimal model of XML meanings assumes that any XML document can express meanings of three kinds:
[0095] About Objects in Classes: information of the form "there is a product" or "there are three purchase order lines"
[0096] About the Simple Properties of the Objects: "the product type is `video camera`" or "the product price is $31.50".
[0097] About Associations between the Objects: "the goods recipient has this address" or "this manufacturer made that product".
[0098] Associations are often referred to as `relations`, but we will use the UML term `association` everywhere for uniformity. It is hard to see how much meaning can be expressed at all without using all three of the core meaning types. Inspection of any data-centric XML document shows that it expresses meanings of all three types: about objects, simple properties and associations.
[0099] These three concepts are the building blocks of UML class diagrams. They have a successful track record of application in modelling of information and knowledge--for instance, in Entity-Relation Diagrams and AI frames.
[0100] We can draw a class diagram (see FIG. 2) showing the core object classes, properties and associations expressed by typical purchase order messages.
[0101] Here, classes of object are denoted by boxes, and associations by lines. Simple properties are denoted by words next to the boxes. To summarise a central part of the diagram in words: "Several purchase order lines can be part of a purchase order. Each order line has a line number and a quantity, and is an order line for a product".
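As a minimal sketch, the central part of this class model could be written as Python data classes. The class and field names below are illustrative assumptions chosen to mirror the summary above; they are not taken from FIG. 2 itself, and the sample values are invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Product:
    # Simple property (hypothetical name, for illustration only).
    product_type: str


@dataclass
class OrderLine:
    line_number: int   # simple property
    quantity: int      # simple property
    product: Product   # association: order line "is for" product


@dataclass
class PurchaseOrder:
    # Association: several order lines "are part of" one purchase order.
    lines: List[OrderLine] = field(default_factory=list)


# Build a tiny instance of the model with invented values.
order = PurchaseOrder()
order.lines.append(
    OrderLine(line_number=1, quantity=3, product=Product("video camera"))
)
total_quantity = sum(line.quantity for line in order.lines)
```

Objects become instances, simple properties become scalar fields, and associations become references between instances.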
[0102] Most XML purchase order message formats convey a large part (if not all) of the information on this diagram--while some convey extra information not on the diagram. For instance, you can easily spot the equivalences between some of the properties of this diagram with nodes of the Exel XML purchase order message shown above.
[0103] As this is a UML class model, it can be expressed in any notation for class models. One notation is XMI, an XML language designed for interchange of metadata, for instance between CASE tools. However, XMI is a highly generic language designed to support many types of metadata, and in practice is rather verbose.
[0104] RDF Schema, proposed as a foundation for defining the meanings of web resources in RDF, embodies the same three concepts of classes, properties and associations (in RDF and RDF Schema, the term `property` encompasses both what we here call `simple properties` and `associations`). XML encodings of RDF Schema are more concise than XMI, and more readable. The ontology formalism DAML+OIL is a modest extension of RDF Schema, which retains its readability while adding a few extra useful concepts, and has a well-defined semantics. We use DAML+OIL (March 2001 version) as our preferred way to encode in XML the model of classes, associations and properties needed to define the meanings of XML documents, for use in association with MDL.
[0105] A fragment of DAML+OIL describing the purchase order class model in the diagram has the form:
<daml:Class rdf:ID="purchaseOrder">
  <rdfs:label>purchaseOrder</rdfs:label>
  <rdfs:comment>document committing one organisation to purchase goods from another</rdfs:comment>
  <rdfs:subClassOf ID="purchaseOrderPart"/>
</daml:Class>
<daml:Class rdf:ID="orderItem">
  <rdfs:label>orderItem</rdfs:label>
  <rdfs:comment>one line of a purchase order, specifying a quantity of one item</rdfs:comment>
  <rdfs:subClassOf ID="purchaseOrderPart"/>
</daml:Class>
<daml:ObjectProperty ID="[orderItem]isPartOf[purchaseOrder]">
  <rdfs:label>isPartOf</rdfs:label>
  <rdfs:domain rdf:resource="#orderItem"/>
  <rdfs:range rdf:resource="#purchaseOrder"/>
</daml:ObjectProperty>
<daml:DatatypeProperty ID="orderItem:quantity">
  <rdfs:label>quantity</rdfs:label>
  <rdfs:domain rdf:resource="#orderItem"/>
  <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#nonNegativeInteger"/>
</daml:DatatypeProperty>
[0106] Note the use of three different namespaces--with prefixes `daml:` `rdf:` and `rdfs:`--because DAML+OIL is an extension of RDF Schema incorporating concepts from RDF and RDF Schema. The daml:Class elements define a class inheritance hierarchy in a fairly straightforward way; properties and associations are inherited down this taxonomy. daml:DatatypeProperty elements define simple properties of objects in classes. The resource name (ID) of these properties must be unique across the model, but property labels such as `quantity` may occur several times in different classes, with different meanings for the properties. The XML Schema data type of any simple property is defined. daml:ObjectProperty elements define associations, using rdfs:domain and rdfs:range elements to identify the two classes involved in each association.
[0107] A class model, as expressed in DAML+OIL or XMI, generally defines a space of possible meanings, and its coverage is made wide enough to encompass a set of XML languages. Any one XML language typically only expresses a subset of the possible objects, associations and properties in the class model.
[0108] That is the apparatus we use to define what meaning an XML language conveys; next we consider how it conveys that meaning.
[0109] 1.4 MDL--Defining how XML Expresses Meaning
[0110] There follows an outline description of MDL--intended to give enough of the flavour of MDL to understand the sample applications which follow. This outline does not cover all aspects of MDL--for that, see the full description at http://www.charteris.com/mdl.
[0111] If an XML language expresses meanings in a UML (or DAML+OIL) class model, then an MDL file can define how the XML expresses that meaning. The MDL defines how the XML represents every object, simple property or association which it represents.
[0112] Generally, particular nodes in the XML structure express particular types of meaning; for instance each element with some tag name may represent an object of some class, or each XML attribute may represent some property of an object. However, there is more to it than that.
[0113] To define how an XML language represents information, you need to define not only what nodes carry the information, but also the paths to get to those nodes. The best way to define such paths is to use the W3C-recommended XPath language. For instance, you need to define what XPaths to follow to get from a node representing an object to the nodes representing all of its properties. This leads to the core principle of MDL: For every type of meaning expressed by an XML language, MDL defines which nodes carry the information, and what XPaths are needed to get to those nodes.
[0114] MDL is designed to be the simplest possible way to define this node and path information in XML. It turns out that the nodes and paths you need to define how XML represents information follow a simple 1-2-3-Node Rule:
[0115] To define how XML represents objects of some class, you need to specify one node type and the path to it from the root node
[0116] To define how XML represents a simple property of objects of some class, you need to specify two node types and a path between them.
[0117] To define how XML represents some association between classes, you need to specify three node types and some of the paths between them
[0118] We shall see how this works out in the examples which follow.
[0119] 1.4.1 Structure of MDL
[0120] The primary form of an MDL document is a schema adjunct. Schema Adjuncts are a recent proposal for a simple XML file to contain metadata about documents in any XML language, which goes beyond the metadata expressed in typical schema languages (in any way thought useful by the person defining the adjunct) and may be useful when processing documents. Schema Adjuncts have a wide range of potential uses.
[0121] An MDL document is an adjunct to a schema (e.g. an XML Schema) which defines the structure of a class of documents. The MDL defines the meanings of the same class of documents. An MDL document has a form such as:
<schema-adjunct target="http://www.myco.com/myschema.xsd" xmlns:me="http://www.myCo/dmodel.daml">
  <document> ... </document>
  <element context="product"> ... </element>
  <element context="product/manufacturer"> ... </element>
  <attribute context="product/@price"> ... </attribute>
</schema-adjunct>
[0122] The attribute `target` of the top schema-adjunct element is the URL of the schema of the XML language which this MDL describes, when there is a unique schema. (The case of XML languages using elements from several namespaces is not discussed here.) The namespace in the schema-adjunct element (in this example with prefix `me`) has a namespace URI for the semantic model (e.g. in DAML+OIL) which this meaning description is referenced to. This could be an RDDL URI, enabling access to the DAML+OIL model. Thus the top schema-adjunct element gives the means for an MDL processor to access both the schema and the semantic model, and to check the MDL against each of them individually or together.
[0123] The <document> element is not discussed further here. <element> and <attribute> elements each define what meaning is carried by various elements and attributes in the XML language. For each <element> element, its `context` attribute defines the XPath needed to get from the root of the document to the element in question (and similarly for attributes). The contents of the <element> element define what meaning that element carries (and similarly for attributes). The ways in which they do this are illustrated by the examples below.
[0124] 1.4.2 How XML Encodes Objects
[0125] Objects are almost always denoted by XML elements. There is typically a 1:1 correspondence between element instances and objects in a class. Therefore the MDL for an element may typically say `all elements of this tag name, reached by this path, represent objects of that class`. A typical piece of MDL to do this:
<element context="/NavisionPO">
  <me:object class="purchaseOrder"/>
</element>
[0126] This simply says "every element reached from the document root by the XPath `/NavisionPO` represents one object of class `purchaseOrder`."
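A minimal sketch of how a tool might apply such an object mapping is given below. It uses Python's standard xml.etree.ElementTree module as a stand-in XPath implementation; the sample document, the `OBJECT_MAPPINGS` table and the `find_objects` helper are assumptions made for illustration and are not part of MDL.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the tag names come from the MDL examples.
DOC = "<NavisionPO><Line/><Line/></NavisionPO>"

# Mapping taken from the MDL fragment above: context XPath -> class name.
OBJECT_MAPPINGS = {"/NavisionPO": "purchaseOrder"}


def find_objects(root, mappings):
    """Return (class name, element) pairs for every mapped context path."""
    found = []
    for context, class_name in mappings.items():
        steps = context.strip("/").split("/")
        # The first step of an absolute path must match the root element.
        if steps[0] != root.tag:
            continue
        if len(steps) == 1:
            nodes = [root]
        else:
            # Remaining steps are followed as child steps below the root.
            nodes = root.findall("/".join(steps[1:]))
        found.extend((class_name, node) for node in nodes)
    return found


root = ET.fromstring(DOC)
objects = find_objects(root, OBJECT_MAPPINGS)
```

Each element reached by the context path yields exactly one object of the named class, which is the one-node case of the 1-2-3 Node Rule.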
[0127] Thus in accordance with the 1-2-3 Node Rule, the MDL to define how XML represents an object defines one node type, and the path to it from the document root. This is shown in FIG. 3 below.
[0128] There are cases where one element simultaneously represents two or more objects of different classes. In that case, in the MDL there may be several `me:object` elements nested inside the same `element` element.
[0129] MDL may provide two further pieces of information about how elements represent objects, which we mention but do not describe in detail here:
[0130] An element may represent an object of a class only conditionally--only when certain other conditions (in the XML document) apply. MDL lets you define what those conditions are--i.e. just which elements represent objects.
[0131] When an XML document represents objects of a class, it will usually not represent all objects of the class, but only those objects which satisfy certain inclusion conditions (in the semantic model). MDL lets you define what the inclusion conditions are--i.e. which objects within the class are represented in the document.
[0132] 1.4.3 How XML Encodes Simple Properties
[0133] Simple properties are nearly always represented in XML in one of two ways:
[0134] Either a simple property is represented by an attribute (i.e. the value of the attribute represents the value of the simple property)
[0135] Or the value of a simple property is represented by the text value of an element.
[0136] In either case, you need to tie together the property with the object of which it is a property--the object instance which owns the property instance. This is done in MDL by defining the XPath to get from a node representing an object to the node representing its property.
[0137] A typical piece of MDL which defines how XML represents a property is:
<element context="/NavisionPO/Line/Unit_of_Measure">
  <me:property class="product" property="unitOfMeasure">
    <me:find fromPath="Unit_of_Measure"/>
  </me:property>
</element>
[0138] The `me:property` element defines what property the element represents; it defines the property name (`unitOfMeasure`) and the class (`product`) of which it is a property.
[0139] In this case, the MDL for objects of class `product` is:
<element context="/NavisionPO/Line">
  <me:object class="product"/>
</element>
[0140] Therefore each `Line` element represents a product, and each `Unit_of_Measure` element represents the `unitOfMeasure` property of the product--as defined by the `me:property` element in the MDL. The `fromPath` attribute states that to get from an element representing a `product` object to the element representing its unit of measure, you have to follow the XPath "Unit_of_Measure"--that is, find the immediate child element with that name.
[0141] The `fromPath` attribute serves the important purpose of tying up each object instance with the actual properties of that object instance. Without it, an XML document might represent many objects, and many property values, but you might not be able to link them together correctly. XPath is the general way to define the linkages.
[0142] Again in accordance with the 1-2-3 Node Rule, the MDL to define how XML represents some property depends on two node types (nodes representing objects, and nodes representing the property) and the XPath between them. This is shown in FIG. 4.
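The two-node case can be sketched as follows, again using Python's standard xml.etree.ElementTree as a stand-in XPath implementation. The sample document and its values are invented, and the constant and function names are assumptions for illustration; only the element names and the `fromPath` value come from the MDL above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document using the element names from the MDL above.
DOC = """
<NavisionPO>
  <Line><Unit_of_Measure>box</Unit_of_Measure></Line>
  <Line><Unit_of_Measure>each</Unit_of_Measure></Line>
</NavisionPO>
"""

OBJECT_PATH = "Line"           # product nodes, relative to the root element
FROM_PATH = "Unit_of_Measure"  # the fromPath of the me:find element


def unit_of_measure_per_product(root):
    """Follow fromPath from each object node to its own property node."""
    values = []
    for product_node in root.findall(OBJECT_PATH):
        prop_node = product_node.find(FROM_PATH)
        # A product without the property node simply has no value.
        values.append(prop_node.text if prop_node is not None else None)
    return values


units = unit_of_measure_per_product(ET.fromstring(DOC))
```

Because the property node is located relative to each object node, every value stays attached to the product instance that owns it.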
[0143] MDL can describe other aspects of how XML represents properties, which we will merely mention here but not describe in detail:
[0144] It may be that not all elements of given tag name, reached by a given XPath, represent a property; sometimes certain other conditions may need to be satisfied. MDL lets you define what these conditions are.
[0145] The XML may represent the value of a property in a particular format, which may need conversion to a `central` format defined in the semantic model. MDL lets you define format conversion methods, e.g. in Java or XSLT.
[0146] 1.4.4 How XML Encodes Associations
[0147] As described above, the ways in which XML languages represent objects and properties are generally straightforward, and present few problems. However, the representation of associations (aka relations) in XML is more complex, and requires careful consideration.
[0148] XML can represent associations in three main ways, which at first sight look very different from one another:
[0149] By nesting of elements: e.g. when `orderLine` elements are nested inside a `purchaseOrder` element, this means that all the order line objects are part of the purchase order--representing the association [order line `is part of` purchase order] by element nesting.
[0150] By overloading of elements: e.g. where the same `line` element represents an order line, the product which the order line is for, and the association [order line `is for` product].
[0151] By shared values: where elements representing the two associated objects are remote from one another in the XML, but their association is indicated by the fact that they share common values of some elements or attributes.
[0152] Each one of these three methods occurs commonly in practice, and cannot be neglected. Fortunately, the three methods all share some common underlying principles, which means that the same XPath-based form of description can be used to define all of them. We can define a common three-node model of representing associations, which covers all these cases.
[0153] In any XML representation of an association [E]A[F] between objects of class E and class F, nodes of some type denote instances of the association. We call these association nodes. Therefore each instance of an association in a document involves just three nodes--the two elements representing the objects at either end of the association instance, and the association node itself. To define how XML represents the association, we need to define how to tie together the three nodes of each instance of the association. If we can tie together these three nodes, we have in so doing tied together the two object-representing nodes--and can thus find out which object instances are linked in an association instance. That is all the information carried in an association, so it defines fully how XML represents the association.
[0154] In many cases, the three-node model will be `degenerate` in that two or more of the three nodes will be identical; a two-node model, or even a one-node model, would have been adequate. Nevertheless, the three-node model is adequate for all cases; the fact that it is more than adequate for some cases does not matter.
[0155] MDL defines how the three nodes are linked using XPath expressions, and supplementary conditions which the nodes must satisfy (these are necessary to describe the `shared value` representation of associations). MDL provides the means to define the XPaths both from the object-representing elements to the association node, and in the reverse direction. When extracting association information from a document, paths in either direction may be needed--either to go from E=>A=>F, or to go in the reverse direction.
[0156] The three-node model of associations is shown in FIG. 5.
[0157] In cases where the three-node model is overkill, and two or more of the nodes of any association instance are identical, then the XPaths between the identical nodes are just the trivial `.` path which means `stay where you are`.
[0158] Therefore the full MDL definition of an association has a path from the root to define the set of association nodes, and it has relative paths between the association nodes and the elements representing objects at the two ends of the association. For instance, when an association is represented by element nesting, the MDL is of a form such as:
<element context="/NavisionPO/Ship_to/Ship_to_Contact">
  <me:object class="goodsAddressee"/>
  <me:association assocName="worksFor">
    <me:object1 class="goodsAddressee" fromPath="." toPath="."/>
    <me:object2 class="recipientUnit" fromPath="Ship_to_Contact" toPath="parent::Ship_to"/>
  </me:association>
</element>
[0159] The `me:object` element says that elements of tag name `Ship_to_Contact` represent objects of class `goodsAddressee`.
[0160] The `me:association` element says that the same elements also represent the association [goodsAddressee]worksFor[recipientUnit]. So in this case, the association node is the same as one of the object-representing nodes (i.e. the one representing the goods addressee). The fromPath and toPath attributes of the me:object1 are both trivial `stay here` paths; they mean `to get from the association node to the goodsAddressee node, or back again, just stay where you are`.
[0161] The me:object2 element defines how to get from the association node to the `recipientUnit` node, or back again. In this case it is clear that recipient units are represented by `Ship_to` elements, which are parent nodes to the `Ship_to_Contact` nodes. So the toPath attribute says `go to your parent node` and the fromPath attribute says `go to your Ship_to_Contact child node`.
[0162] All this says that the association [goodsAddressee]worksFor[recipientUnit] is represented by element nesting. But because it does so by using general XPath expressions, which can also be used for any other representation of an association, the association information can be extracted by general XPath-following mechanisms.
[0163] Again in accordance with the 1-2-3 Node Rule, the MDL to define how XML represents some association depends on three node types (two for the objects linked by the association, and one for the association node) and some XPaths between them.
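The three-node model for this example can be sketched as below. This is an illustration under stated assumptions: the sample document and its contact name are invented, and the helper names are hypothetical. Python's xml.etree.ElementTree does not support XPath axes such as `parent::`, so the sketch navigates only in the fromPath direction, from the recipientUnit node to the association node, and then follows the trivial toPath of me:object1.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document using the element names from the MDL above.
DOC = """
<NavisionPO>
  <Ship_to>
    <Ship_to_Contact>A. Buyer</Ship_to_Contact>
  </Ship_to>
</NavisionPO>
"""

# Paths copied from the me:object1 and me:object2 elements.
OBJECT2_CONTEXT = "Ship_to"            # recipientUnit nodes, relative to root
OBJECT2_FROM_PATH = "Ship_to_Contact"  # recipientUnit node -> association node
OBJECT1_TO_PATH = "."                  # association node -> goodsAddressee node


def follow(node, path):
    """Follow a relative path, treating '.' as 'stay where you are'."""
    return [node] if path == "." else node.findall(path)


def works_for_instances(root):
    """Return (goodsAddressee node, recipientUnit node) pairs."""
    pairs = []
    for unit_node in root.findall(OBJECT2_CONTEXT):
        for assoc_node in follow(unit_node, OBJECT2_FROM_PATH):
            for addressee_node in follow(assoc_node, OBJECT1_TO_PATH):
                pairs.append((addressee_node, unit_node))
    return pairs


pairs = works_for_instances(ET.fromstring(DOC))
```

The same loop structure would serve for the overloading and shared-value representations; only the paths (and, for shared values, extra conditions) would differ.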
[0164] 1.4.5 A Simplification--Shortest Paths
[0165] MDL requires you to specify XPaths for both simple properties and associations--to define how you get from a node representing an object to the nodes representing its properties and associations.
[0166] Specifying all of these paths might be a lot of work, unless you had an automatic tool to help you do it. Fortunately, in the vast majority of cases, the required path--for instance the path from a node representing an object to a node representing one of its simple properties--obeys a `shortest path` heuristic; it is the shortest possible path from the one node to the other. Similarly, nearly all paths from object-representing nodes to their association nodes are shortest paths.
[0167] We can therefore simplify the language by defining that the default XPath is always the shortest path; you only need to define the XPath explicitly when it is some different path. This means that the great majority of XPaths need not be provided explicitly, but can be simply computed by MDL-based tools.
[0168] In the examples we have always used full-form MDL; but in practice the language can be written more tersely without most of the paths.
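One way a tool might compute such a default path is sketched below. The text does not specify the algorithm, so this is only one plausible reading of the `shortest path` heuristic: climb from the source node type to the deepest common ancestor, then descend to the target. The function name is hypothetical, and the inputs are the absolute `context` paths of the two node types.

```python
def shortest_relative_path(from_context, to_context):
    """Relative XPath between two node types, given their absolute contexts.

    Climbs from the source to the deepest common ancestor, then descends.
    """
    src = from_context.strip("/").split("/")
    dst = to_context.strip("/").split("/")
    # Count the leading steps that the two contexts have in common.
    common = 0
    while common < min(len(src), len(dst)) and src[common] == dst[common]:
        common += 1
    # One '..' per remaining source step, then the remaining target steps.
    steps = [".."] * (len(src) - common) + dst[common:]
    return "/".join(steps) if steps else "."


down = shortest_relative_path("/NavisionPO/Line",
                              "/NavisionPO/Line/Unit_of_Measure")
up = shortest_relative_path("/NavisionPO/Ship_to/Ship_to_Contact",
                            "/NavisionPO/Ship_to")
same = shortest_relative_path("/NavisionPO/Line", "/NavisionPO/Line")
```

For the Navision examples used earlier, this reproduces the child step `Unit_of_Measure`, a single parent step, and the trivial `.` path respectively.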
[0169] 1.4.6 How to Use MDL
[0170] In summary, MDL defines `how information is encoded in XML` in a rather uniform manner for the three main types of information, about objects, properties and associations. For each type of information, the MDL says `to extract the information from an XML document, follow these XPaths`.
[0171] MDL-based tools are given a definition at the level of meaning--in the semantic model--of what is required, and then they use the information in the MDL to convert this automatically to a structural description of how to navigate (or construct) the XML to do this.
[0172] To do so, builders of MDL-based tools need to solve two problems--the input problem and the output problem.
[0173] The Input Problem is to extract the information from an `incoming` XML document and view that information directly in terms of the classes, simple properties and associations of the semantic model. From the nature of MDL, this problem is fairly simple to solve. MDL defines the XPaths you need to follow in order to extract from a document a given object, or any of its simple properties, or any of its associations. So to find the value of any simple property or association of some object, you simply need to follow the relevant XPaths in the document, as defined in the MDL. This is easily done if you have an implementation of XPath, such as Apache Xalan.
[0174] The Output Problem is to `package` the information in an instance of the semantic model into an `outgoing` XML document which conveys that information. It is not quite so obvious how to do this from the definition of MDL, but in fact it is fairly straightforward. You need to construct the document from its root `downwards`. Generally you will come to nodes representing objects before you come to nodes representing their properties and associations. As you come to each node type, you check in the MDL what type of information the node type represents (e.g. what class of object, or what property), and you check what instances of that type of information exist in the semantic model instance. You then construct node instances to reflect these information model instances.
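The root-downwards construction can be sketched as follows, assuming a very small semantic model instance (a list of order line quantities) and a hypothetical output language in which `po` represents the purchase order, `item` represents an order line and `count` represents its quantity. These names are illustrative assumptions, not taken from any real MDL file.

```java
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class OutputProblemSketch {
    /** Builds the outgoing document from its root downwards. */
    static Document build(List<Integer> orderLineQuantities) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // The root node type represents the purchaseOrder object.
        Element root = doc.createElement("po");
        doc.appendChild(root);
        // The next node type represents orderLine objects: one node per instance.
        for (Integer quantity : orderLineQuantities) {
            Element item = doc.createElement("item");
            root.appendChild(item);
            // Below each object node, the node type representing its quantity property.
            Element count = doc.createElement("count");
            count.setTextContent(String.valueOf(quantity));
            item.appendChild(count);
        }
        return doc;
    }
}
```

A real tool would look up in the MDL what each node type represents instead of hard-coding it as this sketch does.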
[0175] We will illustrate this by describing three MDL-based tools which allow users and developers to view XML at the level of its meaning. The first and second of these--a Java API to XML, and a meaning-level query language--only require a solution to the input problem; while the third (automated XML translation) requires a solution of both the input problem and the output problem.
[0176] 2. Meaning-Level API to XML
[0177] When we write applications to use XML in a language such as Java, we generally interface between the application and the XML via some standardised API, such as the W3C-recommended Document Object Model (DOM). Several XML parsers provide high-quality implementations of the DOM API, and many XML applications are built on top of them.
[0178] The way this works, for a read-only application which consumes XML but does not create it, is shown in FIG. 6.
[0179] Here, the XML document is read in by the parser, which makes available the DOM interface to the resulting document tree, for use by the application code.
[0180] However, the DOM interfaces are defined entirely in terms of document structure--giving facilities to construct and navigate the document tree in memory. Therefore interfacing to XML via DOM has two drawbacks:
[0181] Developers are interested in getting the meaning out of an XML document (or putting it in). To do this via DOM, they need to understand the XML document structure, and how it conveys meanings, quite precisely. For large and complex XML languages, this is costly and error-prone.
[0182] Applications need to be written with one document structure in mind, `hard-wiring` that document structure into the code. If the application is to be re-used with another XML language which conveys the same meanings, that application needs to be rewritten.
[0183] Using MDL, we can write applications which interface to the XML at the level of its meaning, not its structure--and so avoid the two drawbacks above. The way this works (again for a read-only application which consumes XML but does not create it) is shown in FIG. 7.
[0184] The components of this diagram will first be outlined before discussing some of them in more detail:
[0185] The Application Code is written by the developer in Java to accomplish whatever the application is about. This code uses the classes immediately below it in the diagram--classes which reflect only the semantic model of the domain, and are independent of XML structure.
[0186] The classes purchaseOrder, orderLine, product, manufacturer and so on are the classes of the UML (or DAML+OIL) semantic model. Each instance represents one purchase order, order line, and so on--the objects of the semantic model which supports the application. The available object instances are precisely the object instances represented in the input XML. Their instance methods return the values of an object's properties, or sets of other objects linked to that object by the associations of the semantic model.
[0187] The class `XFactory` is a factory class which can return all the purchaseOrder objects, or all the orderLine objects, or all objects of any class represented in the XML.
[0188] The class `MDL` reads in the MDL file for a particular XML language and stores all its information in internal form. It then makes available methods used by the classes of the semantic model, and by the factory class, to return values which reflect information in the XML document.
[0189] The XPath and DOM APIs are an implementation of these W3C standard interfaces--for instance, as provided by the Apache Xalan XPath/XSLT implementation with the Apache Xerces XML parser.
[0190] A typical sample of application code, using the purchase order XML languages described earlier, looks like:
// compute the total quantity of all items in a PO
int totQuant(Node root, MDL mdl) {
    int total = 0;
    XFactory xf = new XFactory(root, mdl);
    Vector oLines = xf.everyOrderLine();
    if (oLines != null)
        for (int i = 0; i < oLines.size(); i++) {
            orderLine ord = (orderLine) oLines.elementAt(i);
            // quantity() returns the property value as a String, so parse it
            total = total + Integer.parseInt(ord.quantity());
        }
    return total;
}
[0191] This calculates the total number of items, summed over all order lines for a purchase order--possibly not a very useful number, but sufficient to illustrate the approach. Compared with typical DOM-based XML applications, there are two remarkable things about this piece of code:
[0192] It is simple to write and understand--compared for instance to code which uses the DOM.
[0193] It is completely independent of XML structure--so it will run unchanged with any XML purchase order message format, provided that XML's MDL definition is available.
[0194] The MDL instance mdl has previously been initialised and has an internal representation of the MDL file. First the method above creates an XFactory instance, and uses that instance to create a Vector oLines of all orderLine objects represented in the XML message. It then inspects the individual orderLine objects, and for each one adds its quantity to the total. All the work of navigating the XML document to find this information is done by the supporting classes.
[0195] The next layer of classes in the diagram above (XFactory and all the domain classes such as purchaseOrder) are all generated automatically from the DAML+OIL definition of the semantic model.
[0196] The class XFactory has one method for each class in the semantic model--to return a vector of all the objects of the class represented in the XML document. The generated code for one of these methods looks like:
/* return a Vector of all orderLine objects represented in the XML document;
   or null if the language does not represent orderLines. */
public Vector everyOrderLine() {
    int i;
    Vector res = null;
    NodeList nl = mdl.getAllObjectNodes("orderLine", root);
    if (nl != null) {
        res = new Vector();
        for (i = 0; i < nl.getLength(); i++) {
            res.addElement(new orderLine(nl.item(i), mdl));
        }
    }
    return res;
}
[0197] As can be seen, this code can be generated just by substituting the class name at several places in a standard template.
[0198] The source code for each class of the semantic model is also generated automatically. A typical generated class has source code:
import org.w3c.dom.*;
import java.util.*;

public class orderLine {
    private Node objectNode;
    private MDL mdl;

    public orderLine(Node n, MDL m) { objectNode = n; mdl = m; }

    // String value of `quantity` property
    public String quantity() {
        return mdl.getPropertyValue("orderLine", "quantity", objectNode);
    }

    /* single purchaseOrder object related by [orderLine]isPartOf[purchaseOrder] */
    public purchaseOrder isPartOf_purchaseOrder() {
        purchaseOrder res = null;
        NodeList nl = mdl.getRelatedObjectNodes("orderLine", "isPartOf", "purchaseOrder", objectNode, 1);
        // guard against an empty list before taking the first related node
        if (nl != null && nl.getLength() > 0) { res = new purchaseOrder(nl.item(0), mdl); }
        return res;
    }
}
[0199] For reasons of space, only one or two of the property and association methods are shown. Typically a class has many properties and associations, each with its own method.
[0200] Note that the generated code depends on the semantic model, but not at all on the XML structure or MDL. The same generated code can be used unchanged with many different XML languages.
[0201] These classes use lazy evaluation of their properties and associations. When an instance is created, its only internal state consists of the node in the XML document which represents the object. Whenever the value of a property or association is required, the value is computed by calling the MDL class instance, which navigates the XML to retrieve the values. It would of course be possible to cache values in each instance, so that repeated evaluation did not cause repeated traversal of the DOM tree, but this has not yet been done.
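The caching mentioned above has not been implemented, but a minimal sketch of how it might look is given here. The class name `CachedProperties` and the use of a `Function` to stand in for the call to the MDL class are assumptions made for illustration. The cache is only valid while the underlying DOM tree is not modified, which matches the read-only applications discussed in this section.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class CachedProperties {
    // Stands in for a call such as mdl.getPropertyValue(className, name, objectNode).
    private final Function<String, String> lookup;
    private final Map<String, String> cache = new HashMap<>();
    private int lookupCount = 0;

    CachedProperties(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    /** Returns the property value, navigating the XML only on first use. */
    String get(String propertyName) {
        // containsKey is used so that a null value (property absent) is cached too.
        if (!cache.containsKey(propertyName)) {
            lookupCount++;
            cache.put(propertyName, lookup.apply(propertyName));
        }
        return cache.get(propertyName);
    }

    /** Number of times the underlying lookup was actually invoked. */
    int lookupCount() {
        return lookupCount;
    }
}
```

A generated class could hold one such object alongside its object node, keeping evaluation lazy while avoiding repeated traversal.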
[0202] Again, you can see that this source code is generated quite simply by substituting various class names, property names and association names in standard code templates.
[0203] All the semantic-level generated classes rely on the class MDL to get information from the XML document. It is here that the real work is done, but it is not difficult work. The MDL class reads in the MDL file, stores it in an internal form, and then makes available three core methods used by the generated classes. The three core methods retrieve objects, properties and associations from the XML document:
[0204] getAllObjectNodes(String className, Node root) is given the root node of the XML document and returns a NodeList of all nodes in the document which represent objects of class `className`
[0205] getPropertyValue(String className, String propertyName, Node objectNode) is given the node objectNode which represents an object, and returns (as a string) the value of one of its properties, as represented in the XML.
[0206] getRelatedObjectNodes(String class1, String relation, String class2, Node obj12, int oneOrTwo) is given the node representing one of the objects in an association, and returns a NodeList of nodes representing all the objects of some class related to the first object by some association. oneOrTwo is 1 or 2 depending on whether the input object is of class1 or class2--on the left-hand side or the right-hand side of the relation name.
[0207] The code of the MDL class is completely independent of the application, being driven by the data from the MDL file. The implementation of the three core methods is fairly straightforward, since the class MDL knows all the XPaths to be traversed in the document to retrieve the relevant information. Currently the MDL class makes use of the following XPath interfaces provided by the XPathAPI class of Apache Xalan:
[0208] selectNodeList(Node n, String xPath) returns a NodeList of all nodes reachable by following the path xPath from the node n.
[0209] selectSingleNode(Node n, String xPath) returns a single node, in cases where you know only a single node can be returned.
[0210] These interfaces make the job of the MDL class very simple.
[0211] Therefore by using the XPath interface to XML documents, and using a few simple intermediate classes (some generated, and others independent of the application) we are able to insulate the Java application completely from the details of XML document structure. With this interface, developers can work at the level of semantic model classes which they understand. They do not have to learn the intricacies of XML document structure; and their applications will work unchanged with many different XML document formats. For instance, the sample purchase order application fragment works unchanged with any of the 13 different XML purchase order message formats we have analysed with MDL. Applications can even switch dynamically to handle messages in different XML languages at the same time.
[0212] Here we have only discussed `read-only` applications which read XML but do not write it. The application of these techniques to read/write applications is a bit more complex, but very feasible.
[0213] As XML languages continue to proliferate, we believe that the benefits of this meaning-level style of application development--in quality, development costs and maintenance costs--will be overwhelming. There is no reason not to start doing it now.
[0214] 3. Meaning-Level XML Query Language
[0215] The current state of XML query languages is in a sense similar to the current state of programming APIs to XML. To use an XML query language, such as the current draft W3C recommendation XQuery, you need to understand the structure of the XML document being queried and to navigate around it retrieving the information which interests you.
[0216] This has the same drawbacks for query users as the structure-level APIs have for developers. Users need to understand the structure of XML languages--which for large languages may be costly and error-prone--and queries are not transportable across XML languages.
[0217] Using MDL, we can build XML query tools which operate at the level of meaning rather than structure. In such a language, the query is expressed in terms independent of XML structure--so users can formulate queries without knowledge of XML language structures, and the same query can be re-used across many XML languages which express the same meaning.
[0218] A small demonstrator of a meaning-level XML query language has been constructed, which works as in FIG. 8.
[0219] This demonstrator is a batch Java program which accepts as input:
[0220] A text file containing the text of the query
[0221] The MDL for the language being queried against
[0222] The program itself does not answer the query, but generates a piece of XSLT. This XSLT, when used to transform a document in the language, will transform it into a piece of HTML. When the HTML is displayed on a browser it shows the answer to the query against the document--as in the diagram.
[0223] The queries which are input to this tool are expressed in a simple language of the form:
[0224] Display class.property, class.property . . . where condition and condition and . . .
[0225] Names of classes and properties are taken from the semantic model. Each condition is either of the form `class.property=value` (possibly using other relations such as `contains`, `>`) or of the form `className association className`. Despite its limited nature, this simple language can express a wide range of useful queries, linking together information about objects of several related classes. Most important, it expresses these queries entirely in terms of the semantic model, and independent of XML structure.
[0226] Typical queries in this language are:
[0227] Display orderLine.quantity, product.name where orderLine isPartOf purchaseOrder and orderLine isFor product.
[0228] Display address.city, address.zip where purchasingUnit hasAddress address.
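Queries of this form can be split into their displayed items and conditions with very little code. The following is a minimal sketch, not the parser of the demonstrator: the class name `MeaningQuery` is hypothetical, conditions are kept as raw strings without validation against the semantic model, and a condition value that itself contains the word `and` surrounded by spaces would be split incorrectly.

```java
import java.util.ArrayList;
import java.util.List;

final class MeaningQuery {
    final List<String> displayed = new ArrayList<>();  // "class.property" items
    final List<String> conditions = new ArrayList<>(); // raw condition strings

    /** Parses "Display a.b, c.d where cond and cond ..." into its parts. */
    static MeaningQuery parse(String text) {
        String body = text.trim();
        // Drop an optional trailing full stop.
        if (body.endsWith(".")) {
            body = body.substring(0, body.length() - 1);
        }
        if (!body.regionMatches(true, 0, "display ", 0, 8)) {
            throw new IllegalArgumentException("query must start with 'Display'");
        }
        body = body.substring(8);
        // Separate the display list from the optional where-clause.
        String[] parts = body.split("(?i)\\s+where\\s+", 2);
        MeaningQuery query = new MeaningQuery();
        for (String item : parts[0].split(",")) {
            query.displayed.add(item.trim());
        }
        if (parts.length == 2) {
            for (String condition : parts[1].split("(?i)\\s+and\\s+")) {
                query.conditions.add(condition.trim());
            }
        }
        return query;
    }
}
```

Nothing in the parsed result refers to XML structure; the MDL is only needed later, when each condition is converted into XPaths.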
[0229] The demonstration program parses and validates queries of this form, and devises a query strategy. This strategy defines the order of classes involved in visiting and filtering the objects of the classes mentioned in the query, using the query conditions to filter objects. The query strategy is then embodied in XSLT, using the MDL to convert semantic level conditions into XPaths to navigate the document.
[0230] The XSLT is then run on a standard XSLT processor, producing the output HTML file.
[0231] This is probably not the way you would want to run XML queries for everyday use, but it does demonstrate the capability. Alternative implementations could support interactive input of queries and display of results--probably using an XPath implementation directly to navigate the document, rather than generating XSLT containing XPath expressions.
[0232] In summary, this style of meaning-level query language has two key benefits over other existing XML query languages:
[0233] Users can write queries without knowing the structure of XML documents
[0234] The same query can be freely re-used across documents in several different XML languages, provided their MDL is known.
[0236] 4. Automated XML Translation
[0237] A core application of XSLT is to translate documents from one XML language to another. It is implicit, although rarely stated, that the intention of such translations is to preserve the meaning in the documents. Therefore we would expect a Meaning Definition Language to be very relevant to XML translation.
[0238] It is only possible to translate documents between XML languages if their meanings overlap. If one language is about cookery and another about astronomy, we could not translate at all from one to the other. At the simplest level, we can test the overlap in meaning between two languages by comparing their MDL. We can test which components of meaning (which classes, properties and associations) are represented in both languages. It is only these `overlap` components of meaning that can be translated. So the MDL overlap acts as a specification of the translation.
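The overlap test can be sketched as a set intersection, assuming each language's MDL has already been reduced to the set of names of the classes, properties and associations it represents. The class name `MdlOverlap` and the string encoding of components are illustrative assumptions.

```java
import java.util.Set;
import java.util.TreeSet;

final class MdlOverlap {
    /** Components of meaning represented in both languages, in sorted order. */
    static Set<String> overlap(Set<String> languageA, Set<String> languageB) {
        Set<String> common = new TreeSet<>(languageA);
        common.retainAll(languageB); // keep only what language B also represents
        return common;
    }
}
```

An empty result corresponds to the cookery and astronomy case: nothing can be translated.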
[0239] However, we can do much more than this. Since MDL defines not only what information is expressed by each XML language, but also how it is expressed, the MDL can tell us how to extract each component of meaning from the input document, and how to package it in the output document. Therefore the MDL for the two languages (together with their structure definitions) is sufficient to create automatically the complete XSLT translation from one to the other. Charteris have developed a translation tool, XMuLator, which does just this. The way this operates is shown in FIG. 9.
[0240] The XMuLator translator generator is represented by the shaded circle. It takes as input:
[0241] The UML (or DAML+OIL) semantic model of classes, properties and associations
[0242] The structure definition (XML Schema or XDR) for the input language--here denoted as language (1)
[0243] The MDL definition for the input language
[0244] The structure definition for the output language--here called language (2)
[0245] The MDL definition of the output language
[0246] As output it generates a complete XSLT translation between the two languages. This can be used by any standards-conformant XSLT processor (such as XT, Saxon or Xalan) to translate documents from language 1 to language 2.
[0247] We have used XMuLator to generate and test all 13*12 translations between the thirteen purchase order message formats described above. We have verified that the output documents have the required structure for their languages, and correctly represent all the information that can in principle be conveyed in the translation--i.e. all the information conveyed by both the languages involved in a translation.
[0248] We have also carried out a stringent `round trip` test of the translations. In this, we verify that when a document is translated through some cycle of languages (such as A=>B=>A or A=>B=>C=>D=>A) the output document is a strict subset of the input document--so that any information which survives the round trip survives it undistorted. In general, not all the information in the input document will survive a round trip, because the languages do not overlap perfectly in the information they convey.
[0249] Amongst the 13 different purchase order languages we have translated are some deeply nested languages, and some very shallow languages, such as those resulting from the use of the Oracle XML SQL Utility (XSU). Therefore the translations have involved major structural changes to the XML--not just a few changes in tag names. These major structural transformations have all passed the stringent round trip test.
[0250] There are currently two alternatives to this meaning-based generation of XSLT translations. The first is to write XSLT by hand, and the second is to generate translations by some XML-to-XML mapping tool such as Microsoft's BizTalk Mapper. The meaning-based approach has major advantages over both of these.
[0251] Compared with the meaning-driven approach, writing and debugging of XSLT is much more expensive and error-prone. Even to write one XSLT translation is, we believe, more costly than to write down the MDL for the two languages involved. The XSLT is generally a much larger and more complex document than the two MDL files; and in many cases you will already have the MDL files available.
[0252] However, it is when there are several different languages that the advantages of the MDL approach become overwhelming. With N different languages, you may require as many as N*(N-1) distinct translations between them. Using MDL, the cost of creating all these translations grows only as N (this is the cost of writing all the MDL files). This can rapidly amount to a huge cost difference--especially as each different language may go through a series of versions.
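As a worked instance of this comparison, take the thirteen purchase order formats mentioned above; the second line uses a hypothetical N = 20 purely to show the growth:

```latex
\[
N = 13:\quad N(N-1) = 13 \times 12 = 156 \ \text{translations}, \qquad N = 13 \ \text{MDL files}
\]
\[
N = 20:\quad N(N-1) = 20 \times 19 = 380 \ \text{translations}, \qquad N = 20 \ \text{MDL files}
\]
```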
[0253] We believe that in practice the MDL-based approach is much more reliable than hand-writing of XSLT. Using MDL-based translation, as long as the meaning of each language has been captured accurately, then the translation will be accurate--accurate enough to pass the stringent round-trip tests. For complex languages, debugging XSLT to that level of accuracy would be very time-consuming.
[0254] XML mapping tools such as BizTalk Mapper display two tree diagrams side by side, showing the element nesting structures of two XML languages. The user can then drag-and-drop from one tree to the other, to define `mappings` between the two languages, and these mappings are used to generate an XSLT translation between them. However, this simple node-to-node mapping technique does not capture all the ways in which the two XML languages may represent associations; therefore it is not capable of translating association information correctly. For instance, if one language represents an association by shared values, while the other represents the same association by element nesting, tools like BizTalk Mapper cannot do faithful translations in both directions. Since association information is a vital part of XML content, and XML languages represent associations in a wide variety of ways, this means that XML-to-XML mapping tools will fail for many important translation tasks. Furthermore, since these tools require mappings to be defined afresh for each pair of languages, the cost of creating all possible translations between N languages grows as N*(N-1), rather than N.
[0255] Therefore the meaning-based automatic translation method, which is enabled by MDL, has major advantages over other available methods of XML translation.
[0256] 5. MDL and the Semantic Web
[0257] The vision of the Semantic Web is that the information content of web resources should be described in machine-usable terms, so that automatic agents can do useful tasks of finding information, logical inference and negotiating transactions. Therefore work on the Semantic Web has emphasised tools for describing meanings such as RDF Schema and DAML+OIL.
[0258] The Resource Description Framework (RDF) was designed to be semantically transparent--so that an automated agent can extract and use information from any RDF document, provided the agent has knowledge of the RDF Schemas used by the RDF. For RDF documents, therefore, access by automated agents is a realisable goal.
[0259] However, RDF is designed primarily to represent metadata--information about information resources on the web. This is how RDF tends to be used, so the semantic transparency and automated processing extends only to metadata in RDF. It is widely recognised (e.g. Berners-Lee 1999) that XML itself does not have this semantic transparency--precisely because XML can represent meaning in many different ways.
[0260] Therefore as it stands, automated agents cannot access the information in (non-RDF) XML documents. They cannot step outside the RDF world to access the information in the bulk of XML documents on the web. This severely limits the ability of automated agents to access the information they need.
[0261] MDL can remove the restriction. If the authors of an XML language define its meaning in MDL, then (as described in previous sections) an automated software agent can access the information in any document in the language--greatly extending the power of automated agents.
[0262] We can illustrate this by a typical usage scenario for the Semantic Web. I hear from a friend about some Norwegian ski boots, but do not know the name of the manufacturer. I want to buy them over the web. My software agent finds the leading ontologies (RDF Schema based) used to describe WWW retail sites. From these ontologies it learns that ski boots are a subclass of footwear and of sports gear, and that to buy footwear you need to specify a foot size. It then inspects the RDF descriptions (metadata) of several online catalogues. The catalogues themselves are accessible in XML, whose MDL definitions are all referenced to the same RDF Schema. From the RDF, my agent identifies those catalogues which contain information about the kind of goods I want.
[0263] The agent then needs to retrieve information of the form `footwear from manufacturer based in Norway who makes sports gear`--applying the same retrieval criteria to several XML-based catalogues, which use different XML languages, and very different representations of the associations [manufacturer]makes[product], [manufacturer]based in[country] and so on. The only automated way to make these retrievals is to know the XPaths needed to retrieve the associations from the different XML languages. The MDL definitions of the languages provide just this information, enabling my software agent to retrieve and compare what it needs from the different catalogues.
[0264] Thus the agent uses a two-stage process: (1) access RDF metadata to find out which catalogues are relevant, and (2) using MDL, access the XML catalogues themselves and extract the required information. This two-stage process is much more powerful than the first stage alone, as enabled by RDF on its own.
[0265] In summary, realising the Semantic Web will require not only semantics, but also a bridge between semantics and XML structure. MDL provides that bridge.
[0266] 6. Documentation and validation
[0267] There are two other important applications of MDL which we have not described in this section, but will briefly mention:
[0268] The MDL for an XML language serves as a precise form of documentation of what the language authors intend it to mean, and how it is intended to convey that meaning. Since the language authors' intentions are not always clear from the schema and associated documentation, this extra documentation can be very useful.
[0269] Since MDL forms a bridge between meaning and structure, an MDL file can be validated against the definition of possible meanings (e.g. a DAML+OIL class model), against the definition of XML structure (e.g. an XML Schema), or against both together. This validation forms a very useful check that the XML is capable of conveying the meanings which the language authors intended. We have found that in many cases, the XML structure does not match up precisely with the intended meanings; these validation checks will frequently produce useful warnings.
[0270] 7. The Meaning-Level Approach to XML
[0271] We can summarise the potential impact of MDL as follows: MDL will enable both applications and users to interface to XML at the level of its meaning, rather than its structure.
[0272] Using MDL, users and application designers need not be concerned with the details of XML structure--with elements, attributes, nesting structure and paths through a document. They can think purely in terms of the meaning of the document (the objects, properties and associations it represents) and leave it to MDL-based tools to deal with document structure. These tools will automatically navigate the XPaths necessary to extract meaning from structure.
[0273] This meaning-level approach to XML has tremendous advantages--allowing users and developers to think at the level of meaning, which they understand; freeing them from the need to understand XML document structures, which may be extremely complex; and allowing us to develop any application once and then adapt it automatically, via MDL, to new XML languages in its domain.
[0274] We believe that as XML languages continue to proliferate, the benefits of the meaning-level approach will become overwhelming. In time, all access to XML documents will move to the level of meaning rather than structure. There are many precedents for this move in the history of programming. There is an almost inevitable tendency to move up from structural, implementation-level tools to application-level, meaning-level development tools. The whole progress from assembler languages to high level languages, then to `fourth generation` languages is an example of this trend. Another example comes from databases.
[0275] In the 1970s databases were based on a Codasyl navigational model, which exposed a pointer-based database structure to users and application developers. To get at information you had to grapple with database structure, following the pointers. Relational Databases and SQL removed this tight structure dependence of data, enabling us to view data in more structure-independent ways. This was such an advance that it swept the Codasyl database model into history.
[0276] In the next few years, we will make similar advances in how we regard XML documents, seeing them in terms of their information content rather than structure. Structure-centred views of XML may become history, just as Codasyl databases are now history. MDL can be the key tool to enable this meaning-level view of XML.
[0277] Demonstration programs for the MDL-based meaning-level API to XML, and the meaning-level query language, are available (as Java source code and jar files, with sample XML and MDL files) from http://www.charteris.com/mdl.
[0278] This detailed description concludes with an Appendix 1, which is the User Guide to an implementation of the present invention known as the XMuLator(TM). Appendix 1 should be consulted for a detailed discussion of the following points:
[0279] Solving the XML Interoperability problem
[0280] The Model of business meanings
[0281] Building a business information model
[0282] Capturing the syntax of XML schemas
[0283] Recording how XML represents business information
[0284] Generating and using XSLT transformations
[0285] Building the business process model
[0286] Installing and running XMuLator(TM)
[0287] Utilities
[0288] Appendix A: Sample XSL Transformation
[0289] Appendix B: XMuLator Database Schema
[0290] Appendix C: Mapping Rules
[0291] The remainder of this section of the Detailed Description will focus on the transformation algorithm.
[0292] Generating Translations
[0293] In this section a preferred embodiment of generating the translations is given. This describes the essence of the algorithm.
[0294] XMuLator Algorithm Outline
[0295] The information input to the transformation generation algorithm consists of three main parts:
[0296] 1. The business information model, consisting of the definitions of classes of entities, attributes of those entities and the relations of those entities. The information content of these is just what the user inputs. This is stored in a relational database in three main tables--one for classes (including the class hierarchy, defined by storing a superclass in each class record), one for attributes and one for relations. The same information could of course be stored in an object-oriented database or in other forms. Generically, business information classes, attributes and relations will be referred to as "business model objects". Business model objects are examples of business information model logical structures.
[0297] 2. The definitions of XML-based languages, consisting of information automatically extracted from their DTDs or XDR files (and in future, XML schemas). Generically, a DTD or XDR or XML schema will be referred to as a "schema". The schema information is stored in relational form, in three main tables--one for the element types in the schema, one for the attributes and one for the content model links (in a schema, the content model of an element defines how other elements are nested inside it--what element types are allowed, any ordering and occurrence constraints, etc). One content model link is stored for every element type that can be nested immediately inside another element type. The whole of the information in a schema, including the allowed orders of elements in an element, can be reconstructed from what is stored in the three tables. Generically XML element types, attribute types and content model links will be referred to as "XML objects". XML objects are examples of XML logical structures.
[0298] 3. The definitions of how each XML-based language represents information in the business information model. One XML object (element, attribute or content model link) can represent one or more business model objects (class, attribute or relation). When it does so, there is said to be a "mapping" from the XML object to the business model object. These mappings are stored in three main tables--one defining which business model entities of a given class are represented by which XML objects, one defining which business model attributes are represented by which XML objects, and a third doing the same for business model relations. These tables contain supplementary information about how the XML object represents the business model object. The complete information content of these tables is defined by the user input.
[0299] The storage of these objects in relational tables is not a necessary part of the algorithm. In practice all this information is held in the main memory of the computer (for instance, as Java objects which are instances of Java classes) for the duration of the calculation which generates the XSLT. In some implementations, these Java objects can be created from information read in from files (typically XML files) rather than from a relational database.
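As an illustration of the three inputs held in memory, the following is a minimal sketch in Python rather than Java. The class names `BusinessModel`, `Schema` and `Mappings`, their field names, and the sample `course`/`student` data are assumptions made for this example only; the supplementary mapping information mentioned above is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class BusinessModel:
    # class name -> superclass name (None for a class with no superclass)
    classes: Dict[str, Optional[str]] = field(default_factory=dict)
    # (class name, attribute name)
    attributes: List[Tuple[str, str]] = field(default_factory=list)
    # (class name, relation name, class name)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)


@dataclass
class Schema:
    # element type names
    elements: List[str] = field(default_factory=list)
    # (element type name, XML attribute name)
    attributes: List[Tuple[str, str]] = field(default_factory=list)
    # (outer element type, element type nested immediately inside it)
    content_links: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class Mappings:
    # Each entry pairs an XML object with a business model object it represents.
    # One XML object may appear in several entries.
    classes: List[Tuple[str, str]] = field(default_factory=list)
    attributes: List[Tuple[str, str]] = field(default_factory=list)
    relations: List[Tuple[str, str]] = field(default_factory=list)


# Hypothetical sample data, loosely following the `6`-suffixed input language.
model = BusinessModel(
    classes={"course": None, "student": None},
    attributes=[("course", "course name")],
    relations=[("student", "attends", "course")],
)
schema = Schema(
    elements=["schools6", "course6", "student6"],
    attributes=[("course6", "name6"), ("student6", "attends6")],
    content_links=[("schools6", "course6"), ("schools6", "student6")],
)
mappings = Mappings(
    classes=[("course6", "course"), ("student6", "student")],
    attributes=[("course6/@name6", "course:course name")],
    relations=[("student6/@attends6", "[student]attends[course]")],
)
```

The same three groups could equally be populated from relational tables or from XML files; only the in-memory form matters to the generation step.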
[0300] Consider a translation between two XML-based languages (sources) called the input and the output source respectively. If an element of type A of the input represents entities of some class X, while some element type B in the output represents entities of a class Y, and Y is a superclass of X, then it may be possible to transform the input elements A into output elements B. This is possible because every X is a Y. But transformation is generally not possible the other way round because a Y may not be an X.
[0301] Before starting to generate the XSL, the algorithm constructs a set of quadruples {output element, output class, input class, input element} where the input element represents the input class, the output element represents the output class, and the output class is equal to the input class or is a superclass of the input class.
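The construction of the quadruples can be sketched as follows. This is a minimal Python illustration, not the embodiment's own code: the function names, the dictionary-based inputs and the `evening_course` subclass are assumptions introduced for the example, and the class hierarchy is assumed to be free of cycles.

```python
def is_same_or_superclass(candidate, cls, superclass_of):
    """Return True if `candidate` equals `cls` or is an ancestor of `cls`."""
    current = cls
    while current is not None:
        if current == candidate:
            return True
        current = superclass_of.get(current)  # climb one level in the hierarchy
    return False


def build_quadruples(input_map, output_map, superclass_of):
    """Build (output element, output class, input class, input element) tuples.

    input_map / output_map: element type name -> business class it represents.
    superclass_of: class name -> superclass name, or None at the top.
    """
    quads = []
    for out_elem, out_cls in output_map.items():
        for in_elem, in_cls in input_map.items():
            # Keep the pair only if every input entity is also an output entity.
            if is_same_or_superclass(out_cls, in_cls, superclass_of):
                quads.append((out_elem, out_cls, in_cls, in_elem))
    return quads


# Hypothetical hierarchy: an evening course is a kind of course.
superclass_of = {"course": None, "evening_course": "course", "student": None}
input_map = {"course6": "evening_course", "student6": "student"}
output_map = {"course2": "course", "student2": "student"}

quads = build_quadruples(input_map, output_map, superclass_of)
```

Swapping the two maps drops the course pair, because a `course` is not necessarily an `evening_course`; this is the asymmetry described in paragraph [0300].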
[0302] Content-bearing elements are those elements which represent business model objects. Wrapper elements are those elements which are not content-bearing, but which have to be traversed to get to content-bearing elements. In the output XML, they appear wrapped around the content-bearing elements.
[0303] The translation generation algorithm traverses the output tree structure as defined by the output XML schema. The traversal is not a pure recursive descent, but has recursive descent parts (mainly to navigate through wrapper elements). This generates XSL which will create output XML with the output tree structure, obeying the ordering constraints of the output XML schema. As it navigates the output tree, at each stage the algorithm works out which nodes in the input tree (if any) contain the required information. It creates XSL to (a) navigate the input tree from the current input node to find those nodes (using XPath syntax), and (b) extract information from those nodes (e.g. values of attributes) to include in the output XML.
[0304] The generated XSL consists of a set of templates. There is one template for the top-level element type of the output XML, and one template for each output element type which represents a business model class. If output element A is nested inside element B, then the template for B contains an xsl:apply-templates node to apply the template for A, generating the instances of A nested inside the instances of B in the output XML. The templates for A and B are both attached to the root element of the XSL document, so the XSL tree is flatter than the XML tree it will create. Other templates are also generated to fill in details of relations and attributes.
[0305] A typical template for the top-level element, as generated by the algorithm, is:
    <!-- Outermost wrapper node -->
    <xsl:template match="/schools6">
      <schools2>
        <xsl:apply-templates select="course6" mode="main"/>
      </schools2>
    </xsl:template>
[0306] In this example, all output elements and attributes have names ending in `2`, while all input elements and attributes have names ending in `6`. The top-level template simply calls templates for all elements which represent entities and which appear at the next-to-top level in the output. Comments are always written as <!-- comment --> (this is standard XML).
[0307] The XSL is first generated as a DOM tree, which is then written out as a text file. (DOM=Document Object Model, a W3C standard for internal program representation of XML. XSL is a form of XML and so can be represented this way). Thus instead of having to write out the two <xsl:template> lines with two <schools2> lines between them, the algorithm has to attach an `xsl:template` node to the root of the XSL document, and then attach a `schools2` node to the `xsl:template` node. Writing out this tree then produces the nested text, as in the example. This is standard practice, supported by DOM-compliant XML parsers.
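The node-attachment steps just described can be illustrated with a standard DOM API. The sketch below uses Python's `xml.dom.minidom` purely as an example of a DOM implementation; the function name `build_top_level_stylesheet` is hypothetical, and the `xsl:stylesheet` root with `version="1.0"` is an assumption about the surrounding XSL document.

```python
from xml.dom.minidom import getDOMImplementation

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def build_top_level_stylesheet():
    """Build the top-level template as a DOM tree instead of as text."""
    impl = getDOMImplementation()
    doc = impl.createDocument(XSL_NS, "xsl:stylesheet", None)
    root = doc.documentElement
    # minidom does not emit namespace declarations itself, so add one explicitly.
    root.setAttribute("xmlns:xsl", XSL_NS)
    root.setAttribute("version", "1.0")

    # Attach an xsl:template node to the root of the XSL document.
    template = doc.createElementNS(XSL_NS, "xsl:template")
    template.setAttribute("match", "/schools6")
    root.appendChild(template)

    # Attach the output element node to the template node.
    schools2 = doc.createElement("schools2")
    template.appendChild(schools2)

    # Attach the call to the entity-representing templates.
    apply_node = doc.createElementNS(XSL_NS, "xsl:apply-templates")
    apply_node.setAttribute("select", "course6")
    apply_node.setAttribute("mode", "main")
    schools2.appendChild(apply_node)
    return doc


doc = build_top_level_stylesheet()
xsl_text = doc.toprettyxml(indent="  ")  # writing out the tree yields nested text
```

Serialising the tree with `toprettyxml` produces the nested `xsl:template` and `schools2` lines without the generator ever having to balance opening and closing tags itself.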
[0308] For simplicity, assume the output has one top-level element type `ot`, and the input has one top-level element type `it`. With many details left out for clarity, the algorithm to generate the top-level tree is to call topTemplate(ot, it), where:

    topTemplate(e,g)
    {
      [attach to root] xsl:template node match = g;
      [attach to template] XSL node e (to generate e in the output XML);
      for each content model (CM) link in e:
      {
        f = output element inside the CM link;
        if (f is a wrapper element) topTemplate(f,g);
        else if (f represents class C) and (input element h represents C or a subclass D)
        {
          [attach to template] xsl:apply-templates select = (input path from g to h);
        }
      }
    }
[0309] For every output element f which represents a class C, and for which there is an input element h representing C or a subclass D, the algorithm generates a template. A typical one of these entity-representing templates is:
    <!-- Entity `course` -->
    <xsl:template match="course6" mode="main">
      <course2>
        <!-- Attribute `course:course name` -->
        <xsl:attribute name="id2">
          <xsl:value-of select="@name6"/>
        </xsl:attribute>
        <!-- Relation [student]attends[course] -->
        <xsl:apply-templates select="parent::schools6/student6[contains(@attends6,current()/@id)]" mode="main"/>
      </course2>
    </xsl:template>
[0310] The XPath expressions which navigate the input tree are those such as `parent::schools6/student6`. These entity-representing templates are created by calls to classTemplate(f,h):
    classTemplate(f,h)
    {
      [attach to root] xsl:template node match = h;
      [attach to template] XSL node f;
      for each XML attribute ao in f:
      {
        if (ao represents attribute A) and (input XML object ai represents A):
        {
          [attach to f] xsl:attribute ao;
          [attach to attribute] xsl:value-of select = (input path from h to ai)
        }
        else if (ao represents relation R) and (input XML object ai represents R)
        {
          [attach to f] xsl:attribute ao;
          [attach to attribute] xsl:apply-templates select = (input path from h to ai, with [conditions defining R])
        }
      }
      doContentLinks(f,h);
    }

    doContentLinks(f,h)
    {
      (f represents class C; h represents class D)
      for each content model link L in f (traversed in schema order)
      {
        g = output element inside CM link;
        if (g is a wrapper) doContentLinks(g,h)
        else if (g represents attribute A) and (input XML object ai represents A):
        {
          [attach to f] XSL node g;
          [attach to g] xsl:value-of select = (input path from h to ai);
        }
        else if (g represents class E) and (input XML object ai represents subclass F)
                 and (L represents relation R between C and E)
                 and (input object ri represents R between D and F)
        {
          [attach to f] xsl:apply-templates select = (input path from h to ai with [conditions defining R]);
        }
        else if (g represents relation R) and (input object ri represents R)
        {
          [attach to f] XSL node g;
          [attach to g] xsl:apply-templates select = (input path from h to ai with [conditions defining R]) mode = `relationx`;
          [attach to root] xsl:template match = ai, mode = `relationx`;
          for each (property used to identify the entity at other end of relation)
          {
            [attach to template] xsl:value-of select = (property);
          }
        }
      }
    }
[0311] These descriptions of the algorithm are highly simplified, with many details omitted to concentrate on the main principles.
[0312] Variations of the Above Embodiment
[0313] In the above embodiment, the algorithm operates in a manner analogous to that of a compiler, and in particular uses the technique known as `recursive descent`. The same effect could be achieved by using other compiler techniques, such as table driven or stack based, in which the recursion is `unwound`. Other translation approaches are also possible: the next section discusses a direct translation embodiment.
[0314] A Direct Translation Embodiment
[0315] In this embodiment, rather than outputting a text XSL file which is used by a separate XSL processor, the transformation information is used `in situ` to translate XML on the fly. In many cases this might be a very sensible thing to do anyway. A procedure or algorithm to accomplish this is now described.
[0316] 1. The XSL is generated as described elsewhere in this patent specification, and stored in memory.
[0317] 2. Read the input XML to form a DOM tree of the input XML.
[0318] 3. Create the root of an output XML DOM tree.
[0319] 4. Navigate around the XSL DOM tree (using a standard DOM API, and perhaps using a `visitor` design pattern), and at every node follow the instructions on that node: traverse part of the input tree, read a value from the input tree, apply a template, create part of the output tree, and so on.
[0320] 5. Write the output DOM tree to a file.
[0321] In a typical example of this direct translation embodiment, the translator program reads in XML-based definitions of the mappings onto the business information model for each language. These XML-based definitions include definitions of the XPaths to be navigated in each XML language to extract each kind of information in the business information model. When generating a piece of the output XML, the translator looks up what kind of business information that piece of output XML conveys, looks up the XPaths in the input XML needed to extract the same information, follows those paths in the input XML to extract the values of the information, and inserts those values in the output XML.
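The lookup-and-copy behaviour described in this paragraph can be sketched as follows. This is a minimal Python illustration under stated assumptions: the mapping tables `INPUT_MAP` and `OUTPUT_MAP`, their keys, and the function `translate` are hypothetical, the mappings are written inline rather than read from XML definition files, and only one class with one attribute is handled. `xml.etree.ElementTree` supports only a limited subset of XPath, which is enough here.

```python
import xml.etree.ElementTree as ET

# Hypothetical mappings: business model object -> where each language holds it.
INPUT_MAP = {
    "class:course": "course6",           # path from the input root element
    "attr:course:course name": "name6",  # XML attribute on that element
}
OUTPUT_MAP = {
    "root": "schools2",
    "class:course": "course2",
    "attr:course:course name": "id2",
}


def translate(input_xml):
    """Translate input XML to output XML by way of the shared business model."""
    in_root = ET.fromstring(input_xml)
    out_root = ET.Element(OUTPUT_MAP["root"])
    # For each entity of class `course` found in the input...
    for in_elem in in_root.findall(INPUT_MAP["class:course"]):
        # ...create the output element that represents the same class...
        out_elem = ET.SubElement(out_root, OUTPUT_MAP["class:course"])
        # ...and copy the value that conveys the same business attribute.
        value = in_elem.get(INPUT_MAP["attr:course:course name"])
        if value is not None:
            out_elem.set(OUTPUT_MAP["attr:course:course name"], value)
    return ET.tostring(out_root, encoding="unicode")


sample = '<schools6><course6 name6="Maths"/><course6 name6="Physics"/></schools6>'
result = translate(sample)
```

Neither map refers to the other language; both refer only to business model objects, so supporting a further language in this sketch would mean adding one more map rather than one per language pair.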
[0322] A Code Generation Embodiment
[0323] In this embodiment, the algorithm does not generate an XSL DOM tree or output file, but generates code in some programming language such as Java, C++ or Visual Basic for inclusion in a computer application. The computer application can then receive and send XML messages in the XML-based language, but can manipulate the information from the messages in terms of the classes, attributes and relations of the business information model--thus insulating the application from changes in the XML-based language.
[0324] In a Java-based implementation of this embodiment, the algorithm generates source code for a set of Java classes which correspond to the classes of the business information model. An XML parser is included in the application to read in external XML files to an internal DOM tree form, and vice versa. To read information from an input message in some XML-based language, each Java class contains code which can traverse the DOM tree of the input XML message so as to read the information which the message conveys about entities of the class, their relations and attributes, and converts that information into a form which is independent of the XML-based language. The Java class makes this information available to the rest of the application.
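To give a feel for what one generated class might look like, the following sketch shows the idea in Python rather than Java, for brevity. The class `Course`, its constants and its `read_all` method are hypothetical and are not taken from the embodiment; the example assumes the `6`-suffixed language, in which the business attribute `course name` is carried by the XML attribute `name6`.

```python
from xml.dom.minidom import parseString


class Course:
    """Hypothetical generated class for the business model class `course`."""

    # Language-specific details are confined to these constants, so a change
    # of XML-based language would regenerate them without touching callers.
    ELEMENT = "course6"
    NAME_ATTRIBUTE = "name6"

    def __init__(self, course_name):
        # Stored under the business model name, not the XML attribute name.
        self.course_name = course_name

    @classmethod
    def read_all(cls, dom):
        """Traverse the DOM tree of an input message and return its courses."""
        return [
            cls(elem.getAttribute(cls.NAME_ATTRIBUTE))
            for elem in dom.getElementsByTagName(cls.ELEMENT)
        ]


message = '<schools6><course6 name6="Maths"/><course6 name6="Physics"/></schools6>'
courses = Course.read_all(parseString(message))
names = [c.course_name for c in courses]
```

The rest of the application would use only `course_name`, and so would not need to know which element or attribute names the message used.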