United States Patent Application20020133484
Kind CodeA1
Chau, Hoang K. ; et al.September 19, 2002

Storing fragmented XML data into a relational database by decomposing XML documents with application specific mappings
Abstract
A technique is provided for creating metadata for fast search of XML documents stored as column data. Data is stored in a data store connected to a computer. A main table is created having a column for storing a document, wherein the document has one or more elements or attributes. One or more side tables are created, wherein each side table stores one or more elements or attributes. Then, the side tables are used to locate data in the main table with scalable indexing mechanisms to facilitate search. A technique is provided for generating one or more XML documents from a single SQL query. Data stored on a data storage device that is connected to a computer is transformed. A query that selects data in the data storage device is received. The selected data is retrieved into a work space. Then, one or more XML documents are generated to consist of the selected data. A technique is provided for generating one or more XML documents from a relational database using the XPath data model. Data stored on a data storage device that is connected to a computer is transformed. Initially, a document object model tree is generated using a document access definition, which defines the mapping between an XML tree structure and relational tables. The document object model tree is traversed to obtain information to retrieve relational data. The relational data is mapped to one or more XML documents. A technique is provided to store fragmented XML data into a relational database by decomposing XML documents with application specific mappings. Data stored on a data store that is connected to a computer is transformed. Initially, an XML document containing XML data is received. A document access definition that identifies one or more relational tables and columns is received. The XML data is mapped from the application DTD to the relational tables and columns using the document access definition based on the XPath data model.

Inventors:Chau; Hoang K. (Sunnyvale, CA), Cheng; Isaac Kam-Chak  (San Jose, CA), Cheng; Josephine Miu  (San Jose, CA), Chiu; Suet Mui  (San Jose, CA), Chow; Jyh-Herng  (San Jose, CA), Pauser; Michael Leon  (Morgan Hill, CA), Xu; Jian  (San Jose, CA)
Correspondence Name and Address:Gates & Cooper LLP Howard Hughes Center 6701 Center Drive West, Suite 1050
Attention: George H. Gates
Los Angeles
CA
90045
US
Series Code:062069
Filed:January 31, 2002
U.S. Current Class:707/3
U.S. Class at Publication:707/3
Intern'l Class:G06F 007/00

Claims


What is claimed is:
1. A method of locating data in a data store connected to a computer, the method comprising the steps of: creating a main table having a column for storing a document, wherein the document has one or more elements or attributes; creating one or more side tables, wherein each side table stores one or more elements or attributes; and using the side tables to locate data in the main table.

2. The method of claim 1, wherein the document in the column is an extensible markup language document.

3. The method of claim 1, wherein one or more side tables are created after the column for storing the document is enabled.

4. The method of claim 1, further comprising generating the side tables using a data access definition.

5. The method of claim 4, further comprising providing a graphical user interface to enable a user to create the data access definition.

6. The method of claim 1, further comprising converting the elements or attributes to SQL data types.

7. The method of claim 1, further comprising generating one or more triggers to provide synchronization between the main table and side tables.

8. The method of claim 7, wherein a trigger is activated upon data being inserted into the column for storing a document.

9. The method of claim 7, wherein a trigger is activated upon data being modified in the column for storing a document.

10. The method of claim 1, wherein data is located using a location path.

11. The method of claim 1, further comprising creating an index on each side table.

12. The method of claim 11, wherein locating data in the main table further comprises: searching for the data in the side tables using the indexes; and mapping data located in the side tables to data in the main table.

13. The method of claim 1, further comprising searching from a join view.

14. The method of claim 1, further comprising receiving a query on one or more side tables and using the side tables to locate data in the queried side tables.

15. The method of claim 1, further comprising using an extracting user-defined function to locate data.

16. The method of claim 1, further comprising searching on an element or attribute with multiple occurrences.

17. The method of claim 1, further comprising performing a text search on the document.

18. The method of claim 1, further comprising performing a range search.

19. The method of claim 1, further comprising enabling the column.

20. The method of claim 1, further comprising disabling the column.

21. An apparatus for locating data in a data store, comprising: a computer having a data store coupled thereto, wherein the data store stores data; and one or more computer programs, performed by the computer, for creating a main table having a column for storing a document, wherein the document has one or more elements or attributes, for creating one or more side tables, wherein each side table stores one or more elements or attributes, and for using the side tables to locate data in the main table.

22. The apparatus of claim 21, wherein the document in the column is an extensible markup language document.

23. The apparatus of claim 21, wherein one or more side tables are created after the column for storing the document is enabled.

24. The apparatus of claim 21, further comprising generating the side tables using a data access definition.

25. The apparatus of claim 24, further comprising providing a graphical user interface to enable a user to create the data access definition.

26. The apparatus of claim 21, further comprising converting the elements or attributes to SQL data types.

27. The apparatus of claim 21, further comprising generating one or more triggers to provide synchronization between the main table and side tables.

28. The apparatus of claim 27, wherein a trigger is activated upon data being inserted into the column for storing a document.

29. The apparatus of claim 27, wherein a trigger is activated upon data being modified in the column for storing a document.

30. The apparatus of claim 21, wherein data is located using a location path.

31. The apparatus of claim 21, further comprising creating an index on each side table.

32. The apparatus of claim 31, wherein locating data in the main table further comprises: searching for the data in the side tables using the indexes; and mapping data located in the side tables to data in the main table.

33. The apparatus of claim 21, further comprising searching from a join view.

34. The apparatus of claim 21, further comprising receiving a query on one or more side tables and using the side tables to locate data in the queried side tables.

35. The apparatus of claim 21, further comprising using an extracting user-defined function to locate data.

36. The apparatus of claim 21, further comprising searching on an element or attribute with multiple occurrences.

37. The apparatus of claim 21, further comprising performing a text search on the document.

38. The apparatus of claim 21, further comprising performing a range search.

39. The apparatus of claim 21, further comprising enabling the column.

40. The apparatus of claim 21, further comprising disabling the column.

41. An article of manufacture comprising a program storage medium readable by a computer and embodying one or more instructions executable by the computer to perform method steps for locating data in a data store connected to the computer, the method comprising the steps of: creating a main table having a column for storing a document, wherein the document has one or more elements or attributes; creating one or more side tables, wherein each side table stores one or more elements or attributes; and using the side tables to locate data in the main table.

42. The article of manufacture of claim 41, wherein the document in the column is an extensible markup language document.

43. The article of manufacture of claim 41, wherein one or more side tables are created after the column for storing the document is enabled.

44. The article of manufacture of claim 41, further comprising generating the side tables using a data access definition.

45. The article of manufacture of claim 44, further comprising providing a graphical user interface to enable a user to create a data access definition.

46. The article of manufacture of claim 41, further comprising converting the elements or attributes to SQL data types.

47. The article of manufacture of claim 41, further comprising generating one or more triggers to provide synchronization between the main table and side tables.

48. The article of manufacture of claim 47, wherein a trigger is activated upon data being inserted into the column for storing a document.

49. The article of manufacture of claim 47, wherein a trigger is activated upon data being modified in the column for storing a document.

50. The article of manufacture of claim 41, wherein data is located using a location path.

51. The article of manufacture of claim 41, further comprising creating an index on each side table.

52. The article of manufacture of claim 51, wherein locating data in the main table further comprises: searching for the data in the side tables using the indexes; and mapping data located in the side tables to data in the main table.

53. The article of manufacture of claim 41, further comprising searching from a join view.

54. The article of manufacture of claim 41, further comprising receiving a query on one or more side tables and using the side tables to locate data in the queried side tables.

55. The article of manufacture of claim 41, further comprising using an extracting user-defined function to locate data.

56. The article of manufacture of claim 41, further comprising searching on an element or attribute with multiple occurrences.

57. The article of manufacture of claim 41, further comprising performing a text search on the document.

58. The article of manufacture of claim 41, further comprising performing a range search.

59. A method of transforming data stored on a data storage device that is connected to a computer, the method comprising: receiving a query that selects data in the data storage device; retrieving the selected data into a work space; and generating one or more XML documents to consist of the selected data.

60. The method of claim 59, wherein the work space comprises a table having one or more columns and wherein the one or more XML documents are generated by mapping each column to an element or attribute of one of the XML documents.

61. The method of claim 59, wherein the one or more XML documents are generated using a data access definition.

62. The method of claim 61, further comprising using, a document type definition to validate the one or more XML documents.

63. The method of claim 61, further comprising using a document type definition to prepare the document access definition.

64. The method of claim 61, wherein the document access definition further comprises an Extensible Markup Language Path data model based definition of the one or more XML documents to be generated.

65. The method of claim 59, wherein the work space comprises a table and further comprising mapping column names of a table to equivalence classes.

66. The method of claim 65, wherein the equivalence classes are defined by a user.
67 The method of claim 65, wherein the equivalence classes are defined by a heuristic approach.
68. The method of claim 59, further comprising removing duplicates from the selected data.
69. The method of claim 59, wherein generating one or more XML documents comprises using an Xcollection definition that defines how to compose the one or more XML documents from the retrieved selected data.
70. The method of claim 68, wherein the Xcollection definition is contained in a data access definition.
71. The method of claim 68, wherein the Xcollection definition comprises an SQL_query element.
72. The method of claim 59, further comprising, prior to retrieving data, parsing a document access definition.
73. The method of claim 59, wherein the one or more XML documents are generated using a query, XML composition stored procedures, and a document access definition.
74. The method of claim 59, wherein the data to generate one or more XML documents is stored in an XML collection.
75. The method of claim 59, wherein the one or more XML documents are shared between businesses.
76. The method of claim 59, wherein the one or more XML documents are generated b y stored procedures.
77. The method of claim 59, wherein the stored procedures can be called from database client code.
78. An apparatus for transforming data, comprising: a computer having a data store coupled thereto, wherein the data store stores data; and one or more computer programs, performed by the computer, for receiving a query that selects data in the data storage device, retrieving the selected data into a work space, and generating one or more XML documents to consist of the selected data.
79. The apparatus of claim 78, wherein the work space comprises a table having one or more columns and wherein the one or more XML documents are generated by mapping each column to an element or attribute of one of the XML documents.
80. The apparatus of claim 78, wherein the one or more XML documents are generated using a data access definition.
81. The apparatus of claim 80, further comprising using a document type definition to validate the one or more XML documents.
82. The apparatus of claim 80, further comprising using a document type definition to prepare the document access definition.
83. The apparatus of claim 80, wherein the document access definition further comprises an Extensible Markup Language Path data model based definition of the one or more XML documents to be generated.
84 The apparatus of claim 78, wherein the work space comprises a table and further comprising creating mapping column names of a table to equivalence classes.
85. The apparatus of claim 84, wherein the equivalence classes are defined by a user.
86. The apparatus of claim 84, wherein the equivalence classes are defined by a heuristic approach.
87. The apparatus of claim 78, further comprising removing duplicates from the selected data.
88. The apparatus of claim 78, wherein generating one or more XML documents comprises using an Xcollection definition that defines how to compose the one or more XML documents from the retrieved selected data.
89. The apparatus of claim 88, wherein the Xcollection definition is contained in a data access definition.
90. The apparatus of claim 88, wherein the Xcollection definition comprises an SQL query element.
91. The apparatus of claim 78, further comprising, prior to retrieving data, parsing a document access definition.
92. The apparatus of claim 78, wherein the one or more XML documents are generated using a query, XML composition stored procedures, and a document access definition.
93. The apparatus of claim 78, wherein the data to generate one or more XML documents is stored in an XML collection.
94. The apparatus of claim 78, wherein the one or more XML documents are shared between businesses.
95. The apparatus of claim 78, wherein the one or more XML documents are generated by stored procedures.
96. The apparatus of claim 78, wherein the stored procedures can be called from database client code.
97. An article of manufacture comprising a program storage medium readable by a computer and embodying one or more instructions executable by the computer to perform method steps for transforming data in a data store connected to the computer, the method comprising the steps of: receiving a query that selects data in the data storage device; retrieving the selected data into a work space; and generating one or more XML documents to consist of the selected data.
98. The article of manufacture of claim 97, wherein the work space comprises a table having one or more columns and wherein the one or more XML documents are generated by mapping each column to an element or attribute of one of the XML documents.
99. The article of manufacture of claim 97, wherein the one or more XML documents are generated using a data access definition.
100. The article of manufacture of claim 99, further comprising using a document type definition to validate the one or more XML documents.
101. The article of manufacture of claim 99, further comprising using a document type definition to prepare the document access definition.
102. The article of manufacture of claim 99, wherein the document access definition further comprises an Extensible Markup Language Path data model based definition of the one or more XML documents to be generated.
103. The article of manufacture of claim 97, further comprising mapping column names of a table to equivalence classes.
104. The article of manufacture of claim 103, wherein the equivalence classes are defined by a user.
105. The article of manufacture of claim 103, wherein the equivalence classes are defined by a heuristic approach.
106. The article of manufacture of claim 97, further comprising removing duplicates from the selected data.
107. The article of manufacture of claim 97, wherein generating one or more XML documents comprises using an an Xcollection definition that defines how to compose the one or more XML documents from the retrieved selected data.
108. The article of manufacture of claim 107, wherein the Xcollection definition is contained in a data access definition.
109. The article of manufacture of claim 107, wherein the Xcollection definition comprises an SQLquery element.
110. The article of manufacture of claim 97, further comprising, prior to ietrieving data, parsing a document access definition.
111. The article of manufacture of claim 97, wherein the one or more XML documents are generated using a query, XML composition stored procedures, and a document access definition.
112. The article of manufacture of claim 97, wherein the data to generate one or more XML documents is stored in an XML collection.
113. The article of manufacture of claim 97, wherein the one or more XML documents are shared between businesses.
114. The article of manufacture of claim 97, wherein the one or more XML documents are generated by stored procedures.
115. The article of manufacture of claim 97, wherein the stored procedures can be called from database client code.
116. A method of transforming data stored on a data storage device that is connected to a computer, the method comprising: generating a document object model tree using a document access definition: traversing the document object model tree to obtain information to retrieve relational data; and mapping the relational data to one or more XML documents.
117. The method of claim 116, wherein the document access definition defines a mapping between the relational data and one or more XML documents.
118. The method of claim 116, wherein the document object model tree comprises one or more relational database nodes.
119. The method of claim 118, wherein a relational database node comprises an attribute node that maps to a column of a relational database table.
120. The method of claim 118, wherein a relational database node comprises an element node that maps to a column of a relational database table.
121. The method of claim 118, wherein a relational database node comprises a text node that maps to a column of a relational database table.
122. The method of claim 118, wherein the relational database node identifies a relational table into which XML document data is to be stored.
123. The method of claim 118, wherein the relational database node identifies a column in a relational table that contains XML document data to be retrieved.
124. The method of claim 118, wherein the relational database node identifies one or more predicates used to select column data from a relational table to store into one or more XML documents.
125. The method of claim 118, wherein the relational database node identifies a join relationship for joining multiple tables.
126. The method of claim 125, wherein the relational database node identifies a primary and foreign key relationship for the join relationship.
127. The method of claim 116, further comprising generating queries to obtain relational data using the document object model tree.
128. The method of claim 116, wherein the relational data comprises an attribute value to be written to an XML document.
129. The method of claim 116, wherein the relational data comprises element text to be written to an XML document.
130. The method of claim 116, further comprising a stored procedure that receives the document access definition and outputs a table populated with the one or more XML documents.
131. An apparatus for transforming data, comprising: a computer having a data store coupled thereto, wherein the data store stores data; and one or more computer programs, performed by the computer, for generating a document object model tree using a document access definition, traversing the document object model tree to obtain information to retrieve relational data, and mapping the relational data to one or more XML documents.
132. The apparatus of claim 131, wherein the document access definition defines a mapping between the relational data and one or more XML documents.
133. The apparatus of claim 131, wherein the document object model tree comprises one or more relational database nodes.
134. The apparatus of claim 133, wherein a relational database node comprises an attribute node that maps to a column of a relational database table.
135. The apparatus of claim 133, wherein a relational database node comprises an element node that maps to a column of a relational database table.
136. The apparatus of claim 133, wherein a relational database node comprises a text node that maps to a column of a relational database table.
137. The apparatus of claim 133, wherein the relational database node identifies a relational table into which XML document data is to be stored.
138. The apparatus of claim 133, wherein the relational database node identifies a column in a relational table that contains XML document data to be retrieved.
139. The apparatus of claim 133, wherein the relational database node identifies one or more predicates used to select column data from a relational table to store into one or more XML document s.
140. The apparatus of claim 133, wherein the relational database node identifies a join relationship for joining multiple tables.
141. The apparatus of claim 140, wherein the relational database node identifies a primary and foreign key relationship for the join relationship.
142. The apparatus of claim 131, further comprising generating queries to obtain relational data using the document object model tree.
143. The apparatus of claim 131. wherein the relational data comprises an attribute value to be written to an XML document.
144. The apparatus of claim 131, wherein the relational data comprises element text to be written to an XML document.
145. The apparatus of claim 131, further comprising a stored procedure that receives the document access definition and outputs a table populated with the one or more XML documents.
146. An article of manufacture comprising a program storage medium readable by a computer and embodying one or more instructions executable by the computer to perform steps for transforming data in a data store connected to the computer, comprising: generating a document object model tree using a document access definition; traversing the document object model tree to obtain information to retrieve relational data; and mapping the relational data to one or more XML documents.
147. The article of manufacture of claim 146, wherein the document access definition defines a mapping between the relational data and one or more XML documents.
148. The article of manufacture of claim 146, wherein the document object model tree comprises one or more relational database nodes.
149. The article of manufacture of claim 148, wherein a relational database node comprises an attribute node that maps to a column of a relational database table.
150. The article of manufacture of claim 148. wherein a relational database node comprises an element node that maps to a column of a relational database table.
151. The article of manufacture of claim 148, wNhlerein a relational database node comprises a text node that maps to a column of a relational database table.
152. The article of manufacture of claim 148, wherein the relational database node identifies a relational table into which XML document data is to be stored.
153. The article of manufacture of claim 148,wherein the relational database node identifies a column in a relational table that contains XML document data to be retrieved.
154. The article of manufacture of claim 148, wherein the relational database node identifies one or more predicates used to select column data from a relational table to store into one or more XML documents.
155. The article of manufacture of claim 148, wherein the relational database node identifies a join relationship for joining multiple tables.
156. The article of manufacture of claim 155, wherein the relational database node identifies a primary and foreign key relationship for the join relationship.
157. The article of manufacture of claim 146, further comprising generating queries to obtain relational data using the document object model tree.
158. The article of manufacture of claim 146, wherein the relational data comprises an attribute value to be written to an XML document.
159. The article of manufacture of claim 146, wherein the relational data comprises element text to be written to an XML document.
160. The article of manufacture of claim 146, further comprising a stored procedure that receives the document access definition and outputs a table populated with the one or more XML documents.
161. A method of transforming data stored on a data store that is connected to a computer, comprising: receiving an XML document containing XML data; receiving a document access definition that identifies one or more relational tables and columns; and mapping the XML data to the relational tables and columns using the document access definition.
162. The method of claim 161, further comprising generating a first document object model tree using the XML document.
163. The method of claim 162, wherein the first document object model tree is generated by parsing the XML document.
164. The method of claim 161, further comprising generating a second document object model tree using the document access definition.
165. The method of claim 164, wherein the second document object model tree is generated by parsing the document access definition.
166. The method of claim 161, wherein mapping further comprises: generating a first document object model tree using data from the XML document; generating a second document object model tree using a document access definition; and mapping the data from the first document object model tree into columns in one or more relational tables using the second document object model tree.
167. The method of claim 161, wherein the XML data is stored untagged into the relational tables.
168. The method of claim 161, wherein the relational tables are new tables.
169. The method of claim 161, wherein the relational tables are existing tables.
170. The method of claim 161, wherein the mapping is performed by a stored procedure.
171. An apparatus for transforming data, comprising: a computer having a data store coupled thereto, wherein the data store stores data; and one or more computer programs, performed by the computer, for receiving an XML document containing XML data, receiving a document access definition that identifies one or 5 more relational tables and columns, and mapping the XML data to the relational tables and columns using the document access definition.
172. The apparatus of claim 171, further comprising generating a first document object model tree using the XML document.
173. The apparatus of claim 172, wherein the first document object model tree is generated by parsing the XML document.
174. The apparatus of claim 171, further comprising generating a second document object model tree using the document access definition.
175. The apparatus of claim 174, wherein the second document object model tree is generated by parsing the document access definition.
176. The apparatus of claim 171, wherein mapping further comprises: generating a first document object model tree using data from the XML document: generating a second document object model tree using a document access definition; and mapping the data from the first document object model tree into columns in one or more relational tables using the second document object model tree.
177. The apparatus of claim 171, wherein the XML data is stored untagged into the relational tables.
178. The apparatus of claim 171, wherein the relational tables are new tables.
179. The apparatus of claim 171, wherein the relational tables are existing tables.
180. The apparatus of claim 171, wherein the mapping is performed by a stored procedure.
181. An article of manufacture comprising a program storage medium readable by a computer and embodying one or more instructions executable by the computer to perform steps for transforming data in a data store connected to the computer, comprising: receiving XML document containing XML data; receiving a document access definition that identifies one or more relational tables and columns; and mapping the XML data to the relational tables and columns using the document access definition.
182. The article of manufacture of claim 181, further comprising generating a first document object model tree using the XML document.
183. The article of manufacture of claim 182, wherein the first document object model tree is generated by parsing the XML document.
184. The article of manufacture of claim 181, further comprising generating a second document object model tree using the document access definition.
185. The article of manufacture of claim 184. wherein the second document object model tree is generated by parsing the document access definition.
186. The article of manufacture of claim 181, wherein mapping further comprises: generating a first document object model tree using data from the XML document; generating a second document object model tree using a document access definition; and mapping the data from the first document object model tree into columns in one or more relational tables using the second document object model tree.
187. The article of manufacture of claim 181, wherein the XML data is stored untagged into the relational tables.
188. The article of manufacture of claim 181, wherein the relational tables are new tables.
189. The article of manufacture of claim 181, wherein the relational tables are existing tables.
190. The article of manufacture of claim 181, wherein the mapping is performed by a stored procedure.

Description



PROVISIONAL APPLICATION

[0001] This application claims the benefit of U.S. Provisional Application No. 60/168,659. entitled "XML DOCUMENT PROCESSING," filed on Dec. 2, 1999, by Isaac Cheng. et al., attorney's reference number ST9-99-106, which is incorporated by reference herein.

FIELD OF THE INVENTION

[0002] This invention relates in general to computer-implemented database systems, and, in particular, to processing Extensible Markup Language (XML) documents.

BACKGROUND OF THE INVENTION

[0003] The Internet is a collection of computer networks that exchange information via Hyper Text Transfer Protocol (HTTP). The Internet computer network consists of many internet networks. Currently, the use of the Internet computer network for commercial and noncommercial uses is exploding. Via its networks, the Internet computer network enables many users in different locations to access information stored in data sources (e.g., databases) stored in different locations.

[0004] The World Wide Web (i.e., the "WWW" or the "Web") is a hypertext information and communication system used on the Internet computer network with data communications operating according to a client/server model. Typically, a Web client computer will request data stored in data sources from a Web server computer, at which Web server software resides. The Web server software interacts with an interface connected to, for example. a Database Management System ("DBMS"), which is connected to the data sources. These computer programs residing at the Web server computer will retrieve the data and transmit the data to the client computer. The data can be any type of information, including database data, static data, HTML data, or dynamically generated data.

[0005] With the fast growing popularity of the Internet and the World Wide Web (also known as "WWW" or the "Web"), there is also a fast growing demand for Web access to databases.

[0006] Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) which uses relational techniques for storing and retrieving data. Relational databases are organized into physical tables which consist of rows and columns of data. The rows are formally called tuples. A database will typically have many physical tables and each physical table will typically have multiple tuples and multiple columns. The physical tables are typically stored on random access storage devices (RASED) such as magnetic or optical disk drives for semi-permanent storage. Additionally, logical tables or "views" can be generated based on the physical tables and provide a particular way of looking at the database. A view arranges rot s in some order, without affecting the physical organization of the database.

[0007] RDBMS software using a Structured Query Language (SQL) interface is shell known in the art. The SQL interface has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO).

[0008] The SQL interface allows users to formulate relational operations on the tables either interactively, in batch files, or embedded in host languages, such as C and COBOL. SQL allows the user to manipulate the data. The definitions for SQL provide that a RDBMS should respond to a particular query with a particular set of data given a specified database content, but the technique that the RDBMS uses to actually find the required information in the tables on the disk drives is left up to the RDBMS. Typically, there will be more than one technique that can be used by the RDBMS to access the required data. The RDBMS will optimize the technique used to find the data requested in a query in order to minimize the computer time used and, therefore, the cost of performing the query.

[0009] Additionally, an index is an ordered set of references to the records or rows in a database file or table. The index is used to access each record in the file using a key (i.e., one of the fields of the record or attributes of the row). When data is to be retrieved, an index is used to locate records. Then, the data is sorted into a user-specified order and returned to the user.

[0010] Extensible Markup Language (XML) is a new specification that is quickly gaining popularity for creating what are termed "XML documents". XML documents comprise structured data. XML documents are being shared between multiple businesses and between businesses and customers.

[0011] When XML documents are stored as column data, searching for desired XML data can be time-consuming. Typically, a search for XML data would require searching each XML document. This is usually called a document scan. Thus, there is a need in the art for an improved technique for searching for XML documents stored as column data.

[0012] With the longstanding use of relational databases, many businesses have stored their data in relational tables. In order to share this data with businesses that are using XML documents, the data in the relational databases may be manually selected, retrieved. and stored into XML documents. This is a long, tedious task. Thus, there is a need for an improved technique of selecting, retrieving, and storing relational data into XML documents.

[0013] In order to share relational data with other businesses that are using XML documents, a user may manually convert the relational data into XML documents. This is time consuming and inefficient. Thus, there is a need for an improved technique of generating XML documents from relational data.

[0014] Additionally, when an XML document is received, a user may need to store the data from the XML document into a relational database. Currently, this is a time consuming processing in which a user manually transfers the data from the XML document to the relational database. Thus, there is a need for an improved technique of decomposing an XML document and storing the decomposed data into a relational database.

SUMMARY OF THE INVENTION

[0015] To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for a computer implemented technique for processing XML documents.

[0016] In accordance with one aspect of the present invention, data is stored in a data store connected to a computer. A main table is created having a column for storing a document, wherein the document has one or more elements or attributes. One or more side tables are created, wherein each side table stores one or more elements or attributes. Then, the side tables are used to locate data in the main table.

[0017] In accordance with another aspect of the present invention, data stored on a data storage device that is connected to a computer is transformed. A query that selects data in the data storage device is received. The selected data is retrieved into a work space. Then, one or more XML documents are generated to consist of the selected data.

[0018] In accordance with yet another aspect of the present invention, data stored on a data storage device that is connected to a computer is transformed. Initially. a document object model tree is generated using a document access definition. The document object model tree is traversed to obtain information to retrieve relational data. The relational data is mapped to one or more XML documents.

[0019] In accordance with a further aspect of the present invention, data stored on a data store that is connected to a computer is transformed. Initially an XML document containing XML data is received. A document access definition that identifies one or more relational tables and columns is received. The XML data is mapped from the application DTD to the relational tables and columns using the document access definition based on the XPath data model.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] Referring now to the drawings in which like reference numbers represent corresponding parts throughout:

[0021] FIG. 1 schematically illustrates the hardware environment of an embodiment of the present invention, and more particularly, illustrates a typical distributed computer system using a network.

[0022] FIG. 2 is a diagram illustrating a computer software environment that could be used in accordance with the present invention.

[0023] FIG. 3 illustrates an application or main table and its four side tables.

[0024] FIG. 4 is a flow diagram illustrating steps performed by the XML System in creating and maintaining XML document data as column data.

[0025] FIG. 5 is a flow diagram of steps performed by the XML System to enable a column.

[0026] FIG. 6 is a flow diagram of steps performed by the XML System to disable a column.

[0027] FIG. 7 is a diagram illustrating code organization to compose XML documents.

[0028] FIG. 8 is a block diagram illustrating components of the XML System in one embodiment of the invention.

[0029] FIG. 9 is a flow diagram illustrating the steps performed by the XML System to transform relational data into one or more XML documents using SQL mapping.

[0030] FIG. 10 is a flow diagram illustrating the process performed by the XML system using RDB_node mapping to compose XML documents.

[0031] FIG. 11 is a flow diagram illustrating the steps performed by the XML System to decompose XML documents with application specific mappings.

DETAILED DESCRIPTION

[0032] In the following description of an embodiment of the invention, reference is made to the accompanying drawings which form a part hereof and which is shown by way of illustration a specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized as structural changes may be made without departing from the scope of the present invention.

A. HARDWARE ARCHITECTURE

[0033] FIG. 1 schematically illustrates the hardware environment of an embodiment of the present invention, and more particularly, illustrates a typical distributed computer system using a network 100 to connect client computers 102 executing client applications to a server computer 104
executing software and other computer programs, and to connect the server system 104 to data sources 106. A typical combination of resources may include client computers 102 that are personal computers or workstations, and a server computer 104 that is a personal computer, workstation, minicomputer, or mainframe. These systems are coupled to one another by various networks, including LANs, WANs, SNA networks, and the Internet. Each client computer 102 and the server computer 104 additionally comprise an operating system and one or more computer programs.

[0034] A client computer 102 typically executes a client application and is coupled to a server computer 104 executing one or more server software. The server software may include an XML system 110. The server computer 104 also uses a data store interface and, possibly, other computer programs, for connecting to the data sources 106. The client computer 102 is bi-directionally coupled with the server computer 104
over a line or via a wireless system. In turn, the server computer 104 is bi-directionally coupled with data sources 106.

[0035] The data store interface may be connected to a Database Management System (DBMS), which supports access to a data store 106 by executing, for example, RDBMS software. The interface and DBMS may be located at the server computer 104 or may be located on one or more separate machines. The data sources 106 may be geographically distributed.

[0036] The operating system and computer programs are comprised of instructions which, when read and executed by the client and server computers 102 and 104, cause the client and server computers 102 and 104
to perform the steps necessary to implement and/or use the present invention. Generally, the operating system and computer programs are tangibly embodied in and/or readable from a device, carrier, or media, such as memory, other data storage devices, and/or data communications devices. Under control of the operating system. the computer programs may be loaded from memory, other data storage devices and/or data communications devices into the memory of the computer for use during actual operations.

[0037] Thus, the present invention may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term "article of manufacture" (or alternatively, "computer program product") as used herein is intended to encompass a computer program accessible from any computer-readable device, carrier, or media. Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope of the present invention.

[0038] Those skilled in the art will recognize that the exemplary environment illustrated in FIG. 1 is not intended to limit the present invention. Indeed, those skilled in the art will recognize that other alternative hardware environments may be used without departing from the scope of the present invention.

B. XML BACKGROUND

[0039] Extensible Markup Language (XML) is a subset of Standard Generalized Markup Language (SGML). XML works in conjunction with Extensible Stylesheet Language Transformation, (XSLT) and Extensible Markup Language Path (XPath). XML may also work in conjunction with a Document Object Model (DOM) or Namespace.

[0040] Extensible Markup Language (XML) is a subset of Standard Generalized Markup Language (SGML). XML is described in XML 1.0, found at the following web site: http://www.w3.org/TR/REC-xml. Extensible Markup Language (XML) is a set of rules or guidelines for designing text formats for structured data using tags. Additional detail may be found at the following web site: http://www.w3.org/XML/1999/XML-in-10-points. For interoperability, domain-specific tags called a vocabulary can be standardized using a Document Type Definition, so that applications in that domain understand the meaning of the tags.

[0041] Extensible Style Language Transformer or XSLT is a language for transforming XML documents into other XML documents. The XSLT specification defines the syntax and semantics of the XSLT language. XSLT-defined elements are distinguished by belonging to a specific XML namespace, which is referred to as the XSLT namespace. A transformation expressed in XSLT describes rules for transforming a source tree into a result tree. Further detail about XSLT may be found at http://www.w3.org/TR/xslt.

[0042] XML Path or XPath addresses parts of an XML document. XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document. Further detail about XML path may be found at http://www.w3.org/TR/xpath.

[0043] A Document Object Model (DOM) is a standard set of function calls for manipulating XML files from a programming language. Additional detail may be found at the following web site: http://www.w3.org/TR/REC-DOM-Leve- l-1/.

C. OVERVIEW OF THE XML SYSTEM

[0044] In one embodiment of the invention, the XML System comprises the XML Extender from International Business Machines, Corporation, of Armonk, N.Y. The XML System offers the capability of XML storage and data interchange. By storage, the XML System provides mechanisms for storing and retrieving XML documents in a relational database (e.g., DB2.RTM. from International Business Machines, Corporation) and searching the content of XML with high performance. By data interchange, the XML System provides a mapping between new and existing relational tables and XML formatted documents. Thus, the XML System allows customers to do e-business anywhere, enabling XML with Business to Business (B2B) and Business to Consumer (B2C) applications. For B2B applications, application data flows between database servers, via any network (e.g., the internet or an intranet), either directly without client interaction or indirectly via some client systems. For B2C applications, application data flows between a consumer at, for example, a workstation, and a server connected via a network (e.g., between database servers and web clients via the internet). Thus, the XML System supports Business to Business (B2B) and Business to Client (B2C) applications. In both cases, the following requirements will apply:

[0045] Performance

[0046] Scalability

[0047] Integration with existing business logic

[0048] Smart query support

[0049] Legacy file support

[0050] Developer efficiency

[0051] FIG. 2 is a diagram illustrating a computer hardware environment that could be used in accordance with the present invention. In one embodiment, the DB2 XML Extender 200, a product from International Business Machines, Corporation, is at the center of the architecture. An application program 202 and a document access definition (DAD) 204 are received by the DB2 XML Extender 200. The DB2 XML Extender 200 takes an XML document 206 as the input, stores the XML document 206 in DB2 210
(i.e., a relational database) either internally inside DB2 210 or externally on the file system as one or more XML files 208. Then the stored XML document 206 can also be retrieved from DB2 210 or the file system through the DB2 Extender 200. The processing performed by the DB2
XML Extender 200 will be described in more detail below.

[0052] In another embodiment, an application program 202 and a document access definition (DAD) 204 are received by the DB2 XML Extender 200. The DB2 XML Extender 200 takes an XML document 206 as input, decomposes the XML document 206 into fragmented data and stores the fragmented data in DB2 210 (i.e., a relational database). Then, the fragmented data stored in DB2 210 can be regenerated from DB2 210 through the DB2 Extender 200. The processing performed by the DB2 XML Extender 200 will be described in more detail below.

[0053] Those skilled in the art will recognize that the environment illustrated in FIG. 2 is not intended to limit the present invention. Indeed, those skilled in the art will recognize that other alternative hardware environments may be used without departing from the scope of the present invention.

[0054] C.1 Applications

[0055] Different types of applications can benefit from the use of the XML System. Some illustrations follow:

[0056] Business to Business (B2B) Applications for E-Commerce:

[0057] B2B applications mainly use XML as their interchange format, such as Electronic Data Interchange (EDI). The XML System enables maintaining native XML formatted documents, as well as mapping data into/from relational tables. With native XML formatted documents XML enables storing entire XML documents into a database and searching on known elements or attributes. With mapping, XML System enables an application builder who knows the relational data model of particular business tables to custom map XML content to or from existing tables.

[0058] Web Information Retrieval Applications:

[0059] These are B2C applications which are often used in interactive Web sites, such as sites for insurance and real-estate industries. The XML documents are usually not very large in size, but have structured information.

[0060] The XML System enables storing entire XML documents into a database and using SQL to do a fast search on desired XML elements or attributes with rich data types. Range search for rich data types is often important. Additionally the XML System enables retrieving data from existing business tables and from XML documents and putting them on a web site for viewing.

[0061] For example, an insurance company may set up a call center system in which agents retrieve phone calls from their customers. The information is collected, and the case is archived. The XML System is used to store entire XML documents in a database. Then, an insurance agent can easily display an insurance case on a screen. The XML System also provides a fast and powerful search of these insurance cases, so the insurance agent can quickly retrieve information while still on the phone with a customer. Additionally, alternative ways of searching for information, i.e. numbers, text wildcards, key words, etc., are provided by XML System.

[0062] Content Management:

[0063] This type of application provides advanced content management functions to a user. A user could use XML System as physical storage, and have fast search with indexing. The XML documents are usually large in size. In some cases, it is desirable to partition the XML documents into multiple pieces and perform update in place.

[0064] As an extender to DB2.RTM., XML System enhances DB2.RTM. functionality for XML enablement. That is, XML System enables use of SQL as the main access technique, along with database features of: stored procedures, user defined types (UDT) and user defined functions (UDF).

[0065] The XML System meets the following requirements:

[0066] Physical Storage: for entire document, or shredded structured data, with data types.

[0067] Support of flat files: allows data to be stored in flat files, and imported/exported to/from database.

[0068] Access via SQL: supports field search, full text search, structural search.

[0069] Indexing facility: builds on different data types for better query performance.

[0070] Updates XML element/attribute.

[0071] Mapping to/from relational: composes/decomposes XML documents from/to data store in relational tables.

[0072] NLV support: support of double byte characters.

[0073] C.2 XColumns and XCollections

[0074] XML System provides good data and metadata management solutions to handle traditional and non-traditional data. With the content of structured XML documents in a database, a user can combine structured XML information with traditional relational data. Based on the application, a user can choose whether to store entire XML documents in a database as a non-traditional distinct data type or map the XML content as traditional data in relational tables. For non-traditional XML data types, the XML System adds the power to search rich data types of XML element or attribute values. For traditional SQL data, that is either decomposed from incoming XML documents or in existing relational tables to be used to create outgoing XML documents, the XML System provides a custom mapping mechanism to allow the transformation between XML documents and relational data.

[0075] The XML System offers the flexibility to store entire XML documents as column data or transform between XML documents and data in existing tables. The transformation includes decomposing an XML document into one or multiple pieces and storing the pieces in the form of relational data, as well as, composing XML documents from the data in existing relational tables. A user can decide how structured XML documents are to be stored or created through a Document Access Definition (DAD).

[0076] The DAD itself is an XML formatted document. The DAD associates XML documents to a database through two major access and storage techniques by defining elements Xcolumn and Xcollection. Xcolumn defines how to store and retrieve entire XML documents as column data of the XML user defined type (UDT). An XML column is a column of XML System's user defined type (UDT). Applications can include an XML column in any user table. Operations on the XML column can be processed after the column is enabled with the XML System. A user can access XML column data mainly through the SQL statements and XML System's user defined function (UDF). With the different access and storage techniques, the XML System provides the flexibility of XML data storage and retrieval.

[0077] In particular, an XML column is used to store entire XML documents in the native XML format. This approach treats XML format as an non-traditional data type and offers user defined types (UDTs) and user defined functions (IDFs) for a fast, versatile, and intelligent technique for searching through XML documents. The XML System gives applications the freedom to specify a list of XML elements/attributes as general SQL data types for fast search. The XML System will extract these values from the XML documents and store them in side tables so that a user can create indices on them. The application can query these side tables or join them with the application (i.e., "main") table to do a fast search. For example, a user can input a query such as: "give me all the documents whose prices are greater than $2500.00", providing 2500.00 is the value of an XML element or attribute inside the XML documents.

[0078] The XML System provides several user defined types (UDTs) for XML columns. These data types are used to identify the storage types of XML documents in the application table. The XML System supports legacy flat files, and a user is not restricted to storing XML documents inside a database. A user can also store XML documents as files on the local or remote file system, specified by a URL or a local file name.

[0079] The XML System provides powerful user-defined function (UDF)s to store and retrieve XML documents in XML columns, as well as to extract XML element/attribute values. The UDFs are applied to XML user defined types (UDTs), thus, these are mainly used for XML columns.

[0080] An Xcollection defines how to decompose XML documents into a collection of relational tables or to compose XML documents from a collection of relational tables An XML collection is a virtual name of a set of relational tables. Applications can enable an XML collection of any user tables. These user tables can be existing tables of legacy business data or the ones newly created by the XML System. A user can access XML collection data mainly through the stored procedures provided by the XML System.

[0081] An XML collection is used to transform data between database tables and XML documents. An XML collection achieves the goal of data interchange via XML. For applications that want to compose or decompose XML documents from/into a set of relational tables, the XML System offers a technique to enable an XML collection through a Document Access Definition (DAD). In the Document Access Definition, applications can make a custom mapping between database column data in new or existing tables to XML elements or attributes. The access to an XML collection is by calling XML System's stored procedures or directly querying to the tables of the collection.

[0082] The XML System also allows overrides of query conditions explicity or implicitly defined in the DAD, by parsing the SQL or XML XPath based override parameter to the composition stored procedures. In this way, it supports dynamic query for generating XML documents.

[0083] With the XML System, an application can:

[0084] Store entire XML documents as column data in an application table, either internally or externally as a local file or URL, while extracting desired XML element or attribute values into side tables for search.

[0085] Compose or decompose contents of XML documents from/into an XML collection which consists of one or more relational tables.

[0086] Perform fast search on XML elements or attributes of SQL general data types by converting character string in XML documents to SQL data types for indexing.

[0087] Update the content of an XML element, or the value of an XML attribute.

[0088] Extract XML elements or attributes dynamically in SQL query.

[0089] Validate XML documents during insertion and update.

[0090] The XML System also serves as an XML document type definition (DTD) repository. When a database is XML enabled, a DTD Reference Table (DTD_REF) is created. Each rots of this table represents a DTD, with additional metadata information. This table is accessible by users, and allows them to insert their own DTDs. The DTDs in the DTD_REF table are used to validate XML documents and to help applications to define a document access definition (DAD).

[0091] C.3 Terminology

[0092] This section clarifies some terminology used in this specification.

[0093] The XML System uses a subset of Extensive Stylesheet Language Transformation (XSLT) and XML Path Language (XPath), Version 1.0, the W3C working draft of 06/17/1999, to identify XML elements or attributes. The content of the XPath is originally in the XSLT and now is referred by the XSLT, as a part of the stylesheet transformation language. Location path is used to define XML elements and attributes. The XSLT/XPath's abbreviated syntax of the absolute location path is used.

[0094] The following is not a formal data model, but a set of abbreviated syntax. The notation of the absolute location path with abbreviated syntax supported by the XML System is listed below.

[0095] This section clarifies some terminology used in this specification.

[0096] The XML System uses a subset of Extensive Stylesheet Language Transformation (XSLT) and XML Path Language (XPath), Version 1.0, the W3C working draft of 06/17/1999, to identify XML elements or attributes. The content of the XPath is originally in the XSLT and now it is referred to by XSLT as a part of the stylesheet transformation language. Previously, the term "path expression" was used. Now, a subset of the term location path is used in XSLT and XPath to define XML elements and attributes. The XSLT XPath's abbreviated syntax of the absolute location path is used.

[0097] The following is not a formal data model. but a set of abbreviated syntax. An absolute location path with abbreviated syntax is listed below. This is supported by the XML System. Again, these are not formal definitions.

[0098] a. "/":

[0099] Represents the XML root element.

[0100] b. "/tag1":

[0101] Represents the element tag1 under root.

[0102] c. "/tag1/tag2/ . . . /tagn":

[0103] Represents an element with the name tagn as the child with the descending chain from root, tag1, tag2, . . . , tagn-1

[0104] d. "//tagn"

[0105] Represents any element with the name tagn, where "//" denotes zero or more arbitrary tags.

[0106] e. "/tag1//tagn"

[0107] Represents any element with the name tagn which is a child of element with the name tag1 under root, where "//" denotes zero or more arbitrary tags.

[0108] f. "/tag1/tag2/@attr1"

[0109] Represents the attribute attr1 of element with the name tag2 as a child of element tag1 under root.

[0110] g. "/tag1 /tag2/[@attr1="5"]"

[0111] Represents the element with the name tag2 whose attribute attr1 has the value `5` and it is a child of element with the name tag1 under root.

[0112] h. "/tag1/tag2/[@attr1="5"]/ . . . /tagn"

[0113] Represents the element with the name tagn which is a child of the descending chain from root, tag1, tag2, . . . where the attribute attr1
of tag2 has the value `5`.

[0114] i. "/tag1/tag2/tag3="Los Angeles"/ . . . /tagn"

[0115] Represents the element with the name tagn which is a child of the descending chain from root, tag1, tag2 . . .where tag3 has the value "Los Angeles".

[0116] j. "/tag1/tag2/*[@attr1="5"

[0117] Represents all elements as children of element "/tag1/tag2" with attr1 of value "5".

[0118] There are restrictions on the location path when used by the XML System, and these are listed in the table below.

1TABLE Restriction of Location Path Supported Use of the Location Path Location Path Supported Extracting UDFs a-j Text Extender's search UDF a-j DAD column definition c, f (simple location path)

[0119] Note that there is a restriction in the DAD column definition because there is a one-to-one mapping between an element or attribute to a column.

[0120] The term simple location path refers to the c and f notations in the table for Restriction of Location Path Supported. The simple location path is a sequence of element type names connected by the "/" notation. Each element type may be qualified by its attribute values.

2TABLE Simple Location Path of an Element and an Attribute Subject location path Description XML /tag_1/tag_2/.... an element content identified by Element /tag_n-1/tag_n the tag_n and its parents XML /tag_1/tag_2/.... an attribute with name "attr 1" of Attribute /tag_n-1/tag_n/@attr1
the element identified by tag_n and it parents

[0121] The location path identifies the structure part that indicates the document context to be found. An empty path signals the structure to search or extract against is the whole document (same effect as if the location path is the root element).

[0122] The XML System provides users the ability to create SQL queries on XML documents. Based on the nature of XML documents and the functionality of the XML System. the following terminology is used:

3
Document Access The definition used to enable an XML Definition(DAD): System column or an XML collection, which is XML formatted. Partition: The term partition used means the full partition. In other words, the union of all partitioned parts forms the original document. Location path: A subset of the abbreviated syntax of the location path defined by XPath. A sequence of XML tags to identify an XML element or attribute. It is used in the extracting UDFs to identify the subject to be extracted. The terms of path expression and location path may be used interchangeably. Side table: Additional tables created by the XML System to store searchable elements/attributes for an enabled XML Column. Valid Document: An XML document that has an associated DTD. To be valid, the XML document cannot violate the syntactic rules specified in its DTD. Well-formed document: An XML document that does not contain a DTD. A document with a DTD (valid) must also be well-formed. XML Attribute: Any attribute specified by the ATTLIST under the XML element in the DTD. The XML System uses the location path to identify an attribute. XML Column: A column in the application table of the XML System UDT type. The term of XML enabled column and XML column will be used interchangeably. XML Collection: A collection of relational tables which present the data to compose XML documents or to be decomposed from XML documents. XML Element: Any XML tag or ELEMENT as specified in the XML DTD. The XML System uses the location path to identify an element. XML Tag: Any valid XML markup language tag, mainly the XML element. The term tag and element are used interchangeably. XML Table: An XML Table is an application table which includes XML System column(s). The terms XML enabled table and XML table are used interchangeably. XML Object: The terms XML Object and XML document are used interchangeably. XML UDT: User defined type provided by the XML System. XML UDF: User defined function provided by the XML System.

[0123] C.4 Example of an XML DTD

[0124] The following DTD is provided as an example:

[0125] LineItem.dtd

4
<?xml encoding="US-ASCII"?> <!ELEMENT Order (customer,Part+)> <!ATTLIST Order Key CDATA #REQUIRED> <!ELEMENT Customer #PCDATA> <!ELEMENT Part(Quantity,ExtendedPrice,Tax,Shipment*)> <!ATTLIST Part Key CDATA> <!ELEMENT Quantity (#PCDATA)> <!ELEMENT ExtendedPrice (#PCDATA)> <!ELEMENT Tax (#PCDATA)> <!ELEMENT Shipment (ShipDate,ShipMode,Comment)> <!ELEMENT ShipDate (#PCDATA)> <!ELEMENT ShipMode (#PCDATA)> <!ELEMENT Comment (#PCDATA)>

[0126] In the above LineItem.dtd, the term LineItem.dtd is the title of the Document Type Definition. The term <?xml encoding="US-ASCII"?> indicates that encodin; is in US-ASCII. The terms beginning with ELEMENT refer to elements of an XML document, and the terms beginning with ATTLIST refer to attributes of an XML document. The DTD is used to verify a Document Access Definition.

[0127] C.5 Example of an XML Document

[0128] The following is an example of an XML formatted document:

[0129] order.xml

5
<?xml version="1.0"?> <!DOCTYPE Litem_DTD SYSTEM "E:.backslash.dxx.backslash.test.backslash.dtd.backslash.LineItem.- dtd"> <Order Key="1"> <Customer>General Motor</Customer> <Part Key="156"> <Quantity>17</Quantity> <ExtendedPrice>17954.55&- lt;/ExtendedPrice> <Tax>0.02</Tax> <Shipment> <ShipDate>1998-03-13</ShipDate> <ShipMode>TRUCK</ShipMode> <Comment>This is the first shipment to service of GM</Comment> </Shipment> <Shipment> <ShipDate>1999-01-16</ShipDate>- ; <ShipMode>FEDEX</ShipMode> <Comment>This the second shipment to service of GM.</Comment> </Shipment> </Part> <Part Key="68"> <Quantity>36</Quantity> <ExtendedPrice>34850.16- </ExtendedPrice> <Tax>0.06</Tax> <Shipment> <ShipDate>1996-04-12</ShipDate> <ShipMode>BOAT</ShipMode> <Comment>This shipment is requested by a call. from GM marketing.</Comment> </Shipment> <shipment> <ShipDate>1998-08- -19</ShipDate> <ShipMode>AIR</ShipMode> <Comment>This shipment is ordered by an email.</Comment> <Shipment> </Part> </Order>

[0130] In the above XML document, the term order.xml is the title of the XML document. The term <?xml version="1.0"?> indicates that this document is based on XML Version 1.0. The term <!DOCTYPE Litem_DTD SYSTEM "E:.backslash.dxx.backslash.test.backslash.dtd.backslash.LineItem.- dtd"> is text for the XML document type definition and references the example Document Type Definition, entitled LineItem.dtd, in C.4, which is used for validation.

[0131] The remaining terms define the data in the XML document. For example, the term <Quantity>17<Quantity> indicates that quantity has a value of 17. Also, note that <Quantity> without a slash at the beginning defines a start tag and <Quantity> with a slash at the beginning defines an end tag. Similarly, other terms in the XML document use such tags.

[0132] C.6 The Document Access Definition (DAD)

[0133] A user decides how XML document data is to be accessed in a database. That is, the 150

[0134] user defines a DAD. With the help of a Graphical User Interface (GUI) tool, the user can create a DAD to define a mapping and indexing scheme.

[0135] A Document Access Definition(DAD) is defined by the following Document Type Definition (DTD):

[0136] dad.dtd

6
<?xml encoding="US-ASCII"?> <!ELEMENT DAD (dtdid?, validation, (Xcolumn.vertline.Xcollection))> <!ELEMENT dtdid (#PCDATA)> <!ELEMENT validation (#PCDATA)> <!ELEMENT Xcolumn (table*)> <!ELEMENT table (column*)> <!ATTLIST table name CDATA #REQUIRED key CDATA #IMPLIED orderBy CDATA #IMPLIED> <!ELEMENT column EMPTY> <!ATTLIST column name CDATA #REQUIRED type CDATA #IMPLIED path CDATA #IMPLIED multi_occurrence CDATA #IMPLIED> <!ELEMENT Xcollection (SQL_stmt*, prolog, doctype, root_node)> <!ELEMENT SQL_stmt (#PCDATA)> <!ELEMENT prolog (#PCDATA)> <!ELEMENT doctype (#PCDATA.vertline.RDB_node)">- ; <!ELEMENT root_node (element_node)> <!ELEMENT element_node (RDB_node?, attribute_node*, text_node?, element_node*, namespace_node*, process_instruction_node*, comment_node*)> <!ATTLIST element_node name CDATA #REQUIRED ID CDATA #IMPLIED multi_occurrence CDATA "NO" BASE_URI CDATA #IMPLIED> <!ELEMENT attribute_node (column.vertline.RDB_node)> <!ATTLIST attribute_node name CDATA #REQUIRED> <!ELEMENT text_node (column.vertline.RDB_node)> <!ELEMENT RDB_node (table+, column?, condition?)> <!ELEMENT condition (#PCDATA)> <!ELEMENT comment_node (#PCDATA)> <!ELEMENT namespace_node EMPTY> <!ATTLIST namespace_node name CDATA #IMPLIED value CDATA #IMPLIED> <!ELEMENT process_instruction_node (#PCDATA)>

[0137] The XML System Administration GUI will provide an interface to create DAD files. The DAD itself is a tree structured XML document. The important elements and attributes of the DAD are:

[0138] DTDID:

[0139] The identifier of the DTD stored in the dtd_ref table. It represents the DTD which validates the XML documents or guides the mapping between XML collection tables and XML documents. DTDID must be specified for XML collections. For XML columns, it is optional and is only needed if you want to create side tables for indexing on elements/attributes or validate nets XML documents. The DTDID must be the same as the SYSTEM ID specified in the "doctype" of the XML documents.

[0140] Validation

[0141] For validating XML documents with the DTD, and "No" for no validation. If "Yes", then the DTDID must also be specified.

[0142] Xcolumn

[0143] An Xcolumn defines the indexing scheme for an XML Column. It is composed by zero or more tables.

[0144] table:

[0145] The relational side table(s) created for indexing elements or attributes of documents stored in an XML column. You can have one or more tables. A table is specified by:

[0146] name:

[0147] name of the side table.

[0148] column:

[0149] The column of the side table, which contains the value of a location path of the specified type.

[0150] name: name of the column. It is the alias name of the location path which identifies an element or attribute,

[0151] type: the data type of the column. It can be any SQL data type.

[0152] path: the location path of an XML element or attribute. Only a simple location path defined in Section C.3 is allowed here.

[0153] multi_occurrence: "YES" or "NO" to specify whether this element or attribute will have in one XML document. For Xcolumn, if multi_occurrence is specified as "YES", the XML System will add another column "DXX_SEQNO" with type Integer in the side table which this column belong to. This DXX_SEQNO keeps track of the order of elements occurred for the path expression in each inserted XML documents. With DXX_SEQNO, the user can retrieve a list of the elements with the same order as the original XML document using "ORDER BY DXX_SEQNO" in SQL.

[0154] Xcollection

[0155] The Xcollection defines mapping between XML documents and an XML collection of relational tables. It is composed by the following elements:

[0156] SQL stmt

[0157] The SQL statement to specify the operation needed to achieve the mapping. It must be a valid SQL statement. It is only needed for composition, and only one SQL_stmt of query is allowed.

[0158] objids

[0159] A list of identifiers, each of which conceptually identifies a row object in the database table, so that the row to be selected is ordered by this unique value. It is only needed when SQL_stmt is supplied. The ID can be a column name, or a value generated from the generate_unique( ) function or a UDF. It is recommended but not necessary to be the primary key of the table.

[0160] prolog

[0161] Text for the XML prolog. The same prolog is supplied to all documents in the entire collection. It is a fixed text. This because only XML 1.0 is supported. and UDB.RTM. only supports UTF-8.

[0162] doctype

[0163] Text for the XML document type definition. The doctype can be specified in one of the following two ways:

[0164] The same doctype is supplied to all documents in the entire collection. In this case, it is a fixed text.

[0165] When decomposing, the doctype can be stored as a column data of a table. In this case, the RDB_node should be specified.

[0166] root node

[0167] The virtual root node which must has one and only one element_node. The element_node under the root node is actually the root_node of the XML document.

[0168] RDB_node:

[0169] The node defines the mapping between an XML element or attribute and relational data. It consists of:

[0170] table:

[0171] name: the name of a relational table in which the data of an XML element or attribute reside.

[0172] key: the primary single key of the table. It must be specified for decomposition.

[0173] For the root element_node. all tables storing its attribute or all child element data should be specified.

[0174] orderBy: names of columns that determine the sequence order of multiple-occurring element text or attribute value when generating XML documents.

[0175] column:

[0176] It must be specified for text_node or attribute_node. but not for the element_node.

[0177] name: name of the column which contains the value of an XML element or attribute. It must be specified for both composition and decomposition.

[0178] type: the data type of the column. It is needed only for decomposition.

[0179] path: the location path of the element or attribute. It is not needed for Xcollection, only for Xcolumn.

[0180] multi_occurrence: multiple occurrence of the element or attribute.

[0181] condition: the predicate to specify query condition. It serves two purposes:

[0182] In RDB_node of a text_node or attribute_node: if specified, it qualifies the condition to select the column data to be used to compose or decompose XML element text or attribute value. It is optional.

[0183] In RDB_node of the root element node: if more than one tables are supplied, it must be specified as the condition to join tables.

[0184] element_node

[0185] Representing an XML element. It must be defined in the specified DTD. For the RDB_node mapping, the root element_node must have a RDB_node to specify all tables containing XML data for itself and all its children nodes. It can have zero or more attribute_nodes and child element_nodes, as well as zero or one text_node. In the next release, an element_ode can also contain namespace_nodes, process_instruction_nodes and comment_node.

[0186] An element_node is defined by:

[0187] Attributes:

[0188] name: The name of the XML element. It is the tag name.

[0189] ID: The unique ID. This is adapted from XPTH.

[0190] BASE_URI: The base URI for the name space. This is also adapted from XPTH.

[0191] Optional RDB_node:

[0192] The RDB_node is only needed for the root element_node when using RDB_node mapping. In this case, all tables involved to generate or decompose XML documents must be specified. The column is not needed. The condition must be specified to show the join relationship among tables.

[0193] Optional child nodes:

[0194] An element_node can also have the following child nodes:

[0195] element_node(s): representing child element(s) of this element,

[0196] attribute_node(s): representing attribute(s) of this element;

[0197] text node: represent the CDATA text of this element,

[0198] comment_node: representing the comment for this element,

[0199] namespace node: representing the namespace of this element,

[0200] process_instruction_node: representing the process instruction,

[0201] attribute node:

[0202] Representing an XML attribute. It is the node defining the mapping between an XML attribute and the column data in a relational table. It must has a name, and a column or a RDB_node.

[0203] Attribute:

[0204] name: the name of the attribute. It must be defined in the DTD.

[0205] Column or RDB_node:

[0206] Column: needed for the SQL mapping. In this case, the column must be in the SQL_stmt's SELECT clause.

[0207] RDB_node: needed for the RDB_node mapping. The node defines the mapping between this attribute and the column data in the relational table. The table and column must be specified. The condition is optional.

[0208] text_node:

[0209] Representing the text content of an XML element. It is the node defining the mapping between an XML element content and the column data in a relational table. It must be defined by a column or a RDB_node.

[0210] Column: needed for the SQL mapping. In this case, the column must be in the SQL_stmt's SELECT clause.

[0211] RDB_node: needed for the RDB_node mapping. The node defines the mapping between this text content and the column data in the relational table. The table and column must be specified. The condition is optional.

D. CREATING METADATA FOR FAST SEARCH OF XML DOCUMENTS STORED AS COLUMN DATA

[0212] One embodiment of the invention provides an XML System which solves the problem of fast searching and indexing of XML element/attribute values of XML documents when they are stored inside a database as column data.

[0213] An XML document is a structured document. XML lets a user structure a document by elements or attributes (e.g., title or author). Once a document is structured in this manner, a structured search man be performed based on element or attribute values (or content).

[0214] The embodiment of the invention converts the characters of element/attribute values to any general SQL data type. Additionally, the embodiment of the invention provides a technique for performing a range search on the data. That means the element or attribute values are converted to SQL types (e.g., number of pages may be an integer). With this embodiment of the invention, indices can be created on XML element/attribute values, thus the search operation is scalable.

[0215] The embodiment of the invention permits application programmers to define a Data Access Definition (DAD) which identifies the XML elements or attributes that need to be indexed and defines the mapping between XML elements or attributes to columns in one or more side tables. The DAD is an XML formatted document that is used to specify within an XML document which elements or attributes are to be searched. The DAD also provides a location path or XPath. For example, if elements of a book are structured as follows:

[0216] .vertline.-----Book

[0217] .vertline.-----Title

[0218] .vertline.-----Author

[0219] The location path for the above structure would be: /Book/Title/Author.

[0220] Additionally, the embodiment of the invention stores XML document data in an application table, while storing particular elements or attributes in side tables. The data stored in the side tables is referred to as "metadata" and is used to search for elements or attributes in the XML documents stored as column data in the application table. During the enabling of a column which contains XML documents, side tables are created (based on the DAD) to store duplicate data of these elements or attributes. Several triggers are created so that values of these elements or attributes are extracted when operations are performed on XML documents in columns of an application table. The operations include, for example, insert operations on the application table, which trigger insert operations to also store the inserted XML data into the side tables. Triggers also manage the synchronization of XML data between the side table data during the deleting and updating operations on the column containing the XML documents in the application table.

[0221] D.1 Indexing for Searching XML Columns

[0222] The indexing mechanism is applied on XML columns. In particular, the indexing mechanism discussed here is a technique to create an index on XML element or attribute values when entire XML documents are stored in XML columns.

[0223] With a large collection of XML documents, search performance is a critical user requirement. Index support provides fast query performance at the cost of slower update performance due to index updates. The XML System provides an indexing mechanism that allows search predicates at query-time to be evaluated through indices, without reading document sources.

[0224] The XML column indexing mechanism allows frequently queried data of general data types, such as integer, decimal, or date, to be indexed using the native database index supports from the database engine. This is achieved by extracting the values of XML elements or attributes from XML documents, storing them in the side tables. then allowing application programmers to create indices on these side tables. In a DAD, a user can define Xcolumns by specifying each column of a side table with a location path that identifies an XML element or attribute and a desired SQL data type. The XML System then will populate these side tables when data is inserted into the application table. An application can create an index on these columns for fast search, using the database B-tree indexing technology. The technique and options for creating an index may vary across platforms. Application programmers have the freedom to create a desired index as they usually do with a database on their platform.

[0225] For elements/attributes in an XML document which occur multiple times, a separate table is created for each XML element/attribute with multiple occurrences, due to the complex structure of XML documents.

[0226] For example, a user may want to create an index on `/Order/Part/ExtendedPrice`, and specify `/Order/Part/ExtendedPrice` to be of data type REAL. In this case, XML System will store the value of `/Order/Part/ExtendedPrice` in the specified column `price` in a side table. Multiple indices on an XML column are allowed. In the example, a user can create two columns in two side tables, one for `ExtendedPrice` and one for "ShipDate".

[0227] When side tables are created, they are tied together with the main (or application) table through the notion of root_id. A user can decide whether the primary key of the application table is to be the "root_id". If the primary key does not exist in the application table, or for some reason a user doesn't want to use the primary key, then XML System will alter application table to add a column DXXROOT_ID for storing a unique identifier created at insertion time (i.e., when data is inserted into the application or main table). All side tables will have a "DXXROOT_ID" column and have the unique identifiers stored. If the primary key is used as the root_id, then all side tables will have a column with the same name and type as the primary key column in the application table, and the values of the primary keys are stored.

[0228] D.2 Sample DAD for an XML Column

[0229] Assuming the XML documents need to be stored are like the one shown in C.5. Example of an XML Document, the following example DAD will store the XML documents in an XML column and create several side tables for indexing.

[0230] Litem_DAD1.dad

7
<?xml version="1.0"?> <!DOCTYPE Order SYSTEM "E:.backslash.dtd.backslash.dxxdad.dtd"> <DAD> <dtdid>E:.backslash.dtd.backslash.lineItem.dtd</dtdid> <validation>YES</validation> <Xcolumn> <table name="order_tab"> <column name="order_key" type="integer" path="/Order/@Key" multi_occurrence="NO"/> <column name="customer" type="varchar(50)" path="/Order/Customer" multi_occurrence="NO"/> </table> <table name="part_tab"> <column name="part_key" type="integer" path="/Order/Part/@Key" multi_occurrence="YES"/> </table> <table name="price_tab"> <column name="price" type="double" path="/Order/Part/ExtendedPrice" multi_occurrence="YES"/> </table> <table name="ship_tab"> <column name="date" type="date" path="/Order/Part/Shipment/ShipDate" multi_occurrence="YES"/- > </table> </Xcolumn> </DAD>

[0231] In the above DAD, Litem_DAD1.dad is the name of the DAD. The phrase <?.xml version="1.0"?> identifies the version, and the phrase <!DOCTYPE Order SYSTEM "E:.backslash.dtd.backslash.dxxdad.dtd"> is text for the XML document type definition. The first DAD and the second DAD tags indicate that the information between these tags comprise the data access definition. The phrase <dtdid>E:.backslash.dtd.backslas- h.lineItem.dtd</dtdid> identifies the document type definition (DTD) to be used. The phrase <validation>YES</validation> indicates that this DAD is to be validated against the DTD. The four table name terms identify the four side tables to be created.

[0232] In this example, the four side tables created for indexing are as follows:

[0233] order_tab: with columns of order_key and customer; representing attribute "/Order/@Key" and element "/Order/Customer".

[0234] part_tab: with column of part_key, representing attribute "/Order/Part/@Key".

[0235] price_tab: with column of price, representing element "/Order/Part/Price"

[0236] ship_tab: with column of date, representing element "/Order/Part/Shipment/ShipDate".

[0237] For this example, it is assumed that the columns in the tables are the elements and attributes which need to be searched frequently.

[0238] FIG. 3 illustrates an application or main table and its four side tables. The Application table 300 has a root_id in common with each side table 302, 304, 306, and 308. The side tables 302, 304, 306, and 308
correspond to the side tables defined in the DAD above.

[0239] D.3 XML Column/User Defined Types

[0240] An XML column is designed to store XML documents in their native format in the database as column data. After a database is enabled, the following user defined types (UDTs) are created:

[0241] XMLCLOB: XML document content stored as a CLOB inside the database,

[0242] XMLVarchar: XML document content stored as a VARCHAR inside the database,

[0243] XMLDBCLOB: XML document content stored as double byte CLOB inside the database,

[0244] XMLFile: XML document stored in a file on a local file system

[0245] XMLURL: XML document stored as a uniform resource locator (URL) via Data Link.

[0246] A user can use these UDTs as the data type of an XML column. An XML column is created when a user creates or alters an application table.

[0247] D.4 Creating an XML Table

[0248] An XML table is a table that includes one or more columns created with the XML System UDT. To create such a table, an XML column is included in the column clause of the CREATE TABLE statement.

[0249] Consider a line item order book keeping application. The XML formatted line item order is to be stored in a column called "order" of an application table called "sales_tab". The sales_tab table also includes other columns of invoice_number and sales_person. Since the order is not very long, a user may decide to store it in the XMLVarchar type. The user may also decide to let the invoice_number be the primary key. The following create table statement can be used, where XMLVarchar is the XML System UDT:

8
CREATE TABLE sales_tab (invoice _number char(6) NOT NULL PRIMARY KEY, sales_person varchar(20), order XML Varchar);

[0250] D.5 Defining Xcolumn in DAD

[0251] In order to use an XML column, a DAD needs to be prepared and enabled. In DAD preparation, a user first needs to define an "Xcolumn". The following steps guide a user to define an `Xcolumn", using the examples: XML document order.xml in C.5, DTD LineItem.dtd in C.4, and DAD Litem_DAD1.dad in D.2.

[0252] Identify the XML elements and attributes which will be frequently searched in the application.

[0253] In the above examples, the "/Order/@Key", "/Order/Customer", "/Order/Part/@Key", "/Order/Part/ExtendedPrice". "/Order/Part/Shipment/Sh- ipDate" are mostly like to be searched and range search is needed for some of them.

[0254] Decide how many side tables will be created for indexing. This is based on the understanding of the DTD and XML documents.

[0255] In the above examples, since "/Order" has unique attribute "Key" and only one element "Customer", they are put in the same side table "order_tab". One "Order" can have one or more "Part" items (see DTD definition in C.4), and each "Part" will have unique attribute "Key" and element "ExtendedPrice", and so these are separates into two tables: "part_tab" and "price_tab". Now, since one "Part" can have multiple "Shipment" items and each "Shipment" has one "ShipDate", the "ShipDate" is put into another table "ship tab".

[0256] Define the column of each side table by specifying the column name, the matching XML element or attribute by location path, and the data type.

[0257] In the examples, the ability to perform range search is desired on "ExtendedPrice" and "ShipDate", thus the data type is specified to be double and date respectively. Because there will be multiple occurrences of the /Order/Part/@key, /Order/Part/ExtendedPrice and /Order/Part/Shipment/ShipDate, specify the multi_occurrence="YES" for these elements or attributes. By doing so, the XML System will create an additional column DXX_SEQNO for side table price_tab and ship_tab so that a query can be performed using "order by DXX_SEQNO" to get the element or attribute with the same order as that in the original XML documents.

[0258] D.6 Enabling Parameters

[0259] A column can be enabled through the XML System administration GUI or using, a dxxadm command with the enable_column option. The syntax of the option is as follows s:

[0260] dxxadm enable_column db_name tab_name column_name DAD_file

[0261] -t tablespace -v default_view -r root_id]

[0262] where:

[0263] db_name: the database name

[0264] tab_name: table name in which the XML column resides.

[0265] column_name: name of the XML column.

[0266] DAD_file: name of the file that contains Data Access Definition (DAD).

[0267] tablespace: optional, but if specified, a previously created tablespace which will contain side tables created by the XML System.

[0268] default_view: optional, but if specified, it is the name of the default view created by XML System to join application table and all side tables.

[0269] root_id: optional, but recommended, and if specified, it is the column name of the primary key in the application table, and XML System will use it as the unique "root_id" to tie all side tables with the application table. If not specified, XML System will add the column of DXXROOT_ID in the application table. Note: if the application table happened to have a column name as "DXXROOT_ID", the primary key must be specified as the "root_id". otherwise, an error will be returned.

[0270] Here is an example for enabling the column order in the table sales_tab in database mydb with the DAD_file Litem_DAD1.dad in C.4, default view sales_order_view and root_id invoice_number.

[0271] /home/u1>dxxadm enable_column mydb sales_tab order Litem_DAD1.dad

[0272] -v sales_order_view -r invoice_number

[0273] DXXA007I XML Extender is enabling column order. Please wait.

[0274] DXXA008I XML Extender has successfully enabled the column order.

[0275] /home/u1>

[0276] D.7 Results of the Column Enabling

[0277] The enabling of an XML column mainly does the following things to a database:

[0278] Read the DAD_file and do the following:

[0279] if DTDID is specified, retrieve the DTD from the dtd_ref table.

[0280] process Xcolumn to create side tables

[0281] create triggers for insert, update and delete on the XML column so that the side tables will be populated or updated.

[0282] Create a default_view if specified.

[0283] If root_id not specified. alter application table to add DXXROOT_ID column.

[0284] Update the XML_USAGE and dtd_ref table to reflect the enabling of this XML column.

[0285] Based on the above examples, the user table sales_tab has the following schema:

[0286] Based on the above examples, the user table sales_tab has the following schema:

[0287] User table sales_tab:

9
Column Name invoice_number sales_person order Data Type char(6) varchar(20) XMLVarchar

[0288] The enabling column operation will create the following side tables based on the DAD:

10
Side_table order_tab: Column Name order_key customer invoice_number Data Type integer varchar(50) char(6) Location Path /Order/@Key /Order/Customer N/A Side table part_tab: Column Name part_key invoice_number Data Type integer char(6) Location Path /Order/Part/@Key N/A Side table price_tab: Column Name price invoice_number Data Type double char(6) Location Path /Order/Part/ExtendedPrice N/A Side table ship_tab: Column Name date invoice-number Data Type date char(6) Location Path /Order/Part/Shipment/ShipDate N/A

[0289] Note that because the root_id is specified by the primary key invoice_number in the application table sales_tab, all side tables have the column invoice_number of the same type. Also, the value of the invoice_number of each row in the sales_tab will be inserted into the side tables.

[0290] Since the default_view parameter is specified when enabling the XML column order, a default view sales_order_view is created by the XML System. It joins the above five tables by the following statement:

[0291] CREATE VIEW sales_order_view(invoice_number,sales_person,order, order_key,customer,part_key,price,date)

[0292] AS

[0293] SELECT sales_tab.invoice_number, sales_tab.sales_person, sales_tab.order, order_tab.order_key, order_tab.customer, part_tab.part_key, price_tab.price, ship_tab.date)

[0294] FROM sales_tab, order_tab, part_tab, price_tab, ship_tab

[0295] WHERE sales_tab.invoice_number =order_tab.invoice_number

[0296] AND sales_tab.invoice_number =part_tab.invoice_number

[0297] AND sales_tab.invoice_number =price tab.invoice_number

[0298] AND sales_tab.invoice_number =ship_tab.invoice_number.

[0299] Because the tablespace in the enable_column command was not enabled. the default tablespace is used to create side tables. If the tablespace is specified and it does exist in the database, then the side tables will be created in the specified side tables.

[0300] D.8 Inserting XML Documents

[0301] For XML columns, an entire XML document is always stored as the column data. The insertion can be achieved in the following ways:

[0302] Using the default cast function:

[0303] For each UDT, there is a default cast function to convert the SQL base type to the UDT. The following cast functions can be used in a VALUES clause.

11
Input Default UDT Parameter Return Cast Function Type Type Description db2xml. varchar XMLVarchar Input from memory XMLVarchar() buffer of varchar db2xml.XMLCLOB() clob XMLCLOB Input from memory buffer of clob db2xml. dbclob XMLDBCLOB Input from memory XMLDBCLOB() buffer of dbclob db2xml.XMLFile() varchar XMLFile Only store file name db2xml.XMLURL() datalink XMLURL data type

[0304] The following SQL statement inserts the casted varchar type in the host variable xml_buff into the XMLVarchar.

[0305] INSERT INTO sales_tab

[0306] VALUES(`123456`, `Sriram Srinivasan`, db2xml.XML Varchar(:xml_buff))

[0307] Using the Storage UDF:

[0308] For each XML System UDT, there is a storage(or import) UDF to import data from a resource other than its base type. For example, if to import an XML document in a file to the database as a XMLVarchar, then the function XMLVarcharFromFile( ) is used.

[0309] In the example below, a record is inserted into the sales_tab table. The function XMLVarcharFromFile( ) imports the XML document from a file into the database and stores it as a XML Varchar.

[0310] EXEC SQL INSERT INTO sales_tab VALUES(`123456`,

[0311] `Sriram Srinivasan`

[0312] XMLVarcharFromFile(`/home/u1/xml/order.xml`))

[0313] The above example imports the XML object from the file "/home/u1/xml/order.xml" to the column order in the table sales_tab.

[0314] D.9 Retrieving XML Documents

[0315] The XML table is ready to use when the XML column is enabled. Retrieving an XML column directly returns the UDT as the column type. A user can always use the default cast function provided by The database for distinct types to convert a UDT to an SQL base type, then operate on it. In addition to that, a user can also use overloaded UDF Content( ) to retrieve document content from a file or URL to a memory buffer.

[0316] Using the default cast function:

[0317] The following cast functions, which are automatically created by the database for the XML UDT, may be used in a SELECT statement.

12
Default Cast Input Return Function Parameter Type Type Description db2xml.varchar() XMLVarchar varchar XML document in variable length of char db2xml.clob() XMLCLOB clob XML document in CLOB db2xml.dbclob() XMLDBCLOB dbclob XML in double byte CLOB db2xml.varchar() XMLFiLe varchar XML filename in variable lenght of char db2xml.datalink() XMLURL datalink URL of XML document

[0318] The following SQL statement shows how to use the default cast function in a simple query.

[0319] EXEC SQL SELECT db2xml.varchar(order)from sales_tab

[0320] Using the content( ) UDF:

[0321] Suppose XML documents are stored as XMLFile or XMLURL, to operate on these XML documents in memory, the UDF content( ), which takes XMLFile or XMLURL as input and returns a varchar or CLOB, is used.

[0322] In the example below, a small sqc program segment illustrates how an XML document is retrieved from a file to memory. This example assumes that the column order is of XMLFile type.

13
EXEC SQL BEGIN DECLARE SECTION; varchar(3k) xml_buff; EXEC SQL END DECLARE SECTION; EXEC SQL CONNECT TO mydb EXEC SQL DECLARE c1 CURSOR FOR SELECT Content(order) from sales_tab WHERE sales_person=`Sriram Srinivasan` EXEC SQL OPEN c1; do { EXEC SQL FETCH c1 INTO :xml_buff; if(SQLCODE !=0) { break; } else { /* do whatever is needed to do with the XML doc in buffer */ } } EXEC SQL CLOSE c1;

[0323] D.10 Updating XML Documents

[0324] With the XML System, an entire XML document can be updated by replacing the XML column data. The XML System provides two techniques for update:

[0325] Using cast functions or storage UDFs in the set clause of the SQL update statement:

[0326] In this case, a cast function or a UDF is used in the Set clause. Here is an example:

[0327] UPDATE sales_tab

[0328] set order=XMLVarcharFromFile(`/home/u1/xml/order2.xml`)

[0329] WHERE sales_person=`Sriram Srinivasan`

[0330] Using the Update( ) UDF:

[0331] The XML System provides a UDF Update( ) which allows a user to specify a location path and the value of the element or attribute represented by the location path to be replaced. In this case, a user does not need to retrieve the XML document and use an editor to change the content. The XML System will do it automatically.

[0332] Here is an example of using the UDF Update( ). In this example, the content of "/Order/Customer" is updated to NewMart".

14
UPDATE sales_tab set order=Update(order,`/Order/- Customer`,`NewMart`) WHERE sales_person=`Sriram Srinivasan`

[0333] For an XML Column, the XML System will update side tables of extracted data when the XML column is updated. However, a user should not update these side tables directly without updating original XML documents stored in the XML column by changing the corresponding XML element or attribute value. Otherwise, there may be data inconsistency problems.

[0334] D.11 Retrieving XML Element Contents and Attribute Values

[0335] For XML columns, the XML System provides a UDF to extract element or attribute values from entire XML documents. The retrieval is performed on an XML document. It is a single document search. The XML System provides extracting UDFs to retrieve XML elements or attributes in the SQL select clause. This is very useful after search filtering on a collection of XML documents to further obtain desired elements or attributes.

[0336] Suppose there are more than 1000 XML documents stored in the column order in the table sales_tab. To find all customers who have ordered items which have the ExtendedPrice greater than $2500.00, the following SQL statement with the extracting UDF in the select clause can be used:

[0337] SELECT extractVarchar(Order,`/Order/Customer`) from sales_order_view

[0338] WHERE price>2500.00

[0339] where the UDF extractVarchar( ) takes the order as the input, and the location path "/Order/Customer" as the select identifier, and returns the names of the customer. Note, in this statement, only the orders with ExtendedPrice greater than $2500, say maybe 11 such orders, will be the input to the extracting function. The WHERE clause did the filtering on the collection of 1000 XML documents already. Again, the sales_order_view is the default view to join the application table sales_tab and all its side tables, where price is the part_tab.price, representing the "/Order/Part/ExtendedPrice".

[0340] D.12 Searching an XML Document

[0341] The above sections have described how the XML System may be used as a document repository for storage and retrieval, as well as for element or attribute selection. Here, searching using indices created on side table columns, which contain XML element contents or attribute values extracted from XML documents, is illustrated. Since the data type of an element or attribute can be specified, searches can be performed on SQL general data types and range searches can be performed.

[0342] D.13 Search from Join View

[0343] If desired and specified when an XML column is enabled, the XML System provides a default read-only view which joins the application table with all created side tables through the same unique identifier. With the default view, or any view created by the application, a user can search XML documents by a query on the side tables.

[0344] The above examples have referenced an application table sales_tab and side tables order_tab, part_tab and ship_tab. The name of a default view sales_order_view is specified at the enabling column time. XML System had created a default view sales_order_view which joins these tables by the statement shown in the previous section.

[0345] The following example SQL statement will return the sales_persons of the sales_tab who have line item orders stored in the column order where the ExtendedPrice is greater than $2500.00.

[0346] SELECT sales_person FROM sales_order_view

[0347] WHERE price>2500.00

[0348] The advantage of a query on the join view is that it provides a virtual single view of the application table and side tables. However, when more side tables are created. the more expensive the query will be. Therefore, it is only reco