United States Patent Application 20020010715
Kind Code A1
Chinn, Garry; et al. January 24, 2002

System and method for browsing using a limited display device
Abstract
A method is provided, comprising: providing a navigation tree comprising a semantic, hierarchical structure, having one or more paths associated with content of a conventional markup language document and a grammar comprising vocabulary including one or more keywords; receiving a request to access the content; and, responsive to the request, traversing a path in the navigation tree if the request includes at least one keyword of the vocabulary.

Inventors: Chinn; Garry (San Mateo, CA), Dugan; Benedict R. (Seattle, WA), Hagen; Roger E. (San Francisco, CA), Sexton; Michael R. (San Francisco, CA), Khatri; Sven H. (Oakland, CA), King; Tim J. (San Francisco, CA)
Correspondence Name and Address:
SKJERVEN MORRILL MacPHERSON LLP
F. Jason Far-Hadian
25 Metro Drive, Suite 700
San Jose, CA 95110-1349
US
Series Code: 916095
Filed: July 26, 2001
U.S. Current Class: 707/514; 707/907
U.S. Class at Publication: 707/514; 707/907
Intern'l Class: G06F 015/00

Claims


1. A method comprising: providing a navigation tree comprising a semantic, hierarchical structure, having one or more paths associated with content of a conventional markup language document and a grammar comprising vocabulary including one or more keywords; receiving a request to access the content; and responsive to the request, traversing a path in the navigation tree, if the request includes at least one keyword of the vocabulary.

2. The method of claim 1, wherein the vocabulary dynamically changes based on the path traversed in the navigation tree.

3. The method of claim 1, wherein the grammar further includes one or more rules corresponding to said one or more keywords of the vocabulary, the method further comprising: retrieving the content according to one or more rules corresponding to said at least one keyword included in the request.

4. The method of claim 1 wherein the request is in the form of speech.

5. The method of claim 1 further comprising: determining if the request for accessing the content includes at least one keyword of the vocabulary by searching the vocabulary to find a match for said at least one keyword in the request.

6. The method of claim 5 further comprising: confirming that the match for the keyword is correct; and traversing the path in the navigation tree to retrieve content related to said at least one keyword in the request.

7. The method of claim 6 further comprising: providing a prompt including one or more keywords of the vocabulary if a match for the keyword is not found.

8. The method of claim 7 further comprising: traversing a path in the navigation tree to retrieve content related to a keyword selected from said one or more keywords included in the prompt.

9. The method of claim 1 further comprising: narrowing the vocabulary of the grammar if the request does not include at least one keyword of the vocabulary.

10. The method of claim 9 further comprising: providing a prompt including one or more keywords of the narrowed vocabulary; and traversing a path in the tree to retrieve content related to a keyword selected from said one or more keywords in the narrowed vocabulary.

11. The method of claim 10 further comprising: expanding the vocabulary of grammar based on the path traversed in the navigation tree.

12. The method of claim 1 wherein the conventional markup language is HyperText Markup Language.

13. A method performed on a computer for browsing content available from a communication network comprising: receiving a document containing content in a conventional markup language format and a style sheet for the document; generating a document tree from the document; generating a style tree from the style sheet, the style tree comprising a plurality of style sheet rules; converting the document tree into a navigation tree using the style sheet rules, the navigation tree associated with a vocabulary having one or more keywords, the navigation tree including one or more content nodes and routing nodes defining paths of the navigation tree, each content node including some portion of the content and a keyword associated with the respective portion of the content, each routing node including at least one keyword referencing other nodes in the navigation tree; receiving a request to access the content; and traversing a path in the navigation tree, adding keywords included in any node along the traversed path to the vocabulary in response to the request.

14. The method of claim 13 wherein the request is in the form of speech.

15. The method of claim 13 comprising: generating a first speech recognition result indicating whether the request includes any keyword of the vocabulary; assigning a first confidence score to the first speech recognition result; and rejecting the request, if the first confidence score is below a rejection threshold.

16. The method of claim 11 comprising: accepting the request if the first confidence score is greater than a recognition threshold.

17. The method of claim 16 wherein the first confidence score is between the rejection threshold and the recognition threshold, the method comprising searching the vocabulary to find one or more matches for any keyword included in the request.

18. The method of claim 15 comprising: providing a first group of keywords included in the vocabulary from which to select if the first confidence score is below the rejection threshold; generating a second speech recognition result in response to a selection from the first group; and assigning a second confidence score to the second speech recognition result.

19. The method of claim 15 wherein generating comprises: deriving a first phonetic pronunciation based on the request; deriving a second phonetic pronunciation based on at least one keyword of the vocabulary; and comparing the first phonetic pronunciation with the second phonetic pronunciation.

20. A method of claim 19 further comprising selecting a keyword from the vocabulary based on said comparison.

21. A method of navigating a navigation tree derived from a document having content in conventional markup language format, the navigation tree having a plurality of nodes, the navigation tree associated with a grammar comprising a vocabulary and corresponding rules, said method comprising: visiting a first node in the navigation tree; moving from the first node to a second node in the navigation tree in response to the user request, the second node having at least one keyword; and expanding the grammar by adding to the vocabulary the keyword of the second node.

22. The method of claim 21, wherein the keyword of the second node identifies content included in the second node.

23. The method of claim 21 comprising providing an error message, if the user request is not recognized.

24. The method of claim 21, comprising: comparing the request against one or more keywords included in the vocabulary; and recognizing the request if the request is sufficiently similar to one of the keywords.

25. The method of claim 24, wherein recognizing comprises: selecting a number of keywords from the vocabulary that are similar to the request; for each selected keyword, assigning a value to the selected keyword based on how similar the selected keyword is to the request; and recognizing the keyword with the highest value.

26. The method of claim 25, comprising resolving an ambiguity in recognizing the request if the selected keyword with the highest value is below a recognition threshold.

27. The method of claim 26, wherein resolving comprises prompting the user to choose from one of the selected keywords.

28. The method of claim 21, comprising expanding the grammar by adding to the vocabulary any keywords associated with the nodes proximate the first node.

29. The method of claim 21, wherein the grammar is generated after the first node is visited.

30. The method of claim 21, wherein the grammar is generated before the first node is visited.

31. The method of claim 21, comprising building a greeting based on the keyword of the second node.

32. The method of claim 21, further comprising: generating a prompt based on the portion of the content included in the first node; playing the prompt to provide a plurality of options to select from the portion of the content included in the first node.

33. The method of claim 21, wherein the first node is a routing node which refers to other nodes in the navigation tree.

34. The method of claim 33, further comprising: generating a prompt based on the other nodes referred to by the first node; and playing the prompt to provide a plurality of options for moving from the first node to one of the other nodes.

35. The method of claim 21, wherein the first node is a form node associated with one or more editable fields.

36. The method of claim 35, comprising generating a prompt based on the editable fields.

37. The method of claim 36, comprising playing the prompt to provide a plurality of options for selecting from the editable fields.

38. The method of claim 36, comprising moving through the editable fields in a prearranged order.

39. A method of navigating a navigation tree derived from a document having content in conventional markup language format, the navigation tree having a plurality of nodes, the navigation tree associated with a grammar comprising a vocabulary and corresponding rules, said method comprising: visiting a first node in the navigation tree; moving from the first node to a second node in the navigation tree in response to the user request, the second node having at least one keyword; and expanding the grammar by adding to the vocabulary the keyword of the second node; indicating that the first node is visited by providing a first message; and indicating that no user request has been received by providing a second message.

40. The method of claim 39, comprising providing a third message with one or more options if no user request is received in response to the second message.

41. The method of claim 39, wherein the first node is a content node having at least a portion of the content, the method comprising: providing a third message with one or more options to select from the portion of the content associated with the content node.

42. The method of claim 39, wherein the first node is a routing node which refers to the nodes of the navigation tree, the method comprising: providing a third message with one or more options for moving to the other nodes.

43. The method of claim 42, wherein the first node is a form node having one or more editable fields, the method comprising: providing a third message, with one or more options to select from one or more editable fields.

44. A method of navigating a routing node in a navigation tree derived from a document having content formatted in conventional markup language format, the navigation tree having a default grammar and a plurality of nodes, each node associated with one or more keywords, said method comprising: visiting a first node in the navigation tree, the first node referencing at least a second node; generating a navigation grammar by adding to the default grammar one or more keywords associated with the second node; generating an output message based on said one or more keywords; playing the output message; waiting to receive a user request responsive to the output message; matching the request against the keywords included in the navigation grammar; recognizing the request, if a match is found between the request and one or more of the keywords included in the navigation grammar; rejecting the request, if a close match is not found; and resolving ambiguities in the request, if the request is neither recognized nor rejected.

45. The method of claim 44, wherein the navigation grammar includes rules corresponding to said one or more keywords, the method further comprising: visiting the second node based on navigation rules corresponding with the keyword matched with the request, if the request is recognized.

46. The method of claim 45, wherein the second node references at least a third node associated with one or more keywords, said method further comprising: expanding the navigation grammar by adding to the navigation grammar the keywords associated with the third node.

47. The method of claim 45, further comprising: narrowing the navigation grammar by deleting from the navigation grammar the keywords associated with the second node; and expanding the navigation grammar by adding to the navigation grammar keywords associated with the third node.

48. The method of claim 44, further comprising: waiting to receive a user request regardless of whether or not the output message is generated or played.

49. The method of claim 45, further comprising: initializing a timeout counter when visiting the second node.

50. The method of claim 48, further comprising: playing a first timeout message, if a first time period has passed and no user request is received; and incrementing the timeout counter.

51. The method of claim 50, further comprising: playing a second timeout message, if a second time period has passed and no user request is received, wherein the second timeout message is different from the first timeout message; and incrementing the timeout counter.

52. The method of claim 51, further comprising: playing a last resort timeout message, if the timeout counter has reached a threshold value.

53. The method of claim 45, further comprising: initializing a help counter when visiting the second node.

54. The method of claim 53, further comprising: playing a first help message, in response to a first help request submitted while visiting the first node; and incrementing the help counter.

55. The method of claim 54, further comprising: playing a second help message, in response to a second help request submitted while visiting the first node, wherein the second help message is different from the first help message; and incrementing the help counter.

56. The method of claim 55, further comprising: playing a last resort help message, if the help counter has reached a threshold value.

57. The method of claim 45, further comprising: initializing the rejection counter when visiting the second node.

58. The method of claim 57, further comprising: playing a first rejection message, if the user request is not accepted, while visiting the first node; and incrementing the rejection counter.

59. The method of claim 58, further comprising: playing a second rejection message, if the user request is not accepted a second time, while visiting the first node; incrementing the rejection counter; and playing a last resort rejection message if the rejection counter has reached a threshold.

60. A method of navigating a form node in a navigation tree derived from a document having content formatted in conventional markup language format, the navigation tree having a default grammar and one or more nodes, said method comprising: visiting a first node in a navigation tree, said first node referencing one or more fields, each field defined by at least a keyword; building a navigation grammar by adding to the default grammar one or more keywords defining said one or more fields; determining if the first node is navigable; if the first node is navigable then performing the following actions: generating a first output message based on the keywords defining the fields, providing the option to select from one or more of said fields; playing the first output message; receiving a user request responsive to the first output message; matching the request against the keywords included in the navigation grammar; recognizing the request, if a close match is found between the request and one or more keywords included in the navigation grammar; rejecting the request, if a close match is not found; resolving ambiguities in the request, if a match is not recognized or rejected; visiting a field defined by the keyword matched with the request, if the request is recognized; building a second output message based on the keyword matched with the request, providing an option to edit the field visited; playing the second output message; receiving a second user request to edit the field visited, responsive to the second output message; and editing the field visited in response to said second user request.

61. The method of claim 60, further comprising: if the first node is not navigable then performing the following actions: visiting said one or more fields; building a second output message for a visited field based on the keyword defining that field; playing the second output message providing an option to edit the field; receiving a second user request to edit the field, responsive to said second output message; and editing the field in response to said second user request.

62. A method of navigating a content node in a navigation tree derived from a document having content formatted in conventional markup language format, the navigation tree associated with a default grammar, said method comprising: visiting a first node in a navigation tree, said first node referencing first content and a second content included in a conventional markup language document, each content defined by at least a keyword; generating a navigation grammar by adding to the default grammar keywords defining the first content and the second content; playing the first content; and playing the second content.

63. The method of claim 62, further comprising: building an output message based on the keywords defining the first content and the second content, providing the option to select one of the contents; playing the output message; receiving a user request responsive to the output message; matching the request against the keywords included in the navigation grammar; recognizing the request, if a match is found between the request and one or more of the keywords included in the navigation grammar and playing the content defined by the keyword matching the request; rejecting the request, if a close match is not found; and resolving any ambiguities in the request, if the request is not recognized or rejected.

Description



FIELD OF THE INVENTION

[0001] The invention relates generally to data communications and, in particular, to a system and method for browsing using a limited display device.

BACKGROUND

[0002] The advent of a worldwide communication network known as the Internet has provided us with relatively instant access to an abundance of information, such as daily news, stock quotes, and other content in electronic documents available in the public domain. This information is stored in electronic file systems that are connected to create what is known as the World Wide Web (WWW). The content stored in these file systems is provided in the form of web pages that are typically linked to create one or more web sites. A person can access and view the content of a web page using a conventional web browser program, such as Microsoft's Internet Explorer or Netscape's Communicator that runs on a computer system.

[0003] Web pages typically include electronic files or documents formatted in a markup language such as Hyper-Text Markup Language (HTML) or eXtensible Markup Language (XML). Although these languages are suitable for presenting information on a desktop computer, they are generally not well suited for devices such as cellular telephones or web-enabled personal digital assistants (PDAs) with limited display capability. Furthermore, neither conventional web browsers nor conventional markup languages allow users to readily access typical web pages available on the Internet via voice commands or commands from limited display devices.

[0004] Efforts have been made to address such problems. For example, voice-enabling languages, such as Voice Extensible Markup Language (VoiceXML), have been developed. Unlike the conventional markup languages (e.g., HTML and XML), VoiceXML enables the delivery of information via voice commands. However, any information to be delivered with VoiceXML must be separately constructed in that language, apart from the conventional markup languages. Because most web sites on the Internet do not provide separate VoiceXML capability, much of the information on the Internet is still largely inaccessible via voice commands or limited display devices.

[0005] Systems and corresponding methods for efficiently accessing content stored on communication networks using voice commands or limited display devices are desirable.

SUMMARY

[0006] According to an embodiment of the invention, systems and corresponding methods are provided to allow a user to access web content stored on a web server in a communications network, by using voice commands or a web enabled limited display device. The system includes an interface for receiving requests for content from the user and a processor coupled to the interface for retrieving one or more conventional markup language documents stored on a web server. The processor converts the conventional markup language document into a navigation tree that provides a semantic, hierarchical structure that includes some or all of the content included in the web pages presented by the conventional markup language documents. The system prunes out or converts unsuitable information, such as high definition images, that cannot be practically displayed or communicated to the user on a limited display device or via voice.

[0007] A technical advantage of the invention includes browsing content available from a communication network (e.g., the Internet) using voice commands, for example, from any telephone, wireless personal digital assistant, or other device with limited display capability. This system and method for voice browsing navigates through the content and delivers the same, for example, in the form of generated speech. The system and method can voice-enable any content formatted in a conventional, Internet-accessible markup language (e.g., HTML and XML), thus offering an unparalleled experience for users.

[0008] In one embodiment, the system generates one or more navigation trees from the conventional markup language documents. A navigation tree organizes the content of a web page into an outline or hierarchical structure that takes into account the meaning of the content, and thus can be used for semantic retrieval of the content. A navigation tree supports voice-based browsing of web pages. For documents formatted in various conventional markup languages, respective default style sheet (e.g., xCSS) documents may be provided for use in generating the navigation trees. Each style sheet document may contain metadata, such as declarative statements and procedural statements.

[0009] For each conventional markup language document, the system may construct a document tree comprising a number of nodes. The rules or declarative statements contained in a suitable style sheet document are used to modify the document tree, for example, by adding or modifying attributes at each node of the document tree, deleting unnecessary nodes, or filtering other nodes. If procedural statements are present in the style sheet document, the system and method may apply these procedures directly to construct the navigation tree. If there are no such procedural statements, the system and method may apply a simple mapping procedure to convert the document tree into the navigation tree.
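
The two-stage conversion described above may be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the `DocNode` shape, the rule keys (`delete`, `role`), the sample `STYLE_RULES`, and the function names are all hypothetical. Declarative rules first modify the document tree (deleting nodes and adding attributes), and a simple one-to-one mapping then produces the navigation tree.

```python
from dataclasses import dataclass, field


@dataclass
class DocNode:
    """One node of the document tree (hypothetical shape)."""
    tag: str
    text: str = ""
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


# Assumed declarative rules, keyed by tag name.
STYLE_RULES = {
    "img": {"delete": True},    # prune content unsuited to voice output
    "h1": {"role": "routing"},  # headings lead to other nodes
    "p": {"role": "content"},   # paragraphs carry content
}


def apply_rules(node, rules):
    """Return a modified copy of the subtree, or None if the node is deleted."""
    rule = rules.get(node.tag, {})
    if rule.get("delete"):
        return None
    attrs = dict(node.attrs)
    if "role" in rule:
        attrs["role"] = rule["role"]
    children = [apply_rules(child, rules) for child in node.children]
    return DocNode(node.tag, node.text, attrs,
                   [c for c in children if c is not None])


def to_navigation_tree(node):
    """Simple one-to-one mapping from document nodes to navigation nodes."""
    return {
        "keyword": node.text.strip().lower() or node.tag,
        "role": node.attrs.get("role", "routing"),
        "children": [to_navigation_tree(child) for child in node.children],
    }


page = DocNode("body", children=[
    DocNode("h1", "News"),
    DocNode("img", attrs={"src": "chart.png"}),
    DocNode("p", "Stocks rose."),
])
nav = to_navigation_tree(apply_rules(page, STYLE_RULES))
print([child["keyword"] for child in nav["children"]])
```

In this sketch the image node is removed before mapping, so only the heading and the paragraph reach the navigation tree.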

[0010] In certain embodiments of the system, the navigation tree includes one or more branches. Each branch includes one or more nodes. Each node includes or is associated with one or more keywords, phrases, commands, or other information. These keywords, phrases, or commands are associated with corresponding web pages of a web site based on the content included in the web site and established connections or links among the web pages. A user, using the system, can navigate through the web pages and access the content stored on the site by traversing the nodes in the navigation tree.
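
By way of illustration only, a navigation tree of this kind might be represented as nodes indexed by keyword, with navigation performed by following one keyword per step. The `NavNode` class and the `traverse` helper below are assumptions made for this sketch and do not appear in the disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class NavNode:
    """A navigation-tree node identified by a keyword (hypothetical shape)."""
    keyword: str
    content: str = ""
    children: dict = field(default_factory=dict)

    def add(self, child):
        """Attach a child node, indexed by its keyword, and return it."""
        self.children[child.keyword] = child
        return child


def traverse(root, keywords):
    """Follow one keyword per step from the root.

    Returns the node reached and the list of keywords along the path.
    Raises KeyError if a keyword does not name a child of the current node.
    """
    node, path = root, [root.keyword]
    for kw in keywords:
        if kw not in node.children:
            raise KeyError(f"no branch {kw!r} under {node.keyword!r}")
        node = node.children[kw]
        path.append(kw)
    return node, path


home = NavNode("home")
news = home.add(NavNode("news"))
news.add(NavNode("sports", content="Local team wins."))

node, path = traverse(home, ["news", "sports"])
print(path, node.content)
```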

[0011] Using voice commands, in one embodiment, a user may direct the system to perform the following operations, for example: browse the content of a web page, jump to a specific web page, move forward or backward within web pages or websites, make a selection from the content of a web page, edit input fields in a web page, and confirm selections or inputs to a web page. Each operation is associated with a separate command, keyword, or phrase. Once the system recognizes a command, keyword, or phrase provided by a user, the corresponding operation is performed.

[0012] A command is recognized if it is included in the system's navigation grammar. The navigation grammar includes vocabulary and navigation rules corresponding to the contents of the vocabulary. In some embodiments, to improve recognition efficiency, the system is implemented to include more than one voice recognition mode. In some modes the grammar is expanded while in other modes the grammar is narrowed. Expanding the grammar's vocabulary allows for more commands to be recognized. The larger the vocabulary, however, the greater the likelihood of a recognition error.

[0013] Thus, in some embodiments the grammar is narrowed to maximize recognition. For example, in one recognition mode the grammar's vocabulary includes basic navigation commands that allow a user to navigate from a node to the node's immediate children, siblings, or parents. In another recognition mode, in addition to the basic navigation commands, the vocabulary may be expanded to include terms that allow navigating to nodes other than children, siblings, or parents of a node. As such, in the latter mode, navigation is not limited only to the immediately neighboring nodes.
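
The two recognition modes described above may be sketched as follows, under stated assumptions: the mode names `"narrow"` and `"expanded"`, the `BASIC_COMMANDS` set, and the tree encoding (each node name mapped to its parent and its list of children) are hypothetical and chosen only for illustration.

```python
# Assumed basic navigation commands; the disclosure does not enumerate them.
BASIC_COMMANDS = {"back", "next", "up"}

# Hypothetical tree: node name -> (parent name or None, list of child names).
TREE = {
    "home": (None, ["news", "weather"]),
    "news": ("home", ["sports", "business"]),
    "weather": ("home", []),
    "sports": ("news", []),
    "business": ("news", []),
}


def build_vocabulary(tree, current, mode):
    """Return the set of words recognized at `current` in the given mode."""
    parent, children = tree[current]
    vocab = set(BASIC_COMMANDS)
    if mode == "narrow":
        # Only immediate children, the parent, and siblings are reachable.
        vocab.update(children)
        if parent is not None:
            vocab.add(parent)
            vocab.update(s for s in tree[parent][1] if s != current)
    elif mode == "expanded":
        # Every other node in the tree may be named directly.
        vocab.update(name for name in tree if name != current)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return vocab


print(sorted(build_vocabulary(TREE, "sports", "narrow")))
print(sorted(build_vocabulary(TREE, "sports", "expanded")))
```

From the "sports" node in this sketch, "weather" is recognized only in the expanded mode, since it is neither a child, a sibling, nor the parent of that node.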

[0014] In accordance with one embodiment, a method of accessing content from a communication network comprises: providing a navigation tree comprising a semantic, hierarchical structure, having one or more paths associated with content of a conventional markup language document and a grammar comprising vocabulary including one or more keywords; receiving a request to access the content; responsive to the request, traversing a path in the navigation tree, if the request includes at least one keyword of the vocabulary.

[0015] In certain embodiments, if the keyword included in the request is not included in the navigation vocabulary, the vocabulary is searched to find a close match for the command. If a match is found and confirmed, then the system operates to satisfy the command. If a match is not found or not confirmed, then one or more other commands included in the vocabulary are provided for selection. If the commands provided are not confirmed, then the system rejects the user request.
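
The match, confirm, and reject sequence described above may be sketched as follows. The sketch assumes text input and substitutes plain string similarity, via the Python standard library's `difflib.get_close_matches`, for whatever matching the system actually performs; the `resolve_command` name and the `confirm` callback are hypothetical.

```python
import difflib


def resolve_command(heard, vocabulary, confirm):
    """Map a heard word to a vocabulary command, or return None to reject.

    `confirm` is a caller-supplied function that asks the user whether a
    candidate is what was meant and returns True or False.
    """
    if heard in vocabulary:
        return heard
    # Closest candidates first; string similarity stands in for the
    # system's own matching (an assumption of this sketch).
    candidates = difflib.get_close_matches(
        heard, sorted(vocabulary), n=3, cutoff=0.6)
    for candidate in candidates:
        # The first candidate is the closest match; any remaining ones
        # are offered for selection in turn.
        if confirm(candidate):
            return candidate
    return None  # nothing found or nothing confirmed: reject the request


vocabulary = {"weather", "news", "sports"}
print(resolve_command("wether", vocabulary, confirm=lambda c: True))
```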

[0016] In accordance with one or more embodiments, a method performed on a computer for browsing content available from a communication network comprises: receiving a document containing the content in a conventional markup language format and a style sheet for the document; generating a document tree from the document; generating a style tree from the style sheet, the style tree comprising a plurality of style sheet rules; converting the document tree into a navigation tree using the style sheet rules, the navigation tree associated with a vocabulary having one or more keywords, the navigation tree including one or more content nodes and routing nodes defining paths of the navigation tree, each content node including some portion of the content and a keyword associated with the respective portion of the content, each routing node including at least one keyword referencing other nodes in the navigation tree; receiving a request to access the content; and traversing a path in the navigation tree, adding any keywords included in any node along the traversed path to the vocabulary in response to the request.

[0017] In one embodiment, speech recognition is used to recognize the command or keyword included in the request and a confidence score is assigned to the result of the speech recognition. If the confidence score is below a rejection threshold, the request is rejected. Alternatively, if the confidence score is greater than a recognition threshold, then the request is accepted. Where the confidence score is between the rejection threshold and the recognition threshold, the result is considered ambiguous. To resolve the ambiguity of the result, the system searches the grammar's vocabulary to find one or more close matches for the command or keyword and narrows the grammar to include said one or more close matches.
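
The three-way decision described above may be sketched as follows. The threshold values and the `classify` function name are assumptions for illustration; the disclosure does not specify numeric thresholds.

```python
# Assumed values; no numeric thresholds are given in the disclosure.
REJECTION_THRESHOLD = 0.3
RECOGNITION_THRESHOLD = 0.8


def classify(confidence,
             rejection=REJECTION_THRESHOLD,
             recognition=RECOGNITION_THRESHOLD):
    """Classify a speech recognition result by its confidence score."""
    if confidence < rejection:
        return "reject"
    if confidence > recognition:
        return "accept"
    # Scores between the two thresholds (inclusive) are ambiguous and
    # would trigger the close-match search.
    return "ambiguous"


for score in (0.1, 0.5, 0.9):
    print(score, classify(score))
```

Treating a score exactly equal to either threshold as ambiguous is a choice made for this sketch; the text says only "below" and "greater than".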

[0018] If any close matches are found, then the system provides said one or more close matches for selection. The system then queries the user to confirm whether or not the closest match recognized by the system is in fact the command meant to be conveyed by the user. If so, the command is recognized and performed. Otherwise, the system fails to recognize the command and provides the user with one or more help messages. The help messages are designed to narrow the grammar, guide the user, and allow the user to repeat the request. The system counts the number of recognition failures and provides a variety of different help messages to assist the user. As a last resort, the system reverts to a previous navigation step and allows the user to start over, for example.
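
The failure counting and escalating help described above may be sketched as follows. The `HelpEscalator` class, its method names, and the message texts are hypothetical; the sketch returns a different message for each successive failure and falls back to a last-resort message once the prepared messages are exhausted.

```python
class HelpEscalator:
    """Counts recognition failures and escalates the help given."""

    def __init__(self, messages, last_resort):
        self.messages = list(messages)
        self.last_resort = last_resort
        self.failures = 0

    def on_failure(self):
        """Record one failure and return the message to play."""
        self.failures += 1
        if self.failures > len(self.messages):
            # Prepared messages exhausted: last resort (for example,
            # returning the user to a previous navigation step).
            return self.last_resort
        return self.messages[self.failures - 1]

    def reset(self):
        """Clear the counter, e.g. after a request is recognized."""
        self.failures = 0


helper = HelpEscalator(
    messages=["Please repeat that.",
              "You can say: news, weather, or sports."],
    last_resort="Returning to the previous menu.",
)
for _ in range(3):
    print(helper.on_failure())
```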

[0019] The system is designed to dynamically build the navigation grammar based on keywords or other vocabulary included in the nodes of the navigation tree. Since the grammar is built dynamically, in certain embodiments, the grammar built at each navigation instance is specific to the navigation route selected by the user. In some navigation modes the system is designed to streamline and narrow the vocabulary included in the grammar to those keywords and commands that are relevant to the tree branch being traversed at the time. A smaller grammar maximizes recognition accuracy by reducing the possibilities of failure in recognition. As such, narrowing the grammar at each stage allows the system to detect and process user commands more accurately and efficiently.
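
Dynamic grammar building of this kind may be sketched as follows. The `Session` class, the `DEFAULT_GRAMMAR` contents, and the sample tree are assumptions for illustration. At every step the grammar is rebuilt from a small default command set plus the keywords referenced by the node currently being visited, so the recognized vocabulary follows the route taken by the user.

```python
# Assumed default commands available at every node.
DEFAULT_GRAMMAR = {"back", "home", "help"}


class Session:
    """Tracks the user's route and rebuilds the grammar at each node."""

    def __init__(self, tree, root):
        self.tree = tree        # node name -> list of child keywords
        self.history = [root]   # route taken so far, root first

    @property
    def current(self):
        return self.history[-1]

    def grammar(self):
        """Default commands plus keywords referenced by the current node."""
        return DEFAULT_GRAMMAR | set(self.tree.get(self.current, []))

    def request(self, word):
        """Apply a request; return False if it is outside the grammar."""
        if word not in self.grammar():
            return False
        if word == "back":
            if len(self.history) > 1:
                self.history.pop()
        elif word == "home":
            del self.history[1:]
        elif word != "help":
            self.history.append(word)  # move to the named child node
        return True


tree = {"main": ["news", "weather"], "news": ["sports", "business"]}
session = Session(tree, "main")
print(session.request("sports"))   # not yet in the grammar at "main"
print(session.request("news"))
print(sorted(session.grammar()))
```

Because the grammar at "main" contains only that node's own keywords, "sports" is refused there and becomes recognizable only after the user moves to "news".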

[0020] In some embodiments, the system includes a default grammar. The default grammar includes the basic commands and rules that allow a user to perform basic navigable operations. Examples of basic navigable operations include moving forward or backward in navigation steps or returning to the home page of a web site. Help and assist features are included in one or more embodiments of the system to detect commands that are ambiguous or vague and to guide a user on how to properly navigate or command the system.

[0021] According to another embodiment of the invention, a computer system for allowing a user of a limited display device to browse content available from a communication network includes a gateway module. The gateway module is operable to receive a user request, and to recognize the request. A browser module, in communication with the gateway module, is operable to retrieve a conventional markup language document and a style sheet document from the communication network in response to the request.

[0022] The conventional markup language document contains content; the style sheet document contains metadata. The browser module is operable to generate a navigation tree using the conventional markup language document and the style sheet document. The navigation tree provides a semantic, hierarchical structure for the content. The gateway module and the browser module cooperate to enable the user to browse the content using the navigation tree and to generate output conveying the content to the user via the limited display device.

[0023] Other aspects and advantages of the invention will be more fully understood from the following descriptions and accompanying drawings.

[0024] BRIEF DESCRIPTION OF THE DRAWINGS

[0025] FIG. 1A illustrates an exemplary environment in which a voice browsing system, according to an embodiment of the invention, may operate.

[0026] FIG. 1B illustrates another exemplary environment in which a voice browsing system, according to an embodiment of the invention, may operate.

[0027] FIG. 2 is a block diagram of a voice browsing system, according to an embodiment of the invention.

[0028] FIG. 3 is a block diagram of a navigation tree builder component, according to an embodiment of the invention.

[0029] FIG. 4 is a block diagram of a tree converter, according to an embodiment of the invention.

[0030] FIG. 5 illustrates an exemplary document tree, according to an embodiment of the invention.

[0031] FIG. 6 illustrates an exemplary navigation tree, according to an embodiment of the invention.

[0032] FIG. 7 illustrates a computer-based system which is an exemplary hardware implementation for the voice browsing system, according to an embodiment of the invention.

[0033] FIG. 8 is a flow diagram of an exemplary method for browsing content with voice commands, according to an embodiment of the invention.

[0034] FIG. 9 is a block diagram of exemplary nodes in a navigation tree, according to an embodiment of the invention.

[0035] FIG. 10 is a flow diagram illustrating a method of navigating a routing node, according to an embodiment of the invention.

[0036] FIG. 11 is a flow diagram illustrating a method of navigating a form node, according to an embodiment of the invention.

[0037] FIG. 12 is a flow diagram illustrating a method of navigating a content node, according to an embodiment of the invention.

[0038] FIG. 13 is a flow diagram illustrating a method of providing a user with assistance, according to an embodiment of the invention.

[0039] FIG. 14 is a flow diagram illustrating a method of processing a user request, according to an embodiment of the invention.

[0040] FIG. 15 is a flow diagram illustrating one or more navigation modes, according to an embodiment of the invention.

[0041] FIG. 16 is a flow diagram illustrating a method of voice recognition, according to an embodiment of the invention.

[0042] FIG. 17 is a flow diagram of an exemplary method for generating a navigation tree, according to an embodiment of the invention.

[0043] FIG. 18 is a flow diagram of an exemplary method for applying style sheet rules to a document tree, according to an embodiment of the invention.

[0044] FIG. 19 is a flow diagram of an exemplary method for applying heuristic rules to a document tree, according to an embodiment of the invention.

[0045] FIG. 20 is a flow diagram of an exemplary method for mapping a document tree into a navigation tree, according to an embodiment of the invention.

[0046] Features, elements, and aspects of the invention that are referenced by the same numerals in different figures represent the same, equivalent, or similar features, elements, or aspects in accordance with one or more embodiments of the system.

DETAILED DESCRIPTION

[0047] The invention and its advantages, according to one or more embodiments, are best understood by referring to FIGS. 1-20 of the drawings. Like numerals are used for like and corresponding parts of the various drawings. The invention, its advantages, and various embodiments are described in detail below. Certain aspects of the invention are described in more detail in U.S. patent application Ser. No. 09/614,504 (Attorney Matter No. M-8247 US), filed Jul. 11, 2000, entitled "System And Method For Accessing Web Content Using Limited Display Devices," with a claim of priority under 35 U.S.C. § 119(e) to Provisional Application No. 60/142,429 (Attorney Matter No. P-8247 US), filed Nov. 9, 1999, entitled "System And Method For Accessing Web Content Using Limited Display Devices." The entire content of the above-referenced applications is incorporated by reference herein.

[0048] Turning first to the nomenclature of the specification, the detailed description which follows is represented largely in terms of processes and symbolic representations of operations performed by conventional computer components, such as a local or remote central processing unit (CPU) or processor associated with a general purpose computer system, memory storage devices for the processor, and connected local or remote pixel-oriented display devices. These operations include the manipulation of data bits by the processor and the maintenance of these bits within data structures resident in one or more of the memory storage devices. Such data structures impose a physical organization upon the collection of data bits stored within computer memory and represent specific electrical or magnetic elements. These symbolic representations are the means used by those skilled in the art of computer programming and computer construction to most effectively convey teachings and discoveries to others skilled in the art.

[0049] For purposes of this discussion, a process, method, routine, or sub-routine is generally considered to be a sequence of computer-executed steps leading to a desired result. These steps generally require manipulations of physical quantities. Usually, although not necessarily, these quantities take the form of electrical, magnetic, or optical signals capable of being stored, transferred, combined, compared, or otherwise manipulated. It is conventional for those skilled in the art to refer to these signals as bits, values, elements, symbols, characters, text, terms, numbers, records, files, or the like. It should be kept in mind, however, that these and some other terms should be associated with appropriate physical quantities for computer operations, and that these terms are merely conventional labels applied to physical quantities that exist within and during operation of the computer.

[0050] It should also be understood that manipulations within the computer are often referred to in terms such as adding, comparing, moving, searching, or the like, which are often associated with manual operations performed by a human operator. It must be understood that no involvement of the human operator may be necessary, or even desirable, in the invention. The operations described herein are machine operations performed in conjunction with the human operator or user that interacts with the computer or computers.

[0051] In addition, it should be understood that the programs, processes, methods, and the like, described herein are but an exemplary implementation of the invention and are not related, or limited, to any particular computer, apparatus, or computer language. Rather, various types of general purpose computing machines or devices may be used with programs constructed in accordance with the teachings described herein. Similarly, it may prove advantageous to construct a specialized apparatus to perform the method steps described herein by way of dedicated computer systems with hard-wired logic or programs stored in non-volatile memory, such as read-only memory (ROM).

[0052] Exemplary Environment

[0053] FIG. 1A illustrates an exemplary environment in which a voice browsing system 10, according to an embodiment of the invention, may operate. In this environment, one or more content providers 12 may provide content to any number of interested users. Each content provider can be an entity which operates or maintains a portal or any other web site through which content can be delivered. Each portal or web site, which can be supported by a suitable computer system or web server, may include one or more web pages at which content is made available. Each web site or web page can be identified by a respective uniform resource locator (URL).

[0054] Content can be any data or information that is presentable (visually, audibly, or otherwise) to users. Thus, content can include written text, images, graphics, animation, video, music, voice, and the like, or any combination thereof. Content can be stored in digital form, such as, for example, a text file, an image file, an audio file, a video file, etc. This content can be included in one or more web pages of the respective portal or web site maintained by each content provider 12.

[0055] These web pages can be supported by documents formatted in a conventional, Internet-accessible markup language, such as, for example, Hyper-Text Markup Language (HTML) or eXtensible Markup Language (XML). HTML and XML are markup language standards set by the World Wide Web Consortium (W3C) for Internet-accessible documents. In general, conventional markup languages provide formatting and structure for content that is to be presented visually. That is, conventional markup languages describe the way that content should be displayed, for example, by specifying that text should appear in boldface, at which location a particular image should appear, etc. In markup languages, tags are added or embedded within content to describe how the content should be formatted and displayed. A conventional, Internet-accessible markup language document can be the source page for any browser on a computer.

[0056] Along with the content, each content provider 12 may also maintain metadata that can be used to guide the construction of a semantic representation for the content. Metadata may include, for example, declarative statements (rules) and procedural statements. This metadata can be contained in one or more style sheet documents, which are essentially templates that apply formatting and style information to the elements of a web page. A style sheet document can be, for example, an extended Cascading Style Sheet (xCSS) document. In one embodiment, a separate default style sheet document may be provided for each conventional markup language (e.g., HTML or XML). As an alternative to style sheets, metadata can be contained in documents formatted in a suitable descriptive language such as Resource Description Framework. Using style sheet documents (or other appropriate documents), auxiliary metadata can be applied to a web page supported by a conventional markup language document.

[0057] One or more communication networks, such as the Internet 14, can be used to deliver content. Internet 14 is an interconnection of computer clients and servers located throughout the world and exchanging information according to Transmission Control Protocol/Internet Protocol (TCP/IP), Internetwork Packet eXchange/Sequence Packet eXchange (IPX/SPX), AppleTalk, or other suitable protocol. Internet 14 supports the distributed application known as the "World Wide Web." As described herein, web servers maintain web sites, each comprising one or more web pages at which information is made available for viewing.

[0058] Each web site or web page may be supported by documents formatted in any suitable conventional markup language (e.g., HTML or XML). Clients may locally execute a conventional web browser program. A conventional web browser is a computer program that allows a user to exchange information with the World Wide Web. Any of a variety of conventional web browsers are available, such as NETSCAPE NAVIGATOR from Netscape Communications Corp., INTERNET EXPLORER from Microsoft Corporation, and others that allow convenient access and navigation of the Internet 14. Information may be communicated from a web server to a client using a suitable protocol, such as, for example, Hypertext Transfer Protocol (HTTP) or File Transfer Protocol (FTP).

[0059] A service provider 16 is connected to Internet 14. As used herein, the terms "connected," "coupled," or any variant thereof, mean any connection or coupling, either direct or indirect, between two or more elements; such connection or coupling can be physical or logical. Service provider 16 may operate a computer system that appears as a client on Internet 14 to retrieve content and other information from content providers 12.

[0060] In general, service provider 16 can be an entity that delivers services to one or more users. These services may include telephony and voice services, including plain old telephone service (POTS), digital services, cellular service, wireless service, pager service, etc. To support the delivery of services, service provider 16 may maintain a system for communicating over a suitable communication network, such as, for example, a telecommunications network. Such telecommunications network allows communication via a telecommunications line, such as an analog telephone line, a digital T1 line, a digital T3 line, or an OC3 telephony feed.

[0061] The telecommunications network may include a public switched telephone network (PSTN) and/or a private system (e.g., cellular system) implemented with a number of switches, wire lines, fiber-optic cable, land-based transmission towers, space-based satellite transponders, etc. In one embodiment, the telecommunications network may include any other suitable communication system, such as a specialized mobile radio (SMR) system. As such, the telecommunications network may support a variety of communications, including, but not limited to, local telephony, toll (i.e., long distance), and wireless (e.g., analog cellular system, digital cellular system, Personal Communication System (PCS), Cellular Digital Packet Data (CDPD), ARDIS, RAM Mobile Data, Metricom Ricochet, paging, and Enhanced Specialized Mobile Radio (ESMR)).

[0062] The telecommunications network may utilize various calling protocols (e.g., Inband, Integrated Services Digital Network (ISDN) and Signaling System No. 7 (SS7) call protocols) and other suitable protocols (e.g., Enhanced Throughput Cellular (ETC), Enhanced Cellular Control (EC.sup.2), MNP10, MNP10-EC, Throughput Accelerator (TXCEL), Mobile Data Link Protocol, etc.). Transmissions over the telecommunications network system may be analog or digital. Transmission may also include one or more infrared links (e.g., IRDA).

[0063] One or more limited display devices 18 may be coupled to the network maintained by service provider 16. Each limited display device 18 may comprise a communication device with limited capability for visual display. Thus, a limited display device 18 can be, for example, a wired telephone, a wireless telephone, a smart phone, a wireless personal digital assistant (PDA), or an Internet television. Each limited display device 18 supports communication by a respective user, for example, in the form of speech, voice, or other audible information. Limited display devices 18 may also support dual tone multi-frequency (DTMF) signals.

[0064] Voice browsing system 10, as depicted in FIG. 1A, may be incorporated into a system maintained by service provider 16. Voice browsing system 10 is a computer-based system which generally functions to allow users with limited display devices 18 to browse content provided by one or more content providers 12 using, for example, spoken/voice commands or requests. In response to these commands or requests, voice browsing system 10, acting as a client, interacts with content providers 12 via Internet 14 to retrieve the desired content. Then, voice browsing system 10 delivers the desired content in the form of audible information to the limited display devices 18. To accomplish this, in one embodiment, voice browsing system 10 constructs or generates navigation trees using style sheet documents to supply metadata to conventional markup language (e.g., HTML or XML) documents.

[0065] Navigation trees are semantic representations of web pages that serve as interactive menu dialogs to support voice-based search by users. Each navigation tree may comprise a number of content nodes and routing nodes. Content nodes contain or are associated with content from a web page that can be delivered to a user. Content included or associated with a node is stored in the form of electrical signals on a storage medium such that when a node is visited by a user the content is accessible by a user. Routing nodes implement options that can be selected to move to other nodes. For example, routing nodes may provide prompts for directing the user to content at content nodes. Thus, routing nodes link the content of a web page in a meaningful way. Navigation trees are described in more detail herein.
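
The distinction between content nodes and routing nodes can be sketched as a small data structure. This is an illustrative sketch only: the class names, the `options` mapping, and the prompt wording are assumptions rather than details taken from the specification.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class ContentNode:
    """Leaf holding content from the web page that can be read to the user."""
    title: str
    content: str


@dataclass
class RoutingNode:
    """Interior node offering options that lead to other nodes."""
    title: str
    options: dict[str, "NavNode"] = field(default_factory=dict)

    def prompt(self) -> str:
        return f"{self.title}. Choose from: {', '.join(self.options)}."


NavNode = Union[ContentNode, RoutingNode]


def speak(node: NavNode) -> str:
    """Text a text-to-speech stage would render when the node is visited."""
    return node.prompt() if isinstance(node, RoutingNode) else node.content


# Hypothetical tree (illustration only).
root = RoutingNode("News", {
    "weather": ContentNode("Weather", "Sunny with light wind."),
    "sports": RoutingNode("Sports", {
        "baseball": ContentNode("Baseball", "Tonight's game starts at seven."),
    }),
})

print(speak(root))                     # News. Choose from: weather, sports.
print(speak(root.options["weather"]))  # Sunny with light wind.
```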

[0066] Voice browsing system 10 thus provides a technical advantage. A voice-based browser is crucial for users having limited display devices 18 since a visual browser is inappropriate for, or simply cannot work with, such devices. Furthermore, voice browsing system 10 leverages the existing content infrastructure (i.e., documents formatted in conventional markup languages, such as HTML or XML) maintained by content providers 12. That is, the existing content infrastructure can serve as an easy-to-administer, single source for interaction by both complete computer systems (e.g., desktop computers) and limited display devices 18 (e.g., wireless telephones or wireless PDAs). As such, content providers 12 are not required to re-create their content in other formats, deploy new markup languages (e.g., VoiceXML), or implement additional application programming interfaces (APIs) into their back-end systems to support other formats and markup languages.

[0067] Another Exemplary Environment

[0068] FIG. 1B illustrates another exemplary environment within which a voice browsing system 10, according to an embodiment of the invention, can operate. In this environment, voice browsing system 10 may be implemented within the system of a content provider 12. Content provider 12 can be substantially similar to that previously described with reference to FIG. 1A. That is, content provider 12 can be an entity which operates or maintains a portal or any other web site through which content can be delivered. Such content can be included in one or more web pages of the respective portal or web site maintained by content provider 12.

[0069] Each web page can be supported by documents formatted in a conventional markup language, such as Hyper-Text Markup Language (HTML) or eXtensible Markup Language (XML). Along with the conventional markup language documents, content provider 12 may also maintain one or more style sheet (e.g., extended Cascading Style Sheet (xCSS)) documents containing metadata that can be used to guide the construction of a semantic representation for the content.

[0070] A network 20 is coupled to content provider 12. Network 20 can be any suitable network for communicating data and information. This network can be a telecommunications or other network, as described with reference to FIG. 1A, supporting telephony and voice services, including plain old telephone service (POTS), digital services, cellular service, wireless service, pager service, etc.

[0071] A number of limited display devices 18 are coupled to network 20. These limited display devices 18 can be substantially similar to those described with reference to FIG. 1A. That is, each limited display device 18 may comprise a communication device with limited capability for visual display, such as, for example, a wired telephone, a wireless telephone, a smart phone, or a wireless personal digital assistant (PDA). Each limited display device 18 supports communication by a respective user, for example, in the form of speech, voice, or other audible information.

[0072] In operation for this environment, voice browsing system 10 again generally functions to allow users with limited display devices 18 to browse content provided by one or more content providers 12 using, for example, spoken/voice commands or requests. In this environment, however, because voice browsing system 10 is incorporated at content provider 12, content provider 12 may directly receive, process, and respond to these spoken/voice commands or requests from users. For each command/request, voice browsing system 10 retrieves the desired content and other information at content provider 12. The content can be in the form of markup language (e.g., HTML or XML) documents, and the other information may include metadata in the form of style sheet (e.g., xCSS) documents. Voice browsing system 10 may construct or generate navigation trees using the style sheet documents to supply metadata to the conventional markup language documents. These navigation trees then serve as interactive menu dialogs to support voice-based search by users.

[0073] Voice Browsing System

[0074] FIG. 2 is a block diagram of a voice browsing system 10, according to an embodiment of the invention. In general, voice browsing system 10 allows a user of a limited display device 18 to browse the content available from any one or more content providers 12 using spoken/voice commands or requests. As depicted, voice browsing system 10 includes a gateway module 30 and a browser module 32.

[0075] Gateway module 30 generally functions as a gateway to translate data/information between one type of network/computer system and another, thereby acting as an interface. In the context of the invention, gateway module 30 translates data/information between a network supporting limited display devices 18 (e.g., a telecommunications network) and the computer-based system of voice browsing system 10. For the network supporting the limited display devices 18, data/information can be in the form of speech or voice.

[0076] The functionality of gateway module 30 can be performed by one or more suitable processors, such as a main-frame, a file server, a work station, or other suitable data processing facility supported by memory (either internal or external), running appropriate software, and operating under the control of any suitable operating system (OS), such as MS-DOS, Macintosh OS, Windows NT, Windows 95, OS/2, Unix, Linux, Xenix, and the like. Gateway module 30, as shown, comprises a computer telephony interface (CTI)/personal digital assistant (PDA) component 34, an automated speech recognition (ASR) component 36, and a text-to-speech (TTS) component 38. Each of these components 34, 36, and 38 may comprise one or more programs which, when executed, perform the functionality described herein. CTI/PDA component 34 generally functions to support communication between voice browsing system 10 and limited display devices 18. CTI/PDA component 34 may comprise one or more application programming interfaces (APIs) for communicating in any protocol suitable for a public switched telephone network (PSTN), a cellular telephone network, smart phones, pager devices, and wireless personal digital assistant (PDA) devices. These protocols may include Hypertext Transfer Protocol (HTTP), which supports PDA devices, and PSTN protocol, which supports cellular telephones.

[0077] Automated speech recognition component 36 generally functions to recognize speech/voice commands and requests issued by users into respective limited display devices 18. Automated speech recognition component 36 may convert the spoken commands/requests into a text format. Automated speech recognition component 36 can be implemented with automated speech recognition software commercially available, for example, from the following companies: Nuance Corporation of Menlo Park, Calif.; Speech Works International, Inc. of Boston, Mass.; Lernout & Hauspie Speech Products of Ieper, Belgium; and Phillips International, Inc. of Potomac, Md. Such commercially available software typically can be modified for particular applications, such as a computer telephony application.

[0078] Text-to-speech component 38 generally functions to output speech or vocalized messages to users having a limited display device 18. This speech can be generated from content that has been retrieved from a content provider 12 and reformatted within voice browsing system 10, as described herein. Text-to-speech component 38 synthesizes human speech by "speaking" text, such as that which can be part of the content. Software for implementing text-to-speech component 38 is commercially available, for example, from the following companies: Lernout & Hauspie Speech Products of Ieper, Belgium; Fonix Inc. of Salt Lake City, Utah; Centigram Communications Corporation of San Jose, Calif.; Digital Equipment Corporation (DEC) of Maynard, Mass.; Lucent Technologies of Murray Hill, N.J.; and Microsoft Inc. of Redmond, Wash.

[0079] Browser module 32, coupled to gateway module 30, functions to provide access to web pages (of any one or more content providers 12) using Internet protocols and controls navigation of the same. Browser module 32 may organize the content of any web page into a structure that is suitable for browsing by a user using a limited display device 18. Afterwards, browser module 32 allows a user to browse such structure, for example, using voice or speech commands/requests.

[0080] The functionality of browser module 32 can be performed by one or more suitable processors, such as a main-frame, a file server, a work station, or other suitable data processing facility supported by memory (either internal or external), running appropriate software, and operating under the control of any suitable operating system (OS), such as MS-DOS, Macintosh OS, Windows NT, Windows 95, OS/2, Unix, Linux, Xenix, and the like. Such processors can be the same as, or separate from, those which perform the functionality of gateway module 30.

[0081] As depicted, browser module 32 comprises a navigation tree builder component 40 and a navigation agent component 42. Each of these components 40 and 42 may comprise one or more programs which, when executed, perform the functionality described herein.

[0082] Navigation tree builder component 40 may receive conventional, Internet-accessible markup language (e.g., XML or HTML) documents and associated style sheet (e.g., xCSS) documents from one or more content providers 12. Using these markup language and style sheet documents, navigation tree builder component 40 generates navigation trees that are semantic representations of web pages. In general, each navigation tree provides a hierarchical menu by which users can readily navigate the content of a conventional markup language document. Each navigation tree may include a number of nodes, each of which can be either a content node or a routing node. A content node comprises content that can be delivered to a user. A routing node may implement a prompt for directing the user to other nodes, for example, to obtain the content at a specific content node.

[0083] Navigation agent component 42 generally functions to support the navigation of navigation trees once they have been generated by navigation tree builder component 40. Navigation agent component 42 may act as an interface between browser module 32 and gateway module 30 to coordinate the movement along nodes of a navigation tree in response to any commands and requests received from users.
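
A minimal sketch of how a navigation agent might coordinate movement along the nodes of a navigation tree is given below. It assumes the tree is represented as nested dictionaries (routing nodes) with strings as leaves (content nodes), and that "back" and "home" are reserved command words; the class and method names are hypothetical.

```python
class NavigationAgent:
    """Tracks the user's position in a tree of nested dicts and strings."""

    def __init__(self, tree: dict):
        self.root = tree
        self.path: list[str] = []  # option names chosen so far

    def current(self):
        """Walk from the root along the chosen options."""
        node = self.root
        for name in self.path:
            node = node[name]
        return node

    def handle(self, request: str) -> str:
        """Move in response to a recognized request and return the reply."""
        node = self.current()
        if request == "home":
            self.path.clear()
        elif request == "back":
            if self.path:
                self.path.pop()
        elif isinstance(node, dict) and request in node:
            self.path.append(request)
        else:
            return f"Sorry, '{request}' is not available here."
        node = self.current()
        if isinstance(node, dict):
            return "Options: " + ", ".join(node)
        return node


# Hypothetical tree (illustration only).
tree = {
    "weather": "Sunny with light wind.",
    "sports": {"baseball": "Game at seven.", "hockey": "No games today."},
}
agent = NavigationAgent(tree)
print(agent.handle("sports"))    # Options: baseball, hockey
print(agent.handle("baseball"))  # Game at seven.
print(agent.handle("back"))      # Options: baseball, hockey
```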

[0084] In exemplary operation, a user may communicate with voice browsing system 10 to obtain content from content providers 12. To do this, the user, via limited display device 18, places a call which initiates communication with voice browsing system 10, as supported by CTI/PDA component 34 of gateway module 30. The user then issues a spoken command or request for content, which is recognized or interpreted by automated speech recognition component 36. In response to the recognized command/request, browser module 32 accesses a web page containing the desired content (at a web site or portal operated by a content provider 12) via Internet 14 or other communication network. Browser module 32 retrieves one or more conventional markup language and associated style sheet documents from the content provider.

[0085] Using these markup language and style sheet documents, navigation tree builder component 40 creates one or more navigation trees. The user may interact with voice browsing system 10, as supported by navigation agent component 42, to navigate along the nodes of the navigation trees. During navigation, gateway module 30 may convert the content at various nodes of the navigation trees into audible speech that is issued to the user, thereby delivering the desired content. Browser module 32 may generate and support the navigation of additional navigation trees in the event that any other command/request from the user invokes another web page of the same or a different content provider 12. When a user has obtained all desired content, the user may terminate the call, for example, by hanging up.

[0086] Navigation Tree Builder Component

[0087] FIG. 3 is a block diagram of a navigation tree builder component 40, according to an embodiment of the invention. Navigation tree builder component 40 generally functions to construct navigation trees 50 which can be used to readily and orderly provide the content of respective web pages to a user via a limited display device 18. As depicted, navigation tree builder 40 comprises a markup language parser 52, a style sheet parser 54, and a tree converter 56. Each of markup language parser 52, style sheet parser 54, and tree converter 56 may comprise one or more programs which, when executed, perform the functionality described herein.

[0088] Markup language parser 52 receives conventional, Internet-accessible markup language (e.g., HTML or XML) documents 58 from a content provider 12. Conventional markup languages describe how content should be structured, formatted, or displayed. To accomplish this, conventional markup languages may embed tags to specify spans, frames, paragraphs, ordered lists, unordered lists, headings, tables, table rows, objects, and the like, for organizing content. Each markup language document 58 may serve as the source for a web page. Markup language parser 52 parses the content contained within a markup language document 58 in order to generate a document tree 60. In particular, markup language parser 52 can map each markup language document into a respective document tree 60.

[0089] Each document tree 60 is a basic data representation of content. An exemplary document tree 60 is illustrated in FIG. 5. Document tree 60 organizes the content of a web page based on, or according to, the formatting tags of a conventional markup language. The document tree is a graphic representation of an HTML document. A typical document tree 60 includes a number of document tree nodes. As depicted, these document tree nodes include an HTML designation (<HTML>), a header (<HEAD>) and a body (<BODY>), a title (<TITLE>), metadata (<META>), one or more headings (<H1>, <H2>), list items (<LI>), an unordered list (<UL>), and a paragraph (<P>). The nodes of a document tree may comprise content and formatting information. For example, each node of the document tree may correspond to either HTML markup tags or plain text. The content of a markup element appears as its child in the document tree. For example, the header (<HEAD>) may have content in the form of the phrase "About Our Organization" along with formatting information which specifies that the content should be presented as a header on the web page.
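
As an illustration of the mapping from a markup language document to a document tree, the sketch below uses the `HTMLParser` class from Python's standard library. It is not the parser of the specification; the nested-dictionary node layout, the `DocumentTreeBuilder` name, and the sample page are assumptions chosen for brevity.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # no end tag


class DocumentTreeBuilder(HTMLParser):
    """Builds nested dicts: element nodes with text as child nodes."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element, tolerating unclosed tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1]["children"].append({"tag": "#text", "text": text})


# Hypothetical page (illustration only).
page = ("<html><head><title>About</title><meta name='x'></head>"
        "<body><h1>About Our Organization</h1><p>Who we are.</p></body></html>")
builder = DocumentTreeBuilder()
builder.feed(page)
html_node = builder.root["children"][0]
head, body = html_node["children"]
print([child["tag"] for child in head["children"]])  # ['title', 'meta']
print([child["tag"] for child in body["children"]])  # ['h1', 'p']
```

Note that the text of each element appears as a child of that element, matching the statement that the content of a markup element appears as its child in the document tree.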

[0090] Document tree 60 is designed for presenting a number of content elements simultaneously. That is, the organization of web page content according to the formatting tags of conventional markup language documents is appropriate, for example, for a visual display in which textual information can be presented at once in the form of headers, lines, paragraphs, tables, arrays, lists, and the like, along with images, graphics, animation, etc. However, the structure of a document tree 60 is not particularly well-suited for presenting content serially, for example, as would be required for an audio presentation in which only a single element of content can be presented at a given moment.

[0091] Specifically, in an audio context, the formatting information of a document tree 60 does not provide meaningful connections or links for the content of a web page. For example, formatting information specifying that content should be displayed as a header does not translate well for an audio presentation of the content. In addition, much of the formatting information of a document tree 60 does not constitute meaningful content which may be of interest to a user. For example, the nodes for header (<HEAD>) and body (<BODY>) are not intrinsically interesting. In fact, the header (<HEAD>)--comprising title (<TITLE>) and metadata (<META>)--does not generally contain information that should be presented directly to the user.

[0092] Style sheet parser 54 receives one or more style sheet (e.g., xCSS) documents 62. Style sheet documents 62 provide templates for applying style information to the elements of various web pages supported by respective conventional markup language documents 58. Each style sheet document 62 may supply or provide metadata for the web pages. For example, using the metadata from a style sheet document 62, audio prompts can be added to a standard web page. This metadata can also be used to guide the construction of a semantic representation of a web page.

[0093] The metadata may comprise or specify rules which can be applied to a document tree 60. Style sheet parser 54 parses the metadata from a style sheet document 62 to generate a style tree 64. Each style tree 64 may be associated with a particular document tree 60 according to the association between the respective style sheet documents 62 and conventional markup language documents 58. A style tree 64 organizes the rules (specified in metadata) into a structure by which they can be efficiently applied to a document tree 60. A tree structure for the rules is useful because the application of rules can be a hierarchical process. That is, some rules are logically applied only after other rules have been applied.

[0094] Tree converter 56, which is in communication with markup language parser 52 and style sheet parser 54, receives the document trees 60 and style trees 64 therefrom. Using the document trees 60 and style trees 64, tree converter 56 generates navigation trees 50. Among other things, tree converter 56 may apply the rules of a style tree 64 to the nodes of a document tree 60 when generating a navigation tree 50. Furthermore, tree converter 56 may apply other rules (heuristic rules) to each document tree, and thereafter, may map various nodes of the document tree into nodes of a navigation tree 50.

[0095] A navigation tree 50 organizes content of a conventional markup language document 58 into a hierarchical or outline structure. With the hierarchical structure, the various elements of content are separated into various levels (e.g., parts, sub-parts, sub-sub-parts etc.). Appropriate mechanisms are provided to allow movement from one level to another and across the levels. The hierarchical arrangement of a navigation tree 50 is suitable for presenting content sequentially, and thus can be used for "semantic" retrieval of the content at a web page. As such, the navigation tree 50 can serve as an index that is suitable for browsing content using voice commands.

[0096] An exemplary navigation tree 50 is illustrated in FIG. 6. A navigation tree 50 is, in general, made up of routing nodes and content nodes. Content nodes may comprise content that can be delivered to a user. Content nodes can be of various types, such as, for example, general content nodes, table nodes, and form nodes. Table nodes present a table of information. Form nodes can be used to assist in the filling out of respective forms. Routing nodes are unique to navigation trees 50 and are generated according to rules applied by tree converter 56.

[0097] Routing nodes direct navigation between nodes by providing logical connections between them. The routing nodes are interconnected by directed arcs (edges or links). These directed arcs are used to construct the hierarchical relationship between the various nodes in the navigation tree 50. That is, these arcs specify allowable navigation traversal paths to move from one node to another. In FIG. 6, for example, an unordered list node UL is a routing node for moving to list nodes <LI1> or <LI2>. The options for other nodes may be explicitly included in the routing node.
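By way of illustration only, the relationship between routing nodes, content nodes, and directed arcs can be sketched in Python as follows. The class names, the label and text fields, and the options and follow methods are assumptions made for this sketch; they are not taken from Appendix A.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Base node; `label` is a hypothetical field used for illustration."""
    label: str


@dataclass
class ContentNode(Node):
    """Node holding content that can be delivered to a user."""
    text: str = ""


@dataclass
class RoutingNode(Node):
    """Node whose directed arcs list the allowable next nodes."""
    arcs: List[Node] = field(default_factory=list)

    def options(self) -> List[str]:
        # The choices a user could be offered at this node.
        return [child.label for child in self.arcs]

    def follow(self, label: str) -> Optional[Node]:
        # Traverse the arc whose target carries the requested label.
        for child in self.arcs:
            if child.label == label:
                return child
        return None


# Mirrors the FIG. 6 example: the UL routing node leads to LI1 or LI2.
ul = RoutingNode("UL", [ContentNode("LI1", "first item"),
                        ContentNode("LI2", "second item")])
```

In this sketch the arcs are stored only on routing nodes, so a content node cannot be a starting point for traversal; other arrangements are equally possible.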

[0098] Content nodes, in certain but not all embodiments, are reachable by tree traversal operations. For example, in some embodiments, the data found in content nodes is accessed through a parent routing node called a group node <P>. The group node organizes content nodes into a single presentational unit. The group node can be used for organizing multi-media content. For example, rather than present text and links as disjointed content, a group node can be used to organize a collection of text, audio wave files, and URI links together such as the following:

For more information about <A href = "http:///www.vocalpoint.com/sound.wav">vocalpoint </A>, send email to: <A href = info@vocalpoint.com> info@vocalpoint.com </A>.

[0099] As such, routing nodes provide the nexus or connection between content nodes, and thus provide meaningful links for the content of a web page. In this way, routing nodes support or provide a semantic, hierarchical relationship for web page content in a navigation tree 50. An exemplary object-oriented implementation for routing and content nodes of a navigation tree is provided in attached Appendix A and FIG. 9.

[0100] In one embodiment, a navigation tree 50 can be used to define a finite state machine. In particular, various nodes of the navigation tree may correspond to states in the finite state machine. Navigation agent component 42 may use the navigation tree to directly define the finite state machine. The finite state machine can be used by navigation agent 42 of browser module 32 to move throughout the hierarchical structure. At any current state/node, a user can advance to another state/node.
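One possible way to picture the finite state machine is a transition table keyed by the current node, sketched below in Python. The node names, the keywords, and the choice to remain in the current state on an unrecognized keyword are all assumptions made for this example.

```python
from typing import Dict, Iterable

# Hypothetical tree: each state (node name) maps a keyword to the next state.
TRANSITIONS: Dict[str, Dict[str, str]] = {
    "root": {"weather": "weather", "sports": "sports"},
    "weather": {"home": "root"},
    "sports": {"football": "football", "home": "root"},
    "football": {"back": "sports", "home": "root"},
}


def advance(state: str, keyword: str) -> str:
    """Return the next state, or stay put if the keyword is not allowed here."""
    return TRANSITIONS.get(state, {}).get(keyword, state)


def run(start: str, keywords: Iterable[str]) -> str:
    """Apply a sequence of keywords and return the final state."""
    state = start
    for keyword in keywords:
        state = advance(state, keyword)
    return state
```

For instance, run("root", ["sports", "football"]) ends at the "football" state, while advance("root", "football") leaves the machine at "root" because that keyword is not an arc out of the root node.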

[0101] Tree Converter

[0102] FIG. 4 is a block diagram of a tree converter 56, according to an embodiment of the invention. Tree converter 56 generally functions to convert document trees 60 into navigation trees 50, for example, using style trees 64. As depicted, tree converter 56 comprises a style sheet engine 68, a heuristic engine 70, and a mapping engine 72. Each of style sheet engine 68, heuristic engine 70, and mapping engine 72 may comprise one or more programs which, when executed, perform the functionality described herein.

[0103] Style sheet engine 68 generally functions to apply style sheet rules to a document tree 60. Application of style sheet rules can be done on a rule-by-rule basis to all applicable nodes of the document tree 60. These style sheet rules can be part of the metadata of a style sheet document 62. Each style sheet rule can be a rule generally available in a suitable style sheet language of style sheet document 62.

[0104] In one embodiment, these style sheet rules may include, for example, clipping, pruning, filtering, and converting. In a clipping operation, a node of a document tree is marked as special so that the node will not be deleted or removed by other operations. Clipping may be performed for content that is important and suitable for audio presentation (e.g., text which can be "read" to a user). In a pruning operation, a node of a document tree is eliminated or removed. Pruning may be performed for content that is not suitable for delivery via speech or audio. This can include visual information (e.g., images or animation) at a web page. Other content that can be pruned may be advertisements and legal disclaimers at each web page.

[0105] In a filtering operation, auxiliary information is added at a node. This auxiliary information can be, for example, labels, prompts, etc. In a conversion operation, a node is changed from one type into another type. For example, some content in a conventional markup language document can be in the form of a table for presenting information in a grid-like fashion. In a conversion, such table may be converted into a routing node in a navigation tree to facilitate movement among nodes and to provide options or choices.

[0106] As depicted, style sheet engine 68 comprises a selector module 74 and a rule applicator module 76. In general, selector module 74 functions to select or identify various nodes in a document tree 60 to which the rules may be applied to modify the tree. After various nodes of a particular document tree 60 have been selected by selector module 74, rule applicator module 76 generally functions to apply the various style tree rules (e.g., clipping, pruning, filtering, or converting) to the selected nodes as appropriate in order to modify the tree.
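The division of labor between a selector and a rule applicator can be sketched in Python as follows. This is a minimal illustration, assuming a document tree represented as nested dictionaries with "tag" and "children" keys and a selector that matches on tag name only; the function names are hypothetical and a real style sheet language would offer richer selection.

```python
from typing import Any, Dict, List

TreeNode = Dict[str, Any]  # assumed shape: {"tag": str, "children": [...], ...}


def select(node: TreeNode, tag: str) -> List[TreeNode]:
    """Selector: collect every node in the subtree with the given tag."""
    found = [node] if node["tag"] == tag else []
    for child in node.get("children", []):
        found.extend(select(child, tag))
    return found


def clip(node: TreeNode) -> None:
    node["clipped"] = True  # mark as protected from later removal


def add_prompt(node: TreeNode, prompt: str) -> None:
    node["prompt"] = prompt  # "filtering": attach auxiliary information


def convert(node: TreeNode, new_tag: str) -> None:
    node["tag"] = new_tag  # change the node from one type into another


def prune(node: TreeNode, tag: str) -> None:
    """Remove descendants with the given tag unless they were clipped."""
    kept = []
    for child in node.get("children", []):
        if child["tag"] == tag and not child.get("clipped"):
            continue
        prune(child, tag)
        kept.append(child)
    node["children"] = kept


doc = {"tag": "body", "children": [
    {"tag": "img", "children": []},
    {"tag": "p", "text": "Hello", "children": []},
    {"tag": "table", "children": []},
]}
for selected in select(doc, "p"):
    clip(selected)                    # keep readable text
prune(doc, "img")                     # drop visual-only content
for selected in select(doc, "table"):
    convert(selected, "routing")      # table becomes a routing node
```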

[0107] Heuristic engine 70 is in communication with style sheet engine 68. Heuristic engine 70 generally functions to apply one or more heuristic rules to the document tree 60 as modified by style sheet engine 68. In one embodiment, these heuristic rules may be applied on a node-by-node basis to various nodes of document tree 60. Each heuristic rule comprises a rule which may be applied to a document tree according to a heuristic technique.

[0108] A heuristic technique is a problem-solving technique in which the most appropriate solution of several found by alternative methods is selected at successive stages of a problem-solving process for use in the next step of the process. In the context of the invention, the problem-solving process is the conversion of a document tree 60 into a navigation tree 50. In this process, heuristic rules are selectively applied to a document tree after the application of style sheet rules and before a final mapping into navigation tree 50 (as described below).

[0109] In one embodiment, heuristic rules may include, for example, converting paragraph breaks and line breaks into space breaks (white space), exploiting image alternate tags, deleting decorative nodes, merging content and links, and building outlines from headings and ordered lists. The operation for converting paragraph breaks and line breaks into space breaks is done to eliminate unnecessary formatting in the textual content at a node while maintaining suitable delineation between elements of text (e.g., words) so that the elements are not concatenated. The operation for exploiting image alternative tags identifies and uses any image alternative tags that may be part of the content contained at a particular node.

[0110] An image alternative tag is associated with a particular image and points to corresponding text that describes the image. Image alternative tags are generally designed for the convenience of users who are visually impaired so that alternative text is provided for the particular image. The operation for deleting decorative nodes eliminates content that is not useful in a navigation tree 50. For example, a node in the document tree 60 consisting of only an image file may be considered to be a decorative node when no alternative text is provided, since the image itself cannot be presented to a user in the form of speech or audio. The operation for merging content and links eliminates the formatting for a link (e.g., a hypertext link) so that the text for the link is read continuously as part of the content delivered to a user.
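Two of these heuristic rules lend themselves to a short Python sketch: converting breaks into single spaces, and deciding whether an image node carries usable alternative text. The function names and the dictionary representation of an image node (with an "alt" key) are assumptions made for this example.

```python
import re
from typing import Any, Dict, Optional


def collapse_breaks(text: str) -> str:
    """Replace paragraph breaks, line breaks, and whitespace runs with one space."""
    return re.sub(r"\s+", " ", text).strip()


def image_to_text(node: Dict[str, Any]) -> Optional[str]:
    """Return the alternative text of an image node, or None if it has none.

    A None result marks the node as decorative, i.e. a candidate for deletion.
    """
    alt = (node.get("alt") or "").strip()
    return alt or None
```

Under these assumptions, collapse_breaks("Hello\n\nworld") yields "Hello world", so adjacent words stay separated rather than being concatenated.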

[0111] The operation for building or generating outlines from headings and ordered lists is performed to create the hierarchical structure of the navigation tree 50. A heading--which can be, for example, a heading for a section of a web page--is identified by suitable tags within a conventional markup language document. In a visually displayed web page, multiple headings may be provided for a user's convenience. These headings may be considered alternatives or options for the user's attention. An ordered list is a listing of various items, which in some cases, can be options. Heuristic engine 70 may arrange or organize headings and ordered lists so that the underlying content is presented in the form of an outline.
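One common way to turn a flat sequence of headings into an outline is a stack-based nesting pass, sketched below in Python. The specification does not state which algorithm heuristic engine 70 uses; the function name, the (level, title) input format, and the synthetic root node are assumptions made for this example.

```python
from typing import Any, Dict, List, Tuple


def build_outline(headings: List[Tuple[int, str]]) -> Dict[str, Any]:
    """Nest (level, title) pairs, given in document order, into an outline tree.

    Levels start at 1 (e.g. <H1>); the returned root has level 0.
    """
    root: Dict[str, Any] = {"title": "root", "level": 0, "children": []}
    stack = [root]
    for level, title in headings:
        # Pop until the top of the stack is a shallower heading.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"title": title, "level": level, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


outline = build_outline([(1, "News"), (2, "Local"), (2, "World"), (1, "Sports")])
```

Here "Local" and "World" become children of "News", while "Sports" is a sibling of "News" directly under the root.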

[0112] Mapping engine 72 is in communication with heuristic engine 70. In general, mapping engine 72 performs a mapping function that changes certain elements in a modified document tree 60 into appropriate nodes for a navigation tree 50. Mapping engine 72 may operate on a node-by-node basis to provide such mapping function. In one embodiment, the content at a node in document tree 60 is mapped to create a content node in the navigation tree 50. Ordered lists, unordered lists, and table rows are mapped into suitable routing nodes of the navigation tree 50.

[0113] Any table in document tree 60 may be mapped to create a table node in the navigation tree 50. A form in a document tree 60 can be mapped to create a form node in the navigation tree 50. A form may comprise a number of fields which can be filled in by a user to collect information. Form elements in the document tree 60 can be mapped into a form handling node in navigation tree 50. Form elements provide a standard interface for collecting input from the user and sending that information to a Web server.
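The mapping described in the two preceding paragraphs amounts to a lookup from document tree element to navigation tree node type. A minimal Python sketch follows, assuming lowercase HTML tag names as input and using illustrative string labels for the node types.

```python
# Elements mapped to routing nodes: ordered lists, unordered lists, table rows.
ROUTING_TAGS = {"ol", "ul", "tr"}


def map_node_type(tag: str) -> str:
    """Return the navigation tree node type for a document tree element."""
    tag = tag.lower()
    if tag == "table":
        return "table"
    if tag == "form":
        return "form"
    if tag in ROUTING_TAGS:
        return "routing"
    # Everything else is treated as deliverable content in this sketch.
    return "content"
```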

[0114] Computer-Based System

[0115] FIG. 7 illustrates a computer-based system 80 which is an exemplary hardware implementation for voice browsing system 10. In general, computer-based system 80 may include, among other things, a number of processing facilities, storage facilities, and work stations. As depicted, computer-based system 80 comprises a router/firewall 82, a load balancer 84, an Internet accessible network 86, an automated speech recognition (ASR)/text-to-speech (TTS) network 88, a telephony network 90, a database server 92, and a resource manager 94.

[0116] Computer-based system 80 may be deployed as a cluster of networked servers. Other clusters of similarly configured servers may be used to provide redundant processing resources for fault recovery. In one embodiment, each server may comprise a rack-mounted Intel Pentium processing system running Windows NT, UNIX, or any other suitable operating system.

[0117] For purposes of the invention, the primary processing servers are included in Internet accessible network 86, automated speech recognition (ASR)/text-to-speech (TTS) network 88, and telephony network 90. In particular, Internet accessible network 86 comprises one or more Internet access platform (IAP) servers. Each IAP server implements the browser functionality that retrieves and parses conventional markup language documents supporting web pages.

[0118] Each IAP server builds the navigation trees 50 (which are the semantic representations of the web pages) and generates the navigation dialog with users. Telephony network 90 comprises one or more computer telephony interface (CTI) servers. Each CTI server connects the cluster to the telephone network which handles all call processing. ASR/TTS network 88 comprises one or more automatic speech recognition (ASR) servers and text-to-speech (TTS) servers. ASR and TTS servers are used to interface the text-based input/output of the IAP servers with the CTI servers. Each TTS server can also play digital audio data.

[0119] Load balancer 84 and resource manager 94 may cooperate to balance the computational load throughout computer-based system 80 and provide fault recovery. For example, when a CTI server receives an incoming call, resource manager 94 assigns resources (e.g., ASR server, TTS server, and/or IAP server) to handle the call. Resource manager 94 periodically monitors the status of each call, and in the event of a server failure, new servers can be dynamically assigned to replace failed components. Load balancer 84 provides load balancing to maximize resource utilization, reducing hardware and operating costs.

[0120] Computer-based system 80 may have a modular architecture. An advantage of this modular architecture is flexibility. Any of these core servers--i.e., IAP servers, CTI servers, ASR servers, and TTS servers--can be rapidly upgraded, ensuring that voice browsing system 10 always incorporates the most up-to-date technologies.

[0121] Method For Browsing Content With Voice Commands

[0122] FIG. 8 is a flow diagram of an exemplary method 100 for browsing content with voice commands, according to an embodiment of the invention. Method 100 may correspond to an aspect of operation of voice browsing system 10, in which a navigation tree is generated as a map for the content. The navigation tree is then used for browsing the content. FIG. 9 is a block diagram of an exemplary navigation tree 1020 comprising a plurality of branches extending from a root node 1021.

[0123] Each branch may comprise or connect one or more nodes, including routing nodes, group nodes, and/or content nodes. Routing Nodes 1, 2, and 3, which can be "children" of root node 1021, form or define three branches of navigation tree 1020. Each branch, for example, includes group nodes and content nodes implemented to form sub-branches and "leaves" for tree 1020. The routing nodes include information that allows a user to traverse navigation tree 1020 based on the content included in the content nodes.

[0124] Referring again to FIG. 8, method 100 begins at step 102 where voice browsing system 10 receives at gateway module 30 a call from a user, for example, via a limited display device 18. In the call, the user either issues a command or submits a request or is prompted to provide a response. The terms "response," "command," and "request" that indicate the interaction of the user with the system are used interchangeably throughout the document. For simplicity and consistency, however, the term "request" is primarily used hereafter to refer to any user interaction with the system. This usage should not, however, be construed as a limitation. A user request can be in the form of voice or speech and may pertain to particular content.

[0125] This content may be contained in a web page at a web site or portal maintained by a content provider 12. The content can be formatted in HTML, XML, or other conventional markup language format. Automatic speech recognition (ASR) component 36 of gateway module 30 operates on the voice/speech to recognize the user request for content, for example. Gateway module 30 forwards the request to browser module 32. By way of example, one or more embodiments of the system have been described as applicable to a voice browsing system. This application, however, is exemplary and should not be construed as a limitation. The user may interact with the system via any interactive communication interface (e.g., graphic interface, touch tone interface).

[0126] At step 104, responsive to the user request, voice browsing system 10 initiates a web browsing session to provide a communication interface for the user. At step 106, browser module 32 loads or fetches a markup language document 58 supporting the web page that contains the desired content. This markup language document can be, for example, an HTML or an XML document. Browser module 32 may also load or retrieve one or more style sheet documents 62 which are associated with the markup language document 58.

[0127] At step 108, browser module 32 adds an identifier (e.g., a uniform resource locator (URL)) for the web page to a list maintained within voice browsing system 10. This is done so that voice browsing system 10 can keep track of each web page from which it has retrieved content; thus, at least some of the operations which voice browsing system 10 performs for any given web page in response to an initial request do not need to be repeated in response to future requests relating to the same web page.
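The effect of step 108 can be illustrated with a small Python sketch in which the list of visited pages is keyed by URL. The specification does not say which operations are skipped on a repeat request; this sketch assumes, purely for illustration, that the navigation tree itself is reused. The class name, method name, and builder callback are hypothetical.

```python
from typing import Any, Callable, Dict


class PageRegistry:
    """Tracks visited pages by URL and reuses work done for them."""

    def __init__(self, build_tree: Callable[[str], Any]) -> None:
        self._build_tree = build_tree      # hypothetical tree-building callback
        self._trees: Dict[str, Any] = {}   # URL -> previously built tree

    def tree_for(self, url: str) -> Any:
        # Build only on the first request for this URL.
        if url not in self._trees:
            self._trees[url] = self._build_tree(url)
        return self._trees[url]


calls = []


def fake_builder(url: str) -> dict:
    # Stand-in for the real fetch-and-convert pipeline; records each invocation.
    calls.append(url)
    return {"url": url}


registry = PageRegistry(fake_builder)
registry.tree_for("http://example.com/a")
registry.tree_for("http://example.com/a")  # second request reuses the first result
```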

[0128] At step 110, navigation tree builder component 40 of browser module 32 builds a navigation tree 1020 for the target web page. In one embodiment, to accomplish this, navigation tree builder component 40 may generate a document tree 60 from the conventional markup language document 58 and a style tree 64 from the style sheet document 62. The document tree 60 is then converted into a navigation tree (e.g., navigation tree 1020), in part, using the style tree 64. The navigation tree 1020 provides a semantic representation of the content contained in the target web page that is suitable for voice or audio commands.

[0129] The navigation tree 1020 includes a plurality of nodes, as shown in FIG. 9. Each node either contains or is associated with certain content of the target web page. Each node further includes or is associated with commands, keywords, and/or phrases that correspond with the web page content. The terms "commands," "keywords," and "phrases" may be used interchangeably throughout the document. For simplicity and consistency, the term "keyword" has been used, when proper, to refer to one or all the above collectively. This usage, however, should not be construed to limit the scope of the invention.

[0130] Keywords are used to identify and classify the respective nodes based on contents of the nodes and to allow a user to browse the content of the web page. Further, these keywords are also used by the system to build prompts or greetings for each node when the node is visited. As provided in further detail below, the system, in certain embodiments, also uses the keywords to build a dynamic navigation grammar with a vocabulary that is expanded or narrowed based on the hierarchical position of the node being visited at a given navigation instance. The grammar built at each navigation instance is specific to the user and the navigation route selected by the user at that instance. As such, in one or more embodiments of the system, each node visited in a navigation route corresponds with a navigation instance represented by a unique navigation grammar for that node at that instance.
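A vocabulary that changes with the node being visited can be sketched in Python as follows. The tree contents, the function names, and the simple word-matching test are assumptions made for this example; an actual speech recognition grammar would be considerably richer.

```python
from typing import Dict, List, Set

# Hypothetical tree: node name -> keywords of the nodes reachable from it.
CHILD_KEYWORDS: Dict[str, List[str]] = {
    "root": ["weather", "sports"],
    "sports": ["football", "basketball"],
}


def vocabulary_for(node: str) -> Set[str]:
    """Vocabulary active while `node` is the node being visited."""
    return set(CHILD_KEYWORDS.get(node, []))


def recognizes(node: str, request: str) -> bool:
    """True if the request contains at least one keyword active at `node`."""
    active = vocabulary_for(node)
    return any(word in active for word in request.lower().split())
```

In this sketch the word "football" is not recognized at the root node but is recognized once the user has moved to the sports node, which is the narrowing and expanding behavior described above.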

[0131] The system 10 utilizes the navigation grammar to recognize a user request for access to the content included or associated with various nodes in the navigation tree 1020. Using voice commands, in one embodiment, a user may direct the system to do the following, for example: browse the content of a web page, jump to a specific web page, move forward or backwards within one or more web pages or websites, make a selection from the content of a web page, fill out specific fields in a web page, or confirm selections or inputs to a web page. Furthermore, navigation tree 1020 may provide a user with the means to readily browse the content of a web page by submitting voice requests, as provided in further detail below.

[0132] At step 112, navigation agent component 42 of browser module 32 begins traversing navigation tree 1020 by setting root node 1021 as the node being currently visited. Root node 1021, in accordance with one aspect of the invention, is a routing node that can comprise a number of different options from which a user can select, for example, to obtain content or to move to another node. To present these various options to the user, text-to-speech (TTS) component 38 of gateway module 30 may generate speech for the options, which is then delivered to the user via limited display device 18. For example, a greeting may be played to notify the user of the name, nature, or content of the web site or web page accessed, followed by a list of selectable options, such as weather, sports, stock quotes, and mail. The user may then select one of the presented options, for example, by issuing a request which is recognized by automatic speech recognition component 36.

[0133] At step 114, browser module 32 browses (i.e., visits or moves to) the node in navigation tree 1020 that corresponds with the option selected by the user. When browser module 32 visits a node, browser module 32 retrieves information included in the node to determine the node type (e.g., routing node, content node, form node, etc.) and/or the content included or referenced by the node. For example, referring to FIG. 9, if in the above example the user selects the "weather" option, then browser module 32 visits Routing Node 1 if that node is associated with weather information. A search table or alternate data structure may be utilized to store information about the content and type of nodes included in the tree, so that node searches and selections are performed more efficiently by referencing the table, for example. If Routing Node 1 is not associated with the selected option, the rest of the nodes in the tree (or the corresponding data structure including node information) are searched to find the proper node to visit.

[0134] At step 124, navigation agent component 42 determines whether the current node is a routing node. If so, then the system moves to step A to process the content of that node and its children, if any. A routing node is a node that may comprise a plurality of options from which the user may select in order to navigate or move from one node to another. For example, in FIG. 9, if Routing Node 2 is the routing node associated with the "sports" option, then it can include children nodes that provide further options in the sports category. For example, Routing Nodes 2.1 and 2.3 may reference group nodes that include information about "football" and "basketball," respectively. Thus, processing Routing Node 2.1 will provide information related to football games, such as, for example, team scores and standing, while processing Routing Node 2.3 will provide information related to basketball games. Routing Node 2 may also reference a Content Node 2.2 that includes content such as a calendar of sports events, for example.

[0135] Referring back to FIG. 8, if it is determined at step 124 that the current node is not a routing node, then at step 126 browser module 32 determines, based on type information associated with the node, whether the current node is a form node. If so, then the system moves to step B. A form node is a node that relates to an electronic form implemented for collecting information--typically information of textual nature such as name, telephone number, and address. Such form may comprise a number of fields for separate pieces of information that can be edited by a user. For example, an order form may be edited as part of an electronic transaction via a web site or portal associated with content provider 12.

[0136] At step 126, if it is determined that the current node is not a form node, then the system moves to step 136, and voice browsing system 10 determines whether the current node is a content node. A content node generally includes information or content that can be presented to a user. If the current node is a content node, then at step C voice browsing system 10 plays the content to the user. The content of a content node may be provided to the user in one or more ways. For example, one embodiment of the system uses text-to-speech component 38 to play the content of a node to a user. The text-to-speech component 38 is provided herein by way of example. Other ways for conveying or playing the content to the user may be utilized.

[0137] If, at step 136, it is determined that the current node is not a content node, then at step 144 voice browsing system 10 determines whether the current node is unknown to the system. A node may be unknown to the system due to an error in the system, or if the web page associated with that node is not valid or available. If the current node is unknown, then voice browsing system 10 may deliver an appropriate message or prompt for notifying the user of such fact.

[0138] In certain embodiments, if the current node is unknown, at step 146 voice browsing system 10 computes the next page to be presented to a user. This page may be implemented to inform the user that the current selection or request is not appropriate or available. Alternatively, the next page may be chosen by the system as the page that can be most closely matched with the user request. After the next page has been computed, method 100 moves to step 106, to fetch or retrieve the conventional markup language document 58 supporting the computed next page.
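The sequence of node-type checks at steps 124, 126, 136, and 144 can be summarized as a simple dispatch, sketched below in Python. The string labels for node types and the returned target names are illustrative only; the targets A, B, C, and step 146 are those named in the description of FIG. 8 above.

```python
def dispatch(node_type: str) -> str:
    """Return the flow-diagram target for a node type."""
    if node_type == "routing":   # step 124: routing node?
        return "A"
    if node_type == "form":      # step 126: form node?
        return "B"
    if node_type == "content":   # step 136: content node?
        return "C"
    # Step 144: the node is unknown, so the next page is computed at step 146.
    return "step 146"
```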

[0139] At step 148, it is determined whether the current interactive session with the user should be ended. A session is terminated if, for example, a predetermined time has elapsed in which a user has either not submitted a request or not provided a response to a system prompt. Alternatively, a user may actively take action to end the session by, for example, terminating the communication connection. At step 148, if the session is not ended, then method 100 returns the user to the main menu or other node in navigation tree 1020.

[0140] Various steps in method 100 may be repeated throughout an interactive session to generate one or more navigation trees 1020 and allow a user to obtain content and to traverse the nodes within each navigation tree 1020. As such, a user is able to browse the content available at the web pages of a web site or portal maintained by content provider 12 using voice, tone, or other interface commands. Method 100 can be implemented to comply with the existing infrastructure of conventional markup language documents of a web site. Accordingly, content provider 12 is not required to set up and maintain a separate site in order to provide access and content to users.

[0141] Method For Navigating a Routing Node

[0142] Referring to FIGS. 8 and 10, once the system at step 124 determines that the visited node is a routing node, then at step 1305 the system initializes the counters for that node. In accordance with one aspect of the invention, each node, particularly each routing node, is associated with one or more counters. These counters include a help counter, a timeout counter, and a rejection counter.

[0143] The help counter keeps track of the number of times help messages are played for a node currently being visited. A help message is usually provided to the user in case the system does not recognize the user's request or at the user's request. Thus, the help counter is incremented until the system successfully moves to the next node or the session ends. If the system browses that node again at a later time, then the counter would be reset, at step 1305.

[0144] A timeout counter keeps track of the number of times the system does not receive or recognize a user request while visiting the current node. In one or more embodiments, the system allows the user to submit a request or provide a response to a prompt within a certain number of seconds. If no request is submitted by the user, or if the delay in providing the request is longer than the allotted threshold, then the system plays a timeout message and increments the timeout counter. The timeout counter is incremented for the current node until the system successfully moves to the next node or the session ends. If the system browses that node again at a later time, then the counter would be reset at step 1305.

[0145] The rejection counter is a counter that keeps track of the number of times one or more user requests are rejected by the system while visiting the current node. A user request can be rejected by the system if the system does not recognize the request or if the system attempts to correct or resolve any ambiguity related to (i.e., disambiguate) an unacceptable or unrecognizable request. The rejection counter is incremented for the current node until the system successfully moves to the next node or the session ends. If the system browses that node again at a later time, then the counter would be reset at step 1305. The help, timeout, and rejection counters are incremented by a constant value (e.g., one), whenever help, timeout, or rejection messages are played.
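The three per-node counters and their reset behavior can be sketched in Python as follows. The class name, the record method, and the use of string event names are assumptions made for this example; the constant increment of one follows the example value given above.

```python
from dataclasses import dataclass


@dataclass
class NodeCounters:
    """Help, timeout, and rejection counters for the node being visited."""
    help: int = 0
    timeout: int = 0
    rejection: int = 0

    def reset(self) -> None:
        # Corresponds to the initialization at step 1305 on each visit.
        self.help = self.timeout = self.rejection = 0

    def record(self, event: str, step: int = 1) -> int:
        """Increment the named counter by a constant and return its new value."""
        if event not in ("help", "timeout", "rejection"):
            raise ValueError(f"unknown counter: {event}")
        setattr(self, event, getattr(self, event) + step)
        return getattr(self, event)
```

A caller would invoke record("timeout") each time a timeout message is played and reset() whenever the node is visited again.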

[0146] Referring back to FIG. 10, at step 1310, the system determines whether an explicit greeting is included in the routing node visited by the system. An explicit greeting is a greeting that is included in the routing node when the navigation tree is built. An explicit greeting is played verbatim from the node. Referring to FIG. 9, for example, if Routing Node 1 is associated with a web page that includes information about the weather, then an explicit greeting may be included in Routing Node 1 that would welcome the user and indicate to the user that weather information can be obtained at this node. An exemplary greeting for such node would be: "Weather information." In one embodiment, an explicit greeting is included in the node when navigation tree 1020 is being generated.

[0147] If at step 1310, the system determines that an explicit greeting is not included in the routing node, then at step 1315 the system builds a greeting based on the keywords included in or associated with the routing node. For example, if Routing Node 1 is associated with a web page that includes weather information, then in accordance with one embodiment of the system, when the navigation tree is built, a keyword such as, for example, "weather" is included in or associated with Routing Node 1. This keyword is chosen based on the attributes and properties defined for that node in the style sheet. The keyword may also be automatically generated by analyzing the content of the HTML page. To build a greeting, at step 1315, the system may include the keyword (in this case "weather") in a default greeting phrase. For example, a greeting for Routing Node 1 may be "Weather Information" wherein the additional phrase "Information" is added to the keyword "weather" by default.
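The choice between an explicit greeting and a built greeting (steps 1310 and 1315) can be sketched in Python as follows. The function name and signature are hypothetical; the default phrase "Information" is the one given in the example above.

```python
from typing import Optional


def greeting_for(keyword: str, explicit: Optional[str] = None) -> str:
    """Return the explicit greeting verbatim, or build one from the keyword."""
    if explicit:
        return explicit  # step 1310: explicit greeting is played verbatim
    # Step 1315: place the keyword in a default greeting phrase.
    return f"{keyword.capitalize()} Information"
```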

[0148] Once a greeting has been determined by the system, the system moves to step 1320 to determine whether an explicit prompt is included in the routing node. A prompt is typically provided to the user to elicit a response. An explicit prompt is played verbatim by the system. For example, an explicit prompt for Routing Node 1 could be "What city's weather are you checking?" Alternatively, in some embodiments of the invention, a prompt may provide a user with a list of choices from which to choose. For example, the following prompt may be provided: "Choose weather for Los Angeles, New York, or Dallas." If an explicit prompt is not included in the routing node, then at step 1325, the system builds a prompt based on keywords included in the routing node. The prompt built by the system could be, for example, "What city, please?" or "Choose weather for Los Angeles, New York, or Dallas." In certain embodiments, the manner in which prompts are built is based on the attributes and properties defined in the style sheet.
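The selection between explicit and built greetings and prompts (steps 1310 through 1325) can be sketched as follows. This is a minimal sketch under stated assumptions: the `RoutingNode` fields, the helper names, and the exact default phrases are hypothetical, and the style-sheet attributes that may govern prompt building are not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RoutingNode:
    """Hypothetical routing node; the field names are assumptions."""
    keywords: List[str]
    explicit_greeting: Optional[str] = None
    explicit_prompt: Optional[str] = None
    choices: List[str] = field(default_factory=list)


def join_choices(choices: List[str]) -> str:
    """Join choices as 'A, B, or C' for a spoken list."""
    if len(choices) <= 1:
        return "".join(choices)
    if len(choices) == 2:
        return f"{choices[0]} or {choices[1]}"
    return ", ".join(choices[:-1]) + f", or {choices[-1]}"


def build_greeting(node: RoutingNode) -> str:
    # Step 1310: an explicit greeting is played verbatim.
    if node.explicit_greeting is not None:
        return node.explicit_greeting
    # Step 1315: otherwise wrap the node keyword in a default phrase.
    if not node.keywords:
        raise ValueError("node has neither a greeting nor a keyword")
    return f"{node.keywords[0].capitalize()} information."


def build_prompt(node: RoutingNode) -> str:
    # Step 1320: an explicit prompt is played verbatim.
    if node.explicit_prompt is not None:
        return node.explicit_prompt
    # Step 1325: otherwise build a prompt from keywords and choices.
    if node.keywords and node.choices:
        return f"Choose {node.keywords[0]} for {join_choices(node.choices)}."
    return "Choose from the following."


node = RoutingNode(keywords=["weather"],
                   choices=["Los Angeles", "New York", "Dallas"])
print(build_greeting(node))
print(build_prompt(node))
```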

[0149] Once the system has determined the greeting and the prompt for the current node, then at step 1330 the system builds a default navigation grammar. The default navigation grammar includes default vocabulary and corresponding rules defining navigation behavior. The default vocabulary includes keywords that are commonly used to navigate the nodes of the navigation tree or perform operations that correspond with certain tree features. Examples of such navigation commands are: "Next," "Previous," "Goto," "Back," and "Home." Using these keywords, a user may direct a system to perform the following operations, for example: browse the content of a web page, jump to a specific web page, move forward or backward within a web page or between web pages, make a selection from the content in a web page, fill out specific fields in a web page, or confirm selections and input to a web page.

[0150] Certain commands may allow the user to change certain node attributes or characteristics. For example, a user may, in accordance with one embodiment, delete or add content to a node, or even delete or add a node to the navigation tree, by utilizing commands such as "add" or "delete." It should be understood that said keywords are provided by way of example and that other vocabulary may be used to perform the same or other operations. Each operation may be associated with a certain command. In some embodiments, the default vocabulary may be built so that more than one keyword is associated with a single operation. For example, the keywords "Goto," "Jump," and "Move to" may all be used to command the system to visit another node.
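A default navigation grammar in which several keywords map to one operation might be represented as in the sketch below. The operation labels (such as `VISIT`) and the function `match_command` are hypothetical, and simple prefix matching on text stands in for whatever speech recognition an embodiment actually uses.

```python
from typing import Dict, Optional, Tuple

# Keyword -> operation. Several keywords may share one operation.
# The operation labels are illustrative assumptions.
DEFAULT_NAVIGATION_GRAMMAR: Dict[str, str] = {
    "next": "NEXT",
    "previous": "PREVIOUS",
    "back": "BACK",
    "home": "HOME",
    "goto": "VISIT",
    "jump": "VISIT",
    "move to": "VISIT",
}


def match_command(utterance: str,
                  grammar: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (operation, remainder) if the utterance starts with a keyword."""
    text = " ".join(utterance.lower().split())  # normalize case and spacing
    # Try longer keywords first so multi-word keywords are not shadowed.
    for keyword in sorted(grammar, key=len, reverse=True):
        if text == keyword or text.startswith(keyword + " "):
            return grammar[keyword], text[len(keyword):].strip()
    return None  # not in the vocabulary; the request would be rejected


print(match_command("Move to weather", DEFAULT_NAVIGATION_GRAMMAR))
print(match_command("Jump", DEFAULT_NAVIGATION_GRAMMAR))
```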

[0151] A default grammar, in one embodiment, is built prior to a node being visited instead of being built at the time the node is visited. Referring back to FIG. 10, after the default navigation grammar is built, at step 1335, the system determines whether the routing node has a child. If so, at step 1340, the system adds the keywords associated with the child to the grammar's vocabulary. For example, referring to FIG. 9, Routing Node 1 may have a child node that includes information about the weather conditions in the most popular cities in the world. The child node, for example, may include the phrase "World Weather." In this example, keywords "world" and "weather" are added to the node's grammar, at step 1340. If a keyword is added to a node's grammar, then a request submitted to the system including that keyword is recognized while the user is visiting that node.
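The loop over child nodes (steps 1335 through 1345) can be illustrated with the following sketch. The function names are hypothetical, and splitting a child phrase on whitespace is an assumption made only to reproduce the "World Weather" example.

```python
from typing import Iterable, Set


def build_node_grammar(default_vocabulary: Iterable[str],
                       child_phrases: Iterable[str]) -> Set[str]:
    """Start from the default vocabulary and add each child's keywords."""
    vocabulary = {word.lower() for word in default_vocabulary}
    for phrase in child_phrases:                   # steps 1335 and 1345
        vocabulary.update(phrase.lower().split())  # step 1340
    return vocabulary


def is_recognized(request: str, vocabulary: Set[str]) -> bool:
    """A request is recognized if it contains at least one keyword."""
    return any(word in vocabulary for word in request.lower().split())


grammar = build_node_grammar({"next", "home"}, ["World Weather"])
print(sorted(grammar))
print(is_recognized("world weather please", grammar))
```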

[0152] In certain embodiments, the navigation grammar is built dynamically for each node at the time the node is visited. That is, each individual node is associated with a unique grammar. Thus, a keyword included in one node's grammar may not be recognized by the system while a user is visiting another node. In other embodiments, a global grammar is dynamically built as the tree branches are navigated forward or traversed backward. That is, when a new node is visited, the keywords included in the current node are added to a global grammar. A global grammar is not uniquely assigned to an individual node, but is shared by all the nodes in the navigation tree. Thus, when a keyword is added to the grammar, a user request including that keyword may be recognized while the user is visiting any node in the navigation tree.
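The difference between a per-node grammar and a shared global grammar can be sketched as below. The names `per_node_grammar` and `GlobalGrammar` are hypothetical, and plain keyword sets stand in for a full grammar with rules.

```python
from typing import Iterable, Set


def per_node_grammar(node_keywords: Iterable[str]) -> Set[str]:
    """Per-node embodiment: only the current node's keywords are active."""
    return {keyword.lower() for keyword in node_keywords}


class GlobalGrammar:
    """Global embodiment: keywords accumulate as nodes are visited."""

    def __init__(self) -> None:
        self.vocabulary: Set[str] = set()

    def visit(self, node_keywords: Iterable[str]) -> None:
        self.vocabulary |= {keyword.lower() for keyword in node_keywords}


shared = GlobalGrammar()
shared.visit(["weather"])
shared.visit(["stocks"])

# While at the "stocks" node, "weather" is recognized only under the
# global grammar, not under the per-node grammar.
print("weather" in per_node_grammar(["stocks"]))
print("weather" in shared.vocabulary)
```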

[0153] In certain embodiments, the dynamically built grammar is not associated with all the nodes in the tree, but only those that are visited up to a certain point in time. That is, the grammar's vocabulary corresponds with the hierarchical position of a node in the navigation tree. Thus, while the navigation tree is navigated towards the leaves of the tree, the vocabulary is expanded as keywords are dynamically added to it for each node visited. Conversely, while the navigation tree is traversed towards the root of the navigation tree, the vocabulary is narrowed as keywords associated with the nodes on the path of reverse traversal are deleted from the vocabulary.
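One way to realize a vocabulary that expands toward the leaves and narrows toward the root is a stack of per-node keyword sets, as in this sketch. The class `PathVocabulary` is hypothetical. Reference counting is an added assumption so that a keyword shared by two nodes on the path survives until both nodes are left; the disclosure does not state how such duplicates are handled.

```python
from collections import Counter
from typing import Iterable, List, Set


class PathVocabulary:
    """Vocabulary tied to the path from the root to the current node."""

    def __init__(self) -> None:
        self._path: List[Set[str]] = []   # keyword set per node on the path
        self._counts: Counter = Counter()  # how many path nodes use a keyword

    def descend(self, node_keywords: Iterable[str]) -> None:
        """Moving toward the leaves: add the visited node's keywords."""
        keywords = {keyword.lower() for keyword in node_keywords}
        self._path.append(keywords)
        self._counts.update(keywords)

    def ascend(self) -> None:
        """Moving toward the root: delete the departed node's keywords."""
        if not self._path:
            raise IndexError("already at the root")
        for keyword in self._path.pop():
            self._counts[keyword] -= 1
            if self._counts[keyword] == 0:
                del self._counts[keyword]

    def vocabulary(self) -> Set[str]:
        return set(self._counts)


path = PathVocabulary()
path.descend(["weather"])
path.descend(["world", "weather"])
print(sorted(path.vocabulary()))
path.ascend()
print(sorted(path.vocabulary()))
```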

[0154] At step 1345, the system verifies whether the current node has another child. If so, the system repeats step 1340 for that child as described above, for example by adding the keywords associated with that child to the grammar's vocabulary. If at step 1335, the system determines that the current node has no children or at step 1345 the system determines that the current node has no more children, then the system moves to step 1350 and plays the greeting for the current routing node. In certain embodiments of the invention, the system is implemented to listen while playing the greeting for any user requests, utterances, or inputs. As such, at step 1355, if the system determines that the user is attempting to interact with the system, the system stops playing the greeting and services the user input or request.

[0155] The act of a user interrupting the system while the system is playing a greeting or a prompt is referred to as "barging in." Thus, if while the system at step 1350 is playing the greeting "Weather information," the user interrupts the system by barging in and saying the key phrase "World Weather," for example, then the system would skip over step 1360 and directly go to step 1365 and play a list of choices based on the navigation grammar available at that point of navigation. For example, the system may provide the user with the following list: "Los Angeles, New York, Dallas, Tokyo, Frankfurt." If the user does not barge in at step 1355, however, then the system moves to step 1360 and plays the prompt for the current routing node, before playing the list at step 1365.

[0156] The prompt may be an explicit prompt or a general prompt created by the system, as discussed earlier. A general prompt, for example, may say "Choose from the following." Once the system has played the prompt at step 1360, then at step 1365 the system plays a list of choices based on the navigation grammar for the current node, as provided above. Thereafter, the system waits for the user's response.
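The playback order of steps 1350 through 1365, including the effect of barging in during the greeting, can be summarized in the following sketch. The function `play_routing_node` is hypothetical. It returns the messages in the order they would be played rather than producing audio, and it models barge-in as a simple flag.

```python
from typing import List


def play_routing_node(greeting: str, prompt: str, choices: List[str],
                      barged_in_during_greeting: bool) -> List[str]:
    """Return the messages played for a routing node, in order."""
    played = [greeting]                      # step 1350 (may be cut short)
    if not barged_in_during_greeting:        # step 1355
        played.append(prompt)                # step 1360
    played.append(", ".join(choices) + ".")  # step 1365
    return played


cities = ["Los Angeles", "New York", "Dallas", "Tokyo", "Frankfurt"]
print(play_routing_node("Weather information.",
                        "Choose from the following.", cities, False))
print(play_routing_node("Weather information.",
                        "Choose from the following.", cities, True))
```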

[0157] Method For Navigating a Form Node

[0158] Referring to FIGS. 8 and 11, once the system at step 126 determines that the current node is a form node, then at step 1405 it initializes the counters for that node, as discussed earlier. A form node includes one or more fields that can be edited by the user. The system, at step 1410, determines whether the form node is a navigable node. A form node is navigable if the user can choose the order in which the fields are visited. In embodiments of the system, a form node includes information (e.g., a tag) that indicates whether the node is navigable.

[0159] A form node is non-navigable if the user has to go through each field in the form before the user can exit that node. For example, a user may have to edit a form including fields for first name, last name, address, and telephone number. In a navigable form, the user may have the choice to go to the first name field first, the telephone field second, the address field third, and skip over the last name field. In a non-navigable form, the user will have to, for example, start with the first name field, then proceed to the last name field, and thereon to the other fields in the form node in the order provided by the system.
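The distinction between navigable and non-navigable form nodes can be expressed as a rule for the order in which fields are visited, as in this sketch. The function `field_visit_order` and its parameters are hypothetical; in particular, the user's selections are passed in as a list rather than gathered interactively.

```python
from typing import List, Sequence


def field_visit_order(fields: Sequence[str], navigable: bool,
                      requested: Sequence[str] = ()) -> List[str]:
    """Return the order in which the fields of a form node are visited."""
    if not navigable:
        # Non-navigable: every field, in the order provided by the system.
        return list(fields)
    # Navigable: the user picks the order and may skip fields entirely.
    unknown = [name for name in requested if name not in fields]
    if unknown:
        raise ValueError(f"not a field of this form: {unknown}")
    return list(requested)


form = ["first name", "last name", "address", "telephone"]
print(field_visit_order(form, navigable=False))
print(field_visit_order(form, navigable=True,
                        requested=["first name", "telephone", "address"]))
```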

[0160] Thus, at step 1410, if the system determines that the form node is navigable, then the system moves to step 1415 and plays the greeting for that node. For example, the greeting may provide "Registration Form." In one embodiment, the system at step 1425 prompts the user to select a field to visit. At step 1435, the system listens for the selection. At step 1445, the system goes to the field selected by the user. As discussed earlier, at steps 1420 and 1430, the user may barge in to interrupt the system from playing a greeting or prompt. If the user's request or response includes a keyword recognized by the system for a specific field within the form node, then at step 1445 the system goes to the selected field.

[0161] If the user request, however, includes a keyword that indicates that the user has completed editing the form, then the system at step 1440 determines that the user is done. The system then moves to step 1470 to submit the form and play a prompt indicating that the task has been completed. The submission of the form may be performed in a well-known manner by including the submitted information in a communication packet and sending it to a destination.

[0162] Referring back to step 1445, when the system goes to a selected field requested by the user, then at step 1450 the system collects the input based on the input interface implemented for that field. Various methods may be used to collect input for a field in a form node. The form node may include various field types such as text, check box, drop down menu, or another type of input field. In certain embodiments, an input field is associated with one or more counters in the same manner that a node in the navigation tree is associated with help, rejection, and timeout counters. These counters are reset when a field is visited and are incremented by a constant value every time the system provides a help, timeout, or rejection message for the field, until the next field is visited or the input session is aborted.

[0163] When a field is visited, a greeting for the field is selected. This greeting may be an explicit or general greeting depending on implementation. For example, a greeting played for a text field may be "Enter first name." The greeting for a check box may be "Select one or more of the following two options." And, the greeting for a drop down menu may be "Select one of the following options." Once the greeting is selected, the system then determines if the field includes or is associated with a default value. For example, a check box field may include a default value indicating that the check box is checked. If so, a prompt is built for that field by the system to indicate the status of the check box, for example. Alternatively, a prompt may be built for the field based on an explicit prompt provided for that field or based on keywords associated with the prompt. For example, a prompt for a check box field in a registration form relating to marriage status may indicate: "The check box for 'Single' is already checked, please say uncheck if you are married."
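Selecting a general greeting by field type and building a prompt from a default value might look like the following sketch. The `FormField` class, the field-type labels, and the generic wording of the built prompt are assumptions; an embodiment could instead play an explicit greeting or prompt stored with the field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormField:
    """Hypothetical form field; attribute names are assumptions."""
    label: str
    kind: str  # "text", "check box", or "drop down menu"
    checked_by_default: bool = False


def field_greeting(form_field: FormField) -> str:
    """Pick a general greeting according to the field type."""
    if form_field.kind == "text":
        return f"Enter {form_field.label}."
    if form_field.kind == "check box":
        return "Select one or more of the following options."
    if form_field.kind == "drop down menu":
        return "Select one of the following options."
    raise ValueError(f"unsupported field type: {form_field.kind}")


def field_prompt(form_field: FormField) -> Optional[str]:
    """Build a prompt announcing a default value, if the field has one."""
    if form_field.kind == "check box" and form_field.checked_by_default:
        return (f"The check box for '{form_field.label}' is already "
                "checked, please say uncheck to clear it.")
    return None  # no default value to announce


single = FormField("Single", "check box", checked_by_default=True)
print(field_greeting(FormField("first name", "text")))
print(field_prompt(single))
```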

[0164] Once the greeting and the prompt are determined for a field, then the system builds a navigation grammar for that field or for the form node being visited. The default navigation grammar for a field includes different or additional vocabulary in comparison to the navigation grammar for a tree. That is, navigation grammar for a field includes vocabulary that suits the functions and procedures associated with editing a field. For example, the grammar vocabulary for navigating among fields in a form may include: "check, uncheck, enter, delete, replace, next, forward, back." Other words or phrases may be included in the vocabulary in association with edit and navigation rules to allow a user to edit fields or to navigate between fields in a form node.

[0165] Once the navigation grammar is built, then the greeting selected for the field is played. The user may choose to barge in either before or after the greeting has been played. The system is implemented to listen for the user's input or commands. If the system recognizes a command to skip the field then the current field is skipped and the system starts over again by resetting the counters for the next field and selecting the appropriate greeting or prompts. If the system recognizes an input for the field then the recognized input is entered into the current field. In certain embodiments of the system, the user is prompted to confirm the input results. For example, if a user after being prompted to provide an input for the check box relating to the user's marriage status, responds "uncheck," then the system may provide a confirmation message indicating "You have chosen to uncheck single status." Alternatively, if the user chooses to skip over the fie