United States Patent
5809318
Rivette, et al.
September 15, 1998
Title
Method and apparatus for synchronizing, displaying and manipulating text and image documents
Abstract
The system synchronizes, displays, and manipulates text and image documents in electronic form for display. The text and image files are synchronized to produce Equivalent Files using heuristic algorithms to create an approximate equivalence relationship between the text and the image files. The graphic user interface of the present system allows a user to selectively view an Equivalent File in a window while simultaneously viewing an image file within one or more image windows on the display. The user may also create and manipulate notes and subnotes as annotations to document objects or document portions. The notes and subnotes may be linked to text documents, image documents, text objects, or other non-text objects or documents, such as images, audio clips, etc. The user may create new subnotes associated with selected objects or selected portions, and the user may enter information pertaining to the selected objects in the new subnote. The new subnotes may be created in a particular notes window, or if a note window is not open at the time that the user selects an object or document portion, a new note may be created in response to the selected object. The notes and subnotes may be stored in a note database associated with sets of documents.
Inventors:
Rivette; Kevin G. (Palo Alto, CA)
Florio; Michael P. (Atherton, CA)
Jackson; Adam (Belmont, CA)
Ahn; Don (Daly City, CA)
Rappaport; Irving S. (Palo Alto, CA)
Kurata; Deborah (Pleasanton, CA)
Assignee:
Smartpatents, Inc. (Menlo Park, CA)
Appl. No.:
832971
Filed:
April 4, 1997
Current U.S. Class:
715/512
707/104.1
715/500.1
Current International Class:
G06F 17/30 (20060101)
Field of Search:
395/773,774,776-778,762
U.S. Patent Documents
5392428    February 1995    Robins
5559942    September 1996    Gough et al.
5592607    January 1997    Weber et al.
5596700    January 1997    Darnell et al.
5628003    May 1997    Fujisawa et al.
Primary Examiner:
Burwell; Joseph R.
Attorney, Agent or Firm:
Sterne, Kessler, Goldstein & Fox P.L.L.C.
Parent Case Text
This application is a continuation of application Ser. No. 08/423,676, filed Apr. 18, 1995, now U.S. Pat. No. 5,623,679, which is a Continuation-In-Part of application Ser. No. 08/341,129 filed Nov. 18, 1994, which is a Continuation-In-Part application of Ser. No. 08/155,752 filed Nov. 19, 1993, now U.S. Pat. No. 5,623,681.
Claims
What is claimed is:
1. A computer based note system, comprising:
a plurality of subnotes linked to portions of documents;
a note database comprising an entry for each of said subnotes, said entry comprising a document identification field, a begin location field, an end location field, and a subnote content field, said begin location field and said end location field storing location information identifying a portion of a document that is linked to said each of said subnotes, said document identification field storing information that identifies said document, and said subnote content field storing informational content of said each of said subnotes;
command receiving means for receiving a user command to display said each of said subnotes; and
subnote displaying means, responsive to said command receiving means, for displaying said each of said subnotes, comprising:
means for displaying in a subnote window said informational content from said subnote content field of said entry associated with said each of said subnotes;
means for displaying, by reference to said document identification field of said entry associated with said each of said subnotes, information identifying said document; and
means for displaying, by reference to said begin location field and said end location field of said entry associated with said each of said subnotes, information identifying a location of said portion in said document.
2. The system of claim 1, wherein said begin location field stores begin location information that identifies a location in said document where said portion begins, and said end location field stores end location information that identifies a location in said document where said portion ends.
3. The system of claim 2, wherein said document is a patent, and said begin location information and said end location information comprise column and line number information.
4. The system of claim 1, further comprising:
linking means for creating an additional link between said each of said subnotes and a second portion in a second document; and
updating means, responsive to said linking means, for updating said entry associated with said each of said subnotes to reflect said additional link.
5. The system of claim 4, wherein said updating means comprises:
means for updating said document identification field of said entry associated with said each of said subnotes to also include information identifying said second document; and
means for updating said begin location field and said end location field of said entry associated with said each of said subnotes to also include information identifying a location of said second portion in said second document.
6. The system of claim 4, wherein said subnote displaying means comprises:
means, responsive to a command to display said each of said subnotes, for displaying in said subnote window said informational content from said subnote content field of said entry associated with said each of said subnotes;
means for displaying, by reference to said document identification field of said entry associated with said each of said subnotes, information identifying said first document and said second document; and
means for displaying, by reference to said begin location field and said end location field of said entry associated with said each of said subnotes, information identifying a location of at least said first document portion in said first document.
7. The system of claim 1, further comprising:
user preference means for enabling a user to indicate whether a document note window is to be opened when a subnote is created.
8. The system of claim 7, further comprising:
means for displaying a second document in a text window;
means for enabling a user to select a portion of said second document displayed in said text window;
means for creating a new subnote linked to said selected portion of said second document; and
means for opening a document note window and displaying said new subnote in said opened document note window if said user previously indicated that a document note window is to be opened when a subnote is created.
9. The system of claim 1, wherein each subnote has a title and an associated color.
10. The system of claim 9, further comprising:
user preference means for enabling a user to indicate a preference for sorting subnotes.
11. The system of claim 10, wherein said user preference means comprises:
means for enabling said user to indicate that subnotes are to be sorted by any combination of title, color, and location.
12. The system of claim 11, further comprising:
means for sorting subnotes according to their titles if said user previously indicated that subnotes were to be sorted by title;
means for sorting subnotes according to their colors if said user previously indicated that subnotes were to be sorted by color; and
means for sorting subnotes according to their respective locations in said document if said user previously indicated that subnotes were to be sorted by location.
13. The system of claim 12, further comprising:
means for displaying a list of subnotes sorted according to said user sorting preference;
means for enabling a user to select one of said subnotes in said list; and
means for displaying said selected subnote in a document note window.
14. The system of claim 1, further comprising:
means for displaying a second document in a text window;
selecting means for enabling a user to select a portion of said second document displayed in said text window;
means, responsive to said selecting means, for automatically creating a new subnote linked to said selected portion;
means for enabling said user to copy said selected portion to a temporary storage location; and
means for enabling said user to paste said selected portion from said temporary storage location to said new subnote.
15. A method of manipulating notes having subnotes linked to portions of documents, comprising the steps of:
(1) maintaining a note database comprising an entry for each of said subnotes, said entry comprising a document identification field, a begin location field, an end location field, and a subnote content field, said begin location field and said end location field storing location information identifying a portion of a document that is linked to said each of said subnotes, said document identification field storing information that identifies said document, and said subnote content field storing informational content of said each of said subnotes;
(2) receiving a user command to display said each of said subnotes; and
(3) displaying said each of said subnotes, comprising the steps of:
(a) displaying in a subnote window said informational content from said subnote content field of said entry associated with said each of said subnotes;
(b) displaying, by reference to said document identification field of said entry associated with said each of said subnotes, information identifying said document; and
(c) displaying, by reference to said begin location field and said end location field of said entry associated with said each of said subnotes, information identifying a location of said portion in said document.
16. The method of claim 15, wherein said begin location field stores begin location information that identifies a location in said document where said portion begins, and said end location field stores end location information that identifies a location in said document where said portion ends.
17. The method of claim 16, wherein said document is a patent, and said begin location information and said end location information comprise column and line number information.
18. The method of claim 15, further comprising the steps of:
(4) creating an additional link between said each of said subnotes and a second portion in a second document; and
(5) updating said entry associated with said each of said subnotes to reflect said additional link.
19. The method of claim 18, wherein step (5) comprises the steps of:
updating said document identification field of said entry associated with said each of said subnotes to also include information identifying said second document; and
updating said begin location field and said end location field of said entry associated with said each of said subnotes to also include information identifying a location of said second portion in said second document.
20. The method of claim 18, wherein step (3) comprises the steps of:
displaying in said subnote window said informational content from said subnote content field of said entry associated with said each of said subnotes;
displaying, by reference to said document identification field of said entry associated with said each of said subnotes, information identifying said first document and said second document; and
displaying, by reference to said begin location field and said end location field of said entry associated with said each of said subnotes, information identifying a location of at least said first document portion in said first document.
21. The method of claim 15, further comprising the step of:
enabling a user to indicate whether a document note window is to be opened when a subnote is created.
22. The method of claim 21, further comprising the steps of:
displaying a second document in a text window;
enabling a user to select a portion of said second document displayed in said text window;
creating a new subnote linked to said selected portion of said second document; and
opening a document note window and displaying said new subnote in said opened document note window if said user previously indicated that a document note window is to be opened when a subnote is created.
23. The method of claim 15, wherein each subnote has a title and an associated color.
24. The method of claim 23, further comprising the step of:
(4) enabling a user to indicate a preference for sorting subnotes.
25. The method of claim 24, wherein step (4) comprises the step of:
enabling said user to indicate that subnotes are to be sorted by any combination of title, color, and location.
26. The method of claim 25, further comprising the steps of:
sorting subnotes according to their titles if said user previously indicated that subnotes were to be sorted by title;
sorting subnotes according to their colors if said user previously indicated that subnotes were to be sorted by color; and
sorting subnotes according to their respective locations in said document if said user previously indicated that subnotes were to be sorted by location.
27. The method of claim 26, further comprising the steps of:
displaying a list of subnotes sorted according to said user sorting preference;
enabling a user to select one of said subnotes in said list; and
displaying said selected subnote in a document note window.
28. The method of claim 15, further comprising the steps of:
displaying a second document in a text window;
enabling a user to select a portion of said second document displayed in said text window;
automatically creating a new subnote linked to said selected portion;
enabling said user to copy said selected portion to a temporary storage location; and
enabling said user to paste said selected portion from said temporary storage location to said new subnote.
29. A computer program product comprising a computer useable medium having computer program logic stored therein, said computer program logic enabling a computer to maintain a plurality of subnotes linked to portions of documents, wherein said computer program logic comprises:
means for enabling the computer to maintain a note database comprising an entry for each of said subnotes, said entry comprising a document identification field, a begin location field, an end location field, and a subnote content field, said begin location field and said end location field storing location information identifying a portion of a document that is linked to said each of said subnotes, said document identification field storing information that identifies said document, and said subnote content field storing informational content of said each of said subnotes;
command receiving means for enabling the computer to receive a user command to display said each of said subnotes; and
subnote displaying means, responsive to said command receiving means, for enabling the computer to display said each of said subnotes, comprising:
means for enabling the computer to display in a subnote window said informational content from said subnote content field of said entry associated with said each of said subnotes;
means for enabling the computer to display, by reference to said document identification field of said entry associated with said each of said subnotes, information identifying said document; and
means for enabling the computer to display, by reference to said begin location field and said end location field of said entry associated with said each of said subnotes, information identifying a location of said portion in said document.
30. The computer program product of claim 29, wherein said begin location field stores begin location information that identifies a location in said document where said portion begins, and said end location field stores end location information that identifies a location in said document where said portion ends.
31. The computer program product of claim 30, wherein said document is a patent, and said begin location information and said end location information comprise column and line number information.
32. The computer program product of claim 29, wherein said computer program logic further comprises:
linking means for enabling the computer to create an additional link between said each of said subnotes and a second portion in a second document; and
updating means, responsive to said linking means, for enabling the computer to update said entry associated with said each of said subnotes to reflect said additional link.
33. The computer program product of claim 32, wherein said updating means comprises:
means for enabling the computer to update said document identification field of said entry associated with said each of said subnotes to also include information identifying said second document; and
means for enabling the computer to update said begin location field and said end location field of said entry associated with said each of said subnotes to also include information identifying a location of said second portion in said second document.
34. The computer program product of claim 32, wherein said subnote displaying means comprises:
means, responsive to a command to display said each of said subnotes, for enabling the computer to display in said subnote window said informational content from said subnote content field of said entry associated with said each of said subnotes;
means for enabling the computer to display, by reference to said document identification field of said entry associated with said each of said subnotes, information identifying said first document and said second document; and
means for enabling the computer to display, by reference to said begin location field and said end location field of said entry associated with said each of said subnotes, information identifying a location of at least said first document portion in said first document.
35. The computer program product of claim 29, wherein said computer program logic further comprises:
user preference means for enabling the computer to enable a user to indicate whether a document note window is to be opened when a subnote is created.
36. The computer program product of claim 35, wherein said computer program logic further comprises:
means for enabling the computer to display a second document in a text window;
means for enabling the computer to enable a user to select a portion of said second document displayed in said text window;
means for enabling the computer to create a new subnote linked to said selected portion of said second document; and
means for enabling the computer to open a document note window and display said new subnote in said opened document note window if said user previously indicated that a document note window is to be opened when a subnote is created.
37. The computer program product of claim 29, wherein each subnote has a title and an associated color.
38. The computer program product of claim 37, wherein said computer program logic further comprises:
user preference means for enabling the computer to enable a user to indicate a preference for sorting subnotes.
39. The computer program product of claim 38, wherein said user preference means comprises:
means for enabling the computer to enable said user to indicate that subnotes are to be sorted by any combination of title, color, and location.
40. The computer program product of claim 39, wherein said computer program logic further comprises:
means for enabling the computer to sort subnotes according to their titles if said user previously indicated that subnotes were to be sorted by title;
means for enabling the computer to sort subnotes according to their colors if said user previously indicated that subnotes were to be sorted by color; and
means for enabling the computer to sort subnotes according to their respective locations in said document if said user previously indicated that subnotes were to be sorted by location.
41. The computer program product of claim 40, wherein said computer program logic further comprises:
means for enabling the computer to display a list of subnotes sorted according to said user sorting preference;
means for enabling the computer to enable a user to select one of said subnotes in said list; and
means for enabling the computer to display said selected subnote in a document note window.
42. The computer program product of claim 29, wherein said computer program logic further comprises:
means for enabling the computer to display a second document in a text window;
selecting means for enabling the computer to enable a user to select a portion of said second document displayed in said text window;
means, responsive to said selecting means, for enabling the computer to automatically create a new subnote linked to said selected portion;
means for enabling the computer to enable said user to copy said selected portion to a temporary storage location; and
means for enabling the computer to enable said user to paste said selected portion from said temporary storage location to said new subnote.
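The note database entry recited in claims 1, 15 and 29, together with the title, color and sorting features of claims 9 through 12, can be illustrated with a minimal Python sketch. This is not the patented implementation: the class name `SubnoteEntry`, the functions `describe` and `sort_subnotes`, and the choice of a (column, line) tuple for locations (following claim 3) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical names throughout; the claims specify fields, not a concrete layout.
@dataclass
class SubnoteEntry:
    document_id: str          # document identification field
    begin: Tuple[int, int]    # begin location field as (column, line), assumed per claim 3
    end: Tuple[int, int]      # end location field as (column, line)
    content: str              # subnote content field
    title: str = ""           # claim 9: each subnote has a title
    color: str = ""           # claim 9: and an associated color


def describe(entry: SubnoteEntry) -> str:
    """Build the text a subnote window might show: content, document, and location."""
    (bc, bl), (ec, el) = entry.begin, entry.end
    return (f"{entry.content} [{entry.document_id}, "
            f"col. {bc}, line {bl} to col. {ec}, line {el}]")


def sort_subnotes(entries: Sequence[SubnoteEntry], keys: Sequence[str]) -> List[SubnoteEntry]:
    """Sort by any combination of 'title', 'color', 'location', in priority order."""
    getters = {
        "title": lambda e: e.title,
        "color": lambda e: e.color,
        "location": lambda e: e.begin,
    }
    return sorted(entries, key=lambda e: tuple(getters[k](e) for k in keys))
```

For example, `sort_subnotes(entries, ["color", "location"])` groups subnotes by color and orders each group by where its linked portion begins in the document.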
Description
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to the fields of publishing, document editing and manipulation, and displaying documents and images. More particularly, the present invention relates to paginating, extracting, synchronizing, and displaying a document in electronic form.
2. Art Background
As the development of multimedia computer display systems continues to advance, more computing power and features are available to computer users. For example, information which has historically been limited to published paper documents is now being made available through on-line computing services from publishers and information vendors. As an increasing market share of the data and computing capacity is provided through low cost high performance personal computers, some of the on-line information is also being made available in compact disks (CD) and magnetic media formats. Compact disk and magnetic media technology offer cost effective mass storage of documents, images and other data, in a format readily accessible for use with personal computers in a home or office environment. The combination of personal computers, compact disk technology and multimedia interactive graphic user interfaces, permits the access and display of textual and graphic information by personal computer (PC) users in a manner not previously known in the industry. The type of information potentially available to a PC user includes professional and technical publications, newspapers, magazines, and other scientific and literary data and images.
However, much of the information which is published through, for example, government sources, newspapers and magazines is not in machine readable form, but rather is printed on paper. Because of the amount of work and effort required to convert the printed information into a machine readable form, only a small portion of the total published information is currently available for use by PC users using magnetic disks, CDs and the like. In addition, the information which is in machine readable form is typically available either as an image of the original document or as a stream of text data. An image of a document has the advantage of presenting the information in its original format as published, including non-text material, such as drawings, equations, symbols, diagrams, etc. The viewer is familiar with the format, and the information is easily recognized and understood. However, since a document image is often stored as a bitmap, the content of the document cannot be easily searched or manipulated. Alternatively, a text data stream format has the advantage of presenting the information in a manipulable and searchable format. Unfortunately, in many cases, the format of presentation is not the format in which the information was originally published in print. Thus, the users are often unfamiliar with the format, which inhibits easy navigation of the document and makes information difficult to find and use.
One example of the problem of reproducing originally published documents stored in machine readable form is the storage and display of United States patent documents by the United States Government. The United States Patent Office (herein referred to as the "PTO") provides magnetic tapes of issued U.S. patents and other documents, in the form of a scanned in image, and as a separate stream of text data. The magnetic tape storing the text data does not include graphical illustrations such as drawings, charts, textual tables, or much in the way of formatting data. Thus, the reproduction of a United States patent from PTO Text Files stored on magnetic tape does not result in the display of a U.S. patent as originally published by the U.S. Government. An example of a well known system for displaying text files provided by the PTO is that of the LexPat® system provided by Mead Data offered in conjunction with the Lexis® display system. Using the LexPat® system, the display of a U.S. patent on a terminal, such as a PC, results in a display of text only, and does not include drawings, charts, graphs, or original formatting information. The text of a selected patent appears in ASCII format, but does not appear as the original patent issued by the PTO, and may not be referenced by the original column and line numbers from the published patent. Other systems display text files of periodicals such as the Wall Street Journal or legal documents such as contracts. However, the text files do not appear as the original documents.
The U.S. Patent Office also provides magnetic tapes with image files comprising a scanned in image of the original U.S. patent issued by the PTO and published by the U.S. Government. The image files provided on magnetic tape by the PTO simply represent a bitmap image of the original published patent. As a scanned in image, the entire patent is provided including drawings, charts, graphs, text and the original format, since it represents a simple bitmap of the scanned original document. However, a scanned document may not be easily searched, edited, navigated or otherwise manipulated as can a text file.
As will be described, the present invention provides a method and apparatus for extracting, synchronizing, displaying, navigating and manipulating text and image documents simultaneously in electronic form. The present invention is described with particular reference for use with U.S. patent documents, and includes the process of extracting patent text and image data from magnetic tapes provided by the PTO, synchronizing the text and image data for recovering the original format (i.e., columns and lines) of the original published patent, and displaying the formatted text along with images using a unique graphical user interface (GUI) workbench. Although the present invention is described with reference to patent documents, it will be appreciated that the invention has application to a variety of different types of documents and applications.
The present invention's graphical user interface permits a user to selectively view ASCII text documents as well as bitmapped scanned images simultaneously on a display. When used in conjunction with U.S. patent documents, the graphic user interface of the present invention allows a user, such as a patent attorney, to display and manipulate both textual as well as graphic portions of patents. The text of a patent may be viewed on the display as it was originally published by the PTO, including column and line numbers. Simultaneously, the user may view the figures of a patent in the form of an image comprising a bitmap. Various functions are provided by the present invention for viewing, manipulating and displaying the patent documents. In order to assist the reader in understanding graphic user interface (GUI) technology, it is suggested that certain references be considered for background. Many user interfaces utilize metaphors in the design of the interface as a way of maximizing human familiarity, and conveying information between the user and the computer. Through the use of familiar metaphors, such as desktops, notebooks, spread sheets, and the like, the interface takes advantage of existing human mental structures to permit a user to draw upon the metaphor analogy to understand the requirements of the particular computer system. (See for example, Patrick Chan "Learning Considerations in User Interface Design: The Room Model", Report CS-84-16, University of Waterloo, Computer Science Department, Ontario, Canada, July, 1984 and the references cited therein.) In addition, the reader is referred to the following references which describe various aspects, methods and apparatus associated with prior art graphic user interface design: U.S. Pat. No. Re.32,632; U.S. Pat. No. 4,931,783; U.S. Pat. No. 5,072,412; and U.S. Pat. No. 5,148,154, and the references cited therein.
As will be described more fully below, the present invention's graphic user interface is based on a desktop "windows" metaphor, and provides the user with the ability to simultaneously display text and image documents in both a synchronized and unsynchronized fashion, as will be more fully described herein.
SUMMARY OF THE INVENTION
The present invention provides a method and apparatus for extracting, synchronizing, displaying, and manipulating text and image documents in machine readable form for display. In the preferred embodiment of the present invention, text and image files for documents, such as for example patent documents, are initially stored on separate magnetic tape media. These data files are extracted from the respective tapes and placed onto a faster medium, such as a hard disk drive. Catalogs are generated of the contents of the tapes and procedures are provided for locating and loading tapes from a tape inventory. The text and image files are synchronized to produce Equivalent Files using heuristic algorithms to create an approximate equivalence relationship between the text and the image files. In the presently preferred embodiment, the automatic pagination of the text and image files provides an equivalence relationship, and a final Equivalent File is obtained through human intervention to correct any inaccuracies still remaining after the automatic process has been completed. However, the present invention also contemplates an entirely automatic pagination process which would require no human intervention to obtain a usable Equivalent File. A word based inverted tree index is created for the text files to allow for very fast text searching using a graphic user interface (GUI) workbench.
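As a rough illustration of the word based index mentioned above, the following Python sketch maps each word to the lines of text in which it appears. It assumes a flat dictionary rather than the inverted tree structure the text names, and the function names `build_index` and `search_all` are hypothetical, not taken from the patent.

```python
import re
from collections import defaultdict
from typing import Dict, List, Sequence


def build_index(lines: Sequence[str]) -> Dict[str, List[int]]:
    """Map each lowercased word to the sorted 1-based line numbers containing it."""
    index = defaultdict(set)
    for line_no, text in enumerate(lines, start=1):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(line_no)
    return {word: sorted(nums) for word, nums in index.items()}


def search_all(index: Dict[str, List[int]], *words: str) -> List[int]:
    """Return the line numbers that contain every query word (a simple AND query)."""
    sets = [set(index.get(w.lower(), ())) for w in words]
    return sorted(set.intersection(*sets)) if sets else []
```

Because lookups go through the precomputed dictionary instead of rescanning the text, a query costs roughly the size of the matching posting lists, which is the property that makes index-based searching fast.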
The Equivalent Files and image files residing on, for example, a hard disk drive or compact disk (CD), are coupled as a resource to a computer display system. The computer display system includes a computer having a central processing unit (CPU) coupled to memory and input/output (I/O) circuitry. The computer is also coupled to a CD ROM, hard disk drive, or other mass memory device onto which the Equivalent File and image file have been stored. The computer is coupled to a display, such as a cathode ray tube (CRT) or liquid crystal display, as well as a keyboard and a cursor control device. The graphic user interface of the present invention is displayed by the computer on the CRT, and includes a menu bar and a tool bar, each bar having a plurality of command options for selection by a user. The graphical user interface of the present invention permits the user to display, manipulate, and navigate the Equivalent File created using the process of the present invention, and to simultaneously view the image file on the display. In accordance with the teachings of the present invention, the Equivalent File may be synchronized with the image file, or alternatively, an Equivalent File may be displayed along with a completely separate and distinct image (for example, viewing the Equivalent File of one patent while viewing the image file of another patent). Once created, and as shown on the display, the Equivalent File is displayed in substantially the same column and line format as a printed patent published by the U.S. Government.
Using the graphic user interface of the present invention, a user may create libraries of patent text Equivalent Files and image files, as well as open cases to include a plurality of different patents or other documents. The Equivalent File may be selectively viewed on the display in an equivalent window. The Equivalent File may be navigated, highlighted, searched, and otherwise annotated using highlights, patent and case notes. Simultaneous with the viewing of the Equivalent File of a patent within the equivalent window, the user may view the exact portion of the image file corresponding to the display of the Equivalent File, or any portion of an image file within one or more image windows on the display. The present invention further provides search mechanisms for defining and searching key words chosen by the user or selected from the Equivalent File, or a word list. Boolean and proximity searches may also be performed on the Equivalent File and the results displayed. The search terms may be used to search documents within the equivalent window of a current Equivalent File, current library of documents, document notes (referred to herein as "patent notes" and/or "case notes"), as well as other selected cases. The word list includes an alphabetical list of all words within the selected library, document or the like. The present invention also permits the user to display an image, for example a patent drawing image, within the image window by placing a cursor in the text of a patent Equivalent File and signaling the computer. In response to this signal, the computer displays the last referenced figure drawing within the image window. The interface of the present invention also permits the user to select portions of text and/or drawings within the image window, and enlarge or reduce the selected image for viewing by the user. The interface further permits the user to select any element number appearing on the patent drawings in the image window.
The selection of an element number in a patent drawing results in the automatic highlighting of the first and every subsequent occurrence of that element number in the Equivalent File comprising a specification and claims of the selected patent equivalent displayed in the equivalent window. Additionally, multiple patents, drawings and/or other documents may be viewed simultaneously on the display in accordance with the teachings of the graphic user interface comprising the present invention. A variety of other features and functions are provided by the present invention for the manipulation, navigation and display of patent documents on the user interface. The user may display either a synchronized Image File wherein the image displayed is synchronized with the Equivalent File displayed, or an unsynchronized Image File wherein the image displayed is at some page other than the one containing the column of text in the Equivalent File. A user may also copy and paste a portion of, or the whole, Equivalent File to notes or third party programs, such as word processors or drawing programs, and may import ASCII text into the notes from third party systems, for example importing deposition testimony in ASCII format into patent notes that relate to the topic of the testimony. Particularly when using the present invention with patents, it may be used to facilitate patent searching in the preparation and prosecution of patents, licensing of patents, litigation of patents, conducting infringement and validity studies of patents, producing infringement claim charts, managing and valuing a portfolio or group of patents, conducting 35 U.S.C. § 112 searches on patents or pending applications, and many other uses which are regularly performed by a patent attorney, patent agent or technical personnel.
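By way of illustration only, the figure lookup and element number highlighting described above might be sketched in Python as follows. The function names, the regular expressions, and the treatment of the cursor as a character offset into the Equivalent File text are all assumptions made for this sketch and are not taken from this specification.

```python
import re

# Hypothetical pattern for figure references such as "FIG. 3" or "FIGS. 4".
FIG_PATTERN = re.compile(r"\bFIGS?\.\s*(\d+[A-Z]?)")

def last_referenced_figure(text, cursor_pos):
    """Return the label of the last figure mentioned at or before cursor_pos.

    Returns None when no figure reference precedes the cursor.
    """
    last = None
    for match in FIG_PATTERN.finditer(text):
        if match.start() > cursor_pos:
            break
        last = match.group(1)
    return last

def element_occurrences(text, element_number):
    """Return (start, end) spans of each whole-number occurrence of element_number."""
    pattern = re.compile(r"\b" + re.escape(str(element_number)) + r"\b")
    return [match.span() for match in pattern.finditer(text)]
```

A caller could pass the spans returned by element_occurrences to whatever highlighting routine the display layer provides. The word-boundary match is a simplification: it would also match the same digits where they appear as a figure number or inside a comma-separated patent number.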
NOTATION AND NOMENCLATURE
In some of the detailed descriptions which follow, the present invention is presented partly in terms of interface display images, process steps, and symbolic representations of operations of data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art.
An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. These steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities may take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, displayed and otherwise manipulated. It proves convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, images, terms, numbers, or the like. It should be borne in mind, however, that all of these similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.
In the present invention, the operations referred to are machine operations performed in conjunction with a human operator. Useful machines for performing the operations of the present invention include general purpose digital computers, digitally controlled displays or other similar devices. In all cases, the reader is advised to keep in mind the distinction between the method of operating a computer and/or display system, and the method of computation itself. The present invention relates to methods for operating a computer and interactive display system, and processing electrical or other physical signals to generate other desired physical signals.
The present invention also relates to apparatus for performing these operations. This apparatus may be specially constructed for the required purposes or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. The method steps presented herein are not inherently related to any particular computer or other apparatus. Various general purpose machines may be used with programs in accordance with the teachings herein, or it may prove more convenient to construct specialized apparatus to perform the required method steps. As such, no particular programming language is provided, as any one of a variety of languages may be utilized to implement the invention. The required structure for a variety of these machines and programming environments will be apparent from the description given below.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a block diagram of the production configuration to extract text and image files, paginate the text files with the image files to produce Equivalent Files, and index the Equivalent Files.
FIG. 2 is a flow chart illustrating the sequence of steps utilized by the present invention to extract text and image files, paginate the text files with the image files to produce Equivalent Files, index the Equivalent Files and display the Equivalent Files and/or Image Files on a display.
FIG. 3 is a functional block diagram illustrating a computer display system incorporating the teachings of the present invention.
FIG. 4 illustrates an enlarged portion of an image file comprising the bibliography page of U.S. Pat. No. 5,165,027.
FIG. 5 illustrates a sample portion of a PTO Text File for U.S. Pat. No. 5,165,027 illustrated in FIG. 4.
FIG. 6 illustrates an example of the column information listed in the PTO Text File for the U.S. Pat. No. 5,165,027 illustrated in FIGS. 4 and 5.
FIG. 7 illustrates the paragraph shown in FIG. 6 as it is stored in the PTO Image File for U.S. Pat. No. 5,165,027.
FIG. 8 illustrates the column line number information provided by a published United States patent.
FIG. 9 illustrates a flow chart block diagram of the extraction process utilized by the present invention to extract PTO Text Files and PTO Image Files from magnetic tapes provided by the PTO for use by the processing system of the present invention to synchronize and index the text and image files.
FIG. 10 is a flow chart illustrating the pagination process of the present invention to synchronize the PTO Text File and the PTO Image File to produce an Equivalent File.
FIG. 11 illustrates the user interface of the present invention upon system start including the title, menu and tool bars.
FIG. 12 illustrates the selection by a user of a down arrow function to open a list of available cases.
FIG. 13 illustrates the present invention's use of information arrows to direct the user to currently available options for execution.
FIG. 14 illustrates the patent text toolbox of the present invention and the display of a menu of patent section headings to assist the user in navigating a selected patent.
FIG. 15 illustrates the sub-command items available for selection by a user upon activating the Library menu option.
FIG. 16 illustrates the Set Library Directories dialog box, displayed after selection of the Set Library Directories sub-command item on the Library menu.
FIG. 17 illustrates the New Library dialog box.
FIG. 18 illustrates the Open Library dialog box.
FIG. 19 illustrates the present invention's Library dialog box for working with the library currently in use.
FIG. 20 illustrates the selection of a patent within the Intel® Library.
FIG. 21 illustrates the present invention's minimization of a library to an icon.
FIG. 22 illustrates the present invention's Update Library dialog box for updating the library currently in use, which, in the present example, is the Intel® Library.
FIG. 23 illustrates the present invention's Search Library dialog box which is displayed upon selection of the Search sub-command item from the library menu.
FIG. 24 illustrates the present invention's Word List dialog box which is displayed upon the activation of the Word List button function within the Search library dialog box.
FIG. 25 illustrates the operation of the present invention's Word List dialog box for selecting an alphabetical tab and viewing the corresponding list of words from the library patents.
FIG. 26 illustrates the present invention's Search Results dialog box identifying the number of occurrences of the search term defined by the user in each of the library patents.
FIG. 27 illustrates the present invention's Library to Case Cross Reference dialog box.
FIG. 28 illustrates the present invention's Patent Text Toolbox for operating upon Equivalent Files displayed in an equivalent window.
FIG. 29 further illustrates the present invention's Patent Text Toolbox for operating upon the Equivalent File within the equivalent window.
FIG. 30 illustrates the present invention's simultaneous display of an equivalent window and an image window, as well as the display of a Patent Image Toolbox for operating upon images displayed within the image window.
FIG. 31 illustrates the present invention's simultaneous and synchronized display of an Equivalent File in an equivalent window and enlarged image displayed in an image window on the display screen.
FIG. 32 illustrates the display of patent section headings and the ability of a user to navigate the patent sections displayed within the equivalent window through the selection of section headings.
FIG. 33 illustrates the present invention's synchronization of an Equivalent File displayed in the equivalent window with the drawings of a patent disposed in an image file displayed in an image window on the display screen. The present invention links references to the figure numbers in the Equivalent File to the figures in the image file displayed in the image window.
FIG. 34 illustrates the present invention's use of an outline box to identify an area of the patent image to be enlarged.
FIG. 35 illustrates the present invention's user interface in which an Equivalent File is displayed in an equivalent window, and simultaneously, an enlarged portion of a figure from the image file is displayed in the image window on the display screen.
FIG. 36 illustrates the present invention's Select Element Number dialog box, which permits a user to input a drawing element and locate the first occurrence and the subsequent occurrences of the drawing element in the Equivalent File displayed in the equivalent window.
FIG. 37 illustrates the present invention's use of highlighting to highlight desired portions of the Equivalent File in various colors.
FIG. 38 illustrates the present invention's display of two equivalent windows and one image window on the display screen.
FIG. 39 illustrates the Import Patents dialog box of the present invention.
FIG. 40 illustrates the Import Patents dialog box after the selection of an Equivalent File to be imported.
FIG. 41 illustrates sub-command items available for selection upon the activation of the Case menu option.
FIG. 42 illustrates the Open Case dialog box which is displayed once the Open Case sub-command item illustrated in FIG. 41 is selected.
FIG. 43 illustrates the New Case dialog box which is displayed upon the selection of the New Case sub-command item illustrated in FIG. 41.
FIG. 44 illustrates the patent number drop down menu which permits a user to select a patent within a case for displaying.
FIG. 45 illustrates the Update Case dialog box which is displayed upon the activation of the Update Case sub-command item illustrated in FIG. 41.
FIG. 46 illustrates the Search Case dialog box which is displayed upon the selection of the Search sub-command item of the Case menu illustrated in FIG. 41.
FIG. 47 illustrates the Set Case Directories dialog box which is displayed upon the activation of the Set Case Directories sub-command item illustrated in FIG. 41.
FIG. 48 illustrates the Copy to Case dialog box which is displayed upon the selection of the Copy Case sub-command item illustrated in FIG. 41.
FIG. 49 illustrates the Backup Case dialog box which is displayed upon the activation of the Backup Case sub-command item of FIG. 41.
FIG. 50 illustrates the Delete dialog box which is displayed upon the selection of the Delete Case sub-command item illustrated in FIG. 41.
FIG. 51 illustrates the Print dialog box of the present invention which is displayed upon the activation of the Print sub-command item illustrated in FIG. 41.
FIG. 52 illustrates the Print Setup dialog box which is displayed upon the activation of the Print Setup sub-command item illustrated in FIG. 41.
FIG. 53 illustrates the sub-command items available for selection upon the activation of the Edit command option.
FIG. 54 illustrates the sub-command items available for selection by a user upon the activation of the View command option.
FIG. 55 illustrates the Preferences dialog box displayed upon the activation of the Preferences sub-command item of FIG. 54.
FIG. 56 illustrates the Screen Layout dialog box which is displayed upon the selection of a Screen Layout sub-command item of FIG. 54.
FIG. 57 illustrates the user interface of the present invention upon the selection of the Screen Layout of the Screen Layout dialog box illustrating one equivalent window and one image window on the display screen.
FIG. 58 illustrates the user interface of the present invention in which two equivalent windows are displayed side by side on the display screen after selection of Screen Layout of the Screen Layout dialog box.
FIG. 59 illustrates the graphic user interface of the present invention in which two equivalent windows and two image windows are displayed on the display screen subsequent to the selection of Screen Layout of the Screen Layout dialog box.
FIG. 60 illustrates the sub-command items available for selection upon the activation of the Window command option.
FIG. 61 illustrates the patent note menu of the present invention which displays all patent notes which have been generated by a user.
FIG. 62 illustrates a patent note of the present invention.
FIG. 63 illustrates the present invention's use of multi-notes wherein multiple patent notes may be created within a single patent note.
FIG. 64 illustrates the present invention's case note.
FIG. 65 illustrates the minimization of exemplary documents, such as search results and the like on the display of the present invention.
FIG. 66 illustrates the present invention's Go To Section dialog box which permits a user to input a patent column number and upon activation, results in the display of the column in the Equivalent File corresponding to the desired patent column.
FIG. 67 illustrates the present invention's Go To Section dialog box which permits a user to select a section of the patent and upon activation, results in the display of the selected section in the equivalent window.
FIG. 68 illustrates the sub-command items available for selection by a user upon the activation of the Help command option.
FIG. 69 illustrates the About dialog box which is displayed upon the activation of the About sub-command item illustrated in FIG. 68.
FIG. 70 illustrates the sub-command items which are available for selection by a user upon the activation of the Note command option.
FIG. 71 illustrates the case notes in Case dialog box which is displayed upon the selection of the View Case Note sub-command option illustrated in FIG. 70.
FIG. 72 illustrates the patent notes in Case dialog box which is displayed upon the selection of the View Patent Note sub-command item illustrated in FIG. 70.
FIG. 73 is a simplified block diagram of a computer system according to a preferred embodiment of the present invention.
FIG. 74 is a flowchart depicting the preferred manner in which data transfer operations occur between machines in the computer system of FIG. 73.
FIGS. 75, 76A, and 76B are used to describe the manner in which PTO Image files are compressed according to a preferred embodiment of the present invention.
FIGS. 77 and 78 are flowcharts depicting the manner in which pagination is performed according to a preferred embodiment of the present invention.
FIGS. 79 and 80 are used to describe a "Copy Claims" option preferably provided by the user interface of the present invention.
FIGS. 81 and 82 are used to describe a "Zoom Image" option preferably provided by the user interface of the present invention.
FIG. 83 is used to describe a "Copy Image" option preferably provided by the user interface of the present invention.
FIG. 84 is used to describe a "Lock Windows" option preferably provided by the user interface of the present invention.
FIGS. 85A and 85B are used to illustrate the preferred manner in which the present invention performs clumping.
FIG. 86 is used to illustrate the preferred manner in which the present invention performs character stream matching.
FIGS. 87 and 88 illustrate note databases according to first and second embodiments, respectively, of the present invention.
FIG. 89 illustrates an example note window according to the second embodiment shown in FIG. 88.
FIG. 90 illustrates an example display showing a text window, an image window, and a notes window.
DETAILED DESCRIPTION OF THE INVENTION
In the following description, numerous specific details are set forth such as functional blocks, representative data processing devices, window configurations, specific patent documents, text and drawings, etc., to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well known circuits and structures are not described in detail in order not to obscure the present invention unnecessarily.
The present invention will be described in various sections including a discussion of the general system configuration, the tape extraction process, the pagination process, the indexing process, and the graphic user interface. It is to be understood that although the following description is directed to U.S. patent documents, the present invention is not limited to patents, and has application to a variety of documents and images, as may be required by a particular application, such as for example, legal contracts, the Wall Street Journal, The Los Angeles Times, etc.
GENERAL OVERVIEW OF THE INVENTION
The general system configuration of the present invention discloses one possible implementation of the present invention for the display, navigation, manipulation and editing of text and image data in a graphical user interface. As will be described, the general system configuration describes a computer display system which may be in the form of a personal computer, workstation, or dedicated processor system to permit the user to utilize the teachings of the present invention. No particular computer hardware is described within this specification, and the general system configuration description is intended to encompass a broad range of possible data processing systems in which the present invention may be implemented.
A general overview of the system of the present invention is shown in FIG. 1, and a flow chart of the primary process steps comprising the method of the present invention is illustrated in FIG. 2.
The tape extraction process of the present invention extracts data files from PTO text and PTO Image File magnetic tapes provided by the PTO. The data files are extracted from these tapes onto a faster medium (such as a hard disk drive) to provide access times which are useful in modern data processing systems. As will be described, the process of extraction involves appropriately generating catalogues and inventories of the contents of the tapes, as well as procedures for selecting and loading tapes from the newly created tape inventories.
The process of paginating the PTO Text Files and the PTO Image Files to produce "Equivalent Files" is performed by using a heuristic set of algorithms to automatically create an approximate equivalent relationship between the text and image files. A human operator verifies the results to finalize the Equivalent File, such that the original formatting of the published patent document is reflected in the Equivalent File.
As will be described, a process for creating an inverted tree index for the text contained in the PTO Text Files is disclosed. This indexing process results in a pre-built index for very fast text searching when using the graphic user interface of the present invention. Although the present invention describes an inverted tree index, other types of text searching methods may be employed instead of the inverted tree index.
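For purposes of illustration, a word-based index of the general kind described above might be sketched in Python as follows. The sketch uses a plain dictionary rather than a tree structure, and the function names and the (document identifier, line number) posting layout are assumptions made for this example, not details disclosed in this specification.

```python
import re
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercased word to a list of (doc_id, line_no) postings.

    `documents` is a hypothetical dict of doc_id -> list of text lines.
    """
    index = defaultdict(list)
    for doc_id, lines in documents.items():
        for line_no, line in enumerate(lines, start=1):
            for word in re.findall(r"[A-Za-z0-9]+", line.lower()):
                index[word].append((doc_id, line_no))
    return index

def search(index, word):
    """Return the postings for `word`, or an empty list if it is not indexed."""
    return index.get(word.lower(), [])
```

Because the index is built once in advance, a lookup only reads the posting list for the requested word and does not rescan the text, which is the property that makes a pre-built index fast at search time.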
The graphic user interface ("GUI") of the present invention displays the Equivalent File and the PTO Image File, and allows the user to perform analysis on the displayed files or other stored files. The Equivalent File is formatted and displayed with a similar appearance to the PTO Image File, having the same column and line formatting as the published patent. The user may then, for example, use the GUI to perform text searches to generate accurate column and line citations, navigate the Equivalent File via section headings to locate desired sections of text, as well as to view the figures or text images in the displayed files or other stored files. Images and equivalent patent text may be viewed either in a synchronized or unsynchronized fashion using the teachings of the present invention.
GENERAL SYSTEM CONFIGURATION
FIG. 1 illustrates a block diagram of the present invention's production configuration to extract text and image files, to paginate the text files to produce Equivalent Files, and to index the Equivalent Files. The process begins with the PTO magnetic tapes 1 that are of type 3480 from the PTO. There are three different categories of PTO magnetic tapes: PTO text tapes, PTO image tapes and PTO assignment tapes. A UNIX machine 2 reads the data in the PTO tapes 1 into a large file buffer. The data is then parsed to find each of the documents that are on the tapes. Parsing creates a table which contains patent numbers, the physical locations of the patent files on the tapes, the total number of bytes and other control information about each document that appears on the tape. A document can be either a patent, a certificate of correction, a reissued patent disclaimer or any other post-issuance document. The data can then be stored either on a digital linear tape (DLT) 3 or on any other suitable data storage medium. Because the amount of disk storage space required for the total active set of patents is greater than 1 terabyte (TB), currently the data is stored in libraries 5. The libraries may contain PTO Text Files 6, PTO Image Files 7 and post issuance documents 9. If a disk drive system with large enough storage capacity is available, the data can be stored in a disk drive. At present, the PTO image tapes are left in their original medium, namely the 3480 magnetic tapes.
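As a rough sketch of the parsing step described above, the following Python example walks a file buffer and builds a table of patent numbers, locations and byte counts. The record layout (an 8-byte ASCII patent number followed by a 4-byte big-endian length before each document body) is purely hypothetical and is assumed only so that the example can run; the actual PTO tape format is not reproduced here. The names CatalogEntry and build_catalog are likewise illustrative.

```python
import struct
from dataclasses import dataclass

# Hypothetical header: 8-byte ASCII patent number, 4-byte big-endian body length.
HEADER = struct.Struct(">8sI")

@dataclass
class CatalogEntry:
    patent_number: str
    volume_serial: str  # identifies the tape the buffer was read from
    offset: int         # byte offset of the document body within the buffer
    length: int         # total number of bytes in the document body

def build_catalog(buffer: bytes, volume_serial: str):
    """Scan the buffer header by header and return one CatalogEntry per document."""
    entries = []
    pos = 0
    while pos + HEADER.size <= len(buffer):
        raw_number, length = HEADER.unpack_from(buffer, pos)
        body_start = pos + HEADER.size
        entries.append(
            CatalogEntry(raw_number.decode("ascii").strip(), volume_serial,
                         body_start, length)
        )
        pos = body_start + length  # skip over the body to the next header
    return entries
```

Storing the offset and length with each entry is what later allows a reader to seek directly to a requested document instead of rescanning the whole tape.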
Continuing to refer to FIG. 1, when an order 10 requesting a list of patents is entered into a UNIX database 11, the UNIX database 11 sorts the request list by patent location to minimize the number of different tapes that need to be mounted, and sends to the staging machine 8 the list of patents and other pertinent information such as the volume serial number of the tapes, and location information that allows the staging machine 8 to fast forward to the individual patent files that are requested. The staging machine 8 creates a file on its disks of all the text and image portions of each patent that has been requested to process. When the staging machine 8 has the text and image files available, it sends the text and image files to the pagination machine 13.
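The sorting of a request list by patent location, as described above, might be sketched in Python as follows. The catalog structure (a dictionary from patent number to a volume serial number and byte offset) and the function name plan_tape_reads are assumptions introduced for this example only.

```python
from itertools import groupby
from operator import itemgetter

def plan_tape_reads(requests, catalog):
    """Group requested patents by tape volume and order each group by offset.

    `catalog` is a hypothetical dict: patent_number -> (volume_serial, offset).
    Returns a list of (volume_serial, [(offset, patent_number), ...]) so that
    each tape is mounted once and read in a single forward pass.
    """
    located = sorted((catalog[p][0], catalog[p][1], p) for p in requests)
    plan = []
    for volume, group in groupby(located, key=itemgetter(0)):
        plan.append((volume, [(offset, number) for _, offset, number in group]))
    return plan
```

In this sketch a patent number that is missing from the catalog raises a KeyError; a production system would presumably report such requests separately.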
Further referring to FIG. 1, at present, the pagination machine 13 utilizes one or more DOS based machines 16 to paginate the text and image files and to create Equivalent Files as described more fully in the Terminology and Definition section in this Specification. After pagination, an index machine 19 adds post issuance documents 9 and indexes the Equivalent Files. The index machine 19 incorporates one or more DOS based machines 20. Next, the manufacturing machine 23 creates a CD ROM image of the Equivalent Files and the Image Files and writes the image to a CD ROM and digital linear tapes 28. The manufacturing machine 23 may utilize one or more DOS based machines 27, a CD ROM writer 25 and digital linear tapes 28. The CD ROM with the Equivalent Files and the Image Files is delivered to a user who then uses a system, such as the one illustrated in FIG. 3, to display and manipulate the files. The digital linear tapes with the finished patents are stored in a library 30, and the database 11 is updated so that when a particular patent in the library 30 is requested, the staging machine 8 mounts the finished patent from the library 30, and the database flags that the patent has already been paginated and indexed, so that pagination and indexing steps can be skipped for a faster process. Although in the present invention, specific machines such as UNIX machines and DOS machines are disclosed, these are mere examples of different types of computer systems that can be incorporated and not limitations upon the present invention.
As evident from the above description, the present invention involves a significant amount of transfer of data between machines, such as between the extraction machine, the libraries 5, the staging machine 8, and the pagination machine 13 (in practice, these machines may be implemented using a single computer platform, or multiple computer platforms). The manner in which such data transfer takes place according to a preferred embodiment of the present invention shall now be described with reference to FIGS. 73 and 74.
FIG. 73 is a simplified representation (in block diagram form) of the system configuration shown in FIG. 1. FIG. 73 shows a computer system 7302 that includes a first client machine 7304, a second client machine 7306, a shared disk drive 7310, and a tape drive 7308. The shared disk drive 7310 is preferably part of the second client machine 7306, and the second client machine 7306 is preferably a UNIX-based machine. Preferably, the shared disk drive 7310 can be directly accessed by both the second client 7306 and the first client 7304.
As will be appreciated, in UNIX-based systems, file rename operations are atomic operations. Thus, the shared disk drive 7310 cannot perform any other file-related operations when it is performing a file rename operation (this is the case, since the shared disk drive 7310 is part of the UNIX-based second client machine 7306).
There are instances when the first client 7304 will want to access data in the tape drive 7308 via the second client 7306. Consider the case where the first client 7304 represents the pagination machine 13, the second client 7306 represents the staging machine 8, and the tape drive 7308 represents the library 5. Often, the pagination machine 13 will want to access data in the library 5. To do so, the pagination machine 13 will have to interact with the staging machine 8. Preferably, such interaction between the pagination machine 13 and the staging machine 8 is achieved by using the shared disk drive 7310. In particular, the pagination machine 13 (i.e., the first client 7304) writes a "read" command on the shared disk drive 7310. The staging machine 8 (the second client 7306) retrieves the "read" command from the shared disk drive 7310 and then performs the "read" command, wherein such performance of the "read" command results in data being read from the library 5 (tape drive 7308) and transferred to the pagination machine 13. Other data transfer scenarios in the system configuration shown in FIG. 1 will be apparent to persons skilled in the relevant art.
As will be appreciated, handshaking must be implemented between the first client 7304 and the second client 7306 to ensure that the second client 7306 does not read the "read" command from the shared disk drive 7310 before the first client 7304 has finished writing the "read" command to the shared disk drive 7310. Otherwise, improper operation will result.
FIG. 74 is a flowchart 7402 representing the operation of the first client 7304 and the second client 7306 during data transfer operations. Such operation of the present invention achieves handshaking during data transfer operations without requiring any explicit communication between the first and second clients 7304, 7306. This helps in reducing the load on system resources (such as communication bandwidth), thereby optimizing system performance. Flowchart 7402 begins with step 7404, where control immediately passes to step 7406.
In step 7406, the first client 7304 begins writing a read command file (which contains commands that instruct the second client 7306 to read data from the tape drive 7308) to the shared disk drive 7310. The read command file is named "DLT.CXX". The second client 7306 periodically scans through the shared disk drive 7310 and retrieves and executes files with a ".CMD" extension.
Step 7408 is performed after the first client 7304 has completely written the file "DLT.CXX" to the shared disk drive 7310. In step 7408, the first client 7304 changes the name of the "DLT.CXX" file to "DLT.CMD". As discussed above, file rename operations are atomic operations. Thus, the second client 7306 is not able to read the "DLT.CMD" file from the shared disk drive 7310 until the rename operation is complete (and the rename operation is not initiated until the read command file has been completely written to the shared disk drive 7310).
In step 7410, after the rename operation is complete, the second client 7306 discovers that a file with a ".CMD" extension is located in the shared disk drive 7310 (i.e., the "DLT.CMD" file). The second client 7306 retrieves the "DLT.CMD" file from the shared disk drive 7310, and executes it. Operation of flowchart 7402 is complete after step 7410 is performed, as indicated by step 7412.
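The write-then-rename handshake of steps 7406 through 7410 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the shared disk drive is modeled as an ordinary directory, and the removal of each command file after it is read is an assumption added so that a command is not executed twice. The file names "DLT.CXX" and "DLT.CMD" follow the description above.

```python
import os

def submit_command(shared_dir, text):
    """First client: write under a temporary name, then rename to publish."""
    tmp_path = os.path.join(shared_dir, "DLT.CXX")
    final_path = os.path.join(shared_dir, "DLT.CMD")
    with open(tmp_path, "w") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())  # make sure the contents reach the disk first
    # The rename is atomic on POSIX systems within a single filesystem, so the
    # scanner sees either no ".CMD" file or a completely written one.
    os.rename(tmp_path, final_path)
    return final_path

def poll_commands(shared_dir):
    """Second client: one scan pass over the directory for ".CMD" files.

    Returns the contents of each command file found. Deleting the file after
    reading it is an assumption of this sketch, not a step described above.
    """
    commands = []
    for name in sorted(os.listdir(shared_dir)):
        if name.endswith(".CMD"):
            path = os.path.join(shared_dir, name)
            with open(path) as f:
                commands.append(f.read())
            os.remove(path)
    return commands
```

Because the scanner ignores every name that does not end in ".CMD", a partially written "DLT.CXX" file is never picked up, and no message needs to pass directly between the two clients.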
Referring now to FIG. 3, an exemplary computer display system for use in accordance with the teachings of the present invention is shown. The computer system includes a display 40, such as a CRT monitor or a liquid crystal display (LCD), and further includes a cursor control device 42, such as a mouse of the type shown in U.S. Pat. No. Re.32,632, a track ball, joy stick, keyboard or other device for selectively positioning a cursor 44 on a display screen 68 of the display 40. Typically, the cursor control device 42 includes a signal generation means, such as a switch 46 having a first position and a second position. For example, the mouse shown and described in U.S. Pat. No. Re.32,632 includes a switch which the user of the computer system uses to generate signals directing the computer to execute certain commands. As illustrated, the cursor control means 42 (hereinafter all types of applicable cursor control devices, such as mice, track balls, joy sticks, graphic tablets, keyboard inputs, and the like, are at times collectively referred to as the "mouse 42") is coupled to a computer 48.
The computer 48 comprises three major components. The first of these is an input/output (I/O) circuit 50 which is used to communicate information in appropriately structured form to and from other portions of the computer 48. In addition, the computer 48 includes a central processing unit (CPU) 52 coupled to the I/O circuit 50 and a memory 55. These elements are those typically found in most general purpose computers, and in fact, computer 48 is intended to be representative of a broad category of data processing devices capable of generating graphic displays.
Also shown in FIG. 3 is a keyboard 56 to input data and commands into the computer 48, as is well known in the art. A mass memory disk 60 is shown coupled to I/O circuit 50 to provide additional storage capability for the computer 48. In addition, a CD ROM 62 and a floppy disk 64 are further coupled to the I/O circuit 50, for providing, as will be described, a library of textual documents and images to be displayed on the display 40. It will be appreciated that additional devices may be coupled to the computer 48 for storing data, such as magnetic tape drives, as well as networks, which are in turn coupled to other data processing systems. A printer 57 is coupled to the I/O circuit 50 for printing documents, images, and the like, as is well known.
In one embodiment, the present invention is a computer program product (such as a floppy disk, compact disk, etc.) comprising a computer readable medium having control logic recorded thereon. The control logic, when loaded into memory 55 and executed by the CPU 52, enables the CPU 52 to perform the operations described herein. Accordingly, such control logic represents a controller, since it controls the CPU 52 during execution.
As illustrated in FIG. 3, the display 40 includes the display screen 68 in which a window 70 is displayed. The window 70 may be in the form of a rectangle or other well known shape, and may include a menu bar 72 disposed horizontally across the length of the window, or in any other desired position on the window. As is well known, the movement of the mouse 42 may be translated by the computer 48 into movement of the cursor 44 on the display screen 68. The reader is referred to literature cited in the background describing object-oriented display systems generally, and in particular, desktop metaphor window-based systems for additional description related to other computer systems which may be utilized in accordance with the teachings of the present invention. The system illustrated in FIG. 3 is intended to represent a general computer display system capable of providing a graphic user interface display.
In this specification, the present invention is described with reference to the display, navigation, and manipulation of United States patent documents. In particular, the invention is described herein as providing a unique method and apparatus for extracting, paginating, displaying, manipulating, navigating and editing the text of issued United States patents, and simultaneously displaying an image of a patent including the drawings on the display 40. Although the description herein describes the invention with reference to patent documents, as has previously been mentioned, it will be appreciated by one skilled in the art that the present invention may be used in a variety of applications which require the simultaneous display, synchronization of, or unsynchronized display of, text and images on a display. For purposes of this specification, all references to "patents" or documents generally, shall be understood to encompass documents of every type, and are not limited solely to patent documents.
In addition, it will be noted that no particular programming language has been disclosed to implement the present invention using the computer display system illustrated in FIG. 3. A variety of programming languages such as C, C++, Visual Basic, etc. may be used to implement the present invention on many different computer display platforms, using the teachings described herein.
TERMINOLOGY AND DEFINITIONS
A "PTO Image File" is an electronically stored data file in the format specified in the document: "U.S. Patent and Trademark Office APS U.S. Patent Image Data File". Each of these files contains one or more image pages from a patent document. Each image page in a PTO Image File is an electronic representation of an actual page of a patent or a related patent document (such as a Certificate of Correction). The image pages are created by the U.S. Patent and Trademark Office by the use of an electronic scanner, and are stored in the PTO Image File in Group 4 compressed format (see Federal Information Processing Standards publication 150: "Facsimile Coding Schemes and Coding Control Functions For Group 4 Facsimile Apparatus"). An enlarged portion of an exemplary image page (the bibliography page of U.S. Pat. No. 5,165,027) is shown in FIG. 4.
A "PTO Text File" is an electronically stored data file in the format specified in the document: "U.S. Patent and Trademark Office Patent Full-Text/APS File". Each of these files contains an ASCII text representation of most of the textual data in a patent document. Generally, the bibliography information and the text paragraphs in the main body of the patent will be found in this file. Some equations and tables of textual information that appear in a patent will also be stored in this type of file. Visual information, such as diagrams and tables containing information of a graphical nature, and formatting information will not be found in the PTO Text File. In addition, the column and line number information that appears on published patents is not stored in the PTO Text File nor is the format of the bibliographical page.
The ASCII data in a PTO Text File is stored in fixed-length eighty character records. The first four characters of each record are an ID code that identifies what type of data the record contains, the fifth character is a blank, and the last seventy-five characters of the record store the actual data values. If the first four characters are all blanks, then the record is a continuation of the previous record.
For example, in the PTO Text File for U.S. Pat. No. 5,165,027 (part of which is illustrated in FIG. 5), there is a record that begins with "TTL" followed by "Microprocessor breakpoint apparatus". This "TTL" record stores the title of the patent which is "Microprocessor breakpoint apparatus". One of these "TTL" records is required in every patent.
Another record, which begins with "ISD" and contains "19921117", shows the issue date of the patent which is Nov. 17, 1992.
In both of the above examples, the amount of data to be stored is seventy-five characters or less and therefore fits in one record. In many instances, there is too much data to fit in one record. Paragraphs of text from the main body of the patent are often split into multiple records because they have more than seventy-five characters. The first record of such a paragraph would start with an identifier ("ID") such as "PAR" (which indicates a paragraph whose first line is indented). The subsequent records used to hold the paragraph would start with an ID of four blanks indicating that these records are continuations of the first record. As many words as will fit in the seventy-five characters (without breaking the words) are stored in each record (see FIG. 5).
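The record layout described above (a four-character ID code, a blank, and seventy-five characters of data, with a blank ID marking a continuation) can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `parse_records` is hypothetical, and joining continuation records with a single space is an assumption based on the statement that words are not broken across records.

```python
def parse_records(records):
    """Group fixed-length records into (id_code, text) fields.

    Each record is treated as 80 characters: columns 1-4 hold the ID
    code, column 5 is a blank, and columns 6-80 hold the data. A record
    whose ID code is all blanks continues the previous field.
    """
    fields = []
    for record in records:
        record = record.rstrip("\n").ljust(80)  # pad short lines to 80
        id_code = record[:4].strip()
        data = record[5:80].rstrip()
        if id_code == "" and fields:
            # Continuation: append to the previous field (the space
            # separator is an assumption, see the lead-in).
            prev_id, prev_text = fields[-1]
            fields[-1] = (prev_id, (prev_text + " " + data).strip())
        else:
            fields.append((id_code, data))
    return fields
```

For example, a "PAR" record followed by two blank-ID records yields a single ("PAR", ...) field holding the whole paragraph.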
The PTO Text File stores data relating to a patent in an informational format using ASCII text rather than a visual display format (see FIG. 5). The Text File is comprised of records that contain labeled pieces of information. This is a very convenient format for processing the information about a patent using a computer (such as performing a text search or navigating the text of the patent).
The PTO Image File stores data relating to a patent in a scanned bitmap display format that is very easy for a human being to work with (see FIG. 4) since it visually appears as the original published patent. The PTO Image File comprises a series of digitized page images that are created by using a page scanning device to capture black and white pictures of typeset patent pages. This is a very convenient format for allowing a human to view the information contained in a patent. For example, the image pages can be printed on a laser printer to produce a readable paper document that visually displays the diagrams, equations and figures of the patent, as it was published by the U.S. Government.
An "Equivalent File" is an electronically stored data file which contains pagination information that details the equivalence relationship between a PTO Text file and a PTO Image File. This relationship makes both the PTO Text file and the PTO Image File more useful by specifying how the record-based ASCII data of the PTO Text file can be manipulated to be substantially equivalent in appearance to the PTO Image File and yet still retain its useful properties as an ASCII file.
"Pagination" is a process by which an Equivalent File is created from a PTO Text File and a PTO Image File. The PTO Image File is read to determine the locations of column breaks, column number, line breaks and line numbers as well as the locations and sizes of imbedded tables, structures, equations, and other non-text information in the specification. Pattern recognition techniques familiar to those skilled in the art are used to block and segment the layout of the image pages.
The PTO Text File is read to determine bibliographic information, figure references, section headings, font style, point size, superscript, subscript, boldness or presence of italicized type, and special characters.
The results of these two operations are then combined either manually, or by the use of Optical Character Recognition techniques to produce the Equivalent File. Each of the PTO Text File paragraphs that begins with a bibliographic information ID code is formatted to approximate the appearance of the Bibliography section on a typeset PTO Bibliography Image Page. Likewise each of the text paragraphs from the PTO Text File in the Specification and Claims sections is processed to produce a text file formatted to approximate the appearance of the Specification or Claims section in typeset PTO Specification and Claims image page(s).
The requirement for pagination of the PTO Text File and the PTO Image File arises from several distinct requirements in the field of use. In citing a patent in, for example, a legal proceeding, the specific reference is made by the column number and line number of the portion of interest. These column and line numbers are printed in the published patent and appear in the format of the page represented by the PTO Image File. However, these column and line numbers do not appear in the PTO Text File, making it difficult to discern a proper citation from the PTO Text File. In use, a user may perform a word search on the PTO Text File to locate a specific term. Once located in the PTO Text File, should the user wish to cite that reference, he or she must refer back to the PTO Image File (or the actual paper patent) to locate the exact column number and line number, without the benefit of any information as to that location.
Another requirement for pagination arises from the practice of placing pure images in line with the text in the columns of the patent. For example, a diagram of a structure followed by the text description of that structure, in the PTO Text File would appear only as text, without the image of the structure. The user must refer back to the PTO Image File (or the paper patent) to locate and study the diagram of the structure, again without any information regarding the physical location of the illustration, diagram, figures or the like, from the data in the PTO Text File.
The specific information about how the typesetting equipment processes the data from the PTO Text File to produce the PTO Image File is not available from the U.S. Government. Therefore, the two files must normally be treated as completely separate entities. (The PTO itself uses the files separately on two computers manufactured by Sun Microsystems, Inc.) The PTO Text File is normally used to search for text but has no information as to where or how the information appears in the typeset patent image pages. The PTO Image File is used to view the typeset text, diagrams, figures, and equations but has no representation of the data stored in a format that can be searched by a computer.
The purpose of the Equivalent File of the present invention is to paginate the PTO Text File so that the data in the Text File can be presented in a paginated patent-like format, thus facilitating searching in, and direct citation from the text, a function heretofore not available using the PTO Text Files. The pagination process formats the PTO Text File with correct column breaks, column numbers, line breaks, and line numbers, thus allowing direct citation, along with the benefits of pure text searching. The information contained in the Equivalent File can be used in both a familiar visual format by a human being and automatically by the computer at the same time.
A "synchronized" display is a method of navigating an Equivalent File and the corresponding Image File in a way that a user can view a column in the Equivalent file and the same column in the Image File simultaneously. For example, when the user views column 3 of the Equivalent File in a window, he can simultaneously view column 3 of the Image File in another window. Thus, the user can view two files, an Equivalent File and an Image File, in a synchronized manner.
An "unsynchronized" display is a method of displaying one portion of an Equivalent File and another portion of an Image File asynchronously. For example, assume there is a sentence in column 2 of the Equivalent File stating "referring to FIG. 5, the system illustrates . . . " If a user selects the sentence in the Equivalent File, the Image File will display the first page which contains FIG. 5. Thus, the Equivalent File and the Image File do not refer to the same column, but they refer to the related matters. Another example of an unsynchronized display is displaying one portion of the Equivalent File while displaying a completely unrelated drawing, an unrelated table, or a different text portion of the Image File of the same patent or an Image File of another patent. Accordingly, in an unsynchronized display, there may be no relationship or linkage between the Equivalent File and the Image File displayed simultaneously.
The underlying structure of the information stored in the Equivalent File may be stored in many forms. It may be stored in a binary structure format for fast access by a language that implements structure operations such as the C programming language. Another alternative is to store some of the underlying structural information about the text in a generalized markup language such as SGML (Standard Generalized Markup Language) and store the raw positional information in a binary structure format. There are many alternatives having their own impact on capabilities, speed, and ease of use of the present invention. The reader may therefore implement the present invention in the particular programming language which best accommodates the reader's system requirements. As previously described, the present invention may be implemented using a variety of computer systems, including the system shown in FIG. 3.
The SGML may be used in a variety of applications. The SGML may be used to write a patent application that is equivalent in appearance to a published patent. The SGML may be also utilized to create a compound document that contains both the Equivalent File and bit scanned images of tables, flow charts, equations and the like.
Equivalent Files are associated with at least the following types of synchronization information:
1. Column
The positions within the PTO Text File of the first character of each patent text column as those columns are displayed in the PTO Image File. This permits the present invention to determine which ASCII text is displayed in each column of the main body of the patent.
2. Line
The positions within the PTO Text File of the first character of each line of text as those lines are displayed in the PTO Image File. This permits the present invention to determine which ASCII text is displayed in each line of each column of the main body of the patent.
3. Column Line Number
The approximate line number in the patent column that each line of text in the PTO Text File is adjacent to, permitting the present invention to determine the approximate vertical positions of the ASCII text lines displayed in each column of the main body of the patent.
4. Bibliographic formatting
The approximate arrangement of the bibliographic data from the PTO Text File as it appears on the bibliographic page images in the PTO Image File.
5. Graphic Item Locations
The locations in the PTO Image File of the various figures, figure elements, equations, non-text tables, structures and diagrams referred to in the PTO Text File.
6. Sections
The positions within the PTO Text File of the various logical sections of the document (e.g., background of the invention, brief description of the drawings, the claims section, etc.) as they are displayed in the PTO Image File.
7. Font
The font style in which the various ASCII characters in the PTO Text File are displayed in the PTO Image File.
8. Point Size
The font size in which the various ASCII characters in the PTO Text File are displayed in the PTO Image File.
9. Superscript or Subscript
Whether the various ASCII characters in the PTO Text File are displayed as superscripts or subscripts in the PTO Image File.
10. Boldness
The degree of boldness of the font style in which the various ASCII characters in the PTO Text File are displayed in the PTO Image File.
11. Italics
The degree of italicness of the font style in which the various ASCII characters in the PTO Text File are displayed in the PTO Image File.
12. Special Characters
Some of the ASCII characters in the PTO Text File are displayed in the PTO Image File as special characters. Typically a group of characters in the PTO Text File (e.g., ".OMEGA.") will map to one special character in the PTO Image File (e.g., "Ω"). This is due to the ASCII standard not defining many special characters that are useful.
As an example of the "Line" information listed above, refer to the paragraph of text from the main body of U.S. Pat. No. 5,165,027 that begins with "Numerous techniques are used . . . ". FIG. 6 shows how the ASCII characters for this paragraph are stored in the PTO Text File. The same paragraph is displayed in the PTO Image File for U.S. Pat. No. 5,165,027 in FIG. 7.
It should be noted that the paragraph in the PTO Text File (see FIG. 6) is 5 lines long, and that the same paragraph displayed in the PTO Image File (see FIG. 7) is 7 lines long. In addition, no words are broken across lines in the PTO Text File. Words at the ends of lines displayed in the PTO Image File may be split so that part of the word appears at the end of one line, followed by a hyphen, and the rest of the word appears on the next line (e.g., "perfor-mance").
The Equivalent File is associated with line numbers to identify which of the ASCII characters in the PTO Text File fall in which lines displayed in the PTO Image File. For example, the Equivalent File would store the lines of the paragraph in the PTO Image File (see FIG. 7) beginning with the following characters in the PTO Text File:
Line 1: The "N" in "Numerous".
Line 2: The "m" in the middle of "performance".
Line 3: The "d" in "development".
Line 4: The "T" in "The".
Line 5: The "p" in "part".
Line 6: The "s" in "some".
Line 7: The "t" in "that".
As an example of the "Column" information listed above, refer to the first page image of the specification of U.S. Pat. No. 5,165,027, shown in FIG. 8. As illustrated, the first column of the patent begins with the "M" in "MICROPROCESSOR". The second column of the patent begins with the "d" in "data". These positions are stored in the Equivalent File in order to identify which of the ASCII data in the PTO Text File falls within which columns.
FIG. 8 also shows an example of the "Column Line Number" information listed above. The column of numbers that runs down the middle of the page indicates what line numbers within the patent text columns each of the lines of text falls on. For column 1, shown in FIG. 8, the line that contains "This application is a continuation of application Ser." is line 4 of the column. In column 2, shown in FIG. 8, the line that shows "address at which a breakpoint is to occur. A second" is line 8 of the column. This information is associated with the Equivalent File to identify the approximate vertical position on the page image where a given line of text appears.
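The "Column", "Line", and "Column Line Number" information can be combined to produce a column-and-line citation for any character position in the PTO Text File. The following is a minimal sketch, assuming the start positions are kept as sorted lists of character offsets; the function name `cite` and the list layout are hypothetical and are not taken from the Equivalent File format itself.

```python
from bisect import bisect_right


def cite(offset, column_starts, line_starts, column_line_numbers):
    """Return (column number, approximate line number) for a text offset.

    column_starts       -- sorted offsets of the first character of each column
    line_starts         -- sorted offsets of the first character of each line
    column_line_numbers -- approximate printed line number for each entry
                           of line_starts (same length as line_starts)
    """
    col_index = bisect_right(column_starts, offset) - 1
    line_index = bisect_right(line_starts, offset) - 1
    if col_index < 0 or line_index < 0:
        raise ValueError("offset precedes the first recorded column or line")
    return col_index + 1, column_line_numbers[line_index]
```

A word search that finds a term at a given offset could then report the matching column and line directly, without the user referring back to the page image.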
As an example of "Bibliographic formatting" information, reference is made to FIGS. 4 and 5. Notice that the title record which starts with "TTL" has its data displayed in bold below the words "United States Patent", the inventor's name, and a horizontal ruler line. Each piece of bibliographic information is stored as a column of text in the Equivalent File.
Paginating the bibliography data from the PTO Text File to the formatting on the bibliography pages in the PTO Image Files also involves adding text labels to the Equivalent File. For example, the characters "United States Patent [19]" that appear at the top of every bibliography page are not found anywhere in the PTO Text File. These words appear at the top of every patent, so their presence in the PTO Text File is unnecessary. However, in order to create an Equivalent File that is similar in appearance to the PTO Image File, these words must be specified in the Equivalent File. The pagination algorithm is designed to add these text labels when they are needed.
EXTRACTION
The extraction process of the present invention is illustrated in block diagram form in FIG. 9. The PTO provides the PTO Text File and PTO Image Files on IBM® 3480 magnetic tapes. The extraction process identifies the particular IBM® 3480 tape that a specific PTO Text File or a PTO Image File is located on, extracts those files from the tape(s) and converts them for use by the processing system which synchronizes and indexes the files.
PTO Text Tapes are issued by the PTO on specific calendar dates and contain a unique Volume Serial Number (VSN). All patents issued on a certain date should be present in the tape(s) issued on that date. The tapes do not contain an index. Therefore, extracting a specific PTO Text File requires that the entire 200 MB IBM® 3480 tape be read into a magnetic disk buffer and stripped of header blocks, tape marks, labels, etc., and then parsed to create a Volume Table of Contents (VTOC). The VTOC contains the document number, byte count offset from the beginning of the tape, and the length of the document file in bytes. A separate program may then be used to index the beginning byte of the file and copy the file segment to another file, which then becomes the PTO Text File for the specific patent. It is possible for a PTO Text File to span multiple PTO Text Tapes. When this happens, a procedure is utilized by the present invention to concatenate the multiple file segments together. The VTOC created from the magnetic disk buffer is used to update a Relational Database System (RDB) for future reference, and the buffer is then erased.
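Once a VTOC exists, copying a document out of the disk buffer reduces to slicing by offset and length. The sketch below is illustrative only: the `VtocEntry` type and the function names are hypothetical, and the way the tape is parsed to build the VTOC is not shown because the text does not specify it.

```python
from typing import NamedTuple


class VtocEntry(NamedTuple):
    """One VTOC row: document number, byte offset, and length in bytes."""
    document_number: str
    offset: int
    length: int


def extract_segment(buffer, entry):
    """Copy one document segment out of a tape buffer."""
    end = entry.offset + entry.length
    if entry.offset < 0 or end > len(buffer):
        raise ValueError("VTOC entry lies outside the buffer")
    return buffer[entry.offset:end]


def extract_document(segments):
    """Concatenate the segments of a document that spans several tapes.

    segments -- iterable of (buffer, VtocEntry) pairs in tape order.
    """
    return b"".join(extract_segment(buf, entry) for buf, entry in segments)
```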
PTO Text Files are stored on the magnetic tapes in uncompressed format. PTO Image Files are stored on the magnetic tapes preferably in compressed format, preferably in Group 4 2D (two dimensional) fax format.
According to the present invention, PTO Text files are processed in uncompressed format. However, PTO Image files are processed at least in part in compressed format. The processing of Image Files according to the present invention is generally depicted in FIG. 75.
An example 2D compressed Image is shown in FIG. 75 as block 7506. According to the present invention, the 2D compressed Image 7506 is converted to a 1D compressed Image 7508. Many functions performed by the present invention involve processing this 1D compressed Image 7508. (With some operations, such as with zooming and with pagination, the 1D compressed image 7508 is decompressed to an uncompressed format, as represented by item 7510 in FIG. 75. Typically, this uncompressed Image file contains 2320 bits by 3408 bits. Zooming and pagination are discussed below.) In contrast, conventionally such functions are performed by solely processing uncompressed images.
The structure of a 1D compressed Image shall now be described with reference to FIGS. 76A and 76B. FIG. 76A illustrates a representation of an example uncompressed Image 7602. A typical line 7604 in this uncompressed Image 7602 is shown. This line 7604 includes a number of black spaces (each black space representing a logical 1 bit) and a number of white spaces (each white space representing a logical 0 bit).
A 1D compressed Image 7606 corresponding to the uncompressed Image 7602 is shown in FIG. 76B. This 1D compressed Image 7606 includes a line 7608 (called the compressed line 7608) corresponding to the line 7604 (called the uncompressed line 7604) in the uncompressed Image 7602. The compressed line 7608 represents the uncompressed line 7604 by quantifying the number of black and white spaces in the uncompressed line 7604, while preserving the sequence of such black and white spaces. Thus, as indicated in the compressed line 7608, the uncompressed line 7604 contains 128 black spaces, followed by 64 white spaces, followed by 8 black spaces, followed by 64 white spaces, followed by 102 black spaces, followed by 90 white spaces.
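The relationship between an uncompressed line and its compressed counterpart can be sketched as a simple run-length encoding. This is an illustration of the idea only and does not reproduce the Group 4 facsimile coding itself. The function names are hypothetical, and the convention that every line starts with a (possibly zero-length) dark run is an assumption chosen to match the example above.

```python
from itertools import groupby


def encode_line(bits):
    """Return run lengths that alternate dark (1), white (0), dark, ...

    If the line starts with a white space, a zero-length dark run is
    emitted first so that the alternation always begins with dark.
    """
    runs = [len(list(group)) for _, group in groupby(bits)]
    if bits and bits[0] == 0:
        runs.insert(0, 0)
    return runs


def decode_line(runs):
    """Rebuild the uncompressed line from alternating run lengths."""
    bits = []
    value = 1  # the first run is dark by convention
    for length in runs:
        bits.extend([value] * length)
        value = 1 - value
    return bits
```

Encoding the uncompressed line 7604 described above yields the six run lengths 128, 64, 8, 64, 102, and 90.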
Procedures for converting between uncompressed Images, 2D compressed Images, and 1D compressed Images will be apparent to persons skilled in the relevant art. Such procedures are discussed in many publicly available documents, such as Federal Information Processing Standards Publication No. 150, entitled "Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus, " Nov. 4, 1988, incorporated herein by reference in its entirety.
INITIAL AUTOMATIC PAGINATION
The initial automatic pagination process is illustrated in flow chart form in FIG. 10. The automatic pagination process utilizes the PTO Text File and creates an Equivalent File that is an initial approximation of the formatting of the original published patent.
The steps of initial pagination of the present invention are as follows:
1. Read the PTO Text File into memory of a computer system (for example, a computer system of the type shown in FIG. 2 may be utilized).
2. Assign each of the ASCII data records that begins with a bibliographic information ID code, an approximate location on the corresponding image page of the PTO Image File at which its data should be displayed. See the document "U.S. Patent and Trademark Office Patent Full-Text/APS File" for a listing of all the bibliographic data record ID codes. Also, see the document "Patents and Trademarks Style Manual" for a specification of how bibliography information is formatted on bibliography pages.
3. Process each of the paragraphs of the main body of the patent. Build a list of the locations of the Logical Groups that are found (see the document "U.S. Patent and Trademark Office Patent Full-Text/APS File" for a listing of the Logical Groups that can appear in the main body of the patent, i.e. "GOVT", "PARN", "BSUM", "DRWD", "DETD", "CLMS", "DCLM").
4. Save the pagination information to disk in an Equivalent File.
In steps 2 and 3 above, the paragraph formatting procedure is performed whenever there is a data value that might span more than one line in the corresponding page image in the PTO Image File. In addition, the autopagination technique may be utilized on compressed data.
The manner in which the autopagination technique may be utilized on compressed data shall now be described. As discussed above, the present invention generates a 1D compressed image file from the 2D compressed image file provided by the PTO. According to an embodiment of the present invention, pagination is performed using the uncompressed PTO text file and the 1D compressed image file. This embodiment is described below with reference to a flowchart 7802 shown in FIG. 78. Flowchart 7802 begins with step 7804, where control immediately passes to step 7806.
In step 7806, clumps are identified in the 1D compressed image file. A clump is a group of dark spaces (each "dark space" representing a logical "1" value) that are adjacent to one another either vertically (between lines) and/or horizontally (within lines) and/or diagonally. In an alternate embodiment, a clump can represent a group of white spaces. The operation performed in step 7806 is called "segmentation". Conventionally, segmentation is not performed using compressed data images. Instead, segmentation is conventionally performed using uncompressed data images. According to such conventional procedures, it is necessary to search an uncompressed data image in the vertical, horizontal, and diagonal directions. However, since the present invention uses 1D compressed images, it is only necessary to search in the vertical and diagonal directions (this is assuming that clumping is done vertically, horizontally, and diagonally; if clumping is only done horizontally and vertically, then the present invention searches only vertically). This is the case since 1D compressed images are already clumped in the horizontal direction (this is apparent from FIG. 76B). Thus, the use by the present invention of 1D compressed images significantly decreases the processing time to perform segmentation.
Preferably, the present invention in step 7806 searches for dark spaces in adjacent rows which vertically overlap. Consider an example 1D compressed image 8502 shown in FIG. 85A, where two rows 8504 and 8506 are shown. Row 8504 has 2 dark spaces, followed by 3 white spaces, followed by 2 dark spaces, followed by 1 white space. Row 8506 has 3 dark spaces, followed by 1 white space, followed by 2 dark spaces, followed by 2 white spaces. The present invention generates a table 8508 shown in FIG. 85B from rows 8504 and 8506. Table 8508 contains information that denotes the boundaries between groups of white and dark spaces in rows 8504 and 8506. Table 8508 contains an entry for each row in the compressed image 8502, such as entries 8510 and 8512 that correspond to rows 8504 and 8506, respectively. Entry 8510 is derived by adding each value in row 8504 to the preceding value or sum. Thus, the "5" in entry 8510 is derived by adding "3" plus "2" from row 8504. The "7" in entry 8510 is derived by adding "2" from row 8504 plus "5" (i.e., the prior sum). Each entry in table 8508 is generated in the same way.
Once table 8508 is generated, clumps are identified by analyzing the dark space boundary information contained in entries 8510 and 8512. For example, the dark space boundary information contained in entry 8510 indicates that row 8504 has dark spaces in bit positions 1-2 and 5-7. The dark space boundary information contained in entry 8512 indicates that row 8506 has dark spaces in bit positions 1-3 and 4-6. Bit positions 1-2 vertically overlap bit positions 1-3. Thus, these dark spaces in rows 8504 and 8506 represent at least a part of a clump. Also, bit positions 5-7 vertically overlap bit positions 4-6. Thus, these dark spaces in rows 8504 and 8506 represent at least a part of another clump. This analysis is performed for all of the entries in the table 8508. Note that it was possible to identify these clumps based on the dark space boundary information contained in table 8508.
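The construction of the boundary table and the vertical-overlap test can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of step 7806 itself. The function names are hypothetical, each row is assumed to start with a dark run, and dark runs are expressed as zero-based half-open intervals, so the values do not read exactly like the bit positions quoted above. Diagonal adjacency and the merging of overlapping runs into complete clumps across many rows are not shown.

```python
from itertools import accumulate


def boundaries(row):
    """Running sums of the run lengths, i.e. one entry of the table."""
    return list(accumulate(row))


def dark_intervals(row):
    """Half-open (start, end) intervals covered by the dark runs of a row.

    Runs are assumed to alternate dark, white, dark, ... so the dark
    runs are the ones at even indexes.
    """
    ends = boundaries(row)
    starts = [0] + ends[:-1]
    return [(s, e) for i, (s, e) in enumerate(zip(starts, ends))
            if i % 2 == 0 and e > s]


def overlapping_runs(upper_row, lower_row):
    """Pairs of dark runs in adjacent rows that overlap vertically."""
    return [(a, b)
            for a in dark_intervals(upper_row)
            for b in dark_intervals(lower_row)
            if max(a[0], b[0]) < min(a[1], b[1])]
```

For rows 8504 and 8506 as described, the boundary entries are 2, 5, 7, 8 and 3, 4, 6, 8, and two overlapping pairs of dark runs are found, corresponding to the two clumps identified in the text.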
Each of the clumps identified in step 7806 may represent a character. In step 7808, the clumps are compared to character templates. The character templates are bit patterns corresponding to characters, such as alphanumeric characters, punctuation characters, graphical characters, etc. Thus, in step 7808, the clumps are compared to character templates for the purpose of recognizing the clumps as characters.
The operation performed in step 7808 is called "template matching." Preferably, template matching is performed by finding the center of gravity of the clump being processed (each clump is processed, i.e., matched, in turn) and the center of gravity of each template (the centers of gravity of the templates are preferably calculated in advance). The center of gravity is defined as the (x,y) location where the x coordinate is equal to the average of all of the x coordinates in the dark spaces of the clump (the terms "spaces" and "pixels" are used interchangeably herein) and the y coordinate is equal to the average of all of the y coordinates in the dark spaces of the clump. Then, the clump is aligned with a template (each template is processed in turn) such that the centers of gravity of the clump and the template coincide. The number of pixels in the template and the clump having the same value is then determined by maintaining a sum. Consider, for example, the pixels at the center of gravity of the clump and the template. If they are both equal to 1, or are both equal to 0, then the sum is incremented by 1. Otherwise, the sum is not incremented. This comparison operation is performed for each pixel in the clump and the template. Then, the sum is divided by the total number of pixels in the smallest rectangle enclosing both the clump and the template. If this resulting quotient (also called score) is above a predetermined threshold, then the clump is said to match the template and is recognized as that character represented by the template. Preferably, this predetermined threshold is approximately 90%, although other values could alternatively be used, and could vary from template to template. The above analysis is performed for each template until the clump is recognized. It should be noted that not all clumps are recognized.
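The scoring just described can be sketched as shown below. This is an illustration under stated assumptions, not a definitive implementation: bitmaps are lists of rows of 0/1 values, each bitmap contains at least one dark pixel, alignment is rounded to whole pixels, the enclosing rectangle is taken as the bounding box of the dark pixels of both aligned bitmaps, and pixels outside a bitmap are treated as white. All function names are hypothetical.

```python
def dark_pixels(bitmap):
    """Return the set of (x, y) coordinates of the dark (1) pixels."""
    return {(x, y) for y, row in enumerate(bitmap) for x, v in enumerate(row) if v}


def center_of_gravity(pixels):
    """Average x and average y of a non-empty set of dark pixels."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def match_score(clump, template):
    """Fraction of pixels in the enclosing rectangle on which both bitmaps agree."""
    c = dark_pixels(clump)
    t = dark_pixels(template)
    cx, cy = center_of_gravity(c)
    tx, ty = center_of_gravity(t)
    # Shift the template so the centers of gravity coincide (whole pixels only).
    dx, dy = round(cx - tx), round(cy - ty)
    t = {(x + dx, y + dy) for x, y in t}
    both = c | t
    xs = [x for x, _ in both]
    ys = [y for _, y in both]
    area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
    # Pixels agree everywhere except where exactly one bitmap is dark.
    return (area - len(c ^ t)) / area


def recognize(clump, templates, threshold=0.9):
    """Return the first character whose template meets the threshold, else None."""
    for character, template in templates.items():
        if match_score(clump, template) >= threshold:
            return character
    return None


square = [[1, 1], [1, 1]]
corner = [[1, 1], [1, 0]]
print(match_score(square, square))  # 1.0
print(match_score(square, corner))  # 0.75
```

A return value of None from recognize corresponds to the case where a clump is not recognized.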
In one embodiment, the character templates have been previously compressed such that they are 1D compressed character templates. Such 1D compressed character templates are compared to the clumps in step 7808. Alternatively, the character templates are not compressed. Instead, the clumps are decompressed, and are then compared to the uncompressed character templates in step 7808.
In step 7809, page parsing is performed. With respect to patent documents, the present invention first locates column numbers (appearing at the top of columns in patents) in the processed image file. This is done by looking for clumps which have been recognized in the previous step as large-sized numbers in the processed image file. Then, the present invention locates the patent number which appears in large-sized numbers at the top of each page in a patent. As will be appreciated, PTO Image files include a series of line numbers (i.e., 5, 10, 15, 20, etc.) between the left and right columns on each page of text. The present invention searches for these line number sequences to identify these columns. The present invention uses this information to identify which clumps are in which columns. These clumps are assigned sequential position numbers, preferably starting from 1. Similarly, the characters in the PTO text file are assigned sequential position numbers, preferably starting from 1. As described below, these position numbers are used to compare the processed image file with the PTO text file for matching purposes.
In step 7810, lines of characters (such characters having been recognized in step 7808) are identified. Step 7810 may be performed using any well known line recognition technique. One such line recognition technique operates by processing each character in turn. If the center of a character is between the top and bottom of the previous character, then the two characters are considered to be in the same line. For reference purposes, the lines of characters recognized by the above-described operations are called the processed image file.
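The line recognition technique mentioned above can be sketched as follows. The sketch assumes that the recognized characters arrive in reading order and that each carries the top and bottom coordinates of its bounding box, with y increasing downward. The Char class and the group_lines function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Char:
    glyph: str
    top: int     # y coordinate of the top edge (y increases downward)
    bottom: int  # y coordinate of the bottom edge


def group_lines(chars):
    """Group characters into lines by comparing each with the previous one."""
    lines = []
    prev = None
    for ch in chars:
        center = (ch.top + ch.bottom) / 2
        # Same line if the vertical center falls within the previous character.
        if prev is not None and prev.top <= center <= prev.bottom:
            lines[-1].append(ch)
        else:
            lines.append([ch])
        prev = ch
    return lines


chars = [Char("a", 0, 10), Char("b", 2, 9), Char("c", 20, 30), Char("d", 21, 29)]
print(["".join(c.glyph for c in line) for line in group_lines(chars)])
# ['ab', 'cd']
```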
In step 7812, the present invention matches the PTO text file to the processed image file. The purpose of this matching operation is to identify the ends of lines, columns, and pages in the processed image file, and to then reflect such ends of lines, columns, and pages in the PTO text file to thereby generate the Equivalent File. The Equivalent File is synchronized to the Image File on a line/column/page basis.
For example, suppose the PTO text file includes the following sentence: "The present invention includes a computer platform." In step 7812, the present invention matches each word in this sentence to words in the processed image file. Suppose that the word "computer" in this sentence is presently being analyzed. The word "computer" from the PTO text file is matched with an identical word in the processed image file. If this word is at the end of a line in the processed image file, then the present invention reflects this end-of-line information in the Equivalent File. Similarly, if this word is at the end of a column or the end of a page in the processed image file, then the present invention reflects this end-of-column/end-of-page information in the Equivalent File.
In one embodiment, step 7812 is performed as follows. First, unique pairs of adjacent characters (not counting spaces) are identified in the PTO text file. These character pairs may include overlapping characters in words. Second, a look up table having an entry for each character pair is created. The positions in the PTO text file where the character pairs are located are stored in the respective entries of the table. Third, unique pairs of adjacent characters (in the horizontal direction) in the processed image file are identified. These character pairs from the processed image file, which are also called anchor pairs, may include overlapping characters in words.
Processing then continues to map the anchor pairs to the characters in the PTO text file. An anchor pair table is created having an entry for each anchor pair. These entries include the position information from the lookup table associated with the PTO text file for the anchor pairs. Then, positions from this anchor pair table corresponding to impossible sequences of characters are eliminated.
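The lookup table and the initial anchor pair table can be sketched as shown below. The sketch assumes that spaces are dropped before pairing (so that a pair may span a word boundary) and that positions are counted from 1 over the remaining characters. The sample text and the function names are hypothetical and are not taken from FIG. 86.

```python
from collections import defaultdict


def pair_lookup_table(text):
    """Map each adjacent character pair to its 1-based positions in the text."""
    chars = [c for c in text if not c.isspace()]
    table = defaultdict(list)
    for pos in range(len(chars) - 1):
        table[chars[pos] + chars[pos + 1]].append(pos + 1)
    return dict(table)


def anchor_pair_table(anchor_pairs, lookup):
    """Copy the candidate text positions for each anchor pair found in the image."""
    return {pair: list(lookup.get(pair, [])) for pair in anchor_pairs}


lookup = pair_lookup_table("The the")
print(lookup)  # {'Th': [1], 'he': [2, 5], 'et': [3], 'th': [4]}
print(anchor_pair_table(["Th", "he"], lookup))  # {'Th': [1], 'he': [2, 5]}
```

Entries with more than one position, such as "he" here, are the ones that the elimination of impossible positions is intended to narrow down.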
For example, a portion of an example PTO text file 8608 is shown in FIG. 86. A processed image file 8606 corresponding to this PTO text file 8608 is also shown. Only the clumps identified in step 7808 are shown in FIG. 86. The lookup table for the PTO text file 8608 is shown as item 8610. Item 8612 represents the anchor pair table before positional information is deleted. Such positional information is deleted as follows. The first anchor pair, in this case "Th", is selected. The left-most position (in this case, the only position) of this anchor pair is position 1. The other anchor pairs are then evaluated with respect to this anchor pair "Th" to determine whether their positions (in the PTO text file) can possibly correspond to the anchor pairs. First, the anchor pair "he" is evaluated. This anchor pair occurs at positions 2, 5, and 14 in the PTO text file 8608. The anchor pair "he" can occur only at position 2 with respect to the anchor pair "Th", since it is known that "he" is in the same word as "Th" (this information is known since "Th" is very close to "he" in the processed image file). Accordingly, positions 5, 14, and 25 are deleted. Searching is performed in both a forward and a backward direction. Consider the case where the anchor pair "ab" is selected. The anchor pair "th" can occur only at positions 4 and 13 with respect to anchor pair "ab", since anchor pair "ab" appears after anchor pair "th" in the processed image file 8606 (at least with respect to the clumps identified in the processed image file 8606). Each anchor pair is selected, and then the other anchor pairs are processed with respect to the selected anchor pair, in both a backward and forward direction. After as many of the positions as possible have been deleted from the anchor pair table 8612, it is possible to match the processed image file 8606 to the PTO text file 8608 to identify end-of-lines in the PTO text file 8608.
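The forward and backward elimination can be sketched in a simplified form. This sketch is an assumption-laden illustration and not the procedure of FIG. 86: it uses only the reading order of the anchor pairs in the processed image file and requires candidate text positions to increase in that order. It does not model the same-word proximity test described above. The function name and the sample data are hypothetical.

```python
def prune_positions(anchors):
    """Delete candidate positions that contradict the image reading order.

    anchors: list of (pair, candidate_positions) tuples in image reading order.
    """
    pruned = [(pair, sorted(positions)) for pair, positions in anchors]

    # Forward pass: a pair cannot precede the earliest position of the pair before it.
    lower = 0
    for i, (pair, positions) in enumerate(pruned):
        positions = [p for p in positions if p > lower]
        pruned[i] = (pair, positions)
        if positions:
            lower = positions[0]

    # Backward pass: a pair cannot follow the latest position of the pair after it.
    upper = float("inf")
    for i in range(len(pruned) - 1, -1, -1):
        pair, positions = pruned[i]
        positions = [p for p in positions if p < upper]
        pruned[i] = (pair, positions)
        if positions:
            upper = positions[-1]

    return pruned


anchors = [("Th", [1]), ("he", [2, 5, 14]), ("th", [4, 13]), ("ab", [10])]
print(prune_positions(anchors))
# [('Th', [1]), ('he', [2]), ('th', [4]), ('ab', [10])]
```

Once each anchor pair is reduced to a single position, the line endings observed in the processed image file can be transferred to the corresponding positions in the text.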
Flowchart 7802 is complete after step 7812 is performed, as indicated by step 7814.
The automated pagination feature of the present invention as described above results in an Equivalent File that is synchronized on a line, column, and page basis. In alternate embodiments, the text file is automatically paginated such that the Equivalent File is only synchronized on a page basis, a column basis, a line basis, or any combination of the above.
PAGINATION CORRECTION TOOL
The pagination correction tool allows a human to check and correct the results of the initial automatic pagination process. A computer system of the type illustrated in FIG. 3 may also be used. This tool is a software program with a graphical user interface that provides the following capabilities, and completes the following steps:
Open and read into memory a PTO Text File;
Open and read into memory a previously edited Equivalent File on media;
Use a cursor control device (for example, mouse 42) to mark or unmark characters that begin a patent column;
Use a cursor control device (for example, mouse 42) to mark or unmark characters that begin lines within a patent column;
Add or remove blank lines to set the appropriate vertical line spacing so that the lines of text fall on the same line numbers of the patent text column they are in as shown in the PTO Image File;
Use a cursor control device to mark or unmark paragraphs as being section titles;
Indicate which figures are on which drawing sheets.
As is typical in computer programs, the specified tasks listed above do not need to be performed in any particular order except that the file must be opened at the beginning and closed (and usually saved) at the end.
In an alternate embodiment, automatic pagination is not performed. Instead, pagination is performed entirely manually by using the pagination correction tool. This embodiment is particularly useful when it is only necessary to synchronize on a page basis, or a column basis, for example. For reference purposes, such page basis synchronization, column basis synchronization, etc., are collectively called synchronization levels.
In other embodiments, the operator is provided with an option of automatic pagination or manual pagination, or any combination of the two. This embodiment is represented by a flowchart 7702 shown in FIG. 77. In step 7710, the operator may select automatic pagination or manual pagination. If the operator selects automatic pagination, then step 7712 is performed, wherein pagination is performed automatically, as discussed above. After step 7712 is performed, or if the operator did not select automatic pagination in step 7710, then step 7714 is performed. In step 7714, the operator uses the pagination correction tool to perform manual pagination.
INDEXING
A B+-tree inverted index of words is generated for a group of one or more Equivalent Files to greatly speed the process of searching the text of the files. These indexes are built from all the words in the PTO Text File. The index generator ignores end-of-line hyphens when building indexes but does not ignore hyphens in the middle of lines.
The present invention utilizes the following build/search index technique: when indices are built, all punctuation marks in a text file are stripped, and the resulting alphanumeric words are entered individually into an index database. The word position in the text file is also stored. For example, a string such as "[Ax,Bx,Cx]" is converted to three separate words, "Ax", "Bx" and "Cx", and the individual words are entered into the index database as three separate items.
When a user enters a search string such as "[Ax,Bx,Cx]", the string is converted into the following tokens: "Ax", "Bx" and "Cx". The tokens are searched using the text conversion technique described above, and the resulting search produces three lists of search matches. These lists are processed and filtered for all occurrences in which the occurrence of "Ax" is immediately followed by the occurrence of "Bx", which in turn is immediately followed by the occurrence of "Cx". This technique allows words originating from the source text to be searched directly, without the need to store a large number of punctuation mark locations.
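The build/search index technique can be sketched as follows. For brevity the sketch uses an in-memory dictionary in place of a B+-tree, treats any run of ASCII letters and digits as a word, and stores word positions counted from 0. These choices, and the function names, are assumptions made for illustration only.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Strip punctuation and return the remaining alphanumeric words in order."""
    return re.findall(r"[A-Za-z0-9]+", text)


def build_index(text):
    """Map each word to the list of word positions at which it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(tokenize(text)):
        index[word].append(position)
    return index


def search(index, query):
    """Return the start positions where the query tokens occur consecutively."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = []
    for start in index.get(tokens[0], []):
        # Keep only matches where each token immediately follows the previous one.
        if all(start + k in index.get(tok, []) for k, tok in enumerate(tokens)):
            hits.append(start)
    return hits


index = build_index("The set [Ax,Bx,Cx] and Ax Cx")
print(search(index, "[Ax,Bx,Cx]"))  # [2]
```

The second occurrence of "Ax" is rejected because it is not immediately followed by "Bx", which is the filtering step applied to the three lists of matches.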
USER INTERFACE
The graphic user interface of the present invention is comprised in part of a computer program which is stored in either mass memory 60, CD-ROM 62, or floppy disk 64, of the system illustrated in FIG. 3. Appropriate programming code is loaded into memory 55 by the I/O circuit 50 and executed by the CPU 52. It will be appreciated that the computer program of the present invention may also be stored in random access memory (RAM), or in other machine readable form and media. The graphic user interface displays the Equivalent File and the PTO Image File, described above in previous sections, and provides a variety of viewing and editing options.
Referring now to FIG. 11, the display screen 68 is shown in detail. Illustrated within the display 68 is a title bar 100 for identifying the title of the program in which the user interface of the present invention is utilized. In the example of FIG. 11, the title of the program is PatentWorks Workbench™; however, depending on the nature of the program in which the present invention is used, the title may change in accordance with the particular application. In addition, a menu bar 102 is provided which includes a plurality of command options such as "Case", "Edit", "Patent", "Note", "Library", "View", "Window", and "Help". Additionally, other context specific command options may be displayed depending on the specific application in which the present invention is used.
As illustrated in FIG. 11, a tool bar 103 is displayed immediately below the menu bar 102. The tool bar 103 comprises the primary source of options and selection items which a user of the present invention will commonly access. As will be described more fully below, the tool bar of the present invention includes a briefcase icon 106, a direction button 107 for dropping a list of available cases, a light bulb icon 108 for designating a patent to a case, and a direction button 109 for obtaining a list of all patents or other documents which may be displayed in an Equivalent File format from a case. Additionally, a library icon 110 is provided on the tool bar 103, the selection of which provides a listing of all the patents available in the patent library. A magnifying glass icon 112 is displayed for selecting a search box to appear on the display screen. A target icon 113 is provided for identifying search results. Other icons displayed along the tool bar 103 include a printer icon 115 for printing documents. A case note icon 125 for displaying case notes is also provided on the tool bar 103. A patent note icon 126 and a direction button 127 are also provided for reviewing and accessing patent notes.
The specific functions and operations of these various icons and command options displayed on the menu bar 102 and tool bar 103 will be described more fully below. It will be noted that all tool bar icons or button functions have keyboard equivalents designed to allow the user to perform the functions of the icons and button functions without using the cursor control device. All of the functions of the tool bar 103 are also displayed in the menu bar drop down menus. Additionally, as shown in FIG. 11