United States Patent5752058
Van De VanterMay 12, 1998

Title

System and method for inter-token whitespace representation and textual editing behavior in a program editor

Abstract

A user-friendly editor for structurally represented computer programs is disclosed. The present editor combines advantages of text editors and structure editors by transforming, as the user types, the text stream entered by the user into a token stream, where the tokens of the token stream constitute the words of the program being entered. Each of the tokens is classified by the editor as one of group of extended lexemes defined by the language in which the program being edited is written. These extended lexemes are defined similarly to lexemes that might be used in a batch lexer, but are more numerous as the present editor must account for the incomplete and ill-formed lexemes that arise as the user types the program. Based on information in the token stream, the editor prettyprints the program as the user types. This prettyprinting step involves typesetting each of the tokens in a visually distinct manner and displaying a varying amount of visual whitespace between the tokens, each of these operations being based on the lexical types of the tokens. The editor also allows the user to edit the program from the prettyprinted display as if the program were internally represented as text via a cursor whose position and appearance depends on the lexical types of tokens adjacent to the cursor. The present editor also allows the user to periodically request a syntactical analysis of the program, wherein the structure of the program is represented in a syntax tree that references the token stream.


Inventors:Van De Vanter; Michael L. (Mountain View, CA)
Assignee:Sun Microsystems, Inc. (Mountain View, CA)
Appl. No.:499284
Filed:July 6, 1995

Current U.S. Class:715/531 717/111 715/514 715/519 715/527 
Current International Class:G06F 9/44 (20060101)
Field of Search:395/702,703,773,776,788-793,701,781

U.S. Patent Documents
4989145January 1991Kyushima
5493678February 1996Arcuri et al.
5513305April 1996Maghbouleh
Other References
Brun et al., "The Token-Oriented Approach to Program Editing", Sigplan Notices, v. 20, n. 2, pp. 17-20, Feb. 1985. .
Kitchovitch, "Parsing Method Supporting the Process of Program Editing with On-line Syntax Checking", Problems of Engineering, Cybernetics, and Robotics, v. 40, p. 49-59, Jan. 1993. .
Shilling, "Incremental LL(1) parsing in language-based editors", IEEE Transactions on Software Engineering, v. 19, n. 9, pp. 935-940, Sep. 1992. .
Lunney et al., "Syntax-directed editing", Software Engineering Journal, v. 3, n. 2, pp. 37-46, Mar. 1988. .
Madhavji et al., "Semi-structured cursor movements in MUPE-2", Software Engineering Journal, v. 4, no. 6, pp. 309-317, Nov. 1989. .
Zelkowitz et al., "Experiences building a syntax-directed editor", Software Engineering Journal, v. 4, no. 6, pp. 294-300, Nov. 1989. .
Video: Siggraph Video Review: Magpie -Tektronix. (Item #9), Issue 19, CHI 95, Jun. 1985. .
Oman, et al., Typographic Style Is More Than Cosmetic Communications of the ACM, May 1990, vol. 33, No. 5, pp. 506-520. .
Reps. et al., The Synthesizer Generator Reference Manual (Jul. 1987) Dept. of Computer Science, Cornell University, Section 3, pp. 79-106. .
Van de Vanter, User Interaction In Language-Based Editing Systems Chapter 4 and Chapter 5, pp. 35-111 & List of References, Dec. 1992. .
Ballance, et al., Grammatical Abstraction And Incremental Syntax Analysis pp. 185-198, Sigplan '88, Jun. 1988. .
Calliss, Problems with Automatic Restructurers University of Durham, Durham, England, pp. 13-21, Nov. 1987. .
Ballance et al., The Plan Language-Based Editing System ACM Transactions on Software Engineering & Methodology, vol. 1, pp. 95-127, Jan. 1992. .
Schwartz et al., Incremental Compilation In Magpie Sigplan Notice, vol. 19, No. 6, pp. 122-131, Jun. 1984. .
Teitelbaum, et al., The Cornell Program Synthesizer: A Syntax-Directed Programming Environment, Comm. of the ACM, v. 24, n. 9, pp. 563-573, Sep. 1981. .
Baecker, et al., Human Factors and Typography For More Readable Programs Chapter 6, pp. 259-279 + List of References & Bibliography, Jan. 1987..~
Primary Examiner: Burwell; Joseph R.
Attorney, Agent or Firm:Crisman; Douglas J. Flehr Hohbach Test Albritton & Herbert LLP

Claims


What is claimed is:
1. A process for visual whitespace representation and cursor behavior in a computer program editor configured to edit a computer program represented as a stream of tokens from a display of said program as if said computer program were represented as a text flow, said process comprising the steps of:
displaying each line of said computer program as it is entered by a user, wherein a displayed line comprises a sequence of tokens and visual whitespace separating adjacent tokens, said visual whitespace being based on lexical types of a first and a second token composing said adjacent tokens and not on spaces entered by said user, said spaces not being recorded in the token stream;
displaying a cursor in said displayed line of said computer program, said cursor providing user interactability with said displayed line, said cursor having a cursor position corresponding to an insertion point in said stream lying between editable elements composing said tokens, between said tokens or at the beginning or end of said stream;
maintaining boundaries between said tokens and updating said insertion position as said user edits said line of said computer program; and
updating said cursor position, as said user edits said line of said computer program, in accordance with a set of cursor display rules as applied to ones of said tokens adjacent said cursor position such that said cursor position is updated as though said user were editing said line of said computer program with a text editor.

2. The process of claim 1 wherein said maintaining step further comprises the steps of:
internally representing unambiguous boundaries between tokens with an implicit token boundary; wherein said implicit token boundary is not represented by an editable element in said token stream;
internally representing ambiguous boundaries between said tokens with a separator, or ordinary separator, represented as an editable element in said token stream, wherein said ambiguous boundaries are between said adjacent tokens that could be legitimately joined into a single token according to the rules of the computer language in which said computer program was written; and
internally representing a potentially ambiguous boundary with a provisional separator, said potentially ambiguous boundary corresponding to a boundary between tokens where a user typed a space and where no additional separator was necessary.

3. The process of claim 2 wherein:
said insertion point is classified as having one of a multiplicity of predefined distinct insertion point contexts; and
said insertion point updating step includes executing one of a plurality of distinct insertion point update methods selected in accordance with said classified insertion point context and said user editing actions.

4. The process of claim 3 wherein said multiplicity of predefined contexts comprises six general categories: "1"--between said editable elements comprising one of said tokens, "2"--between said adjacent tokens sharing said implicit token boundary, "3"--between said adjacent tokens and to the left of said separator, "4"--between said adjacent tokens and to the right of said separator, "5"--between said adjacent tokens, with said provisional separator to the left of said insertion point but no separator to the right of said insertion point and "6"--between tokens, with said provisional separator to the left of said insertion point and said separator to the right of said insertion point.

5. The process of claim 4 wherein said editing actions comprise:
editable element deletion, character insertion, space insertion, editing of string literals, and navigating over said editable elements comprising said stream.

6. The process of claim 5, wherein said editing action is said character insertion, said step of insertion point updating and token boundary maintaining further comprising the steps of:
when said insertion point context is selected from said insertion point contexts 1, 2, 4, or 5, performing said insertion, re-analyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 1, 2 or 3 immediately following said character just inserted based on said re-analyzing of said token stream following said insertion operation; and
when said insertion point context is selected from said insertion point contexts 3 or 6, performing said insertion, re-analyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 2 or 3 immediately following said character just inserted based on said re-analyzing of said token stream following said insertion operation.

7. The process of claim 5, wherein said editing action is said editable element deletion, said step of insertion point updating and token boundary maintaining further comprising the steps of:
when said insertion point context is said insertion point context 1, performing said deletion, reanalyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 1, 2
or 3 with respect to said editable elements adjacent to deleted editable element based on said reanalysis of said token stream following said deletion operation;
when said insertion point context is said insertion point context 2, performing said deletion, reanalyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 2 or
3 with respect to said editable elements adjacent to deleted editable element based on said reanalysis of said token stream following said deletion operation;
when said insertion point context is said insertion point context 3, performing said deletion, reanalyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 1 or
4 with respect to said editable elements adjacent to deleted editable element based on said reanalysis of said token stream following said deletion operation;
when said insertion point context is said insertion point context 4 or 5, performing said deletion, reanalyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts
2 or 4 with respect to said editable elements adjacent to deleted editable element based on said reanalysis of said token stream following said deletion operation; and
when said insertion point context is said insertion point context 6, performing said deletion, reanalyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 1 or
4 with respect to said editable elements adjacent to deleted editable element based on said reanalysis of said token stream following said deletion operation.

8. The process of claim 5, wherein said editing action is said space insertion, said step of insertion point updating and token boundary maintaining further comprising the steps of:
when said insertion point context is said insertion point context 1, splitting said token between said editable elements to form two tokens, reanalyzing said resulting token stream and placing said insertion point in said insertion point context
4;
when said insertion point context is said insertion point context 2, inserting a provisional separator between said tokens and placing said insertion point in said insertion point context 4;
when said insertion point context is said insertion point context 3, moving said insertion point over said separator and placing said insertion point in said insertion point context 4, to the right of said separator;
when said insertion point context is selected from said insertion point contexts 4, 5 or 6, not changing said cursor position or said token stream.

9. The process of claim 5 wherein said editing action is said editing of string literals, said step of insertion point updating and token boundary maintaining further comprising the steps of:
detecting string delimiting characters indicating boundaries of said string literals;
matching said string delimiting characters, wherein said matching involves identifying pairs of said string delimiting characters, wherein said pair includes a string beginning delimiter and a string ending delimiter;
forming a token representing all characters, wherein said characters includes spaces, inserted by said user between said string beginning delimiter and said string ending delimiter; and
when said tokens exist between said delimiters, including said tokens in said string as editable elements of said string and deleting said included tokens from said token stream.

10. The process of claim 5 wherein said editing action is navigating over said editable elements of said token stream, said step of insertion point updating comprising moving said cursor over said editable elements in response to left cursor and right cursor keystrokes typed by said user and mouseclicks.

11. The process of claim 2, wherein said separators do not exist between said adjacent tokens in said token stream, but instead are inferable from an approximate separator table defining whether said separator exists between adjacent separators having a range of extended lexical types, said separator table being conservative, meaning said table prescribes separators between first and second adjacent tokens when tokens sharing said extended lexical types of said first and second tokens could legitimately be joined according to said computer language rules in at least some situations.

12. The process of claim 2, wherein said separators are not inserted between said adjacent tokens in said token stream, but instead are inferable from an exact separator table defining whether said separator exists between said adjacent tokens based on individual and lexical attributes of said adjacent tokens.

13. A system for visual whitespace representation and cursor behavior in a computer program editor, said system allowing editing of a computer program represented by said editor as a stream of tokens from a display of said program as if said computer program were represented as a text flow, said system comprising:
an input device configured to receive editing actions and to output an event stream of input events corresponding to said editing actions;
a typographical display processor configured to display each line of said computer program as it is entered, wherein a displayed line comprises a sequence of tokens and visual whitespace separating adjacent tokens, said visual whitespace generated by said typographical processor based on lexical types of a first and a second token composing said adjacent tokens and not on spaces entered by said user, said spaces not being recorded in the token stream; and
a keystroke executive configured to receive said input events and to maintain said token stream, including boundaries between said tokens and an insertion point in said token stream, upon receiving said editing actions to said displayed line made at a cursor position defined by a cursor, said cursor position being determined by said typographical processor and corresponding to said insertion point, wherein said insertion point has a position in said stream selected from between editable elements composing said tokens, at token boundaries between said tokens, or at the beginning or end of said stream, said editing actions being indicated to said keystroke executive by corresponding input events;
said keystroke executive being configured to update said insertion point position and said token stream in accordance with a set of token stream updating rules and said input event.

14. The system of claim 13 wherein said keystroke executive maintains said token stream by:
internally representing unambiguous boundaries between said tokens with an implicit token boundary; wherein said implicit token boundary is not represented by an editable element in said token stream;
internally representing ambiguous boundaries between said tokens with a separator, or ordinary separator, represented as an editable element in said token stream, wherein said ambiguous boundaries are between said adjacent tokens that could be legitimately joined into a single token according to the rules of the computer language in which said computer program was written; and
internally representing a potentially ambiguous boundary with a provisional separator, said potentially ambiguous boundary corresponding to a boundary between tokens where a user typed a space and where no additional separator was necessary.

15. The system of claim 14 wherein:
said insertion point is classified as having one of a multiplicity of predefined distinct insertion point contexts; and
said keystroke executive updates said token stream according to one of a plurality of said token stream updating rules selected in accordance with said classified insertion point context and said user editing actions.

16. The system of claim 15 wherein said multiplicity of predefined contexts comprises six general categories: "1"--between said editable elements comprising one of said tokens, "2"--between said adjacent tokens sharing said implicit token boundary, "3"--between said adjacent tokens and to the left of said separator, "4"--between said adjacent tokens and to the right of said separator, "5"--between said adjacent tokens, with said provisional separator to the left of said insertion point but no separator to the right of said insertion point and "6"--between tokens, with a said provisional separator to the left of said insertion point and no separator to the right of said insertion point.

17. The system of claim 16 wherein said editing actions comprise: (1) editable element deletion, (2) character insertion, (3) space insertion; (4) editing of string literals; and (5) navigating over said editable elements comprising said stream.

18. The system of claim 17, wherein, when said editing action is said character insertion, said keystroke executive updates said token stream by:
when said insertion point context is selected from said insertion point contexts 1, 2, 4, or 5, perform said insertion, re-analyzing said token stream according to said token stream updating rules, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 1, 2 or 3 immediately following said character just inserted based on said re-analyzing of said token stream following said insertion operation; and
when said insertion point context is selected from said insertion point contexts 3 or 6, performing said insertion, re-analyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 2 or 3 immediately following said character just inserted based on said re-analyzing of said token stream following said insertion operation.

19. The system of claim 17, wherein, when said editing action is said editable element deletion, said keystroke executive updates said token stream by:
when said insertion point context is said insertion point context 1, performing said deletion, reanalyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 1, 2
or 3 with respect to said editable elements adjacent to deleted editable element based on said reanalysis of said token stream following said deletion operation;
when said insertion point context is said insertion point context 2, performing said deletion, reanalyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 1, 2
or 3 with respect to said editable elements adjacent to deleted editable element based on said reanalysis of said token stream following said deletion operation;
when said insertion point context is said insertion point context 3, performing said deletion, reanalyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 1 or
4 with respect to said editable elements adjacent to deleted editable element based on said reanalysis of said token stream following said deletion operation;
when said insertion point context is selected from said insertion point contexts 4 or 5, performing said deletion, reanalyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 2 or 4 with respect to said editable elements adjacent to deleted editable element based on said reanalysis of said token stream following said deletion operation; and
when said insertion point context is said insertion point context 6, performing said deletion, reanalyzing said token stream, and placing said insertion point and said cursor in a post-analysis position selected from insertion point contexts 1 or
4 with respect to said editable elements adjacent to deleted editable element based on said reanalysis of said token stream following said deletion operation.

20. The system of claim 17, wherein, when said editing action is said space insertion, said keystroke executive updates said token stream by:
when said insertion point context is said insertion point context 1, splitting said token between said editable elements to form two tokens, reanalyzing said resulting token stream and placing said insertion point in said insertion point context
4;
when said insertion point context is said insertion point context 2, inserting a provisional separator between said tokens and placing said insertion point in said insertion point context 4;
when said insertion point context is said insertion point context 3, moving said insertion point over said separator and placing said insertion point in said insertion point context 4, to the right of said separator;
when said insertion point context is selected from said insertion point contexts 4, 5 or 6, not changing said cursor position or said token stream.

21. The system of claim 17 wherein, when said editing action is said editing of string literals, said keystroke executive updates said token stream by:
detecting string delimiting characters indicating boundaries of said string literals;
matching said string delimiting characters, wherein said matching involves identifying pairs of said string delimiting characters, wherein said pair includes a string beginning delimiter and a string ending delimiter;
forming a token representing all characters, wherein said characters includes spaces, inserted by said user between said string beginning delimiter and said string ending delimiter; and
when said tokens exist between said delimiters, including said tokens in said string as editable elements of said string and deleting said included tokens from said token stream.

22. The system of claim 17 wherein, when said editing action is navigating over said editable elements of said token stream, said keystroke executive moves said cursor over said editable elements in response to left cursor and right cursor keystrokes typed by said user on said input device.

23. The system of claim 14, further comprising an approximate separator table accessible by said keystroke executive and said typographical display processor; wherein said separators do not exist between said adjacent tokens in said token stream, but instead are inferable from said approximate separator table defining whether said separator exists between adjacent separators having a range of extended lexical types, said separator table being conservative, meaning said table prescribes separators between first and second adjacent tokens when tokens sharing said extended lexical types of said first and second tokens could legitimately be joined according to said computer language rules in at least some situations.

24. The system of claim 14, further comprising an exact separator table accessible by said keystroke executive and said typographical display processor; wherein said separators are not inserted between said adjacent tokens in said token stream, but instead are inferable from an exact separator table defining whether said separator exists between said adjacent tokens based on individual and lexical attributes of said adjacent tokens.

Description



The present invention relates generally to computer program editors, and particularly to systems and methods for providing natural, text-like, editing behavior in an editor wherein programs are internally represented as streams of tokens.

BACKGROUND OF THE INVENTION

Computer program editors are specialized editors that allow computer program authors, or programmers, to create and modify computer programs. In addition to supporting basic word processing functions such as block copy/move and string find/replace, most computer program editors provide additional editing features that are especially valuable to the programmer. Such additional features include highlighting important structural constructs (e.g., condition statements, keywords, brackets, etc.) and providing language-sensitive commands for navigating, modifying and reformatting or "prettyprinting" programs. While all existing program editors try to provide similar functions, they fall into one of two classes based on how they internally represent the program being edited: as streams of text (text-oriented editors) or as syntax trees (structure-oriented editors).

The most common design approach for program editors is to begin with a text editor and add the aforementioned useful features based on linguistic (or structural) knowledge of the program being entered. This type of editor internally represents the program being edited as a "flat", text file, in which the editable characters predominantly have a one-to-one correspondence to keystrokes made by the program author. In such a text editor, the display of the program also has a one-to-one correspondence to the user's keystrokes. Thus, if the user types a carriage return to break a line of text or begin a new line, a corresponding carriage return/line feed is entered into the text file and displayed on the monitor. Similarly, if a user of a text-based editor hits the spacebar twice in a row, the editor enters two editable spaces into the text file, which are both displayed on the editing screen. This is the approach taken by GNU Emacs, which provides a multitude of "modes" that specialize the behavior of Emacs for programs entered in various programming languages. This is also the approach taken by "vi" another common editor, which provides fewer language-specialized services than Emacs. Finally, most PC-based program editors use this text-based approach.

The chief advantage of text-oriented editors is that their use is familiar. That is, they provide all users with free access to the conventional text editing functionality to which they are accustomed; this gives users great flexibility and minimizes learning costs. However, most text editors do not support full linguistic analysis and therefore have only incomplete and unreliable information about the linguistic structure of programs being edited. For example, many language-based services in Emacs discover structure by matching regular expressions, which by definition cannot capture full linguistic information. These weak-analysis approaches have very little to offer in the way of error diagnosis. Moreover, text editors are not up to the task of providing enough structure to support robust, on-the-fly prettyprinting of the program being entered.

More aggressive text-oriented program editors have been proposed that maintain both full textual and full tree representations of the program being entered. See, for example, the inventor's doctoral dissertation, "User Interaction in Language-Based Editing Systems," UCB/CSD-93-726, Ph.D. Dissertation, Computer Science Division, EECS, University of California, Berkeley, December 1992. This thesis sets out a research system in this category, but there are no known commercial versions. Such aggressive text-editors exact a very high engineering overhead because of the need to build a mapping between related parts of the two representations (i.e., which part of the text corresponds to which part of the structure tree) and to maintain that mapping in the presence of changes as the user types. More importantly, these systems share the fault with structure editors that they provide no useful language-oriented services for those parts of the program that are in the midst of being composed or edited (newly typed text must be analyzed before the system knows anything about it) and those services are of very little use in the presence of syntax errors.

An alternative approach has been explored by a series of research systems under the general category of "structure editors". Two principles are central to this approach: (1) programs are represented internally as syntax trees, and (2) the user's primary mode of interaction with the program is assumed to be in terms of that underlying structure. See, for example, Tim Teitelbaum and Thomas Reps, "The Cornell Program Synthesizer: A Syntax-Directed Programming Environment, Communication of the ACM
24, 9 (September 1981), 563-573 for an early research statement. The commercial descendent of that system is described in: Thomas Reps and Tim Teitelbaum, The Synthesizer Generator Reference Manual, Springer Verlag, Berlin, 1989, Third edition. All practical systems of this sort (for programs) are actually hybrids, meaning that they permit ordinary textual editing under some circumstances. Even so, those circumstances are still expressed in terms of the underlying structure. For example, the user might select a statement structurally, implicitly asking that the structure be treated temporarily as text, and then edit the selected text. When the user thinks editing is complete, the system converts the revised text back into structure if it can. The advantage of this approach is that the editor has access to complete, reliable linguistic information in order to drive services such as prettyprinting.

Structure editors have several failings. First, they restrict the freedom to edit textually that users expect; experience shows that this has severely limited the acceptance of these editors. Second, they provide no useful language- oriented services for those parts of the program being edited textually; those services cannot be provided until the user has eliminated all syntax errors in parts being edited and the system has analyzed the text in order to restore the tree structure. Thus, while the user is editing the editor offers no help at all in that region. Finally, structure-editors typically provide very little support in the presence of syntax errors, since most of the language-based services they provide require a well-formed syntax tree in order to operate. Of course, while the user is editing textually, the program is syntactically ill-formed most of the time; it is therefore next to impossible to maintain a well-formed syntax tree of the pro gram as it is being entered.

Thus, there is a need for a program editor that provides the advantages of both types of program editors: the full freedom of textual editing with first class, reliable structural information available all of the time (not just for parts of the program and not just when no errors are present). More specifically there is a need for a program editor that provides a single, non-textual, internal program representation for most services (e.g., language support, prettyprinting, etc.), which is also suitable for programs in progress; i.e., programs which are syntactically ill-formed and contain program words that are either lexically ill-formed or incomplete versions of legal lexemes. This internal program representation should represent the words of the program, their absolute positions, and their extended lexical type, including whether a word is an ill-formed or incomplete lexeme.

It would be desirable for such an editor to be capable of maintaining this internal representation on the fly, as the user types, even in the presence of the inevitable program syntax errors. This editor should also be able to prettyprint the program being entered based on the aforementioned internal representation of the program, as the user types, where prettyprinting involves (1) typesetting each token based on the aforementioned structural information, including whether a particular program is ill-formed or illegal and (2) displaying varying amounts of whitespace between program words determined, not on spaces entered by the user, but by aesthetic and ergonomic rules based on the particular context of the displayed program words adjacent to the whitespace.

A similar goal has been pursued in the realm of natural language, WYSIWYG (What You See Is What You Get) processors, which typeset the document being edited on the fly, as the user types. However, many of these systems simply treat inter-word space in the conventional way, as "whitespace characters" that the user may enter and manipulate exactly as any other characters. Other systems offer an abstraction of this behavior (e.g., the "smart spaces" option in FrameMaker) by allowing a user to enter no more than one "space" between words; in this case, the space isn't a true character (and in fact the visual whitespace between words may vary as typesetting is performed) , but behaves conceptually as a "word separator". However, this latter approach is not applicable to program editing, where the lexical classes of the program "words" dictate that displayed whitespace has a variety of internal representations depending on the editing context. Thus, it would also be desirable for a program editor adopting such an approach to let the user edit the non-textually represented program as if it were text via a cursor whose behavior is similar to that found in a text editor but which also conveys the editing context of the cursor, the program words, and whitespace adjacent to the cursor.

Finally, this editor should provide an optional, full program structural analysis capability (or parser) that adds useful program interpretation information that cannot be gleaned from the internal representation alone. This information should represent the syntax of the program being entered; however, as this full program analysis capability could not generate useful information in the presence of syntax errors, it should not be allowed to interfere with the user's editing, and should only be invoked when desired by the user.

SUMMARY OF THE INVENTION

In summary, the present invention is a computer program editor and editing method that meets the needs set out above. Specifically, the present invention is an editing process for inter-token whitespace representation and textual editing behavior in a computer program editor that allows a user to edit a computer program represented as a stream of tokens from a display of the program as if the user were editing a computer program represented as a text flow. This editing process includes the following steps.

A computer program editor using the editing method of the present invention typographically displays the computer program as it is entered or edited by the user. Each line of the displayed program includes a sequence of displayed tokens and visual whitespace separating adjacent displayed tokens, where the amount of visual whitespace is based on the lexical types of the adjacent tokens. In another step of the present invention, the user is allowed to edit the computer program by interacting with the program's display using a cursor positioned in the sequence of tokens and visual whitespace, where the cursor's position in this sequence corresponds to an insertion point which defines the actual place in the token steam at which the user's editing actions are implemented. Generally, this insertion point can lie between editable elements composing the tokens, between the tokens, or at the beginning or end of the token stream. As the user edits the computer program, the editing process also maintains boundaries between the tokens and updates the insertion point and cursor position to reflect the results of each editing action taken by the user.

An advantage of the present invention is that it allows a user to edit structurally-represented programs as if they were text; thus, the present editing process includes several steps necessary to emulate text-like editing behavior. For example, in the maintaining step referred to above, the boundaries between adjacent tokens are internally represented in one of three different ways: (1) implicitly (when the adjacent tokens have an unambiguous boundary); (2) with a separator, which acts like an editable element of the token stream between adjacent tokens having an ambiguous textual boundary ("ambiguous" meaning that, in a textual representation of the program, any visual space between adjacent words corresponding to the tokens would be necessary); and (3) with a provisional separator, which corresponds to a boundary between tokens where a user typed a space where no additional separator was necessary.

Based upon these three different types of token boundaries, the present editor classifies the insertion point as one of a multiplicity of predefined insertion point contexts, including: "1"--between editable elements composing one of the tokens, "2"--between adjacent tokens sharing an implicit token boundary, "3"--between adjacent tokens and to the left of a separator, "4"--between adjacent tokens and to the right of a separator, "5"--between adjacent tokens, with a provisional separator to the left of the insertion point but no separator to the right of the insertion point, and "6"--between tokens, with a provisional separator to the left of the insertion point and a separator to the right of the insertion point. The editing process also executes one of a plurality of distinct insertion point updating methods that emulate text-like editing behavior by accounting for the effect of different user editing actions taken at each one of the different insertion point contexts. These different editing actions include editable element deletion, character insertion, space insertion, editing of string literals and navigating over editable elements.

The present invention is also a system for visual whitespace representation and cursor behavior in a computer program editor that allows a user to edit a computer program internally represented as a stream of tokens from a display of the program as if the user were editing a computer program represented as a text flow. This system includes at least one input device from which the user edits the computer program, a typographical display processor that displays each line of the computer program as it is entered by the user, and a keystroke executive that receives each input event from the input devices and maintains the token stream in response to each input event as follows.

As the user edits and/or enters the program from the input device, the input device outputs an event stream of input events, including keystrokes, to the keystroke executive. After each input event, the keystroke executive updates the token stream, including boundaries between the tokens and an insertion point in the token stream, as the user edits the computer program by making editing actions at a cursor whose position corresponds to the insertion point's position in the token stream. This insertion point position can be selected from between editable elements composing the tokens, at token boundaries between the tokens, or at the beginning or end of the token stream. As the keystroke executive updates the insertion point and token stream, the typographical display processor displays each line of the computer program being edited by the user. Each line of the displayed program comprises a sequence of tokens and visual whitespace separating adjacent tokens, where the amount of visual whitespace generated by the typographical display processor is based on the lexical types of the adjacent tokens. The keystroke executive updates the insertion point position and the token stream in accordance with a set of token stream updating rules and one input event so that the cursor, the displayed tokens and the visual whitespace behave as though the user were editing the computer program with a text editor.

BRIEF DESCRIPTION OF THE DRAWINGS

Additional objects and features of the invention will be more readily apparent from the following detailed description and appended claims when taken in conjunction with the drawings, in which:

FIG. 1 is a block diagram of a computer system incorporating the computer program editor of the present invention.

FIG. 2 is a block diagram of a preferred embodiment of the computer program editor of the present invention.

FIG. 3 is a block diagram of the keystroke executive of FIG. 2 and the data structures with which it interacts.

FIG. 4 is a block diagram of the comment processor of FIG. 2 and the data structures with which it interacts.

FIG. 5 is a block diagram of the typographical display of FIG. 2 and the data structures with which it interacts.

DESCRIPTION OF THE PREFERRED EMBODIMENT

Referring to FIG. 1, there is shown a block diagram of a computer system 100 incorporating an editor 122 from which a user can edit a structurally-represented program 124 as if it were represented as a stream of characters. The system 100 has a processor 110 coupled to a primary memory 114 and a secondary memory 116 via a local bus 104. The system also includes at least one input device 112, such as keyboard and/or a mouse, and a display 118 coupled to the processor via a peripheral bus 102. The operation of these elements is set out below, following a glossary defining terms used in the description of the present invention.

Glossary:

Anchor

An editable element that is the visible manifestation of an annotation's anchor point.

Anchor Point

A location in a flow associated with an annotation, expressed as a position between two tokens and a rank in the ordering of annotations present in that position. Annotations may be displayed visually at their anchor points (in-line annotations) or elsewhere (out-of-line annotation).

Annotation

A discrete body of information that is part of a quantum by virtue of having an anchor point in the quantum's flow. Annotations are intended to contain meta-information, information that is about source code (for use by people and tools) but which is not part of the code according to language rules. Users specify which annotations are visible in any given view.

Bracketing token

A token that, according to language rules, must be accompanied by a matching counterpart, the two of which together enclose intervening contents in a strictly nested fashion. Examples include parentheses, curly braces, and square brackets.

Display Option

A choice about how the contents of an annotation should be displayed in a view, possibly including, but not limited to the following: invisible (even though active in the view), as a glyph or other holophrast, in the preferred natural mode of the annotation's contents (text, graphics, spreadsheet), and animation.

Editable Element

A member of a flow that appears to be editable by ordinary text-oriented interaction. For example, the insertion point moves over an editable element in a discrete jump. The user may select and delete the visible manifestation of an editable element. Editable elements include the characters that comprise tokens, as well as separators, in-line annotations, anchors and line breaking information.

Flow

A sequence of tokens that, together with its annotations, comprises a quantum.

Footnote

An annotation whose contents are displayed at the bottom of a view or in a specially related visual context, instead of at the anchor point itself. A special mark usually appears at the anchor point of a footnote, as well as with the content, in order to relate them visually.

In-line annotation

An annotation whose contents are displayed at its anchor point (between adjacent tokens or other annotations) in the flow.

Insertion Point

A distinguished position in a view at which certain textual editing operations are intended to take effect, expressed as a position between two editable elements (characters, anchors, separators, and the like) and displayed unambiguously.

Language-based services

Useful mechanisms available to the user that exploit an internal linguistic model of each quantum maintained by the Editor.

Layout Option

A choice about where an annotation should be displayed with respect to its anchor point: one of in-line, marginalia, and footnote.

Marginalia

An annotation whose contents are displayed at the extreme left or right of a view, usually near the same line as its anchor point, instead of at the anchor point itself.

New style

The distinguished style used as a context for new tokens, in place of the syntactic context otherwise computed during an update.

New token

A token that has been entered into or modified in a quantum since the most recent update.

Out-of-line annotations

An annotation whose contents are not displayed visually at its anchor point in a flow. Examples include footnotes and marginalia.

Provisional separator

An ephemeral separator that may appear to the left of the insertion point immediately after the user presses the space bar, in a context where a separator would otherwise be unnecessary and therefore would not appear.

Quantum

A chunk of source code in a programming language, consisting of a flow of tokens and associated annotations. Editor interaction with the contents of a quantum is generally independent of all other quanta; for example there is no token conceptually before the first token of a quantum. Both files and programs are quantums.

Separator

An editable element that acts as a logical "space" in boundaries between tokens and other editable elements. Deletion of a separator directs the Editor to join the two surrounding tokens. Separators normally appear only where needed. In a boundary between language tokens, this means only where the textual concatenation of tokens would form a different token. In more general boundaries, this means when the two adjacent editable elements could meaningfully join or merge into a new object. Cf. token gap.

Style

A named set of visual display attributes that can be applied to tokens or groups of tokens. Style attributes include typeface, size, slant, weight, color, and background color.

Syntactic context

Relationships, defined by language rules, between a token and the tokens that surround it, for example as part of a procedure header, as the defining occurrence of a name, or as an argument name. The syntactic context of a new token can only be determined by an update.

Token

The basic lexical unit or "word" in a programming language: these include identifiers (such as "i"), keywords (such as "int"), punctuation marks (such as ","), and operators (such as "++"). Tokens are the basic unit for some Editor operations. Each character of a token is normally an editable element.

Token Gap

Visual whitespace that horizontally separates the display of adjacent tokens. A token gap is not an editable element, although it may appear to be if there is a separator at the token boundary.

Update

Analysis performed by the Editor in order to improve its internal linguistic model of the source code in a quantum, thereby improving the quality of its language-based services.

View

The visual display of a quantum, perhaps one of many created by the Editor, through which the user may change the quantum by editing (which changes become immediately apparent in all views on the quantum) or may invoke other language-based services.

View Style

A collection of rules, based on language structure and styles, that jointly specify the visual appearance of source code displayed by the Editor.

The system 100 is controlled by operating system software 120 executing in the processor 110. Other executable application software, including the editor of the present invention 122, are stored in the secondary memory 116, as are data files such as the program to be edited 124. As shown in FIG. 1, the editor 122 includes a keystroke executive 130, a structural analyzer 150 and a typographical display facility 170, which are described in depth below. The editor 122 also includes other analyzers, such as a semantic analyzer 175. As the program 124 is edited, the editor constantly updates the internal program representation 156, which is maintained in the primary memory 114. This internal representation 156 has four principal components: a token stream 158 and an annotations list 160, which together compose a program quantum; an optional syntax tree 162, including associated static semantic information; and an insertion point 157, all of which are described below. Many behaviors of the editor 122 are determined by a customizable editing database 164, also resident in the primary memory 114 during editing, which includes user view preferences 166 and lexical/syntactical rules 168 for the language in which the program
124 is written.

The key distinction between the keystroke executive 130, which performs lexical analysis by applying the lexical rules 168 to the token stream 158, and the structural analyzer 150, which performs syntax analysis on the token stream 158 by applying the syntactical rules 168, is that lexical analysis is local, whereas syntax analysis is global. These two analysis processes correspond to the lexical and syntactical analysis phases of a compiler. To analogize to the analysis of a document written in English, lexical analysis involves breaking down the document into its constituent words and punctuation. On the other hand, syntax analysis involves evaluating the document's sentence structure. Thus, an English syntax analyzer would be able to indicate that the adjective blue is incorrectly placed in the following sentence: "The boat blue is fast." Of course, this sentence would be declared legal if it were translated into French and evaluated by a French syntax analyzer, as adjectives follow nouns in French. Similar roles are played in the program editor 122 by the keystroke executive 130 and the structural analyzer 150. That is, the keystroke executive 130 classifies the program's words according to the parts of speech, or lexemes, of the respective computer language in which the program is written by analyzing only small chunks of code. The structural analyzer 150 then evaluates, based on the lexical properties of the program words, whether a line of code, series of nested if statements, function, or other large scale program structure is correctly formed according to the syntax of the program language. In a significant departure from compilers or structure editors, which may perform either or both of these functions, but always together, in the present editor 122, lexical analysis is independently useful apart from its role as a front end to structural analysis.

When a user invokes the editor 122 against a program to be edited 124, the operating system 120 loads the editor 122 and the program 124 into the primary memory 114 and begins executing the instructions of the editor 122 in the processor 110, at which point the user can edit the program 124. The program 124 could be an existing program or a new program, although that fact has little bearing on the operation of the editor 122, which is more fully described below in reference to FIG. 2.

Referring to FIG. 2, there is shown a block diagram illustrating additional details of and functional relationships between the input device 112, editing routines 130, 150, 170, the program representation 156, the editing database 164, and the display 118 while the program 124 is being edited. Additional editor 122 elements not shown in FIG. 1 include the tokenizer 132 and comment processor 136, which are functional blocks within the keystroke executive 130, and the error messages 160a and comment list 160b, which are objects within the annotation list 160.

Moreover, in FIG. 2, the lexical/syntax rules 168 from FIG. 1 have been subdivided into lexical rules 168a and syntax rules 168b. The user edits the program 124 by interacting, through the input device 112, with the displayed version 118 of the program being edited. When the input device 11 2 includes a keyboard, these interactions could include keystrokes directing the editor 122 to move the cursor (when the user uses the cursor control keys), insert a space at the cursor (when the user hits the spacebar), delete the editable element to the left or right of the cursor, insert a character at the cursor (when the user types a character), or break the current line (when the user hits the enter key). When the input device 11 2 also includes a mouse, these interactions could also include various types of mouse clicks. The input device relays all of these interactions, or input events, to the keystroke executive 130 as an event stream 113.

The keystroke executive 130 is the key component of the editor 122 and is responsible for receiving every input event from the input device 112; updating, based on the event stream 113, the token stream representation 158, and managing the insertion point 157, which defines the position of the cursor with which the user performs all basic editing operations. Consequently, the keystroke executive 130 has read/write access to the insertion point 157 and the token stream 158, and is coupled to the input device 112, from which it receives the event stream 113. The tokenizer 132 is a subfunction of the keystroke executive and executes only when called by the keystroke executive 130, which, in the case of an insert character input event, passes the tokenizer 132 the newest character of the event stream 11 3, a subset, or token stream segment 138, of the token stream 158 and the position of the character being inserted relative to the token stream segment 138. In addition to the inputs it receives from the keystroke executive 130, the tokenizer 132 has read access to the lexical rules 168a; after executing, the tokenizer 132 returns its result to the keystroke executive 130. The comment processor 136 is another subfunction of the keystroke executive 130, which passes the comment processor 136 all input events 113 related to the editing or entering of program comments. Based on these comment-related input events, the comment processor creates and maintains comment objects, which it stores in the comment list 160b.

At the same hierarchical level of the editor 122 as the keystroke executive 130 is the optional structural analyzer 150, which has read access to the syntax rules 168b and the token stream 158 and write access to the error messages list 160a, to which it outputs syntax error messages. More critically, the structural analyzer 150 is coupled to the syntax tree 162, which it creates, updates and maintains based on the token stream 158 and the syntax rules 168. Unlike the keystroke executive 130, the structural analyzer 150 is not called after ever keystroke of the input device 1 1 2, but when its intervention is requested by the user or when there is some other opportunity.

The structural analyzer 150 is optional because the syntax tree 162 it maintains is not required for the program 124 to be edited or displayed in the editor 122. Both of these functions can be adequately performed by the editor 1 22 solely on the basis of information in the token stream representation. Instead, the syntax tree 162 provided by the structural analyzer 150 is only necessary for fine tuning the prettyprinted display of the program being edited 124 and for ancillary editing services, none of which are essential functions of the editor 122. For the purposes of this document, the term "syntactical analysis" encompass all or part of the syntax and static semantic analysis performed by a compiler in, respectively, its second and third passes over a source code file (the first pass being lexical analysis). As described above, in the art of compilers, syntax analysis determines whether the tokens generated by during the lexical analysis pass are in legal positions according to the syntax rules of the programming language. Static semantic analysis takes an even wider view of the program being compiled. For example, static semantic analysis generates type information for variables used in statements already syntactically analyzed by referring to variable declarations not cognizable by the parser.

The typographical display processor 170 has read access to the data structures 157, 158, 160, 162 composing the program representation 156 and the view preferences 166, which are contained in the editing database 164. Based on the user's view preferences 166, which define the type of editing information the user wishes to be displayed, the typographical display processor 170 typesets or prettyprints the information in the program representation 156 to the display 118, to which the display processor's output is coupled. The term "prettyprinting" means formatting the display of a program or other document, using techniques similar to those used by typesetting programs, to generate a well-formed and aesthetically pleasing image.

While all of these elements are important to the operation of the editor 122, the key improvements made by the editor 122 lie in (1) the form of the internal program representation 156 and (2) the operation of the keystroke executive 130. These aspects of the editor 122 are now described.

As mentioned above, as the program 124 is entered or edited, the keystroke executive 130 receives the event stream 113 and accordingly updates the token-stream representation 156 of the program 124 after each user keystroke or mouse click. Each of the tokens in the token stream represents a "word" of the program 124, where "word" means a symbol or group of symbols that can be classified as any of the basic building blocks, or lexical classes, of a computer language such as strings (e.g., "hello, world"), arithmetic operators (e.g., +, -, /, =), keywords (e.g., char, struct, float) or floating point constants (e.g., 2.0 E+5).

Thus, as the program 124 is being entered, the keystroke executive 130 is able to create useful program information that would not be available from a text processor, including (1) the tokens or "words" of the program 124 and (2) the lexical types of the tokens (e.g., float constant, string, variable, keyword). Moreover, because the keystroke executive 130 also controls the insertion point 157 of the cursor, with which the user edits the program 124, the keystroke executive 130 is also able to provide on the fly the position of each token within the token stream. Thus, as the user types, the keystroke executive 130 creates a representation of the program 156 in which each word of the program is lexically classified, tokenized, and associated with a unique position in the token stream 158. This is an ideal representation for a program editor as (1) it can be maintained on the fly as the user types, (2) it provides enough reliable, lexical information that the program can be prettyprinted or typeset as it is entered by the user, and (3) it allows other language-based services to be implemented even when syntax analysis is not successful. This representation is also the ideal compromise between a flexible, but informationally-empty, text representation, and a rigid, but informationally-rich purely-structural representation. Note however that not all information entered by the user or generated by the editor 122 regarding the program 124 is stored in the token stream 158. Other information, such as program comments, which are not actual program statements, are stored in the annotation list 160, as are the syntax error messages generated by the structural analyzer 150 as it parses the program 124. This type of ancillary information is stored apart from the token stream 158 primarily for efficiency reasons; however, because all of the annotations are indexed to appropriate anchor points within the token stream 158, it is always possible to re-associate the annotations and the tokens, which collectively are referred to as a quantum.

In rare editing situations, the keystroke executive 130 modifies the token stream 158 using language-independent rules. For example, if a user deletes the single character of a single character token, the keystroke executive 130 just deletes the token. This requires no language-specific knowledge. However, for most other editing situations, which demand specialized lexical knowledge, the keystroke executive 130 must seek advice from the tokenizer 132, which is an expert in the incremental, lexical analysis of program code, as to how the token stream 158 should be updated in light of new input events 113 typed at the current insertion point 157. In this querying process, the keystroke executive 130 passes the tokenizer 132 a few tokens 138
from the token stream 158 that are adjacent to the current insertion point 157 and a particular request to evaluate an insertion or deletion in light of the token stream segment 138. The tokenizer 132 then suggests an update 138' to the token stream 158
by interpreting the few tokens in light of the lexical rules 168a provided in the editing database 164. These lexical rules 168a are similar to conventional lexical rules that might be used in a batch-mode lexical analyzer, such as "Lex", but are extended to account for illegal and incomplete lexemes that arise when the program 124 is lexically analyzed as it is being entered.

Anytime the keystroke executive 130 determines that the user is entering a new comment or editing an existing comment, it calls the comment processor 136, which has expertise in maintaining and updating the comment list 160b. While the comment processor 136 is processing a comment, the keystroke executive 130 passes all input events in the event stream 113 to the comment processor 136, which simultaneously updates the appropriate comment in the comment list 160b as if it were a simple text editor (i.e., the comment processor 136 performs no lexical analysis on the input events). When the keystroke executive 130 detects a new comment trigger, such as a comment beginning delimiter, the comment processor 136 allocates a new comment record in the comment list 160b.

Based on the token stream 158 and annotations 160, the typographical display processor 170 typesets, or prettyprints, the program 124 to the display 118 as the user types. In addition to prettyprinting the program 124, the editor 122 can provide a wide range of program editing services, such as language-sensitive support and highlighting, which are also based on the token stream representation 158 of the program 124. Moreover, even though the program is internally represented as a token stream
158, the editor 122 allows the user to interact with the program being edited as if it were internally represented as text; i.e., the behavior of the editor 122 is practically indistinguishable from a word processor. The keystroke executive 130 and typographical display processor 170 of the editor 122 make this text-like editing behavior possible by carefully managing (1) the movement and placement of the cursor, which is the means by which the user interacts with the displayed program 118, and (2) the display of whitespace, or gaps, between adjacent tokens in the token stream 158.

The editor 122, as an option, also allows a user to request a full syntactical analysis, including static semantic analysis, or "update" of the program 124 to supplement the less complete, but still useful, token stream representation of the program 158. Alternatively, the editor 122 may perform an optional update whenever there is an opportunity to do so. In either case, to perform the update, the editor 122 calls the structural analyzer 150, which analyzes the token stream 158 in light of the language-specific lexical/syntax rules 168, and then updates or forms (when the program 124 is newly entered) the syntax tree 162 of the program 124. The information gleaned from the update is then used by the typographical display processor 170
to update the token-based information on the display 118. When the structural analyzer 150 encounters parts of the program 122 not susceptible to syntactical analysis due to the presence of syntax or lexical errors, neither the syntax tree 162 nor display of those parts is modified. Thus, the editor 122 is capable of seamlessly merging information from the two representations on the display 118 in a way that provides all available useful information without hindering the user.

The typographical display facility 170 prettyprints the program 124 to the display 118 based on the token stream 158 from the keystroke executive 130 and/or the syntax tree 162 from the last update generated by the structural analyzer 150. The typographical display processor 170 also displays the various annotations 140, including error messages 160a and comments 160b alongside the displayed program in an intuitively useful manner determined by the user. The typographical display facility 170
also controls the visual whitespace between displayed tokens and positions the cursor in the displayed program so that the user can edit the displayed program as if in a text based editor rather than a structural editor.

Having briefly surveyed the structure and operation of the principal components of the editor 122, they will now be discussed in detail, beginning with the keystroke executive 130.

Keystroke executive 130

Referring to FIG. 3, there is shown a block diagram of the keystroke executive 130, including additional details of data structures with which the keystroke executive 130 and its subroutines interact. The only system elements shown in FIG. 3
that are not represented in FIG. 2 are the separator table 168a-1 and the extended lexeme table 168a-2, which are elements of the lexical rules 168. The structure of these two tables 168a-1 and 168a-2 and the token stream 158 are described below.

As was said above, the keystroke executive 130 transforms, often with the assistance of the tokenizer 132, the input events 113 being entered by the user editing the program 124 into changes to a token stream 158. In the preferred embodiment of the present invention shown in FIG. 3, the token stream 158 is stored as an ordered list of tokens, where each token in the list is characterized by: (1) its position 158a in the token stream 158, (2) its constituent symbols, or content, 158b, and (3) its lexical type, or class, 158c. Additionally, each token may include an optional, boolean, provisional separator bit. For example, the tokens 158-1 through 158-13 occupy positions 104 through 116 in the token stream 158 and correspond to the following C-language for-statement (the examples in this document are drawn from the C and C++ computer languages) entered from the input device 112 and processed by the input device executive 130: for (i=0; i<1024; i++). Note that the keystroke executive 130 has transformed this one line expression into a series of 13 tokens, each associated with corresponding symbols from the event stream 158b and a lexical class 158c. Also note that the token class 158c is represented in two ways: as a lexical class description (e.g., "keyword(for begin)") and a corresponding numerical class (e.g, "21"); this class number 158c is how the other routines and data structures of the editor 122 (e.g., the separator table 168a-2 or the typographical display processor 170) refer to the class of a token.

A. Extended Lexeme Table 168a-2:

The lexical class information 158c is generated by the tokenizer 132 and the keystroke executive 130 according to the lexical rules in the extended lexeme table 168a-2. This extended lexeme table 168a-2 is structured like lexeme tables used in batch-mode lexical analyzers, or lexers, such as Lex, in that it contains a series of rules defining the lexemes or lexical classes of the language in which the program is being entered. However, the extended lexeme table 168a-2 is larger than a conventional lexeme table for the same language. This is because the extended lexeme table 168a-2 must account for the many "incomplete and "ill-formed" tokens that are formed as the user enters and/or edits the program 124, which is lexically analyzed by the keystroke executive 130 or the tokenizer 132 after every input event.

For example, as would a conventional lexeme table, the extended lexeme table 168a-2 includes a lexical rule that defines a legal integer literal (class 67) expressed in hexadecimal ("hex") format, which, in C/C++, corresponds to the string "0x" followed by a series of symbols "#", where "#" is an integer between 0 and 9 or a character between A and F. However, unlike a conventional lexer, the extended lexeme table 168a-2 also includes rules that define incomplete integer/hex literals (the string "0x" followed by nothing else), and ill-formed integer/hex literals (the string "0x" followed by at least one symbol "$", where "$" is not an integer between 0 and 9 or a character between A and F). These extended lexemes (i.e., incomplete and ill-formed lexemes) are necessary in the present editor 122, which aims to provide useful lexical information regarding the lexical classes of the tokens in the token stream 158 as the user inputs the corresponding event stream 113. Such extended token information is not necessary in a conventional lexer, which, as the front end of a compiler, simply rejects either kind of token as illegal without any further attempts to classify them. The preceding discussion is a brief description of the extended lexeme and on-the-fly analysis of tokens. A more complete explanation is provided in the same inventor's co-pending U.S. Patent application, Method and Apparatus for Diagnosing Lexical Errors, Ser. No. 08/305,220, filed Sep. 13, 1994.

For example, consider the following event stream 113 which is generated by a user typing an integer hex literal, and the corresponding token stream representation 158, which is generated keystroke-by-keystroke by the keystroke executive 130. In this illustration, the character just typed by the user is underlined and the corresponding insertion point 157 in the token stream 158 after the keystroke is marked with a ".vertline." character.

TABLE 1 ______________________________________ Event stream Position Content Lexical Class 113 158a 158b 158c ______________________________________ 0 1 0.vertline. Integer literal 0x 1 0x.vertline. Incomplete integer literal 0x5 1
0x5.vertline. Integer literal 0x5H 1 0x5H.vertline. IllegaI integer literal ______________________________________

In this example, note how the lexical class changes as the user types. This is because the keystroke executive 130 lexically analyzes the event stream 113 after each user keystroke. Also note that the token stream position 158a does not change as the user's keystrokes have the effect of modifying a single token, which, in this example, is the first token. Rather, what does change after each token stream 158 update is the insertion point 157, which marks both the keystroke executive's work position in the token stream 158 and the position of the cursor (displayed to the user on the display 118), with which the user interacts with the program being edited.

B. Token Boundaries

In another marked contrast to text editors, the keystroke executive 130 does not represent in the token stream 158 spaces typed by the user between program tokens (however, as will be discussed later, the keystroke executive 130 does represent in the token stream 158 typed spaces in string and character literals). Instead, the textual role played by whitespace in conventional text editors is replaced in the token stream 158 by token boundaries maintained by the keystroke executive 130 as the user types. Most token boundaries are implicit in what the user types and are recognized by the keystroke executive 130 and tokenizer 132 through lexical analysis. At such implicit boundaries there is no editable element between tokens.

For example, recall that the tokens 158-1 through 158-13 in the token stream 158 shown in FIG. 3 correspond to the event stream "for(i=0;i<1024;i++)". This expression could have been entered from the input device 11 2 in many ways, all semantically equivalent; e.g., the user might have typed the event stream 11 3 as "for.sub.-- (i.sub.-- =0;.sub.-- i<1024;.sub.-- i++)" or as "for(i=0;.sub.-- i<1024;.sub.-- i++)", where the ".sub.-- " symbol corresponds to a typed space in the event stream 113. To the user entering this text, these spaces play an important role. For example, they provide visual separation between adjacent program words and, as they are directly displayed by text editors, are an integral component in rendering a pleasing display of the program 124. However, none of these spaces are necessary in a semantic sense; i.e., their presence or absence does nothing to change the meaning of the for-loop expression (although in some situations, covered below, spaces do provide semantic information). Consequently, the keystroke executive 130, in tokenizing these event streams, discards any spaces entered by the user between the tokens 158-1 through 158-13. That is, the boundaries between the tokens 158-1
through 158-13 are implicit, being lexically inferable from the lexical classes of the adjacent tokens. At such implicit boundaries there are no editable elements between the tokens.

1. Separators

There are also ambiguous token boundaries, where the program language would require a space in a purely textual representation (i.e., ambiguous boundaries are token boundaries where a space would be required in the textual equivalent of the token stream to preserve the boundary). In these cases, the keystroke executive 130 creates a separator, an editable element that acts something like a space. More specifically, in the present editor 122, a separator will only be associated with adjacent tokens in the token stream 158 that could legitimately be merged into one token. Generally, this determination is made by comparing the content of what the user is typing to the language definition of the program such as defined in the extended lexeme table 168a-2.

For example, no separator is needed between the two tokens "for" and "(" in the token stream "for(" because those two tokens cannot legitimately be merged. In other words, no space is required in the textual equivalent of the token stream to preserve the boundary between the program words "for" and "("; which is not an ambiguous boundary. As a result, regardless of whether the use types "for.sub.-- (" or "for(", the keystroke executive 130 always gives the same token stream without a separator: "for", "(".

In contrast, if the user types two adjacent "+" characters without any intervening spaces, the keystroke executive 130 generates a single corresponding plus-plus token "++", but if the user types a plus character "+" followed by a space ".sub.-- " then another plus character "+", the keystroke executive 130 returns a different token stream with two tokens and an intervening separator: "+.quadrature.+". Note that the separator in the two token example is not required because the token stream representation of the event stream "+.sub.-- +" would be ambiguous without it; in fact, the event stream "+.sub.-- +" could be unambiguously represented as two adjacent plus ("+") tokens having an implicit token boundary. This separator is also not required to create a pleasing display of the program 124 as the typographical display processor 170 displays visual whitespace between adjacent tokens based on their respective lexical classes, not user-entered spaces. Rather, the separator in the token stream 158 corresponding to the event stream "+.sub.-- +" is required so that the user can edit the corresponding stream as if it were text. I.e., by providing an editable separator corresponding to the displayed whitespace between the two plus tokens, the editor 122 gives the user an object to delete, whose deletion instructs the keystroke executive 130 to join the separated tokens into the single token, "++". If this separator were not present, there would be no simple, text-like way in which a user could join the two tokens.

For example, consider the editor's behavior if a necessary separator were missing. If there were only an implicit boundary between the tokens (i.e., if the event stream "+.sub.-- +" were internally represented as the two adjacent tokens "+" and "+"), the user might place the cursor (".vertline.") in the whitespace between the displayed tokens as follows: "+.vertline.+", and hit the rubout key, thinking this action should delete the space between the tokens, thus textually joining them. However, because the only editable element to the left of the cursor is the leftmost plus token, this action would only result in the keystroke executive 130 deleting that leftmost token. Thus, adding a separator between the two plus tokens allows the user to join the two tokens in a familiar, text-like manner (i.e., by putting the cursor between the separator and the rightmost plus token, then deleting backwards). Moreover, because a separator serves only an editing purpose and not a visual (whitespace) purpose, the keystroke executive 130 will allow only one separator between tokens, no matter how many spaces a user enters. In addition to serving as an editable element, a separator acts as an immutable token boundary; i.e., adjacent tokens with an intervening separator may never be joined as a side effect of other operations.

The preceding discussion set out what separators are; i.e. they are editable elements that act like textual space and define immutable token boundaries. The following discussion sets out various ways in which separators are implemented in the present editor 122.

a. Separators-lmplementation:

As stated above, the tokenizer 132 or the keystroke executive 130 determines whether a separator is required in each token boundary based on lexical properties of the two surrounding tokens. In the preferred embodiment of the present invention, the keystroke executive 130 does not actually represent any required separators in the token stream 158, but decides, on the fly, whether a separator exists or should exist between two tokens. Alternatively, based on the same type of decision procedure, the keystroke executive could actually add an explicit separator to the token stream 158.

This decision procedure can be implemented by the editor 122 in two different ways, one approximate but conservative (meaning that some unnecessary separators are introduced into the token stream, but that no necessary separators are omitted), the other exact (meaning that only necessary separators are introduced into the token stream). The approximate procedure must be conservative as it is ok with respect to editing behavior to have additional separators but absolutely critical not to omit necessary ones.

The preferred embodiment of the present invention implements the approximate but conservative approach. In this approach, the keystroke executive 130 or tokenizer 132 determines the need for a separator between two tokens by referring to the separator table 168a-1 on the basis of the lexical classes 158c of the adjacent tokens. The separator table 168a-1, as is shown in FIG. 3, consists of a grid of boolean separator values (0/1.fwdarw.no separator/separator) indexed by the lexical class of two adjacent tokens in the token stream 158. For example, if the leftmost of the two adjacent tokens has a class 158b of "101" (reading the vertical axis of the table 168a-1), and the rightmost has a class 158b of "100" (reading the horizontal axis of the table 168a-2), they would be separated by a separator as their intersection in the separator table 168a-1 has a boolean separator value of "1". This is an approximate approach because it is based on the lexical classes of the adjacent tokens and not the particular tokens being lexically analyzed. For example, class-oriented rules in the separator table 168a-1 would indicate that a separator should always be included between an identifier on the left and a floating point number on the right. This rule is required because a user might occasionally want to join these types of tokens; e.g., where the left token is "A" and the right token is the floating point number "1E5", these two tokens could legitimately be merged into the single token identifier "A1E5". However, because this rule is conservative, it introduces an unnecessary separator in other situations where the textual equivalent of adjacent tokens of the same respective classes have an unambiguous boundary; e.g., where the left token is "A" and the right token is ".1E5" , the tokens cannot legitimately be joined. Using this approximate approach it is ok to add unnecessary separators (although this does require some special editing approaches), but it is not ok to eliminate necessary separators. For an example of such a case where the separator is critical, see the "++" example set out above. Thus, the approximate approach embodied in the extended lexeme table 168a-2 errs on the side of adding unnecessary separators.

As intimated above, this approximate and conservative approach does cause some editing problems, such as when a user tries to delete an unnecessary separator such as the one associated with the boundary between the token "A" on the left and the token ".1E5" to the right (in this case the editor 122 simple moves the cursor over the unnecessary separator). However, this table lookup method is a fast, resource-efficient approach that can be executed on the fly as the user of the editor 1 22
generates the event stream 113. Moreover, it should be noted that unnecessary separators occur rarely in practice in languages such as C and C++ because the adjacent tokens that would give rise to an unnecessary separator never legitimately occur next to each other.

Like the approximate approach, the exact decision procedure is rule-based, but rather than being token class-oriented, this latter approach evaluates the specific adjacent tokens according to a set of lexical rules to determine whether adjacent tokens could legitimately be joined. For example, the efficient approach would prescribe a separator between the tokens "A" and "1E5" but not between the tokens "A" and ".1E5". This approach eliminates editing side effects as described above, but is more computationally intensive.

2. Provisional Separators

Provisional separators are a temporary record that the user has just hit the space bar in a place, such as between two tokens, where there is no need of an additional separator. The keystroke executive 130 or tokenizer 132 makes use of these provisional separators based on the assumption that the user must have typed the space for a good reason, thus, a record of the space should be preserved, if only for a short time. This allows the user to enter a new token whose boundary with the previous token is ambiguous, in which event the keystroke executive 130 eventually converts the provisional separator to an ordinary separator. If the user types a new token whose boundary with the previous token is not ambiguous, the keystroke executive 130 merely eliminates the provisional separator. In the preferred embodiment of the editor 122, there can be only one provisional separator in the token stream 158, and it only exists immediately to the left of the insertion point (described later).

For example, assume that the user wishes to modify the text expression "a=b" to read "a.sub.-- c=b". In a text editor, a typical user might accomplish this change by placing the cursor to the left of the equals sign in the first expression, hitting the spacebar and typing a "c" following the space. However, in the editor 122, the event stream "a=b" is actually represented as the token stream 158 shown in Table 2 consisting of three tokens with no intervening separators.

TABLE 2 ______________________________________ position content class p/s/bit 158a 158b 158c 158d ______________________________________ i a vanable/61 0 i + 1 = equals- 0 arith.op./71 i + 2 b variable/61 0 ______________________________________

Thus, in the absence of a provisional separator, when the user places the insertion point 157 between the tokens "a" and "=" and strikes the spacebar, the keystroke analyzer 130 would make no corresponding change in the token stream 158. I.e., the space typed by the user would not be recorded or recoverable. This is because the insertion point 157 already corresponds to an implicit token boundary between unambiguous tokens. Thus, when the user goes on to type the "c", which textually has an ambiguous boundary with the token "a" (i.e., "a.sub.-- c" has a different textual meaning than "ac"), the keystroke executive 130 would not introduce the necessary separator at the boundary between the "a" and "c" tokens and instead would form the updated token stream "ac=b". However, as the present editor 122 assumes that the user must have typed the seemingly superfluous space for a reason, it provides a means (the provisional separator) for representing this kind of editing activity in the internal token stream 1 58, which allows the editor to change the provisional separator to an ordinary separator when the user enters the "c".

In the preferred embodiment of the present invention, the provisional separator is represented as a boolean provisional separator bit 158d (0/1>set, not set) in the insertion point data structure 157. Thus, in the present editor 122, there is only one provisional separator permitted in the token stream 158, and, if present, it is always understood to exist only to the left of the insertion point. This is because a provisional separator is an ephemeral object created by the editor 122 as a result of an immediately preceding input event. In another preferred embodiment of the present invention, the provisional separator is represented as an additional bit in each token in the token stream 158 indicating whether the token is followed by a provisional separator.

To allow the kind of editing behavior expected by users of text editors, the keystroke executive 130 or tokenizer 132 sets the provisional separator bit 158d whenever the user types a space at an existing token boundary where no separator is present. Thus, in the above example, the tokenizer 132 sets the provisional separator bit 158b when the user types a space between the "a" and "=" tokens; and the resulting token stream would read: "a.box-solid.=b" where ".box-solid." represents the provisional separator 158b at the insertion point 157). After the tokenizer 132 inserts the provisional separator in the token stream, the typographical display processor 170 will display the provisional separator as a space, but possibly in a way that is distinct from a displayed ordinary separator. Generally, a provisional separator 158b is ephemeral, being reset to 0 by the keystroke executive 130 and possibly replaced by an ordinary separator depending on the next user keystroke. In the example above, assuming that the user types a "c" at the provisional separator ".box-solid.", the tokenizer 132 will appropriately replace the provisional separator with an ordinary separator, meaning that the token stream 158 now reads "a.quadrature.c=b". The tokenizer uses the ordinary separator in this case as the token boundary between the tokens "a" and "b", which are both of the lexical class "variable/61", is ambiguous. If the user typed a plus sign "+" at the provisional separator instead of "c", the tokenizer 132 would remove the provisional separator completely as the resulting boundary between "a" and "=" in the expression "a+=b" would be unambiguous. Note that the tokenizer 132 will never enter more than one provisional separator between adjacent tokens into a flow.

C. Lexical Analysis Process

As set out above, maintaining the token stream 158 on the fly requires the keystroke executive 1 32, with the assistance of the tokenizer 132, to recognize all of the lexemes allowed by the language in which the program is being entered. Lexical analysis to this extent is well-known in batch-oriented tokenizer's, such as "Lex", which are executed on a program after the program has been entered. However, to maintain the token stream 158 on the fly completely and accurately, the keystroke executive 130 and the tokenizer 132 must also recognize the ill-formed and incomplete lexemes that are formed while a user of the editor 122 is typing.

Generating this kind of extended lexical information, e.g., whether a token is a legal or incomplete floating point expression, requires the keystroke executive 132 to analyze the impact of each user keystroke in the context of a small portion of the existing token stream 158 (called a token stream segment 138), insertion point 157, extended lexeme table 168a-2 and separator table 168a-1. Of course, the number of tokens in the token stream segment 138, and their positions relative to the current insertion point depends on the lexical characteristics of the programming language in which the program is being entered. For example, in C or C++, the keystroke analyzer 130 generally needs to analyze no more than the tokens adjacent the token containing the insertion point.

While the keystroke executive 130 can sometimes update the insertion point 156 and token stream 158 by itself (e.g., when a user deletes the single character of a single character token), in most situations it must consult with the tokenizer 132, which will suggest an update to the token stream segment 138 based on the extended lexical rules 168a-2. In such a consulting situation, the keystroke executive 130 queries the tokenizer 132 differently based on whether the current editing operation is an insertion or a deletion. In the case of an insertion, the keystroke executive 130 passes the tokenizer 132 a token stream segment 138 consisting of the token to the left of the insertion point and all of the tokens to the end of the current line of text, the character to be inserted, and the position at which that character is to be inserted, and asks the tokenizer 132 to recommend minimal changes to the token stream segment 138 that reflect the textual modification. In response, the tokenizer
132, which is a narrow expert in lexical analysis, knowing nothing of the insertion point 1 57, visual whitespace, or cursor behavior, performs an incremental lexical analysis on the token stream segment 138 by inserting the new character into the textual equivalent of the token stream (treating all separators as spaces) in the place indicated by the keystroke executive 130, and then analyzing the result as if it were text. Because the tokenizer 132 tries to make minimal changes to the token stream 138, in most situations changes to the token stream will not go beyond the token to the right of the insertion point. The tokenizer 132 then returns its suggested token stream update to the keystroke executive 130. It must also return the position of the inserted character relative to the updated token stream segment 138' so that the keystroke executive 130 can update the insertion point appropriately.

When the editing action is a delete, the keystroke executive 130 identifies the editable element in the token stream segment 138 the user wants to delete and asks the tokenizer 132, "if this element were to be deleted, how should the token stream segment 138 be updated?" As in the case of an insertion, the tokenizer 132 computes the minimal change to the token stream segment 138 and returns its suggested update and the offset of the deleted character relative to the updated token stream segment
138' to the keystroke executive 130.

The keystroke executive 130 is then free to accept or reject the suggestion based on higher level information to which only it has access. The keystroke executive 130 then writes the updated token stream 138' back to the token stream 158 and updates the insertion point 157. This process is described in greater detail in a pseudocode fragment in Pseudocode Insert A, included at the end of the detailed description.

As an example of a case where the keystroke executive 130 rejects the suggestion of the tokenizer 132 assume that a user about to create a comment has just typed the first character of a C-style comment beginning delimiter (i.e., "/"). The keystroke executive 130 passes this keystroke to tokenizer 132, which concludes that this keystroke is a C division token; therefore it suggests that the keystroke executive 130 replace the token stream segment 138 with an updated token segment 138' including the new division token. At this point, the keystroke executive 130 has no way of knowing that the user is typing a comment; therefore, it accepts the tokenizer's suggestion. Now assume that the user has just typed the second character of the comment beginning delimiter (i.e., "*"), which would look to an ordinary lexer like a C multiplication token. In the preferred embodiment of the present invention, the tokenizer 132 using extended lexical analysis rules is aware that it should join these two tokens together to form a special token ("/*") that is not in the language. This causes no confusion because neither C nor C++ allow a division token to be meaningfully adjacent to a multiplication token. Thus, the tokenizer 132 suggests that the keystroke executive update the token stream 158 by replacing the division token ("*/*") with the special token "/*". However, and this is an area that will be covered in depth later, a basic tenet of the editor 122 is that comments and any other non-program information be kept separate from the token stream 158, simplifying lexical analysis of the program 124. Consequently, the keystroke executive 130 rejects the replace suggestion made by the tokenizer 132 and instead deletes the "/" from the token stream 158 and calls the comment processor 136 to create a new comment. This commenting process is described below in greater detail.

The extended lexical type data generated by the tokenizer 132 are critical to the success of the present editor 122 as they provide information to the typographical display processor 170 that allows it to typeset the line being edited before it has been completed by the user and before it has been syntactically analyzed by the structural analyzer 150. For example, based on extended lexical analysis results, the prettyprinter 170 is able to typeset the incomplete floating point token "1.0E" or the illegal token "1.0E@" according to the display specification for a completed floating point expression, while displaying the extended lexical type information (e.g., that the token is incomplete or ill-formed) in such a manner as to assist the user in completing or correction the token (e.g., by highlighting the illegal token in red). This would not be possible in a conventional batch processor, which could not handle the torrent of incomplete tokens generated as the user enters the program.

Of course, the fact that the editor 122 can provide lexical information that is displayed by the prettyprinter as the user types would be of no value to users unless they could easily interact with and edit the program being displayed. Thus, to achieve a high level of user acceptance, users should be able to interact with the displayed program as if it were textually, not structurally, represented. This implies that cursor movement and whitespace representation in the present editor should approximate that in conventional text based editors for the most common editing operations. How the present editor 122 does this, in the context of updating the token stream 158, is set out below.

1. Token Stream Updating Methods for Text-Like Cursor Control

In the present editor 122, a user edits a computer program 124 by interacting with the displayed program through a cursor whose cursor position is defined by the current insertion point 157 in the token stream 158. The current insertion point
157 is maintained by the keystroke executive 130, which intercepts all of the user keystrokes and transforms them into the token stream 158; the typographical display processor 170 then communicates the current insertion point 1 57 to the user via a cursor at the appropriate place within the prettyprinted token stream 158 on the display 118.

In a text-oriented editor, such editing behaviors are a simple matter--the editor simply moves the cursor from character to character as the user edits the underlying text, which has an internal representation that, generally, corresponds one-to-one with the displayed program. This simple heuristic also applies to the situation when the user moves the cursor from one program word to another across one or more spaces, as spaces are editable elements in conventional text editors.

However, the present editor 1 22 internally represents the program 124 as a token stream 158, small bits of which (i.e., token stream segments 138) are updated on the fly by the keystroke executive 130. Moreover, there are no inter-token spaces in the token stream representation 158; instead, the tokens are separated either implicitly or by separators. Consequently, when the user edits the event stream 158, the keystroke executive 130 or the tokenizer 132, in determining how to update the token stream segment 138, must take into account (1) the user's keystroke/editing action (2) the position of the insertion point/cursor relative to the tokens and, possibly, separators, composing the token stream segment 138, and (3) the content 158b and lexical class 158c of the tokens in the token stream segment 138. Additionally, the keystroke executive 130 must update the insertion point appropriately, and the typographical display processor 170 must display the cursor in a screen position that (1) corresponds to the insertion point 157 and (2) is intuitive to the user. As lexical classes have been discussed already, this section will explain the methods and rules by which the keystroke executive 130 and/or the tokenizer 132 computes an updated token stream segment 138' and insertion point 157. The sole purpose of these methods and rules is to give the user the impression that they are editing a stream of editable elements in a conventional text editor.

The first issue in defining updating rules for the keystroke executive 130 is to consider the range of editing operations permitted by the editor 122. The present editor 122 supports three basic text-oriented editing operations. These three basic editing operations are: (1) inserting a non-space character; (2) deleting an editable element and (3) inserting a space. The editor 122 also supports basic navigating operations such as mouse clicks and cursor forward and cursor backward keystrokes. These navigating operations are handled in the obvious way, by moving the insertion point over the editable elements. Given that the token stream of the present invention includes tokens and separators, and that no two separators can be adjacent except in the situation where one of the separators is a provisional separator, the insertion point 157 can occupy many possible positions with respect to an individual token and the aforementioned separators. However, of these myriad insertion point positions, only the six basic positions shown in Table 3 are necessary to the following discussion of the token stream and insertion point updating performed by the keystroke executive 130 (i.e., Table 3 is not exhaustive). Each row of Table 3
describes the insertion point, shows its associated index, which will frequently be referred to in the following discussions, and also includes an example of a token stream segment ("n: in int") showing the respective insertion point 157. In these examples, "(", "n", ":", "in", "int" and ")" are all tokens and the vertical bar (".vertline.") marks the insertion position 157 and the corresponding cursor position. Ordinary separators and provisional separators are shown as a hollow box ".quadrature." and a filled box ".box-solid." respectively.

TABLE 3 ______________________________________ Basic Insertion Point Positions Position Example ______________________________________ 1: Within a token. (n: i/n.quadrature.int) 2: Between tokens on a line, (n: /in.quadrature.int) no separator. 3. Between tokens on a line, (n: in/.quadrature.int) at left of a separator. 4. Between tokens on a line, (n: in.quadrature./int) at right of a separator. 5. Between tokens on a line, (n: .box-solid./in.quadrature.int) at right of provisional separator. 6. Between tokens on a line, at left (n: in.box-solid./.quadrature.int) of a sep, at right of prov. sep. ______________________________________

Thus, the keystroke executive 130 must have access to the following information to unambiguously locate the cursor in the token stream 158:

(1) the token-oriented position (one of the six values from above),

(2) if the token-oriented position is "1", the token-stream position of the token in which the insertion point is positioned, or, if the token-oriented position is not "1", the token-stream position of the token to the right of the insertion position;

(3) the specific editable element to the right of the insertion point; and

(4) the provisional separator bit 158d.

Except for the provisional separator bit 158d, this cursor positioning information can be stored in a data structure accessible to the keystroke executive 130 or can be determined on the fly by the keystroke executive 130 from the editing context. The preferred embodiment of the present invention adopts the latter approach; however, for the purposes of this application, the four values (1) through (4) are illustrated as the token.sub.-- position 157b, the text.sub.-- stream.sub.-- position 157a, the editable.sub.-- element.sub.-- position 157c, and the provisional separator bit 158d of the hypothetical insertion point data structure 157. Having discussed the basic insertion point positions, the three methods by which the editor
122 executes the three basic editing operations are now set out. When reviewing the tables illustrating these various methods, please note that the editing outcomes shown in the tables reflect notions of what is natural cursor and editing behavior in text editing (the goal of the editor 122 being to provide text-like editing of structurally-represented programs). In most cases, the outcomes follow obviously from the editing operation. In a very small percentage of the cases, the outcomes reflect presumed user preferences which could be resolved differently depending on the user. Thus, these outcomes should be understood to reflect design choices made in the invention, other outcomes being equally consistent with the teachings of the present editor 122. Moreover, not all possible outcomes are shown, or if shown, discussed as the following discussions will focus on those editing outcomes that provide insight into important aspects of the keystroke executive's behavior. These editing methods are also described by the pseudocode included herein as Pseudocode Insert B, included at the end of the detailed description.

The keystroke executive 130 also includes heuristics that determine how the cursor position/insertion point should be updated following editing actions that cause a separator to be added or removed from the token stream 158. One such heuristic follows: when an editing operation results in the addition of a separator in the token stream, place the cursor behind the new separator, where "behind" means to the right of separators added to the left of a deleted character, and to the left of separators added to the right of an inserted character. The idea behind this heuristic is to keep the cursor from hopping past new separators to render the behavior of the cursor familiar to users of the editor. Of course, this and all other heuristics might not seem natural to users who do not share the same set of expectations about "natural" editor performance. Consequently, any number of cursor control heuristics can be written into the keystroke executive 130.

a. Inserting a Character

When the user types a non-space character, the keystroke executive 130 or tokenizer 132 inserts the character into the token stream segment 138 at a position corresponding to the insertion point 157; lexically analyzes the results, treating separators as immutable token boundaries; and computes changes to token boundaries and classes. The resulting updated token stream segment 138' is then written back to the token stream by the keystroke executive 130.

In this process, the lexical analysis step proceeds left to right, starting at the beginning of the left-most token touching the insertion point. For example, Table 4 shows how the tokenizer 132 responds to the insertion of the character "x" in representative flows at the various insertion points. Note that in every case shown, the insertion point is displayed to the right of the new character "x" when the operation has been completed.

TABLE 4 ______________________________________ Insert an "x" Position Example Editor Response Final Position ______________________________________ 1: Within a (n: i/n.quadrature.int) Change "in" to "ixn"; 1: token. reanalyze. "ix/n.quadrature.int" 2: Between, (n: /in.quadrature.int) Change ": in" to ":xin"; 1: no sep. reanalyze. "n: x/in.quadrature.int" 3. At left of a (n: in/.quadrature.int) Change "in" to "inx"; 3: separator. reanalyze. "inx/.quadrature.int" 4. At right of a (n: in.quadrature./int) Change "int" to "xint"; 1: separator. reanalyze. "in.quadrature./xint" 5. At right of a (n: .box-solid./in int) Change "in to "xin"; 1: prov. sep. reanalyze. "n: x/in.quadrature.int" 6. At left of a (n: in.box-solid./.quadrature.int) Analyze "x" by itself; 3: sep., insert new token "n: inx/.quadrature.int" right of a prov. sep. ______________________________________

Referring to Table 4, the "Final Position" in each case results from the example shown. Other final positions are also possible; for example, assuming that the cursor is in position 1, e.g., "i.vertline.n", and the user types the token "+" instead of "x", lexical analysis would result in an updated token stream segment 138' of "i+.vertline.n", consisting of three tokens with only implicit boundaries, where the final insertion point, or cursor position, is position 2 (between tokens--no separator).

Table 5 presents various examples in which the keystroke executive 130 would place the final cursor position in different final positions than those shown in Table 4. Unlike Table 4, these examples involve the insertion of different characters in different illustrative token stream segments. Thus, the "Example" column shows the existing token stream (bolded) and the user keystroke (in italics). The "Final Position" column shows the final cursor/insertion point position and the corresponding updated token stream segment, including the new insertion point marked with the ".vertline." symbol. Ordinary and provisional separators are shown as in Tables 3 and 4.

TABLE 5 ______________________________________ additional insert examples Final Position Example Editor Response Position ______________________________________ 1: Within a a/b; Change "ab" to "a+b"; 2: token. type "+" reanalyze. "a+/b"
2: Between, a/); Change "a)" to "a+)"; 2: no sep. type "+" reanalyze. "a+/)" 3. At left of a +/.quadrature.+ Change "+.quadrature.+" to 2: separator. type "+" "++.quadrature.+"; "++/+" reanalyze. 4. At right of a a.quadrature./b Change "a.quadrature.b" to 2: separator. type "+" "a.quadrature.+b"; "a+/b" reanalyze. 4. At right of a a.quadrature./.1; Change "a.quadrature..1" to 3: separator. type "a" "a.quadrature.a.quadrature..1"; "a.quadrature.a/.quadrature..1" reanalyze.
5. At right of a a.box-solid./); Change "a.box-solid.)" to 2: provisional type "+" "a.box-solid.+)"; "a+/)" separator. reanalyze. ______________________________________

In the first of the examples ("position"=1) in FIG. 5, the user has typed a plus token "+" between two editable elements of the token "ab". Consequently, when this token stream segment 138 is evaluated by the tokenizer 132, it is split into three tokens with implicit token boundaries (implicit because the equivalent text stream "a+b" is unambiguous regardless of intervening spaces). Thus, the tokenizer 132 suggests the updated token stream segment 138', where the insertion point is in position 2, between the "+" and "b" tokens with no separator. The second example ("position"=2) is similar, except in this case, the initial token stream consisted of two tokens "a" and ")" with no separator. Because the updated token stream 138' is equally unambiguous after insertion of the "+" token, no separator is required between the "+" and ")" tokens, meaning the final insertion point and cursor are still in position 2.

The third example ("position"=3) is interesting because it results in the elimination of a separator. In this case the user has placed the insertion point between the first "+" token and the separator (".quadrature.") and entered another plus "+". The tokenizer 132 recognizes the two plus characters "++" as a legitimate C/C++ arithmetic operator, so it joins the two into a single "++" token. The tokenizer 132 also recognizes, based on the contents of the separator table 168a-1, that no separator is necessary between the "++" token and the adjacent arithmetic operator "+". Thus, the tokenizer 132 returns an updated token stream segment 138' in which the separator has been removed and the insertion point is now between the "++" and "+" tokens; this is position 2. In the fourth example ("position"=4), the separator also has disappeared after the insertion. This is because, the ambiguity, thus the need for a separator, between the adjacent variables "a" and "b" is eliminated by the insertion of the plus token "+". Thus, the final insertion position in the updated token stream segment 138' is position 2, following the just-inserted plus token.

In the fifth example (also with "position"=4), the tokenizer 132 inserts the character "a" into the token stream segment "a.quadrature..vertline..1" and reanalyzes the resulting text string. Because, in C/C++, it might be possible to join the inserted identifier "a" with either an identifier (such as "a", to the left of the insertion point) or a floating point literal (such as "0.1" to the right of the insertion point), in the updated token stream segment 138', a separator is associated with the token boundary between the tokens "a" and "0.1". However, in this particular example the new separator is unnecessary as the particular tokens "a" and "0.1" could never legitimately be joined. In the sixth and final example, the provisional separator is present because, before the current insertion event, the user typed a space between the tokens "a" and ")", where no separator was otherwise required. When the user types the character "+" the tokenizer 132 deletes the provisional separator, which is no longer necessary given the unambiguous boundaries between the tokens "a", "+" and ")", and the insertion point lies in position 2, between the "+" and ")" tokens.

b. Deleting a Character

When the user deletes an editable element in the token stream 158, the action taken by the keystroke executive 130 depends on the editing context. When the user deletes the only character of a single character token, the keystroke executive 130
deletes the token from the token stream segment 138. When the user deletes the separator between two tokens, the keystroke executive 130 joins the adjacent tokens of the token stream segment 138 into a single token. When the user deletes a character from a multi-character token, the keystroke executive 130 removes the character from the token. In most cases, after deleting the character the keystroke executive 130 reanalyzes the resulting flow in the same fashion as for insertions, then writes the updated token stream segment 138' back to the token stream 158.

Table 6 shows how the keystroke executive 130 and tokenizer 132 respond to a "delete next character" command issued by the user when the cursor is at a position in the flow corresponding to the various insertion positions. A similar table is not shown for the "delete previous character" command as those results are nearly symmetrical to those shown below.

TABLE 6 ______________________________________ Delete Next Character Final Position Example Editor Response Position ______________________________________ 1. Within a A/BC Change "ABC" to "AC"; 1: token. reanalyze. "A/C" 1. Within a A/B+ Change "AB+"to "A+"; 2: token. reanalyze. "A/+" 1. Within a A/B.quadrature.C Change "AB.quadrature.C" to "A.quadrature.C"; 3: token. reanalyze. "A /.quadrature.C" 2. Between, no +/AB Change "+AB" to "+B"; 2: sep. reanalyze. "+B" 2. Between, no A/+B Change "A+B" to "AB"; 3: sep. reanalyze. "A /.quadrature.B" 3. At left of a A/.quadrature.B Change "A.quadrature.B" to "AB"; 1: separator. reanalyze. "A/B" 3. At left of a A/.quadrature..1 Change "A.quadrature..1" to "A.1"; 4: separator. reanaly