United States Patent5751985
Shen , ; et al.May 12, 1998

Title

Processor structure and method for tracking instruction status to maintain precise state

Abstract

Apparatus and method provide for tracking and maintaining precise state by assigning a unique identification tag to each instruction at the time of issue, associating the tag with a storage location in a first active instruction data structure, updating the data stored in the storage location in response to instruction activity status changes for each instruction, and maintaining a plurality of pointers to the storage locations that move in response to the instruction activity status. Status information includes an activity data item, such as an activity bit, that is set at the time the instruction is issued and cleared when execution completes without error. Pointers are established that point to the last issued instruction, the last committed instruction pointer, and reclaimed instruction pointer. These three pointers are moved forward toward the later issued (newer) instructions along the data structure based on comparisons of the active-bit for each location associated with one instruction in the data structure and predetermined rules. Exceptions or error conditions for any instruction prevent changing the active-bit so that movement of the pointers is controlled and prevented under these error conditions.


Inventors:Shen; Gene W. (Mountain View, CA), Szeto; John  (Oakland, CA), Patkar; Niteen A.  (Sunnyvale, CA), Shebanow; Michael C.  (Plano, TX)
Assignee:Hal Computer Systems, Inc. (Campbell, CA)
Appl. No.:487801
Filed:June 7, 1995

Current U.S. Class:712/218 
Field of Search:395/375,800,392,394

U.S. Patent Documents
4703481October 1987Fremont
4847755July 1989Morrison et al.
4893233January 1990Denman et al.
4903264February 1990Talgam et al.
4912707March 1990Kogge et al.
5003458March 1991Yamaguchi et al.
5003462March 1991Blaner et al.
5021945June 1991Morrison et al.
5075844December 1991Jardine et al.
5093908March 1992Beacom et al.
5193206March 1993Mills
5235700August 1993Alaiwan et al.
5261071November 1993Lyon
5271013December 1993Gleeson
5293499March 1994Jensen
5301309April 1994Sugano
5313634May 1994Eickemeyer
5313647May 1994Kaufman et al.
5355457October 1994Shebanow et al.
5442757August 1995McFarland et al.
5497499March 1996Garg et al.
5590295December 1996Deosaran et al.
Other References
Hwu et al.; Checkpoint Repair for High-Performance Out-of-Order Execution Machines IEEE Transactions on Computers; Dec. 1987; pp. 1496-1514..~
Primary Examiner: Treat; William M.
Assistant Examiner: Coulter; Kenneth R.
Attorney, Agent or Firm:Flehr Hohbach Test Albritton & Herbert LLP Ananian; R. Michael Knauer; Steven M.

Parent Case Text



RELATED APPLICATIONS

This application is a Continuation of U.S. patent application Ser. No. 08/398,299 for a PROCESSOR STRUCTURE AND METHOD FOR TRACKING INSTRUCTION STATUS TO MAINTAIN PRECISE STATE by inventors Gene W. Shen et al. filed Mar. 3, 1995 now abandoned; which is a Continuation of now U.S. patent application Ser. No. 08/390,885 for a PROCESSOR STRUCTURE AND METHOD FOR TRACKING INSTRUCTION STATUS TO MAINTAIN PRECISE STATE by inventors Gene W. Shen et al. filed Feb. 14, 1995, now abandoned.

U.S. patent application Ser. No. 08/388,602 for an INSTRUCTION FLOW CONTROL CIRCUIT FOR SUPERSCALER MICROPROCESSOR by inventor Takeshi Kitahara filed Feb. 14, 1995; U.S. patent application Ser. No. 08/388,389 for an ADDRESSING METHOD FOR EXECUTING LOAD INSTRUCTIONS OUT OF ORDER WITH RESPECT TO STORE INSTRUCTIONS by inventors Michael Simone and Michael Shebanow filed Feb. 14, 1995 now abandoned; U.S. patent application Ser. No. 08/388,606 for a METHOD AND APPARATUS FOR EFFICIENTLY WRITING RESULTS TO RENAMED REGISTERS by inventors DeForest Tovey, Michael Shebanow, John Gmruender filed Feb. 14, 1995 now abandoned; and U.S. patent application Ser. No. 388,364 for a METHOD AND APPARATUS FOR COORDINATING THE USE OF PHYSICAL REGISTERS IN A MICROPROCESSOR by inventors DeForest Tovey, Michael Shebanow, John Gmuender filed Feb. 14, 1995 now abandoned, are each hereby incorporated by reference in their entirety.

U.S. patent application Ser. No. 08/390,885 entitled PROCESSOR STRUCTURE AND METHOD FOR TRACKING INSTRUCTION STATUS TO MAINTAIN PRECISE STATE filed Feb. 14, 1995 by inventors Gene W. Shen, John Szeto, Niteen A. Patkar and Michael C. Shebanow now abandoned; U.S. patent application Ser. No. 08/398,299 entitled PROCESSOR STRUCTURE AND METHOD FOR TRACKING INSTRUCTION STATUS TO MAINTAIN PRECISE STATE filed Mar. 3, 1995 by inventors Gene W. Shen, John Szeto, Niteen A. Patkar and Michael C. Shebanow now abandoned; U.S. patent application Ser. No. 08/397,810 entitled PARALLEL ACCESS MICRO-TLB TO SPEED UPADDRESS TRANSLATION filed Mar. 3, 1995 by inventors Chih-Wei David Chang, Kioumars Dawallu, Joel F. Boney, Ming-Ying Li, and Jen-Hong Charles Chen; U.S. patent application Ser. No. 08/397,809 entitled LOOKASIDE BUFFER FOR ADDRESS TRANSLATION IN A COMPUTER SYSTEM filed Mar. 3, 1995, by inventors Leon Kuo-Liang Peng, Yolin Lih, and Chih-Wei David Chang now abandoned; U.S. patent application Ser. No. 08/397,893 entitled RECLAMATION OF PROCESSOR RESOURCES IN A DATA PROCESSOR filed Mar. 3, 1995 by Michael C. Shebanow, Gene W. Shen, Ravi Swami, and Niteen Patkar now abandoned; U.S. patent application Ser. No. 08/397,891 entitled METHOD AND APPARATUS FOR SELECTING INSTRUCTIONS FROM ONES READY TO EXECUTE filed Mar. 3, 1995 by Michael C. Shebanow, John Gmuender, Michael A. Simone, John R. F. S. Szeto, Takumi Maruyama, and DeForest W. Tovey now abandoned; U.S. patent application Ser. No. 08/397,911 entitled HARDWARE SUPPORT FOR FAST SOFTWARE EMULATION OF UNIMPLEMENTED INSTRUCTIONS filed Mar. 3, 1995 by Shalesh Thusoo, Farnad Sajjadian, Jaspal Kohli, and Niteen Patkar now U.S. Pat. No. 5,632,028; U.S. patent application Ser. No. 08/398,284 entitled METHOD AND APPARATUS FOR ACCELERATING CONTROL TRANSFER RETURNS filed on Mar. 3, 1995 by Akiro Katsuno, Sunil Savkar, and Michael C. Shebanow now abandoned; U.S. patent application Ser. No. 08/398,066 entitled METHODS FOR UPDATING FETCH PROGRAM COUNTER filed Mar. 3,1995 by Akira Katsuno, Niteen A. Patkar, Sunil Savkar, and Michael C. Shebanow now abandoned; U.S. Patent application Ser. No. 08/398,151 entitled METHOD AND APPARATUS FOR RAPID EXECUTION OF CONTROL TRANSFER INSTRUCTIONS filed Mar. 3, 1995 by Sunil Savkar; U.S. patent application Ser. No. 08/397,910 entitled METHOD AND APPARATUS FOR PRIORITIZING AND HANDLING ERRORS IN A COMPUTER SYSTEM filed Mar. 3, 1995 by Chih-Wei David Chang, Joel Fredrick Boney, and Jaspal Kohli; U.S. patent application Ser. No. 08/397,800 entitled METHOD AND APPARATUS FOR GENERATING A ZERO BIT STATUS FLAG IN A MICROPROCESSOR filed Mar. 3, 1995 by Michael Simone now U.S. Pat. No. 5,638,312; and U.S. patent application Ser. No. 08/397,912 entitled ECC PROTECTED MEMORY ORGANIZATION WITH PIPELINED READ-MODIFY-WRITE ACCESS filed on Mar. 3, 1995 by Chien Chen and Yizhi Lu, are each hereby incorporated by reference in their entirety.

U.S. application Ser. No. 08/457,049 entitled METHOD AND APPARATUS FOR ROTATING ACTIVE INSTRUCTIONS IN A PARALLEL DATA PROCESSOR by inventors Sunil Savkar, Michael C. Shebanow, Gene W. Shen, and Farnad Sajjadian filed Jun. 1, 1995; U.S. application Ser. No. 08/456,746 entitled PROGRAMMABLE INSTRUCTION TRAP SYSTEM AND METHOD by inventors Sunil Savkar, Michael C. Shebanow, Gene W. Shen, and Farnad Sajadian filed Jun. 1, 1995; are each hereby incorporated by reference in their entirety.

U.S. patent application Ser. No. 08/487,801 entitled PROCESSOR STRUCTURE AND METHOD FOR TRACKING INSTRUCTION STATUS TO MAINTAIN PRECISE STATE filed Jun. 7, 1995 by inventors Gene W. Shen, John Szeto, Niteen A. Patkar, and Michael C. Shebanow; U.S. patent application Ser. No. 08/478,025 entitled PROCESSOR STRUCTURE AND METHOD FOR AGGRESSIVELY SCHEDULING LONG LATENCY INSTRUCTIONS INCLUDING LOAD/STORE INSTRUCTIONS WHILE MAINTAINING PRECISE STATE filed Jun. 7, 1995 by inventors Gene W. Shen, John Szeto, Niteen A. Patkar, Michael C. Shebanow, and Michael A. Simone now U.S. Pat. No. 5,651,124; U.S. patent application Ser. No. 08/483,958 entitled PROCESSOR STRUCTURE AND METHOD FOR MAINTAINING AND RESTORING PRECISE STATE AT ANY INSTRUCTION BOUNDARY filed Jun. 7, 1995 by inventors Gene W. Shen, John Szeto, Niteen A. Patkar, and Michael C. Shebanow now U.S. Pat. No. 5,649,136; U.S. patent application Ser. No. 08/476,419 entitled PROCESSOR STRUCTURE AND METHOD FOR CHECKPOINTING INSTRUCTIONS TO MAINTAIN PRECISE STATE filed Jun. 7, 1995 by inventors Gene W. Shen, John Szeto, Niteen A. Patkar, and Michael C. Shebanow now U.S. Pat. No. 5,659,721; U.S. patent application Ser. No. 08/473,223 entitled PROCESSOR STRUCTURE AND METHOD FOR A TIME-OUT CHECKPOINT filed Jun. 7, 1995 by inventors Gene W. Shen, John Szeto, Niteen A. Patkar, and Michael C. Shebanow now U.S. Pat. No. 5,644,742; U.S. patent application Ser. No. 08/484,795 entitled PROCESSOR STRUCTURE AND METHOD FOR TRACKING FLOATING-POINT EXCEPTIONS filed Jun. 7, 1995 by inventors Gene W. Shen, John Szeto, and Michael C. Shebanow; now U.S. patent application Ser. No. 08/472,394 entitled PROCESSOR STRUCTURE AND METHOD FOR RENAMABLE TRAP-STACK filed Jun. 7, 1995 by inventors Hideki Osone and Michael C. Shebanow; and U.S. patent application Ser. No. 08/482,073 entitled PROCESSOR STRUCTURE AND METHOD FOR WATCHPOINT FOR PLURAL SIMULTANEOUS UNRESOLVED BRANCH EVALUATION filed Jun. 7, 1995 by inventors Gene W. Shen, Michael C. Shebanow, Hideki Osone, and Takumi Maruyama now U.S. Pat. No. 5,655,115, are each hereby incorporated by reference in their entirety.

Claims


We claim:
1. In a central processing unit for executing instructions issued by an instruction issue unit, said processor having a data store, an instruction issue unit, an instruction execution unit, and an instruction issue and execution scheduler; a method for tracking speculative instruction execution in a processor, said method comprising the steps of:
defining a data structure in the data store, the data structure providing storage for a plurality of instruction identification tags and a plurality of activity bits, each identifying an instruction active or inactive status, uniquely associated with each particular one of the plurality of instruction identification tags;
assigning an identification tag with its associated activity bit to each particular instruction issued by the instruction issue unit;
setting the activity bit stored in the data structure to a first state for a particular associated instruction to identify the particular instruction as an active instruction when the particular instruction becomes currently active in the processing unit; and
clearing the activity bit stored in the data structure to a second state for a particular associated instruction to identify the particular instruction as an inactive instruction when the particular instruction completes execution normally without error.

2. A method as in claim 1, wherein the data structure comprises a circular data structure having n-addressable locations 0, 1, 2, . . . , n-2, n-1; and wherein addressable location n-1 is logically adjacent addressable location 0; and
wherein the step of assigning an instruction tag to each particular instruction issued by the instruction issue unit comprises assigning an instruction tag with its associated activity bit to a unique one of the n-locations.

3. A method as in claim 2, wherein the identification tag is a numerical serial number and the step of assigning an identification tag with its associated activity bit to each issued instruction comprises assigning monotonically increasing serial numbers, modulo-n, to subsequently issued instructions as each instruction is issued, so that after n-1 serial numbers have been assigned to issued instructions the serial numbers are reassigned in circular manner and the next issued instruction is assigned serial number 0, and wherein an instruction issued earlier in time is an older instruction than an instruction issued later in time, and the serial number associated with the earlier instruction is an older serial number, independent of the numerical magnitude of the serial number.

4. A method as in claim 3, further comprising the steps of:
tracking the execution status of each issued instruction based on the state of its associated activity bit;
evaluating each activity bit in the circular data structure and identifying the oldest serial number associated with an active instruction and thereby determining the oldest issued and still active instruction; and
committing an instruction having a serial number older than the serial number associated with said oldest active instruction so that said instruction having an older serial number cannot be undone; and
reclaiming processor resources allocated to the committed instruction.

5. A method as in claim 4, wherein the reclaimed processor resources include the instruction serial number assigned to the committed instruction and the storage location in the data structure associated with the committed instruction.

6. In a speculative out-of-order execution processor, a method for tracking instruction status comprising the steps of:
providing an addressable data structure within the processor having a plurality of storage locations for storing instruction activity status data for each of a plurality of issued instructions;
assigning an instruction identification tag to each of the issued instructions;
associating each assigned identification tag with one of the plurality of storage location;
storing an activity data in the data structure identifying a particular issued instruction as active at the time the particular instruction is issued;
detecting completion of execution of the particular instruction; and
storing data identifying the particular instruction as inactive when the particular instruction completes execution without the occurrence of any one of a predetermined set of execution error conditions.

7. A method as in claim 6, wherein the predetermined set of execution error conditions includes an instruction execution error, an exception error condition, a hardware fault, and a branch instruction misprediction.

8. In a speculative out-of-order processor having an instruction issue scheduler an instruction issue unit, and an instruction execution unit for executing instructions to completion, a method for tracking issued instruction status comprising the steps of:
providing an n-location data structure for storing instruction activity status data;
defining a unique storage location address for each of the n locations in the n-location data structure;
associating each particular instruction issued by the instruction issue unit with one of the unique storage location addresses;
upon issuance of the particular instruction, storing instruction activity status data in the data structure at the data structure location associated with the particular instruction to indicate that the particular instruction is active status;
detecting instruction execution completion and any error condition associated with the instruction execution completion in the execution unit for each issued instruction;
if the instruction has completed execution without an error condition then updating the activity status data in the associated storage location to indicate that the instruction is inactive, and if the instruction has completed with an error condition or has not competed execution then retaining the original activity status data identifying the particular instruction as active in the associated storage location; and
communicating instruction activity status data to the instruction issue scheduler so that said instruction issue scheduler can schedule further instructions for execution without concern for the status of previously issued instructions which are inactive at that time.

9. A method as in claim 8, further comprising the steps of:
communicating the error condition associated with execution completion of any of the instructions to an error handler unit in the processor; and
modifying instruction issue and execution sequence scheduling in response to the communicated error condition.

10. In a speculative out-of-order execution processor having an instruction issue unit for issuing instructions and a data store for storing data within the processor, a method for tracking precise processor state comprising the steps of:
defining a first data structure having n addressable data storage locations in the data store of the processor;
allocating a plurality of unique instruction identification tags for subsequent assignment to issued instructions;
assigning one of the plurality of allocated unique identification tags to each particular instruction at the time the particular instruction is issued;
associating each instruction tag with one of the addressable storage locations in the first data structure within the processor;
for each particular instruction, updating the data stored in the data storage location associated with the particular instruction in response to instruction activity status changes for each instruction, said activity status changes including a change in status from an active status at the time of instruction issue to an inactive status at the time an instruction is completed without predetermined error; and
maintaining a plurality of pointers to selected ones of the n addressable data storage locations and moving the pointers to identify different addresses in response to the instruction activity status changes.

11. A method as in claim 10 wherein the step of maintaining a plurality of pointers comprises the steps of:
evaluating the data stored in the addressable storage locations to determine instruction status for each of a plurality of issued instructions;
computing a plurality of different processor condition indicators based on the data stored in the addressable storage locations, each different processor condition indicator identifying an instruction with its associated identification tag that has attained a particular execution status selected from a set of predetermined execution statuses; and
storing the processor condition indicators as the pointers in a third data store within said processor; and
wherein the step of moving the pointers in response to the instruction activity status changes comprises the steps of:
repeating said evaluating, computing of different processor condition indicators, and storing steps at predetermined time intervals to update said processor condition indicators.

12. A method as in claim 10, wherein the set of predetermined execution statuses includes the last issued instruction, the last committed instruction, and the last retired instruction.

13. A method as in claim 10, wherein the set of predetermined execution statuses further includes the last non-memory referencing instruction, and the last predicted branch instruction.

14. A method as in claim 10, wherein said status data includes an active-bit that is set at the time the instruction is issued and cleared when execution completes without error.

15. A method as in claim 14, wherein said plurality of pointers include:
a first pointer that points to the last issued instruction, and
a second pointer that points to the last committed instruction which is the last instruction that has completed without error and for which all sequentially earlier issued instructions have completed without error.

16. A method as in claim 15, wherein said plurality of pointers further include:
a third pointer that points to the last reclaimed instruction which is the last instruction for which allocated processor resources have been reclaimed.

17. A method as in claim 16, wherein each said identification tag is one of a monotonically increasing sequence of numerical serial numbers (modulo n) and said addressable storage locations in said first data structure are addressed by said serial numbers.

18. A method as in claim 17, wherein said first data structure having n addressable data storage locations is a circular data structure, and wherein advancement of said pointers to higher valued serial numbers is accomplished with modulo-n arithmetic so that incrementing a pointer from address n places said pointer at address 0.

19. A method as in claim 15, wherein said step of maintaining said pointers by moving said pointers in response to the instruction activity status changes comprises the steps of:
advancing said first issued instruction pointer forward toward higher instruction serial numbers (modulo-n) by one serial number for every instruction issued;
advancing said second committed instruction pointer forward toward higher instruction serial numbers (modulo-n) based on a serial number sequential evaluation of the state of the active-bit for each storage location and first predetermined rules but not advancing said second pointer beyond said first pointer; and
advancing said third retired pointer forward toward higher instruction serial numbers (modulo-n) based on a serial number sequential evaluation of the state of the active-bit for each location and second predetermined rules but not advancing said third pointer beyond said second pointer.

20. A method as in claim 19, wherein said first predetermined rules include:
at every predetermined number of machine clock cycles, advancing said second pointer forward to a new address location no higher than the highest data storage location address for which all activity status bits in lower address locations are inactive "0" but no higher than the address of said first pointer at that machine cycle.

21. A method as in claim 19, wherein said second predetermined rules include: at every predetermined number of machine clock cycles, advancing said third pointer forward to a new address location no higher than the highest data storage location address for which all activity status bits in lower address locations are inactive "0" but no higher than the address of said second pointer at that machine cycle.

22. A method as in claim 19, wherein said pointers are moved as a result of instruction issue, commitment, and retirement.

23. A method as in claim 19, wherein said second and third pointers are not moved past instructions for which exceptions, branch instruction mispredicts, and error conditions are detected.

24. In a speculative out-of-order execution processor having data storage means and means for scheduling instruction issue, dispatch, execution, and retirement; a method of tracking status of instructions comprising the steps of:
allocating n unique instruction serial numbers that define the maximum number of instructions that can be outstanding concurrently in said processor;
defining a data structure having n independently addressable storage locations in said data storage means within said processor;
associating each said allocated serial number with one of said addressable storage locations;
in response to instruction issue by said processor, assigning one of said serial numbers to each said issued instruction as an identification tag, and maintaining said assignment until said instruction has completed execution and has been retired so that a temporally unique correspondence is established between each issued instruction and a unique one of said data storage locations;
storing an ACTIVE status indicator in said data storage location associated with said instruction when said instruction is issued;
storing an INACTIVE status indicator in said data storage location associated with sized instruction when said instruction completes execution without error;
defining a plurality of machine pointers that point to said addressable storage locations, and
tracking instruction status and machine resources allocated to each instruction by moving said plurality of machine pointers in response to instruction activity status as ACTIVE or INACTIVE and predetermined rules.

25. A method as in claim 24, wherein said data storage location is single bit in a multi-bit register, said ACTIVE status indicator is a 1-bit, and said INACTIVE status indicator is a 0-bit.

26. A method as in claim 24, wherein said plurality of machine pointers comprise:
a first last issued instruction pointer that points to the last issued instruction;
a second last committed instruction pointer that points to the last committed instruction, which instruction is the last instruction that has completed without error and for which all in-order sequentially earlier issued instructions have completed without error; and
a third retire and reclaim pointer that points to the last instruction for which all machine resources allocated by the processor to the instruction and all in-order sequentially earlier instructions have been reclaimed.

27. A method as in claim 26, wherein said data structure is a circular data structure having n-addressable locations 0, 1, 2, ..., n-2, n-1 so that addressable location n-1 is logically adjacent addressable location 0, said serial numbers are consecutive integers, and said step of associating each said serial numbers with one of said addressable storage locations comprises associating the lowest serial number with a first bit position in said circular data structure and associating said highest serial number with the nth bit position in said circular data structure.

28. A method as in claim 27, wherein said step of tracking instruction status and machine resources allocated to each instruction by moving said plurality of machine pointers comprises the steps of:
initializing said first, second, and third pointers to the same initial storage location address prior to issuing a first instruction;
advancing said first issued instruction pointer (P1) forward toward higher instruction serial numbers (modulo-n) by one serial number for every instruction issued, that is P1.sub.new .rarw.(P1.sub.old +1 modulo n);
advancing said second committed instruction pointer forward toward higher instruction serial numbers (modulo-n) based on a serial number sequential evaluation of the state of the active-bit for each storage location and first predetermined rules but not advancing said second pointer beyond said first pointer; and
advancing said third retired pointer forward toward higher instruction serial numbers (modulo-n) based on a serial number sequential evaluation of the state of the active-bit for each location and second predetermined rules but not advancing said third pointer beyond said second pointer;
wherein said advancing of said pointers to higher valued serial numbers is accomplished in said circular data structure with modulo-n arithmetic so that incrementing any of said first, second, or third pointers from address n-1 places said pointer at address n, and incrementing said pointer from address n places said pointer at address 0.
29. A method as in claim 28, wherein said first predetermined rules include:
at every predetermined number of machine clock cycles, advancing said second pointer forward to a new address location no higher than the highest data storage location address for which all activity status bits in lower address locations are inactive "0" but no higher than the address of said first pointer at that machine cycle; and
wherein said second predetermined rules include:
at every predetermined number of machine clock cycles, advancing said third pointer forward to a new address location no higher than the highest data storage location address for which all activity status bits in lower address locations are inactive but no higher than the address of said second pointer at that machine cycle.
30. A method as in claim 29, wherein said new address locations are determined based on boolean operations of said status activity bits.
31. A method as in claim 29, wherein said first predetermined rules further comprise the step of limiting the number of addresses said second pointer can be advanced during said predetermined number of machine cycles to a first maximum number of addresses; and wherein said second predetermined rules further comprise the step of limiting the number of addresses said third pointer can be advanced during said predetermined number of machine cycles to a second maximum number of addresses wherein said second maximum number of addresses is less than said first maximum number of addresses.
32. A method as in claim 29, wherein said second and third pointers are not moved past instructions for which exceptions, branch instruction mispredicts, or error conditions are detected.
33. In a processor having data storage means, at least one instruction execution unit, and an instruction issue unit; an apparatus for maintaining precise state comprising:
means for receiving instruction execution status from said execution unit for each issued instruction, said instruction execution status including status that a particular instruction is either ACTIVE or INACTIVE, and status that if a particular instruction has completed whether the completion was without execution error;
means for tracking machine resource availability information based on the issue, execution, completion, and retirement status of instructions in said processor and of communicating said information to said instruction issue unit; and
means, in said instruction issue unit, responsive to said resource availability information for continuing issuance of instructions if machine resources are available to process further instructions and for stalling issuance of further instructions when machine resources are not available to process said instructions.
34. An apparatus as in claim 33, wherein said means for tracking machine resource availability information includes:
a data structure defined within said data storage means for storing issue, execution, completion, and retirement status information for each issued instruction, including an instruction activity status information identifying the current status of each instruction as ACTIVE or INACTIVE;
first logic means for determining the last issued instruction,
second logic means for determining the last committed instruction which is the last instruction that has completed without error and for which all sequentially earlier instructions have completed without error, and
third logic means for determining the last reclaimed instruction which is last instruction for which allocated processor resources have been reclaimed.
35. An apparatus as in claim 34, further comprising:
means for identifying predicted branch instructions issued by said issue unit;
means for detecting predicted branch instructions issued by said issue unit;
means for detecting executed predicted branch instructions that were mispredicted and for initiating recovery from said mispredicted branches; and
means for handling instruction execution exceptions according to predetermined exception handling rules.
36. In a central processing unit having a data store, an instruction issue unit, an instruction execution unit, and an instruction issue and execution scheduler; an apparatus for tracking speculative instruction execution in said processor comprising:
a data structure defined in said data store;
means for assigning an identification tag to each instruction issued by the issue unit in response to an instruction issue operation;
means for associating an activity data stored in the data structure and uniquely associated with each particular one of said issued instructions based on the assigned identification tag;
means for storing an activity data in the data structure to identify an ACTIVE instruction state when the instruction is issued; and
means for storing said activity data in the data structure to identify an INCATIVE instruction state when the instruction completes execution without error.

Description

TECHNICAL FIELD

The field of the subject invention concerns apparatus, systems, and methods for increasing processor performance while maintaining precise state in a speculative out-of-order execution processor.

BACKGROUND OF THE INVENTION

With processors, control flow instructions (such as branch instructions), latency due to memory transactions, and instructions requiring multi-cycle operations often prevent the processor from sustaining peak instruction execution bandwidth because "bubbles" are introduced into the pipeline. Performance can be improved by implementing speculative out-of-order Instruction execution to enhance performance. Conventionally, when an intermediate result from an instruction is not available for a subsequent instruction, the processor ceases execution, or "stalls" until that intermediate result is available.

Two techniques, speculative execution and out-of-order execution, help to maintain high execution bandwidth in modern processors. Speculative execution is a known technique in which, when a branch is encountered without the information from an earlier process step to select the appropriate branch, a prediction is made and instruction dispatch and execution proceeds based on that prediction. If the prediction later proves to be incorrect, the mispredicted instruction sequence must be undone using branch mispredict recovery. Speculative execution allows the processor to continue issuing and executing instructions. Known prediction schemes minimize the frequency of mispredicted execution in order to improve performance. However, maintaining precise state in a speculative machine is complicated and incurs undesirable overhead. Out-of-order execution is a technique that hides memory and multi-cycle instruction latency. In processors implementing out-of-order execution, instructions are dynamically reordered and executed in a different order than the sequential program order to reveal available instruction level parallelism.

A "precise exception" model has been shown to be an important feature for simplifying software resolution of exception conditions, but maintenance of precise exceptions is complicated in machines implementing speculative out-of-order execution. Machine state is generally processor specific and architectural state includes all control/status registers, data registers, address registers and all external memory state. For example, SPARC-V9 control/status registers are identified on pages 29-30 of the SPARC-V9 Architecture Manual. it is what the software and software programmer sees. Machine state is a super-set of architectural state that is processor specific and includes everything else about the state of the machine. A faulting instruction is an instruction which generates an exception. An exception is any situation or condition that tells the processor to stop and investigate the situation that caused the exception before proceeding. An exception need not be an error condition and includes interrupts for example. Execution traps may result from exceptions. In a processor implementing a precise exception model, a fault or exception does not modify architectural state. Architectural state has been modified for all instructions prior to the faulting instruction, but architectural state has not been modified for instructions after the faulting instruction. When a precise exception model is not provided, the software must identify the faulting instruction and then calculate a restart point for either retrying the faulting instruction or bypassing the faulting instruction and executing the next instruction.

Precise state maintenance techniques for short-pipelined, single-issue machines are known. Generally, a short-pipelined machine has fewer than about four or five stages including an instruction fetch, issue, execute, and write-back stage wherein state is modified. Single-issue implementations simplify recovery in the event of an exception or misprediction because the pipeline may be cleared without worrying about which instruction in a pipeline stage should be flushed. In these conventional techniques, any exception that may have occurred is detected prior to modifying architectural state. When an exception is detected, the pipeline is flushed of instructions and any writeback of data, status, or results to architectural state is intentionally blocked to prevent modification of architectural state.

In speculative out-of-order superscalar (multi-issue) implementations, maintaining precise state is much more difficult than for a single-issue machine. In such speculative out-of-order machines, instructions which generate errors may execute speculatively and method and structure must be provided to undo any architectural state modification which occur after the location of the fault. Also, exceptions may generally be detected in a different order than program order. Therefore, an out-of-order processor must be able to unscramble the exceptions and determine which instructions should be allowed to complete (and modify architectural state) and which instructions should be undone.

FIG. 1 shows a conventional approach using a re-order buffer for maintaining precise state in a speculative processor. The reorder buffer is implemented by a first-in/first-out stack that effectively introduces a temporal delay between instruction execution completion (when the result is available) and the time when machine state is modified as the result of execution completion. Precise state is maintained by preventing writeback to memory of a mispredicted branch instruction, execution exception, or like condition. Precise state maintained by preventing state modification while the instruction is still speculative rather than by restoring state to undo mispredicted speculative execution.

FIG. 2 shows a conventional approach for maintaining precise state in a speculative processor by storing state in a "checkpoint" prior to executing an instruction and later restoring state from the stored checkpoint information. In conventional checkpointing, every speculatively executed instruction that may result in modification of machine state is check pointed. Each processor has a set of state parameters, including for example, all control/status registers, data register values, and the like, that define the state of the machine. In order to restore a previous machine state, all of the state defining parameters must be stored so that they can be restored if and when the need arises. In conventional checkpointing techniques, every state defining parameter is typically stored for every instruction that may modify any one of the state defining parameters. For every checkpointed instruction, conventional checkpointing stores all state information that may be changed by any one of the checkpointed instructions, and not just the state that may be modified by that particular instruction about to be executed. For example, in a machine requiring 100 state parameters to define machine state, if execution of instruction "X" may modify only one control/status register, execution of that instruction will still require storing 100 state parameters, not just the one that may be modified by that instruction.

FIG. 3 shows the structure and content of an exemplary sequence of instructions and conventional checkpoints in diagrammatic fashion for purposes of illustration only and do not represent actual checkpoint data for any particular machine or CPU. Since conventional checkpoints are fixed size, this means that each checkpoint must be relatively large compared to the state actually modified by a particular instruction. Recovery using conventional checkpointing includes restoring state using a stored checkpoint closest to but prior to the faulting or mispredicted speculative instruction, backing up the program counter to just after the checkpointed instruction, and re-executing instructions from the checkpointed instruction forward.

Checkpointing, re-order buffers, history buffers, and the use of a future file have been described as ways to manage out-of-order execution and maintain precise state in some circumstances. Conventional Checkpointing which permits restoration of machine state at a checkpoint boundary, but not at an arbitrary instruction boundary, is described by Hwu and Patt (W. Hwu and Y. N. Patt, "Checkpoint Repair for High Performance Out-of-order Execution Machines", Proceedings of the 14th Annual Symposium on Computer Architecture, (June 1987), pp. 18-26.). Methods of implementing precise interruptions in pipelined RISC processors are described by Wang and Emneft ("Implementing Precise Interruptions in Pipelined RISC Processors", IEEE Micro, August 1993, pp.36-43.). Methods of implementing precise interrupts in pipelined processors are also described by Smith and Pleszkun (J. E. Smith and A. R. Pleszkun, "Implementation of Precise Interrupts in Pipelined Processors", Proceedings of the 12th Annual International Symposium on Computer Architecture, (June 1985), pp. 36-44.). An overview of conventional techniques for designing high-performance superscalar microprocessors is provided by Mike Johnson (M. Johnson [a.k.a. William Johnson], Superscalar Microprocessor Design, Prentice-Hall, Inc., Englewood Cliffs, N.J. 07632, 1991, ISBN 0-13-875634-1.). Each of these references are hereby incorporated by reference in their entirety.

Although at least some of these techniques improve performance, they are not entirely satisfactory because they either limit the degree of speculation or they allow only coarse machine state restoration not instruction level machine state restoration. Conventional re-order buffer techniques limit the degree of speculation because the re-order buffer length is linear with respect to the degree of speculation supported. That is, for each instruction that may have to be undone, the instruction execution result must be stored in the re-order buffer. For example, if the machine permits sixty-four outstanding and potentially speculative instructions, the re-order buffer must contain at least sixty-four locations for storing results. In conventional checkpointing, the number of checkpoints checkpointed are typically fewer than the number of outstanding instructions in the machine, but the amount of data stored in each checkpoint is very large (the entire state of the machine at the time). The checkpoint storage requirements place a burden on processor chip substrate area. Ideally, a scheme for maintaining precise state should be scalable, at a linear or lower order relationship (e.g. logarithmic) to larger numbers of concurrently outstanding speculative instructions. Reorder buffer entry storage areas must have sufficient width to store the data and these techniques also require associative look-up to all entries. This can be difficult for very large reorder buffers.

Furthermore, conventional methods of restoring precise state are incomplete in some environments. For example, where an instruction executed as the result of a speculative branch modifies the state of an external "dumb" device, where restoration of state typically involves restoring state at a checkpoint boundary prior to the point of the modification to the external device and then forward re-execution of non-faulting instructions including the instruction that modifies the external device. In such cases re-executing the faulting instruction alone will not undo the effect of the change in the state of the external device and the re-execution of the non-faulting instructions between the point where state is restored and the faulting instruction causes further problems.

Conventional checkpointing saves machine state at a limited number of different points in the instruction stream and restores the checkpointed architectural state in order to recover from branch mispredictions or execution errors. Conventional checkpointing does not checkpoint each instruction. Conventional checkpointing can restore machine state only at checkpoint boundaries, that is at the instruction for which a checkpoint was established. When a fault or execution error occurs, the checkpointed state is restored by known techniques, thereby in effect "undoing" all instructions subsequent to the checkpoint, and thus including the faulting instruction(s). Then the instructions are re-executed serially forward, for example in a single-issue mode, from the checkpointed instruction to reach the faulting instruction.

This conventional checkpointing technique allows speculative execution but is disadvantageous in many respects. For example, it does not provide robust exception recovery. On intermittent errors, conventional backup of the machine to the checkpoint prior to the exception creating instruction and then re-execution of the instructions following the checkpoint may not result in deterministic behavior and may compound machine state disruption by propagating erroneously altered state in the re-executed instructions. Conventional checkpointing and machine backup procedures do not attempt to minimize instruction re-execution. Furthermore, catastrophic machine errors such as hardware failures and machine time-outs may deadlock the processor if all of the instructions between the restored checkpointed instruction and the instruction causing the exception are re-executed.

Conventional "re-order buffers" manage speculative instructions in a re-order buffer which is a first-in-first-out (FIFO) memory structure of a predefined size. When an instruction completes execution, data value results from the execution are written into the re-order buffer and as they move through the buffer and emerge at the top, the data values are written from the re-order buffer into a register file. The size of the re-order buffer effectively defines the delay between completion of execution of the instruction and permanent modification of architectural state. Once the data values are written to the register they cannot be undone. "Associative lookup," a procedure by which entries in the re-order buffers are associated with entries in the register file, is discussed in M. Johnson, Superscalar Microprocessor Design at page 49 et seq.

Re-order buffer schemes have at least three limitations. First, in conventional re-order buffer schemes, only the results of instruction execution are saved in the re-order buffer and the Program Counter (PC) values are not saved. Therefore, any branch mispredict recovery using re-order buffers requires the additional steps of PC reconstruction, instruction fetch, and instruction issue. As a result, conventional recovery from a branch mispredict using re-order buffers is delayed.

Second, re-order buffers generally only allow speculative execution of a limited mix of instructions. For example, traps detected during the instruction issue stage (issue traps) may not be speculatively executed using re-order buffers because they generally involve control register updates. Re-order buffer techniques do not support speculative execution of an instruction involving control register updates. The inability to speculatively enter issue traps in certain instruction set architectures (e.g. "spill/fill" traps in the Sun Microsystems SPARC architecture) can impose significant performance limitations.

Third, re-order buffer size is generally a direct linear function of the number of outstanding instructions allowed in the processor. For example, in the processor that allows sixty-four outstanding instructions a reorder buffer having sixty-four entries would be required. Allocating the large number of buffer registers within the processor can be prohibitive, especially in dataflow processors where large active instruction windows, that is a relatively large number of concurrently outstanding instructions allow improved extraction of instruction level parallelism. Dataflow processors are processors where the order of instruction execution is determined by the operand or data availability, rather than based on incrementing the program counter as in conventional non-data flow processors.

Future files are a modification of the re-order buffer technique that avoids the associative lookup problem in the reorder buffer. Future Files are described in M. Johnson's Superscalar Microprocessor Design at pages 94-95. History buffers are a method and structure proposed for implementing precise interrupts in a pipelined scalar processor with out-of-order completion and is described in M. Johnson's Superscalar Microprocessor Design at pages 91-92.

"The SPARC Architecture Manual", Version 9, by D. Weaver and T. Germond, Englewood Cliffs (1994), which is hereby explicitly incorporated by reference in its entirety, describes a particular type of out-of-order speculative execution processor. The SPARC V9 architecture requires a Floating-Point State Register (FSR). The FSR contains 3 fields, the FSR.sub.-- accrued.sub.-- exception (FSR.aexc) field, the FSR.sub.-- current.sub.-- exception (FSR.cexc) field, and the FSR.sub.-- floating.sub.-- point.sub.-- trap.sub.-- type (FSR.ftt) field, which are updated when a floating point exception occurs and are used by a trap handling routine for handling a trap caused by the floating point exception. The updating of these fields is difficult in an out-of-order execution processor because instructions execute and complete in a different order than program order. Since these fields need to be updated or appear to be updated as if instructions are issued and executed in program order, an apparatus and corresponding method is required to track these exceptions and correctly update the FSR register.

For data processors which can execute instructions speculatively, branch direction (taken or not-taken), or branch address (target address or address next to the branch instruction) can be predicted before they are resolved. Later, if these predictions turn out to be wrong, the processor backs up to the previous state and re-starts executing instructions in the correct branch stream. However, most superscalar processors available in the market can evaluate only one branch per cycle to check whether or not branch predictions are correct. But, multiple predicted branches are often ready to evaluate in one cycle. Thus, branch evaluations which could otherwise be performed need to be delayed. Delaying branch evaluations affects the processor performance significantly.

Furthermore, in conventional speculative execution processors, when a trap occurs, the processor has to wait until all predicted branches have been resolved to make sure that the trap is real and not speculative. The easiest way for a processor to make sure that a trap is real is to synchronize the processor (i.e., execute and complete all instructions issued prior to the occurrence of the trap condition) before taking the trap. However, doing so for traps that occur frequently degrades the performance of the processor. This is especially true in the SPARC-V9 architecture where spill/fill issue traps and software traps (Tcc instructions) often occur. This problem needs to be resolved in order to increase processor performance.

SUMMARY OF THE INVENTION

The foregoing problems are solved by the method and structure of this invention which provides for tracking and maintaining precise state by assigning a unique identification tag to each instruction at the time of issue, associating the tag with a storage location in a first active instruction data structure, updating the data stored in the storage location in response to instruction activity status changes for each instruction, and maintaining a plurality of pointers to the storage locations that move in response to the instruction activity status. The status information includes an active-bit that is set at the time the instruction is issued and cleared when execution completes without error. Pointers are established that point to the last issued instruction (issued instruction pointer), the last instruction that has completed without error and for which all sequentially earlier instructions have completed without error (last committed instruction pointer), and the last instruction for which allocated processor resources have been reclaimed (reclaimed instruction pointer). These three pointers are moved forward toward the later issued instructions along the data structure based on comparisons of the active-bit for each location associated with one instruction in the data structure and predetermined rules. Exceptions or error conditions for any instruction prevent changing the active-bit so that movement of the pointers is controlled prevented under these conditions.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a conventional approach using a re-order buffer for maintaining precise state in a speculative processor.

FIG. 2 shows a conventional approach for maintaining precise state in a speculative processor by storing state in a checkpoint prior to executing an instructions and later restoring state from the stored checkpoint information.

FIG. 3 shows the structure and content of an exemplary sequence of instructions and conventional checkpoints in diagrammatic fashion for purposes of illustration only and do not represent actual checkpoint data for any particular machine.

FIG. 4 is a top level functional block diagram of an embodiment of the inventive data processor including a central processing unit.

FIG. 5 shows the stages of an embodiment of the novel instruction pipeline of the CPU for some typical instruction types.

FIG. 6 shows an exemplary Branch Block (BRB) of the inventive CPU which includes an instruction prefetch and cache unit and a fetch unit.

FIG. 7 shows an exemplary Issue Block (ISB) including an Issue Unit (ISU), a Precise State Unit (PSU), a Freelist Unit, and a Control Register File.

FIG. 8 shows exemplary signals received by the Issue Unit from the register renaming Freelist Unit and used to implement the inventive register renaming.

FIG. 9 shows an embodiment of the Precise State Unit (PSU).

FIG. 10 shows a functional block diagram of components of an embodiment of the Issue/Commit/Reclaim Unit (ICRU) and input and output signals associated with its operation.

FIG. 11 shows diagrammatic illustrations of embodiments of the Active Instruction Data Structure and Memory Instruction Data Structure for storing instruction status information and various pointers associated with a first set of exemplary data stored in the data structures.

FIG. 12 shows an exemplary Status Control Logic Unit including some structural components of A-Ring Bit Set/Clear Logic.

FIG. 13 is a flow-chart of an embodiment of the inventive method of using instruction tags to track status and maintain precise state.

FIG. 14 is a diagrammatic flow-chart of an embodiment of the inventive method for tracking instruction status to maintain precise state.

FIGS. 15A, 15B is a flow-chart diagram of the method for writing status information in the Active Instruction Ring and Memory Instruction Ring according to an embodiment of the inventive method for tracking instruction status to maintain precise state.

FIG. 16 shows an embodiment of the Register File and Rename Unit (RFRN) including a physical register file that includes physical registers.

FIG. 17 shows an embodiment of the Control Register file associated with register renaming.

FIG. 18 shows an embodiment of the Resource Reclaim File (RRF).

FIG. 19 illustrates the inventive register renaming method.

FIG. 20 is a diagrammatic flow-chart of an embodiment of the inventive method for maintaining precise state including instruction issue, deactivation, commit, and retirement.

FIG. 21 shows diagrammatic illustrations of embodiments of the Active Instruction Data Structure and Memory Instruction Data Structure for storing instruction status information and various pointers associated with a second set of exemplary data stored in the data structures.

FIG. 22 shows a functional block diagram of Load/Store Unit (LSU) within the Data Forward Block of the CPU.

FIG. 23 is a flow-chart of an embodiment of the inventive method for aggressively scheduling long latency instructions including load/store instructions while maintaining precise state.

FIG. 24 is a functional block diagram of Checkpoint Unit.

FIG. 25 is a functional Block diagram of Watchpoint Unit.

FIG. 26 is a diagrammatic flow-chart of an embodiment of the inventive method for checkpointing including an optional checkpointing enhancement for forming a time-out checkpoint.

FIG. 27 shows a functional block diagram of the major functional blocks within an embodiment of the Watchpoint Unit.

FIG. 28 shows an embodiment of the Watchpoint Unit showing particular embodiments of the Condition Code grabbing Logic, Evaluation logic, Watchpoint Elements Storage and Control Unit, and Target RAM.

FIG. 29 shows a schematic portion of an embodiment of Watchpoint, including Current Matching Logic, Array Matching Logic, and Condition Code Select Logic which includes Evaluation Ready and Evaluate Condition Code Logic.

FIG. 30 shows an exemplary timing chart associated with branch prediction and evaluation.

FIG. 31 shows structural detail associated with an embodiment of the Current Match Logic and Array Match Logic.

FIGS. 32A, 32B show structural detail associated with an embodiment of the Condition Code Select Logic.

FIG. 33 shows structural detail associated with an embodiment of the Evaluation Ready Logic.

FIG. 34 shows structural detail associated with an embodiment of the Evaluation True Logic.

FIG. 35 shows structural detail associated with an embodiment of the TARGET-RAM and Jump-Link Instruction Evaluation Logic.

FIG. 36 is a diagrammatic flowchart of an embodiment of the inventive Watchpoint method for plural simultaneous unresolved branch evaluation.

FIG. 37 is schematic representation of an embodiment of the Floating-Point Exception (FPEXCEP) Data Structure.

FIG. 38 show signal interfaces to the exemplary Floating-Point Exception Ring data structure.

FIG. 39 shows the floating point exception unit including the RD and AEXC logic.

FIG. 40 is a diagrammatic representation of an embodiment of the Backtrack Unit which includes Back-up and Backstep components.

FIG. 41 is a diagrammatic flow-chart of an embodiment of the inventive method for maintaining and restoring precise state at any instruction boundary including the backup and backstep procedures.

FIGS. 42A, 42B, 42C, 42D, 42E, and 42F show the manner in which the logical to physical mappings are restored in a register rename file.

FIG. 43 provides an illustrative example of a machine backtrack including a single machine backup to a checkpoint boundary followed by two machine backsteps to an instruction boundary.

FIG. 44 shows an exemplary Priority Logic and State Machine (PLSM).

FIG. 45 shows an exemplary Trap Stack Unit.

FIG. 46 shows an embodiment of Freelist Logic for storing a list of the storage elements of the Trap Stack.

FIG. 47 shows exemplary Rename Map Logic.

FIG. 48 shows an exemplary TrapStack RRF.

FIG. 49 shows the Trap Stack Checkpoint storage unit.

FIG. 50 shows Watchpoint WP.sub.-- ACTIVE.sub.-- VEC and WP.sub.-- MISPRED Logic.

FIG. 51 shows Trap Stack Backup Example.

FIG. 52 shows XICC Write Logic.

FIG. 53 shows XICC Grabbing Logic and Array Later Match Logic.

FIG. 54. shows Wait For Condition Code Logic.

DESCRIPTION OF SPECIFIC EMBODIMENTS

FIG. 4 is a top level functional block diagram of a data processor 50. The data processor includes a novel out-of-order speculative execution (superscalar) central processing unit (CPU) 51. The exemplary CPU is capable of executing the SPARC-V9
instruction set described in the SPARC-V9 Architecture Manual. "The SPARC Architecture Manual", Version 9, by D. Weaver and T. Germond, Englewood Cliffs (1994), which is hereby explicitly incorporated by reference. However, those skilled in the art will recognize that the novel design and methods described herein are not limited to the SPARC-V9 architecture.

Indeed, CPU 51 employs novel designs and methods for increasing throughput while maintaining precise machine state which are applicable to all data processors. Specifically, it employs (1) a novel design and method for tracking instructions to monitor and recover processor resources and state; (2) a novel design and method for aggressively scheduling long latency instructions such as load/store instructions; (3) a novel design and method for checkpointing machine state; and (4) a novel design and method for restoring machine state after detection of an excepton or misprediction.

An outline of the Description of Specific Embodiments Section of this Specification is provided here for convenient reference. The topical headers in this Specification are provided for convenient reference only, and are not to be interpreted as limiting the applicability of the disclosure provided in that section to any particular aspect or embodiment of the invention.

I. CPU OPERATION OVERVIEW

II. TRACKING INSTRUCTIONS

A. Instruction Fetch

B. Instruction Issue

1. Instruction Issue Overview

2. Instruction Serial Number Allocation and Instruction Status Storage

a. Instruction Issue-Commit-Reclaim Unit (ICRU)

b. Instruction Issue Related Pointers and Pointer Maintenance During Issue Stage

3. Register Renaming

4. Dispatch to Reservation Stations

C. Instruction Execution and Completion

D. PSU Participation in the Instruction Execution Stage

E. Instruction Commitment

F. Instruction Retirement and Resource Recovery

1. Updates to RRF and Rename Maps at Instruction Commitment and Retirement

2. Reading RRF at Instruction Retirement

3. Precise State Maintenance Using Pointers

III. AGGRESSIVE LONG LATENCY (LOAD/STORE) INSTRUCTION SCHEDULING

A. Memory Instruction Status Information Storage

B. NMCSN and PBSN Pointers for Tracking and Scheduling Long-Latency Information

C. Advancing NMCSN

D. Load-Store Unit (LSU) within Data Flow Block

1. Identifying In-Range Memory Referencing (long Latency) Instructions

2. Enhancement to the Basic Structure and Method For Aggressive Long Latency (Load/Store) Instruction Scheduling

IV. CHECKPOINTING

A. Structure of Checkpoint Allocation Register

B. Checkpoint Allocation

C. Checkpoint Retirement

D. Checkpoint Maintenance On Machine Backup

E. Timeout Checkpoint Enhancement

F. Maintaining and Restoring Precise State at Any Instruction Boundary

G. Method of Syncing Machine for Predetermined Instructions to Reduce amount of Checkpointed Data

H. Method of Checkpointing Register Rename Map to reduce amount of Checkpointed Data

V. RECOVERY FROM MISPREDICTS AND EXCEPTIONS

A. Misprediction Detection with Watchpoint for Simultaneous Plural Unresolved Branch/Jump-and-Link Instruction Evaluation

1. Activation of Watchpoint Elements

2. Watchpoint Grabbing of CC Data Directly from Data Forward Busses

3. Examples of Watchpoint Operation

4. Evaluation of Branch Instructions

5. Evaluation of JUMP-LINK (JMPL) Instructions

B. Exception Detection

1. Detecting Issue Traps

2. Detecting Execution Traps

C. Recovery by Backtracking the Processor to an Earlier State

1. Processor Backup to a Checkpoint Boundary

2. Mispredicted Instruction, RED Mode, and Execution Trap Initiated Backup

3. Processor Backstep to Any Instruction Boundary

D. Priority Logic and State Machine Operation During Exception & Misprediction Recovery

E. Handling of Traps with Trap Stack

I. CPU OPERATION OVERVIEW

CPU 51 fetches instructions from a level one (L1) instruction cache 52 and stores data in and retrieves data from a level one (L1) data cache 53. The L1 instruction and data caches may be physically positioned on the same chip/substrate as the CPU 51 or may be assembled on a different chip/substrate.

The CPU also interacts with a memory management control unit (MMU) 54 to determine the overall status of the data processor 50. Specifically, the MMU retrieves instructions from the external memory 56, stores data in and retrieves data from the external memory 56, receives data from and outputs data to the external I/O devices 57, provides diagnostic information to and receives diagnostic information from the external diagnostic processor 58, the CPU 51, and the caches 52 and 53, performs data address translations, and stores and tracks memory status information.

FIG. 5 shows the stages (and associated machine cycles) of the novel instruction pipeline of the CPU 51 for some typical instruction types, including predicted program control instructions, fixed and floating point instructions, and load/store instructions. To implement this pipeline, the CPU 51 includes a branch block (BRB) 59, an issue block (ISB) 61, and a data flow block (DFB) 62, as shown in FIG. 4.

Referring to FIGS. 4 and 5, the BRB 59 fetches instructions during the fetch stage and provides these instructions to the ISB 61. The BRB 59 makes target address predictions and branch taken or not taken predictions for various types of program control instructions. Thus, it is capable of speculatively fetching instructions during the fetch stage.

The ISB 61 issues the fetched instructions in predicted program counter (PC) order during the issue stage because the instructions may have been speculatively fetched. It also forms checkpoints of the machine state of the CPU 51 during the issue stage for instructions that fall at predefined intervals and for predefined types of issued instructions including program control instructions for which predictions were made by the BRB 59.

The BRB 59 executes and completes nop and predicted program control instructions in the same stage as which they are issued by the ISB 61. However, the DFB 62 executes and completes issued fixed point and floating point instructions as soon as the resources they require become available. Moreover, the DFB 62 aggressively schedules issued load/store instructions for execution when the resources they require are available and in such way that when they are executed and completed they do not have to be undone in the case of an exception (e.g., issue trap, execution trap, or interrupt) or program control misprediction. Thus, although some fixed point, floating point, and load/store instructions may have an earlier issue stage than others, they will have later execute and complete stages since the resources they require may not have become available yet. Moreover, they may have been speculatively executed and completed since they may have been speculatively fetched by the BRB 59 based on an earlier program control prediction by the BRB 59. In otherwords, instructions issued to the DFB 62 may be executed and completed out of predicted PC order.

When an instruction completes without an error occurring (e.g., issue trap, execution trap, or mispredict), it is then deactivated by the ISB 61 in the deactivate stage. Since instructions may complete out of predicted PC order, they may also be deactivated out of predicted PC order.

Deactivated instructions are then committed by the ISB 61 in actual PC order during the commit stage. This is true because a deactivated instruction may not be committed unless all preceding issued instructions were committed. As a result, once an instruction is committed, it may not be undone in the case of an exception or misprediction. This means that the actual PC order up to this point is correct and that the instruction has been committed in the actual PC order. Moreover, when an instruction for which a checkpoint was formed is committed, then the preceding checkpoint can be retired.

When an instruction is committed, it is then retired by the ISB 61 in actual PC order in the refire stage. After an instruction is retired, the resources allocated to it are reclaimed and may be re-allocated to other instructions when they are issued.

The ISB 61 handles recovery from an exception or program control misprediction that occurs at some point during the pipeline. Thus, if a program control misprediction occurs, then the ISB 61 backs up the CPU 51 to the checkpoint formed when the faulting program control instruction was issued. Similarly, if an execution trap occurs, then the ISB 61 first backs up the CPU 51 to the earliest checkpoint formed after the faulting instruction was issued, if such a checkpoint has been formed, and/or backsteps the CPU 51 to the faulting instruction. In backing up to a checkpoint and/or backstepping, the CPU 51 is returned to the correct machine state that existed just prior to the issuance and execution of the faulting instruction.

If the ISB 61 has backed up to a faulting instruction that caused a program control misprediction, then the BRB 59 begins fetching instructions with the correct program counter value and the correct machine state. But, if the ISB 61 has backed up and/or backstepped to an instruction that caused an execution trap, it provides the BRB 59 with the target program counter value for the appropriate trap handling routine. The trap handling routine then handles the trap and returns the program counter value of the faulting instruction or the next program counter value to the BRB 59 for fetching the next instruction. The program counter of the faulting instruction is returned if the trap handling routine instructs the CPU 51 to fetch the faulting instruction and retry issuing and executing it. But, the next program counter value is returned if the trap handling routine instructs the CPU 51 to fetch the next instructions after the faulting instruction so as to skip issuing and executing the faulting instruction.

II. TRACKING INSTRUCTIONS

As was alluded to earlier, the CPU 51 employs a novel design and method for tracking instructions in the instruction pipeline of the CPU 51. In order to implement this design and method, the pipeline includes the fetch, issue, execute, complete, deactivate, commit, and retire stages shown in FIG. 5 and discussed briefly earlier.

A. Instruction Fetch

Referring again to FIG. 4, the BRB 59 fetches instructions during the fetch stage and provides them to the ISB 61. As shown in FIG. 6, the BRB 59 includes an instruction prefetch and cache unit (IPCU) 100 and a fetch unit 102.

The IPCU 100 can retrieve four instructions (INSTs) from the level one (L1) instruction cache 52 at a time. The retrieved instructions are recoded by the instruction recoder 101 of the IPCU 100 to a format more suitable for use by the BRB 59, ISB 61, and DFB 62. In alternative embodiments, the IPCU 100 could be implemented to retrieve more or less than four instructions at a time.

The fetch unit 102 includes program counter (PC) logic 106 to compute a fetch program counter (FPC) value, an architectural program counter (APC) value, and a next architectural program counter (NAPC) value. The FPC, APC, and NAPC values computed each machine cycle are respectively stored in the FPC, APC, and NAPC registers 112-114 of the PC logic. The current FPC value in the FPC register points to the instructions being fetched in the current machine cycle while the current APC and NAPC values in the APC and NAPC registers 113 and 114 point to the first and next instructions which are available to be issued by the ISB in the current machine cycle and which were fetched in a previous machine cycle.

In response to the FPC value, four instructions (F.sub.-- INSTs) at a time are fetched by the IPCU 100. In alternative embodiments, it could be implemented to fetch more or less than four instructions at a time.

For each FPC value, the branch history table 104 contains a branch prediction field (BRP) for each instruction fetched with the FPC value. Thus, each of the BRP fields corresponds to one of the instructions fetched with that FPC value. If any of the fetched instructions are conditional branch instructions, such as the SPARC-V9 BPcc, Bicc, BPr, FBfcc, and FBPfcc instructions, then their corresponding BRP fields identify whether the branch should be predicted taken or not taken in the subsequent issue, execute, and complete stage of the branch instruction. The branch history table 104 outputs the BRP fields for the fetched instructions in response to the FPC value. The BRP fields are then appended to the corresponding fetched instructions (F.sub.-- INSTs) by the fetch unit 102 to provide composite instructions which are received by the fetch rotation register 110.

The fetch register has four latches for holding four instructions at a time. The latches define an issue window with four slots (0-3) containing, in predicted PC order, the next four instructions that can be issued by the ISU 200. However, some of the instructions held by the fetch register during the previous machine may have not been issued by the ISU 200 because of issue constraints such as those described shortly. To make sure that fetched instructions will be issued in predicted PC order, the ISU 200 generates fetch register control (FCH.sub.-- REG.sub.-- CNTRL) signals which control the fetch register to place the instructions that weren't issued in the last machine cycle in the appropriate latches so that they are in predicted PC order in the issue window and can therefore be issued in predicted PC order by the ISU 200. Moreover, in response to the FCH.sub.-- REG.sub.-- CNTRL signals, the fetch register places in the remaining latches those of the just fetched instructions that come next in predicted PC order.

In alternative embodiments, the fetch register could be implemented to store more or less than four instructions at a time. Moreover, although the exemplary CPU 51 has been described with one fetch register located in the fetch unit 102, a fetch register of the type just described could be used at each block in the CPU that performs decoding. And, the fetch registers could be placed in these blocks so that initial decoding takes place during the fetch cycle prior to the decoded instructions being stored in the fetch registers with the remainder of the decoding taking in the issue stage.

B. Instruction Issue

1. Instruction Issue Overview

The ISB 61 is responsible for issuing fetched instructions during each issue stage. As shown in FIG. 7, the ISB 61 includes an issue unit (ISU) 200, a precise state unit (PSU) 300, a freelist unit 700, and a control register file 800. During each issue stage, the ISU 200 receives four instructions (F.sub.-- INSTs.sub.-- BRPs) at a time from the fetch register 110 of the BRB 59. In response, it decodes them to determine how many of them to issue based on various issue constraints, such as those described next.

Since the fetch register 110 provides four instructions at a time to the ISU 200 during each issue stage, the ISU 200 can issue up to four instructions during each issue stage. But, it can only issue the four instructions in the issue window of each issue stage in predicted PC order. Therefore, if any other issue constraints require that one of the instructions in the current issue window cannot be issued, then only those of the instructions in the issue window slots prior to this instruction can be issued during the current issue stage. Since, in alternative embodiments, the fetch register 110 may be constructed to provide to the ISU 200 more or less than four instructions per issue stage, the ISU 200 may be constructed to issue up to more or less than four instructions per issue stage. Referring to FIG. 8, the FPU 600, FXU 601, FXAGU 602, and LSU 603 have reservation stations or queues for storing instruction data dispatched to them for issued instructions. However, during the issue stage, some of the reservation stations may not have enough storage elements or entries available to store dispatched instruction data. Thus, the FPU 600, FXU 601, FXAGU 602, and LSU 603 output ENTRIES.sub.-- AVAIL signals that respectively indicate how many entries are available in the reservation stations of the FPU 600, FXU 601, FXAGU 602, and LSU 603 to receive dispatched instruction data. As a result, the ISU 200 issues floating point, fixed point, and load store instructions based on how many entries are available as indicated by the ENTRIES.sub.-- AVAIL signals.

Moreover, in the exemplary CPU 51, the FXU 601 is the only execution unit in the DFB 62 that executes program control, multiply/divide, and control register read/write instructions that involve fixed point data computations. Thus, in this case, the ISU 200 issues these kinds of instructions for execution by the FXU 601. In alternative embodiments, the FXAGU 602 may also be configured to execute program control, multiply/divide, and control register read/write instructions that involve fixed point data. In this case, the ISU 200 would also issue these kinds of instructions for execution by the FXAGU 602. Moreover, each of the execution units 600-603 may comprise one or more functional units, with each functional unit capable of executing or performing an issued instruction. Thus, the number of instructions that can be issued depends on the number of functional units within each execution unit.

Referring again to FIG. 7, the PSU 300 may also impose issue constraints on the issuing of instructions by the ISU 200. The PSU 300 can output an ISSUE.sub.-- KILL signal that identifies which, if any, of the instructions received from the fetch register 110 should not be issued during the current issue stage. As will be explained in greater detail later, the ISSUE.sub.-- KILL signal is asserted when the PSU 300 is backing up or backstepping to a faulting instruction after an exception has been detected or during the execution, completion, deactivation, and retirement of a syncing instruction which is described next.

For particular types of instructions, the ISU 200 makes certain that the CPU 51 is synced (synchronized) prior to issuing one of these syncing (synchronizing) instructions. The CPU 51 is synced for a particular instruction when all of the instructions preceding it have been issued, executed, completed, deactivated, and retired. Thus, when the ISU 200 determines that one of the instructions received from the fetch register 110 is a syncing instruction, it waits until it has issued the instructions that are in the issue window slots ahead of it and until it receives a MACHINE.sub.-- SYNC signal from the PSU 300 indicating that the CPU 51 has been synced. When this occurs, the instruction is in the slot of the issue window for the instruction with the earliest PC value (i.e., slot 0) and the ISU 200 will only issue it. Although the exemplary CPU 51 implements the previously described method of syncing machine state for particular instruction types, an alternative technique would allow issue of syncing instructions after commitment of the previous instruction since a committed instruction is guaranteed to be non-speculative.

The ISU 200 also detects issue traps, such as those described in the SPARC-V9 Architecture Manual, which effect issuing of instructions. The ISU 200 receives control register (CNTRL.sub.-- REG) fields from the control register file 800 and decodes the instructions (F.sub.-- INSTs.sub.-- BRPs) received from the fetch register 110 to detect whether certain types of issue traps have occurred during an issue stage. In an embodiment based on the SPARC-V9 architecture, this is done in accordance with the SPARC-V Architecture Manual. Moreover, other types of issue traps do not require register control and are taken and detected based on the instruction opcode decoded by the ISU 200.

Only the issue trap caused by the instruction in the earliest issue window slot will be taken. Therefore, when one or more issue traps are detected, only the instructions in the issue window slots prior to the slot of the issue trap causing instruction in the earliest slot can be issued by the ISU 200. Moreover, for issue traps that occur often that occur often, the CPU 51 will be synced in the manner described earlier so that the issue trap is not taken speculatively. In alternative embodiment, the CPU 51 could be configured to allow the issue traps to only occur in slot 0 so as to reduce and simplify logic.

During each issue stage, the ISU 200 assigns (allocates) a serial number to each instruction that is issued and dispatches these serial numbers (SNs) to the DFB 62. As will be described in greater detail later, the PSU 300 keeps track of the issued instructions using the allocated serial numbers. Moreover, in the exemplary CPU 51, it can only keep track of a predetermined number of issued but not yet refired instructions at any one time. But, as issued instructions are retired, their serial numbers become available. Thus, the PSU 300 provides the ISU 200 with SN.sub.-- AVAIL signals that include a signal to indicate how many serial numbers are available for the current issue stage and signals to contain these available serial numbers. If, during the current issue stage, the SN.sub.-- AVAIL signals indicate that the number of serial numbers currently available are less than the number of instructions that can be issued, then only those instructions in the earliest slots for which serial numbers can be allocated are actually issued.

Also during the issue stage, the ISU 200 determines whether checkpoints of the machine state of the CPU 51 should be formed for any of the instructions. As will be described in greater detail later, checkpoints will be formed for certain types of instructions and at predetermined issue stage intervals. Thus, when the ISU 200 determines that any of the instructions it will issue require a checkpoint and/or the PSU 300 indicates with the TIMEOUT.sub.-- CHKPT signal that a predetermined number of issue stages have occurred since the last issue stage in which a checkpoint was formed, the ISU 200 generates a DO.sub.-- CHKPT signal instructing the PSU 300, control register file 800, the BRB 59, and the DFB 62 to form checkpoints of the machine state of the CPU 51.

Furthermore, the ISU 200 assigns a checkpoint number to each of the instructions that is issued and dispatches these checkpoint numbers (CHKPT.sub.-- Ns) to the DFB 62. In doing so, each instruction for which a checkpoint is to be formed is assigned a new checkpoint number while the instructions for which a checkpoint is not formed are assigned the checkpoint number of the previous checkpoint. The PSU 300 keeps track of the formed checkpoints using the assigned checkpoint numbers.

As with assigned serial numbers, the PSU 300 can only keep track of a predetermined number of formed but not yet retired checkpoints at any one time. However, since checkpoint numbers become available as checkpoints are retired, the PSU 300
provides the ISU 200 with CHKPT.sub.-- AVAIL signals that include a signal that indicates how many checkpoint numbers are available for the current issue stage and signals that contain the available checkpoint numbers. Thus, when the CHKPT.sub.-- AVAIL signals indicate that the number of new checkpoint numbers currently available are less than the number of instructions that require a new checkpoint to be formed, only those of the instructions in the earliest issue window slots for which new checkpoint numbers can be assigned are actually issued.

Furthermore, the CPU 51 may be configured to only form a predetermined number of checkpoints per issue stage. Thus, for example, if the CPU 51 can only form one checkpoint per issue stage and there are two instructions in an issue window that require a checkpoint to be formed, the ISU 200 will only issue the instruction which is in the earliest slot during the current issue stage and will issue the other instruction in another issue stage.

As will be discussed later, the CPU 51 employs register renaming to increase processor throughput. In order to properly implement register renaming, the ISU 200 receives REGS.sub.-- AVAIL signals from the register renaming freelist unit (FREELIST). Referring to FIG. 8, these signals include signals that indicate how many of the physical fixed point registers of the fixed point register file and rename unit (FXRFRN) 604 are available for renaming, how many of the floating point registers of the floating point register file and rename unit (FPRFRN) 605 are available for renaming, how many of the physical integer and floating point condition code registers in the floating point status and condition code register file and rename unit (FSR/CCRFRN) 606 are available for renaming. Referring back to FIG. 7, if any of the instructions received from the fetch register 110 require register renaming, and physical registers are required for renaming but are not available as indicated by the REGS.sub.-- AVAIL signals, then the ISU 200 cannot issue these instructions.

The foregoing issue constraints have been described with respect to the exemplary CPU 51 in order to demonstrate the concept of issue determination. Furthermore, implementations may exist which involve different and more or less issue constraints than those just described but do not depart from the basic concept of issue determination as described herein. Thus, those skilled in the art will recognize that issue constraints are implementation dependent and that the invention described herein is not to be limited by the issue constraints just described.

2. Instruction Serial Number Allocation and Instruction Status Storage

The exemplary CPU 51 assigns a unique identification tag that is associated with an instruction from the time the instruction is issued until the instruction completes execution and is ultimately retired. Sequential serial numbers are conveniently used as the instruction identification tags. The serial numbers are used during each stage and by virtually every block during exception-free processing and during CPU 51 recovery when exceptions or mispredictions occur. Instruction status Information is stored in CPU 51 and continually updated in response to changes in status, such as execution completion signals, execution error signals, and branch instruction mispredict signals, and the like. The Precise State Unit (PSU) 300 within ISB
61 is responsible for allocating the instruction serial number tags and for tracking instruction status in response to signals received from other CPU 51 units, particularly from DFB 62.

In reference to FIG. 9, Precise State Unit (PSU) 300 is a functionally within ISB 61 and is responsible for (1) tracking the issue, execution, completion, and retirement status of instructions in CPU 51; (2) detecting and initiating branch mispredict recovery; (3) tracking the availability of certain machine resources including instruction serial numbers, checkpoints, and watchpoints; and (4) general exception handling and control. Several units within PSU 300, as shown in FIG. 9 and as described in greater detail hereinafter, are implemented to accomplish this functionality.

PSU 300 includes an Issue/Commit/Retire Unit (ICRU) 301 which maintains activity status information for all issued instructions until they are retired. ICRU 301 supplies ISU 200 with a signal (SN.sub.-- AVAIL) that identifies up to four instruction serial numbers (SN's) for use by ISU 200 in the next instruction issue cycle. At the time ISU issues instructions, ISU 200 informs ICRU which of the up to four serial numbers provided (SN.sub.-- AVAIL) to ISU in the previous cycle were validly issued by ISU in the form of the ISSUE.sub.-- VALID signal that, as indicated earlier, identifies which serial number was allocated and which serial number was associated with each instruction. Recall that issue constraints may restrict ISU's ability to issue all of the instructions in the instruction window. ICRU 301 stores and maintains status information and address pointers to the status information to track instruction Issue, execution, completion, deactivation, commitment, and retirement phase status for every instruction that was validly issued by ISU 200, independent of the type of instruction. ISU 200 also informs ICRU 301 with signal ISSUE.sub.-- NOP which of the issued instructions are associated with nop and certain program control instructions, which allow deactivation in the same stage as which they are issued. ISSUE.sub.-- NOP is not limited to the specific "nop" instruction which may be inserted into the instruction stream, such as for example, to introduce a pause or delay. When aggressive scheduling of long latency instructions such as load/store and other instructions referencing memory external to the CPU 51, ISU 200 also informs ICRU 301 in the form of an ISSUE.sub.-- MEM signal that identifies which of the instructions in slot0-3 are long latency. (Aggressive scheduling of long latency instructions is separately described as an enhancement to the basic inventive structure and method.)

a. Issue/Commit Reclaim Unit (ICRU)

Issue/Commit/Reclaim Unit (ICRU) 301 is functionally within PSU 300 and assigns instruction tags, such as the n-bit sequential serial numbers, and maintains activity status information for all issued instructions by defining and maintaining a data storage area for the instruction status information in memory internal to CPU 51. FIG. 10 shows a functional block diagram of components of ICRU 301 and input and output signals associated with ICRU operation. ICRU 301 is functionally organized into four areas. Instruction Status Information Data Structure 308 is responsible for storing instruction status information and updating the status in response to signals Data Structure Control Logic 309 which itself is responsive to signals from other CPU 51 units, particularly DFB 62. Pointer Data Structure 310 includes data storage areas for separately storing a plurality of a serial numbers that serve as status pointers to various instruction stage milestones in the CPU 51, such as issued, completed, retired, and the like. These serial number pointers are described in greater detail hereinafter. Pointer Control Logic 311 is responsible for evaluating the status information stored in Data Structure 308 in conjunction with data received from other CPU 51 units in order to update the serial number pointer to reflect then current CPU 51 state. Pointer values are provided as global outputs from ICRU 301 and PSU 300 to other units in the CPU 51.

Instruction Status Information Data Structure 308 comprises Active Instruction Ring Registers (A-ring) 312 and may optionally comprise Memory Instruction Ring Registers (M-ring) 324. Memory Instruction Ring registers are described elsewhere in this specification in connection with Aggressive Load/Store instruction scheduling. Data Structure Control Logic 309 comprises A-Ring Set/Clear Logic 313 and may optionally comprise M-Ring Set/Clear Logic 325. Pointer Control Logic 311 comprises ISN/NISN Initialization and Advance Logic 321, Next Instruction Serial Number Allocation Logic 322, and CSN/RRP Advance Logic 323, and may optionally comprise NMCSN Advance Logic 326, Machine Sync Generation Logic 327, and Backtrack Mode Pointer Adjustment Logic 328. Optional elements pertain to enhancements of the basic inventive structure and method and are described more fully hereinafter.

Operation of Issue/Commit/Reclaim Unit during instruction issue stage is now described in reference to FIG. 10. The serial numbers (SN.sub.-- AVAIL) sent to ISU 200 for allocation to issued instructions are selected by serial number allocation logic in ICRU based on pointers values within Pointer Data Structure 310. When the CPU 51 is initialized (such as on power-up), pointers within Pointer Data Structure 310 may also be initialized so that ICRU initially allocates the first serial numbers to ISU 200 (e.g. SN's 0-3) for instructions in the issue window.

ISU 200 sends ISSUE.sub.-- VALID signal to Data Structure Control Logic 309 to inform ICRU which of the serial numbers provided in the previous cycle (e.g. which of the four SNs) where validly issued by ISU 200 and for which instruction slot. Set/Clear A-Ring Control Logic 313 uses this information and the values of pointers within Pointer Register 310 to write instruction status information Into A-Ring 312. In one embodiment of the CPU 51, ISSUE.sub.-- VALID is a four-bit vector that informs ICRU which of the instruction serial numbers (SN.sub.-- AVAIL) were actually used by ISU 200 by identifying instruction slot for which an instruction was actually issued. ICRU can then determine which SN's were used based on both SN.sub.-- AVAIL and ISSUE.sub.-- VALID.

An embodiment of Active Instruction Ring 312 is now described with reference to a diagrammatic representation in FIG. 11. The data structure in FIG. 11 is exemplary and those workers having ordinary skill In the art, in light of the teaching of this patent, will understand that a variety of data structures may be used to implant this Active Instruction Status Register. FIG. 11 shows Active Instruction Ring (A-ring) 312 which is implemented as a 64-bit data structure. (Memory Instruction Ring (M-Ring) 324 is also shown in diagrammatic form and will be described elsewhere in this specification with respect to an enhancement for aggressively scheduling long latency instructions.) Each of the sixty-four addressable locations in A-Ring 312
corresponds to an instruction serial number in the CPU 51. The set ("1") or cleared ("0") state of each Active-bit (A-bit) indicates whether the instruction is active ("1") or inactive ("0") in the CPU. Basically, instruction is actuated where it is issued and deactivated where it completes execution without error and was not mispredicted. For some instructions that are issued and effectively complete execution in a single machine cycle, the A-bit is cleared at instruction issue. The number of addressable locations in A-ring 312 limits the number of instructions that may be concurrently outstanding in the CPU.

In the exemplary processor, A-ring 312 is implemented as a circular, ring, or wrap-around data structure where pointers are moved along the data structure using "modulo-A-ring register length" type arithmetic (here, modulo-64 type arithmetic). That is, as a pointer is incrementally moved from a first data storage location (addressable bit 0) through a series of intermediate storage locations (e.g. locations 15, 16, . . . ) to the last data storage location (addressable bit 63), it will then return to the first location (addressable bit 0). Those of ordinary skilled in the art will, in light of the teachings of the present invention, realize that although a circular data structure is advantageous, it is not essential to the invention and other data structures implementing the features of the invention may be used. Furthermore, although the exemplary A-ring has sixty-four single-bit addressable locations to maintain the required status information, multi-bit addressable locations may alternatively be provided for storing these and additional status indicators. For example, in one embodiment, a 64-bit register with arbitrary rotation is used, and in a second embodiment eight 8-bit registers are used to provide the sixty-four A-ring
312 A-bits. A simple decoding circuit enables writing (set and clear) of the bits as described hereinafter.

"Instruction Issue" by the ISU 200 results in the allocation of an instruction serial number to the instruction. The number of allocatable serial numbers is limited by the number of addressable locations (e.g. bit positions) in A-ring 312. Each addressable A-ring location therefore corresponds to a unique serial number. For example, for the exemplary 64-bit circular A-ring, as many as 64 instruction serial numbers may be concurrently assigned. More or fewer serial numbers may be assigned by providing more or fewer A-ring locations. If all serial numbers are assigned, further instruction issue must be stalled by ISU 200 until serial numbers and locations on the A-ring are freed in subsequent machine cycles. ISU 200 is informed that SN are or are not available by the SN.sub.-- AVAIL signal. Each addressable location in A-ring 312 corresponds to one of the available serial numbers so that an entry in the A-ring is made for each newly issued instruction by advancement of an Issued Serial Number Pointer (ISN) 314 by one for each instruction issued. Since multiple instructions may be issued during a single machine cycle, more than one A-bit may be set during each cycle and therefore ISN may be advanced by more than one A-ring location each cycle.

"Instruction Dispatch" by ISU 200 is the activation and sending of an instruction (having an assigned serial number) for execution to DFB 62. All dispatched instructions are also issued instructions but not all issued instructions are dispatched. All "dispatched" instructions have the bit corresponding to the assigned serial number in A-RING 312 set equal to "1" to indicate the instruction is active in the machine (1=active, 0=inactive or deactivated). Instructions which are issued but not dispatched correspond to known branches or nop-type instructions and for which instructions, the A-bit is cleared (or not set) at issue. These known branches and nop-type instructions do not require an execution unit to execute, and may be issued, executed, and completed in a single stage.

b. Instruction Issue Related Pointers and Pointer Maintenance During Issue Stage

FIG. 11 also shows several pointers (ISN, NISN, CSN, RRP, NMCSN, and PBSN) that are stored in pointer storage memory area 310 and point to locations on A-Ring 312 and M-Ring 324. Each pointer is element in a pointer storage unit 210 that stores one of the A-Rings location valves 0-64. Certain combinations of pointers may point to the same A-Ring or M-Ring location but other combinations may not. For example, NISN can never equal ISN, but at machine "sync" CSN=ISN=RRP. Issued Serial Number pointer (ISN) 314 and Next Issued Serial Number pointer (NISN) 315 maintain and track instruction issue status and generally limit advantage of other pointers. Committed Serial Number pointer (CSN) 316 tracks instruction commitment which follows instruction execution, completion, and deactivation for which no pointers are specifically provided.

Retire and Reclaim Pointer (RRP) 317 tracks instruction retirement and controls resource reclaiming. Two additional pointers Earliest Predicted Branch Instruction pointer (PBSN) 318 and Non-Memory Committed Serial Number pointer (NMCSN) 319 are used to schedule and track execution of predicted branch instructions and long latency instructions such as load/store instructions. ICRU 301 provides current values for each of ISN, NISN, CSN, RRP, and NMCSN to a number of other units within the CPU
51. ICRU receives a value of PBSN 318 from Watchpoint Unit 308 within PSU 300 as described hereinafter. Each of these pointers are 30 implemented as a six-bit vector in the exemplary CPU.

Issued Serial Number pointer (ISN) 314 always points to the serial number of the last issued instruction. An instruction is considered to have been issued when one of the serial numbers supplied by ICRU has actually been allocated by ISU 200. ISN and NISN are incremented in response to ISSUE.sub.-- VALID from ISU 200 which informs ICRU of the number of instructions actually issued during that cycle. As ISN and NISN are incremented, the pointers effectively advanced around A-ring 312. The maximum number of positions the ISN can advance every machine cycle is determined by the peak issue rate of the machine, but may be limited to a smaller predetermined maximum number of instructions per cycle by other software of hardware controls.

For example, in the exemplary CPU advancement of ISN and NISN are limited to four instruction serial numbers (A-ring locations) per machine cycle. When the CPU is initialized ISN is initialized to 0 (zero) and when additional instructions issue ISN is advanced by the number of newly issued instructions.

Next Issued Serial Number pointer (NISN) 315 always points to ISN+1 (modulo A-ring length), so that if ISN=63, then NISN=0. Where A-ring 312 contains sixty-four entries, NISN=ISN+1 (modulo 64). This is essentially the serial number of the next instruction to be issued (all instructions are issued in predicted PC order). While it is convenient to provide a separate pointer NISN, ISN alone could be used in some embodiments since NISN is always one greater than ISN. When the CPU is initialized NISN is initialized to 1 (one) and when additional instructions issue, NISN is advanced by the number of newly issued instructions. Initialization and advancement of ISN and NISN are performed by ISNINISN Advance Logic Unit 321 in response to ISSUE.sub.-- VALID from ISU 200.

ICRU 301 provides ISU 200 with four available instruction serial numbers that may be assigned to instructions as they are issued. Determination of the proper four instruction serial numbers is made in the Next Instruction Serial Number Allocation Logic 322 within Pointer Control Logic unit 311 based on the current values of ISN 314 and NISN 315. RRP 317 may limit allocation of SNS if the CU is full, that is if all SNs have been allocated and not reclaimed. ICRU 301 informs ISU 200 of the next series of (four) serial numbers that should be assigned to the issued instructions if the serial numbers are available by sending SN.sub.-- AVAIL signals to ISU 200.

In one embodiment of CPU 51, SN.sub.-- AVAIL composes four seven-bit signals (SN.sub.-- SLOT.sub.-- 0, SN.sub.-- SLOT.sub.-- 1, SN.sub.-- SLOT.sub.-- 2, and SN.sub.-- SLOT.sub.-- 3) that identify the serial numbers to be assigned to the up to four instructions in slots 0-3 of the issue window. Six bits supply the serial number while the seventh bit is a serial number validity bit. If the validity bit for that serial number is not set, the instruction serial number is still active in the CPU
51 and should not be used by ISU. The validity bit may not be set if all sixty-four instructions supported by the sixty-four bit A-ring 312 are active in the CPU 51. Since ISU 200 can only issue an instruction if a serial number is available, ISU may have to wait for previously issued instructions to commit and retire before issuing additional instructions. Instruction commitment and retirement is described hereinafter.

FIG. 12 shows Status Control Logic Unit 309 in greater detail including the structure of A-Ring Bit Set/Clear Logic (ASCL) 313. ASCL 313 comprises Set Logic 331 and Clear Logic 333, as well as an Address Decode Logic Units 332, 334 associated with the Set and Clear Logic Units 331, 333, respectively. Set Logic Unit 331 receives ISSUE.sub.-- VALID signal from ISU 200, and the decoded NISN pointer address from Address Decode Logic Unit 332. A-ring Set Logic Unit 331 then sets A-bits corresponding to the number of instructions actually issued by ISU to "1" via A-ring write port 342. A-bits are cleared when instructions are deactivated and/or CPU reset. The number of write ports 342 is selected to support the maker of instructions that will be activated (or deactivated) each cycle. Other structures shown in FIG. 12 relating to M-RING 324 are discussed hereinafter in conjunction with aggressive scheduling of long latency instructions and with instruction commitment and retirement.

In one embodiment, ISSUE.sub.-- VALID is a four-bit vector received from ISU 200 where assertion (bit set "1" or high signal level as seated) at position bit-i indicates instruction in the i-th instruction issue window slot has been Issued. ICRU
301 knows which serial numbers are associated with which slot by virtue of the SN.sub.-- SLOT.sub.-- i signals described previously. Since instructions are Issued in sequential program order, there are only five possible states for ISSUE.sub.-- VALID in a four instruction issue CPU ("0000", "0001", "0011", "0111", and "1111"). ICRU also receives a "nop-type" instruction signal (ISSUE.sub.-- NOP) from ISU which informs ICRU that the instruction is a nop-type instruction as described. In one embodiment of the CPU 51, ISSUE.sub.-- NOP is a four-bit signal where assertion at position bit-i indicates i-th instruction in the slot is a "no-op"-class instruction. In implementing aggressive Local/Store Instruction scheduling, ISSUE.sub.-- MEM is a four-bit signal where assertion at position bit-i indicates i-th instruction In the slot is a memory referencing instruction.

Generally, a no-op instruction is either a control transfer instructions such as a branch, or a write ASR, write FPRS, or write PIL. The instructions "write ASI", "write FPRS" and "write PIL" are executed in the corresponding Control Register Unit 800 and not in DFB 62 and are therefore are treated as NOP-type instructions. Consequently, their corresponding ISSUE.sub.-- NOP is set to `1` leading to the A-bit be set to `0`. Tcc instructions are not issued when CC is not known. Tcc instructions are issued as a NOP instruction if CC is known and false. Tcc instructions are issued with ISSUE.sub.-- NOP=0 in other cases. These instructions do not require a DFB 62 execution unit to execute and generally complete in the same machine cycle as issued. When the corresponding instruction is of the "nop"-class such as a "branch" (br) instruction, ISSUE.sub.-- NOP is set to "1". When ISSUE.sub.-- NOP is set to "1", the corresponding A-bit is set to `0`, i.e. the instruction is instantly deactivated and committable. (Instruction commitment and modification of A-bits at commitment are described hereinafter.)

The sixty-four activity bits in the A-ring indicate the activeness of the instructions coexisting in the CPU. The A-bits do not need to be initialized because they will be set and cleared every cycle before they are used. The A-bits are set according to the ISSUE.sub.-- VALID, ISSUE.sub.-- NOP, and NISN.

Because the branch instructions have the A-bit cleared at issue, additional steps are taken to prevent CSN 316 (see description hereinafter) from advancing past a speculatively executed branch instruction. In the exemplary embodiment, a branch condition code is stored in Watchpoint Unit 304 at the time the branch instruction is issued. Watchpoint Unit 304 monitors instruction execution and when the speculatively issued branch instruction completes, Watchpoint compares the execution result with the branch condition code required to validate the speculative issue. Condition code comparison provides a determination as to whether the speculative execution was correctly predicted. If the speculative execution was incorrect, CPU 51 recovery methods (such as backup and/or backstep) are initiated to undo the execution thereby preventing CSN from advancing past the mispredicted branch in spite of the cleared A-bit, which would otherwise preclude recovery.

Special attention must be taken to initiate machine recovery fast enough to prevent instruction commitment of the Incorrectly issued instruction sequence. Prior to evaluation, the pending instruction itself prevents commitment of the mispredicted instruction sequence. Watchpoints and CPU 51 recovery are described in greater detail elsewhere in this specification.

Aspects of an embodiment of the inventive method, including the use of instruction tags to track instruction status and maintain precise state are illustrated in the flow-charts of FIGS. 13-15. FIG. 13 is a diagrammatic flow-chart of an embodiment of the method of using instruction tags to track status and maintain precise state. FIG. 14 is a diagrammatic flow-chart of an embodiment of the inventive method for tracking instruction status to maintain precise state. FIG. 15 is a flow-chart diagram of the method for writing and maintaining status information in the Active Instruction Ring and Memory Instruction Ring according to an embodiment of the present invention.

3. Register Renaming

To remove some register dependencies so that more instructions can be issued per machine cycle by the ISU 200, the CPU 51 performs register renaming during the issue stage. This may be done in a similar manner to that described in U.S. Pat. No. 5,355,457 issued on Oct. 11, 1994 to Shebanow et al. and/or that described in the copending U.S. patent application Ser. No. 08/388,364 entitled "METHOD AND APPARATUS FOR COORDINATING THE USE OF PHYSICAL REGISTERS IN A MICROPROCESSOR" referenced earlier and/or that described in M. Johnson's text Superscalar Microprocessor Design at pages 48-55.

Referring to FIG. 8, to implement register renaming, the DFB 62 includes the FXRFRN 604, FPRFRN 605, and CCRFRN 610 of the FSR/CCRFRN 606. As mentioned briefly earlier, the FXRFRN 604 contains physical fixed point registers for storing fixed point data, the FPRFRN 605 contains physical floating point registers for storing floating point data, the CCRFRN 610 contains physical integer (fixed point) and floating point condition code registers (XICC and FCCs) for storing integer and floating point condition code data. The FXRFRN 604, FPRFRN 605, and CCRFRN 610 may each be configured as shown in FIG. 16.

As shown in FIG. 16, the register file and rename unit (RFRN) 612 includes a physical register file 614 that includes physical registers. The RFRN also includes rename mapping logic 613 that maps logical and/or architected registers specified by an Instruction to physical registers in the physical register file 614. For example, with. SPARC-V9 instructions, the rename mapping logic of the FXRFRN 604 could perform a mapping of 32 architected to 80 logical to 128 physical fixed point registers, the 32 architectural or logical single and double precision floating point registers are mapped into 2 sets of 64 single precision (32-bit) renamed floating point registers (odd and even), the five architected or logical Condition Code Registers (xicc, fcc0, fcc1, fcc2, and fcc3) map to 32 renamed Condition Code Registers. The architectural to logical mapping is due to the fact that SPARC-V9 fixed point instructions specify architected registers that map to logical registers in accordance with the current window pointer (CWP) stored in the CWP register of the control register file 800 shown in FIG. 17.

The control logic 613 of the RFRN receives instructions (F.sub.-- NSTs.sub.-- BRPs) from the fetch register of the BRB 59 during each issue stage. In response, the control logic 613 decodes the instructions and determines which of the instructions requires physical registers as sources and/or destinations from the register file 614 of this particular RFRN.

The rename mapping logic 615 first maps each logical or architected source register specified by the instructions to the currently mapped physical register in the register file and provides the corresponding physical register tags to the reservation stations of the appropriate execution units (FPU 600, FXU 601, FXAGU 602, or LSU 603) in the TAGS.sub.-- R signals. For each physical register tag, the rename mapping logic has a data-valid (DV) bit that identifies whether the data in the mapped physical source register is available yet as a source. The DV bits are provided by the rename mapping logic 615 to the reservation stations of the appropriate execution units in the DV.sub.-- R signal. Moreover, if the data is available as indicated by the DV bits, then it is provided by the mapped physical source registers in the DATA.sub.-- R signals.

For each RFRN (i.e., FXRFR