Home
Patent Search
IMT Blog
REGISTER
|
SIGN IN
United States Patent
6535903
Yates , ; et al.
March 18, 2003
Title
Method and apparatus for maintaining translated routine stack in a binary translation environment
Abstract
A computer system for executing a binary image conversion system which converts instructions from a instruction set of a first, non native computer system to a second, different, native computer system, includes an run-time system which in response to a non-native image of an application program written for a non-native instruction set provides an native instruction or a native instruction routine. The run-time system collects profile data in response to execution of the native instructions to determine execution characteristics of the non-native instruction. Thereafter, the non-native instructions and the profile statistics are fed to a binary translator operating in a background mode and which is responsive to the profile data generated by the run-time system to form a translated native image. The run-time system and the binary translator are under the control of a server process. The non-native image is executed in two different enviroments with first portion executed as an interpreted image and remaining portions as a translated image. The run-time system includes an interpreter which is capable of handling condition codes corresponding to the non-native architecute. A technique is also provided to jacket calls between the two execution enviroments and to support object based services. Preferred techniques are also provide to determine interprocedural translation units. Further, intermixed translation/optimization techniques are discussed.
Inventors:
Yates; John S.
(Needham,
MA
)
, Tye; Steven Tony
(Hopkinton,
MA
)
Assignee:
Compaq Information Technologies Group, L.P.
(Houston,
TX
)
Appl. No.:
593778
Filed:
January 29, 1996
Current U.S. Class:
718/100
712/227
Field of Search:
709/100,102,107 712/227,226 717/138,139,140,136,126,124 714/38
U.S. Patent Documents
4928237
May 1990
Bealkowski et al.
5175856
December 1992
Van Dyke et al.
5179734
January 1993
Candy et al.
5247520
September 1993
Geise et al.
5287490
February 1994
Sites
5301325
April 1994
Benson
5303378
April 1994
Cohen
5307492
April 1994
Benson
5307504
April 1994
Robinson et al.
5313614
May 1994
Goettelman et al.
5317740
May 1994
Sites
5339238
August 1994
Benson
5339422
August 1994
Brender et al.
5375242
December 1994
Kumar et al.
5396631
March 1995
Hayashi et al.
5428786
June 1995
Sites
5432795
July 1995
Robinson
5450575
September 1995
Sites
5452456
September 1995
Mourey et al.
5457799
October 1995
Srivastava
5481684
January 1996
Richter et al.
5507030
April 1996
Sites et al.
5539907
July 1996
Srivastava et al.
5548717
August 1996
Wooldridge et al.
5574873
November 1996
Davidian
5579520
November 1996
Bennett et al.
5598560
January 1997
Benson
5636366
June 1997
Robinson et al.
5649203
July 1997
Sites
5652869
July 1997
Herdeg et al.
5652889
July 1997
Sites
5655122
August 1997
Wu
5740441
April 1998
Yellin et al.
5751982
May 1998
Morley
5764947
June 1998
Murphy et al.
5784638
July 1998
Goetz et al.
5805895
September 1998
Breternitz, Jr. et al.
5815720
September 1998
Buzbee
Other References
"Binary Translation", Richard L. Sites et al., Digital Technical Journal, pp. 137-152. .
"Adaptive Systems for the Dynamic Run-Time Optimization of Programs", G. Hansen, Carnegie-Mellon University, Mar., 1974, pp. 1-171. .
"Data Structures and Algorithms", A. Aho et al., Computer Science and Information Processing, pp. 110-117. .
"Compiler Design Theory", P. Lewis, II et al., pp. 559-568. .
"Method and Apparatus for Direct Conversion of Programs in Object Code Form Between Different Hardware Architecture Computer Systems", J. Goettelmann et al., pp. 3-37. .
"Data Flow Analysis for `Intractable` Imbedded System Software", H. Johnson, pp. 109-115. .
"Efficiently Computing Static Single Assignment Form and the Control Dependence Graph", R. Cytron et al., ACM Transactions on Programming Language and Systems, vol. 13, No. 4, Oct., 1991, pp. 451-490. .
"Compilers: Principles, Techniques and Tools", A. Aho et al., Computer Science, pp. 527-647. .
U.S. Ser. No. 08/580,686, filed Dec. 29, 1995. .
May, C., "MIMIC: A Fast System/370 Simulator," ACM Publication, p. 1-13, Jul. 1987. .
Halfhill, T., "Emulation: RISC's secret Weapon," Byte, Apr. 1994, pp. 119-130. .
"Selecting Predecoded Instructions with a Surrogate," IBM Technical Disclosure Bulletin, 36(06A):35-38, Jun. 1993..~
Primary Examiner:
Lao; Sue
Attorney, Agent or Firm:
Hamilton, Brook, Smith & Reynolds, P.C.
Claims
What is claimed is:
1. A memory for storing data for access by an application program being executed on a data processing system, the memory comprising: a first data structure enabling validation of when it is correct for execution of a non-native process to resume as a native process, the first data structure comprising: a series of homogenous frames, conforming to a same format within a single frame stack forming the first data structure, each frame comprising: a first field corresponding to a stack pointer address of the non-native process; a second field corresponding to a return address of the non-native process; a third field corresponding to a return address for the native process; and a second data structure corresponding to a non-native return address stack for the non-native process, the second data structure comprising: a plurality of entries corresponding to return addresses of the non-native processes.
2. The memory of claim 1 wherein
for each unique non-native called routine which is executed as a native called routine providing a return address, a corresponding native stack frame comprising the first field, the second field and the third field is provided, said fields providing a header field for the corresponding native stack frame.
3. The memory as recited in claim 2 wherein said native stack frame of said first data structure further comprises: means for providing storage for local variables associated with a state of a native machine executing a native routine in a native environment.
4. The memory as recited in claim 3 wherein said header field of said first data structure further comprises an entry corresponding to a dynamic link that identifies a previous native stack frame in the first data structure.
5. The memory as recited in claim 4 wherein said second data structure further comprises: means for providing local storage for arguments and variables passed between called and caller routines in the non-native routines.
6. A data processing system for converting a binary executable of a subject program written in a first instruction set for a first architecture to an executable in a second instruction set for a second architecture, comprising: a memory for storing data for access by an application program being executed on the data processing system, the memory comprising: a first data structure enabling validation of when it is correct for the subject program to execute as an interpreted process and when it is correct for the subject program to resume as a translated process, said first data structure comprises a series of homogenous frames, conforming to a same format within a single frame stack, each frame comprising: a first field corresponding to a stack pointer address of the interpreted process; a second field corresponding to a return address of the interpreted process; a third field corresponding to a return address for the translated process; and a second data structure corresponding to a non-native return address stack for the translated process, the second data structure comprising: a plurality of entries corresponding to return addresses of the translated process.
7. The system of claim 7 wherein said data structure further comprises: a fourth field corresponding to a dynamic link that identifies a previous native stack frame in the data structure.
8. The system of claim 7 wherein said data structure further comprises: a field for providing storage for local variables associated with a state of a native machine executing a native routine in a native environment.
9. A method of managing execution of an application program in a computer system which executes at times as a non-native process and at other times as a native process, comprises the steps of: providing a first data structure corresponding to execution of the application program as a non-native process, said first data structure including a series of homogenous frames, providing efficient access based on a predefined size, conforming to a same format within a single frame stack, each frame comprising: a field for return addresses and a pointer value corresponding to a stack pointer, and when the application program executes as a native process: using a second data structure including a series of homogenous frames within a single frame stack, which manages execution of the program as a native process;
wherein the step of using a second data structure comprises the steps of: storing in a first field of the second data structure a value corresponding to the stack pointer address of the non-native process; storing in a second field of the second data structure a value corresponding to the return address of the non-native process; storing in a third field of the second data structure a value corresponding to a return address for the native process.
10. The method of claim 9 further comprising the step of determining whether execution can continue as a native process when returning from execution of the program as a native process.
11. The method of claim 10 wherein the step of determining whether execution can continue comprises the steps of: testing that the non-native process was well behaved with respect to depth in a non-native return address stack by examining the value of a non-native stack pointer to determine whether the non-native stack pointer is equal to the contents of a non-native stack pointer field in the second data structure; and testing that the non-native process was well behaved with respect to return address by examining the contents of a non-native return address in the first data structure to determine whether it is equal to the contents of a return address in the second data structure.
12. The method of claim 11 wherein the step of determining further comprises the steps of: allowing execution of the application program as a native process if both testing steps are satisfied; and allowing execution of the application program as a non-native process if one or both of the testing steps are not satisfied.
13. The method of claim 12 wherein execution of the application program continues as a native process, as a result of the allowing step, until a return instruction is encounter that does not satisfy both of the testing steps.
14. The method of claim 12 wherein execution of the application program continues as a native process, as a result of the allowing step, until a return instruction is encounter that does not satisfy both of the testing steps or the native process calls a non-native routine that has not been translated into a native routine.
15. The method of claim 13 wherein execution of the application program continues as a non-native process if the application program can not execute as a native process.
16. The method of claim 15 wherein the non-native process is executed in an interpreter.
17. A method of managing execution of an application program in a computer system which executes at times as a non-native process and at other times as a native process, comprises the steps of: (a) providing a first data structure corresponding to execution of the application program as a non-native process, said first data structure including a series of homogenous frames, conforming to a same format within a single frame stack, each frame comprising: a field for return addresses and a pointer value corresponding to a stack pointer, and when the application program executes as a native process: using a second data structure including a series of homogenous frames within a single frame stack, which manages execution of the program as a native process; storing in a first field of the second data structure a value corresponding to the stack pointer address of the non-native process; storing in a second field of the second data structure a value corresponding to the return address of the non-native process; storing in a third field of the second data structure a value corresponding to a return address for the native process; and (b) determining whether execution can continue in the second data structure as a native process when returning from execution of the program as a native process, wherein the step of determining further comprises the steps of: testing that the non-native process was well behaved with respect to depth in a non-native return address stack by examining the value of a non-native stack pointer to determine whether the non-native stack pointer is equal to the contents of a non-native stack pointer field in the second data structure; and testing that the non-native process was well behaved with respect to return address by examining the contents of a non-native return address in the first data structure to determine whether it is equal to the contents of a return address in the second data structure.
18. The method of claim 17 wherein the step of determining further comprises the steps of: allowing execution of the application program as a native process if both testing steps are satisfied; and allowing execution of the application program as a non-native process if one or both of the testing steps are not satisfied.
19. The method of claim 18 wherein execution of the application program continues as a native process, as a result of the allowing step, until a return instruction is encountered that does not satisfy both of the testing steps.
20. The method of claim 18 wherein execution of the application program continues as a native process, as a result of the allowing step, until a return instruction is encountered that does not satisfy both of the testing steps or the native process calls a non-native routine that has not been translated into a native routine.
21. The method of claim 19 wherein execution of the application program continues as a non-native process if the application program can not execute as a native process.
22. The method of claim 21 wherein the non-native process is executed in an interpreter.
Description
BACKGROUND OF THE INVENTION
This invention relates generally to computer systems and more particularly to the execution of computer programs on non-native computer system architectures.
As is known in the art, computer systems which generally include a central processing unit, a main memory and input/output device interconnected by a system bus are used to execute computer programs to perform some useful task. One type of computer program is an operating system which is used to interface the central processing unit to one or more application programs. The aforementioned application programs are used by a user of the computer system to perform useful tasks. The operating system includes software resources needed by the computer system to interface each of the hardware elements to the computer system as well as to interface to the application programs. The application programs can include programs such as spreadsheets, word processors, electronic mail and so forth. The application programs execute on the computer system under the control of the operating system. In addition, the operating system also includes routines or libraries which the application programs use during execution.
It is generally known that application programs are compiled for a particular computer architecture, that is, a computer instruction set as well as a particular operating system. Exemplary computer architectures include the Alpha.RTM. architecture by Digital Equipment Corporation, assignee of the present invention, the so-called X86 architecture which is based upon a family of microprocessors designed and built by the Intel Corporation, as well as others such as the PowerPC.RTM. designed and built by Motorola, IBM and Apple, the VAX.RTM. architecture by Digital Equipment Corporation and the PARISC.RTM. architecture by Hewlett-Packard.
For the aforementioned Alpha architecture, the Alpha architecture supports the Windows NT.RTM. operating system by Microsoft Corporation, the OpenVMS.RTM. operating system by Digital Equipment Corporation and the OSF/UNIX.RTM. operating system. New architectures such as the Alpha architecture are developed in order to provide significant performance improvements.
One problem which occurs with a new architecture is that often programs written for an older architecture can not directly run on the new architecture because the instruction sets of the new architecture and the old architecture are different and the programs are not directly transferrable.
Several approaches have been developed to assist users to migrate from an old architecture to a new architecture. One such approach is so-called "on-line interpretation or emulation." In on-line interpretation or emulation, a software emulator module is used to provide an environment which emulates the instruction environment of the old architecture using the instruction set of the new architecture. The emulator or interpreter as it is sometimes referred to thus interprets instructions from an executable version of the application program written in a non-native instruction set and converts those instructions to an executable instruction or series of instructions or routines.
As part of the process, the interpreter tests the instructions to determine the resources needed by the instruction and analyzes the instruction to determine the function of the instruction. From this testing and analysis, the interpreter maps the instruction to a routine that performs the same function only written for instructions executable on the new architecture. The native instruction of routine is executed in the computer system to provide the equivalent function called for in the application program written in the non-native instruction set.
Interpreters are useful to convert application programs between architectures at run time. One of the significant drawbacks of interpreters, however, is that they are exceedingly slow. Thus, performance advantages of the fast or high performance architectures are often lost with using the interpreter to convert instructions from the old architecture to a new architecture.
A second technique commonly used is so-called "on-line binary translation." In on-line binary translation, a software program called a translator receives the non-native instructions of the old architecture and for each instruction provides an instruction sequence in the new architecture to accomplish what the original instruction does in the old architecture.
While a on-line binary translator is generally faster than an interpreter, one problem with an on-line translator is that often times non-native statements in the executable image are translated assuming that the statements were instructions when in fact the statements were not instructions but rather data or some other noninstruction information. While this drawback may not cause problems in the execution of the application program since the translator of non-native instructions are never reached. Nevertheless, this extra translation increases the size of the memory file required to store the translated image and reduces the overall performance of the translation. Furthermore, the translation of non-native code minimizes the opportunity to run the translator code through an optimizer and provide a more optimized routine for the translator.
Optimizers are compiler types of routines which generally operate on source code to determine opportunities for reordering and otherwise optimizing the source code to optimized object code. Optimizers have generally limited use with on-line translators because of the aforementioned problems regarding translation of non-native code. In particular, present optimization routines of translated code generally operate on a so-called "basic block" of instructions. A basic block of instructions can be viewed as a series of instructions which has a single entry point and which ends in a control transfer with a guarantee that none of the instructions between the entry point and the control transfer are themselves control transfers.
Thus, when an optimizer operates on a basic block of instructions, the opportunities for reordering and otherwise optimizing the executable code are limited to those opportunities which occur in the basic block. Thus, unlike the situation in compilers, an optimizer for an on-line binary translator is generally limited to optimization within a basic block.
SUMMARY OF THE INVENTION
An application program can be converted into a compound image which includes the original non-native image of the application program as well as selected, translated portions of the application program provided in native image code. Such a converted application program can be provided by a binary image conversion system which includes an on-line system which may be an emulator, interpreter or translator. In response to non-native instructions of an application program, the binary image conversion system provides a native instruction or a native instruction routine. The native instructions or routines are executed. In response to their execution profile data related to the execution characteristics of the native instructions and routines are collected. The image conversion system also includes a background translator system which in accordance with the profile data generated by the on-line emulator system, provides a translated and preferably optimized native image of the non-native application program image.
In general, the translated native image of the application program is executed on the computer system except for those portions of the program which do not have translated images. In those situations the non-native image of the application program is executed by the on-line system here an interpreter. Those portions of the non-native image for which execution characteristics of the instructions cannot be explicitly predicted or determined are not translated.
One problem associated with execution of an application program using native image and non-native image portions of the program is maintaining a stack of return addresses in the non-native image for routines which are executed in the native image without being invasive to the non-native image. That is, in the above arrangement when a computer system executes interpreted code at some point in the execution or subsequent executions, the computer system will be asked to execute translated native code. Once the translated native code is executed, it might not be possible for the computer system to call subsequent translated native code routines of the application program. This could occur if the translated image calls other translated images prior to returning to the non-native image or if the translated image branches to an unknown target, that is a computed branch or switch that is a non-PC relative target.
In each of these circumstances, therefore, it would require that the interpreter be used to interpret execution code. However, since the return address of the original instruction is not stored, it is impossible for the interpreter to return to execution of translated image code. This problem is most manifest in translated images which are optimized over more than a basic block. As mentioned above, a basic block of instructions is a sequence of instructions having a single entry point and a single control transfer with a guarantee of no control transfers occurring between the entry point and the control transfer point of the instruction. In background translator routines it will be possible to optimize translated images over larger regions than just basic blocks. That is, entire routines or possibly even application programs can be optimized. Therefore, it is highly likely that when a translated image of an application program is executed, a call into the native image code in the translator image will result in other calls into the translated image.
It would be unacceptable to permit the translated image to call other routines in the translator image unless a mechanism is provided to allow these other calls to return to the original execution point of the image. Otherwise, in order to return to execution of the program it would be necessary for the interpreter to provide return instructions to the non-native image.
BRIEF DESCRIPTION OF THE DRAWINGS
The foregoing features and other aspects of the invention will now become apparent when the accompanying description is read in conjunction with the following drawings in which:
FIG. 1 is a block diagram of a computer system;
FIG. 2 is a block diagram of a dual stage instruction conversion system including a run-time system and a background system;
FIG. 3 is a block diagram of the run-time system portion of the instruction conversion system of FIG. 2;
FIG. 3A is a flow chart depicting the steps performed at run-time to execute a non-native image on the system of FIG. 1;
FIG. 4 is a more detailed block diagram of a binary translator used in the background system portion of the conversion system of FIG. 2;
FIG. 5 is block diagram of a data structure representing a profile record structure;
FIG. 6 is a block diagram of a representative profile record of the profile record structure of FIG. 5;
FIG. 7 a diagram showing a typical arrangement for a instruction for a complex instruction set computer (CISC);
FIG. 8 is block diagram of a register file in the computer system of FIG. 1 showing assignment of registers corresponding to the non-native architecture;
FIG. 9 is a diagram showing a typical construct for one of the registers in the register file of FIG. 8
FIG. 10 is a pictorial representation of connections of various data structures including a dispatch table to determine an equivalent routine for the interpreter;
FIG. 11 is a pictorial representation of the process for activating an alternate dispatch table;
FIG. 12 is a diagram showing an arrangement of an entry from the dispatch table of FIG. 10;
FIG. 13 is diagram showing a typical arrangement of condition codes of a CISC architecture which implements condition codes;
FIG. 14 is a block diagram of an arrangement to determine evaluation routines for condition codes;
FIG. 15 is a block diagram of an arrangement to determine evaluation routines for current and previous values of condition codes;
FIGS. 16-18 are a series of diagrams useful in understanding how condition codes are handled in the run-time system of FIG. 3;
FIGS. 19 and 20 are diagrams showing relationship between address spaces;
FIG. 21 is a diagram of a context data structure used in the interpreter of FIG. 4;
FIG. 22 is a block diagram of a pair of data structures stored in memory which represents a return address stack for a non-native image of a program as well as shadow stack for a native image of the program;
FIG. 23 is a diagram showing the relationship between the data structures of FIG. 22 and execution of non-native and native routines with calls into corresponding non-native and native routines;
FIG. 24 is a diagram of a data structure including translated or native-image routines and call address translation table;
FIG. 25 is a diagram depicting the relationship of the routine call tables in the translated image and the shadow stack to the on-line and background systems;
FIG. 26 is a flow diagram of a typical application program instruction sequence used to illustrate aspects of the invention;
FIG. 27 is a block diagram showing an example of an object;
FIG. 28 is a block diagram showing an example of cross process calling of object methods;
FIG. 29 is a block diagram showing an example of an interface structure;
FIG. 30 is a flow chart showing an example of steps leading to the use of an object in an object oriented service system;
FIG. 31 is a flow chart showing steps in an example embodiment of a method for intercepting functions to perform interface structure replacement;
FIG. 32 is a flow chart showing an example replacement interface structure;
FIG. 33 shows an example embodiment of a template for a jacket function;
FIG. 34 is a flow chart showing steps performed in an example embodiment of a PBJA jacket function when called from non-native code;
FIG. 35 is a flow chart showing steps performed by an example embodiment of a PBJA jacket function when called from native code;
FIG. 36 is a flow chart showing steps performed by an example embodiment of a PAJB jacket function when called from native code;
FIG. 37 is a flow chart showing steps performed by an example embodiment of a PAJB jacket function when called from non-native code;
FIG. 38 is a block diagram showing an example of a system for load time processing to support interception of functions which take a pointer to an object as a parameter;
FIG. 39 is a flow chart showing an example of steps performed at run time to support interception of functions which take a pointer to an object as a parameter;
FIG. 40 is a flow chart showing an example embodiment of steps performed during general function jacketing;
FIG. 41 is a flow chart showing steps to determine and use translation units when performing a binary translation;
FIG. 41A is a flow chart showing steps to form translation units of a non-native binary image;
FIG. 42 is a flow chart showing steps of flow path determination;
FIG. 42A is a flow chart showing steps to determine transfer of control target locations for an indirect transfer instruction;
FIG. 43 is a block diagram showing two types of entries included in the profile statistics;
FIG. 44 is a flow chart showing steps for determining regions;
FIG. 45 is a block diagram of a list of code cells;
FIG. 46 is a diagram which shows the relationship between FIGS. 47 and 48;
FIGS. 47 and 48 are block diagrams which illustrate an arrangement of local data flow analysis information;
FIG. 49 is a block diagram of an opcode table;
FIG. 50 is a block diagram of a data flow analysis arrangement illustrating the use of read-modify and modify-write fields of the basic block value (BBV) data structure of FIG. 47;
FIG. 51 is a block diagram which depicts the BBSC summary information field of FIG. 48;
FIG. 52 is a block diagram of an arrangement comprising global data flow analysis information;
FIG. 53 is a more detailed block diagram of the global data flow connections of FIG. 52;
FIG. 54 is a block diagram of the control flow edge (CFE) data structure;
FIG. 55 is a flowchart that sets forth steps of performing a global data flow analysis;
FIGS. 56A and 56B are flowcharts that set forth method steps for determining merge points during global data flow analysis;
FIG. 57 is a block diagram of a global data flow analysis arrangement illustrating a merge point.
FIGS. 58A-58D are block diagrams depicting different variations of the binary image transformer;
FIG. 59 is a flow chart of steps of translating the binary image;
FIG. 60 is a flow chart of the step for one method for selecting the translation unit to be processed;
FIG. 60A is a representation of a call graph used in the method steps of FIG. 60;
FIG. 61 is a flow chart depicting an alternative method for selecting a translation unit to be processed;
FIG. 62A is a flow chart listing steps for forming an initial intermediate representation (IR) of a binary image;
FIG. 62B is a block diagram of a data structure illustrating a transformation of a source instruction to an IR with memory operands removed;
FIG. 62C is a block diagram of a data structure used to indicate whether an IR instruction corresponds to a machine instruction which can generate an exception;
FIG. 63 is a flow chart showing steps for translating and optimizing an initial IR to produce the final IR for a given translation unit;
FIG. 64 is a flow chart showing steps for performing condition code processing;
FIG. 65A is a block diagram of a bit mask associated with an IR instruction code cell used to represent condition codes that can be affected by the corresponding IR instruction code cell;
FIG. 65B is a block diagram which depicts an example transformation from source instructions comprising the first binary image as affected by condition code processing;
FIG. 66 is a flow chart depicting steps for register processing;
FIG. 67A is a block diagram which depicts a 32 bit register in an architecture which has partial register operands;
FIG. 67B is a block diagram which depicts a transformation of an initial IR as a result of register processing;
FIG. 68A is a block diagram which depicts a code pattern which is detected by early floating point optimization processing;
FIG. 68B is a block diagram which is a table indicating a replacement instruction for a specific code pattern detected in early floating point optimization processing;
FIG. 69 is a flow chart depicting steps for local basic block and global routine optimization processing;
FIG. 70 is a flow chart depicting steps of code selection and operand processing which place the IR in final form;
FIG. 70A is a flow chart depicting steps of intra image call processing;
FIG. 71A is a block diagram depicting a translated image comprising tables used in exception handling;
FIG. 71B is a block diagram depicting a table entry in a translator exception table; and
FIG. 71C is a block diagram depicting run time transfer of control when a translated image is executed and an exception occurs.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
Computer System
Referring now to FIG. 1, a computer system 10, is shown to include a processor module 12 which has a high performance processor 12a. The computer system 10 further includes, in addition to the processor module 12, a main memory 14, an disk adaptor 15 and an I/O user interface 18, as well as a monitor 19 all coupled by a system bus 13, as shown. Here the processor 12a is a high performance microprocessor such as an Alpha.RTM. microprocessor manufactured by Digital Equipment Corporation, assignee of the present invention, or other high performance processor.
The main memory 14 is comprised of dynamic random access memory and is used to store instructions and data for use by the microprocessor 12a on the processor module 12. The disk adaptor 15 is used to couple the system bus 20 to a disk bus which itself is coupled to disk storage device 17.
The disk storage device 17 is here illustratively partitioned into a plurality of segments or blocks of data which are here represented for convenience as being self-contained and contiguous, but which may be scattered across the disk 17 and be non-contiguous. The disk 17 includes a first storage segment 17a storing an operating system for the computer system 10 as well as an application program stored in segment 17b.
The application program stored in segment 17b is a non-native executable image. That is, the application program is comprised of instructions from a different instruction set than that used in the computer system 10 (i.e. a different computer architecture). Also the application program could have been written for a different operating system than that stored in 17a. Since the instructions provided in the program stored in segment 17b are different from the instruction set executed on the microprocessor 12a the program in segment 17b can not be directly executed on the system 10.
The disk also includes a storage segment 17c which here represents an native executable image of the application program stored in segment 17b. This native image is generated in the computer system via a binary image conversion system (16, FIG.
2) which is here stored with the operating system in the segment 17a as will be described. The image stored in segment 17c corresponds to instructions which can be executed on the microprocessor 12a and thus conforms to the architecture of the computer system 10.
Also stored in a segment 17d are profile statistics which are collected during execution of a portion of the non-native application program stored in 17b. The profile statistics are provided by execution of a run-time routine which converts non-native instructions into native instructions. These profile statistics are used in a background process to convert portions of the non-native image into a native image corresponding to the operation and function of those portions of the non-native application program. In addition, data which are used for the particular application program are also stored on the disk in segment 17e.
The computer system 10 further includes an I/O user interface 18 which is here an interface used to couple a mouse 19a, for example, to the system bus 20 as well as a monitor 19.
The computer system 10 operates in a generally conventional manner. That is, at "power on", selected portions (not numbered) of the operating system stored in segment 17a are loaded into main memory 14 and occupy a particular address space in main memory 14, such as, address space 14a. As a user of the computer system 10 executes application programs on the system 10, the application programs are run under the control of the operating system.
A typical operating system represented by that stored in 17a is the so-called Windows NT.RTM. operating system of Microsoft Corporation Redmond, Washington. In Windows NT.RTM. or other window type operating systems, displayable images called "icons" are presented to a user on the monitor 19. These icons represent an executable command to initiate execution of a program. When pointed to by a cursor controlled by a mouse, for example, and clicked on this user action activates the command and causes the represented computer program to execute.
Here, however, the application program stored in segment 17b is written in a non native instruction set. That is, the instruction set of the application program is not the same as the instruction set of the computer system 10. Thus, the executable image of the application program stored in segment 17b is comprised of non-native instructions which can not be directly executed on the computer system 10. Nevertheless, the non-native application has a corresponding icon (not shown) which is represented in the window provided by the operating system.
Each non-native application image has a unique identification name (ID) or image key. The identification name or image key is included in the non-native image file and is a unique identifier for the non-native application image. During installation of the file containing the image, typically a server process portion of the operating system determines the unique ID or key to the non-native application image. The ID number is generally assigned by concatenating together unique information of the file. Examples of the types of information include, the time stamp of the file, the file name, the file size and the date that the file was originally produced. Thus, the same non-native image if loaded a multiplicity of times on the computer system will have the same I.D. number. The statistics as well as the translated code associated with each one of the non-native images will be the union of all prior executions of the non-native images for each instance of the non-native application. Other arrangements are of course possible.
When the user clicks on the icon for the program stored in 17b, a portion of the operating system recognizes the ID of the executable image represented by that icon as being comprised of instructions that are non-native to the instruction set and architecture of computer system 10. In general a software module called a loader in the operating system will recognize that the identification name (ID) of the file represented by the selected icon as being non-native to the architecture. Thus, the operating system initiates the execution of an instruction conversion program 16 or feeds the file instruction by an instruction to an instruction pre-processor. Alternatively, a loader can be provided which handles the non-native image by examining the image to determine all files, libraries and resources needed by the image. The loader will thus prepare the non-native image for execution. Part of the preparation is the initiation of the instruction conversion program 16 or alternatively instruction pre-processor, as will now be described.
Binary Image Conversion System
Referring now to FIG. 2, the binary image conversion system 16 is shown to include a run-time system 32 which is responsive to instructions provided from the disk segment 17b. As mentioned, the run-time system 32 can be implemented as software to emulate the non-native architecture or as a hardware preprocessor to convert the non-native instructions into native instructions. When implemented as software, the run time system 32 consumes more disk space on disk 17 and occupies more main memory storage in main memory 14. Whereas, when implemented in hardware, the run time system 32 requires more chip space in the high performance microprocessor 12a. Here the run-time system will be described as a software implementation which operates in an execution address space 20 of the computer system 10.
As mentioned above, disk segment 17b stores instructions of an application program complied and/or written for an instruction set which is different from the instruction set of system 10. The run-time system 32 receives portions of a non-native executable image from segment 17b comprised of the non-native instructions. The run-time system 32 provides a native instruction or a native instruction routine comprised of a plurality of instructions which are executed by the computer system 10 to provide the same functionality as the non-native image. That is, the functionality called for in the instruction in the executable image of the non-native instruction set is equivalently provided by the routines determined by the run-time system 32. The run-time system executes the equivalent routines on the computer system 10. This provides the equivalent function to provide the same result in computer system 10 which implements the new architecture as would occur in a new or old computer system (not shown) implementing the non-native architecture.
In a preferred embodiment of the run time system 32, the run-time system 32 examines and tests the code from the segment 17b to determine what resources are used by the instruction and the function of the instruction. The run-time system 32
provides the equivalent instructions corresponding to the architecture of the computer system 10.
As the equivalent instructions are determined they are executed in the system 10 and profile data or statistics, as will be described, are collected in response to their execution. The profile statistics describe various execution characteristics of the instruction sequence. These profile data are fed to a server process 36 via a datapath 32b.
Prior to performing a conversion by the run time system 32, the run-time system 32 interrogates the server process 36 via a path 32a to determine from the server process whether there is a native image corresponding to the routine of the application program stored in segment 17b whose execution has just been requested by a user. If a native image does not exist (as would occur the first time the non-native image is executed), the run-time system initiates an interpretation process. If there is code in existence for the particular instruction reached in the application program, due to a prior execution in the run-time system and subsequent conversion by a background system, the run-time system 32 will request and execute the native code.
As mentioned, in general, the first time the application program 17b is executed by a user there will be no native image code in existence. As the program executes, however, native code will be generated by the background process in a manner to be described, and over time as substantial portions of the non-native image are executed, convertible portions of the non-native image will be converted by the background process into native image code. As native image code is generated, it is also stored in segment 17c in a manner that is transparent to the user.
In addition the native image file 17c contains an address correlation table which is used to track the segments of native code corresponding to segments of non-native code. This table is used at run time of the program in segment 17b to determine whether and which non-native segments have equivalent translated native segments.
Translation into the native image is provided via a background system 34 which operates in one embodiment after the interpreter has finished execution of the instructions to provide translated code dependant upon the execution characteristics of the run-time converted instructions. Alternatively, the background system operates while there is a pause in CPU utilization by the run-time system 32. Alternatively, the background system can make translated code available to the run-time system 32
during execution to permit substitution of translated code for a subsequent occurrence of the non-native image during the current execution of the application program. Further still, the run-time system can be implemented as a preprocessor which provides the profile statistics for use by the background process. The background process can be implemented in hardware or software or a combination of both.
The background system 34 receives the profile data generated by the run-time system 32. In accordance with the characteristics of the profile data, the background system 34 forms a native image of at least portions of the instructions of the non-native image stored in segment 17b of disk 17. A preferred arrangement is to have the background system implemented as a binary translator to produce translated code. The native image portions are stored in logical disk drive 17'for use if needed in subsequent executions of the application program from segment 17b. Here it should be understood that the logical disk drive 17' is a logical partition of the disk drive 17 and is here referred to as being a logical disk drive, because in general, it is transparent to the user, but it physically represents space storage such as segment 17c on the actual disk drive 17. Alternatively, the logical disk drive 17 could be a separate disk drive.
The run-time system 32 and the background system are each under the control of the server process 36. The server process 36 is present throughout the operation of the computer system 10. The server process 36 is a software service process which, amongst other things, is used to schedule various transactions within and between the run-time 32 and background systems 34.
After generation of native image code such as by the binary translator, the image translated code is stored on logical disk drive 17' in logical segment 17 with the profile statistics being stored in logical segment 17d. These locations correspond to segments 17c and 17d in FIG. 2.
Each time there is a new execution of the application program stored in segment 17b, the run-time system will send a request to the server process 36 for native code corresponding to the non-native code currently in the run-time system 32. The translated code is code which was generated by a previous execution of the background system 34 in accordance with the profile statistics collected by execution of the routines furnished by the run-time system 32. The server process 36 supplies corresponding translated code (if any) to the run-time system 32. If there is translated code, the run-time system 32 will have the translated code execute in place of interpreting the code. Otherwise if there is no translated code, the run-time system
32 will interpret, translate, or otherwise convert the relevant portions of the non-native code currently executed in the computer system 10.
As more code of the program stored in segment 17b is executed, more sections of the program are interpreted producing as a result of the execution, profile statistics which are fed to the server process 36.
The server process 36 controls inter alia the storage of the profile statistics. That is, the server process 36 will merge new (raw) statistics with previously stored merged statistics to provide a new merged profile. The server process will compare the new merged profile with the stored merger profile and will initiate a translation process in the background system 34 when there is a difference between the two statistics. The degree of difference needed to initiate execution is selectable. Such a difference indicates that heretofore never executed code was interpreted and executed in the run-time system. This process will be ongoing until all portions of the non-native image have been encountered by the user and all of the portions which can be translated by the background system 34 have been translated.
The server process also determines the unique key or I.D. number to uniquely identify the non-native image stored in segment 17b. As mentioned above, the attributes of the image comprising the I.D. include the file size, the date of creation of the image, the time stamp and so forth. This key is also used to identify the profile statistics with the non-native program.
The background system 34 will, in general, translate nearly all instructions provided from the non-native applications stored in 17b. Certain types of instructions are preferably not translated. In general those instructions which are not translated are ones in which the execution of the instruction is not predictable. For example, instructions which are self modifying (i.e. are not in read only sections, that is, are on a writtable page) will not be translated. For these instructions the run-time system will execute them via the interpretation routines. Further, instructions for which in the non-native architecture there is no easily produced analog in the native architecture will not be translated. For example, in the X86
architecture of Intel, floating point instructions use a floating point control register to determine inter. alia. rounding modes etc. Although for many executions of the instructions the contents of the register may be in a normal state, this can not be guaranteed. Rather than have the translator determine the state it is more economical to handle these instructions in the interpreter.
Since execution or profile statistics in part determines what code is translated by the background translator non-instruction code is not mistaken for instructions by the translator. Therefore, the translated code can be optimized without fear of optimizing non-instructions.
Referring now to FIG. 3, the run-time system 32 is shown to include an execution address space containing run-time system 32 which includes a run-time interpreter 44, a non-native loader 42 which is fed the ID corresponding to the non-native application image provided from segment 17b of the disk 17, a native image loader 43, native operating system dll's (dynamic link libraries) 45 and a return address stack management arrangement 210. The non-native loader 42 is similar to the native image loader 43 except it is capable of handling non-native images and interrogates the server process to determine whether there is any native code corresponding to the non-native code awaiting execution. The non-native loader 42 receives instructions corresponding to a non-native image of the application segment 46a and a native image of the application 46b corresponding to translated instructions provided from the background translator 34, and segment 46c corresponding to data. The non-native loader 42 is used to initially load the non-native file. The native loader 43 is used to initially load the native file if any.
Referring now also to FIG. 3A, at the initiation of an execution of the program stored in segment 17b, (via selection of the appropriate icon) (step 50a) the native loader 43 determines whether an architecture number associated with the non-native image is a native or a non-native image. If the image is a native image execution continues as normal. If however the image is a non-native image, the native loader 43 calls the non-native loader 42 at step 50b. The non-native loader 42
loads the non-native image at step 50c and also recognizes that this architecture number associated with the program represents an application program written for a non-native instruction set. The non-native loader starts the binary image conversion system 16. The non-native loader 42 initially queries the server 36 at step 50d to respond with native code to accelerate execution of the image represented by the code stored in 17b. It should be appreciated that the function of the native loader 43
and the non-native loader 42 can be combined into a single loader.
If this is the first time running the application, the server 36 responds at step 50e by indicating that there is no corresponding native image to execute in place of the non-native image. Therefore, the non-native loader 42 instructs the interpreter 44 to begin an interpretation at step 50f of the instructions from the non-native image. The interpreter 44, for each instruction, determines the length or number of bytes comprising the instruction, identifies the opcode portion of the instruction, and determines the resources needed by the instruction. The interpreter maps the non-native instruction to a native instruction or a native sequence of instructions based upon inter alia the opcode. These instructions are executed by the computer system 10 in the address space 20 (FIG. 3). The run-time interpreter 44 collects data resulting from the execution of the instructions as will be described in conjunction with FIG. 6. These "profile statistics" are stored by the server 36 on the logical disk drive 17'.
The run-time interpreter 44 examines and analyzes the instructions to determine the proper native instruction sequence to replace for the non-native instructions provided from the executable image 46a. These native instructions as they are executed continue to generate profile statistics which are collected and stored in logical disk drive storage 17c. This process continues until execution of the program 17b is terminated by the user.
After termination of the execution of the non-native program, a background process 34 is initiated (not shown). Alternatively, the background process 34 could be initiated to steal execution cycles from the run-time process 32 or alternatively could be used to substitute into the run-time process translated native image code for routines which are subsequently called during execution of the program 17b, as explained above. The exact sequence of which the background processor is used in conjunction with the run-time processor is an implementation detail.
For subsequent executions of the program the interpreter 44 will only provide interpreter code if the server process 36 does not return a native image equivalent of the sequence which is provided from the background process 34 as will be described.
Thus, if at step 50e the server responds with native code, the native image loader 42 at step 50g loads the native code. After the native image code is loaded, the non-native image loader 42 is called at step 50h to fix up the image. In general the non-native image will provide address tables corresponding to inter alia variables in the non-native image which are needed in the execution of the native image. That is, at step 50h the native and non-native images are stitched together to enable the native image to use information in the non-native image. At step 50i the native code is executed. In general, the native code that is executed corresponds to one or more basic blocks or routines of instruction which terminate by a return statement. After execution, a determination is made based upon characteristics of the return instruction execution and by use of a shadow stack as will be described, whether native image code can continue to be executed. If not then control is transferred to the interpreter. The interpreter continues to interpret and execute until it determines as at step 50k that it can resume using native code.
As also shown in FIG. 3, a jacketing routine 48 is used to jacket functions leaving the execution address space 20 to the native execution space of the computer process of computer system 10 as well as those arising from the native execution space of the computer processor 10 into the execution address space 20 as will be further described in conjunction with FIGS. 27-40.
Referring now to FIG. 4, a preferred embodiment of the background system 34 is shown under the control of the server 36 (FIG.2). The server 36 determines, responsive to the profile statistics data provided from the server 36, via logical disk drive 17', whether to initiate a translation process in the background. Preferably, the background system 34 translates only portions of the non-native instructions of the application program which were actually executed (via the interpreter 32) in responsive to a session invoking the program.
The non-native image code is examined at 52 in the server and if the code is the type that should be translated, it is fed to the translator 54. In a preferred environment, the translated code 54 is also fed to an optimizer 58, and again, if the type of code is of a type which can be optimized, it is fed through to the optimizer 58 or else, the process exits or terminates to await the submission of new code from executed portions of the non-native image stored at 17b. Other, techniques for performing translation and translation/optimization will be described. After the translator process 54 and/or the optimization processor 58, either translated code is stored in segment 17b' or optimized translated code is stored in segment 17b'.
Profile File Data Structure
Referring now to FIG. 5, a profile file data structure 60 used to store information gathered at execution time by instructions in the interpreter 34 is shown. The data structure 60 has records which contain information about the execution of a non-native architecture program when the program executes control transfer instructions. The profile record can include other information. That is, the profile records contain information about a target address encountered in the non-native image.
The data structure 60 is shown to include two principal sections. The first section is a profile header section 62 which comprises an image key field 62a. The image key field 62a is used to store information regarding the ID or identification of the profile record. The information in this field 62a is used to associate the profile statistics with a corresponding non-native image and its associated translated code, if any. Thus, the image key field 62a corresponds to the image ID or key field as mentioned above. The profile header 62 also includes a version field 63 comprised of a major version field 63a' and a minor version field 63b". The major version field 63a' and minor version field 63b" are each here 16 bit or 2 bytes in length and their union provides a resulting 32 bit version field 63. The version fields are used to keep track of which version of the interpreter was used to generate the profile statistics in the table and the profile file format.
The profile file 60 also includes a plurality of raw profile records, here 64.sub.a -64.sub.n Each of the profile records 64.sub.a -64.sub.n maintains information about run-time execution of control transfer instructions in the non-native image. Each of these records are variable length records as is each of the unique profile files 60. Thus, for each control transfer encountered during execution of the non-native image in the interpreter 34 a raw profile record is produced. The interpreter 34
will place into the raw profile record information regarding the execution of the control transfer instruction. The information which is included in the raw profile record is as described below. Suffice it here to say, however, that the raw profile records are used by the server process to provide a profile record which is then used during translation of the associated routines in the background system.
Referring now to FIG. 6, an exemplary one of the raw profile records here 64.sub.n is shown. The raw profile record 64.sub.n includes a profile record structure 66 including an address field 66a, a flag field 66b and a count field which tracks the number of indirect targets of control transfer 66c. The address field 66a contains the actual target address in the non-native image, as determined by the interpreter 44. This address is the actual target address of the instruction that caused a control transfer during execution of the non-native image. The address field 66a is generally the address length of the non-native architecture or here 32 bits or 4 bytes long. The flags field 66b contains the states of the flags at the target address. The flags field 66b is here 2 bytes or 16 bits long. The n_direct field 66c is a counter field which keeps track of the number of indirect target or computed target addresses contained in the remainder of the profile record 64.sub.n as will be described below.
There are additional optional fields 70 which comprise the record. One field is a count field 70a which corresponds to either the number of times a control transfer occurred to the address contained in field 66a or a count branch taken field counter which keeps track of the number of times a branch was taken by the instruction corresponding to the address contained in field 66a. Fields 70b.sub.0 -70b.sub.n correspond to addresses which are the targets of the control transfer and are cumulatively maintained in the profile record structure.
The optional fields 70 are used to keep track or maintain a count of the targets of the control transfer instruction in the image. The count field 70a is either a control transfer field count of the number of times control was transferred to the target address or a branch taken field corresponding to the count of the number of times a conditional control transfer of a branch instruction was taken. The type of field 70a is determined by the flags field 66b being "ANDED" or masked with a value which tests the state of the associated flag. This test determines whether the target address was a result a control transfer instruction or a branch instruction. This optional field is also a long word.
The target of control transfer fields 70b.sub.1 -70b.sub.n are the target addresses of the control transfer which occurred at the control transfer instruction. These fields keep track of the addresses for indirect transfers, that is, transfers to a run-time computed target address.
The profile statistics are managed by the server process 36. The profile statistics are collected by the interpreter 44 during the course of execution of the emulated code. For each execution the server 36 searches for a profile record corresponding the target address. The server 36 merges the new run-time statistics with the existing statistics to produce a new profile file.
The server 36 makes use of a software cache and hash table (not shown) to keep track of the profile records. For an address which needs to be looked up the address is looked up in the cache in 4 different locations that is by using a four way associative cache (not shown). If the address is not there it is looked up in a conventional hash table. The information in the hash table is the count values for the fields.
Run-time Interpreter
Details of an interpreter used to convert non-native instructions to native instructions and provide profile or run-time statistics will now be described. In particular the interpreter 44 interprets instructions of the so-called X86 architecture by Intel Corporation San Francisco, Calif.) into ALPHA instructions by Digital Equipment Corp. will be described.
Referring now to FIG. 7, an X86 instruction 100 is shown to include as many as six different fields. These fields are an opcode 100a, an rm byte 100b, a scaled index and base (sib) byte 100c, a displacement 100d, any immediate data 100e, and any one of six types of prefixes 100f.
The opcode 100a defines the task or operation which will be executed by the instruction 100. The rm byte 100b is an effective address specification and is used in conjunction with the opcode 100a to specify a general operand as a memory or register location and, in some cases, also participates in defining the operation. The sib byte 100c is used in conjunction with the rm byte 100b to provide additional flexibility in addressing memory locations. The displacement field 100d provides a displacement from the base register or from virtual zero of a segment. The immediate data field 100e provides immediate data to the opcode 100a.
The prefixes 100f are located before the opcode 100a in the instruction 100. Possible prefixes 100f are a segment override which implements a second (or multiple) addressing space, a repeat specifier value to repeat a specific instruction n times, a lock assertion for synchronization in multiple CPU environments, an address size prefix which selects between 16 and 32 bit addressing, an operand size prefix which selects between 16 and 32 bit operands, and an opcode prefix which selects an alternative opcode set.
From the opcode 100a it can be determined whether an rm byte 100b, an unconditional displacement, or the immediate data field is provided in the instruction 100. It can be determined from the rm byte 100b whether a sib byte 100c and/or a conditional displacement field 100d is included in the instruction 100. As all fields are not required by each Intel instruction 100, Intel instructions are not of a fixed length, but rather are of varying lengths.
The run-time interpreter 44 (FIG. 3) is, in the preferred embodiment, implemented on a computer system 10 (FIG. 1) which conforms to the Alpha architecture. An Alpha architecture computer system operates using the Alpha instruction set which is comprised of fixed length instructions. The run-time interpreter 44 operates on a single Intel instruction at a time. For each Intel instruction a single Alpha instruction or multiple Alpha instructions forming a corresponding Alpha routine, is provided which is an operational equivalent to the Intel instruction.
To transparently emulate the execution of an Intel or other non-native instruction 100 the run-time interpreter 44 should be capable of emulating the operation of the Intel or non-native memory, registers, condition codes and a program counter which, on a 32 bit Intel machine is referred to as an extended instruction pointer, EIP. In this way, a result of the execution of the instruction 100 is recorded accurately.
The run-time interpreter 44 uses the same memory space for data while executing Alpha routines corresponding to Intel instructions as is used when executing native Alpha instructions. This is possible because the strict standards to which Win32
software applications adhere allow for differences in calling conventions but not in the representation of the data. The maintenance of the Intel registers, condition codes and EIP are discussed below.
Referring now to FIG. 8, a table 101 depicting Intel or non-native values assigned to the registers of computer system 10 is shown to include eight registers which are assigned to emulate the operation of the eight Intel integer registers, EAX
104a, EBX 104b, ECX 104c, EDX 104d, EDI 104e, ESI 104f, EBP 104g, and ESP 104h. A single register, CONTEXT 105, is assigned to serve as a pointer to the emulator state context maintained in memory which is used to manage each thread executing in a multitasking environment. An additional register, FSP 106, stores a floating point stack pointer for addressing an eight entry stack of floating operands.
Three registers, CCR 107a, CCS 107b, and CCD 107c are assigned to store information which allow condition code bits to be maintained in an unevaluated state by the on-line interpreter 44. The SHADOW 108 register provides a pointer to the shadow stack (as will be described) which maintains activation records for translated code. The SEGOFF 109 register maintains an offset from address zero in the native architecture memory permitting the native architecture to emulate multiple addressing spaces which are possible in the Intel architecture and other non-native architectures. Four additional registers T0110a, T1110b, T2110c and T3110d are assigned as temporary registers.
The frame 112 register identifies the activation record at the most recent activation of the run-time interpreter 44. The Emulator's Return Address, ERA 114, register stores the return address when the run-time interpreter 44 calls a private sub-routine. The Effective Address, EA 116, register stores the result of evaluating an RM byte 100b and to specify a memory address to a memory access routine.
Seven of the remaining registers, NXTEIP 118a, NXTQ_LO 118b, NXTQ_HI 118c, NXTJMP 118d, Q0118e, Q1118f and QUAD 120 retain values which are used by the interpreter 44 to identify a complete Intel instruction 100 from the instruction stream and to provide pipelining capabilities.
To identify an Intel instruction 100, the run-time interpreter 44 assembles an eight byte (64-bit) snapshot of the instruction stream beginning at the start of the current Intel instruction number. This quadword is retained in QUAD 120.
To assemble QUAD 120, the run-time interpreter 44 captures two quadwords of information from the instruction stream. The run-time interpreter 44 uses the address in the instruction stream identified by the next extended instruction pointer, NXTEIP 118a, as the starting address for the first quadword. NXTEIP 118a identifies a random byte in the instruction stream at which the next instruction to be executed begins. Here, computer system 10 (FIG. 1) requires a quadword aligned address for this initial capture. Accordingly, if NXTEIP 118a is not a quadword aligned address, the three low order bits are first zeroed thus forcing the capture to occur beginning at a quadword boundary. The quadword captured beginning at this quadword aligned address is stored in register Q0118e. By executing the capture in this manner, the quadword stored in register Q0118e will at least provide the low byte of the next instruction.
The second quadword capture occurs at an address identified by NXTEIP 118a incremented by seven bytes. Here again, computer system 10 requires a quadword aligned address for this second capture. If the address identified by NXTEIP 118a incremented by seven bytes is not quadword aligned, the run-time interpreter 44 forces the three low order bits to zero thus forcing the address to be quadword aligned. From this quadword aligned address, the capture is performed and the quadword is stored in register Q1118f. Here, the quadword stored in register Q1118f contains at least the high order byte of the quadword beginning at the next instruction as identified by NXTEIP 118a.
To extract the low order bytes of the quadword beginning at NXTEIP 118a, the run-time interpreter 44 executes an instruction which, using the three low bits of NXTEIP 118a, determines a byte in register Q0118e which is identified by NXTEIP 118a, whether or not this byte is quadword aligned. The data in register Q0118e is copied to register NXTQ_LO 118band shifted right to locate the byte identified by NXTEIP 118a in the low order byte register NXTQ_LO 118b. The high order bytes of NXTQ_LO 118b which, after the shift, no longer contain valid information are zeroed.
The three low bits of the address identified by NXTEIP 118a incremented by seven bytes is used to determine the high order byte of the quadword beginning at NXTEIP 118a. Here, the data in register Q1118f is copied to register NXTQ_HI 118c shifted left to locate the byte identified by NXTEIP 118a incremented by seven bytes in the high order byte of register NXTQ_HI 118c. Here, the low order bytes of NXTQ_HI 118c which no longer contain valid information as a result of the shift are zeroed. The result of ORing the contents of registers NXTQ_L0118b and NXTQ_HI 118c is stored in QUAD 120.
Referring now to FIG. 9, the low bit of QUAD 120 is shown to be aligned with an Extended Instruction Pointer, EIP 121. In an Intel machine, the EIP 121 identifies a location in the instruction stream which corresponds to the beginning of the current instruction. As each instruction in the instruction stream is executed, the EIP 121 is incremented in the instruction stream to point to the beginning of the next instruction. QUAD 120, therefore, holds a quadword of information beginning at the byte identified by EIP 121.
To determine the operation of the Intel instruction 100 and a corresponding Alpha routine which performs the operational equivalent of the Intel instruction 100, the interpreter uses the information contained in QUAD 120. Typically, the first byte of an Intel instruction is the opcode 100a as shown in FIG. 7. The run-time interpreter 44 extracts the first and second low bytes 120a, 120b of QUAD 120 to provide a two byte instruction fragment 122. From this two byte instruction fragment 122
(FIG. 10), a corresponding Alpha routine and the length of the instruction 100 are determined.
Referring now to FIG. 10, an arrangement 130 to determine the length of the Intel instruction 100 and the corresponding Alpha routine which implements the operational equivalent of the Intel instruction 100, is shown. The arrangement 130
extracts the two low bytes 120a, 120b from QUAD 120 to provide the two byte instruction fragment 122. This two byte instruction fragment 122 is used as an index into a dispatch table 131 which resides in system memory 14 (FIG. 1).
The dispatch table 131 includes 2.sup.16 =64K (65536), 32 bit entries of which entry 131i is representative. Each entry corresponds to each instruction in a set of instructions available in the Intel instruction set. The contents of these 32
bit entries 131i include a field 131a containing an address at which the corresponding Alpha routine resides in system memory 14 as well as a field 131b containing the length of the instruction.
The dispatch table 131 is generated by a tool which identifies each instruction in the Intel instruction set such that the two byte instruction fragment 122 is sufficient information to identify the proper entry which corresponds to the current Intel instruction 100. The tool also provides the complete length of the Intel instruction 100 and includes this information in the dispatch table in the length field 131b along with the location of the Alpha routine which will provide the functional equivalent of the Intel instruction 100 in the address field 131a. The run-time interpreter 44 chooses among eight dispatch tables based upon the sequence of prefix elements 100f preceding the actual opcode 100a.
As discussed above in conjunction with FIG. 7, an Intel instruction 100 may be comprised of multiple elements 100a-100f. Multiple dispatch tables are provided by run-time interpreter 44 to handle the different values and combination of values which can be selected by the prefix element 100f. As discussed above, three possible prefix 100f are addressing size (16 or 32 bits), operand size (16 or 32 bits) and two byte opcode, which selects an alternative opcode set. Any one or combination of these prefixes 100f may be present in an Intel instruction 100.
The addressing size prefix toggles between an addressing size for the Intel system which truncates address arithmetic to 16 bits or to 32 bits. Typically, the address size is 32 bits. The operand size prefix is similar wherein an operand expected by the system is 16 bits under a 16 bit operand size or 32 bits when the operand size is set for 32 bits. Here again, the typical operand size is 32 bits. The final prefix toggles between two alternative opcode sets. The first is a one byte opcode set and the second is a two byte opcode set. Here, a one byte opcode set is typically selected. A dispatch table similar to the dispatch table 131 in FIG. 10 is provided in system memory 14 for each of the eight possible combinations of prefixes
100f, the default dispatch table is dispatch table 131 having a 1 byte opcode with a 32 bit addressing size and a 32 bit operand size.
In addition to an entry for each instruction, also included in dispatch table 131 is an entry for each prefix 100f and prefix 100f combination. The 32 bit entry 131j, corresponding to a prefix 100f, activates a different dispatch table in memory
14 in which the subsequent opcode 100a in the instruction stream and its corresponding two byte instruction fragment 122 may be used to index the proper 32 bit entry 131i.
Referring now to FIG. 11, a process for activating an alternate dispatch table 131' is shown to include extracting a two byte instruction fragment 122 from QUAD 120. The two byte instruction fragment 122 is used as an index into the dispatch table 131.
Here, the two byte instruction fragment 122 identifies an entry in the dispatch table 131j. The dispatch table entry 131j includes a native routine address 131a in memory 14 and the length 131b of the Intel instruction 100 which here, is 001 or one byte. The first byte of the two byte instruction fragment 122 is a prefix 100f to instruction 100 which selects 16 bit addressing. Accordingly, the native routine 132 identified by the native routine address 131a, instructs the run-time interpreter
44 to activate the dispatch table 131' which corresponds to an instruction set implementing 16 bit addressing.
The length 131b of the Intel instruction 100 is provided to the run-time interpreter 44 which increments EIP 121 (FIG. 9) one byte in QUAD 120 to identify the beginning of the next instruct on. A new two byte instruction fragment 122' is extracted from QUAD beginning at the new location identified by EIP 121. This two byte instruction fragment 122' identifies an entry 131i' in dispatch table 131'. Again, the two portions of the dispatch table entry 131i' identify the native routine address 131a' in memory 14 of the native routine 134 which is the operational equivalent of the Intel instruction 100 and the length 131b' of instruction 100.
The run-time interpreter 44 executes the native routine 134 which provides the operational equivalent of Intel instruction 100. Once complete, the on-line interpreter activates the default dispatch table 131 for 32 bit addressing and operands and one byte opcodes. While the run-time interpreter 44 is executing the native routine 134 for Intel instruction 100, the process just described allows the run-time interpreter 44 to identify the beginning of the subsequent instruction by incrementing EIP 121. In addition, the entry in the active dispatch table 131 which corresponds to the subsequent instruction is also identified. From this entry 131n, the address of the native routine 131na corresponding to the subsequent instruction as well as the length 131nb of the subsequent instruction are determined. This arrangement allows the on-line interpreter to operate in a pipelined fashion, executing multiple instructions in parallel.
Referring now to FIG. 12, a 32 bit entry 131i from dispatch table 131 is shown to be divided into two sections, the first section 131a corresponding to bits 3-31 of the 32 bit entry and the second section 131b corresponding to bits 0-2 of the 32
bit entry. Bits 3-31, section 131a are used to address the Alpha routines which execute the operational equivalent of the Intel instruction 100 and bits 0-2131b signify the length of the Intel instruction 100.
The dispatch table targets are aligned on quadword boundaries. That is, the Alpha instructions which the entries in the dispatch table 131 point to and execute the operational equivalent of Intel instruction 100, are located in system memory 14
on quadword boundaries. In this way, bits 0-2 of the address of the Alpha instructions are always zero. As a result, bits 0-2131b' may be used to convey additional information about the instruction as here, where these bits are used to signify the length of the instruction. As the addresses of the Alpha routines are always 000 in bits 0-2 field 131b', a full 32 bit address is recreated by appending these zeros to bits 3-311012a to provide a complete 32 bit address.
As control is passed to the Alpha routine identified by the 32 bit address, bits 0-2 are used to increment EIP 121 so that EIP 121 is pointing to the beginning of the next instruction. Here, if the length of the Intel instruction 100 is from 1-6
bytes in length, QUAD 120 contains sufficient information to form a second, two byte instruction fragment 122 which may be used to index the current dispatch table to determine the corresponding Alpha routine for the next Intel instruction. This arrangement allows the run-time interpreter 44 to pipeline instructions and thus execute the application program more quickly and efficiently. While an Alpha routine is being accessed corresponding to a current instruction, the run-time interpreter 44
is able to determine the address and length of the next Intel instruction 100 in the instruction stream. A value of zero returned from bits 0-2 field 131b of the 32 bit entry 131i for the length of the Intel instruction 100 however, indicates that the instruction was longer than 6 bytes and hence, pipelining is not possible for this Intel instruction and accordingly, the EIP 121 is not incremented. It is then the responsibility of the Alpha routine to increment EIP 121 and to refill the pipeline.
Condition Code Processing
Referring now to FIG. 13, general purpose registers 135 of an Intel X86 machine are shown to include a single register, EFLAGS 135a, in which condition codes are maintained. This register, EFLAGS 135a, maintains the six condition code bits, the Carry bit 136a (C), the Negative bit 136b (N), the Zero bit 136c (Z), the Overflow bit 136d(0), the Parity bit 136e (P), and the Low Nibble Carry bit 136f (A). Each of these bits may be cleared or set as a result of the execution of an Intel instruction
100. To completely emulate the operation of the Intel application, the run-time interpreter 44 also maintains, in an unevaluated state, the current state of the condition codes resulting from the execution of an Alpha routine which corresponds to the Intel instruction 100.
As is often the case in systems which maintain condition codes, a subsequent condition code modifying instruction may be executed, thus overwriting the changes made to the condition code bits by a prior condition code modifying instruction, before the state of the condition codes is required by a subsequent instruction. In addition, many of the condition code modifying instructions effect only a partial set of the condition code bits. Accordingly, a complete evaluation of the condition code bits after execution of every condition code modifying instruction would be wasteful at CPU time. Nevertheless, the state of the condition code bits needs to be readily ascertainable throughout the execution of the X86 image should the current state of the condition codes be required.
Referring now to FIG. 14, the run-time interpreter 44 is shown to include a set of data storage locations 138, a table of methods 139, and evaluation routines 140 which are used to emulate the X86 condition codes during execution of an X86 image in computer system 10.
The set of data storage locations 138 is shown to include three locations 138a, 138b, 138c which are updated upon execution of an instruction which would have modified the condition codes in an X86 system. The first location, data1138a, and the second location, data2138b, store data used in the execution of the instruction, for example, an operand and a result of the instruction. This information is used later during execution of the application program should it become necessary to evaluate the condition codes.
The third location, pointer 138c, contains a pointer to the table of methods 139 which is a dispatch table used to evaluate the condition codes should the system require the current value of the condition codes. The table of methods 139 contains an entry for each of the eight predicates available in X86 conditional branches (and equivalent SETcc instructions), an entry to obtain the nibble carry, A 136f, bit and an entry to obtain a complete image at the EFLAGS 135a register. The set of methods includes one for each of the six condition codes.
Each entry in the table of methods 139, identifies an evaluation routine 140 which evaluates the condition described in the method table entry. Data1138a and data2138b are provided to the evaluation routines to determine the state of the condition code bits should a subsequent instruction require the current state of the condition codes.
When an Alpha routine is executed for an Intel instruction which would have modified one or more of the condition codes, the run-time interpreter 44 stores zero to two pieces of information from the instruction in the first two storage locations, data1138a and data2138b. These pieces of information, possibly an operand and a result of the operation, are used by the evaluation routines to compute the condition codes. In the third storage location, pointer 138c, a pointer is placed which, in accordance with the type of instruction which was executed, identifies the entry in the table of methods 139 which will identify the evaluation routines 140 which are to be called if and when the condition codes are evaluated.
The table of methods 139 is specific to the type of instruction executed. That is, if the instruction modifies all of the condition codes, the table of methods includes an entry pointing to a routine for each of the six condition codes. if the instruction modifies only the C bit, the only entry in the table of methods 139 is a entry pointing to an evaluation routine which will evaluate the C bit. Other possibilities include instructions which modify all of the condition code bits except for the C bit (ALL_BUT_C) instructions which modify only the Z bit (ONLY_Z) and instructions which modify only the C and O bits (C_AND_O). The table of methods 139 for instructions of these-types would include entries pointing to routines which correspond to all but the C bit, only the Z bit and only the C and O bits respectively.
Each entry in the table of methods 139 identifies a separate evaluation routine 140 which computes that specific condition code predicate or image of EFLAGS 135. Because these routines are only executed when necessary, the condition codes are maintained in an unevaluated state and accordingly, only minimally effect the execution speed of the application. Data1138a and data2138b are provided to the evaluation routine 1024 to determine the effect the instruction had, or should have had, on the condition codes. Later, when a subsequent instruction is encountered by the run-time interpreter 44 which requires the current value of one or all of the condition code bits as input to the instruction, for example, as a condition in a conditional instruction, the run-time interpreter 44 uses the information provided in the data storage locations 138a and 138b, the table of methods 139 and the evaluation routines 140 to determine the current values of the condition code bits.
As discussed above, an Intel instruction can modify all condition code bits, or a subset of those bits. If the current instruction which modified the condition code bits modifies only the C bit and the previous instruction modified all of the condition code bits it would be wasteful to gather the data necessary to evaluate all but the C bit and copy it into the table of methods 139 which is provided for the current C bit modifying instruction. As a result, the run-time interpreter 44
maintains information to evaluate the previous state of the condition code bits based upon a previous condition code modifying instruction as well as the current condition code modifying instruction.
Referring now to FIG. 15, the interpreter is shown to include two sets of data storage locations 138 and 138', two corresponding tables of methods 139 and 139' and corresponding evaluation routines 140 and 140'. A first condition code evaluation grouping 137 corresponds to a current condition code modifying instruction and a second condition code evaluation grouping 137' corresponds to a previously executed condition code modifying instruction. Further, a finite state machine (FSM) is provided which determines how the previous and current states of the condition codes are maintained. The states and transitions of the FSM are the five types of condition code updates: ALL_BUT_C, ONLY_C, C_AND_O, ONLY_Z and ALL. Each transition has associated with it one of three actions: replace, push or resolve.
Provided below is a table, TABLE 1, which describes the action taken to maintain the condition code bits. The action is contingent upon which condition code bits the current instruction will modify as well as which condition code bits were modified by a previously executed condition code modifying instruction. In addition, the actions have been carefully selected to provide an action for the transition which entails a minimal amount of work yet still provides the run-time interpreter 44 a complete up-to-date set of condition code bits at any time.
In a replace action, the contents of the current condition code evaluation grouping are replaced by the values resulting from the next instruction. That is, the contents of the data storage locations 138, the corresponding table of methods 139
and the evaluation routines 140 are replaced with values which will enable the run-time interpreter 44 to evaluate the condition codes modified as a result of the next instruction. A replace action does not modify the contents of the previous condition code evaluation grouping. A replace action is appropriate when the set of condition code bits modified by the next condition code modifying instruction includes at least all of the condition code bits in the set of condition code bits modified by the most recent condition code modifying instruction.
A push action however, replaces the contents of the previous condition code evaluation grouping 137' with the contents of the current condition code evaluation grouping 137. The current condition code evaluation grouping 137 is used to provide the necessary information to evaluate the condition code bits modified by the next instruction. A push action is appropriate when the set of condition code bits modified by the next condition code modifying instruction does not include all of the condition code bits in the set of condition code bits modified by the most recent condition code modifying instruction. In addition, a union of the two condition code bit sets results in a complete set of condition code bits.
The final action is a resolve. The resolve is the most complicated of all the actions. In a resolve, the state of the condition codes, as represented by the current and previous condition code evaluation groupings 137 and 137', is evaluated resulting in a complete set of condition code bits, or an ALL, in the current condition code evaluation grouping 137. A push is then performed for the next instruction. A resolve action is appropriate when more than two condition code evaluation groupings would be necessary to maintain a complete set of condition code bits.
TABLE I Next CC Most Recent CC State State ALL_BUT_C ONLY_C C_AND_O ONLY_Z ALL ALL_BUT_C replace push push replace push ONLY_C push replace resolve resolve push C_AND_O push replace replace resolve push ONLY_Z resolve resolve resolve replace push ALL replace replace replace replace replace
As mentioned above, the first condition code evaluation grouping 137 maintains in an unevaluated state the state of the condition codes corresponding to the execution of a current instruction. The second condition code evaluation grouping 138
maintains in an unevaluated state the state of the condition codes corresponding to the execution of a previous instruction.
The first set of data storage locations 138 here, registers CCR 107a, CCS 107b and CCD 107c retain three values. CCR 107a and CCS 107b contain data used by the current, non-native instruction such as an operand and a result of the instruction. CCD 107c contains a pointer to the dispatch table 139 provided to evaluate the state of the condition codes which are modified as a result of the execution of the current instruction. The second set of data storage locations 138' retain similar values corresponding to a previous condition code modifying instruction.
Here, each condition code evaluation grouping 137, 137' is shown to include a location in the respective table of methods 139, 139' which indicates the category of instruction which was executed. That is, whether the instruction modifies all of the condition code bits or a subset of the condition code bits. Using this value and the information in the FSM of TABLE I, the run-time interpreter 44 maintains in an unevaluated state, the complete set of condition code bits.
To illustrate how this works, an example is provided in conjunction with FIG. 15, in which a current instruction modifies all of the condition code bits (ALL) and a next instruction modifies only the C bit (ONLY_C). In this simple example, the contents of the second condition code evaluation grouping 137', which provides the previous condition code state, is immaterial as will be shown.
As the current instruction modifies all of the condition code bits, the category location 139a of dispatch table 139 would indicate an ALL value. Accordingly, an entry for each of the six condition code bits is provided in dispatch table 139a to access evaluation routines 140 for each condition code bit.
When the corresponding Alpha routine for the next instruction is executed, the category location 139a of the current dispatch table is accessed to determine the category of the previous instruction. Using the category information provided and the information contained in TABLE 1 the run-time interpreter 44 manipulates the contents of each condition code evaluation grouping 137, 137' accordingly.
Here, the category of the most recently executed instruction is ALL while the category of the next instruction is ONLY_C. As shown in TABLE I, when the most recent condition code state is an ALL and the next instruction is an ONLY_C, the action which is to be taken is a push. Here, a push is an appropriate action because the set of bits modified by the next condition code modifying instruction, {C}, does not include all of the bits modified by the most recently executed condition code modifying instruction, {C, N, O, P, A}. Moreover, a union at the two condition code bit sets results in a complete set of condition code bits, {C, N, Z, O, P, A}.
The information retained in the current condition code evaluation grouping 137 is pushed or copied into the storage locations for the previous condition code evaluation grouping 137'. That is, the data in CCR 138a and CCS 138b are copied to pdata1138a' and pdata2138b' respectively and CCD 138c is copied to pptr 138c'. The current condition code evaluation grouping 137 is then used to store the data used to evaluate the C bit which is the only condition code bit modified by the next instruction. An example is provided below in conjunction with FIGS. 16 and 17 which describes a resolve action.
Referring now to FIG. 16, a set of condition code state diagrams 150 includes a condition code state 152 diagram for a previously executed condition code modifying instruction, a condition code state 154 diagram for a most recently executed condition code modifying instruction and a condition code state 156 diagram for a next condition code modifying instruction. Here, the previous condition code state 152 is ALL_BUT_C in which all but the C bit is modified. The most recent condition code state 154 is C_AND_O in which only the C and O bits are modified as a result of the execution of the most recently executed condition code modifying instruction. The next condition code state 156 is ONLY_C in which only the C bit is modified.
Referring back to TABLE 1, it may be seen that when the most recent state is C_AND_O and the next state is ONLY_C the appropriate action to be taken is a resolve action. It can be seen from FIG. 16 a replace action would not preserve the most recent state of the O bit as the current condition code state would be overwritten by information only capable of determining the C bit. A push however would lose the information necessary to determine the most recent values of the N, Z, P and A bits. As discussed above, more than two condition code evaluation groupings would be required to fully preserve the current states of each of the condition code bits. Accordingly, the information stored in the first and second condition code evaluation groupings 137, 137' is resolved resulting in a complete set of condition code bits.
Referring now to FIG. 17, the most recent condition code state 154' diagram is shown to contain a complete set of condition code bits. As a result of the resolve action, the most recent condition code state 154' is ALL and the next condition code state 156' is an ONLY_C. Referring again to TABLE 1, the appropriate action to be taken is a push when the most recent condition code state is ALL and the next condition code state is ONLY_C. Accordingly, the run-time interpreter 44 can push the condition code information resulting from execution of the next instruction without losing any condition code bit information.
Referring now to FIG. 18, the previous condition code state 152" diagram is shown to indicate a complete set of condition code bits which was pushed from the most recent condition code state 154' in FIG. 17. The most recent condition code state
154" diagram of FIG. 18 now indicates execution of a condition code modifying instruction which modified only the C bit. As may be seen, all information relating to the most current state of each of the condition code bits has been preserved.
Multiple Address Spaces
Referring now to FIG. 19, an implementation of multiple address spaces on an Intel machine is shown to include segments CS 160, DS 162, and SS 164 identifying address 0166 of a first address space 168 and segment FS 170 identifying address 0172
of a second address space 174. Data X 168i is located within the first address space 168 and data Y 174i is located within the second address space
It should be noted that the first address space 168 and the second address space 174 exist independently from each other. Accordingly, there is no relationship between the location identified by segments CS 160, DS 162, and SS 164 and segment FS
170. Nor is there any relationship between the address of the location of data X 168i in the first address space 168 and address of the location of data Y 174i in the second address space 174.
Referring now to FIG. 20, emulation of multiple address spaces on a native architecture is shown to include segments CS 160', DS 162', and SS 164' identifying address 0166' of a first address space 168' and segment FS 170' identifying address
0172' of a second address space 174' where segment FS 170' has an offset 175 from address 0166' of the first address space 168'. The value of the offset 175 is stored in SEGOFF 109 (FIG. 8).
Context Data Structure
Referring now to FIG. 21, a context data structure 180 which resides in memory is shown. The context data structure 180 is used by the on-line interpreter 44 to handle multitasking capabilities of the non-native software application. When, due to multitasking, an additional thread is executed during operation of the non-native software application, a snap-shot of the current state of the run-time interpreter 44 is saved in context data structure 180. The context data structure 180 is used by the new thread to provide the run-time interpreter 44 executing in the new thread the state of the run-time interpreter 44 executing in the thread which initialized the new thread.
Values which are saved in the context data structure 180 include the current condition code state in field 181. Thus, this field includes subfields (not shown) to provide copies of the values stored in registers CCR 138a, CCS 138b and CCD and
138c. Values are provided in field 182 to store the previous state of the condition code bits. The context data structure also includes copies of the integer registers EAX 104a, EBX 104b, ECS 104c, EDX 104d EDI 104e, ESI 104f, EBP 104g and ESP 104h in field 183.
In field 183 values for the six segments (seldomly used in WIN32 applications) are provided. The six segments, four of which are depicted in FIGS. 19 and 20 are cs, ds, es, fs, gs and ss. A copy of the floating stack pointer 106 (FIG. 8) is also provided in field 185 in addition to a starting value for the floating stack pointer as well as the floating stack entries.
Field 186 of the context data structure 180 provides pointers to each of the eight possible dispatch tables. Exemplary dispatch tables 131 and 131' are depicted in FIGS. 10 and 11. The context data structure 180 also provides in field 187 the Extended Instruction Pointer, EIP 121.
A repeat specifier value, as designated by one of the possible prefixes 100f (FIG. 7), is provided in field 188. Values relating to the Emulator Return Address, ERA 114, register are stored in field 189. In fields 190 and 191 pointers used to maintain the profile table as well as pointers to portable math routines are also provided respectively. Values of selected constants are also provided in the context data structure 180 in field 192 while pointers to maintain a linked list of context data structures is provided in field 193.
An additional aspect of a prefers d embodiment includes structuring the order of the software which implements the run-time interpreter 44 such that critical blocks of the software code exist in a single cache block. In this way, the run-time interpreter 44 is able to execute more efficiently as the portions of the interpreter 44 which are executed most often are resident in the cache.
Non-native Return Address Stack and Shadow Stack
Referring now to FIG. 22, a return address stack arrangement 210 is shown to include a non-native return address stack 211 and a shadow stack 212. The non-native return address stack 211 is an address stack which is produced as if the non-native image were executing in the non-native environment. The non-native return address stack 211 comprises a plurality of frames 219, each of said frames including a corresponding one of non-native return address fields 213a-213c, as well as fields 215a-215c for local storage, as shown. The non-native return address stored in locations 213a-213c corresponds to the routine return address that is pushed onto the stack by the program when it executes a call instruction. That is, the non-native program when executing in a native environment would place on the stack 211 a particular return address corresponding to the address space as if the non-native program was executing in its native environment.
As also mentioned, the return stack arrangement 210 also includes a shadow stack 212. The shadow stack 212 likewise is comprised of a plurality of frames 214, each of said frames 214 comprising a header field 216a-216c and corresponding or associated local storage fields 218a-218c.
The return address arrangement 210 also includes a pair of stack pointers, one for the non-native return stack 211 and one for the shadow stack 212. The non-native return address stack pointer 217 also referred to as SP points to the bottom or most recent entry in the non-native return address stack. Here the non-native return address stack 211 has an initial address A.sub.0 of <7FFFFFFF>. The initial address of <7FFFFFFF> insures that as the stack pointer SP is decremented, the largest stack pointer value will not be sign extended by an LDL instruction as will be described. Likewise, the shadow stack 212 has a stack pointer 221 referred to as SSP and has an initial address A.sub.0 =<0000000077FFFFFFF>.
The header portion 216a-216c of the shadow stack 214 here comprises four sub-fields. The first sub-field 220a also referred to as SP is the contents of the non-native stack pointer 217 corresponding to the return address in the non-native stack pointer for the particular shadow stack frame 214. Here the non-native stack pointer corresponds to the size of the emulated operating system. Thus, for a 32 bit operating system, the non-native stack pointer 220a would comprise four bytes.
The second entry 220b in the header 216a-216c is the non-native instruction pointer value 220b. The non-native instruction pointer is the address that is pushed onto the non-native return address stack 211. This address also comprises the same number of bytes as the number of bytes supported in the operating system. Thus, again for a 32 bit operating system, the number of bytes is 4.
The third entry 220c in the header portion 216a-216c is a native return address field 220c. The native return address field 220c comprises the native return address which is placed on the shadow stack if a translated routine executes a call instruction. This corresponds to the address of the native instruction which is to resume execution in the translated routine after the called routine has completed.
The fourth entry in the header 216a-216c is the native dynamic link 220d. The native dynamic link field is a pointer to the previous shadow frame header 214. Thus, in FIG. 22, the value stored in the field "dylnk" corresponds to the location of the next shadow frame header 216b. This value is preferably included in the shadow stack 212 to allow the shadow stack 212 to make provisions for a variable amount of local storage in fields 218a-218c. In situations where the local storage fields are not provided or their size is fixed, it is not necessary to have a dynamic link field.
The local storage fields 218a-218c in the non-native register stack 211 comprise routine calls and routine arguments of the non-native system and is provided to faithfully replicate that which would occur in the non-native system were it being executed on its native architecture. The routine locals and routine arguments stored in the non-native return stack are passed to translated routines via the translation process described above and as will be further described in detail below. In the shadow stack 212, however, provision is also provided for local storage in fields 218a-218c. For example, often when a compiler is used to compile a program, the actual instructions of the program use more logical registers than physically exist in the machine on which the program is to be executed. Accordingly, the compiler often provides temporary storage for logical register manipulations and uses the program stack to store these registers.
Non-native Return Stack and Shadow Stack Management
The non-native return address stack 211 is managed exactly as dictated by the non-native code being emulated in the interpreter 44. When the interpreter 44 is executing the non-native or non-native code of a particular thread, there is only one native frame on the shadow stack 212 for the interpreter. This permits the interpreter to transfer execution into translated code in the event that there is corresponding translated code to be executed. The interpreter does not push frames onto the shadow stack 212. Further, when transferring into and out of translated routines, the interpreter does not push data onto the native system stack. Rather, when transferring into and out of translated routines, shadow frames 214 are pushed onto the shadow stack 212 to record the state associated with the translated routines.
The shadow stack 212 tends to be synchronous with the routine frames on the non-native return stack. Although calling jackets (48FIG. 3) may cause another instance of the interpreter 44 to be produced if a callback is performed, and thus push another interpreter frame onto the non-native return address stack 211, once the jacketed operation has been completed this extra frame is removed from the non-native or non-native stack 211.
With a translated routine, however, a shadow frame 214 is pushed onto the shadow stack 212 each time a translated routine is called. The shadow frame 214 includes the space necessary for the translated routine's locals such as the spilled registers mentioned above, and the shadow frame header.
Referring now to FIG. 23, an example of the operation of the shadow stack 212 is shown. The program 230 includes a routine A which has a plurality of instructions, one of which is a call to a routine B (call B) at 233. Routine B, likewise, has a plurality of instructions with